Predictive Modeling Decision Tree
Predict ‘kicks’ (bad purchases) using the Carvana – Cleaned and Sampled.jmp file. Create a validation data set with 50% of the data.
Use Decision Tree, Regression, and Neural Network approaches to build predictive models. Perform a comparative analysis of the three competing models on the validation data set. Write down your final conclusions on which model performs best, what the best cut-off is, and what the ‘value-added’ from conducting predictive modeling is.
Upload the saved file with the assignment.

I created 6 models for this project: DT1, DT2, Reg1, Reg2, Reg3, and NN. After testing, the parameters I used to predict “IsBadBuy” in all my models are: PurchDate, Auction, VehicleAge, Transmission, WheelType, VehOdo, all “MMRs”, VehBCost, IsOnlineSale, and WarrantyCost. Together, these parameters gave me better models (i.e. ROC Area > 0.7).

I used a cut-off of 0.6. When I tried other cut-offs such as 0.5, 0.7, and 0. , the result was either “I’m eliminating too many Good Buys” or “I’m accepting too many Bad Buys”. Both situations hurt the business. For example, if we demand stronger confidence from the model before flagging a car, we get too many 0s in the result, which means we may accept more Bad Buys by accident. I therefore chose 0.6 as the cut-off that balances the two kinds of error.

The best model I chose is Reg2 (the Forward regression model). I have two reasons. First, Reg2 has the largest ROC Area in the Logistic Fit comparison (saved as “Lodistic1~6”), which is 0.478. Second, it has a relatively low number (the second smallest among all models) in the FalseNegative box of the Contingency Table. For the second reason, I did not use overall accuracy, because I think a FalseNegative damages the business more than a FalsePositive does: accidentally purchasing a Bad Buy forces the company to pay for all the required repair work.

For the value-added calculation, as the Contingency tables (saved as “Contingency 1~6”) show, the Baseline Accuracy is 49.89 and the accuracy of Reg2 is 82.49. Reg2 therefore provides a lift of 82.49/49.89 = 1.653 over the baseline.
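The cut-off comparison and the lift calculation described above can be sketched in a few lines of Python. This is only an illustration, not the JMP workflow I actually used: the labels and predicted probabilities below are made-up toy values, and the function names (confusion_counts, accuracy, lift) are hypothetical helpers. It assumes a car is predicted as a Bad Buy (1) when its predicted probability is at or above the cut-off, so raising the cut-off produces more 0s and more FalseNegatives.

```python
def confusion_counts(y_true, p_bad, cutoff):
    """Count contingency-table cells, predicting Bad Buy (1) when p_bad >= cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, p in zip(y_true, p_bad):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1   # Bad Buy correctly flagged
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1   # Good Buy wrongly eliminated
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1   # Good Buy correctly accepted
        else:
            counts["fn"] += 1   # Bad Buy accepted by accident
    return counts


def accuracy(counts):
    """Overall accuracy: correct predictions divided by all predictions."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


def lift(model_accuracy, baseline_accuracy):
    """Value-added ratio: model accuracy relative to the baseline accuracy."""
    return model_accuracy / baseline_accuracy


# Hypothetical toy data (NOT from the Carvana file): 1 = Bad Buy, 0 = Good Buy.
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
p_bad = [0.9, 0.2, 0.65, 0.55, 0.1, 0.4, 0.62, 0.8]

# Compare candidate cut-offs by their FalsePositive / FalseNegative trade-off.
for cutoff in (0.5, 0.6, 0.7):
    c = confusion_counts(y_true, p_bad, cutoff)
    print(cutoff, c, round(accuracy(c), 3))

# Lift using the accuracies reported in the Contingency tables above.
print(round(lift(82.49, 49.89), 3))
```

On the toy data, the lower cut-off eliminates more Good Buys and the higher cut-off lets more Bad Buys through, which is the same trade-off I weighed when choosing 0.6.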