{"id":548,"date":"2017-12-05T11:21:11","date_gmt":"2017-12-05T11:21:11","guid":{"rendered":"http:\/\/www.codeastar.com\/?p=548"},"modified":"2018-05-09T09:34:57","modified_gmt":"2018-05-09T09:34:57","slug":"data-science-ensemble-modeling","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/data-science-ensemble-modeling\/","title":{"rendered":"To win big in real estate market using data science \u2013 Part 3: Ensemble Modeling"},"content":{"rendered":"
Previously on CodeAStar: the data alchemist wannabe opened the first door to "the room for improvement", where he brewed better prediction potions. His hunger for the ultimate potion kept growing, and he then discovered another door inside the room. The label on the door read, "Ensemble Modeling".

This is the last chapter of the "win big in real estate market with data science" series; you can find the previous chapters in Part 1 and Part 2. Last time we made better predictions by tuning model parameters. Now we want something "better than better", and the ensemble technique is exactly what we are looking for.

What is Ensemble Modeling?

Let's start from the very beginning: what is ensemble modeling? In machine learning, an ensemble is a method that runs several different models and then synthesizes their outputs into a single, more accurate result. There are several types of ensemble modeling, such as bagging, boosting and stacking, and some models, like Random Forest and AdaBoost, are already ensembles in themselves. In this post, we will focus on stacking.

Why does Ensemble Modeling matter?

From our last "episodes", we got improved results from several tuned models:
[Ridge Tuned Rid_t]      Mean: 0.11255269  Std. Dev.: 0.012144
[Lasso Tuned Las_t]      Mean: 0.11238117  Std. Dev.: 0.011936
[ELasticNet Tuned ElN_t] Mean: 0.11233786  Std. Dev.: 0.011963
[LassoLars Tuned LaLa_t] Mean: 0.11231273  Std. Dev.: 0.012701
[XGBoost Tuned XGB_t]    Mean: 0.11190687  Std. Dev.: 0.015171
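As a quick reminder of how such a score is produced, here is a minimal sketch of K-fold scoring for one of the tuned models. It assumes the X_learning and Y_learning training data prepared in the earlier parts of the series, and it uses 10 folds to match the stacking step later in this post.

import numpy as np
from sklearn.linear_model import LassoLars
from sklearn.model_selection import KFold, cross_val_score

lalaM = LassoLars(alpha=0.000037)        # the tuned model from Part 2
kfold = KFold(n_splits=10, shuffle=True)

# cross_val_score returns negated MSE, so flip the sign and take the square root
scores = np.sqrt(-cross_val_score(lalaM, X_learning, Y_learning,
                                  scoring="neg_mean_squared_error", cv=kfold))
print("[LassoLars Tuned LaLa_t] Mean: %.8f Std. Dev.: %.6f" % (scores.mean(), scores.std()))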
All of these tuned models claim to be the best model you will ever have. In the past, we would run a K-fold Cross Validation and pick the best model among them. It is like running a battle royale: you let your finest warriors fight each other until a sole survivor is left standing in the ring. But what if you hired your finest warriors as trainers and let them train a new warrior instead? The new warrior could learn the best parts from each trainer and become an even better warrior.

Does that sound like a good idea? Let's find out with the following demonstration.
The ground truth is: [100, 123, 150]
We have our finest warriors, I mean models, and here are their predictions and RMSDs (root-mean-square deviations from the ground truth):

Model A's prediction: [98, 110, 138]
RMSD: 10.27943

Model B's prediction: [99, 120, 140]
RMSD: 6.0553

Model C's prediction: [107, 123, 153]
RMSD: 4.39697

Model D's prediction: [104, 130, 158]
RMSD: 6.55744

When we do it the old K-fold Cross Validation way, we would declare model C our champion, as it got the best RMSD score of 4.39697.

But now we use all four models to train a new model by averaging their predictions, and we get:

New model's prediction: [102, 120.75, 147.25]
RMSD: 2.35407

After ensembling the 4 models, the new model gets a much better RMSD score of 2.35407!
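If you want to check the arithmetic yourself, here is a small sketch that reproduces the numbers of this toy demonstration (plain NumPy; nothing here touches the actual house price data):

import numpy as np

truth = np.array([100, 123, 150])
predictions = {
    "Model A": np.array([98, 110, 138]),
    "Model B": np.array([99, 120, 140]),
    "Model C": np.array([107, 123, 153]),
    "Model D": np.array([104, 130, 158]),
}

def rmsd(pred, truth):
    # root-mean-square deviation between a prediction and the ground truth
    return np.sqrt(np.mean((pred - truth) ** 2))

for name, pred in predictions.items():
    print(name, round(rmsd(pred, truth), 5))   # 10.27943, 6.0553, 4.39697, 6.55744

averaged = np.mean(list(predictions.values()), axis=0)
print("Averaged", averaged, round(rmsd(averaged, truth), 5))   # [102, 120.75, 147.25] -> 2.35407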

Stacking in action
Stacking is a type of ensemble: we stack the predicted results from different models to form a new data set, then we use another model to learn from that new data set and make its own prediction. To illustrate the stacking process: each base model is trained on K-fold splits of the training data, its out-of-fold predictions become one column of a new data set, a meta model learns from that data set, and finally the meta model predicts from the base models' predictions on the test data.

Talk is cheap, so let's put our models into stacking action. Last time, we tuned our models' parameters to get better results. This time, we stack those tuned models and create a new data set.
# our tuned models (parameters from Part 2)
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, LassoLars
import xgboost as xgb

linearM = LinearRegression()
ridM = Ridge(alpha=0.01)
lasM = Lasso(alpha=0.00001)
elnM = ElasticNet(l1_ratio=0.8, alpha=0.00001)
lalaM = LassoLars(alpha=0.000037)
xgbM = xgb.XGBRegressor(n_estimators=470, max_depth=3, min_child_weight=3,
                        learning_rate=0.042, subsample=0.5, reg_alpha=0.5, reg_lambda=0.8)
We select 5 of the 6 tuned models as our base models, and we pick the tuned LassoLars model as our meta learning model, i.e. the model that learns from the 5 base models; LassoLars is chosen for its strong CV score.

base_models = []
base_models.append(lasM)
base_models.append(ridM)
base_models.append(xgbM)
base_models.append(elnM)
base_models.append(linearM)

meta_model = lalaM

First, we train our base models on 10 folds of the training data, collect their out-of-fold predictions, and use those predictions as the training data for our meta model.
import numpy as np
from sklearn.model_selection import KFold

stack_kfold = KFold(n_splits=10, shuffle=True)
# placeholder for the out-of-fold predictions: one column per base model
kf_predictions = np.zeros((X_learning.shape[0], len(base_models)))

# get the X, Y values
X_values = X_learning.values
Y_values = Y_learning.values

for i, model in enumerate(base_models):
    for train_index, test_index in stack_kfold.split(X_values):
        model.fit(X_values[train_index], Y_values[train_index])
        model_pred = model.predict(X_values[test_index])
        kf_predictions[test_index, i] = model_pred

# teach the meta model with the base models' out-of-fold predictions
meta_model.fit(kf_predictions, Y_values)
After the meta model has been trained, we let the base models predict on the testing data, then use the meta model to predict from the base models' predictions.

preds = []
for model in base_models:
    model.fit(X_learning, Y_learning)
    pred = model.predict(X_test)
    preds.append(pred)

# stack the test predictions column by column, matching the layout the meta model was trained on
base_predictions = np.column_stack(preds)

stacked_predict = meta_model.predict(base_predictions)

Now we have the prediction from the meta learning model, stacked_predict.

One Step Further

The new meta learning kid, trained by our top warrior models, can predict better than his masters. What if we want even better performance? Let's have the kid and his masters join forces and predict together. We go one step further and ensemble the meta learner's result with our other top models' results, with proportions based on their CV performances.
# xgb_pred, eln_pred, rid_pred: test-set predictions of the tuned XGBoost, ElasticNet and Ridge models
stack_n_trainers_prediction = stacked_predict * 0.5 + xgb_pred * 0.3 + eln_pred * 0.1 + rid_pred * 0.1

Then we submit the stack_n_trainers_prediction result to Kaggle. Ding! We got an RMSD of 0.11847 on the official Kaggle Leaderboard, not bad.
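If you need the final step, here is a sketch of turning that blended prediction into a Kaggle submission file. Everything in it is an assumption about the earlier parts of the series: df_test stands for the raw test set (whatever it was actually named), the Id and SalePrice columns follow the House Prices competition format, and np.expm1 is only needed if SalePrice was log1p-transformed during data preparation.

import numpy as np
import pandas as pd

# df_test: the raw Kaggle test set loaded in the earlier parts (assumed name);
# np.expm1 undoes a log1p transform on SalePrice, if one was applied
submission = pd.DataFrame({
    "Id": df_test["Id"],
    "SalePrice": np.expm1(stack_n_trainers_prediction),
})
submission.to_csv("house_prices_stacked_submission.csv", index=False)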

Future of Data Science

Do you remember that in the last post we talked about tuning model parameters without explaining the details? A similar thing happened with this ensembling topic (although we did explain a bit, just not in depth). The reason is simple: we don't really need to.

GridSearchCV and ensemble modeling are both repetitive processes that let the machine find a better match. Since there is "no free lunch", no single set of model parameters or model combination is "best for science"; we simply keep validating different parameters and ensembles in search of a better solution. For such trial-and-error work, it is more productive to let the machine handle it. As computing keeps getting faster and cheaper, we can assume machines will take over these repetitive tasks. Would data science become more like a One-Punch Man story in the future, where all of the work is finished with one punch... I mean, one click? We pass a data set to a machine, click a button, sit back, let the machine do those trial-and-error tasks, and get the result.
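To make the "click a button" idea concrete, here is a minimal GridSearchCV sketch; the parameter grid is purely illustrative, and X_learning / Y_learning are again the prepared training data from the earlier parts.

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# an illustrative (not exhaustive) grid of parameters to try
param_grid = {"alpha": [0.001, 0.01, 0.1, 1.0, 10.0]}

grid = GridSearchCV(Ridge(), param_grid,
                    scoring="neg_mean_squared_error", cv=10)
grid.fit(X_learning, Y_learning)   # the machine grinds through the trial and error

print(grid.best_params_, grid.best_score_)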
It sounds attractive, but it is probably not what we will see. We expect the machine to help us find a better solution through sophisticated models and many tries, yet that only works when we already have a solution for the machine to try against and correct its predictions. Remember, there is no free lunch in optimization. So, fellow data alchemists, let's keep brewing harder!

What have we learnt in this post?