{"id":469,"date":"2017-11-07T19:35:26","date_gmt":"2017-11-07T19:35:26","guid":{"rendered":"http:\/\/www.codeastar.com\/?p=469"},"modified":"2018-12-21T04:19:08","modified_gmt":"2018-12-21T04:19:08","slug":"win-big-real-estate-market-data-science","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/win-big-real-estate-market-data-science\/","title":{"rendered":"To win big in real estate market using data science – Part 1"},"content":{"rendered":"

Okay, yes, once again, it is a catchy topic. BUT, this post is indeed trying to help people (including me) gain an upper hand in the real estate market, using data science.

From our last post (two months ago; I will try to update this blog more frequently :]] ), we learned that we can use regression to predict a restaurant tipping trend. Now we can apply this finding to a Kaggle competition, House Prices: Advanced Regression Techniques, in order to predict house prices.


<h3>To Win in Real Estate Market</h3>

Don't you feel excited by seeing the heading above? Yes! In trading, pricing is the most critical component for maximizing your profit. You can make better decisions when you know what the right price is.

In the Kaggle housing competition, our goal is to predict the right prices of 1,450+ properties according to 75+ features.

First things first: since there are 75+ features in the data sets, it is a good idea to download the data description file from Kaggle to find out what they mean.

Then we get the training and testing data sets, load them into data frames and check their sizes.

<pre>
import pandas as pd

df_train = pd.read_csv("train.csv")
df_test = pd.read_csv("test.csv")
print("Size of training dataset:", df_train.shape)
print("Size of testing dataset:", df_test.shape)
</pre>
<pre>
Size of training dataset: (1460, 81)
Size of testing dataset: (1459, 80)
</pre>

Last time, we predicted about 400 passengers' statuses. This time, we raise the bar! We are going to predict 1,459 records using just 1,460 training records.

Let's get a feel for what the training data set looks like:

<pre>
df_train.head(5)
</pre>

\"\"<\/p>\n

Next, we check whether there are any null values inside the data set.

<pre>
import matplotlib.pyplot as plt
import seaborn as sns

df_missing = df_train.isnull().sum()
df_missing = df_missing[df_missing > 0]
sns.barplot(x=df_missing.values, y=df_missing.index)
plt.show()
</pre>

\"\"<\/p>\n

Oh, there are many actually.

<h3>Enter the void</h3>

Don't worry: according to the data description, null values mean "None" or 0 in the related features. For example, a null in "Alley" means "No alley access", and a null in "MasVnrArea" means 0 square feet of masonry veneer area. That is why I suggest we take a look at the data description file first.

Here comes the solution. We can add a function to handle null values in the data set by filling in 0, "None" or the most common value of each feature.

<pre>
def fillNAonDF(df):
    # features where nulls are replaced with the most common value
    for feat in ('MSZoning', 'Utilities', 'Exterior1st', 'Exterior2nd', 'BsmtFinSF1', 'BsmtFinSF2', 'Electrical'):
        df.loc[:, feat] = df.loc[:, feat].fillna(df[feat].mode()[0])
    for feat in ('BsmtUnfSF', 'TotalBsmtSF', 'BsmtFullBath', 'BsmtHalfBath', 'KitchenQual', 'Functional', 'SaleType'):
        df.loc[:, feat] = df.loc[:, feat].fillna(df[feat].mode()[0])
    # categorical features where a null simply means the property has no such thing
    for feat in ('Alley', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2'):
        df.loc[:, feat] = df.loc[:, feat].fillna("None")
    for feat in ('MasVnrType', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond'):
        df.loc[:, feat] = df.loc[:, feat].fillna("None")
    # numerical features where a null means 0
    for feat in ('MasVnrArea', 'GarageYrBlt', 'GarageArea', 'GarageCars'):
        df.loc[:, feat] = df.loc[:, feat].fillna(0)
    for feat in ('PoolQC', 'Fence', 'MiscFeature'):
        df.loc[:, feat] = df.loc[:, feat].fillna("None")
    # LotFrontage: fill with the median lot frontage of the same neighborhood
    df.loc[:, 'LotFrontage'] = df.groupby('Neighborhood')['LotFrontage'].transform(lambda x: x.fillna(x.median()))
</pre>

And apply it to both the training and testing data sets.

<pre>
fillNAonDF(df_train)
fillNAonDF(df_test)
</pre>

Poof! Now the void issue is gone.
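As a quick sanity check (my own addition, not code from the original post), we can count how many nulls remain after the filling; any feature not covered by fillNAonDF would still show up here:

<pre>
# Sanity check (not in the original post): list features that still contain nulls
for name, df in (("train", df_train), ("test", df_test)):
    remaining = df.isnull().sum()
    remaining = remaining[remaining > 0]
    print("Nulls left in %s set:" % name)
    print(remaining if len(remaining) > 0 else "none")
</pre>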

<h3>Size does matter</h3>

There are two major factors affecting a property's price: location and size. For size, we can get that information from the "GrLivArea" (above-ground living area) feature, and plot a chart to show the relationship between size and sale price:

<pre>
sns.regplot(x="GrLivArea", y="SalePrice", data=df_train)
plt.show()
</pre>

\"\"<\/p>\n

Well, it looks linear; however, there are two outliers at the bottom right. Those two properties are 4,000+ square feet in size but were sold at unreasonably low prices. The author of the data set, Dr. Dean De Cock, recommends removing any houses with more than 4,000 square feet in order to eliminate unusual observations.

So we remove the records with living areas larger than 4,000 square feet and plot the regression chart again.

<pre>
df_train = df_train.loc[df_train.GrLivArea < 4000]
sns.regplot(x="GrLivArea", y="SalePrice", data=df_train)
plt.show()
</pre>

\"\"<\/p>\n

It looks more linear now.

<h3>Money, Money, Money, again</h3>

Sale price is the target we are going to predict. First, let's see how it is distributed in the training data set:

<pre>
price_dist = sns.distplot(df_train["SalePrice"], color="m", label="Skewness : %.2f"%(df_train["SalePrice"].skew()))
price_dist = price_dist.legend(loc="best")
plt.show()
</pre>

\"\"<\/p>\n

We saw a similar chart in our past data science exercise, the Titanic Project, when we analyzed the passengers' fare variable. The distribution doesn't look normal, as there are a few very high sale price records. From our experience with the fare feature, we can apply the same logarithmic transformation to reduce the impact of the extreme values.

<pre>
import numpy as np

df_train.loc[:, 'SalePrice_log'] = df_train["SalePrice"].map(lambda i: np.log1p(i) if i > 0 else 0)
price_log_dist = sns.distplot(df_train["SalePrice_log"], color="m", label="Skewness : %.2f"%(df_train["SalePrice_log"].skew()))
price_log_dist = price_log_dist.legend(loc="best")
plt.show()
</pre>

\"\"<\/p>\n

After that, the skewness of the sale price is much improved and we have a better-behaved distribution to work with.
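One side note of mine (not a step in the original post): since the model will learn and predict log1p(SalePrice), whenever we eventually need an actual dollar figure we have to apply the inverse transformation, np.expm1. A tiny sketch with a hypothetical prediction:

<pre>
# Note (my addition): log1p and expm1 are inverses, so a prediction made in
# log space can be turned back into a dollar price like this
predicted_log_price = 12.0                     # hypothetical model output
predicted_price = np.expm1(predicted_log_price)
print("Predicted sale price: %.0f" % predicted_price)   # prints roughly 162754
</pre>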

<h3>Feature Engineering</h3>

Since we are doing machine learning, we must transform the features into something a machine can read and learn from; in short, turn everything into numerical features. Before we do that, we are going to change the "MoSold" and "MSSubClass" features, which use numerical values as category codes, back to categorical features (that is why I say we have to read the data description file first).

<pre>
def trxNumericToCategory(df):
    df['MSSubClass'] = df['MSSubClass'].apply(str)
    df['MoSold'] = df['MoSold'].apply(str)

trxNumericToCategory(df_train)
trxNumericToCategory(df_test)
</pre>

As our next step, we take the categorical features and convert them into numerical features using the mean sale price of each category.

<pre>
def quantifier(df, feature, df2):
    # compute the mean log sale price for each category of the feature
    new_order = pd.DataFrame()
    new_order['value'] = df[feature].unique()
    new_order.index = new_order.value
    new_order['price_mean'] = df[[feature, 'SalePrice_log']].groupby(feature).mean()['SalePrice_log']
    new_order = new_order.sort_values('price_mean')
    new_order = new_order['price_mean'].to_dict()

    # map each categorical value to its mean log price in a new "_Q" feature,
    # on both the training (df) and testing (df2) data frames
    for categorical_value, price_mean in new_order.items():
        df.loc[df[feature] == categorical_value, feature+'_Q'] = price_mean
        df2.loc[df2[feature] == categorical_value, feature+'_Q'] = price_mean

categorical_features = df_train.select_dtypes(include=["object"])
for f in categorical_features:
    quantifier(df_train, f, df_test)
</pre>

After the transformation, we can drop the original categorical features, as shown in the sketch below. Now we have machine-readable training and testing data sets.
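The post does not show the dropping step itself, so here is a minimal sketch of how it could be done (my own sketch, not the original code), assuming we simply discard every remaining text (object) column once the "_Q" versions have been created:

<pre>
# Sketch (my addition): drop the original text columns, keeping only
# numerical features plus the new "_Q" encodings
def dropCategoricalFeatures(df):
    object_cols = df.select_dtypes(include=["object"]).columns
    df.drop(columns=object_cols, inplace=True)

dropCategoricalFeatures(df_train)
dropCategoricalFeatures(df_test)
print(df_train.dtypes.value_counts())   # everything should now be numeric
</pre>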

<h3>Skew Them All</h3>

We have log-transformed the skewed sale price feature to obtain a better distribution; we can do the same thing to all the other features as well.

First of all, we combine the training and testing data sets into an "all data" data set.

<pre>
df_all_data = pd.concat((df_train, df_test)).reset_index(drop=True)
train_index = df_train.shape[0]
</pre>

Then we find the features with skewness greater than 0.75 and apply the same log transformation to them.

<pre>
def skewFeatures(df):
    skewness = df.skew().sort_values(ascending=False)
    df_skewness = pd.DataFrame({'Skew': skewness})
    df_skewness = df_skewness[abs(df_skewness) > 0.75]
    df_skewness = df_skewness.dropna(axis=0, how='any')
    skewed_features = df_skewness.index

    for feat in skewed_features:
        df[feat] = np.log1p(df[feat])

skewFeatures(df_all_data)
</pre>

Done! It is time to get our finalized training and testing data sets.

<pre>
X_learning = df_all_data[:train_index]
X_test = df_all_data[train_index:]
Y_learning = df_train['SalePrice_log']
</pre>
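One caveat from me (not a step in the original post): df_all_data still carries the SalePrice and SalePrice_log columns from the training rows, so before feeding X_learning and X_test to the models it is worth making sure the target columns do not sneak in as features. A minimal sketch, assuming we just drop them:

<pre>
# Sketch (my addition): keep the targets out of the feature matrices
target_cols = [c for c in ('SalePrice', 'SalePrice_log') if c in df_all_data.columns]
X_learning = df_all_data[:train_index].drop(columns=target_cols)
X_test = df_all_data[train_index:].drop(columns=target_cols)
</pre>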

X data, check. Y data, check. Test data, check. What time is it? It's clobber... err... It's modeling time!

<h3>Modeling Time</h3>

Do you remember how we choose a model for machine learning? Yes, we use k-fold cross validation to pick our model.

We have chosen several common regression models, plus the people's favorite XGBoost model, for this house price project:

<pre>
from sklearn.linear_model import LinearRegression, RidgeCV, LarsCV, LassoCV, ElasticNetCV, LassoLarsCV
import xgboost as xgb

models = []
models.append(("LrE", LinearRegression()))
models.append(("RidCV", RidgeCV()))
models.append(("LarCV", LarsCV()))
models.append(("LasCV", LassoCV()))
models.append(("ElNCV", ElasticNetCV()))
models.append(("LaLaCV", LassoLarsCV()))
models.append(("XGB", xgb.XGBRegressor()))
</pre>

Then we apply 10-fold cross validation to our models.

<pre>
from sklearn.model_selection import KFold, cross_val_score

kfold = KFold(n_splits=10)

def getCVResult(models, X_learning, Y_learning):
    for name, model in models:
        cv_results = cross_val_score(model, X_learning, Y_learning, scoring='neg_mean_squared_error', cv=kfold)
        rmsd_scores = np.sqrt(-cv_results)
        print("\n[%s] Mean: %.8f Std. Dev.: %8f" % (name, rmsd_scores.mean(), rmsd_scores.std()))

getCVResult(models, X_learning, Y_learning)
</pre>
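For reference (my own note, not part of the original post), the score printed above is the root-mean-square deviation of the predicted log prices:

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where $y_i$ is the actual log1p(SalePrice), $\hat{y}_i$ is the model's prediction and $n$ is the number of validation records; the lower, the better. Since our target is already log-transformed, this is, if I am not mistaken, essentially the metric the competition itself uses (RMSE on the logarithm of the sale price).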

And get their mean values of RMSD:

<pre>
[LrE] Mean: 0.11596371 Std. Dev.: 0.012097
[RidCV] Mean: 0.11388354 Std. Dev.: 0.012075
[LarCV] Mean: 0.11630241 Std. Dev.: 0.011665
[LasCV] Mean: 0.19612691 Std. Dev.: 0.008914
[ElNCV] Mean: 0.19630787 Std. Dev.: 0.008867
[LaLaCV] Mean: 0.11258596 Std. Dev.: 0.012750
[XGB] Mean: 0.11961144 Std. Dev.: 0.016610
</pre>

In a bar chart:

\"\"<\/p>\n

The major purpose of this cross validation run is to get a rough idea of what the RMSD values look like. Since we do not pass any parameters to those models, I believe there is room for improvement to get better scores. Let's move on to our next topic, parameter tuning (in the next post :]] ). Stay tuned.

<h3>What have we learnt in this post?</h3>

1. Don't miss the data description file
2. Skew features to get a better distribution
3. Handle categorical features
4. Use k-fold cross validation to get a rough idea of the final result
5. Just because I haven't posted for more than two months doesn't mean I have forgotten this blog :]]
