{"id":711,"date":"2018-01-22T20:26:58","date_gmt":"2018-01-22T20:26:58","guid":{"rendered":"http:\/\/www.codeastar.com\/?p=711"},"modified":"2018-04-17T03:08:43","modified_gmt":"2018-04-17T03:08:43","slug":"random-random-forest-tutorial","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/","title":{"rendered":"A Beginner Random Random Forest Tutorial"},"content":{"rendered":"

When I have a data project in mind and no idea where to start modeling, I always start with the Random Forest model. It is not because of its catchy name, or because I always misspell it as Rain Forest; it is quick, convenient, easy to understand, and it provides decent results. Isn’t it cool? Yes, it is! So in this post we are going to look at Random Forest (not Rain Forest) in more detail.<\/p>\n

<\/p>\n

What is the Random Forest model?<\/h3>\n

First things first: from the name Random Forest, we know that this model is about a lot of trees (so is a rain forest, which is why I keep mixing the two up…). The “tree” in the Random Forest model is a decision tree. Taking our Titanic Survivors project as an example, a decision tree looks like this:<\/p>\n

[Figure: a decision tree for the Titanic Survivors project]<\/p>\n

The advantage of a decision tree is its simplicity: we can read the results graphically without having a statistics background. It is also very fast to build and test in a development environment.<\/p>\n

Now we have a tree, so what’s next? We can’t call a place a “forest” with just one tree, so the Random Forest model is a model that consists of many decision trees. You may have heard that “all men are created equal”, but in a Random Forest,<\/p>\n

All trees are created unequal<\/h3>\n

The decision trees in a Random Forest model are created randomly. Each tree is grown on a random sample of the training data, and each node is split using only a random subset of the features. The model creates a set number of trees, controlled by a model parameter (n_estimators in sklearn’s Random Forest; the default was 10 when this post was written and has been 100 since scikit-learn 0.22).<\/p>\n

The rationales behind this setting are:<\/p>\n

    \n
  1. building weak, loosely correlated sub-models for the ensemble<\/li>\n
  2. reducing the overfitting that a single tree model is prone to<\/li>\n<\/ol>\n
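To make the randomness a bit more concrete, here is a minimal sketch of the knobs sklearn exposes for it. The synthetic data set and the parameter values are assumptions picked for illustration; they are not taken from the Titanic project.<\/p>\n

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in data (an assumption, not the Titanic set)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

forest = RandomForestClassifier(
    n_estimators=10,      # number of trees to grow
    max_features='sqrt',  # size of the random feature subset tried at each split
    bootstrap=True,       # each tree sees a random sample of rows, drawn with replacement
    random_state=42,      # fixed seed so the run can be repeated
)
forest.fit(X, y)

# the fitted trees live in forest.estimators_
n_trees = len(forest.estimators_)
# the trees are not copies of each other, so their sizes usually differ
node_counts = [tree.tree_.node_count for tree in forest.estimators_]
print(n_trees, node_counts)
```

Printing the node counts is just a quick way to see that the trees were grown differently from one another.<\/p>\n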

    Advantages of Random Forest<\/h3>\n

    We used an ensemble technique to get better predictions in the Iowa House Prices project. The same thing happens in the Random Forest model, but this time the model does the ensembling itself. The RF model creates trees and collects a vote from each tree. Since the trees are only weakly correlated, the RF model is able to reduce variance and reach a better prediction through majority voting.<\/p>\n
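The voting idea can be sketched by asking every tree for its own prediction and counting the votes by hand. This is an illustration on synthetic data (an assumption), not the library's internal code: sklearn's forest averages the class probabilities of its trees instead of counting hard votes, although with fully grown trees the two normally give the same answer.<\/p>\n

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic binary data with labels 0 and 1 (an assumption for illustration)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# an odd number of trees avoids tied votes
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# one row of votes per tree, one column per sample
tree_votes = np.array([tree.predict(X) for tree in forest.estimators_])

# class 1 wins when more than half of the trees vote for it
majority = (tree_votes.mean(axis=0) > 0.5).astype(int)

# share of samples where the hand-counted vote matches the forest
agreement = (majority == forest.predict(X)).mean()
print(agreement)
```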

    A tree-based model is considered a greedy model, as it always picks the best split at each node in order to reduce the error on the training data. A single tree can end up splitting too many nodes and becoming too specific to certain features, which increases the error on the testing data set. The Random Forest model reduces this overfitting because its trees are not split on all features in the data set, but on random subsets of those features. Every tree in the Random Forest is unique, so the variance of the ensemble is smaller than the variance of an individual tree.<\/p>\n
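A rough way to see the overfitting point is to compare a single fully grown tree with a forest on held-out data. The sketch below uses noisy synthetic data (an assumption); the exact scores depend on the data and the seed, so treat it as an experiment to run rather than a result to quote.<\/p>\n

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# noisy synthetic data: flip_y randomly flips 10% of the labels
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# accuracy on the data the model has seen vs. data it has not
tree_train, tree_test = tree.score(X_train, y_train), tree.score(X_test, y_test)
forest_train, forest_test = forest.score(X_train, y_train), forest.score(X_test, y_test)

print('single tree   train/test accuracy:', tree_train, tree_test)
print('random forest train/test accuracy:', forest_train, forest_test)
```

The single tree memorises the training set, noise included, so the gap between its training and testing accuracy is the thing to watch; the forest typically narrows that gap.<\/p>\n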

    Beyond that, the Random Forest model can also compute feature importances, which helps with feature selection. Let’s use the Titanic project as an example:<\/p>\n

    import pandas as pd\r\nfrom sklearn.ensemble import RandomForestClassifier\r\n\r\n# load the train data set\r\ndf_train = pd.read_csv(\"..\/input\/train.csv\")\r\n\r\n# prepare training data, remove unused fields\r\ntrain_x = df_train.drop(['Survived', 'PassengerId', 'Name', 'Ticket'], axis=1)\r\ntrain_y = df_train['Survived']\r\n\r\n# convert all values to numeric codes (missing values become -1)\r\ntrain_x_all_num = train_x.apply(lambda x: pd.factorize(x)[0])\r\n\r\nmodel = RandomForestClassifier()\r\nmodel.fit(train_x_all_num, train_y)\r\n\r\n# get feature importance from Random Forest\r\nimportances = model.feature_importances_\r\n\r\nprint(\"Sorted Feature Importance:\")\r\nsorted_feature_importance = sorted(zip(importances, train_x_all_num.columns), reverse=True)\r\nprint(sorted_feature_importance)\r\n<\/pre>\n

    This gives results like the following (the exact numbers vary from run to run, since the trees are random):<\/p>\n

    Sorted Feature Importance:\r\n[(0.24543182417104634, 'Sex'), (0.22380032382910783, 'Age'), (0.20791521450928369, 'Fare'), (0.10294078976088994, 'Cabin'), (0.080212751900751736, 'Pclass'), (0.054506067168341811, 'Parch'), (0.045022270592417118, 'SibSp'), (0.040170758068161436, 'Embarked')]\r\n<\/pre>\n
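A short note on the conversion step in the code above: pd.factorize replaces each distinct value with an integer code in order of first appearance, and missing values get the code -1. The toy frame below is made up for illustration (only the column names are borrowed from the Titanic data) and shows what that looks like.<\/p>\n

```python
import pandas as pd

# a made-up toy frame; None stands for a missing value
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', None, 'S'],
})

# same conversion as in the Titanic snippet: one integer code per distinct value
toy_num = toy.apply(lambda col: pd.factorize(col)[0])
print(toy_num)

sex_codes = toy_num['Sex'].tolist()            # [0, 1, 1, 0]
embarked_codes = toy_num['Embarked'].tolist()  # [0, 1, -1, 0]
```

One thing to keep in mind: when the same conversion is applied to a numeric column such as Age or Fare, the codes follow the order of first appearance, not the numeric order, so some information is lost. It is fine for a quick look at feature importance, but worth revisiting for a serious model.<\/p>\n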

    Limitations of Random Forest<\/h3>\n

    The Random Forest model works well because it builds a balanced model out of randomly generated decision trees. However, its randomness is also its weakness. We can set the number of trees, the maximum number of features and the maximum depth of a tree for the RF model, but it is hard to see exactly how it plants each tree and arrives at a prediction. All we know is that the prediction is the majority vote of the generated decision trees.<\/p>\n

    Another limitation of the RF model is extrapolation. Actually, this is a limitation not just of the RF model, but of all tree-based models. A linear regression model, like the tipping model we mentioned, can predict a trend for the tips, while a decision tree model can only predict outcomes from data it has previously encountered. That is because a decision tree makes decisions of the form “if input > value then go left, otherwise go right”, like the following diagram:<\/p>\n

    [Figure: decision tree diagram]<\/p>\n

    Whether the input is 71 or 70001, the result is always 20%. When the testing data falls outside the range of the training data, it is treated in the same way as the minimum or maximum value already encountered in the training data set.<\/p>\n
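The extrapolation limit is easy to reproduce. In this sketch the data is a made-up straight line, y = 2x for x from 0 to 99 (an assumption for illustration, not the tipping data), and both models are asked about inputs far outside that range.<\/p>\n

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# training data: a perfect straight line, y = 2x, for x in 0..99
X_train = np.arange(0, 100, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

# inputs well beyond anything seen in training
X_new = np.array([[150.0], [1000.0]])
forest_pred = forest.predict(X_new)
linear_pred = linear.predict(X_new)

print('forest:', forest_pred)  # stuck near the largest training target
print('linear:', linear_pred)  # keeps following the trend
```

Both out-of-range inputs fall into the right-most leaf of every tree, so the forest returns the same value for 150 and for 1000, while the linear model keeps following the line.<\/p>\n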

    What have we learnt in this post?<\/h3>\n
      \n
    1. Introduction to the RF model<\/li>\n
    2. Advantages of RF<\/li>\n
    3. Limitations of RF<\/li>\n
    4. Is it just me who always misspells Random Forest as Rain Forest??<\/li>\n<\/ol>\n

       <\/p>\n","protected":false},"excerpt":{"rendered":"

      When I have a data project in mind and have no idea on where to start modeling, I will always use the Random Forest model. It is not because of its catchy name and the fact that I always misspell it as Rain Forest, it is quick, convenient, easy to understand and, it provides decent […]<\/p>\n","protected":false},"author":1,"featured_media":769,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[18],"tags":[4,19,54,41,53,52,45],"jetpack_publicize_connections":[],"yoast_head":"\nA Beginner Random Random Forest Tutorial ⋆ Code 
A Star<\/title>\n<meta name=\"description\" content=\"When I have a data project in mind and have no idea on where to start modeling, I will always use the Random Forest model. It is quick, convenient, easy to understand and, it provides decent results. Isn't it cool? Yes, it is! So we are going to discuss more Rain Random Forest details in this post.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Beginner Random Random Forest Tutorial ⋆ Code A Star\" \/>\n<meta property=\"og:description\" content=\"When I have a data project in mind and have no idea on where to start modeling, I will always use the Random Forest model. It is quick, convenient, easy to understand and, it provides decent results. Isn't it cool? Yes, it is! 
So we are going to discuss more Rain Random Forest details in this post.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/\" \/>\n<meta property=\"og:site_name\" content=\"Code A Star\" \/>\n<meta property=\"article:publisher\" content=\"codeastar\" \/>\n<meta property=\"article:author\" content=\"codeastar\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-22T20:26:58+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-04-17T03:08:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/01\/rainforest.png?fit=1002%2C578&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"1002\" \/>\n\t<meta property=\"og:image:height\" content=\"578\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Raven Hon\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@codeastar\" \/>\n<meta name=\"twitter:site\" content=\"@codeastar\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Raven Hon\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/\"},\"author\":{\"name\":\"Raven Hon\",\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\"},\"headline\":\"A Beginner Random Random Forest Tutorial\",\"datePublished\":\"2018-01-22T20:26:58+00:00\",\"dateModified\":\"2018-04-17T03:08:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/\"},\"wordCount\":740,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\"},\"keywords\":[\"beginner\",\"Data Science\",\"decision tree\",\"ensemble\",\"Overfitting\",\"Random Forest\",\"tutorial\"],\"articleSection\":[\"Learn Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/\",\"url\":\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/\",\"name\":\"A Beginner Random Random Forest Tutorial ⋆ Code A Star\",\"isPartOf\":{\"@id\":\"https:\/\/www.codeastar.com\/#website\"},\"datePublished\":\"2018-01-22T20:26:58+00:00\",\"dateModified\":\"2018-04-17T03:08:43+00:00\",\"description\":\"When I have a data project in mind and have no idea on where to start modeling, I will always use the Random Forest model. It is quick, convenient, easy to understand and, it provides decent results. Isn't it cool? Yes, it is! 
So we are going to discuss more Rain Random Forest details in this post.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codeastar.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A Beginner Random Random Forest Tutorial\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codeastar.com\/#website\",\"url\":\"https:\/\/www.codeastar.com\/\",\"name\":\"Code A Star\",\"description\":\"We don't wish upon a star, we code a star\",\"publisher\":{\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codeastar.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\",\"name\":\"Raven Hon\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1\",\"width\":70,\"height\":70,\"caption\":\"Raven Hon\"},\"logo\":{\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/\"},\"description\":\"Raven Hon is\u00a0a 20 years+ veteran in information technology industry who has worked on various projects from 
console, web, game, banking and mobile applications in different sized companies.\",\"sameAs\":[\"https:\/\/www.codeastar.com\",\"codeastar\",\"https:\/\/twitter.com\/codeastar\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A Beginner Random Random Forest Tutorial ⋆ Code A Star","description":"When I have a data project in mind and have no idea on where to start modeling, I will always use the Random Forest model. It is quick, convenient, easy to understand and, it provides decent results. Isn't it cool? Yes, it is! So we are going to discuss more Rain Random Forest details in this post.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/","og_locale":"en_US","og_type":"article","og_title":"A Beginner Random Random Forest Tutorial ⋆ Code A Star","og_description":"When I have a data project in mind and have no idea on where to start modeling, I will always use the Random Forest model. It is quick, convenient, easy to understand and, it provides decent results. Isn't it cool? Yes, it is! So we are going to discuss more Rain Random Forest details in this post.","og_url":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/","og_site_name":"Code A Star","article_publisher":"codeastar","article_author":"codeastar","article_published_time":"2018-01-22T20:26:58+00:00","article_modified_time":"2018-04-17T03:08:43+00:00","og_image":[{"width":1002,"height":578,"url":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/01\/rainforest.png?fit=1002%2C578&ssl=1","type":"image\/png"}],"author":"Raven Hon","twitter_card":"summary_large_image","twitter_creator":"@codeastar","twitter_site":"@codeastar","twitter_misc":{"Written by":"Raven Hon","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#article","isPartOf":{"@id":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/"},"author":{"name":"Raven Hon","@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd"},"headline":"A Beginner Random Random Forest Tutorial","datePublished":"2018-01-22T20:26:58+00:00","dateModified":"2018-04-17T03:08:43+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/"},"wordCount":740,"commentCount":0,"publisher":{"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd"},"keywords":["beginner","Data Science","decision tree","ensemble","Overfitting","Random Forest","tutorial"],"articleSection":["Learn Machine Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/","url":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/","name":"A Beginner Random Random Forest Tutorial ⋆ Code A Star","isPartOf":{"@id":"https:\/\/www.codeastar.com\/#website"},"datePublished":"2018-01-22T20:26:58+00:00","dateModified":"2018-04-17T03:08:43+00:00","description":"When I have a data project in mind and have no idea on where to start modeling, I will always use the Random Forest model. It is quick, convenient, easy to understand and, it provides decent results. Isn't it cool? Yes, it is! 
So we are going to discuss more Rain Random Forest details in this post.","breadcrumb":{"@id":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codeastar.com\/random-random-forest-tutorial\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.codeastar.com\/random-random-forest-tutorial\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codeastar.com\/"},{"@type":"ListItem","position":2,"name":"A Beginner Random Random Forest Tutorial"}]},{"@type":"WebSite","@id":"https:\/\/www.codeastar.com\/#website","url":"https:\/\/www.codeastar.com\/","name":"Code A Star","description":"We don't wish upon a star, we code a star","publisher":{"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codeastar.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd","name":"Raven Hon","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/","url":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1","width":70,"height":70,"caption":"Raven Hon"},"logo":{"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/"},"description":"Raven Hon is\u00a0a 20 years+ veteran in information technology industry who has worked on various projects from console, web, game, banking and mobile applications in different sized 
companies.","sameAs":["https:\/\/www.codeastar.com","codeastar","https:\/\/twitter.com\/codeastar"]}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/01\/rainforest.png?fit=1002%2C578&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8PcRO-bt","jetpack-related-posts":[{"id":282,"url":"https:\/\/www.codeastar.com\/choose-machine-learning-models-python\/","url_meta":{"origin":711,"position":0},"title":"How to choose a machine learning model in Python?","author":"Raven Hon","date":"July 15, 2017","format":false,"excerpt":"We have tried our first ever Data Science project from last post. Do you feel excited? Yeah, we should! But we have also omitted several details on the Data Science Life Cycle. Do you remember why I picked Support Vector Machine as our machine learning model last time? I give\u2026","rel":"","context":"In "Learn Machine Learning"","block_context":{"text":"Learn Machine Learning","link":"https:\/\/www.codeastar.com\/category\/machine-learning\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/07\/kfold_5.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/07\/kfold_5.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/07\/kfold_5.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/07\/kfold_5.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":815,"url":"https:\/\/www.codeastar.com\/visualize-convolutional-neural-network\/","url_meta":{"origin":711,"position":1},"title":"Visualize a Convolutional Neural Network","author":"Raven Hon","date":"February 21, 2018","format":false,"excerpt":"On last post, we tried our image recognition project\u00a0with handwritten digits. 
We used a Convolutional Neural Network (CNN) to train our machine and it did pretty well with 99.47% accuracy. We learnt how a CNN works by actually implementing a model. Today, we move one step further to learn more\u2026","rel":"","context":"In "Learn Machine Learning"","block_context":{"text":"Learn Machine Learning","link":"https:\/\/www.codeastar.com\/category\/machine-learning\/"},"img":{"alt_text":"Visualize a CNN","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/02\/cnn_seal.png?fit=1052%2C744&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/02\/cnn_seal.png?fit=1052%2C744&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/02\/cnn_seal.png?fit=1052%2C744&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/02\/cnn_seal.png?fit=1052%2C744&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/02\/cnn_seal.png?fit=1052%2C744&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":1040,"url":"https:\/\/www.codeastar.com\/lgb-winning-gradient-boosting-model\/","url_meta":{"origin":711,"position":2},"title":"LGB, the winning Gradient Boosting model","author":"Raven Hon","date":"June 1, 2018","format":false,"excerpt":"Last time, we tried the Kaggle's TalkingData Click Fraud Detection challenge. And we used limited resources to handle a 200 million records sized\u00a0dataset. 
Although we can make our classification with Random Forest model, we still want a better scoring result.\u00a0 Inside the Click Fraud Detection challenge's leaderboard, I find that\u2026","rel":"","context":"In "Learn Machine Learning"","block_context":{"text":"Learn Machine Learning","link":"https:\/\/www.codeastar.com\/category\/machine-learning\/"},"img":{"alt_text":"Gradient Boosting","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/06\/gradient.png?fit=1033%2C608&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/06\/gradient.png?fit=1033%2C608&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/06\/gradient.png?fit=1033%2C608&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/06\/gradient.png?fit=1033%2C608&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":990,"url":"https:\/\/www.codeastar.com\/click-fraud-detection\/","url_meta":{"origin":711,"position":3},"title":"Click Fraud Detection with Machine Learning","author":"Raven Hon","date":"May 4, 2018","format":false,"excerpt":"Up to now, we have tried 3 different Kaggle journeys, the Titanic Survivors, the Iowa House Prices and the hand written digits recognition. Those journeys covered popular Machine Learning topics, such as classification, regression, deep learning, and so on. 
I would suggest fans of Machine Learning to start with those\u2026","rel":"","context":"In "Learn Machine Learning"","block_context":{"text":"Learn Machine Learning","link":"https:\/\/www.codeastar.com\/category\/machine-learning\/"},"img":{"alt_text":"Click Fraud Detection","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/05\/fraud_click2.png?fit=1049%2C419&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/05\/fraud_click2.png?fit=1049%2C419&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/05\/fraud_click2.png?fit=1049%2C419&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/05\/fraud_click2.png?fit=1049%2C419&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":548,"url":"https:\/\/www.codeastar.com\/data-science-ensemble-modeling\/","url_meta":{"origin":711,"position":4},"title":"To win big in real estate market using data science \u2013 Part 3: Ensemble Modeling","author":"Raven Hon","date":"December 5, 2017","format":false,"excerpt":"Previously on CodeAStar: The data alchemist wannabe opened the first door to \"the room for improvement\", where he made better prediction potions. His hunger for the ultimate potions became more and more. He then discovered another door inside the room. The label on the door said, \"Ensemble Modeling\". 
This is\u2026","rel":"","context":"In "Learn Machine Learning"","block_context":{"text":"Learn Machine Learning","link":"https:\/\/www.codeastar.com\/category\/machine-learning\/"},"img":{"alt_text":"Data Science Technique: Ensemble Modeling","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/ensembling.png?fit=727%2C567&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/ensembling.png?fit=727%2C567&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/ensembling.png?fit=727%2C567&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/ensembling.png?fit=727%2C567&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":2529,"url":"https:\/\/www.codeastar.com\/stable-diffusion-quick-and-easy-guide-for-everyone-part-1\/","url_meta":{"origin":711,"position":5},"title":"Easy Guide for Beginner: Learn how to use Stable Diffusion WebUI – Part 1","author":"Raven Hon","date":"May 29, 2023","format":false,"excerpt":"We have been learning programming and AI since the beginning of this website. Now, we are excited to explore the world of generative AI art using the Stable Diffusion WebUI. This innovative web interface provides us with a simple and efficient way to generate AI art. 
In one of our\u2026","rel":"","context":"In "Learn Machine Learning"","block_context":{"text":"Learn Machine Learning","link":"https:\/\/www.codeastar.com\/category\/machine-learning\/"},"img":{"alt_text":"Stable Diffusion","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2023\/05\/Stable_Diffusion.png?fit=1024%2C512&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2023\/05\/Stable_Diffusion.png?fit=1024%2C512&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2023\/05\/Stable_Diffusion.png?fit=1024%2C512&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2023\/05\/Stable_Diffusion.png?fit=1024%2C512&ssl=1&resize=700%2C400 2x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts\/711"}],"collection":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/comments?post=711"}],"version-history":[{"count":31,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts\/711\/revisions"}],"predecessor-version":[{"id":988,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts\/711\/revisions\/988"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/media\/769"}],"wp:attachment":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/media?parent=711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/categories?post=711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/tags?post=711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":tr
ue}]}}