sns.regplot(x=\"total_bill\", y=\"tip\", data=df_tips);\r\nplt.show()<\/pre>\n <\/p>\n
The scatter of Tip<\/i> against Total Bill<\/i> points suggests a roughly linear relationship between the two values. We can therefore predict the tip amount (the output, or dependent variable) from the total bill a customer has paid (the input, or independent variable).<\/p>\nHow good is our regression?<\/h3>\n
We have built a regression model, but how good is it? This is where the Root-Mean-Square Deviation (RMSD)<\/b>\u00a0[or Root-Mean-Square Error (RMSE)<\/b>] comes in. The RMSD measures the typical difference between predicted and actual values. It is calculated by:<\/p>\n
$$\mathrm{RMSD} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}$$<\/p>\n
where $\hat{y}_i$ is our predicted value, $y_i$ is the actual value in observation i<\/i>, and n<\/i> is the number of observations.<\/p>\n
So a perfect model has an RMSD of 0, and the less effective a model is, the larger its RMSD.<\/p>\n
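To make the calculation concrete, here is a minimal sketch of RMSD written with the standard library only. The helper name `rmsd` and the sample numbers are made up for illustration and do not come from the "tips" data set.

```python
from math import sqrt


def rmsd(predicted, actual):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, chosen so the arithmetic is easy to follow.
actual_tips = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]       # every prediction is exact
off_by_one = [2.0, 3.0, 4.0, 5.0]    # every prediction is 1.0 too high

print(rmsd(perfect, actual_tips))     # 0.0
print(rmsd(off_by_one, actual_tips))  # 1.0
```

Because the errors are squared before averaging, a few large misses raise the RMSD more than many small ones.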
Again, let’s try to understand RMSD visually.<\/p>\n
RMSD in action<\/h3>\n We keep using the “tips” data set from the section above, taking the first 200 records as learning values and the last 44 records as testing values.<\/p>\n
learning_x = df_tips[['total_bill', 'size']].values[:200]\r\nlearning_y = df_tips['tip'].values[:200]\r\ntesting_x = df_tips[['total_bill', 'size']].values[-44:]\r\ntesting_y = df_tips['tip'].values[-44:]<\/pre>\nThen we bring in a machine learning model. Since we are doing regression, we use the Linear Regression<\/i> model.<\/p>\nfrom sklearn.linear_model import LinearRegression\r\nlreg = LinearRegression()\r\nlreg.fit(learning_x, learning_y)\r\nprediction = lreg.predict(testing_x)<\/pre>\nNow that we have our predicted output, let’s compare it with the actual output.<\/p>\n
import pandas as pd\r\ndf_output_compare = pd.DataFrame({'predicted':prediction, 'actual':testing_y})\r\nsns.regplot(x=\"actual\", y=\"predicted\", data=df_output_compare)\r\nplt.show()<\/pre>\n <\/p>\n
Because we use only 2 features to predict the tip, the predicted values correlate only loosely with the actual ones. We can see this in the graph above, but how far apart are the two sets of values in numbers? The sklearn library provides a mean_squared_error<\/i> function, which gives us the mean squared error (MSE), the quantity under the square root in the RMSE (RMSD). We then take the square root of the MSE to get our RMSD.<\/p>\nfrom sklearn.metrics import mean_squared_error\r\nfrom math import sqrt\r\nrmsd = sqrt(mean_squared_error(testing_y, prediction))\r\nprint(rmsd)<\/pre>\n1.189772487870686<\/pre>\nNow we use another learning set: only the first 5 records from the “tips” data set.<\/p>\n
learning_x_5 = df_tips[['total_bill', 'size']].values[:5]\r\nlearning_y_5 = df_tips['tip'].values[:5]<\/pre>\nAnd get our new prediction:<\/p>\n
lreg.fit(learning_x_5, learning_y_5)\r\nprediction_5 = lreg.predict(testing_x)<\/pre>\nWe compare our new prediction with the actual output.<\/p>\n
df_output_compare_5 = pd.DataFrame({'predicted':prediction_5, 'actual':testing_y})\r\nsns.regplot(x=\"actual\", y=\"predicted\", data=df_output_compare_5, color=\"g\")\r\nplt.show()<\/pre>\n \nThen calculate the new RMSD:<\/p>\n
rmsd_5 = sqrt(mean_squared_error(testing_y, prediction_5))\r\nprint(rmsd_5)<\/pre>\n1.3766746955609788<\/pre>\nAs expected, there is a larger RMSD for a less effective model.<\/p>\n
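The same comparison can be sketched without sklearn. The example below is an illustration under stated assumptions, not the code used in this post: it fits a one-feature least-squares line by hand, and the helper names (`fit_line`, `rmsd`, `holdout_rmsd`) and all numbers are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmsd(predicted, actual):
    """Root-mean-square deviation between predictions and actual values."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Made-up bills and tips, not taken from the "tips" data set.
train_x, train_y = [10, 20, 30, 40], [2.0, 3.0, 5.0, 6.0]
test_x, test_y = [15, 35, 50], [2.5, 5.5, 7.5]


def holdout_rmsd(xs, ys):
    """Fit on (xs, ys), then score the fitted line on the fixed test set."""
    slope, intercept = fit_line(xs, ys)
    predictions = [slope * x + intercept for x in test_x]
    return rmsd(predictions, test_y)


rmsd_full = holdout_rmsd(train_x, train_y)            # all 4 training records
rmsd_small = holdout_rmsd(train_x[:2], train_y[:2])   # only the first 2 records
print(rmsd_full, rmsd_small)  # roughly 0.08 versus 1.04
```

With these particular numbers the line fitted on two records scores worse on the test set. A smaller learning set often, though not always, gives a larger RMSD.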
Congratulations! Now you can judge the effectiveness of a regression model from both graphs and figures.<\/p>\n
<\/p>\n
What have we learnt in this post?<\/h3>\n\nthe use of regression<\/li>\n how to rate a model from a scatter chart<\/li>\n the meaning of RMSD \/ RMSE<\/li>\n how to rate a model from its RMSD<\/li>\n<\/ol>\n