{"id":764,"date":"2018-02-11T19:12:00","date_gmt":"2018-02-11T19:12:00","guid":{"rendered":"http:\/\/www.codeastar.com\/?p=764"},"modified":"2018-02-12T18:24:14","modified_gmt":"2018-02-12T18:24:14","slug":"convolutional-neural-network-python","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/convolutional-neural-network-python\/","title":{"rendered":"Python Image Recognizer with Convolutional Neural Network"},"content":{"rendered":"
On our data science journey, we have solved classification<\/a> and regression<\/a> problems. What’s next? There is one popular machine learning territory we have not set feet on yet — the image recognition. But now the wait is over, in this post we are going to teach our machine to recognize images by using Convolutional Neural Network (CNN).<\/p>\n <\/p>\n Before we go further to our topic on Convolutional Neural Network, let’s talk about another related term we will see often: Deep Learning.<\/p>\n Deep Learning is a subfield of machine learning which its model consists of multiple layers. The concept of a deep learning model is to use outputs from the previous layer as inputs for the successive layer. The model starts learning from the first layer and use its outputs to learn through the next layer. Eventually, the model goes “deep” by learning layer after layer in order to produce the final outcome.<\/p>\n Convolutional Neural Network is a type of Deep Learning\u00a0architecture. We will use the abbreviation CNN in the post. Please don’t mix up this CNN to a news channel with the same abbreviation. :]]<\/p>\n We will describe a CNN in short here. For in depth CNN explanation, please visit “A\u00a0Beginner’s Guide<\/a>\u00a0To Understanding Convolutional Neural Networks”. This is the best CNN guide I have ever found on the Internet and it is good for readers with no data science background.<\/p>\n Since a CNN is a type of Deep Learning model, it is also constructed with layers. A CNN starts with a convolutional layer as input layer and ends with a classification layer as output layer. There are multiple hidden layers in between the input and output layers, such as\u00a0convolutional layers, pooling layers and fully connected layers. So a typical CNN model should look like:<\/p>\n Feel dizzy for seeing different layers? Don’t worry, we can have short explanations on each layer here. For in-depth details, please refer to the CNN guide I mentioned previously.<\/p>\n I always believe the best way to learn something is to do something. When we started to learn our first ever machine learning project<\/a>, we do the “Hello World” way, by coding the\u00a0iris classification. For image recognition and deep learning, the “Hello World” project for us is, the MNIST Database of Handwritten Digits<\/a>.<\/p>\n This is a dataset of handwritten digits, our objective is to train our model to learn from 42,000 digit images, and recognize another set of 28,000 digit images. Before we actually start our project, we need to install our python deep learning library, Keras<\/a>. Please note that deep learning requires relatively large processing resources and time. If this is your concern, I would suggest you to start a kernel from Kaggle Kernels<\/a>\u00a0for the deep learning project. As related libraries and datasets have already installed in Kaggle Kernels, and we can use Kaggle’s cloud environment to compute our prediction (for maximum 1 hour execution time). As long as we have internet access, we can run a CNN project on its Kernel with a low-end PC \/ laptop. Once the preparation is ready, we are good to set feet on the image recognition territory.<\/p>\n First, let’s import required modules here. We will discuss those models while we use it in our code segments.<\/p>\n We load training and testing data sets (from Kaggle) as usual.<\/p>\n And take a look on the first 5 rows of the training data.<\/p>\n The first column “label” is the value of the hand written digit image. 
The other 784 columns are the pixel values of a 28 (width) x 28 (height), i.e. 784 pixel, gray-scale digit image. A picture is worth a thousand words, so now we are going to make 5 pictures, to visualize our first 5 digits from the training data set.<\/p>\n And the results are:<\/p>\n Libraries, check. Training and testing data, check. Now, it is time for the core part of our CNN project:<\/p>\n We use Conv2D() to create our first convolutional layer, with 30 features and a 5×5 feature size. The input shape is the shape of our digit image with height, width and channels, i.e. (28, 28, 1). Since all our digit images are gray-scale images, we can assign 1 to the channel. For color images, you need to assign 3 (R-G-B) to the channel. We activate the hidden layers with ReLU (rectified linear unit) activation. The concept of ReLU activation is quite straightforward: when there is a negative value on the hidden layer (the feature cannot be found on the input image), it returns zero; otherwise, it returns the raw value (we will sketch ReLU and softmax in plain numpy right after the model code).<\/p>\n After processing our first convolutional layer, there would be 30 feature maps as hidden layers per digit image. We then use the pooling layer to down-sample those layers, taking the maximum of every 2×2 area. Now we have smaller hidden layers as input images for our next convolutional layer. As in our first convolutional layer, the second convolutional layer generates even more hidden layers for us. We then apply a dropout layer, which randomly removes 20% of the units in our network to prevent overfitting.<\/p>\n At this moment, our CNN is still processing 2D matrices and we need to convert those units into a 1D vector for the final outcome, so we apply a flatten layer here. And we are at the last few steps of our model building. We add 2 fully connected layers to form an Artificial Neural Network, which lets our model narrow our inputs down to 50 outputs. Finally, we add the last fully connected layer with the size of the output layer and softmax activation to squeeze the probability values of our outputs.<\/p>\n Actually, it is not yet done. :]] We just need to do one more step: compile the model with the following parameters: loss<\/em>, metrics<\/em> and optimizer<\/em>. We assign Log Loss (“categorical_crossentropy<\/em>” in Keras) as the loss function to measure how good our model is, i.e. how well predicted digit values match actual digit values. And “accuracy<\/em>” as the metric for performance evaluation. Then there is the optimizer, which is the algorithm our model uses to learn after each running cycle. I picked RMSprop for its good performance in several trial runs.<\/p>\n We have finally built the CNN model, so let’s take a summary of our product. But before doing this, we need to define the size of the digit values. As humans, we know that the handwritten digits should be 0 to 9, i.e. a size of 10. From a machine’s perspective, we need to send it the available outcomes (the dataframe df_train_y<\/em> we created previously) and let it categorize the possible results into a binary matrix (a one-hot encoding). We then use the width of the output binary matrix as the size of our model’s output layer.<\/p>\n You might notice there are parameter counts on certain layers; they are the trainable variables for our CNN model. 
On our first convolutional layer (conv2d_1), the parameters come from:<\/p>\n Then on our second convolutional layer (conv2d_2), since the inputs of this layer are the outputs of the previous layer, the count becomes:<\/p>\n More trainable parameters mean more computation is needed, and in machine learning territory, more calculation doesn’t always mean getting better results. We are good with this setup currently; let’s see how well our model can perform.<\/p>\n We have prepared our model, and it is time to put it into action. But first, let’s gather our training material.<\/p>\n We normalize the gray-scale data into [0 … 1] values, so our CNN model can run faster. And since our CNN model uses 2D matrices as input, we reshape our data into 28 x 28 2D matrices. We further separate 8% of the training data as validation data. Now that we have prepared our data sets, there are two extra techniques we can apply to boost our model’s performance.<\/p>\n On our CNN model, the learning rate parameter helps us find the minima of the loss. Different learning rates produce different losses over different numbers of epochs:<\/p>\n We can manage the learning rate while we train our model, by using the ReduceLROnPlateau callback<\/a>. In the following setting, we monitor the validation accuracy and reduce the learning rate by a factor of 0.3 when there is no improvement after the set number of patience epochs:<\/p>\n Another technique we can apply is the use of an image generator<\/a>. The ImageDataGenerator from Keras can generate images from our inputs, randomly zooming, rotating and shifting them horizontally and vertically. Thus we can have more training images than in the original training dataset.<\/span><\/p>\n Now, let’s put all the things together. We train our model with the training and validation data sets, the learning rate reducing callback and the image generator, in 30 rounds.<\/p>\n Results similar to these would be printed out:<\/p>\n Our model is now well trained, and we can obtain the prediction and save it in a csv file for submission.<\/p>\n Since it is an image recognition project, why don’t we validate our results with our own eyes?<\/p>\n We randomly pick 10 digit images from the testing dataset, then see whether our model can predict them right.<\/p>\n It looks good, doesn’t it?<\/p>\n I submitted the result to Kaggle and scored 0.99471. By using the code in this post, you should be able to get at least 99.0% accuracy. Feel free to modify \/ enhance the code to get even better accuracy.<\/p>\n <\/p>\n The complete source code can be found at: On our data science journey, we have solved classification and regression problems. What’s next? There is one popular machine learning territory we have not set foot on yet: image recognition. 
But now the wait is over, in this post we are going to teach our machine to recognize images by using a Convolutional Neural […]<\/p>\n","protected":false},"author":1,"featured_media":798,
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[18],"tags":[56,55,57,58,30,59,22,8],"class_list":["post-764","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-cnn","tag-convolutional-neural-network","tag-deep-learning","tag-image-recognition","tag-kaggle","tag-keras","tag-machine-learning","tag-python"],"jetpack_publicize_connections":[],"yoast_head":"\nWhat is a Convolutional Neural Network?<\/h3>\n
Conv (Input) -> Pool -> Conv -> Pool -> FC -> FC (Output)\r\n\r\nConv: convolutional layer\r\nPool: pooling layer\r\nFC: fully connected layer<\/pre>\n
\n(image source:\u00a0http:\/\/yann.lecun.com\/exdb\/publis\/pdf\/lecun-98.pdf<\/a>)<\/p>\n\n
CNN in action<\/h3>\n
The line starts here<\/h3>\n
import pandas as pd    #dataframe handling and read_csv\r\nimport numpy as np     #array reshaping for the model inputs\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Dense, Dropout, Flatten\r\nfrom keras.layers.convolutional import Conv2D, MaxPooling2D\r\nfrom keras.utils import np_utils\r\nfrom keras.optimizers import RMSprop\r\nfrom keras.callbacks import ReduceLROnPlateau\r\nfrom keras.preprocessing.image import ImageDataGenerator\r\nimport matplotlib.pyplot as plt\r\nfrom sklearn.model_selection import train_test_split\r\nfrom random import randrange\r\n<\/pre>\n
df_train = pd.read_csv('..\/input\/train.csv')\r\ndf_test = pd.read_csv('..\/input\/test.csv')\r\n<\/pre>\n
df_train.head()\r\n<\/pre>\n
(screenshot: the first 5 rows of df_train, a “label” column followed by 784 pixel value columns)<\/p>\n
df_train_x = df_train.iloc[:,1:] #get 784 pixel value columns after the first column\r\ndf_train_y = df_train.iloc[:,:1] #get the first label column\r\n\r\n#reshape our training X into 28x28 arrays and display each label and image using imshow()\r\nfig, ax = plt.subplots(1,5)\r\nfor i in range(0,5):\r\n    ax[i].imshow(df_train_x.values[i].reshape(28,28), cmap='gray')\r\n    ax[i].set_title(df_train_y.values[i])\r\n<\/pre>\n
\n(figure: the first 5 training digits displayed in gray scale, with their labels as titles)<\/p>\n
Build the Model<\/h3>\n
def cnn_model(result_class_size):\r\n    model = Sequential()\r\n    #input layer: 30 features of 5x5 size, applied on a 28x28 single-channel image\r\n    model.add(Conv2D(30, (5, 5), input_shape=(28,28,1), activation='relu'))\r\n    #down sample every 2x2 area to its maximum value\r\n    model.add(MaxPooling2D(pool_size=(2, 2)))\r\n    #second convolutional layer: 15 features of 3x3 size\r\n    model.add(Conv2D(15, (3, 3), activation='relu'))\r\n    #randomly drop 20% of the units to prevent overfitting\r\n    model.add(Dropout(0.2))\r\n    #convert the 2D feature maps into a 1D vector\r\n    model.add(Flatten())\r\n    #fully connected layers\r\n    model.add(Dense(128, activation='relu'))\r\n    model.add(Dense(50, activation='relu'))\r\n    #output layer: one probability per digit class\r\n    model.add(Dense(result_class_size, activation='softmax'))\r\n    model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])\r\n    return model\r\n<\/pre>\n
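As a quick aside, the two activations used above can be sketched in plain numpy. This is only a conceptual illustration of ReLU and softmax, not how Keras actually implements them.<\/p>\n import numpy as np\r\n\r\ndef relu(x):\r\n    #negative values (feature not found) become zero, positive values pass through\r\n    return np.maximum(0.0, x)\r\n\r\ndef softmax(x):\r\n    #squeeze raw scores into probabilities that sum to 1\r\n    e = np.exp(x - np.max(x))\r\n    return e \/ e.sum()\r\n\r\nprint(relu(np.array([-2.0, 3.0])))          #[0. 3.]\r\nprint(softmax(np.array([1.0, 2.0, 3.0])))   #~[0.09 0.245 0.665]\r\n<\/pre>\n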
Our model is done!<\/h3>\n
arr_train_y = np_utils.to_categorical(df_train_y['label'].values)\r\nmodel = cnn_model(arr_train_y.shape[1])\r\nmodel.summary()\r\n<\/pre>\n
Layer (type) Output Shape Param # \r\n=================================================================\r\nconv2d_1 (Conv2D) (None, 24, 24, 30) 780 \r\n_________________________________________________________________\r\nmax_pooling2d_1 (MaxPooling2 (None, 12, 12, 30) 0 \r\n_________________________________________________________________\r\nconv2d_2 (Conv2D) (None, 10, 10, 15) 4065 \r\n_________________________________________________________________\r\ndropout_1 (Dropout) (None, 10, 10, 15) 0 \r\n_________________________________________________________________\r\nflatten_1 (Flatten) (None, 1500) 0 \r\n_________________________________________________________________\r\ndense_1 (Dense) (None, 128) 192128 \r\n_________________________________________________________________\r\ndense_2 (Dense) (None, 50) 6450 \r\n_________________________________________________________________\r\ndense_3 (Dense) (None, 10) 510 \r\n=================================================================\r\nTotal params: 203,933\r\nTrainable params: 203,933\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\n<\/pre>\n
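As a side note on the binary matrix: here is a small sketch of what np_utils.to_categorical produces, with a few made-up labels for illustration.<\/p>\n from keras.utils import np_utils\r\n\r\n#labels 0, 2 and 9 become one-hot rows; the matrix width (10) becomes our output layer size\r\nprint(np_utils.to_categorical([0, 2, 9]))\r\n#[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\r\n# [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]\r\n# [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]\r\n<\/pre>\n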
parameters = number of features * ( feature width * feature height ) + bias from feature map\r\ni.e. 30 * ( 5 * 5 ) + 30 = 780\r\n<\/pre>\n
parameters = inputs from previous layer * number of features * ( feature width * feature height ) + bias from feature map\r\ni.e. 30 * 15 * ( 3 * 3 ) + 15 = 4065\r\n<\/pre>\n
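If you would like to double-check the summary figures, the whole parameter table can be reproduced with a few lines of arithmetic (a sketch based on the layer sizes above):<\/p>\n conv1  = 30 * (5 * 5) + 30         #780\r\nconv2  = 30 * 15 * (3 * 3) + 15    #4065\r\ndense1 = 1500 * 128 + 128          #192128, from the flattened 10 x 10 x 15 = 1500 inputs\r\ndense2 = 128 * 50 + 50             #6450\r\ndense3 = 50 * 10 + 10              #510\r\nprint(conv1 + conv2 + dense1 + dense2 + dense3)    #203933, matching the model summary\r\n<\/pre>\n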
Train the model<\/h3>\n
#normalize the 0-255 gray scale values into the 0-1 range\r\ndf_test = df_test \/ 255\r\ndf_train_x = df_train_x \/ 255\r\n\r\n#reshape training X and testing X to (number, height, width, channel)\r\narr_train_x_28x28 = np.reshape(df_train_x.values, (df_train_x.values.shape[0], 28, 28, 1))\r\narr_test_x_28x28 = np.reshape(df_test.values, (df_test.values.shape[0], 28, 28, 1))\r\n\r\n#validation package size = 8%\r\nrandom_seed = 7\r\nsplit_train_x, split_val_x, split_train_y, split_val_y = train_test_split(arr_train_x_28x28, arr_train_y, test_size = 0.08, random_state=random_seed)\r\n<\/pre>\n
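A quick sanity check on the resulting shapes never hurts. Assuming the 42,000-row Kaggle training set and the 8% split above, we would expect:<\/p>\n print(arr_train_x_28x28.shape)   #(42000, 28, 28, 1)\r\nprint(split_train_x.shape)       #(38640, 28, 28, 1) - 92% for training\r\nprint(split_val_x.shape)         #(3360, 28, 28, 1)  - 8% for validation\r\n<\/pre>\n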
Finding the proper learning rate<\/h3>\n
\n(image source:\u00a0http:\/\/cs231n.github.io\/neural-networks-3\/<\/a>)<\/p>\nreduce_lr = ReduceLROnPlateau(monitor='val_acc', \r\n factor=0.3, \r\n patience=3, \r\n min_lr=0.0001)\r\n<\/pre>\n
Self-Generated Images<\/h3>\n
datagen = ImageDataGenerator( rotation_range=10, \r\n zoom_range = 0.1, \r\n width_shift_range=0.1, \r\n height_shift_range=0.1) \r\ndatagen.fit(split_train_x)\r\n<\/pre>\n
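To eyeball what the generator produces, a short sketch like the following (using the datagen and training split from above) displays a few augmented digits:<\/p>\n import matplotlib.pyplot as plt\r\n\r\n#draw one augmented batch and display its first 5 images\r\nfor x_batch, y_batch in datagen.flow(split_train_x, split_train_y, batch_size=5):\r\n    fig, ax = plt.subplots(1,5)\r\n    for i in range(0,5):\r\n        ax[i].imshow(x_batch[i].reshape(28,28), cmap='gray')\r\n    break   #one batch is enough for a preview\r\n<\/pre>\n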
model.fit_generator(datagen.flow(split_train_x,split_train_y, batch_size=64),\r\n epochs = 30, validation_data = (split_val_x,split_val_y),\r\n verbose = 2, steps_per_epoch=640, callbacks=[reduce_lr])\r\n<\/pre>\n
Epoch 1\/30\r\n - 37s - loss: 0.4305 - acc: 0.8626 - val_loss: 0.0861 - val_acc: 0.9735\r\nEpoch 2\/30\r\n - 37s - loss: 0.1592 - acc: 0.9500 - val_loss: 0.0907 - val_acc: 0.9735\r\n\r\n........\r\n\r\nEpoch 29\/30\r\n - 39s - loss: 0.0342 - acc: 0.9898 - val_loss: 0.0199 - val_acc: 0.9929\r\nEpoch 30\/30\r\n - 38s - loss: 0.0329 - acc: 0.9902 - val_loss: 0.0221 - val_acc: 0.9914\r\n<\/pre>\n
prediction = model.predict_classes(arr_test_x_28x28, verbose=0)\r\ndata_to_submit = pd.DataFrame({\"ImageId\": list(range(1,len(prediction)+1)), \"Label\": prediction})\r\ndata_to_submit.to_csv(\"result.csv\", header=True, index = False)\r\n<\/pre>\n
start_idx = randrange(df_test.shape[0]-10) \r\nfig, ax = plt.subplots(2,5, figsize=(15,8))\r\nfor j in range(0,2): \r\n for i in range(0,5):\r\n ax[j][i].imshow(df_test.values[start_idx].reshape(28,28), cmap='gray')\r\n ax[j][i].set_title(\"Index:{} \\nPrediction:{}\".format(start_idx, prediction[start_idx]))\r\n start_idx +=1\r\n<\/pre>\n
(figure: 10 randomly picked testing digits, with their indexes and predicted values as titles)<\/p>\n
What have we learnt in this post?<\/h3>\n
\nIn this post, we have learnt what Deep Learning and Convolutional Neural Networks are, built and trained a CNN digit recognizer in Keras, and boosted its accuracy with a learning rate reducing callback and an image generator.<\/p>\n
\nKaggle Kernel: https:\/\/www.kaggle.com\/codeastar\/fast-and-easy-cnn-for-starters-in-keras-0-99471<\/a>
\nGitHub:\u00a0https:\/\/github.com\/codeastar\/digit-recognition-cnn<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"