{"id":1336,"date":"2018-09-26T19:55:06","date_gmt":"2018-09-26T19:55:06","guid":{"rendered":"https:\/\/www.codeastar.com\/?p=1336"},"modified":"2018-09-26T19:55:06","modified_gmt":"2018-09-26T19:55:06","slug":"u-net-object-detection-iou","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/u-net-object-detection-iou\/","title":{"rendered":"U-Net and IoU for Object Detection in Image Processing"},"content":{"rendered":"
Other than our last hand writing challenge<\/a>, there is another Kaggle<\/a> challenge featuring image recognition: the TGS Salt Identification Challenge<\/a>. But this time we are going for an “upgrade”, as we are dealing with object detection. It is all about salt. In this challenge, our mission is to find geophysical images that contain salt. Oh wait, does that sound weird? Actually not. Salt in soil is bad for plant growth and can damage underground infrastructure like pipes and blocks. Salt identification can locate the problematic parts, so people can avoid those areas or apply fixes to them. Unlike the last hand writing challenge, in which we only needed to find “what” the features were, this time we also need to find “where” the features are. And the good thing for this challenge is that a new requirement brings new knowledge to learn: U-Net and IoU.<\/p>\n <\/p>\n Like all our previous Data Science projects, the first thing we do is understand the problem. So we take a look at the training data files from Kaggle. There are 2 folders, “images” and “masks”. “Images” contains the input images, while “masks” contains the result images, i.e. where the salt is. We can visualize the training set with the following lines of code:<\/p>\n We load 14 images randomly from the training set; the first 7 of them are input images and the last 7 are the corresponding salt mask images.<\/p>\n <\/p>\n The input images are taken from somewhere on Earth (Kaggle didn’t expose the location[s]) and the mask images show where the salt is (the white part). Our goal is to produce mask images for the testing data. Now that our goal is clear, let’s move to the next usual step in our data science projects:<\/p>\n First, we load each image in grayscale and divide it by 255, so the image channel is normalized to the 0 to 1 range.<\/p>\n Second, we calculate the coverage of each image and mask pair.<\/p>\n For a mask image with more salt (a larger white area), the coverage value will be closer to 1. On the other hand, the coverage value will be closer to 0 when there is not much salt (a mostly dark mask image).<\/p>\n It is obvious that most of our images are salt-less. Now we can categorize the images into a “coverage class” according to their “coverage” value.<\/p>\n We are at the last step of our features engineering: image resizing. Since we are going to use the U-Net architecture for object detection, we will resize our images from their original size of 101 x 101 to 128 x 128, i.e. a size that is a power of 2. The U-Net halves the image dimensions at every downsampling step, so a power-of-2 size keeps those dimensions as whole numbers all the way down.<\/p>\n We are about to build our learning model, the U-Net. But before doing this, we would like to prepare our training and validation datasets first.<\/p>\n We have 4000 pairs of training images and split 20% of them off as validation data, i.e. 3200 pairs as training data and 800 pairs as validation data. You may notice that we use “stratify=df_train.coverage_class” to split the images. Thus the data is split in a stratified fashion, according to the “coverage_class”. We apply this rule to ensure we have validation data in each class; otherwise the validation set might be filled with the most dominant “class 0” data.<\/p>\n Now we are going to the core of this Data Science project: building the U-Net model! But first things first, what is the U-Net?<\/p>\n The U-Net is a convolutional neural network (CNN) that finds features from inputs and provides classifications. 
It is similar to the model we built in our hand writing recognition project. But it is also a fully convolutional network (FCN). That means there is no dense \/ flatten layer, and the model is connected layer by layer. From our hand writing project experience, we used convolutional and pooling layers to find image features while skipping the feature positions. In the U-Net architecture, we are going to get the feature locations as well, so we have transposed convolutional layers. Too much jargon? Let’s have a simpler version: in the U-Net, we use downsampling to get the features, then we use upsampling to get their positions. The U-Net should look like:<\/p>\n <\/p>\n (image source: Dept. of Computer Science, University of Freiburg, Germany<\/a> )<\/p>\n There is downsampling on the left side, upsampling on the right side and concatenation at the bottom. This setup makes a U-shaped architecture, and that is why the U-Net is the “U”-Net. :]]<\/p>\n We should be no strangers to downsampling, as we already had a taste of it in the image recognition project<\/a> before. We used downsampling there to locate features of hand written digits.<\/p>\n <\/p>\nUnderstand the Problem<\/h3>\n
train_image_path = \"..\/input\/train\/images\/\"\r\ntrain_mask_path = \"..\/input\/train\/masks\/\"\r\n\r\nimage_array = []\r\n\r\nfor root, dirs, files in os.walk(train_image_path): \r\n image_array = files\r\n\r\ncol_size = 7\r\nrow_size = 2\r\n\r\nrand_id_array = random.sample(range(0, len(image_array)), col_size)\r\n\r\nfig, ax = plt.subplots(row_size, col_size, figsize=(20,6)) \r\n\r\nfor row in range(0,row_size): \r\n image_index = 0\r\n if (row==0): \r\n da_path= train_image_path\r\n else: \r\n da_path= train_mask_path\r\n \r\n for col in range(0,col_size):\r\n img = load_img(da_path+image_array[rand_id_array[image_index]])\r\n ax[row][col].imshow(img)\r\n ax[row][col].set_title(image_array[rand_id_array[image_index]])\r\n image_index += 1\r\n\r\nplt.show() \r\n<\/pre>\n
Features Engineering<\/h3>\n
# train_id_arr, train_image_arr and train_mask_arr hold the image ids and the file paths\r\n# gathered from the training \"images\" and \"masks\" folders in an earlier step\r\ndf_train = pd.DataFrame({'id':train_id_arr,'image_path':train_image_arr,'mask_path':train_mask_arr})\r\n\r\ndf_train[\"image_array\"] = [np.array(load_img(path=file_path, color_mode=\"grayscale\")) \/ 255 for file_path in tqdm(df_train.image_path)]\r\ndf_train[\"mask_array\"] = [np.array(load_img(path=file_path, color_mode=\"grayscale\")) \/ 255 for file_path in tqdm(df_train.mask_path)]\r\n<\/pre>\n
# img is the last PIL image loaded in the visualization step, so img.size returns the original 101 x 101 dimensions\r\nimg_width, img_height = img.size\r\ndf_train[\"coverage\"] = df_train.mask_array.map(np.sum) \/ (img_width * img_height)\r\n<\/pre>\n
sns.distplot(df_train.coverage, kde=False)<\/pre>\n
def cov_to_class(val):\r\n    for i in range(0, 11):\r\n        if val * 10 <= i:\r\n            return i\r\n\r\ndf_train[\"coverage_class\"] = df_train.coverage.map(cov_to_class)\r\n<\/pre>\n
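To see how this binning behaves, here is a small illustration with made-up coverage values (not taken from the dataset):<\/p>\n
# coverage 0.0 maps to class 0 (no salt), 0.05 to class 1,\r\n# 0.42 to class 5 and 1.0 to class 10 (fully covered)\r\nfor sample_coverage in [0.0, 0.05, 0.42, 1.0]:\r\n    print(sample_coverage, cov_to_class(sample_coverage))\r\n<\/pre>\n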
from skimage.transform import resize\r\n\r\ntarge_width = 128\r\ntarge_height = 128\r\n\r\n# \"upsample\" an image array from the original 101 x 101 to 128 x 128\r\ndef upsample(img_array):\r\n    return resize(img_array, (targe_width, targe_height), mode='constant', preserve_range=True)\r\n<\/pre>\n
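A one-line check (illustrative only) shows that the helper maps an original 101 x 101 array to the 128 x 128 shape our U-Net will expect:<\/p>\n
print(upsample(np.zeros((101, 101))).shape)  # (128, 128)\r\n<\/pre>\n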
Entering the U-Net<\/h3>\n
from sklearn.model_selection import train_test_split\r\n\r\nimage_channel = 1  # grayscale images, as prepared above\r\n\r\nX_train, X_valid, Y_train, Y_valid = train_test_split(\r\n    np.array(df_train.image_array.map(upsample).tolist()).reshape(-1, targe_width, targe_height, image_channel),  # get an image array from df and upsample it\r\n    np.array(df_train.mask_array.map(upsample).tolist()).reshape(-1, targe_width, targe_height, image_channel),   # reshape to (image count, upsample_w, upsample_h, 1 channel)\r\n    test_size=0.2,\r\n    stratify=df_train.coverage_class,\r\n    random_state=256)\r\n<\/pre>\n
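If we want to double check the stratification, one option (a minimal sketch, reusing exactly the same split settings) is to split the coverage_class labels with the identical test_size, stratify and random_state values, then count the classes that land in the validation part:<\/p>\n
# with identical parameters, train_test_split selects the same rows again,\r\n# so class_valid mirrors the coverage classes behind X_valid and Y_valid\r\nclass_train, class_valid = train_test_split(\r\n    df_train.coverage_class,\r\n    test_size=0.2,\r\n    stratify=df_train.coverage_class,\r\n    random_state=256)\r\n\r\nprint(class_valid.value_counts().sort_index())  # every class from 0 to 10 should appear\r\n<\/pre>\n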
DownSampling and UpSampling in U-Net<\/h3>\n
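Before walking through the real thing, here is a minimal sketch of the idea in Keras. This is an illustration only, with made-up filter counts rather than the layers of our final model: one convolution plus pooling step going down, one transposed convolution step coming back up, and a concatenation that merges the two paths.<\/p>\n
from keras.models import Model\r\nfrom keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate\r\n\r\ninputs = Input((128, 128, 1))   # our resized 128 x 128 grayscale images\r\n\r\n# downsampling: learn features while shrinking the spatial size (128 -> 64)\r\ndown = Conv2D(8, (3, 3), activation=\"relu\", padding=\"same\")(inputs)\r\npooled = MaxPooling2D((2, 2))(down)\r\n\r\n# the \"bottom\" of the U: more features at the smallest size\r\nmiddle = Conv2D(16, (3, 3), activation=\"relu\", padding=\"same\")(pooled)\r\n\r\n# upsampling: a transposed convolution restores the spatial size (64 -> 128)\r\nup = Conv2DTranspose(8, (3, 3), strides=(2, 2), padding=\"same\")(middle)\r\n\r\n# skip connection: concatenate the downsampling features with the upsampled ones\r\nmerged = Conv2D(8, (3, 3), activation=\"relu\", padding=\"same\")(concatenate([down, up]))\r\n\r\n# a 1 x 1 convolution turns the merged features into a 128 x 128 x 1 mask prediction\r\noutputs = Conv2D(1, (1, 1), activation=\"sigmoid\")(merged)\r\n\r\nmini_unet = Model(inputs=inputs, outputs=outputs)\r\nmini_unet.summary()\r\n<\/pre>\n
The full U-Net simply stacks several of these downsampling and upsampling steps on the two sides of the “U”, which is what we are going to look at next.<\/p>\n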