{"id":1336,"date":"2018-09-26T19:55:06","date_gmt":"2018-09-26T19:55:06","guid":{"rendered":"https:\/\/www.codeastar.com\/?p=1336"},"modified":"2018-09-26T19:55:06","modified_gmt":"2018-09-26T19:55:06","slug":"u-net-object-detection-iou","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/u-net-object-detection-iou\/","title":{"rendered":"U-Net and IoU for Object Detection in Image Processing"},"content":{"rendered":"
Other than our last hand writing challenge<\/a>, there is another Kaggle<\/a> challenge featuring image recognition: the TGS Salt Identification Challenge<\/a>. But this time we are going for an “upgrade”, as we are dealing with object detection. It is all about salt. In this challenge, our mission is to find geophysical images that contain salt. Oh wait, does that sound weird? Actually not. Salt in soil is bad for plant growth and can damage underground infrastructure like pipes and blocks. Salt identification can locate the problematic parts, so people can avoid those areas or apply fixes to them. Unlike the last hand writing challenge, in which we only needed to find “what” the features were, this time we also need to find “where” the features are. And the good thing for this challenge is that a new requirement brings new knowledge to learn: U-Net and IoU.<\/p>\n <\/p>\n Like all our previous Data Science projects, the first thing we do is understand the problem. So we take a look at the training data files from Kaggle. There are 2 folders, “images” and “masks”. “Images” contains the input images, while “masks” contains the result images, i.e. where the salt is. We can visualize the training set with the following lines of code:<\/p>\n We load 14 images randomly from the training set; the first 7 of them are input images and the last 7 are the corresponding salt mask images.<\/p>\n <\/p>\n The input images are taken from somewhere on Earth (Kaggle didn’t expose the location[s]) and the mask images show where the salt is (the white part). Our goal is to produce mask images for the testing data. Now that our goal is clear, let’s move to the next usual step in our data science projects:<\/p>\n First, we load each image in grayscale and divide it by 255, so the image channel is normalized to the 0 to 1 range.<\/p>\n Second, we calculate the coverage of each image and mask pair.<\/p>\n For a mask image with more salt (a larger white area), the coverage value will be closer to 1. On the other hand, the coverage value will be closer to 0 when there is not much salt (a mostly dark mask image).<\/p>\n It is obvious that most of our images are salt-less. Now we can categorize the images into a “coverage class” according to their “coverage” value.<\/p>\n We are at the last step of our features engineering: image resizing. Since we are going to use the U-Net architecture for object detection, we will resize our images from their original size of 101 x 101 to 128 x 128, i.e. a size that is a power of 2. The U-Net halves the image dimensions at every downsampling step, so a power-of-2 size keeps those dimensions as whole numbers all the way down.<\/p>\n We are about to build our learning model, the U-Net. But before doing this, we would like to prepare our training and validation datasets first.<\/p>\n We have 4000 pairs of training images and split 20% of them off as validation data, i.e. 3200 pairs as training data and 800 pairs as validation data. You may notice that we use “stratify=df_train.coverage_class” to split the images. Thus the data is split in a stratified fashion, according to the “coverage_class”. We apply this rule to ensure we have validation data in each class; otherwise the validation set might be filled with the most dominant “class 0” data.<\/p>\n Now we are going to the core of this Data Science project: building the U-Net model! But first things first, what is the U-Net?<\/p>\n The U-Net is a convolutional neural network (CNN) that finds features from inputs and provides classifications. 
It is similar to the model we built in our hand writing recognition project. But it is also a fully convolutional network (FCN). That means there is no dense \/ flatten layer, and the model is connected layer by layer. From our hand writing project experience, we used convolutional and pooling layers to find image features while skipping the feature positions. In the U-Net architecture, we are going to get the feature locations as well, so we have transposed convolutional layers. Too much jargon? Let’s have a simpler version: in the U-Net, we use downsampling to get the features, then we use upsampling to get their positions. The U-Net should look like:<\/p>\n <\/p>\n (image source: Dept. of Computer Science, University of Freiburg, Germany<\/a> )<\/p>\n There is downsampling on the left side, upsampling on the right side and concatenation at the bottom. This setup makes a U-shaped architecture, and that is why the U-Net is the “U”-Net. :]]<\/p>\n We should be no strangers to downsampling, as we already had a taste of it in the image recognition project<\/a> before. We used downsampling there to locate features of hand written digits.<\/p>\n <\/p>\nUnderstand the Problem<\/h3>\n
train_image_path = \"..\/input\/train\/images\/\"\r\ntrain_mask_path = \"..\/input\/train\/masks\/\"\r\n\r\nimage_array = []\r\n\r\nfor root, dirs, files in os.walk(train_image_path): \r\n image_array = files\r\n\r\ncol_size = 7\r\nrow_size = 2\r\n\r\nrand_id_array = random.sample(range(0, len(image_array)), col_size)\r\n\r\nfig, ax = plt.subplots(row_size, col_size, figsize=(20,6)) \r\n\r\nfor row in range(0,row_size): \r\n image_index = 0\r\n if (row==0): \r\n da_path= train_image_path\r\n else: \r\n da_path= train_mask_path\r\n \r\n for col in range(0,col_size):\r\n img = load_img(da_path+image_array[rand_id_array[image_index]])\r\n ax[row][col].imshow(img)\r\n ax[row][col].set_title(image_array[rand_id_array[image_index]])\r\n image_index += 1\r\n\r\nplt.show() \r\n<\/pre>\n
Features Engineering<\/h3>\n
# train_id_arr, train_image_arr and train_mask_arr hold the image ids and the file paths\r\n# gathered from the training \"images\" and \"masks\" folders in an earlier step\r\ndf_train = pd.DataFrame({'id':train_id_arr,'image_path':train_image_arr,'mask_path':train_mask_arr})\r\n\r\ndf_train[\"image_array\"] = [np.array(load_img(path=file_path, color_mode=\"grayscale\")) \/ 255 for file_path in tqdm(df_train.image_path)]\r\ndf_train[\"mask_array\"] = [np.array(load_img(path=file_path, color_mode=\"grayscale\")) \/ 255 for file_path in tqdm(df_train.mask_path)]\r\n<\/pre>\n
# img is the last PIL image loaded in the visualization step, so img.size returns the original 101 x 101 dimensions\r\nimg_width, img_height = img.size\r\ndf_train[\"coverage\"] = df_train.mask_array.map(np.sum) \/ (img_width * img_height)\r\n<\/pre>\n
sns.distplot(df_train.coverage, kde=False)<\/pre>\n
def cov_to_class(val):\r\n    for i in range(0, 11):\r\n        if val * 10 <= i:\r\n            return i\r\n\r\ndf_train[\"coverage_class\"] = df_train.coverage.map(cov_to_class)\r\n<\/pre>\n
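To see how this binning behaves, here is a small illustration with made-up coverage values (not taken from the dataset):<\/p>\n
# coverage 0.0 maps to class 0 (no salt), 0.05 to class 1,\r\n# 0.42 to class 5 and 1.0 to class 10 (fully covered)\r\nfor sample_coverage in [0.0, 0.05, 0.42, 1.0]:\r\n    print(sample_coverage, cov_to_class(sample_coverage))\r\n<\/pre>\n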
from skimage.transform import resize\r\n\r\ntarge_width = 128\r\ntarge_height = 128\r\n\r\n# \"upsample\" an image array from the original 101 x 101 to 128 x 128\r\ndef upsample(img_array):\r\n    return resize(img_array, (targe_width, targe_height), mode='constant', preserve_range=True)\r\n<\/pre>\n
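A one-line check (illustrative only) shows that the helper maps an original 101 x 101 array to the 128 x 128 shape our U-Net will expect:<\/p>\n
print(upsample(np.zeros((101, 101))).shape)  # (128, 128)\r\n<\/pre>\n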
Entering the U-Net<\/h3>\n
from sklearn.model_selection import train_test_split\r\n\r\nimage_channel = 1  # grayscale images, as prepared above\r\n\r\nX_train, X_valid, Y_train, Y_valid = train_test_split(\r\n    np.array(df_train.image_array.map(upsample).tolist()).reshape(-1, targe_width, targe_height, image_channel),  # get an image array from df and upsample it\r\n    np.array(df_train.mask_array.map(upsample).tolist()).reshape(-1, targe_width, targe_height, image_channel),   # reshape to (image count, upsample_w, upsample_h, 1 channel)\r\n    test_size=0.2,\r\n    stratify=df_train.coverage_class,\r\n    random_state=256)\r\n<\/pre>\n
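If we want to double check the stratification, one option (a minimal sketch, reusing exactly the same split settings) is to split the coverage_class labels with the identical test_size, stratify and random_state values, then count the classes that land in the validation part:<\/p>\n
# with identical parameters, train_test_split selects the same rows again,\r\n# so class_valid mirrors the coverage classes behind X_valid and Y_valid\r\nclass_train, class_valid = train_test_split(\r\n    df_train.coverage_class,\r\n    test_size=0.2,\r\n    stratify=df_train.coverage_class,\r\n    random_state=256)\r\n\r\nprint(class_valid.value_counts().sort_index())  # every class from 0 to 10 should appear\r\n<\/pre>\n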
DownSampling and UpSampling in U-Net<\/h3>\n
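Before walking through the real thing, here is a minimal sketch of the idea in Keras. This is an illustration only, with made-up filter counts rather than the layers of our final model: one convolution plus pooling step going down, one transposed convolution step coming back up, and a concatenation that merges the two paths.<\/p>\n
from keras.models import Model\r\nfrom keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate\r\n\r\ninputs = Input((128, 128, 1))   # our resized 128 x 128 grayscale images\r\n\r\n# downsampling: learn features while shrinking the spatial size (128 -> 64)\r\ndown = Conv2D(8, (3, 3), activation=\"relu\", padding=\"same\")(inputs)\r\npooled = MaxPooling2D((2, 2))(down)\r\n\r\n# the \"bottom\" of the U: more features at the smallest size\r\nmiddle = Conv2D(16, (3, 3), activation=\"relu\", padding=\"same\")(pooled)\r\n\r\n# upsampling: a transposed convolution restores the spatial size (64 -> 128)\r\nup = Conv2DTranspose(8, (3, 3), strides=(2, 2), padding=\"same\")(middle)\r\n\r\n# skip connection: concatenate the downsampling features with the upsampled ones\r\nmerged = Conv2D(8, (3, 3), activation=\"relu\", padding=\"same\")(concatenate([down, up]))\r\n\r\n# a 1 x 1 convolution turns the merged features into a 128 x 128 x 1 mask prediction\r\noutputs = Conv2D(1, (1, 1), activation=\"sigmoid\")(merged)\r\n\r\nmini_unet = Model(inputs=inputs, outputs=outputs)\r\nmini_unet.summary()\r\n<\/pre>\n
The full U-Net simply stacks several of these downsampling and upsampling steps on the two sides of the “U”, which is what we are going to look at next.<\/p>\n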