From a98da8dc4624d0161b7800abc41c13372922c679 Mon Sep 17 00:00:00 2001
From: mike2ox
Date: Fri, 18 Oct 2019 17:45:31 +0900
Subject: [PATCH] Change Model, Dataset(#18)

- Faster R-CNN to SSD
- Deepfashion2 to VOC2012
---
 Object_detection_tutorial_by_keras_API.md | 39 +++++++++++++++--------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/Object_detection_tutorial_by_keras_API.md b/Object_detection_tutorial_by_keras_API.md
index b757ba7..5630391 100644
--- a/Object_detection_tutorial_by_keras_API.md
+++ b/Object_detection_tutorial_by_keras_API.md
@@ -3,31 +3,44 @@
 ## Introduction

-

+

-__Object Detection__ is one of the most popular computer vision technologies in many areas.(Face detection, Self-driving car etc) Recently, __Deep Learning__ technology has greatly influenced the Object Detection field, such as accuracy, performance improvement.
-There are several popular deep learning algorithms. __Faster R-CNN(2015)__, __YOLO(2015)__, __SSD(2015)__ and __RetinaNet(2017)__. In this tutorial, we will __Faster R-CNN(2015)__ to learn what object detection is.
+__Object Detection__ is one of the most widely used computer vision technologies, with applications such as face detection and self-driving cars. Recently, __Deep Learning__ has greatly improved the accuracy and performance of object detection.
+There are several popular deep learning algorithms: __Faster R-CNN(2015)__, __YOLO (2015)__, __SSD(2016)__ and __RetinaNet(2017)__. In this tutorial, we will use __SSD(2016)__ to learn what object detection is.
-#### _What's Faster R-CNN?_
-[Faster R-CNN(2015)](https://arxiv.org/pdf/1506.01497.pdf) is one of the R-CNN models that extracts Region Proposals **used by Region Proposal Network** and classifies them on the basis of CNN models.
+#### _Why and what is SSD (Single Shot MultiBox Detector)?_
-[model picture]()
+

+ +

+
+In the image above, the models marked in red showed excellent results in the object detection field.
+Among those models, the reasons for selecting SSD in this tutorial are:
+ - **Fast training** (SSD is a one-stage method and uses convolution layers in its `Extra Feature Layers`)
+ - **High detection accuracy** (SSD produces predictions at different scales from feature maps of multiple scales)
+ - **Pretrained SSD weights (trained on COCO) are provided in the [Tensorflow github](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)**
+
+So, let's look at the SSD model structure.
+
+

+ +

-Before 'Faster R-CNN', Region proposals were extracted in raw image(R-CNN) or feature map(Fast R-CNN) using selvective serarch. However, This method is slower than gpu computation and cause to occur bottleneck because they operate on cpu computation outside the CNN model.
+First, look at the SSD structure image above. SSD consists of a VGG16 base network followed by extra feature layers, and takes 300 x 300 x 3 input images.
-In order to eliminate bottlenecks, 'Faster R-CNN' applied a CNN model(called **Region proposal network(RPN)**) to the algorithm to obtain region proposals. RPN takes as input a small window (3 X 3) of feature map passed by CNN model (just make feature map of raw image). Each window is mapped to a lower-dimensional feature(256 or 512). This feature is used 2 small networks. one is classifying object or none object, the other is regressing bbox locations.
-#### _What's Deepfashion2?_
+#### _What dataset does this tutorial use?_

-

+ +

- In fact, many object detection tutorials use famous dataset such as [COCO](http://cocodataset.org/), [VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) etc. But, this tutorial uses [DeepFashion2(2019)](https://arxiv.org/pdf/1901.07973.pdf). ~~DeepFashion2 is comprehensive fashion dataset that contains 491k images, each of which is richly labeled with style, scale, occlusion, zooming, viewpoint, bounding box, dense landmarks and pose, pixel-level masks, and pair of images of identical item from consumer and commercial store.~~(To be written by the responsible personnel.)
+ In fact, many object detection tutorials use well-known datasets such as [COCO](http://cocodataset.org/) and [VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). Among them, we decided to use VOC2012. ~~insert voc2012 description~~
 ## Setting up Environments
-In order to running this tutorial(object detection based on deep learning), many development packages and environment settings are needed. **BUT, DON'T WORRY.** In this tutorial, you can easily set up a development environment on your computer or server using a docker. **JUST FOLLOW ME**
+To run this tutorial (object detection based on deep learning), many development packages and environment settings are needed. **BUT, DON'T WORRY.** In this tutorial, you can easily set up a development environment on your computer or server using Docker. **Just follow this tutorial.**
-First, we use a Docker (OS : [Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/), [windows](https://docs.docker.com/docker-for-windows/)) to set up the developments package and environments required for deep learning development. ~~And, If you don't have Docker Hub ID, you can't download [our docker image](). so, you need to sign up in [Docker Hub](https://hub.docker.com/).~~ See below.
+First, we use Docker (OS: [Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/), [Windows](https://docs.docker.com/docker-for-windows/)) to set up the development packages and environment required for deep learning development. See below. ```bash # yolk/