About SSD (Single Shot Detector)

This paper was selected as an oral presentation at ECCV 2016 and is a representative algorithm in the field of object detection. The first author is Wei Liu, who, after graduating, is now a Lead Machine Learning Researcher at the self-driving company Nuro.

This article is written with reference to the latest version on arXiv, submitted on Thu, 29 Dec 2016. Since there are already many Chinese-language walkthroughs of the SSD paper online, the aim here is only to summarize the main ideas of the original text; for a more detailed interpretation, see the two links at the end of the article.

The starting point of SSD is to discretize the output space of bounding boxes into a set of default boxes. These default boxes are placed at each cell of a feature map and come in different aspect ratios and scales. As the paper's conclusion puts it, this representation allows us to efficiently model the space of possible box shapes.

At the same time, SSD combines predictions from multiple feature maps of different resolutions, which helps it handle objects of different scales. The end result: allowing different default box shapes in several feature maps lets us efficiently discretize the space of possible output box shapes.

At prediction time, the network outputs, for each default box, a score for every category and an offset (correction) to that box's shape.
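As a minimal sketch of what the "correction value" means, the snippet below applies four predicted offsets to one default box, using the center-size encoding from the localization loss in the paper. The function name `decode_box` is hypothetical, and real implementations usually add extra scaling factors (often called variances) that are left out here.

```python
import math

def decode_box(default_box, offsets):
    """Apply predicted offsets to one default box (hypothetical helper).

    default_box: (cx, cy, w, h), center and size in relative coordinates.
    offsets:     (t_cx, t_cy, t_w, t_h), the regression output for this box.
    Inverts the encoding t_cx = (g_cx - d_cx) / d_w and t_w = log(g_w / d_w)
    (and likewise for cy and h).
    """
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    cx = d_cx + t_cx * d_w      # shift the center by a fraction of the box width
    cy = d_cy + t_cy * d_h      # shift the center by a fraction of the box height
    w = d_w * math.exp(t_w)     # rescale the width
    h = d_h * math.exp(t_h)     # rescale the height
    return (cx, cy, w, h)

# Zero offsets leave the default box unchanged.
print(decode_box((0.5, 0.5, 0.2, 0.2), (0.0, 0.0, 0.0, 0.0)))
```

The class scores are handled separately: each default box simply gets one score per category.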

Performance: for 300 × 300 input, SSD achieves 74.3% mAP on the VOC2007 test set at 59 FPS on an Nvidia Titan X; for 512 × 512 input, SSD achieves 76.9% mAP.

The network structure is as follows:

Input size: 300 × 300 or 512 × 512 (Fast and Faster R-CNN use images whose shorter side is 600 pixels, while YOLO uses 448 × 448).

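SSD uses the outputs of conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2 to predict localization and confidence. For SSD300, the sizes of these six feature maps are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1.

The scale of the default boxes on the k-th of the m feature maps is

s_k = s_min + (s_max − s_min) / (m − 1) × (k − 1), k ∈ [1, m],

with s_min = 0.2 and s_max = 0.9.

So SSD300 predicts a total of

38 × 38 × 4 + 19 × 19 × 6 + 10 × 10 × 6 + 5 × 5 × 6 + 3 × 3 × 4 + 1 × 1 × 4 = 8732

bounding boxes (conv4_3, conv10_2, and conv11_2 predict 4 default boxes per feature map location; conv7, conv8_2, and conv9_2 predict 6).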
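The box count can be checked with a few lines of arithmetic. This is only a sketch: the feature map sizes (38, 19, 10, 5, 3, 1) and the scale range 0.2 to 0.9 are taken from the SSD300 configuration in the paper, and the names `SSD300_LAYERS`, `count_default_boxes`, and `layer_scales` are made up for illustration.

```python
# (layer name, feature map side length, default boxes per location)
# Sizes are assumed from the SSD300 configuration described in the paper.
SSD300_LAYERS = [
    ("conv4_3", 38, 4),
    ("conv7", 19, 6),
    ("conv8_2", 10, 6),
    ("conv9_2", 5, 6),
    ("conv10_2", 3, 4),
    ("conv11_2", 1, 4),
]

def count_default_boxes(layers):
    """Total number of default boxes over all square feature maps."""
    return sum(size * size * k for _, size, k in layers)

def layer_scales(m, s_min=0.2, s_max=0.9):
    """Linearly spaced scales s_1..s_m between s_min and s_max (needs m >= 2)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

print(count_default_boxes(SSD300_LAYERS))  # 8732
print([round(s, 2) for s in layer_scales(len(SSD300_LAYERS))])
```

Most of the boxes come from conv4_3 (38 × 38 × 4 = 5776), which is the layer responsible for the smallest objects.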


During training, the default boxes are first matched to the ground truth boxes.

For example, the two blue default boxes in figure (b) above (shown as their projections onto the original image) both match the cat in the ground truth, and the red default box in figure (c) above matches the dog. These three default boxes are treated as positive samples, and all remaining default boxes are treated as negative samples.
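A rough sketch of this matching step is shown below. It assumes the strategy described in the paper: every ground truth box is matched to the default box with the best jaccard overlap (IoU), and any default box whose overlap with a ground truth box exceeds 0.5 is also matched. The function names and the toy boxes are hypothetical.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_default_boxes(default_boxes, gt_boxes, threshold=0.5):
    """Return, per default box, the index of its ground truth box, or -1 (negative)."""
    if not default_boxes:
        return []
    matches = [-1] * len(default_boxes)
    # Any default box overlapping a ground truth box by more than the threshold is positive.
    for d, dbox in enumerate(default_boxes):
        best_g, best_overlap = -1, threshold
        for g, gbox in enumerate(gt_boxes):
            overlap = iou(dbox, gbox)
            if overlap > best_overlap:
                best_g, best_overlap = g, overlap
        matches[d] = best_g
    # Each ground truth box also claims its single best default box.
    for g, gbox in enumerate(gt_boxes):
        overlaps = [iou(dbox, gbox) for dbox in default_boxes]
        best_d = max(range(len(default_boxes)), key=overlaps.__getitem__)
        if overlaps[best_d] > 0:
            matches[best_d] = g
    return matches

# Toy example: ground truth 0 is the "cat", ground truth 1 is the "dog".
gt = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
defaults = [(0.0, 0.0, 0.5, 0.5), (0.05, 0.05, 0.55, 0.55),
            (0.6, 0.6, 1.0, 1.0), (0.0, 0.6, 0.2, 0.8)]
print(match_default_boxes(defaults, gt))  # [0, 0, 1, -1]
```

As in the figure, two default boxes end up assigned to the cat, one to the dog, and the rest are negatives.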

Q: How are the classification scores and shape offsets of each box obtained from the feature map?

A: Each feature map is processed by a set of convolutional filters (also known as convolutional predictors). For an m × n feature map with p channels, each filter is a 3 × 3 × p kernel, and with k default boxes per location and c classes, (c + 4)k filters are applied at every location. In essence, the default boxes tile the feature map in a convolutional manner, so each box's position relative to its cell is fixed.
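To make the sizes concrete, here is a small arithmetic sketch following the paper's description: for an m × n feature map with p channels, each predictor is a 3 × 3 × p kernel, and (c + 4)k of them are applied at every location, where k is the number of default boxes per location and c the number of classes. The helper name `predictor_shape` is hypothetical, and the example values (512 channels for conv4_3, 21 classes for PASCAL VOC including background) are assumptions about a typical SSD300 setup.

```python
def predictor_shape(m, n, p, k, num_classes):
    """Kernel shape, filter count, and output count for one feature map.

    m, n: feature map height and width; p: its channel count;
    k: default boxes per location; num_classes: c in the paper's notation.
    """
    kernel = (3, 3, p)                 # each predictor looks at a 3x3 window over all channels
    filters = (num_classes + 4) * k    # c scores + 4 offsets for each of the k boxes
    outputs = filters * m * n          # one set of predictions per feature map location
    return kernel, filters, outputs

# Assumed example: conv4_3 in SSD300 with 21 classes (20 VOC classes + background).
print(predictor_shape(38, 38, 512, 4, 21))  # ((3, 3, 512), 100, 144400)
```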

Q: How are the default boxes set up?

A: (Fortunately, within the SSD framework, the default boxes do not necessarily need to correspond to the actual receptive fields of each layer.) We design the tiling of default boxes so that specific feature maps learn to be responsive to particular scales of objects.

The final effect: feature maps of different sizes correspond to detecting objects at different scales, and the multiple default boxes on a single feature map correspond to different aspect ratios or shapes of an object.
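The tiling can be sketched as follows, assuming the rule given in the paper: at every cell of a feature map with scale s_k, each aspect ratio a gives a box of width s_k·sqrt(a) and height s_k/sqrt(a), plus one extra square box of scale sqrt(s_k · s_{k+1}). The function name `default_boxes_for_layer` and its parameters are hypothetical.

```python
import math

def default_boxes_for_layer(fmap_size, s_k, s_k_next, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate (cx, cy, w, h) default boxes, in relative coordinates,
    for one square feature map of side length fmap_size."""
    boxes = []
    s_extra = math.sqrt(s_k * s_k_next)  # extra square box between this scale and the next
    for i in range(fmap_size):           # row index of the cell
        for j in range(fmap_size):       # column index of the cell
            cx = (j + 0.5) / fmap_size   # box center sits at the cell center
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(ar), s_k / math.sqrt(ar)))
            boxes.append((cx, cy, s_extra, s_extra))
    return boxes

# Three aspect ratios plus the extra box give 4 boxes per location.
print(len(default_boxes_for_layer(3, 0.2, 0.34)))  # 36
# Five aspect ratios plus the extra box give 6 boxes per location.
print(len(default_boxes_for_layer(5, 0.2, 0.34, (1.0, 2.0, 3.0, 0.5, 1.0 / 3.0))))  # 150
```

Changing `aspect_ratios` or the scales per layer is how the box distribution would be adapted to a particular dataset.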

In practical applications, the distribution of default boxes can be customized and tuned to fit a specific dataset and obtain better detection results.

Feel free to leave a comment to discuss more details of the paper!

Some recommended blog posts for reference:
