SSD principle and implementation
In recent years, object detection has made significant progress. Mainstream algorithms fall into two categories: (1) two-stage methods, such as the R-CNN family, whose main idea is to first generate a sparse set of candidate boxes through a heuristic method (selective search) or a CNN (RPN), and then classify and regress these candidate boxes; their advantage is high accuracy. (2) One-stage methods, such as Yolo and SSD, whose main idea is to perform uniform dense sampling at different positions of the image, possibly with different scales and aspect ratios, and then use a CNN to extract features for direct classification and regression. The whole process needs only one step, so its advantage is speed; an important disadvantage of uniform dense sampling, however, is that training is difficult, mainly because positive and negative samples are extremely unbalanced (see Focal Loss), resulting in slightly lower model accuracy. The performance of the different algorithms is shown in Figure 1, where the trade-off in accuracy and speed between the two kinds of methods can be seen.
This article introduces the SSD algorithm, whose full name is Single Shot MultiBox Detector. A nice name: "single shot" means that SSD is a one-stage method, and "MultiBox" means that SSD predicts multiple boxes. In a previous article we already covered the Yolo algorithm. As can be seen from Figure 1, SSD is better than Yolo in both accuracy and speed (except for SSD512). Figure 2 shows the basic framework of the different algorithms. Faster R-CNN first obtains candidate boxes through a CNN and then performs classification and regression, while Yolo and SSD complete detection in one step. Compared with Yolo, SSD uses convolution for direct detection, instead of detecting after fully connected layers as Yolo does. Using convolution for direct detection is only one of the differences between SSD and Yolo; there are two other important changes. First, SSD extracts feature maps of different scales for detection: large feature maps (from earlier layers) are used to detect small objects, while small feature maps (from later layers) are used to detect large objects. Second, SSD uses prior boxes (default boxes of different scales and aspect ratios, called anchors in Faster R-CNN). The weaknesses of Yolo are that it has difficulty detecting small targets and its localization is inaccurate; these important improvements allow SSD to overcome those shortcomings to a certain extent. Below we explain the principle of the SSD algorithm in detail, and finally show how to implement SSD with TensorFlow.
SSD design concept
Like Yolo, SSD uses a CNN network for detection, but with multi-scale feature maps. Its basic structure is shown in Figure 3. The core design concept of SSD is summarized in the following three points:
(1) Use multi-scale feature maps for detection
Multi-scale means using feature maps of different sizes. A CNN generally has larger feature maps in the front and then gradually reduces the feature map size with stride-2 convolution or pooling. As shown in Figure 3, both a relatively large feature map and a relatively small feature map are used for detection. The benefit is that larger feature maps can be used to detect relatively small objects, while smaller feature maps are responsible for detecting larger objects. As shown in Figure 4, the 8×8 feature map is divided into more cells, but the prior box scale of each cell is relatively small.
Figure 4 Feature maps at different scales
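The scale ladder above can be sketched in a few lines. This is not the SSD-VGG backbone itself, just an illustration of how repeated stride-2 downsampling (with "same" padding) shrinks the feature map; the starting size of 38×38 is borrowed from the common SSD300 configuration, and the exact ladder in the paper differs slightly at the smallest sizes (SSD uses an unpadded convolution at the end).

```python
# Sketch (assumption: 'same'-padded stride-2 downsampling) of how a CNN
# produces the multi-scale feature maps that SSD detects on.

def downsample(size, stride=2):
    """Spatial size after a stride-2 conv/pool with 'same' padding."""
    return (size + stride - 1) // stride

sizes = [38]  # assume the first detection feature map is 38x38
while sizes[-1] > 1:
    sizes.append(downsample(sizes[-1]))

print(sizes)  # → [38, 19, 10, 5, 3, 2, 1]; each scale targets a different object size
```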
(2) Detection using convolution
Unlike Yolo, which uses fully connected layers at the end, SSD directly uses convolution to extract detection results from the different feature maps. For a feature map with shape m × n × p, only a relatively small convolution kernel such as 3 × 3 × p is needed to obtain the detection values.
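As a shape-level sketch of such a detection head (a hypothetical NumPy implementation, not the article's TensorFlow code): a 3×3 convolution over a p-channel feature map produces, at every cell, k × (c + 4) values, i.e. for each of k prior boxes, c class confidences (background included) plus 4 location offsets.

```python
import numpy as np

# Hypothetical detection-head sketch: a 3x3xp convolution with
# k*(c+4) output channels, applied with zero padding so the output
# keeps the m x n spatial size. Weights are random; only the shapes
# matter for this illustration.

def detect_head(feature_map, num_priors, num_classes):
    m, n, p = feature_map.shape
    out_ch = num_priors * (num_classes + 4)
    w = np.random.randn(3, 3, p, out_ch) * 0.01      # 3x3xp kernels
    padded = np.pad(feature_map, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((m, n, out_ch))
    for i in range(m):
        for j in range(n):
            patch = padded[i:i + 3, j:j + 3, :]      # 3x3xp window
            out[i, j] = np.tensordot(patch, w, axes=3)
    return out

fmap = np.random.randn(8, 8, 256)   # an 8x8 feature map with 256 channels
out = detect_head(fmap, num_priors=4, num_classes=21)
print(out.shape)  # (8, 8, 100): 4 priors x (21 confidences + 4 offsets)
```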
(3) Set a prior box
In Yolo, each cell predicts multiple bounding boxes, but they are all relative to the cell itself (a square), while the shapes of real objects vary, so Yolo must learn to adapt to the objects' shapes during training. SSD borrows the anchor concept from Faster R-CNN: each cell is assigned prior boxes of different scales and aspect ratios, and the predicted bounding boxes are based on these prior boxes, which reduces the training difficulty to a certain extent. In general, each cell sets multiple prior boxes with different scales and aspect ratios. As shown in Figure 5, each cell uses 4 different prior boxes, and the cat and the dog in the figure are matched with the prior boxes that best suit their shapes. The prior-box matching principle used during training will be explained in detail later.
Figure 5 SSD prior boxes
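Prior-box generation for one cell can be sketched as follows. The scale value and aspect ratios here are assumptions chosen only for illustration (the actual per-layer scales are derived later); the pattern of 4 boxes per cell, with aspect ratios 1, 2, 1/2 plus an extra square box at a slightly larger scale, mirrors the paper's 4-box setting.

```python
import math

# Illustrative prior-box generation for cell (i, j) of one feature map.
# Boxes are (cx, cy, w, h), normalized to [0, 1]; every box of aspect
# ratio ar keeps the same area scale^2 since w*h = scale^2.

def priors_for_cell(i, j, fmap_size, scale, aspect_ratios):
    cx = (j + 0.5) / fmap_size        # box center sits at the cell center
    cy = (i + 0.5) / fmap_size
    boxes = []
    for ar in aspect_ratios:
        boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

# 4 priors per cell on an 8x8 map (scale 0.2 is an assumed value):
boxes = priors_for_cell(3, 3, fmap_size=8, scale=0.2,
                        aspect_ratios=[1.0, 2.0, 0.5])
boxes.append((boxes[0][0], boxes[0][1], 0.27, 0.27))  # extra larger square box
print(len(boxes))  # → 4
```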
The detection values of SSD are also different from Yolo's. For each prior box of each cell, SSD outputs an independent set of detection values corresponding to one bounding box, divided into two parts. The first part is the confidence or score for each category. It is worth noting that SSD treats the background as a special category: if the detection targets have c categories, SSD actually needs to predict c + 1 confidence values, where the first confidence is the score for containing no object, i.e. belonging to the background. Later, when we speak of c category confidences, remember that they include the special background category, that is, there are only c − 1 real detection categories. In prediction, the category with the highest confidence is the category the bounding box belongs to; in particular, when the first confidence value is the highest, the bounding box contains no object. The second part is the location of the bounding box, which contains 4 values (cx, cy, w, h), representing the center coordinates and the width and height of the bounding box. However, the real predicted values are only the transforms of the bounding box relative to the prior box (the paper calls them offsets, but transform seems more suitable; see R-CNN). Let the position of the prior box be d = (d^cx, d^cy, d^w, d^h) and the corresponding bounding box be b = (b^cx, b^cy, b^w, b^h); then the predicted value l of the bounding box is actually the transform of b relative to d:
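This R-CNN-style transform and its inverse can be sketched in plain Python (an illustration with made-up box values; the variable names follow the d, b, l notation above):

```python
import math

# Encode: the transform l of bounding box b relative to prior box d,
# both given as (cx, cy, w, h). Centers are normalized by the prior's
# size; widths/heights are log-ratios.

def encode(b, d):
    return ((b[0] - d[0]) / d[2],
            (b[1] - d[1]) / d[3],
            math.log(b[2] / d[2]),
            math.log(b[3] / d[3]))

def decode(l, d):
    # Inverse transform: recover the bounding box from the prediction.
    return (d[0] + l[0] * d[2],
            d[1] + l[1] * d[3],
            d[2] * math.exp(l[2]),
            d[3] * math.exp(l[3]))

prior = (0.5, 0.5, 0.2, 0.2)          # assumed prior box
gt = (0.55, 0.48, 0.25, 0.18)         # assumed ground-truth box
enc = encode(gt, prior)
dec = decode(enc, prior)
print(all(abs(a - b) < 1e-9 for a, b in zip(dec, gt)))  # round-trips within tolerance
```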