2. If a person is carrying an object, mark the bounding-box to include the carried object as long as it doesn't affect the silhouette of the person, i.e. carried objects that do not alter the silhouette of the pedestrian significantly. For example, exclude a rolling bag if they are pulling it behind them and it is distinctly visible as a separate object.

3. Occlusion: Partially occluded objects that do not belong to the person class and are approximately 60% or more visible are marked with a bounding box around the visible part of the object. These objects are marked as partially occluded. Objects under 60% visibility are not annotated.

4. Occlusion for person class: If an occluded person's head and shoulders are visible and the visible height is approximately 20% or more, then these objects are marked by a bounding box around the visible part of the person object. If the head and shoulders are not visible, please follow the Occlusion guidelines in item 3 above.

5. Truncation: An object other than a person that is at the edge of the frame and 60% or more visible is marked with the truncation flag for the object.

6. Truncation for person class: If a truncated person's head and shoulders are visible and the visible height is approximately 20% or more, mark the bounding box around the visible part of the person object. If the head and shoulders are not visible, please follow the Truncation guidelines in item 5 above.

Each frame is not required to have an object.

The inference performance of the PeopleNet v2.6 model was measured against more than 90,000 proprietary images across a variety of environments. The frames are high-resolution images (1920x1080 pixels) resized to 960x544 pixels before being passed to the PeopleNet detection model. True positives, false positives, and false negatives are calculated using an intersection-over-union (IOU) criterion greater than 0.5.
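The accuracy bookkeeping described above can be sketched as a greedy IOU matching between predicted and ground-truth boxes. The 0.5 threshold comes from the text; the matching strategy and the (x1, y1, x2, y2) box format are illustrative assumptions, not the actual evaluation harness.

```python
# Greedy matching of predicted boxes to ground-truth boxes at IOU > 0.5.
# Boxes are (x1, y1, x2, y2) corner tuples; matching order and box format
# are illustrative assumptions.

def iou(a, b):
    # Intersection-over-union of two corner-format boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_tp_fp_fn(predictions, ground_truth, threshold=0.5):
    # A prediction is a true positive if it matches an unmatched
    # ground-truth box with IOU above the threshold; otherwise it is a
    # false positive. Leftover ground-truth boxes are false negatives.
    unmatched_gt = list(ground_truth)
    tp = fp = 0
    for pred in predictions:
        best = max(unmatched_gt, key=lambda g: iou(pred, g), default=None)
        if best is not None and iou(pred, best) > threshold:
            tp += 1
            unmatched_gt.remove(best)
        else:
            fp += 1
    fn = len(unmatched_gt)
    return tp, fp, fn
```

For example, a prediction overlapping a ground-truth person well above 0.5 IOU counts as a true positive, a stray prediction counts as a false positive, and a missed person counts as a false negative.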
This model was trained using the DetectNet_v2 entrypoint in TAO. The training algorithm optimizes the network to minimize the localization and confidence loss for the objects. The training is carried out in two phases. In the first phase, the network is trained without regularization. In the second phase, the network is retrained using quantization-aware training (QAT).

The PeopleNet v2.6 model was trained on a proprietary dataset with more than 7.6 million images and more than 71 million objects for the person class. The training dataset consists of a mix of camera heights, crowd densities, and fields of view (FOV). Approximately half of the training data consisted of images captured in an indoor office environment. For this case, the camera is typically set up at approximately 10 feet height and a 45-degree angle, and has a close field of view. This content was chosen to improve accuracy of the models for the convenience-store retail analytics use-case. We have also added approximately 500 thousand images of low-density scenes with people extending their hands and feet, to improve performance for use-cases where person detection is followed by pose estimation. This dataset included about 200k "Low Contrast" images, where the people and their clothing blend into the background.

Training Data Ground-truth Labeling Guidelines

The training dataset is created by labeling ground-truth bounding-boxes and categories by human labellers. The following guidelines were used while labelling the training data for the NVIDIA PeopleNet model. If you are looking to re-train with your own dataset, please follow the guidelines below for highest accuracy.

1. All objects that fall under one of the three classes (person, face, bag) in the image and are larger than the smallest bounding-box limit for the corresponding class (height >= 10px OR width >= 10px) are labeled with the appropriate class label.
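The minimum-size rule above can be expressed as a simple predicate. The function name and box representation here are illustrative, not part of NVIDIA's labeling tooling.

```python
# Sketch of the smallest-bounding-box rule from the labeling guidelines:
# an object is labeled only if its box is at least 10 px wide OR 10 px tall.
MIN_SIDE_PX = 10

def should_label(width_px, height_px):
    # The guideline uses OR, so a narrow 4x12 px box still qualifies.
    return width_px >= MIN_SIDE_PX or height_px >= MIN_SIDE_PX
```

Note that because the guideline is an OR, only boxes smaller than 10 px in both dimensions are skipped.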
The models described in this card detect one or more physical objects from three categories within an image and return a box around each object, as well as a category label for each object. The three categories of objects detected by these models are persons, bags, and faces.

This model is based on the NVIDIA DetectNet_v2 detector with ResNet34 as the feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid over the input image. The GridBox system divides an input image into a grid and predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class for each grid cell. The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm, such as DBSCAN or NMS, to produce the final bounding-box coordinates and category labels.
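The post-processing step described above can be sketched with a generic greedy NMS; this is not NVIDIA's actual TAO post-processor, and the detection format and thresholds below are illustrative assumptions.

```python
# Minimal greedy NMS sketch for clustering raw center-format detections.
# Each detection is (xc, yc, w, h, confidence) in pixels; the 0.5 IOU
# threshold and all values are illustrative, not PeopleNet defaults.

def to_corners(d):
    xc, yc, w, h = d[:4]
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

def iou(a, b):
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    # Keep the highest-confidence box, drop boxes overlapping it, repeat.
    remaining = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best, d) < iou_threshold]
    return kept
```

For example, two heavily overlapping detections of the same person collapse to the single higher-confidence box, while a distant detection survives untouched.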