- Time: 16:10 - 16:45
- Attendees: Bob Zhang (Supervisor), Huang Yanzhen, Mai Jiajun
We introduced a new process that filters out pedestrians whose posture detection results are of low quality. These pedestrians are ignored by the classifier, freeing up performance for the detectable individuals. Currently, we have two types of undetectable pedestrians:
- Back (discussed in previous meetings): The pedestrian is showing their back to the camera.
  - Criteria: The backside ratio is smaller than -0.2 and the confidences of the two shoulder landmarks are both higher than 0.3 (a sketch of this check follows the list).
  - Backside ratio: $\frac{l\_shoulder_x - r\_shoulder_x}{bbox_w}$
- Out of frame (new): Most of the landmarks of this pedestrian are out of frame.
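To make the back-side criterion concrete, here is a minimal sketch in Python; the `(x, y, confidence)` landmark format and the function name are illustrative assumptions, not the project's actual data structures:

```python
# Back-side check: ratio below -0.2 with both shoulders confidently detected.
# Landmarks are assumed to be (x, y, confidence) tuples in pixel coordinates.

def is_back_side(l_shoulder, r_shoulder, bbox_w):
    lx, _, l_conf = l_shoulder
    rx, _, r_conf = r_shoulder
    backside_ratio = (lx - rx) / bbox_w
    return backside_ratio < -0.2 and l_conf > 0.3 and r_conf > 0.3
```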
We use the binary digits of an integer to denote such filtering states for extensibility.
| Filtering State | Base-10 Integer | Binary Integer |
|---|---|---|
| OK_CLASSIFY | 0 | 0000, 0000 |
| OUT_OF_FRAME | 1 | 0000, 0001 |
| BACK_SIDE | 2 | 0000, 0010 |
| Possible future filtering state 1 | 4 | 0000, 0100 |
| Possible future filtering state 2 | 8 | 0000, 1000 |
By AND-ing the classification state with the filtering state, we can efficiently determine whether to classify a person: perform classification if and only if the AND result is 0.
P.S. `OUT_OF_FRAME` has higher priority than `BACK_SIDE`. That means, if both conditions are fulfilled, the current mechanism reports only `OUT_OF_FRAME`. This saves performance by skipping determinations whose results would not be used.
The Python packages `ultralytics` and `roboflow` are used. Roboflow's online workspace is used to manage the training data for fine-tuning a pre-trained YOLOv11 model to detect phones specifically. (Training is done locally; we added a `step03_yolo_phone_detection` directory in the `yolo_dev` branch of `posture_mmpose`.)
The data currently come from Google and Baidu. The original data consists of about 200 pictures; around 600 individual training samples were obtained after various image augmentations.
The performance of the model is still under optimization.
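For reference, a rough sketch of the local fine-tuning step using the standard `ultralytics` API; the dataset YAML path, checkpoint, and hyperparameters are placeholders, not the actual values used in `step03_yolo_phone_detection`:

```python
from ultralytics import YOLO

# Start from a pre-trained YOLOv11 checkpoint and fine-tune it on the
# phone dataset exported from the Roboflow workspace (path is assumed).
model = YOLO("yolo11n.pt")
model.train(data="phone_dataset/data.yaml", epochs=100, imgsz=640)

# Inference on a cropped hand sub-frame returns candidate phone boxes.
results = model.predict("hand_crop.jpg", conf=0.25)
```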
We expose the interface for the YOLOv11 model by cropping sub-pictures of a person's hands and feeding them into the model as inputs.
The center of the sub-frame is determined by:
- the landmark of the person's wrist, and
- the landmark of the person's elbow:

$lm_{center} = lm_{l\_wrist} + (lm_{l\_wrist} - lm_{l\_elbow}) \times 0.5$

That is, the center is obtained by offsetting the wrist landmark by half of the vector pointing from the pedestrian's elbow to their wrist. The width and height of the sub-frame are determined by the width and height of the bounding box:

$hand_w = \frac{bbox_w}{2}$, $hand_h = \frac{bbox_h}{2}$
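A sketch of this cropping step under the formulas above; landmark arrays are assumed to be (x, y) pixel coordinates, and the names are illustrative rather than the actual `posture_mmpose` identifiers:

```python
import numpy as np

def hand_subframe(frame, lm_wrist, lm_elbow, bbox_w, bbox_h):
    wrist = np.asarray(lm_wrist, dtype=float)
    elbow = np.asarray(lm_elbow, dtype=float)
    # Offset the wrist by half of the elbow-to-wrist vector.
    center = wrist + (wrist - elbow) * 0.5
    w, h = bbox_w / 2, bbox_h / 2  # sub-frame is half the bbox size
    # Clamp the crop to the frame boundaries.
    x0 = int(max(center[0] - w / 2, 0))
    y0 = int(max(center[1] - h / 2, 0))
    x1 = int(min(center[0] + w / 2, frame.shape[1]))
    y1 = int(min(center[1] + h / 2, frame.shape[0]))
    return frame[y0:y1, x0:x1]
```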
We added frame-rate monitoring; the model runs at around 25 FPS on a single person without YOLO, and around 10 FPS with YOLO.
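The monitoring itself can be as simple as smoothing per-frame wall time; a minimal sketch follows (an exponential moving average is one common choice, not necessarily the project's exact mechanism):

```python
import time

class FPSMeter:
    """Smoothed frames-per-second counter; call tick() once per frame."""

    def __init__(self, smoothing: float = 0.9):
        self.smoothing = smoothing
        self._last = None
        self.fps = 0.0

    def tick(self) -> float:
        now = time.perf_counter()
        if self._last is not None:
            inst = 1.0 / (now - self._last)
            self.fps = self.smoothing * self.fps + (1 - self.smoothing) * inst
        self._last = now
        return self.fps
```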
We added an additional dataset for conditions where pedestrians wear thick clothes. This was done because we discovered that the posture detection model performs weakly on pedestrians in thick clothes; inspecting the MMPose inference results, we found they are quite different for people in thick clothes. Therefore, we recorded a full set of data with the recorded subject wearing thick clothes.
| Posture Estimation Model | Equipment (GPU) | Data Num | Epoch Num | LR |
|---|---|---|---|---|
| RTMPose Medium | RTX-4060 Laptop | 8221 +, 5100 - | Early stopped at 211 | 5e-6 |
The model recovered its performance after the additional data was given.