- Time: 13:20 - 13:50
- Attendees: Bob Zhang (Supervisor), Mai Jiajun, Huang Yanzhen
- We discovered the exact configuration needed to run MMPose on CUDA.
- Introduced the new repo `posture_mmpose`, which is the MMPose version of the old `posture` repo.
  - The required environments and configuration steps are listed in the README.md of the `posture_mmpose` repository.
- Lower processing time per frame compared to MediaPipe.
- MMPose uses configuration files to determine which key landmarks to recognize for each person.
- Originally, we used the 133-point config, which recognizes all possible landmarks, including each person's facial details.
- However, its real-time performance was poor, with a high processing time per frame under stress testing. Moreover, facial detection performed worse than body detection.
- Therefore, we dropped facial detection, leaving only the 17-point body detection.
- MMPose comes with built-in object detection, so we no longer need a dedicated YOLOv5s model for image cropping.
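A minimal sketch of this workflow, assuming MMPose 1.x's `MMPoseInferencer`; the `'human'` model alias and the image path are illustrative placeholders, not the repo's actual code:

```python
from mmpose.apis import MMPoseInferencer

# The 'human' alias bundles a person detector with a 17-keypoint COCO
# body model, so no separate YOLOv5s cropping stage is needed.
inferencer = MMPoseInferencer('human')

# Inference yields one result per image; 'frame.jpg' is a placeholder.
result = next(inferencer('frame.jpg'))

# Keypoints and confidence scores of the first detected person.
person = result['predictions'][0][0]
keypoints, scores = person['keypoints'], person['keypoint_scores']
```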
- The original `posture` repo will be archived as a backup.
- Angle score: the geometric mean of the scores of the three keypoints that form the angle:
  $\text{angle\_score} = (score_1 \times score_2 \times score_3)^{\frac{1}{3}}$
- Backup calculation plan: harmonic mean (harder for computers to compute, lower accuracy).
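As a quick illustration, the angle score and the harmonic-mean backup can be computed in a few lines of NumPy; the placeholder values in `scores` stand in for real keypoint confidences from the pose model:

```python
import numpy as np

# Confidence scores of the three keypoints that define one angle
# (placeholder values; real ones come from the pose estimator).
scores = np.array([0.92, 0.85, 0.78])

# Geometric mean, as in the formula above.
angle_score = scores.prod() ** (1 / 3)

# Harmonic-mean backup plan, for comparison.
harmonic_score = 3 / np.sum(1.0 / scores)
```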
- Input: each person is flattened, giving a [num_people x (2 x num_angles)] matrix stored in `.npy` file format.
  - Each person: $9$ key angles, each paired with its angle score, flattened into an $18$-d vector.
  - The input matrix is of shape $(\text{num\_test} \times 18)$.
  - Normalization (a sketch follows below):
    - Angle values are normalized into $[0, 1]$ by dividing by $180$.
    - The entire input matrix is normalized using the Z-score approach, i.e., to a normal distribution with $\mu=0$ and $\sigma=1$.
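A minimal sketch of this preprocessing, assuming NumPy; the file name `samples.npy` and the column layout (angles in even columns, scores in odd columns) are assumptions for illustration:

```python
import numpy as np

# Load the flattened per-person matrix, shape (num_test, 18).
X = np.load("samples.npy")

# Step 1: scale raw angle values (degrees) into [0, 1].
# Assumed layout: even columns are angles, odd columns are scores.
X[:, 0::2] /= 180.0

# Step 2: Z-score the matrix so each feature has mu = 0, sigma = 1.
X = (X - X.mean(axis=0)) / X.std(axis=0)
```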
- Output: 2 classes
- Structure:
  - 4 fully-connected layers
  - ReLU as the activation function
- Training parameters:
  - input size = $18$, which is the dimension of the data samples
  - hidden size = $100$
  - learning rate = $0.01$
  - number of epochs = $100$
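A minimal sketch of such a network, assuming PyTorch; the class name, the SGD optimizer, and the exact arrangement of the four layers are illustrative assumptions beyond what the notes specify:

```python
import torch.nn as nn
import torch.optim as optim

class PostureMLP(nn.Module):
    """4 fully-connected layers with ReLU activations (hypothetical layout)."""
    def __init__(self, input_size=18, hidden_size=100, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, num_classes),   # 2 output classes
        )

    def forward(self, x):
        return self.net(x)

model = PostureMLP()
optimizer = optim.SGD(model.parameters(), lr=0.01)  # learning rate = 0.01
criterion = nn.CrossEntropyLoss()                   # 2-class classification
```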
- We introduced how to use several target angles, together with their corresponding confidence levels calculated from site information, as input for training the deep learning model.
- Loss does not drop below $0.5$ from around the $160^{th}$ epoch onward.
- Training accuracy is around $0.61$ and test accuracy is $0.16$: there are too few data samples.
  - Add more samples by capturing videos of ourselves.
  - Convert the videos to images, and crop the person out of each frame to build the dataset (a sketch follows below).
  - The videos should cover all camera visual perspectives.
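A minimal sketch of the video-to-frames step, assuming OpenCV; the function name, sampling stride, and paths are placeholders, and the person-cropping step (e.g., using the detector's bounding boxes) is left out:

```python
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, every_n: int = 5) -> int:
    """Save every n-th frame of a video as a JPEG (hypothetical helper)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

extract_frames("ourselves.mp4", "dataset/raw_frames")
```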
- Potential need for multi-class classification:
  - "Calling"
  - "Using"
  - "Not using"
  - etc.
- Added a Pause function to the frontend.
- Significantly extend the training data by using videos. The goal is up to 2000 samples; the more the better.
- Explore more of the internal functionality of the MLP (multi-layer perceptron).