Information

  • Time: 13:20 - 13:50
  • Attendees: Bob Zhang (Supervisor), Mai Jiajun, Huang Yanzhen

Discussion Summary

Config for MMPose

  • We identified the exact configuration for MMPose to run on CUDA.
  • Introduced a new repository, posture_mmpose, which is the MMPose version of the old posture repo.
  • The required environment and configuration steps are listed in the README.md of the posture_mmpose repository; a minimal inference sketch is shown below.
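As a reference, here is a minimal sketch of running MMPose inference on CUDA via the high-level MMPoseInferencer API. The 'human' model alias and the image path are placeholders for illustration; the exact config we use is the one documented in the posture_mmpose README.md.

```python
# Minimal sketch: MMPose inference on CUDA with the high-level inferencer.
# The 'human' alias (a 17-keypoint body model) and the image path are
# placeholders; the exact config is documented in posture_mmpose/README.md.
import torch
from mmpose.apis import MMPoseInferencer

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
inferencer = MMPoseInferencer(pose2d='human', device=device)

# Inference returns a generator; take the result for one frame.
result = next(inferencer('path/to/frame.jpg', show=False))
print(result['predictions'])
```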

Performance of MMPose

  • Lower processing time per frame compared to MediaPipe.
    • MMPose uses configuration files to determine which key landmarks to recognize for each person.
    • Originally, we used the 133-point config, recognizing all possible landmarks, including the facial details of each person.
    • However, its real-time performance was poor, with a high processing time per frame under stress testing. Moreover, facial detection does not perform as well as body detection.
    • Therefore, we omitted the facial detection part, leaving only the 17-point body detection (see the timing sketch after this list).
  • MMPose comes with built-in object detection, so we no longer need a dedicated YOLOv5s model to perform image cropping.
  • The original posture repository will be archived as a backup.
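To illustrate the comparison that motivated this choice, below is a rough per-frame timing sketch of the 133-keypoint wholebody config versus the 17-keypoint body config. The 'wholebody'/'human' aliases and the test frame path are placeholders; actual numbers depend on the hardware and the exact configs in posture_mmpose.

```python
# Rough timing sketch: 133-keypoint wholebody config vs. 17-keypoint body
# config. Aliases and the test frame path are placeholders.
import time
from mmpose.apis import MMPoseInferencer

def time_per_frame(alias: str, frame: str, n_runs: int = 20) -> float:
    """Average seconds per frame for a given MMPose model alias."""
    inferencer = MMPoseInferencer(pose2d=alias, device='cuda:0')
    start = time.perf_counter()
    for _ in range(n_runs):
        next(inferencer(frame, show=False))  # consume one result per call
    return (time.perf_counter() - start) / n_runs

frame = 'path/to/stress_test_frame.jpg'
print('wholebody (133 pts):', time_per_frame('wholebody', frame))
print('body (17 pts):', time_per_frame('human', frame))
```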

Image Annotation

Recall from the first meeting: Scores of angles

  • Geometric mean of the scores of the three keypoints that form the angle:
  • $\text{angle\_score} = (\text{score}_1 \times \text{score}_2 \times \text{score}_3)^{\frac{1}{3}}$
  • Backup calculation plan: harmonic mean (harder for computers to compute, lower accuracy); see the sketch below.
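A small sketch of the angle-score computation described above, with the geometric mean as the main formula and the harmonic mean as the backup. The example scores are made up for illustration.

```python
def angle_score_geometric(s1: float, s2: float, s3: float) -> float:
    """Geometric mean of the three keypoint scores forming the angle."""
    return (s1 * s2 * s3) ** (1.0 / 3.0)

def angle_score_harmonic(s1: float, s2: float, s3: float) -> float:
    """Backup plan: harmonic mean of the three keypoint scores."""
    return 3.0 / (1.0 / s1 + 1.0 / s2 + 1.0 / s3)

# Example: confidence scores of the three keypoints defining one angle.
print(angle_score_geometric(0.9, 0.8, 0.7))  # ~0.796
print(angle_score_harmonic(0.9, 0.8, 0.7))   # ~0.792
```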

Neural Network

Brief

  • Input: each person flattened into a row of a [num_people x (2 x num_angles)] matrix stored in .npy file format.
    • Each person:
      • $9$ key angles, each paired with its angle score,
      • flattened into an $18$-dimensional vector.
    • The input matrix is of shape $(\text{num\_test} \times 18)$.
    • Normalization:
      • Angle values normalized into $[0,1]$ by dividing by $180$.
      • The entire input matrix normalized with the z-score approach, i.e., rescaled to $\mu=0$ and $\sigma=1$.
  • Output: 2 classes
  • Structure:
    • 4 fully-connected layers
    • use ReLU as activation function
  • Training Parameters:
    • input size = $18$, the dimensionality of each data sample.
    • hidden size = $100$
    • learning rate = $0.01$
    • number of epochs = $100$
  • We introduced how several target angles, together with their corresponding confidence levels computed from the site information, are used as input for training the deep learning model (a minimal training sketch follows).
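Below is a minimal PyTorch sketch matching the structure and training parameters listed above (4 fully-connected layers, ReLU, input size 18, hidden size 100, 2 output classes, learning rate 0.01, 100 epochs). The widths of the intermediate layers, the SGD optimizer, the column layout of the .npy file, and the file names are assumptions for illustration.

```python
# Minimal sketch of the MLP described above. The intermediate layer widths,
# the SGD optimizer, the column layout (first 9 columns assumed to be the
# angles), and the .npy file names are assumptions for illustration.
import numpy as np
import torch
import torch.nn as nn

INPUT_SIZE, HIDDEN_SIZE, NUM_CLASSES = 18, 100, 2
LEARNING_RATE, NUM_EPOCHS = 0.01, 100

class PostureMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # 4 fully-connected layers with ReLU activations.
        self.net = nn.Sequential(
            nn.Linear(INPUT_SIZE, HIDDEN_SIZE), nn.ReLU(),
            nn.Linear(HIDDEN_SIZE, HIDDEN_SIZE), nn.ReLU(),
            nn.Linear(HIDDEN_SIZE, HIDDEN_SIZE), nn.ReLU(),
            nn.Linear(HIDDEN_SIZE, NUM_CLASSES),
        )

    def forward(self, x):
        return self.net(x)

# Load the flattened (num_samples x 18) matrix of angles and angle scores.
x = np.load('train.npy').astype(np.float32)
x[:, :9] /= 180.0                                  # angles -> [0, 1]
x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)  # z-score normalization
x = torch.from_numpy(x)
y = torch.from_numpy(np.load('train_labels.npy')).long()

model = PostureMLP()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)

for epoch in range(NUM_EPOCHS):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```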

Issues

  • The loss does not drop below $0.5$ from around the $160^{th}$ epoch onward.

  • Training accuracy is around $0.61$ while test accuracy is only $0.16$; the data samples are too few.

    • Add more samples by capturing videos of ourselves.
    • Convert the videos to images and crop each person out to build the dataset (see the frame-extraction sketch after this list).
    • The videos should cover all camera visual perspectives.
  • Potential need for multi-pattern classification

    • "Calling"
    • "Using"
    • "Not using"
    • etc.
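A small OpenCV sketch of the planned data-collection step: sampling frames from our own videos and cropping each person out. File paths, the sampling stride, and the bounding box are placeholders; in practice the crop boxes would come from MMPose's built-in detector.

```python
# Sketch of the planned dataset-expansion step. Paths, stride, and the
# bounding box below are placeholders for illustration.
import cv2

def extract_frames(video_path: str, out_dir: str, stride: int = 10) -> int:
    """Save every `stride`-th frame of the video as a JPEG image."""
    cap = cv2.VideoCapture(video_path)
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            cv2.imwrite(f'{out_dir}/frame_{saved:05d}.jpg', frame)
            saved += 1
        index += 1
    cap.release()
    return saved

def crop_person(frame, box):
    """Crop one person given an (x1, y1, x2, y2) detector box."""
    x1, y1, x2, y2 = map(int, box)
    return frame[y1:y2, x1:x2]

print(extract_frames('path/to/self_recorded.mp4', 'dataset/raw_frames'))
```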

Web Streaming

  • Added a Pause function to the frontend.

Agenda for the next meeting

  • Significantly extend the training data by using videos; the goal is up to 2000 samples, the more the better.
  • Explore more internal functionality of the MLP (multi-layer perceptron).