Skip to content

Latest commit

 

History

History
29 lines (21 loc) · 2.34 KB

20250226.md

File metadata and controls

29 lines (21 loc) · 2.34 KB

Information

  • Time: 13:30 - 14:00
  • Attendees: Bob Zhang (Supervisor), Huang Yanzhen, Mai Jiajun

Discussion Summary

Revision State

By now, the entire project is near a rough complete. We have successfully trained our own CNN and YOLO11 to realize real-time posture recognition and object detection to detect pedestrian cell phones.

The project had progressed into its revision state, where we're looking back to our previous steps and refine our work. At the mean time, some excessive progress is made to push the project into a combination of richer knowledge and better application.

1. Revising Posture Recognition CNN

Currently, a posture is represented by a 4 dimensional structure. To be specific, each data point has 2 channels, each channel have 3-dimensions. The two channels have separate syntactic meaning: The first channel is the target angle values and the second channel is the target angle scores.

  • Current Z-normalization considers both channels at once. That is, the mean and std calculated is based on all data in both channels. In the future, we'll instead perform z-normalization on a per-channel manner, where each channel have its own mean and std.
  • Along with a new Z-normalization method, we'll add batch norm layers into the network to further normalize intermediate inferences.
  • A combined criterion is planned to be applied with weights to further improve the model's robustness and generalization ability.

2. Revising YOLO11

We've gathered over 3000 images of cell phones and backgrounds (null samples), which would be increased to over 10,000 after augmentation in Roboflow. The YOLO model had gained evident ability to recognize cell phones on one's hand that's better than the pre-trained one. We're currently gathering data on-the-go to find dead-ends and some uncovered situations.

Further Progress

As an excessive progress, it is attempted to write a backend connected with a database that compares human faces is being constructed. Tech stack:

  • FastAPI: For handling API requests.
  • PostgreSQL: To store user data and face data (Work in progress)

Currently, simple CRUD of user authentication and permission system is done. This progress is less prioritized than the two above, and its progress is possible to be blocked by them.

Next Meeting's Agenda

  • Demonstrate our revision on the enhanced posture recognition model