Skip to content

Latest commit

 

History

History
84 lines (59 loc) · 5.07 KB

20241101.md

File metadata and controls

84 lines (59 loc) · 5.07 KB

Information

  • Time: $13:20 - 14:00$
  • Attendees: Bob Zhang (Supervisor), Huang Yanzhen, Mai Jiajun

Discussion Summary

1. Final Dataset Supplementation

Cover boundary data. Supplement the dataset on some actions that were not observed in data collection.

2. Adjust Features

Previously, we use $C_{13}^{3}$ to produce angle combinations. We mistakenly ignored that the order of the angle points are important. The previous $286$ angles are not robust enough, as the points with a larger index tends to be the angle point of the three points in more combinations.

We use a new method to generate target datasets:

  • First, list all the $13$ corner points into set $A$.
  • Then, generate all the possible combinations of two edge points from the $13$ points, into set $E\subset A\times A$. ($C_{13}^2$)
  • After that, product the corner points into the edge points combinations: $A'=E\times A$
  • Lastly, remove angles that has a point in the edge points that is the same to the corner point.

Totally, we have $C_{13}^{1}C_{12}^{2}=858$ angles. Adding the scores, it will be $858\times 2=1716$ dimensions in the feature dataset.

Previously, we only have $286$ angles, yielding a $572$ dimensional data sample. Now, the initial convolutional layers grows from $2$ to $6$, with each channel having $286$ data, still.

2. Trade-Off Factor in Application

We introduced a theory of "Loss Function" from CISC3024 Pattern Recognition. The idea is that, different selection errors may have differently significant consequences, i.e., "losses" or "costs".

  • For instance, if we mis-classify a pedestrian who is not using a phone into "using", the utmost cost we could have is to get sued by him/her.
  • However, if we mis-classify a pedestrian who is using a phone, it is possible that he/she will get hit by a car and die, which is a way larger cost.

Therefore, to cover this issue, we introduced a bias factor during application. It is an adjustable factor that allows the user (i.e. the pathway chief monitor, or whatever you'd like to call it ;) ) to adjust how "casually" the model considers a person as "using".

This does not require to re-train the model, or have the model trained multiple times. We only work on the output of the model. As discussed before, the model has two outputs: The un-softmaxed 2-dimensional output.

$$ \mathbf{y}=\begin{pmatrix}y_{0} \ y_{1}\end{pmatrix}=f(\mathbf{x}), \ \mathbf{x}\in\mathbb{R}^{3}\times\mathbb{R}^1\times\mathbb{R}^{286}, \ \mathbf{y}\in\mathbb{R}^2 $$ The output will be then sigmoid-ed to suppress it into a possibility-like score in range $[0,1]$. $$ \mathbf{y}'=\begin{pmatrix}y_0'\y_1'\end{pmatrix}=\text{sigmoid}(\mathbf{y})\in\mathbb{R}^2 $$ The final output will be: $$ y^*=\begin{cases} \text{argmax}_{i\in{0,1}} \ y_i & y_1 < \theta \ y_1 & \text{otherwise} \end{cases} $$ where $\theta$ is the pre-defined threshold.

Moreover, this bias also worked as a work-around of the lack of data variance in the training data.

3. Performance Inspection

Posture Estimation Model Equipment (GPU) Data Num Epoch Num LR
RTMPose Medium RTX-4060 Laptop 4118 +, 3858 - 650 5e-6

3.1 Training Performance

Under a larger data sample dimension, the overfitting point moved from around $950$ epochs to $650$ epochs. More useful information is discovered!

We can see that the gradient at the end of the training fluctuates, this means that the learning rate is quite large that it's jumping around a local minimum. However, it is still tolerable.

3.2 Testing Performance

Again, the model is well-fitted in the given data samples, yet it is overfitted in the actual posts in the real-world.

3.3 Inspection Performance

3.3.1 Regular

To begin with, this model performs well in the regular term.

We conclude that the convolution is useful, as it lets the model to learn the dependencies among different features. For an evident instance, the model had learned the dependencies between "lowering head" and "raising up hand."

https://drive.google.com/drive/u/0/folders/1oUhbQTxlxmzz8J78fJUmWMiYGS3mtNIR

Agenda for the next meeting

  1. Victor Mai Jiajun gave a new structure of inputs, and try to train and test it.
  2. get ready to train a phone-detector model with interface of YOLOv8 and dive into the next phase of the whole project.
  3. cover more videos that not yet been fitted in model in the 1st phase.