- Time: 13:20 - 14:00
- Attendees: Bob Zhang (Supervisor), Huang Yanzhen, Mai Jiajun
Cover boundary data: supplement the dataset with actions that were not observed during data collection.
Previously, we used
We now use a new method to generate target datasets:
- First, list all the $13$ corner points into set $A$.
- Then, generate all possible combinations of two edge points from the $13$ points into set $E \subset A \times A$ ($C_{13}^{2}$ combinations).
- After that, take the product of the edge-point combinations with the corner points: $A' = E \times A$.
- Lastly, remove angles where either edge point is the same as the corner point (a sketch of this procedure follows the list).
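Below is a minimal sketch of this generation procedure, assuming the corner points can be represented by indices 0 to 12; the variable names are illustrative only, not the project's actual code.

```python
from itertools import combinations

# The 13 corner points, represented here by indices (illustrative only).
points = list(range(13))

# E: all C(13, 2) = 78 unordered pairs of edge points.
edge_pairs = list(combinations(points, 2))

# A' = E x A: pair every edge combination with every corner point,
# then drop angles whose corner coincides with either edge point.
angles = [
    (corner, (p, q))
    for (p, q) in edge_pairs
    for corner in points
    if corner not in (p, q)
]

print(len(edge_pairs))  # 78
print(len(angles))      # 78 * 11 = 858
```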
In total, this yields $C_{13}^{2} \times (13 - 2) = 78 \times 11 = 858$ angles.
Previously, we only had
We introduced the theory of the "Loss Function" from CISC3024 Pattern Recognition. The idea is that different classification errors may have consequences of very different significance, i.e., different "losses" or "costs".
- For instance, if we mis-classify a pedestrian who is not using a phone as "using", the worst cost we could incur is being sued by him/her.
- However, if we mis-classify a pedestrian who is using a phone as "not using", he/she may get hit by a car and die, which is a far larger cost (a small numeric sketch follows).
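To make this concrete, here is a small sketch of a minimum-expected-cost decision rule; the cost values below are made-up numbers for illustration, not quantities from the project.

```python
import numpy as np

# Illustrative cost matrix: rows = true class, cols = predicted class,
# with 0 = "not using" and 1 = "using". The values are assumptions.
cost = np.array([
    [0.0, 1.0],   # true "not using": a false alarm costs little
    [10.0, 0.0],  # true "using": a miss risks a life, far more costly
])

def min_risk_decision(p_using):
    """Pick the prediction with the lower expected cost given P(using)."""
    p = np.array([1.0 - p_using, p_using])  # class probabilities
    expected = p @ cost                     # expected cost per prediction
    return int(np.argmin(expected))

# With asymmetric costs, the decision threshold drops well below 0.5:
print(min_risk_decision(0.2))  # 1 -> flag "using" even at 20% confidence
```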
Therefore, to address this issue, we introduced a bias factor at application time. It is an adjustable factor that allows the user (i.e., the pathway chief monitor, or whatever you'd like to call it ;) ) to adjust how readily the model classifies a person as "using".
This does not require re-training the model or training multiple model variants; we only operate on the model's output. As discussed before, the model's output has two components, given as an un-softmaxed 2-dimensional vector:
$$
\mathbf{y}=\begin{pmatrix}y_{0}\\y_{1}\end{pmatrix}=f(\mathbf{x}),\quad \mathbf{x}\in\mathbb{R}^{3}\times\mathbb{R}^{1}\times\mathbb{R}^{286},\quad \mathbf{y}\in\mathbb{R}^{2}
$$
The output is then passed through a sigmoid to squash it into a probability-like score in the range $[0, 1]$.
Moreover, this bias also works as a workaround for the lack of variance in the training data.
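As a rough sketch of how such a bias could act on the un-softmaxed output (the function name, the additive form of the bias, and the numbers are assumptions, not the project's actual implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify_with_bias(y, bias=0.0):
    """Hypothetical sketch of the adjustable bias factor.

    y    : un-softmaxed 2-dim model output (y0 = not using, y1 = using).
    bias : added to the "using" logit before comparison; larger values
           make the model flag "using" more readily.
    """
    p_not_using = sigmoid(y[0])
    p_using = sigmoid(y[1] + bias)
    return "using" if p_using > p_not_using else "not using"

# Example: a borderline output tips to "using" once the bias is raised.
y = np.array([0.4, 0.1])
print(classify_with_bias(y))            # not using
print(classify_with_bias(y, bias=0.5))  # using
```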
| Posture Estimation Model | Equipment (GPU) | Data Num (+ / −) | Epoch Num | LR |
|---|---|---|---|---|
| RTMPose Medium | RTX-4060 Laptop | 4118 / 3858 | 650 | 5e-6 |
With the larger data sample size, the overfitting point moved from around
We can see that the gradient fluctuates at the end of training, which means the learning rate is large enough that the optimizer is jumping around a local minimum. However, this is still tolerable.
Again, the model is well-fitted to the given data samples, yet it is overfitted with respect to actual postures in the real world.
To begin with, this model performs well on regular cases.
We conclude that the convolution is useful, as it lets the model learn dependencies among different features. As an evident instance, the model learned the dependency between "lowering the head" and "raising a hand."
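For illustration, here is a minimal sketch (not the project's actual architecture) of how a 1-D convolution over the flattened keypoint features lets neighboring features interact; all layer sizes are assumptions except the 286-dimensional input and the un-softmaxed 2-dimensional output from the formula above.

```python
import torch
import torch.nn as nn

class PostureClassifier(nn.Module):
    """Illustrative 1-D conv classifier over keypoint features."""

    def __init__(self, feature_len=286, num_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2),   # mixes nearby features
            nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=5, padding=2),  # wider receptive field
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(16, num_classes)  # un-softmaxed 2-dim output

    def forward(self, x):                # x: (batch, feature_len)
        h = self.conv(x.unsqueeze(1))    # -> (batch, 16, 1)
        return self.fc(h.squeeze(-1))    # -> (batch, 2)

model = PostureClassifier()
y = model(torch.randn(4, 286))
print(y.shape)  # torch.Size([4, 2])
```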
https://drive.google.com/drive/u/0/folders/1oUhbQTxlxmzz8J78fJUmWMiYGS3mtNIR
- Victor Mai Jiajun proposed a new structure of inputs; try to train and test it.
- Get ready to train a phone-detector model with the YOLOv8 interface and dive into the next phase of the whole project (see the sketch below).
- Cover more videos that have not yet been fitted into the model in the first phase.
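As a minimal sketch of what fine-tuning a phone detector through the YOLOv8 interface could look like; `phone.yaml` and `street_frame.jpg` are hypothetical placeholders, and the hyperparameters are assumptions:

```python
from ultralytics import YOLO

# Start from a pretrained YOLOv8 checkpoint and fine-tune it on a
# phone-detection dataset ("phone.yaml" is a hypothetical dataset
# config, not an actual project file).
model = YOLO("yolov8n.pt")
model.train(data="phone.yaml", epochs=100, imgsz=640)

# Run inference on a single frame (hypothetical test image).
results = model("street_frame.jpg")
print(results[0].boxes)
```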