DeepInfant is a deep learning model for classifying infant cries into different categories using audio processing and neural networks. The model is designed to be deployed on iOS devices and uses a pre-trained model for initial weights.
```
DeepInfant/
├── Data/
│   └── v2/
│       ├── belly_pain/
│       ├── burping/
│       ├── cold_hot/
│       ├── discomfort/
│       ├── hungry/
│       ├── lonely/
│       ├── scared/
│       ├── tired/
│       └── unknown/
├── processed_dataset/
│   ├── train/
│   ├── test/
│   └── metadata.csv
├── prepare_dataset.py
├── train.py
└── TRAINING.md
```
The model uses pre-trained weights from the iOS deployment model as a starting point. This transfer learning approach helps improve performance and reduce training time.
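A warm start from the pre-trained weights could look like the following sketch. The checkpoint filename (`pretrained_ios.pth`) and the assumption that the weights are available as a PyTorch state dict are illustrative, not part of the repository:

```python
import torch

def load_pretrained(model: torch.nn.Module, path: str = "pretrained_ios.pth"):
    """Warm-start a model from an exported checkpoint (hypothetical path)."""
    state = torch.load(path, map_location="cpu")
    # strict=False tolerates layers whose shapes differ,
    # e.g. a classifier head re-sized for a new label set
    model.load_state_dict(state, strict=False)
    return model
```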
The raw dataset should be organized under `Data/v2/` with one subdirectory per class:
- belly_pain (bp): Belly pain cries
- burping (bu): Burping sounds
- cold_hot (ch): Temperature discomfort
- discomfort (dc): General discomfort
- hungry (hu): Hunger cries
- lonely (lo): Loneliness cries
- scared (sc): Fear-related cries
- tired (ti): Tiredness cries
- unknown (un): Unclassified cries
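The two-letter codes above can be mapped to class labels and integer training targets; a minimal mapping (the sorted label order is an assumption) might look like:

```python
# Class labels and their two-letter codes, taken from the dataset layout above.
CODE_TO_LABEL = {
    "bp": "belly_pain",
    "bu": "burping",
    "ch": "cold_hot",
    "dc": "discomfort",
    "hu": "hungry",
    "lo": "lonely",
    "sc": "scared",
    "ti": "tired",
    "un": "unknown",
}

LABELS = sorted(CODE_TO_LABEL.values())                # canonical order for the 9 classes
LABEL_TO_INDEX = {l: i for i, l in enumerate(LABELS)}  # integer targets for training
```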
Run:

```bash
python prepare_dataset.py
```
This script:
- Creates train/test splits (80/20)
- Resamples all audio to 16kHz
- Converts files to WAV format
- Generates metadata.csv
- Organizes processed files in the processed_dataset directory
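The split and metadata steps above can be sketched as follows; the seed, the `metadata.csv` column names, and the helper names are assumptions for illustration:

```python
import csv
import random

def split_files(files, test_fraction=0.2, seed=42):
    """Shuffle file paths deterministically and return (train, test) lists, 80/20."""
    files = sorted(files)
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_fraction)
    return files[n_test:], files[:n_test]

def write_metadata(rows, out_path="processed_dataset/metadata.csv"):
    """Write one (filename, split, label) row per processed clip."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "split", "label"])
        writer.writerows(rows)
```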
The model architecture combines:
- CNN layers for feature extraction
- Bi-directional LSTM for temporal modeling
- Squeeze-and-excitation blocks for channel attention
- Final classification layers
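A minimal sketch of that CNN → squeeze-and-excitation → BiLSTM → classifier stack, with assumed layer sizes (channel counts, reduction ratio, and hidden size are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels via a learned global gate."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, F, T)
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze over freq/time
        return x * w[:, :, None, None]         # excite per channel

class CRNN(nn.Module):
    """Hypothetical CNN -> SE -> BiLSTM -> classifier sketch."""
    def __init__(self, n_mels=80, n_classes=9, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),                   # halve frequency and time
            SEBlock(32),
        )
        self.lstm = nn.LSTM(32 * (n_mels // 2), hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (B, 1, n_mels, T)
        z = self.cnn(x)                        # (B, 32, n_mels/2, T/2)
        z = z.permute(0, 3, 1, 2).flatten(2)   # (B, T/2, 32 * n_mels/2)
        out, _ = self.lstm(z)
        return self.head(out.mean(dim=1))      # average over time -> logits
```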
Audio preprocessing parameters:
- Sample rate: 16kHz
- Duration: 7 seconds (padded/trimmed)
- Features: Mel spectrogram
- n_mels: 80
- n_fft: 1024
- hop_length: 256
- frequency range: 20Hz - 8000Hz
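These parameters fix the spectrogram's shape. Assuming centered STFT frames (the librosa/torchaudio default), the geometry works out to:

```python
# Spectrogram geometry implied by the parameters above.
SAMPLE_RATE = 16_000
DURATION_S = 7
N_MELS = 80
HOP_LENGTH = 256

n_samples = SAMPLE_RATE * DURATION_S    # 112000 samples per padded/trimmed clip
n_frames = 1 + n_samples // HOP_LENGTH  # 438 frames with centered STFT
print((N_MELS, n_frames))               # model input is 80 x 438
```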
During training:
- Random time shift (-100ms to 100ms)
- Random noise injection (30% probability)
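The two augmentations above could be implemented along these lines; the noise amplitude (`0.005`) is an assumed value, not taken from the repo:

```python
import numpy as np

def augment(wav: np.ndarray, sr: int = 16_000, rng=None) -> np.ndarray:
    """Random +/-100 ms time shift plus 30%-probability noise injection (sketch)."""
    rng = rng or np.random.default_rng()
    # shift by up to 100 ms in either direction, zero-padding the vacated samples
    max_shift = int(0.1 * sr)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    wav = np.roll(wav, shift)
    if shift > 0:
        wav[:shift] = 0.0
    elif shift < 0:
        wav[shift:] = 0.0
    # inject low-level Gaussian noise 30% of the time (noise scale is an assumption)
    if rng.random() < 0.3:
        wav = wav + 0.005 * rng.standard_normal(wav.shape)
    return wav
```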
Run:

```bash
python train.py
```
Training hyperparameters:
- Batch size: 32
- Learning rate: 0.001
- Epochs: 50
- Optimizer: Adam
- Loss function: CrossEntropyLoss
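An epoch with these settings could look like the sketch below (the function name and loop structure are illustrative; the optimizer would be `torch.optim.Adam(model.parameters(), lr=0.001)`):

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the training data with CrossEntropyLoss, as listed above."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(x)
        loss = criterion(logits, y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * y.size(0)
        correct += (logits.argmax(dim=1) == y).sum().item()
        seen += y.size(0)
    return total_loss / seen, correct / seen   # per-epoch loss and accuracy
```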
The training process outputs:
- Training loss and accuracy per epoch
- Validation loss and accuracy per epoch
- Progress bars
- Best model checkpoint saving
- Trained model saved as 'deepinfant.pth'
- Best model selected based on validation accuracy
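The best-checkpoint logic amounts to tracking the highest validation accuracy seen so far; a minimal sketch (class name assumed):

```python
class BestCheckpoint:
    """Track the best validation accuracy and decide when to save (sketch)."""
    def __init__(self):
        self.best_acc = 0.0

    def update(self, val_acc: float) -> bool:
        """Return True when val_acc improves on the best seen so far."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            return True
        return False
```

In the training loop this would pair with something like `if tracker.update(val_acc): torch.save(model.state_dict(), "deepinfant.pth")`.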
The model outputs predictions over 9 classes. Training reports the following evaluation metrics:
- Training accuracy
- Validation accuracy
- Loss curves
After training, the model can be converted and deployed to iOS devices using Core ML conversion tools.
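A conversion could use `coremltools` on a traced model, roughly as below; the input shape (80 mels x 438 frames) and output filename are assumptions, and this requires macOS tooling to run:

```python
import torch
import coremltools as ct

# Trace the trained PyTorch model with a dummy mel-spectrogram input,
# then convert the trace to a Core ML package for iOS deployment.
model.eval()
example = torch.randn(1, 1, 80, 438)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(traced, inputs=[ct.TensorType(name="mel", shape=example.shape)])
mlmodel.save("DeepInfant.mlpackage")
```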
A few practical notes:
- Ensure sufficient GPU memory for training
- Monitor validation metrics for overfitting
- Backup trained models regularly
- Consider early stopping if validation metrics plateau