This project implements deep learning models for video frame prediction using three architectures: ConvLSTM, PredRNN, and a Transformer-based approach. The models are trained on the UCF101 dataset and predict future video frames from a sequence of input frames.
DEEP-LEARNING/
├── checkpoints/        # Model checkpoints directory
├── ucf101/             # UCF101 dataset
│   ├── train/          # Training videos
│   │   ├── GolfSwing/
│   │   ├── PizzaTossing/
│   │   ├── Punch/
│   │   ├── Typing/
│   │   └── YoYo/
│   └── test/           # Testing videos
│       ├── GolfSwing/
│       ├── PizzaTossing/
│       ├── Punch/
│       ├── Typing/
│       └── YoYo/
├── app.py              # Streamlit web application
├── ConvLSTM.py         # ConvLSTM model implementation
├── PredRNN.py          # PredRNN model implementation
├── Preprocessing.py    # Data preprocessing script
├── requirements.txt    # Project dependencies
├── Transformer.py      # Transformer model implementation
└── readme.md           # Project documentation
Install the required packages using:
pip install -r requirements.txt
Required packages:
- torch>=1.9.0
- torchvision>=0.10.0
- numpy>=1.19.2
- opencv-python>=4.5.3
- scikit-image>=0.18.3
- matplotlib>=3.4.3
- streamlit>=1.0.0
- tqdm>=4.62.3
- Pillow>=8.3.2
- pytest>=6.2.5
- black>=21.9b0
- flake8>=3.9.2
- isort>=5.9.3
- Download the UCF101 dataset and organize it in the following structure:
data/ucf101/
├── train/
│   ├── GolfSwing/
│   ├── PizzaTossing/
│   ├── Punch/
│   ├── Typing/
│   └── YoYo/
└── test/
    ├── GolfSwing/
    ├── PizzaTossing/
    ├── Punch/
    ├── Typing/
    └── YoYo/
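Before preprocessing, it can help to verify that the dataset was unpacked into the expected layout. The helper below is a small sketch (not part of the project's scripts) that checks for the split and class directories shown above:

```python
from pathlib import Path

# Hypothetical layout check; class and split names match the tree above.
CLASSES = ["GolfSwing", "PizzaTossing", "Punch", "Typing", "YoYo"]
SPLITS = ["train", "test"]

def missing_class_dirs(root):
    """Return the expected class directories that do not exist under root."""
    root = Path(root)
    return [
        str(root / split / cls)
        for split in SPLITS
        for cls in CLASSES
        if not (root / split / cls).is_dir()
    ]

if __name__ == "__main__":
    missing = missing_class_dirs("data/ucf101")
    if missing:
        print("Missing directories:")
        for path in missing:
            print(f"  {path}")
    else:
        print("Dataset layout looks correct.")
```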
- Run the preprocessing script:
python Preprocessing.py --input-dir ucf101 --output-dir processed_data
Note: The preprocessing script is configured for Windows environments. For Linux users, file path separators and video reading mechanisms may need to be adjusted.
Train each model separately using their respective Python files:
- ConvLSTM Model:
python ConvLSTM.py
- PredRNN Model:
python PredRNN.py
- Transformer Model:
python Transformer.py
Each training script will:
- Load the preprocessed data
- Train the model
- Save the model checkpoints to the `checkpoints` directory
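The overall shape of each training script can be sketched as below. The model, checkpoint file name, and hyperparameters are placeholders, not the actual values used in ConvLSTM.py, PredRNN.py, or Transformer.py.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
    """Generic frame-prediction training loop: MSE between predicted
    and target frame sequences, checkpoint saved each epoch."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        total = 0.0
        for inputs, targets in loader:  # (B, T, C, H, W) frame tensors
            inputs, targets = inputs.to(device), targets.to(device)
            opt.zero_grad()
            preds = model(inputs)
            loss = loss_fn(preds, targets)
            loss.backward()
            opt.step()
            total += loss.item()
        # "model.pt" is a placeholder checkpoint name
        torch.save(model.state_dict(), "checkpoints/model.pt")
        print(f"epoch {epoch}: loss {total / len(loader):.4f}")
```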
Launch the Streamlit web interface:
streamlit run app.py
The web application will:
- Allow you to select a model (ConvLSTM, PredRNN, or Transformer)
- Upload a video or choose from sample videos
- Generate and display frame predictions
ConvLSTM:
- Convolutional LSTM architecture
- Combines spatial and temporal feature learning
- Suitable for capturing short-term motion patterns

PredRNN:
- Advanced spatiotemporal memory flow
- Multiple LSTM layers with skip connections
- Effective for long-term dependencies

Transformer:
- Self-attention mechanism for temporal modeling
- Position encoding for frame sequence information
- Memory-efficient implementation
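As a concrete illustration of the ConvLSTM idea — replacing the LSTM's matrix multiplications with convolutions so the recurrence preserves spatial structure — a minimal cell might look like this. This is a generic sketch, not the exact cell in ConvLSTM.py:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell: all four gates are computed by a single
    convolution over the concatenated input and hidden state."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel,
                              padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g          # update cell state
        h = o * torch.tanh(c)      # new hidden state keeps (H, W) layout
        return h, c

    def init_state(self, batch, height, width, device="cpu"):
        shape = (batch, self.hid_ch, height, width)
        return (torch.zeros(shape, device=device),
                torch.zeros(shape, device=device))
```

Stacking several such cells and feeding each predicted frame back as the next input yields a basic frame-prediction model.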
- Video Loading Issues:
  - Ensure videos are in .avi or .mp4 format
  - Check file permissions
  - Verify OpenCV installation
- CUDA/GPU Issues:
  - Verify PyTorch is installed with CUDA support
  - Check GPU memory usage
  - Adjust batch size if needed
- Preprocessing Errors:
  - Ensure correct file paths
  - Verify disk space availability
  - Check input video format compatibility
This project is open-source and available under the MIT License.
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request