This repository contains a PyTorch implementation of the ERSVR (Enhanced Real-time Video Super Resolution) network using recurrent multi-branch dilated convolutions, featuring both a teacher model and an optimized student model for real-time deployment.
- Multi-Branch Dilated Convolution (MBD) module for efficient feature extraction
- Feature Alignment Block for temporal consistency across video frames
- Student-Teacher Knowledge Distillation for model compression
- Multiple Testing Interfaces (CLI, Web, Programmatic)
- 4x Super Resolution with high-quality upscaling
- Real-time Performance optimized for deployment
- Comprehensive Testing Suite with sample images and sequences
git clone https://github.com/Abhinavexists/SeeSharp
cd SeeSharp
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

python generate_test_images.py

This creates the test_images/ directory with various test patterns.
Choose from three testing interfaces:
python web_interface.py

Open http://localhost:5000 and drag and drop images for instant testing.
# Single image testing
python test_interface.py --image test_images/circles_128x128.png
# Frame sequence testing
python test_interface.py --frames test_images/circles_sequence/frame_1.png test_images/circles_sequence/frame_2.png test_images/circles_sequence/frame_3.png

# Programmatic usage (Python)
from test_interface import ERSVRTester
tester = ERSVRTester('student_models/student_best.pth')
sr_image = tester.test_single_image('test_images/circles_128x128.png')

The full-scale teacher model implements the complete architecture:
- Feature Alignment Block
  - Processes 9-channel input (3 frames × 3 RGB channels)
  - Multiple Conv2D layers for temporal feature extraction
  - MBD module for feature refinement
- Multi-Branch Dilated Convolution (MBD)
  - Pointwise convolution for channel reduction
  - Parallel dilated convolutions (rates: 1, 2, 4)
  - Feature fusion with 1×1 convolution
- Super Resolution Network
  - ESPCN-like backbone with multiple conv layers
  - Subpixel upsampling for 4× scale factor
  - Residual connection with bicubic upsampling
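
To make the subpixel upsampling step concrete, here is a minimal, dependency-free sketch of the pixel-shuffle rearrangement that ESPCN-style networks rely on. It is an illustration only, not the repository's layer: the function name `pixel_shuffle` is hypothetical, tensors are plain nested lists, and the channel ordering is assumed to match `torch.nn.PixelShuffle`. With a scale factor of 4, producing 3 RGB output channels requires 3 × 4 × 4 = 48 channels before the shuffle.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Assumed channel ordering (as in torch.nn.PixelShuffle):
    input channel c*r*r + i*r + j supplies output pixel (y*r + i, x*r + j).
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Toy input: 16 channels of size 1x1 become one channel of size 4x4.
lr_features = [[[k]] for k in range(16)]
sr_channel = pixel_shuffle(lr_features, 4)
print(sr_channel[0])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
```

The spatial resolution grows only in this final rearrangement, so every convolution before it runs at the low input resolution, which is the main reason this design suits real-time use.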
Optimized lightweight model for real-time deployment:
- Depthwise Separable Convolutions for efficiency
- About 95× smaller than the teacher model (101KB vs 9.6MB)
- Knowledge distillation training from teacher
- Real-time inference capability
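
The efficiency gain from depthwise separable convolutions can be checked with simple weight counting. The sketch below is not taken from the repository: the helper names are hypothetical, biases are ignored, and the 64-channel layer width is an arbitrary assumption chosen only for illustration.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Assumed layer width of 64 channels, for illustration only.
std = standard_conv_params(64, 64)
sep = depthwise_separable_params(64, 64)
print(std, sep, round(std / sep, 2))  # 36864 4672 7.89
```

For a 3×3 kernel the saving approaches 9× as the channel count grows, so a reduction as large as the one reported for the student model must also come from using fewer or narrower layers.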
# Train the teacher model
cd ersvr
python train.py --data_path ../archive --epochs 800 --batch_size 16

# Train the student model
cd ersvr
python train_student.py --data_path ../archive --teacher_ckpt ../teacher_models/ersvr_best.pth --epochs 50

- Model Size: 101KB (vs 9.6MB teacher)
- Parameters: ~25K (vs ~2.4M teacher)
- Inference Speed: Real-time capable (12ms per inference vs 45ms for the teacher)
- Quality: 32.8 dB PSNR (vs 34.2 dB teacher), retained through knowledge distillation
- Sharper edges and fine details
- Better texture preservation than bicubic
- Reduced aliasing artifacts
- Higher PSNR compared to traditional upsampling
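
For readers unfamiliar with the PSNR figures quoted in this README, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). It assumes 8-bit images flattened to lists of pixel values, and the function name `psnr` is hypothetical. The repository's own evaluation code may differ, for example in colour space or border cropping.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ground_truth = [52, 55, 61, 66]
upscaled = [53, 54, 62, 65]       # every pixel off by one, so MSE = 1
print(round(psnr(ground_truth, upscaled), 2))  # 48.13
```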
The model is trained on the Vimeo-90K septuplet dataset:
- Located in archive/vimeo_settuplet_1/sequences/
- Contains video triplets for temporal training
- Supports various naming patterns (im1.png, im01.png, frame001.png, etc.)
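
One way to handle mixed frame naming is to extract the numeric index and sort on it, so that `im10.png` does not sort before `im2.png`. The sketch below is an assumption about how this could be done, not the repository's loader: `FRAME_PATTERN`, `frame_index`, and `sort_frames` are hypothetical names, and the pattern covers only the three naming styles listed above.

```python
import re
from pathlib import Path

# Hypothetical pattern covering im1.png, im01.png and frame001.png style names.
FRAME_PATTERN = re.compile(r"^(?:im|frame)(\d+)\.png$", re.IGNORECASE)


def frame_index(filename):
    """Return the integer frame number, or None if the name is not recognised."""
    match = FRAME_PATTERN.match(Path(filename).name)
    return int(match.group(1)) if match else None


def sort_frames(filenames):
    """Keep recognised frame files and order them by numeric index."""
    indexed = [(frame_index(f), f) for f in filenames]
    return [f for _, f in sorted(pair for pair in indexed if pair[0] is not None)]


print(sort_frames(["im10.png", "im2.png", "notes.txt", "im1.png"]))
# ['im1.png', 'im2.png', 'im10.png']
```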
The ERSVR system generates comprehensive results and metrics visualizations to demonstrate its performance and capabilities.
The testing interfaces generate comparison visualizations showing:
- Input LR: Original low-resolution image
- Bicubic: Traditional bicubic interpolation
- ERSVR SR: Neural network super-resolution output
Results are saved as:
- {image_name}_comparison.png: Side-by-side comparison
- {image_name}_super_resolved.png: Super-resolution output only
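
When scripting around the testing interfaces, it can help to predict the output file names from the input path. The sketch below assumes that `{image_name}` is the input file name without its extension and that results land in a directory you choose. Both points are assumptions, and `result_paths` is a hypothetical helper rather than part of the repository.

```python
from pathlib import Path


def result_paths(image_path, out_dir="."):
    """Return (comparison, super_resolved) paths following the naming scheme above."""
    stem = Path(image_path).stem          # assumed meaning of {image_name}
    out = Path(out_dir)
    return out / f"{stem}_comparison.png", out / f"{stem}_super_resolved.png"


comparison, super_resolved = result_paths("test_images/circles_128x128.png")
print(comparison.name)       # circles_128x128_comparison.png
print(super_resolved.name)   # circles_128x128_super_resolved.png
```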
The metrics analysis (ERSVR_Metrics_Analysis.png) shows:
- PSNR Evolution: Training progression for both teacher and student models
- SSIM Comparison: Quality metrics across different image types (geometric, natural, text, etc.)
- Processing Time: Performance scaling with image resolution
- Memory Usage: Resource consumption analysis for different batch sizes
The dashboard provides:
- Quality Metrics by Image Type: PSNR and SSIM scores for various content types
- Processing Time vs Resolution: Scalability analysis for different input sizes
- Model Performance vs Size: Comparison with other super-resolution methods
- Training Progress: Loss curves showing convergence behavior
Sample comparisons demonstrate:
- Geometric Patterns: Sharp edge preservation and artifact reduction
- Natural Images: Texture enhancement and detail recovery
- Quantitative Metrics: PSNR improvements and SSIM scores
- Visual Quality: Side-by-side comparisons of input, bicubic, and ERSVR outputs
| Metric | Teacher Model | Student Model | Bicubic Baseline |
|---|---|---|---|
| PSNR (dB) | 34.2 | 32.8 | 28.5 |
| SSIM | 0.94 | 0.91 | 0.85 |
| Model Size | 9.6 MB | 101 KB | - |
| Parameters | 2.4M | 25K | - |
| Inference Time | 45ms | 12ms | 8ms |
| Memory Usage | 1.2GB | 300MB | 50MB |
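
A few derived figures make the table easier to interpret. The sketch below only does arithmetic on the numbers reported above. It assumes decimal units (1 MB = 10^6 bytes, 1 KB = 10^3 bytes) and treats each inference time as the cost of one output frame; neither assumption is stated in the table.

```python
# Values copied from the table above; the unit conversions are assumptions.
teacher = {"params": 2_400_000, "size_bytes": 9.6e6, "latency_ms": 45}
student = {"params": 25_000, "size_bytes": 101e3, "latency_ms": 12}

param_ratio = teacher["params"] / student["params"]
size_ratio = teacher["size_bytes"] / student["size_bytes"]
bytes_per_param = student["size_bytes"] / student["params"]
teacher_fps = 1000 / teacher["latency_ms"]
student_fps = 1000 / student["latency_ms"]

print(f"parameter reduction: {param_ratio:.0f}x")        # 96x
print(f"file size reduction: {size_ratio:.0f}x")         # 95x
print(f"bytes per parameter: {bytes_per_param:.2f}")     # 4.04
print(f"throughput: {teacher_fps:.1f} vs {student_fps:.1f} frames/s")  # 22.2 vs 83.3
```

Roughly 4 bytes per parameter is what 32-bit floating-point weights would occupy, so the reported sizes and parameter counts agree with each other.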
To generate these comprehensive visualizations:
# Generate comprehensive results diagrams
python generate_results_diagram.py
# Generate test results and metrics dashboard
python visualize_test_results.py

This creates:
- ERSVR_Results_Diagram.png: Complete system overview
- ERSVR_Metrics_Analysis.png: Detailed performance analysis
- ERSVR_Metrics_Dashboard.png: Performance metrics dashboard
- ERSVR_Test_Results_Visualization.png: Sample test comparisons
- Model Loading: Ensure student_models/student_best.pth exists
- Memory: Use --device cpu for CPU-only inference
- Dependencies: Install exact versions from requirements.txt
- Web Interface: Check port 5000 availability
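
Two of these checks (model file present, port 5000 free) can be automated before launching the web interface. The following is a small standard-library sketch, not a script shipped with the repository: `port_in_use` and `preflight` are hypothetical helpers, and the port test assumes the web interface listens on the local loopback address.

```python
import socket
from pathlib import Path


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def preflight(model_path="student_models/student_best.pth", port=5000):
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    if not Path(model_path).is_file():
        problems.append(f"missing model checkpoint: {model_path}")
    if port_in_use(port):
        problems.append(f"port {port} is already in use")
    return problems


for problem in preflight():
    print("WARNING:", problem)
```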
# Verify model file
ls -la student_models/student_best.pth
# Test installation
python -c "import torch, cv2; print('Dependencies OK')"
# Generate test data
python generate_test_images.py

If you use this implementation in your research, please cite the original paper:
@article{ersvr2021,
  title={Real-time video super resolution network using recurrent multi-branch dilated convolutions},
  author={Zeng, Yubin and Xiao, Zhijiao and Hung, Kwok-Wai and Lui, Simon},
  journal={Signal Processing: Image Communication},
  volume={93},
  pages={116167},
  year={2021}
}

This project is licensed under the MIT License. See the LICENSE file for details.






