A sophisticated document conversion system that transforms various document formats into professional LaTeX documents using AI-powered analysis and hybrid processing approaches.
- AI-Powered Classification: Uses GPT-4o for intelligent document type detection
- Hybrid Processing: Code-first conversion with GPT-4 fallback for complex documents
- Indian Government Standards: Follows the formatting conventions of the Government of India Manual of Office Procedures
- Web Interface: User-friendly Flask web application
- Multiple Input Formats: Supports TXT, DOC, DOCX, and PDF files
- PDF Compilation: Integrated pdfLaTeX compilation with error handling
- Template Library: Pre-built templates for various document types
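The pdfLaTeX compilation step can be approximated with a short wrapper. This is a minimal sketch, not the project's code. It assumes `pdflatex` is on the PATH, and the function names `compile_pdf` and `extract_latex_errors` are hypothetical. It runs a single non-interactive pass and collects the log lines beginning with `!`, which is how TeX marks errors.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def extract_latex_errors(log_text: str) -> list[str]:
    """Return TeX error lines, which begin with '!', from a pdflatex log."""
    return [line.strip() for line in log_text.splitlines() if line.startswith("!")]


def compile_pdf(tex_file: Path, out_dir: Path) -> tuple[bool, list[str]]:
    """Run one non-interactive pdflatex pass; return (succeeded, error lines)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        [
            "pdflatex",
            "-interaction=nonstopmode",  # never stop to prompt on errors
            "-halt-on-error",            # abort at the first error
            f"-output-directory={out_dir}",
            str(tex_file),
        ],
        capture_output=True,
        text=True,
        errors="replace",  # pdflatex output is not always valid UTF-8
    )
    # pdflatex writes <name>.log into the output directory; fall back to stdout if absent.
    log_path = out_dir / f"{tex_file.stem}.log"
    log_text = log_path.read_text(errors="replace") if log_path.exists() else result.stdout
    return result.returncode == 0, extract_latex_errors(log_text)
```

Documents with cross-references or a table of contents usually need a second pass before the PDF is final.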
Supported document types:
- Office Memorandums
- Government Circulars
- Notifications
- Reports
- Policy Documents
- Academic Papers
- Legal Documents
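The project classifies documents with GPT-4o. To illustrate the code-first side of the hybrid design, a cheap keyword pre-classifier could catch obvious cases before any API call is made. This is a swapped-in technique for illustration only: the cue words, type labels, and the name `guess_document_type` are assumptions made for this sketch, not taken from the repository.

```python
from __future__ import annotations

import re

# Hypothetical keyword cues per document type (illustrative, not from the project).
KEYWORD_CUES = {
    "office_memorandum": ["office memorandum"],
    "circular": ["circular"],
    "notification": ["notification"],
    "report": ["executive summary", "findings", "recommendations"],
    "policy": ["policy"],
    "academic_paper": ["abstract", "references", "keywords"],
    "legal": ["whereas", "hereinafter", "agreement"],
}


def guess_document_type(text: str) -> str | None:
    """Return the type with the most whole-word cue hits, or None if nothing matches."""
    lowered = text.lower()
    scores = {
        doc_type: sum(len(re.findall(r"\b" + re.escape(cue) + r"\b", lowered)) for cue in cues)
        for doc_type, cues in KEYWORD_CUES.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else None
```

Returning `None` instead of a forced guess leaves ambiguous documents to the AI classifier.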
Requirements:
- Python 3.8+
- LaTeX distribution (TeX Live or MiKTeX)
- OpenAI API key (optional, for AI features)
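A quick way to confirm these prerequisites before installing is sketched below. `check_requirements` is a hypothetical helper; it only checks that a `pdflatex` executable is on the PATH, not which TeX distribution provides it.

```python
import os
import shutil
import sys


def check_requirements() -> dict:
    """Report whether the documented prerequisites appear to be available."""
    return {
        # Python 3.8 or newer is required.
        "python_ok": sys.version_info >= (3, 8),
        # A TeX Live or MiKTeX install normally puts pdflatex on the PATH.
        "pdflatex_found": shutil.which("pdflatex") is not None,
        # Optional: only needed for the AI features.
        "openai_key_set": bool(os.environ.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```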
Installation:
- Clone the repository:
    git clone https://github.com/ymcaPrabhu/AI.git
    cd AI
- Create and activate a virtual environment:
    python -m venv .venv
    # Windows: .venv\Scripts\activate
    # Linux/Mac: source .venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables (optional):
    cp .env.example .env
    # Then edit .env and add your OpenAI API key
Usage:
- Start the web server:
    python app_ai.py
- Open your browser to http://localhost:5001
- Upload a document and select conversion options
- Download the generated LaTeX source or compiled PDF
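The upload step should only accept the input formats the converter supports (TXT, DOC, DOCX, PDF). A minimal sketch of such a check follows, assuming the decision is made on the file extension alone; `is_supported_upload` is a hypothetical name, not a function from `app_ai.py`.

```python
from pathlib import Path

# Input formats listed as supported; the constant name is hypothetical.
ALLOWED_EXTENSIONS = {".txt", ".doc", ".docx", ".pdf"}


def is_supported_upload(filename: str) -> bool:
    """Accept a file only if its (case-insensitive) extension is a supported format."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

An extension check is cheap but trusts the filename; a stricter version could also inspect the file contents.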
Command-line conversion, followed by packaging the output for Overleaf:
    python src/convert.py --in input/sample.docx --template pro_report --meta config/docmeta.yaml --brand config/brand.yaml --out output/overleaf_project --build
    python src/export/pack_overleaf.py --src output/overleaf_project --zip output/overleaf_project.zip

The system uses an intelligent hybrid approach:
- Classification: GPT-4o analyzes document structure and type
- Conversion: Code-based rules handle standard documents efficiently
- Enhancement: GPT-4 processes complex cases requiring advanced understanding
- Cost Optimization: AI calls are limited to classification and to complex documents, which keeps API costs low while maintaining output quality
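The code-first, AI-fallback flow can be sketched as a simple dispatcher. This is an illustration under stated assumptions, not the logic in `ai_processor.py`: the rule-based converter is assumed to signal "not handled" by raising the hypothetical `ConversionNotHandled` exception, and both converters are passed in as plain functions.

```python
from __future__ import annotations

from typing import Callable, Optional


class ConversionNotHandled(Exception):
    """Hypothetical signal from the rule-based converter that a document is outside its rules."""


def convert_hybrid(
    text: str,
    rule_based: Callable[[str], str],
    ai_fallback: Optional[Callable[[str], str]] = None,
) -> tuple[str, str]:
    """Try the rule-based converter first; call the AI fallback only if it gives up.

    Returns (latex_source, method_used).
    """
    try:
        return rule_based(text), "rules"
    except ConversionNotHandled:
        if ai_fallback is None:  # e.g. AI disabled or no API key configured
            raise
        return ai_fallback(text), "ai"
```

Because the AI function is only reached through the exception path, standard documents never incur an API call.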
Project structure:
├── src/
│   ├── ai_processor.py        # AI document analysis and processing
│   ├── template_engine.py     # LaTeX template generation
│   ├── convert.py             # Command-line conversion tool
│   └── export/
│       └── pack_overleaf.py   # Overleaf package creator
├── templates/                 # HTML templates for web interface
├── config/                    # Configuration files
├── input/                     # Sample input documents
├── app_ai.py                  # Main web application
├── requirements.txt           # Python dependencies
└── README.md                  # This file
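`pack_overleaf.py` is described as the Overleaf package creator. A minimal sketch of what such a packer might do follows, assuming it simply zips the project directory with paths stored relative to its root, so the project files sit at the top level of the archive rather than inside an extra folder. The function name and return value are hypothetical.

```python
import zipfile
from pathlib import Path


def pack_overleaf(src: Path, zip_path: Path) -> int:
    """Zip every file under src (paths relative to src) and return the file count."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(src.rglob("*")):
            # Skip directories and the archive itself, in case it is written inside src.
            if file.is_file() and file.resolve() != zip_path.resolve():
                zf.write(file, arcname=file.relative_to(src).as_posix())
                count += 1
    return count
```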
Environment variables:
- OPENAI_API_KEY: Your OpenAI API key for AI features
- DISABLE_AI: Set to '1' to disable AI features
- FLASK_DEBUG: Set to '1' for debug mode
- HOST: Server host (default: 127.0.0.1)
- PORT: Server port (default: 5001)
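A sketch of how these variables might be read, applying the defaults listed above. `load_config` and `AppConfig` are hypothetical names, and treating AI as disabled when no API key is set is an assumption of this sketch rather than documented behavior.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    openai_api_key: Optional[str]
    ai_enabled: bool
    debug: bool
    host: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read the documented variables, falling back to the documented defaults."""
    api_key = env.get("OPENAI_API_KEY") or None
    return AppConfig(
        openai_api_key=api_key,
        # Assumption: AI is off when explicitly disabled or when no key is available.
        ai_enabled=env.get("DISABLE_AI") != "1" and api_key is not None,
        debug=env.get("FLASK_DEBUG") == "1",
        host=env.get("HOST", "127.0.0.1"),
        port=int(env.get("PORT", "5001")),
    )
```

Passing the environment in as a mapping keeps the function testable without modifying `os.environ`.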
Customization:
- Modify config/docmeta.yaml for document metadata
- Customize config/brand.yaml for branding elements
- Edit templates in src/template_engine.py for custom formatting
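When customizing templates, any metadata inserted into LaTeX (for example a title taken from `config/docmeta.yaml`) needs its special characters escaped, or values such as `R&D` or `50%` will break compilation. The sketch below shows one way to do this with Python's `string.Template`; the template text, metadata keys, and function names are hypothetical and not taken from `src/template_engine.py`.

```python
from string import Template

# Characters with special meaning in LaTeX text mode and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


# Hypothetical title block; ${name} placeholders do not clash with LaTeX braces.
TITLE_BLOCK = Template(
    r"\title{${title}}" "\n"
    r"\author{${author}}" "\n"
    r"\date{${date}}"
)


def render_title_block(meta: dict) -> str:
    """Fill the title block with escaped metadata values."""
    return TITLE_BLOCK.substitute({k: latex_escape(str(v)) for k, v in meta.items()})
```

`string.Template` is used here instead of `str.format` because LaTeX's own `{}` braces would otherwise have to be doubled throughout the template.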
- Government Offices: Convert documents to standard government formats
- Academic Institutions: Transform research papers and reports
- Legal Firms: Format legal documents with proper structure
- Corporate: Create professional reports and documentation
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- OpenAI for the GPT-4o and GPT-4 APIs
- Government of India Manual of Office Procedures
- LaTeX community for excellent documentation tools
Note: This system is optimized for cost-effective AI usage while maintaining high-quality document conversion standards.