Your Personal, Private, and Powerful AI Data Scientist - Built for Impact
"What if every person, regardless of their technical background, could have access to a world-class data scientist that works entirely on their device, never sharing their private information, and costs nothing to use?"
Gemma Data Assistant turns this vision into reality with Google's Gemma 3n model: a truly autonomous, offline-first data science assistant that democratizes access to advanced analytics while keeping all data on the user's own device.
- 98% of small businesses lack access to professional data analysis capabilities.
- Medical researchers in developing countries can't afford expensive cloud AI services.
- Students and educators are blocked by subscription fees and internet requirements.
- Privacy-sensitive organizations (healthcare, finance, government) can't use cloud-based AI tools.
- Rural areas with limited internet are excluded from the AI revolution.
Note
Our mission is to eliminate these barriers with a tool that is zero-cost, fully offline, and private by design: your data never leaves your machine.
"A rural clinic in Kenya uses Gemma Data Assistant to analyze patient outcomes and optimize treatment protocols without sending sensitive health data to external servers."
Impact Features:
- Analyze patient data locally, so records never leave the clinic (which supports compliance with privacy rules such as HIPAA).
- Identify treatment effectiveness patterns.
- Optimize resource allocation for limited-budget healthcare.
"A university in Bangladesh empowers 500+ students to perform advanced statistical analysis without expensive software licenses or cloud subscriptions."
Educational Benefits:
- Free access to professional data analysis tools.
- Learn by doing with real datasets and Jupyter notebook outputs.
- Offline accessibility for remote and underserved learning environments.
"Environmental scientists in the Amazon rainforest analyze deforestation patterns and climate data completely offline, ensuring sensitive location data remains secure."
Environmental Applications:
- Climate change impact analysis.
- Biodiversity research and conservation.
- Environmental monitoring and reporting.
"A family-owned restaurant chain uses the assistant to analyze sales patterns, optimize inventory, and improve customer satisfaction without hiring expensive consultants."
Business Intelligence:
- Customer behavior analysis and sales forecasting.
- Inventory management and marketing effectiveness.
Key Advantages:
- Mobile-First Architecture: Designed specifically for edge devices.
- Efficiency Revolution: Runs on devices with just 8GB of RAM.
- Privacy by Design: No data is ever transmitted externally.
- Universal Access: Works without internet connectivity.
- Professional Capabilities: Offers analysis comparable to expensive cloud AI services.
Tip
Our implementation takes full advantage of Gemma 3n's efficiency. The core of our system is a sophisticated dual-agent pipeline that plans and executes tasks autonomously.
Our system uses two coordinated AI agents that work together in a three-stage process to deliver a seamless experience, from user request to final report.
- Stage 1: User Interaction & Planning (app.py)
  - User Input: A user uploads a dataset and provides a natural language prompt.
  - Planner Agent Activated: The system analyzes the request.
  - RAG Search: The Planner consults knowledge_base.json to find relevant analysis steps.
  - Plan Proposal: The agent generates a high-level analysis plan for the user to review.
  - User Approval: The user approves the plan, which triggers the next stage.
- Stage 2: Autonomous Execution & Self-Healing (coding_agent.py)
  - Executor Agent Activated: A background agent takes over and executes the approved plan task by task in a loop.
  - Inside the Execution Loop:
    1. Code Generation: The agent writes Python code for the current task.
    2. Notebook Execution: The code is run in a Jupyter Notebook cell.
    3. Error Check:
       - If an error occurs (Self-Healing): The agent uses its RAG system to search knowledge_base.json for a solution, rewrites the code, and returns to Step 1 to try again.
       - If successful (Append Results): The output is saved, and the agent moves to the next task in the plan.
  - The loop continues until all tasks in the plan are completed.
- Stage 3: Final Report
  - Analysis Complete: Once the loop finishes, the Executor Agent generates a final, user-friendly summary of the findings.
  - Deliverables: The user is presented with the summary and the complete, documented Jupyter Notebook.
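To make the Stage 2 loop concrete, here is a minimal Python sketch of generate, execute, check, and retry. It is an illustration under assumptions, not the code in coding_agent.py: `exec()` on a shared dictionary stands in for the Jupyter kernel, the Gemma code generator and the knowledge-base fix lookup are passed in as plain functions, and the names `run_task`, `execute_plan`, and the `max_attempts` retry cap are all hypothetical.

```python
# Hypothetical sketch of the Stage 2 self-healing loop (not the project's actual code).
from typing import Callable, Dict, List, Optional

# Assumed signatures: generate(task, hint) -> Python source;
# lookup_fix(error_text) -> a repair hint from the knowledge base, or None.
CodeGenerator = Callable[[str, Optional[str]], str]
FixLookup = Callable[[str], Optional[str]]


def run_task(task: str, generate: CodeGenerator, lookup_fix: FixLookup,
             namespace: Dict, max_attempts: int = 3) -> Dict:
    """Generate and run code for one task, feeding errors back as repair hints."""
    hint: Optional[str] = None
    code = ""
    error = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(task, hint)                  # 1. code generation
        try:
            exec(code, namespace)                    # 2. execution (stand-in for a notebook cell)
        except Exception as exc:                     # 3. error check
            error = f"{type(exc).__name__}: {exc}"
            hint = lookup_fix(error) or error        # self-healing: retry with a RAG hint
            continue
        return {"task": task, "code": code, "attempts": attempt, "ok": True}
    return {"task": task, "code": code, "attempts": max_attempts, "ok": False, "error": error}


def execute_plan(plan: List[str], generate: CodeGenerator, lookup_fix: FixLookup,
                 namespace: Optional[Dict] = None) -> List[Dict]:
    """Run every task in order; tasks share one namespace, like cells in one notebook."""
    namespace = {} if namespace is None else namespace
    return [run_task(task, generate, lookup_fix, namespace) for task in plan]


if __name__ == "__main__":
    fixes = {"NameError": "Add the missing import, e.g. import statistics."}

    def toy_generate(task: str, hint: Optional[str]) -> str:
        # The first draft forgets the import; the repaired draft adds it.
        prefix = "import statistics\n" if hint and "import" in hint else ""
        return prefix + "result = statistics.mean([1, 2, 3, 4])"

    ns: Dict = {}
    report = execute_plan(["compute the mean"], toy_generate,
                          lambda err: fixes.get(err.split(":")[0]), ns)
    print(report[0]["attempts"], ns["result"])  # 2 2.5
```

Capping retries keeps a task that can never be fixed from looping forever. Note that `exec()` runs generated code with the same permissions as the app itself, so this pattern only makes sense for local data you are comfortable letting the agent work on.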
Important
Your system must be 64-bit. An internet connection is required only for the initial setup: cloning the repository, installing the Python packages, and downloading Ollama and the AI models. After that, the application runs 100% offline.
Minimum Configuration (CPU Mode)
- RAM: 8GB
- Storage: 10GB free space
- CPU: Any modern 64-bit processor
Recommended Configuration (GPU Acceleration)
- GPU: NVIDIA GPU with 4GB+ VRAM (CUDA support)
- RAM: 16GB
- Storage: 10GB free space
Software Requirements
- Python 3.9 to 3.11
- Ollama
- Streamlit
- Jupyter, FAISS, Pandas, Scikit-learn, and the other libraries listed in requirements.txt.
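If you want to verify the basics before installing, a small preflight script can check the interpreter version, bitness, and free disk space against the figures above. This is a hypothetical helper, not part of the repository. It assumes that `sys.maxsize > 2**32` (which tests whether the Python interpreter itself is 64-bit) is a good enough proxy for a 64-bit system, and it skips RAM because the standard library has no portable way to read it.

```python
# Hypothetical preflight check (not part of the repository).
import shutil
import sys
from typing import List, Optional, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 9)
MAX_PYTHON: Tuple[int, int] = (3, 11)
MIN_FREE_GB = 10  # storage requirement from the configurations above (binary GB)


def preflight(path: str = ".", version: Optional[Tuple[int, int]] = None,
              is_64bit: Optional[bool] = None, free_bytes: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means every check passed.

    The optional arguments let the checks be exercised with fake values.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    is_64bit = sys.maxsize > 2**32 if is_64bit is None else is_64bit  # interpreter bitness
    free_bytes = shutil.disk_usage(path).free if free_bytes is None else free_bytes

    problems = []
    if not is_64bit:
        problems.append("A 64-bit Python interpreter is required.")
    if not MIN_PYTHON <= version <= MAX_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found; 3.9 to 3.11 is supported.")
    if free_bytes < MIN_FREE_GB * 1024**3:
        problems.append(f"Only {free_bytes / 1024**3:.1f} GB free; at least {MIN_FREE_GB} GB is needed.")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "System looks ready.")
```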
Note
This is a condensed guide. For a comprehensive, step-by-step walkthrough, please see the Installation & Setup Instructions file.
Important
The Ollama server must be running in the background for the assistant to function. The models are a one-time download.
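After completing the steps below, you can confirm that the Ollama server is reachable and that both models are pulled. This sketch queries Ollama's local REST endpoint `GET /api/tags` (default address http://localhost:11434), which lists the models installed on this machine. The helper names are hypothetical, and the required model names are taken from the `ollama pull` commands in this section.

```python
# Hypothetical Ollama health check (not part of the repository).
import json
import urllib.error
import urllib.request
from typing import Iterable, List, Optional

OLLAMA_URL = "http://localhost:11434"               # Ollama's default address
REQUIRED_MODELS = ("gemma:2b", "nomic-embed-text")  # from the `ollama pull` commands


def fetch_installed_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> Optional[List[str]]:
    """Return the names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError:  # URLError, refused connections, and timeouts are all OSError subclasses
        return None
    return [model["name"] for model in payload.get("models", [])]


def missing_models(installed: Iterable[str], required: Iterable[str] = REQUIRED_MODELS) -> List[str]:
    """Models still to pull. Ollama lists untagged pulls with a ':latest' suffix."""
    have = set()
    for name in installed:
        have.add(name)
        if name.endswith(":latest"):
            have.add(name[: -len(":latest")])
    return [model for model in required if model not in have]


if __name__ == "__main__":
    installed = fetch_installed_models()
    if installed is None:
        print("Ollama is not reachable. Start it with: ollama serve")
    else:
        todo = missing_models(installed)
        for model in todo:
            print(f"Missing model, run: ollama pull {model}")
        if not todo:
            print("Ollama is running and all required models are installed.")
```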
# 1. Download and install Ollama from https://ollama.com
# 2. Start the Ollama service in a separate terminal
ollama serve
# 3. Download the required AI models
ollama pull gemma:2b
ollama pull nomic-embed-text
Tip
Using a Python virtual environment is highly recommended to avoid conflicts with other projects.
# Clone the project repository
git clone https://github.com/MarvelBoy047/Gemma_Ai_DataAssistant.git
cd Gemma_Ai_DataAssistant
# Create and activate a virtual environment
python -m venv venv
# On Windows: venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate
# Install all required Python packages
pip install -r requirements.txt
# Start the Streamlit web application
streamlit run app.py
# Your browser will automatically open to http://localhost:8501
The project is driven by a decoupled, dual-agent architecture: the two agents communicate through the file system, so long-running analysis happens in the background without blocking the UI.
Gemma_Ai_DataAssistant/
│
├── app.py                              # Core Streamlit UI & "Planner Agent"
│
├── coding_agent.py                     # Autonomous "Executor Agent" & self-healing logic
│
├── knowledge_base.json                 # Shared RAG database for planning and code correction
│
├── Requirements.txt                    # All Python package dependencies
│
├── Installation setup Instructions.md  # Straightforward step-by-step setup guide
│
├── README.md                           # Explains why the project exists
│
├── chat_history/                       # Communication bus: user chat logs and approved plans
│   └── <session_id>.json
│
├── agent_memory/                       # Agent's logbook: detailed logs of the Executor's decisions & actions
│   └── <session_id>_memory.json
│
├── notebooks/                          # Final deliverables: generated .ipynb files and datasets
│   └── <session_id>/
│       ├── <dataset_name>.csv
│       └── <session_id>.ipynb
│
└── planner_kb_index/                   # Cached FAISS index for the Planner Agent's knowledge base
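The .ipynb files under notebooks/<session_id>/ are plain JSON in the nbformat 4 format, so a notebook of executed steps can be assembled with only the standard library. This is a minimal sketch, assuming each completed step is recorded as a dict with hypothetical `task`, `code`, and `output` keys; the project itself may build its notebooks differently (for example with the nbformat package), and the function names are illustrative.

```python
# Hypothetical notebook writer (not the project's actual code).
import json
from pathlib import Path
from typing import Dict, List, Optional


def markdown_cell(text: str) -> Dict:
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str, stdout: str = "", execution_count: Optional[int] = None) -> Dict:
    """A code cell; captured stdout becomes a 'stream' output."""
    outputs = [{"output_type": "stream", "name": "stdout", "text": stdout}] if stdout else []
    return {"cell_type": "code", "execution_count": execution_count,
            "metadata": {}, "outputs": outputs, "source": code}


def build_notebook(steps: List[Dict[str, str]]) -> Dict:
    """One markdown heading plus one code cell per completed step."""
    cells = []
    for n, step in enumerate(steps, start=1):
        cells.append(markdown_cell(f"## Step {n}: {step['task']}"))
        cells.append(code_cell(step["code"], step.get("output", ""), execution_count=n))
    return {
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require per-cell "id" fields
        "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3",
                                    "language": "python"}},
        "cells": cells,
    }


def write_session_notebook(root: Path, session_id: str, steps: List[Dict[str, str]]) -> Path:
    """Save to <root>/<session_id>/<session_id>.ipynb, mirroring the notebooks/ layout."""
    out_dir = root / session_id
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{session_id}.ipynb"
    path.write_text(json.dumps(build_notebook(steps), indent=1), encoding="utf-8")
    return path
```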
- app.py (The Planner & UI):
  - Manages the user-facing Streamlit application.
  - Takes the user's natural language request and uses a FAISS-powered RAG search on knowledge_base.json to find relevant tasks.
  - Asks Gemma to create a high-level analysis plan, which is then presented to the user for approval.
- coding_agent.py (The Autonomous Executor):
  - Runs in a background thread, constantly monitoring the chat_history directory for new tasks approved by the user.
  - This agent is the workhorse: it generates, executes, and debugs Python code step by step.
  - Features a multi-layered, self-healing mechanism that uses knowledge_base.json to fix its own errors.
- knowledge_base.json (The Shared Brain):
  - A repository of over 100 trusted data science code patterns and task descriptions.
  - It serves both the Planner Agent (for creating plans) and the Executor Agent (for RAG-based error correction).
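The retrieval step shared by both agents can be sketched as a nearest-neighbour search over knowledge_base.json. The project uses nomic-embed-text embeddings with a FAISS index (cached in planner_kb_index/); to keep this sketch dependency-free, word-count cosine similarity is swapped in for both, and the entry fields `task`, `description`, and `code` are an assumed schema, not necessarily the real file's.

```python
# Hypothetical RAG lookup. Word counts stand in for nomic-embed-text, and a
# linear scan stands in for the FAISS index; the entry schema is assumed.
import json
import math
import re
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_knowledge_base(entries: List[Dict[str, str]], query: str, k: int = 3) -> List[Dict[str, str]]:
    """Return up to k entries whose descriptions best match the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(entry["description"])), i) for i, entry in enumerate(entries)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties keep file order
    return [entries[i] for score, i in scored[:k] if score > 0]


def load_knowledge_base(path: str = "knowledge_base.json") -> List[Dict[str, str]]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

In use, the Planner would call `search_knowledge_base(load_knowledge_base(), prompt)` and include the returned entries in the prompt it sends to Gemma. Swapping `vectorize` for real embeddings and the linear scan for a FAISS index changes the quality and speed of retrieval, not the shape of the pipeline.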
Tip
The Asynchronous Communication Bus
The app.py front-end and the coding_agent.py back-end are fully decoupled. They communicate asynchronously by writing and reading JSON files in the chat_history directory. This robust, file-based messaging system is what allows the agent to work on complex, long-running tasks in the background without freezing the UI.
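A file-based bus like this needs two things: writers that never expose half-written files, and a reader that remembers what it has already handled. The sketch below shows one way to do both with the standard library. The message format (`session_id`, `status`, `plan`) and all function names are hypothetical; only the one-JSON-file-per-session layout of chat_history/ comes from this README.

```python
# Hypothetical file-based message bus (not the project's actual code).
import json
import os
import tempfile
import threading
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Set


def publish_plan(bus_dir: Path, session_id: str, plan: List[str]) -> Path:
    """Planner side: write an approved plan to <bus_dir>/<session_id>.json atomically."""
    bus_dir.mkdir(parents=True, exist_ok=True)
    target = bus_dir / f"{session_id}.json"
    # Write to a temporary file in the same directory, then rename it into place,
    # so the executor never reads a half-written JSON file.
    fd, tmp_path = tempfile.mkstemp(dir=bus_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"session_id": session_id, "status": "approved", "plan": plan}, fh)
    os.replace(tmp_path, target)
    return target


def poll_approved_plans(bus_dir: Path, seen: Set[str]) -> Iterator[Dict]:
    """Executor side: yield approved plans that have not been picked up yet."""
    for path in sorted(bus_dir.glob("*.json")):
        if path.name in seen:
            continue
        message = json.loads(path.read_text(encoding="utf-8"))
        if message.get("status") == "approved":
            seen.add(path.name)
            yield message


def executor_loop(bus_dir: Path, handle: Callable[[Dict], None],
                  stop: threading.Event, interval: float = 1.0) -> None:
    """Poll the bus until `stop` is set; meant to run in a background thread."""
    seen: Set[str] = set()
    while not stop.is_set():
        for message in poll_approved_plans(bus_dir, seen):
            handle(message)
        stop.wait(interval)
```

On POSIX systems `os.replace` renames atomically within one filesystem, which is why the temporary file is created inside the bus directory itself. In an app like this, the Streamlit side would call `publish_plan` when the user approves a plan, while `executor_loop` runs in a `threading.Thread(daemon=True)`.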
Planning & Code Generation:
- Context-Aware Strategy: Analyzes your dataset's structure and creates a tailored workflow.
- Goal-Oriented Planning: Focuses on delivering actionable insights, not just charts.
- Production-Quality Python: Generates clean, documented, professional code.
- Library Intelligence: Automatically selects appropriate tools (pandas, scikit-learn, etc.).

Self-Healing Execution:
- Automatic Bug Detection: Identifies and diagnoses code execution failures.
- RAG-Powered Solutions: Searches the 100+ patterns in knowledge_base.json for fixes to similar problems.
- Context-Aware Repair: Considers your specific data when fixing errors.
Warning
This is the most important feature. Your data and analysis never leave your computer. There is zero data transmission to any cloud service, making it safe for sensitive information.
| Feature | Gemma Data Assistant | Cloud AI Services |
|---|---|---|
| Privacy | 100% local | Data sent to servers |
| Cost | Free after setup | $20-100+/month |
| Offline Access | Works anywhere | Requires internet |
| Speed | No network latency | Network dependent |

| Aspect | Our Solution | Traditional Tools |
|---|---|---|
| Learning Curve | Natural language | Months of training |
| Setup Time | Under 15 minutes | Days/weeks |
| Error Handling | Automatic fixing | Manual debugging |
| Documentation | Auto-generated | Manual writing |
Roadmap:
- Done: Core offline analysis engine with self-healing RAG.
- Done: Professional notebook generation.
- Next: Advanced statistical methods and ML model recommendations.
- Next: Multi-modal analysis (text, images, audio).
- Future: Healthcare-specific analysis templates (HIPAA-compliant).
- Future: Financial compliance and reporting tools.
Note
Our vision is to create a powerful, adaptable platform that can be specialized for any industry, putting expert-level analysis in the hands of domain professionals everywhere.
We believe in the power of open source to drive global change.
- Fork the repository.
- Create a new feature branch (git checkout -b feature/amazing-improvement).
- Commit your changes and push to your branch.
- Submit a pull request with a detailed description of your improvement.
Tip
The easiest way to contribute is by expanding the knowledge_base.json file. Add a new code pattern, a fix for a common error, or a new analysis technique. This directly improves the agent's intelligence!
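If you contribute a new pattern, a quick validation pass helps keep the file loadable for both agents. This is a hypothetical helper, assuming each entry is a JSON object with `task`, `description`, and `code` fields; check the existing entries in knowledge_base.json and match their actual schema before relying on it.

```python
# Hypothetical knowledge-base contribution helper; the entry schema is assumed.
import json
from pathlib import Path
from typing import Dict, List

REQUIRED_FIELDS = ("task", "description", "code")  # assumed entry schema


def add_pattern(entries: List[Dict[str, str]], entry: Dict[str, str]) -> List[Dict[str, str]]:
    """Validate a new knowledge-base entry and return a new list that includes it."""
    missing = [field for field in REQUIRED_FIELDS if not str(entry.get(field, "")).strip()]
    if missing:
        raise ValueError(f"entry is missing: {', '.join(missing)}")
    if any(existing.get("task") == entry["task"] for existing in entries):
        raise ValueError(f"a pattern named {entry['task']!r} already exists")
    # compile() only parses the snippet; it raises SyntaxError without running anything.
    compile(entry["code"], "<knowledge_base entry>", "exec")
    return entries + [entry]


def save_knowledge_base(entries: List[Dict[str, str]], path: str = "knowledge_base.json") -> None:
    Path(path).write_text(json.dumps(entries, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
```

Because the Planner caches its FAISS index in planner_kb_index/, that cache may need to be rebuilt after the knowledge base changes.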
This project is licensed under the MIT License. See the LICENSE file for details.
Important
We do not collect, store, or transmit any of your data. All processing, analysis, and storage happens exclusively on your local machine. You retain 100% ownership and control of your information at all times.
Acknowledgments:
- Google Gemma Team: For creating the revolutionary Gemma model family.
- Ollama Community: For making local AI accessible to everyone.
- The Python Open Source Community: For the incredible libraries that power this tool.

Support:
- GitHub Issues: For bugs and feature requests, please open an issue.