🤖 Gemma Data Assistant: Democratizing AI-Powered Data Science

Your Personal, Private, and Powerful AI Data Scientist - Built for Impact

"What if every person, regardless of their technical background, could have access to a world-class data scientist that works entirely on their device, never sharing their private information, and costs nothing to use?"

Gemma Data Assistant transforms this vision into reality using Google's groundbreaking Gemma 3n model, creating the world's first truly autonomous, offline-first data science assistant that democratizes access to advanced analytics while preserving absolute privacy.

🌍 The Global Impact Challenge We're Solving

The Problem: Data Science Inequality

98% of small businesses lack access to professional data analysis capabilities.
Medical researchers in developing countries can't afford expensive cloud AI services.
Students and educators are blocked by subscription fees and internet requirements.
Privacy-sensitive organizations (healthcare, finance, government) can't use cloud-based AI tools.
Rural areas with limited internet are excluded from the AI revolution.

Note

Our mission is to eliminate these barriers by providing a tool that is zero-cost, fully offline, and guarantees bank-level privacy by design.

🎯 Real-World Impact Scenarios

🏥 Healthcare & Medical Research

"A rural clinic in Kenya uses Gemma Data Assistant to analyze patient outcomes and optimize treatment protocols without sending sensitive health data to external servers."

Impact Features:

Analyze patient data while maintaining HIPAA compliance.
Identify treatment effectiveness patterns.
Optimize resource allocation for limited-budget healthcare.

🎓 Education & Academic Research

"A university in Bangladesh empowers 500+ students to perform advanced statistical analysis without expensive software licenses or cloud subscriptions."

Educational Benefits:

Free access to professional data analysis tools.
Learn by doing with real datasets and Jupyter notebook outputs.
Offline accessibility for remote and underserved learning environments.

🌱 Environmental & Climate Research

"Environmental scientists in the Amazon rainforest analyze deforestation patterns and climate data completely offline, ensuring sensitive location data remains secure."

Environmental Applications:

Climate change impact analysis.
Biodiversity research and conservation.
Environmental monitoring and reporting.

💼 Small Business Empowerment

"A family-owned restaurant chain uses the assistant to analyze sales patterns, optimize inventory, and improve customer satisfaction without hiring expensive consultants."

Business Intelligence:

Customer behavior analysis and sales forecasting.
Inventory management and marketing effectiveness.

⚡ Why Gemma 3n Makes This Possible

Technical Breakthrough Features:

🧠 Mobile-First Architecture: Designed specifically for edge devices.
🚀 Efficiency Revolution: Runs on devices with just 8GB of RAM.
🔒 Privacy by Design: No data is ever transmitted externally.
🌐 Universal Access: Works without internet connectivity.
💪 Professional Capabilities: Matches expensive cloud AI services.

Tip

Our implementation takes full advantage of Gemma 3n's efficiency. The core of our system is a sophisticated dual-agent pipeline that plans and executes tasks autonomously.

🚀 How It Works: The Dual-Agent Intelligence Pipeline

Our system uses two coordinated AI agents that work together in a three-stage process to deliver a seamless experience, from user request to final report.

Stage 1: User Interaction & Planning (app.py)
- ➡️ 🗂️ User Input: A user uploads a dataset and provides a natural language prompt.
- ➡️ 🧠 Planner Agent Activated: The system analyzes the request.
- ➡️ 📚 RAG Search: The Planner consults the knowledge_base.json to find relevant analysis steps.
- ➡️ 📝 Plan Proposal: The agent generates a high-level analysis plan for the user to review.
- ➡️ ✅ User Approval: The user approves the plan, which triggers the next stage.
Stage 2: Autonomous Execution & Self-Healing (coding_agent.py)
- ➡️ 👨‍💻 Executor Agent Activated: A background agent takes over and begins executing the approved plan task-by-task in a loop.
- ➡️ Inside the Execution Loop:
  - 1. Code Generation: The agent writes Python code for the current task.
  - 2. Notebook Execution: The code is run in a Jupyter Notebook cell.
  - 3. Error Check:
    - If Error 💥:
      - ➡️ Self-Healing: The agent uses its RAG system to search the knowledge_base.json for a solution, rewrites the code, and returns to Step 1 to try again.
    - If Success ✅:
      - ➡️ Append Results: The output is saved, and the agent moves to the next task in the plan.
- ➡️ The loop continues until all tasks in the plan are completed.
Stage 3: Final Report
- ➡️ 📋 Analysis Complete: Once the loop finishes, the Executor Agent generates a final, user-friendly summary of the findings.
- ➡️ 🎁 Deliverables: The user is presented with the summary and the complete, documented Jupyter Notebook.

📋 Prerequisites & System Requirements

Important

Your system must be 64-bit. An active internet connection is required only for the initial download of Ollama and the AI models. The application is 100% offline afterward.

Hardware Requirements:

Minimum Configuration (CPU Mode)

RAM: 8GB
Storage: 10GB free space
CPU: Any modern 64-bit processor

Recommended Configuration (GPU Acceleration)

GPU: NVIDIA GPU with 4GB+ VRAM (CUDA support)
RAM: 16GB
Storage: 10GB free space

Software Dependencies:

Python 3.9 - 3.11
Ollama
Streamlit
Jupyter, FAISS, Pandas, Scikit-learn, and other libraries listed in requirements.txt.

🚀 Quick Start Guide

Note

This is a condensed guide. For a comprehensive, step-by-step walkthrough, please see the Installation & Setup Instructions file.

Step 1: Install Ollama & Download Models

Important

The Ollama server must be running in the background for the assistant to function. The models are a one-time download.

# 1. Download and install Ollama from https://ollama.com

# 2. Start the Ollama service in a separate terminal
ollama serve

# 3. Download the required AI models
ollama pull gemma:2b
ollama pull nomic-embed-text

Step 2: Clone Repository & Install Dependencies

Tip

Using a Python virtual environment is highly recommended to avoid conflicts with other projects.

# Clone the project repository
git clone https://github.com/MarvelBoy047/Gemma_Ai_DataAssistant.git
cd Gemma_Ai_DataAssistant

# Create and activate a virtual environment
python -m venv venv
# On Windows: venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate

# Install all required Python packages
pip install -r requirements.txt

Step 3: Launch the Application

# Start the Streamlit web application
streamlit run app.py

# Your browser will automatically open to http://localhost:8501

🏗️ Project Architecture & File Structure

The project's intelligence is driven by a decoupled, dual-agent architecture that communicates through the file system, ensuring robustness and scalability.

Gemma_Ai_DataAssistant/
│
├── 🚀 app.py                 # Core Streamlit UI & "Planner Agent"
│
├── 🤖 coding_agent.py         # Autonomous "Executor Agent" & Self-Healing Logic
│
├── 📚 knowledge_base.json    # Shared RAG database for planning and code correction
│
├── 📦 Requirements.txt         # All Python package dependencies
│
├── 🧩 Installation setup Instructions.md  # nothing extra straight forward steps
|
├── 💡 README.md #explaining why the project even exists
|
├── 🗂️chat_history/        # Communication Bus: Stores user chat logs and approved plans
│   └── <session_id>.json
│
├── 🗂️agent_memory/        # Agent's Logbook: Detailed logs of the Executor's decisions & actions
│   └── <session_id>_memory.json
│
├── 🗂️notebooks/           # Final Deliverables: Stores generated .ipynb files and datasets
│   └── 🗂️<session_id>/
│       ├── <dataset_name>.csv
│       └── <session_id>.ipynb
│
└── planner_kb_index/    # Cached FAISS index for the Planner Agent's knowledge base

Key Component Roles:

app.py (The Planner & UI):
- Manages the user-facing Streamlit application.
- Takes the user's natural language request and uses a FAISS-powered RAG search on the knowledge_base.json to find relevant tasks.
- Asks Gemma to create a high-level analysis plan, which is then presented to the user for approval.
coding_agent.py (The Autonomous Executor):
- Runs in a background thread, constantly monitoring the chat_history directory for new tasks approved by the user.
- This agent is the workhorse: it generates, executes, and debugs Python code step-by-step.
- Features a multi-layered, self-healing mechanism that uses the knowledge_base.json to fix its own errors.
knowledge_base.json (The Shared Brain):
- A repository of over 100 trusted data science code patterns and task descriptions.
- It serves both the Planner Agent (for creating plans) and the Executor Agent (for RAG-based error correction).

Tip

The Asynchronous Communication Bus The app.py front-end and the coding_agent.py back-end are fully decoupled. They communicate asynchronously by writing and reading JSON files in the chat_history directory. This robust, file-based messaging system is what allows the agent to work on complex, long-running tasks in the background without freezing the UI.

✨ Key Features & Capabilities

🎯 Intelligent Analysis Planning

Context-Aware Strategy: Analyzes your dataset structure and creates optimal workflows.
Goal-Oriented Planning: Focuses on delivering actionable insights, not just charts.

💻 Advanced Code Generation

Production-Quality Python: Generates clean, documented, professional code.
Library Intelligence: Automatically selects appropriate tools (pandas, scikit-learn, etc.).

🔧 Self-Healing Error Recovery

Automatic Bug Detection: Identifies and diagnoses code execution failures.
RAG-Powered Solutions: Searches 100+ proven fixes for similar problems.
Context-Aware Repair: Considers your specific data when fixing errors.

🔒 Privacy & Security by Design

Warning

This is the most important feature. Your data and analysis never leave your computer. There is zero data transmission to any cloud service, making it safe for sensitive information.

🏆 Competitive Advantages

vs. Cloud AI Services (ChatGPT, Claude, etc.)

Feature	Gemma Data Assistant	Cloud AI Services
Privacy	✅ 100% Local	❌ Data sent to servers
Cost	✅ Free after setup	❌ $20-100+/month
Offline Access	✅ Works anywhere	❌ Requires internet
Speed	✅ No network latency	❌ Network dependent

vs. Traditional Data Science Tools (Tableau, SPSS)

Aspect	Our Solution	Traditional Tools
Learning Curve	✅ Natural language	❌ Months of training
Setup Time	✅ Under 15 minutes	❌ Days/weeks
Error Handling	✅ Automatic fixing	❌ Manual debugging
Documentation	✅ Auto-generated	❌ Manual writing

📈 Roadmap & Future Vision

Phase 1: Foundation (Current)

✅ Core offline analysis engine with self-healing RAG.
✅ Professional notebook generation.

Phase 2: Enhanced Intelligence (Q4 2025)

🔄 Advanced statistical methods and ML model recommendations.
🔄 Multi-modal analysis (text, images, audio).

Phase 3: Domain Specialization (Q1 2026)

🔮 Healthcare-specific analysis templates (HIPAA-compliant).
🔮 Financial compliance and reporting tools.

Note

Our vision is to create a powerful, adaptable platform that can be specialized for any industry, putting expert-level analysis in the hands of domain professionals everywhere.

🤝 Contributing & Community

We believe in the power of open source to drive global change.

How to Contribute:

Fork the repository.
Create a new feature branch (git checkout -b feature/amazing-improvement).
Commit your changes and push to your branch.
Submit a pull request with a detailed description of your improvement.

Tip

The easiest way to contribute is by expanding the knowledge_base.json file. Add a new code pattern, a fix for a common error, or a new analysis technique. This directly improves the agent's intelligence!

📄 License & Legal

Open Source License

This project is licensed under the MIT License. See the LICENSE file for details.

Privacy & Data Protection

Important

We do not collect, store, or transmit any of your data. All processing, analysis, and storage happens exclusively on your local machine. You retain 100% ownership and control of your information at all times.

🙏 Acknowledgments

Google Gemma Team: For creating the revolutionary Gemma model family.
Ollama Community: For making local AI accessible to everyone.
The Python Open Source Community: For the incredible libraries that power this tool.

📞 Contact & Support

GitHub Issues: For bugs and feature requests, please open an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.github/workflows		.github/workflows
assets		assets
Installation setup Instructions.md		Installation setup Instructions.md
README.md		README.md
app.py		app.py
coding_agent.py		coding_agent.py
knowledge_base.json		knowledge_base.json
requirements.txt		requirements.txt

MarvelBoy047/Gemma_Ai_DataAssistant

Folders and files

Latest commit

History

Repository files navigation

🤖 Gemma Data Assistant: Democratizing AI-Powered Data Science

🌍 The Global Impact Challenge We're Solving

The Problem: Data Science Inequality

🎯 Real-World Impact Scenarios

🏥 Healthcare & Medical Research

🎓 Education & Academic Research

🌱 Environmental & Climate Research

💼 Small Business Empowerment

⚡ Why Gemma 3n Makes This Possible

Technical Breakthrough Features:

🚀 How It Works: The Dual-Agent Intelligence Pipeline

📋 Prerequisites & System Requirements

Hardware Requirements:

Software Dependencies:

🚀 Quick Start Guide

Step 1: Install Ollama & Download Models

Step 2: Clone Repository & Install Dependencies

Step 3: Launch the Application

🏗️ Project Architecture & File Structure

Key Component Roles:

✨ Key Features & Capabilities

🎯 Intelligent Analysis Planning

💻 Advanced Code Generation

🔧 Self-Healing Error Recovery

🔒 Privacy & Security by Design

🏆 Competitive Advantages

vs. Cloud AI Services (ChatGPT, Claude, etc.)

vs. Traditional Data Science Tools (Tableau, SPSS)

📈 Roadmap & Future Vision

Phase 1: Foundation (Current)

Phase 2: Enhanced Intelligence (Q4 2025)

Phase 3: Domain Specialization (Q1 2026)

🤝 Contributing & Community

How to Contribute:

📄 License & Legal

Open Source License

Privacy & Data Protection

🙏 Acknowledgments

📞 Contact & Support

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages