A private, offline AI agent powered by Google's Gemma and Ollama that autonomously generates complete data science Jupyter notebooks from natural language prompts.

MarvelBoy047/Gemma_Ai_DataAssistant

🤖 Gemma Data Assistant: Democratizing AI-Powered Data Science

Your Personal, Private, and Powerful AI Data Scientist - Built for Impact

Gemma 3n · Offline First · Open Source

"What if every person, regardless of their technical background, could have access to a world-class data scientist that works entirely on their device, never sharing their private information, and costs nothing to use?"

Gemma Data Assistant transforms this vision into reality using Google's groundbreaking Gemma 3n model, creating the world's first truly autonomous, offline-first data science assistant that democratizes access to advanced analytics while preserving absolute privacy.


๐ŸŒ The Global Impact Challenge We're Solving

The Problem: Data Science Inequality

  • 98% of small businesses lack access to professional data analysis capabilities.
  • Medical researchers in developing countries can't afford expensive cloud AI services.
  • Students and educators are blocked by subscription fees and internet requirements.
  • Privacy-sensitive organizations (healthcare, finance, government) can't use cloud-based AI tools.
  • Rural areas with limited internet are excluded from the AI revolution.

Note

Our mission is to eliminate these barriers with a tool that is zero-cost, fully offline, and private by design: data never leaves the user's device.


🎯 Real-World Impact Scenarios

🏥 Healthcare & Medical Research

"A rural clinic in Kenya uses Gemma Data Assistant to analyze patient outcomes and optimize treatment protocols without sending sensitive health data to external servers."

Impact Features:

  • Analyze patient data entirely on-device, so records are never sent to third parties (which helps with HIPAA compliance).
  • Identify treatment effectiveness patterns.
  • Optimize resource allocation for limited-budget healthcare.

🎓 Education & Academic Research

"A university in Bangladesh empowers 500+ students to perform advanced statistical analysis without expensive software licenses or cloud subscriptions."

Educational Benefits:

  • Free access to professional data analysis tools.
  • Learn by doing with real datasets and Jupyter notebook outputs.
  • Offline accessibility for remote and underserved learning environments.

🌱 Environmental & Climate Research

"Environmental scientists in the Amazon rainforest analyze deforestation patterns and climate data completely offline, ensuring sensitive location data remains secure."

Environmental Applications:

  • Climate change impact analysis.
  • Biodiversity research and conservation.
  • Environmental monitoring and reporting.

💼 Small Business Empowerment

"A family-owned restaurant chain uses the assistant to analyze sales patterns, optimize inventory, and improve customer satisfaction without hiring expensive consultants."

Business Intelligence:

  • Customer behavior analysis and sales forecasting.
  • Inventory management and marketing effectiveness.

⚡ Why Gemma 3n Makes This Possible

Technical Breakthrough Features:

  • 🧠 Mobile-First Architecture: Designed specifically for edge devices.
  • 🚀 Efficiency Revolution: Runs on devices with just 8GB of RAM.
  • 🔒 Privacy by Design: No data is ever transmitted externally.
  • 🌍 Universal Access: Works without internet connectivity.
  • 💪 Professional Capabilities: Matches expensive cloud AI services.

Tip

Our implementation takes full advantage of Gemma 3n's efficiency. The core of our system is a sophisticated dual-agent pipeline that plans and executes tasks autonomously.


🚀 How It Works: The Dual-Agent Intelligence Pipeline

Our system uses two coordinated AI agents that work together in a three-stage process to deliver a seamless experience, from user request to final report.

  • Stage 1: User Interaction & Planning (app.py)

    • ➡️ 🗂️ User Input: A user uploads a dataset and provides a natural language prompt.
    • ➡️ 🧠 Planner Agent Activated: The system analyzes the request.
    • ➡️ 📚 RAG Search: The Planner consults the knowledge_base.json to find relevant analysis steps.
    • ➡️ 📝 Plan Proposal: The agent generates a high-level analysis plan for the user to review.
    • ➡️ ✅ User Approval: The user approves the plan, which triggers the next stage.
  • Stage 2: Autonomous Execution & Self-Healing (coding_agent.py)

    • ➡️ 👨‍💻 Executor Agent Activated: A background agent takes over and begins executing the approved plan task-by-task in a loop.
    • ➡️ Inside the Execution Loop:
      • 1. Code Generation: The agent writes Python code for the current task.
      • 2. Notebook Execution: The code is run in a Jupyter Notebook cell.
      • 3. Error Check:
        • If Error 💥:
          • ➡️ Self-Healing: The agent uses its RAG system to search the knowledge_base.json for a solution, rewrites the code, and returns to Step 1 to try again.
        • If Success ✅:
          • ➡️ Append Results: The output is saved, and the agent moves to the next task in the plan.
    • ➡️ The loop continues until all tasks in the plan are completed.
  • Stage 3: Final Report

    • ➡️ 📋 Analysis Complete: Once the loop finishes, the Executor Agent generates a final, user-friendly summary of the findings.
    • ➡️ 🎁 Deliverables: The user is presented with the summary and the complete, documented Jupyter Notebook.
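
The generate, execute, and retry loop from Stage 2 can be illustrated with a minimal sketch. This is not the code in coding_agent.py: `run_plan`, `fake_generator`, and `max_attempts` are hypothetical names, `exec` stands in for running a Jupyter cell, and a hard-coded function stands in for the Gemma call and the RAG lookup.

```python
import traceback
from typing import Callable, Dict, List, Optional


def run_plan(
    tasks: List[str],
    generate_code: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> List[Dict[str, object]]:
    """Run each task; on failure, pass the error back to the generator and retry."""
    namespace: Dict[str, object] = {}  # shared state, similar to a notebook kernel
    results = []
    for task in tasks:
        error: Optional[str] = None
        succeeded = False
        attempts = 0
        for attempt in range(1, max_attempts + 1):
            attempts = attempt
            code = generate_code(task, error)  # error is None on the first attempt
            try:
                exec(code, namespace)  # stand-in for executing a notebook cell
                succeeded = True
                break
            except Exception:
                # Keep a short traceback so the next attempt can react to it.
                error = traceback.format_exc(limit=1)
        results.append({"task": task, "ok": succeeded, "attempts": attempts})
    return results


def fake_generator(task: str, error: Optional[str]) -> str:
    """Stand-in for the model: buggy code first, a corrected version after an error."""
    if task == "compute mean" and error is None:
        return "mean = total / count"  # fails: total and count are undefined
    if task == "compute mean":
        return "total, count = 10, 4\nmean = total / count"
    return "loaded = True"


report = run_plan(["load data", "compute mean"], fake_generator)
print(report)
```

Capping retries with `max_attempts` is a design choice of this sketch; without a cap, a task the model cannot fix would loop forever.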

📋 Prerequisites & System Requirements

Important

Your system must be 64-bit. An active internet connection is required only for the initial download of Ollama and the AI models. The application is 100% offline afterward.

Hardware Requirements:

Minimum Configuration (CPU Mode)

  • RAM: 8GB
  • Storage: 10GB free space
  • CPU: Any modern 64-bit processor

Recommended Configuration (GPU Acceleration)

  • GPU: NVIDIA GPU with 4GB+ VRAM (CUDA support)
  • RAM: 16GB
  • Storage: 10GB free space

Software Dependencies:

  • Python 3.9 - 3.11
  • Ollama
  • Streamlit
  • Jupyter, FAISS, Pandas, Scikit-learn, and other libraries listed in requirements.txt.
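
Before installing, it can help to confirm that the interpreter falls in the supported range and to see which packages are still missing. The sketch below uses only the standard library; the helper names are hypothetical, and the import names in the example call (`faiss`, `sklearn`) are the usual import names for FAISS and scikit-learn, assumed here rather than taken from requirements.txt.

```python
import importlib.util
import sys

SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def python_version_supported(version=None) -> bool:
    """True if the (major, minor) version lies within Python 3.9 to 3.11."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_version_supported())
print("Missing:", missing_packages(["streamlit", "pandas", "faiss", "sklearn"]))
```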

🚀 Quick Start Guide

Note

This is a condensed guide. For a comprehensive, step-by-step walkthrough, please see the Installation & Setup Instructions file.

Step 1: Install Ollama & Download Models

Important

The Ollama server must be running in the background for the assistant to function. The models are a one-time download.

# 1. Download and install Ollama from https://ollama.com

# 2. Start the Ollama service in a separate terminal
ollama serve

# 3. Download the required AI models
ollama pull gemma:2b
ollama pull nomic-embed-text
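
To confirm that the server is up and both models were pulled, a small check like the following may be useful. It is a sketch rather than part of the project: it assumes Ollama is listening on its default address (http://localhost:11434) and queries the `/api/tags` endpoint, which lists locally installed models. The function name `installed_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed)


def installed_models(base_url: str = OLLAMA_URL, timeout: float = 2.0):
    """Return the names of locally installed models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


models = installed_models()
if models is None:
    print("Ollama is not reachable; start it with `ollama serve`.")
else:
    for required in ("gemma:2b", "nomic-embed-text"):
        # Installed names may carry a tag suffix such as ":latest".
        found = any(name.startswith(required) for name in models)
        print(f"{required}: {'ok' if found else 'missing'}")
```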

Step 2: Clone Repository & Install Dependencies

Tip

Using a Python virtual environment is highly recommended to avoid conflicts with other projects.

# Clone the project repository
git clone https://github.com/MarvelBoy047/Gemma_Ai_DataAssistant.git
cd Gemma_Ai_DataAssistant

# Create and activate a virtual environment
python -m venv venv
# On Windows: venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate

# Install all required Python packages
pip install -r requirements.txt

Step 3: Launch the Application

# Start the Streamlit web application
streamlit run app.py

# Your browser will automatically open to http://localhost:8501

๐Ÿ—๏ธ Project Architecture & File Structure

The project's intelligence is driven by a decoupled, dual-agent architecture that communicates through the file system, ensuring robustness and scalability.

Gemma_Ai_DataAssistant/
│
├── 🚀 app.py                 # Core Streamlit UI & "Planner Agent"
│
├── 🤖 coding_agent.py         # Autonomous "Executor Agent" & Self-Healing Logic
│
├── 📚 knowledge_base.json    # Shared RAG database for planning and code correction
│
├── 📦 Requirements.txt         # All Python package dependencies
│
├── 🧩 Installation setup Instructions.md  # Straightforward step-by-step setup guide
│
├── 💡 README.md              # Project overview and motivation
│
├── 🗂️ chat_history/        # Communication Bus: Stores user chat logs and approved plans
│   └── <session_id>.json
│
├── 🗂️ agent_memory/        # Agent's Logbook: Detailed logs of the Executor's decisions & actions
│   └── <session_id>_memory.json
│
├── 🗂️ notebooks/           # Final Deliverables: Stores generated .ipynb files and datasets
│   └── 🗂️ <session_id>/
│       ├── <dataset_name>.csv
│       └── <session_id>.ipynb
│
└── planner_kb_index/    # Cached FAISS index for the Planner Agent's knowledge base
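
The per-session layout above can be sketched as a small path helper plus a minimal notebook writer. This is illustrative only: `session_paths` and `write_notebook` are hypothetical names, the session ID is made up, and the notebook is written as plain nbformat 4 JSON with the standard library instead of whatever notebook library the project actually uses.

```python
import json
import tempfile
from pathlib import Path


def session_paths(root: Path, session_id: str) -> dict:
    """Map a session ID to the three per-session files shown in the tree."""
    return {
        "chat": root / "chat_history" / f"{session_id}.json",
        "memory": root / "agent_memory" / f"{session_id}_memory.json",
        "notebook": root / "notebooks" / session_id / f"{session_id}.ipynb",
    }


def write_notebook(path: Path, cells: list) -> None:
    """Write (cell_type, source) pairs as a minimal nbformat 4 notebook."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally require these two fields.
            cell.update({"execution_count": None, "outputs": []})
        nb_cells.append(cell)
    notebook = {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    paths = session_paths(Path(tmp), "demo_session")  # hypothetical session ID
    write_notebook(paths["notebook"], [
        ("markdown", "# Sales analysis"),
        ("code", "import pandas as pd"),
    ])
    saved = json.loads(paths["notebook"].read_text(encoding="utf-8"))
    relative = paths["notebook"].relative_to(tmp).as_posix()
    print(relative, "with", len(saved["cells"]), "cells")
```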

Key Component Roles:

  • app.py (The Planner & UI):

    • Manages the user-facing Streamlit application.
    • Takes the user's natural language request and uses a FAISS-powered RAG search on the knowledge_base.json to find relevant tasks.
    • Asks Gemma to create a high-level analysis plan, which is then presented to the user for approval.
  • coding_agent.py (The Autonomous Executor):

    • Runs in a background thread, constantly monitoring the chat_history directory for new tasks approved by the user.
    • This agent is the workhorse: it generates, executes, and debugs Python code step-by-step.
    • Features a multi-layered, self-healing mechanism that uses the knowledge_base.json to fix its own errors.
  • knowledge_base.json (The Shared Brain):

    • A repository of over 100 trusted data science code patterns and task descriptions.
    • It serves both the Planner Agent (for creating plans) and the Executor Agent (for RAG-based error correction).
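
To make the retrieval step concrete, here is a toy lookup over a handful of knowledge-base entries. It deliberately swaps in a bag-of-words cosine similarity so that it runs with the standard library alone; the project itself uses a FAISS index over embeddings. The entry fields (`description`, `code`) and the sample entries are hypothetical, not the real schema of knowledge_base.json.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, entries: list, k: int = 1) -> list:
    """Return the k entries whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(entries, key=lambda e: cosine(q, embed(e["description"])), reverse=True)
    return ranked[:k]


# Hypothetical entries for illustration only.
KB = [
    {"description": "handle missing values with median imputation",
     "code": "df = df.fillna(df.median(numeric_only=True))"},
    {"description": "plot correlation heatmap of numeric columns",
     "code": "corr = df.corr(numeric_only=True)"},
    {"description": "fix KeyError when a column name is not found",
     "code": "df.columns = df.columns.str.strip()"},
]

best = retrieve("KeyError: column 'price ' not found", KB)[0]
print(best["code"])
```

The same `retrieve` call could serve both roles described above: matching a user request to analysis tasks, and matching an error message to a known fix.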

Tip

The Asynchronous Communication Bus: The app.py front-end and the coding_agent.py back-end are fully decoupled. They communicate asynchronously by writing and reading JSON files in the chat_history directory. This robust, file-based messaging system is what allows the agent to work on complex, long-running tasks in the background without freezing the UI.
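
A file-based bus of this kind might look roughly like the sketch below. The message fields (`status`, `plan`), the status values, and the function names are assumptions made for illustration; the actual JSON layout in chat_history may differ. Writing to a temporary file and renaming it with `os.replace` means a reader never sees a half-written message.

```python
import json
import os
import tempfile
from pathlib import Path


def _write_json(path: Path, data: dict) -> None:
    """Write JSON via a temporary file and an atomic rename."""
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    os.replace(tmp, path)


def publish_plan(chat_dir: Path, session_id: str, plan: list) -> Path:
    """Planner side: record a user-approved plan for the executor to pick up."""
    chat_dir.mkdir(parents=True, exist_ok=True)
    target = chat_dir / f"{session_id}.json"
    _write_json(target, {"status": "approved", "plan": plan})
    return target


def claim_approved(chat_dir: Path) -> list:
    """Executor side: one polling pass that claims every newly approved plan."""
    claimed = []
    for path in sorted(chat_dir.glob("*.json")):
        message = json.loads(path.read_text(encoding="utf-8"))
        if message.get("status") == "approved":
            message["status"] = "running"  # mark as claimed so it is not picked up twice
            _write_json(path, message)
            claimed.append((path.stem, message["plan"]))
    return claimed


with tempfile.TemporaryDirectory() as tmp:
    chat_dir = Path(tmp) / "chat_history"
    publish_plan(chat_dir, "s1", ["load data", "plot sales"])
    first = claim_approved(chat_dir)
    second = claim_approved(chat_dir)
    print(first, second)
```

In a background thread, `claim_approved` would be called in a loop with a short sleep between passes.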


✨ Key Features & Capabilities

🎯 Intelligent Analysis Planning

  • Context-Aware Strategy: Analyzes your dataset structure and creates optimal workflows.
  • Goal-Oriented Planning: Focuses on delivering actionable insights, not just charts.

💻 Advanced Code Generation

  • Production-Quality Python: Generates clean, documented, professional code.
  • Library Intelligence: Automatically selects appropriate tools (pandas, scikit-learn, etc.).

🔧 Self-Healing Error Recovery

  • Automatic Bug Detection: Identifies and diagnoses code execution failures.
  • RAG-Powered Solutions: Searches 100+ proven fixes for similar problems.
  • Context-Aware Repair: Considers your specific data when fixing errors.

🔒 Privacy & Security by Design

Warning

This is the most important feature. Your data and analysis never leave your computer. There is zero data transmission to any cloud service, making it safe for sensitive information.


๐Ÿ† Competitive Advantages

vs. Cloud AI Services (ChatGPT, Claude, etc.)

| Feature | Gemma Data Assistant | Cloud AI Services |
|---|---|---|
| Privacy | ✅ 100% Local | ❌ Data sent to servers |
| Cost | ✅ Free after setup | ❌ $20-100+/month |
| Offline Access | ✅ Works anywhere | ❌ Requires internet |
| Speed | ✅ No network latency | ❌ Network dependent |

vs. Traditional Data Science Tools (Tableau, SPSS)

| Aspect | Our Solution | Traditional Tools |
|---|---|---|
| Learning Curve | ✅ Natural language | ❌ Months of training |
| Setup Time | ✅ Under 15 minutes | ❌ Days/weeks |
| Error Handling | ✅ Automatic fixing | ❌ Manual debugging |
| Documentation | ✅ Auto-generated | ❌ Manual writing |

📈 Roadmap & Future Vision

Phase 1: Foundation (Current)

  • ✅ Core offline analysis engine with self-healing RAG.
  • ✅ Professional notebook generation.

Phase 2: Enhanced Intelligence (Q4 2025)

  • 🔄 Advanced statistical methods and ML model recommendations.
  • 🔄 Multi-modal analysis (text, images, audio).

Phase 3: Domain Specialization (Q1 2026)

  • 🔮 Healthcare-specific analysis templates (HIPAA-compliant).
  • 🔮 Financial compliance and reporting tools.

Note

Our vision is to create a powerful, adaptable platform that can be specialized for any industry, putting expert-level analysis in the hands of domain professionals everywhere.


๐Ÿค Contributing & Community

We believe in the power of open source to drive global change.

How to Contribute:

  1. Fork the repository.
  2. Create a new feature branch (git checkout -b feature/amazing-improvement).
  3. Commit your changes and push to your branch.
  4. Submit a pull request with a detailed description of your improvement.

Tip

The easiest way to contribute is by expanding the knowledge_base.json file. Add a new code pattern, a fix for a common error, or a new analysis technique. This directly improves the agent's intelligence!
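
Before opening a pull request with a new entry, a quick validation pass can catch obvious mistakes. The sketch below is hypothetical throughout: it assumes the knowledge base is a JSON list of objects with `task`, `description`, and `code` fields, which may not match the real knowledge_base.json, so check the existing entries and adapt the field names first.

```python
import json

REQUIRED_FIELDS = ("task", "description", "code")  # hypothetical schema


def add_entry(kb_text: str, entry: dict) -> str:
    """Validate an entry and return the updated knowledge-base JSON text."""
    missing = [field for field in REQUIRED_FIELDS if not str(entry.get(field, "")).strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # compile() raises SyntaxError if the snippet is not valid Python.
    compile(entry["code"], "<kb-entry>", "exec")
    entries = json.loads(kb_text)
    if any(existing.get("task") == entry["task"] for existing in entries):
        raise ValueError(f"duplicate task: {entry['task']}")
    entries.append(entry)
    return json.dumps(entries, indent=2)


new_entry = {
    "task": "strip_column_whitespace",
    "description": "Remove stray whitespace from column names before selecting columns.",
    "code": "df.columns = df.columns.str.strip()",
}
updated = add_entry("[]", new_entry)
print(updated)
```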


📄 License & Legal

Open Source License

This project is licensed under the MIT License. See the LICENSE file for details.

Privacy & Data Protection

Important

We do not collect, store, or transmit any of your data. All processing, analysis, and storage happens exclusively on your local machine. You retain 100% ownership and control of your information at all times.


๐Ÿ™ Acknowledgments

  • Google Gemma Team: For creating the revolutionary Gemma model family.
  • Ollama Community: For making local AI accessible to everyone.
  • The Python Open Source Community: For the incredible libraries that power this tool.

📞 Contact & Support

  • GitHub Issues: For bugs and feature requests, please open an issue.
