- 100% Offline - Works without internet after initial setup
- Free Forever - No subscriptions, no API keys, no hidden costs
- No Compilation - Powered by Mozilla llamafile (just download and run)
- Private & Secure - Your data never leaves your device
- Android Native - Optimized for mobile with proot isolation
- Multiple Models - Choose from tiny (270MB) to powerful (2GB+)
- Web Dashboard - Browser-based UI for easy management
- REST API - Full control via HTTP endpoints
- OpenAI Compatible - Drop-in replacement for OpenAI API
```bash
git clone https://github.com/mithun50/PocketAi.git
cd PocketAi
./setup.sh

# Activate environment (or restart terminal)
source ~/.pocketai_env

# Install a model (Qwen3 recommended for 2025)
pai install qwen3

# Start chatting!
pai chat
```

| Model | Size | RAM | Quality | Best For |
|---|---|---|---|---|
| qwen3 | 400MB | 512MB | ⭐⭐⭐ | Best for low RAM |
| llama3.2 | 700MB | 1GB | ⭐⭐⭐⭐ | Best balance |
| llama3.2-3b | 2.0GB | 2GB | ⭐⭐⭐⭐⭐ | Best quality |
| Model | Size | RAM | Quality | Best For |
|---|---|---|---|---|
| smollm2 | 270MB | 400MB | ⭐⭐ | Ultra-low RAM |
| qwen2 | 400MB | 512MB | ⭐⭐⭐ | Low RAM |
| qwen2-1b | 1.0GB | 1.2GB | ⭐⭐⭐⭐ | Daily use |
| gemma2b | 1.4GB | 2GB | ⭐⭐⭐⭐ | Google quality |
| qwen2-3b | 2.0GB | 3GB | ⭐⭐⭐⭐⭐ | Best quality |
| phi2 | 1.6GB | 3GB | ⭐⭐⭐⭐ | Coding tasks |
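As a rough guide to the tables above, here is a toy helper (not part of PocketAI) that picks a model fitting in a given amount of free RAM, using the RAM column values ordered roughly by capability:

```python
# Toy helper (not part of PocketAI): pick a model from the tables above
# that fits in the RAM you have free. Numbers mirror the "RAM" column,
# in MB, ordered roughly from least to most capable.
MODELS = [
    ("smollm2", 400),
    ("qwen2", 512),
    ("qwen3", 512),
    ("llama3.2", 1024),
    ("qwen2-1b", 1200),
    ("gemma2b", 2048),
    ("llama3.2-3b", 2048),
    ("phi2", 3072),
    ("qwen2-3b", 3072),
]


def pick_model(free_ram_mb: int) -> str:
    """Return the most capable model whose RAM requirement fits,
    falling back to the smallest model if nothing fits."""
    fitting = [name for name, ram in MODELS if ram <= free_ram_mb]
    return fitting[-1] if fitting else "smollm2"
```

For example, `pick_model(512)` chooses `qwen3`, while a device with 4GB free can run `qwen2-3b`.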
```bash
pai chat                  # Interactive chat
pai ask "What is AI?"     # Quick question
pai complete "Once..."    # Text completion
```

```bash
pai models                # List available models
pai models installed      # List installed models
pai install <model>       # Download a model
pai use <model>           # Switch active model
pai remove <model>        # Delete a model
```

```bash
pai server start          # Start API server (port 8080)
pai server stop           # Stop the server
pai server status         # Show server info
```

Use with any OpenAI-compatible client:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

```bash
pai api start             # Start REST API (port 8081)
pai api web               # Start API + Web Dashboard
pai api stop              # Stop API server
pai api status            # Show API endpoints
```

Open http://localhost:8081/ in your browser for the web dashboard.
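The same chat request as the curl example above, sketched with only the Python standard library (assumes the llamafile server from `pai server start` is running on port 8080; response parsing follows the standard OpenAI chat-completions schema):

```python
import json
import urllib.request

# Build the same chat request the curl example sends.
def build_request(message: str) -> urllib.request.Request:
    payload = {"messages": [{"role": "user", "content": message}]}
    return urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def chat(message: str) -> str:
    """Send one message and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(message)) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape.
    return body["choices"][0]["message"]["content"]


# print(chat("Hello"))  # requires the server to be running
```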
API Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/status | System status |
| GET | /api/models | Available models |
| GET | /api/models/installed | Installed models |
| POST | /api/models/install | Install model |
| POST | /api/models/use | Switch model |
| POST | /api/chat | Send message |
| GET | /api/config | Get config |
| POST | /api/config | Set config |
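A minimal sketch of driving the REST API from Python. The endpoint paths come from the table above; the POST body field names are assumptions, since the payload schema isn't documented here (check `data/api_server.py` for the exact fields):

```python
import json
import urllib.request

BASE = "http://localhost:8081"  # default port for `pai api start` / `pai api web`


def build_call(method, path, payload=None):
    """Build a request for one of the endpoints in the table above.

    The JSON field names for POST bodies are assumptions -- check
    data/api_server.py for the exact schema.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)


# Examples (sending them requires the API server to be running):
health = build_call("GET", "/api/health")
chat = build_call("POST", "/api/chat", {"message": "Hello"})  # field name assumed
# urllib.request.urlopen(health).read()
```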
```bash
pai config                # Show current config
pai config set key val    # Change settings
pai config reset          # Reset to defaults
```

| Option | Default | Description |
|---|---|---|
| threads | 4 | CPU threads to use |
| ctx_size | 2048 | Context window size |
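For example (illustrative values; the valid keys are the options listed above):

```bash
pai config set threads 8      # more CPU threads for a faster device
pai config set ctx_size 4096  # larger context window (uses more RAM)
```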
```bash
pai status                # System information
pai doctor                # Diagnose issues
pai help                  # Show all commands
pai version               # Version info
```

```
pocketai/
├── bin/
│   └── pai                  # CLI entry point
├── core/
│   └── engine.sh            # Core engine (inference, models, API)
├── data/
│   ├── config               # User configuration
│   ├── llamafile            # LLM runtime engine
│   └── api_server.py        # REST API server
├── models/                  # Downloaded GGUF models
├── web/
│   └── index.html           # Web dashboard
├── docs/
│   ├── COMMANDS.md          # Command reference
│   ├── MODELS.md            # Model guide
│   └── TROUBLESHOOTING.md   # Problem solving
├── setup.sh                 # Installer
└── README.md
```
```
┌─────────────────────────────────────────────────────────────┐
│                        PocketAI                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────┐    ┌──────────┐    ┌─────────────────────┐     │
│  │   CLI   │───►│  Engine  │───►│   proot container   │     │
│  │  (pai)  │    │          │    │   (Alpine Linux)    │     │
│  └─────────┘    └──────────┘    └──────────┬──────────┘     │
│                                            │                │
│  ┌─────────┐    ┌──────────┐               ▼                │
│  │   Web   │───►│ REST API │          ┌──────────┐          │
│  │Dashboard│    │ (Python) │          │llamafile │          │
│  └─────────┘    └──────────┘          └────┬─────┘          │
│                                            │                │
│  ┌─────────┐                               ▼                │
│  │ OpenAI  │◄──────────────────── GGUF Model               │
│  │ Clients │                                                │
│  └─────────┘                                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
Components:
- pai CLI - User-friendly bash interface
- engine.sh - Core logic (model management, inference, API)
- api_server.py - REST API + Web dashboard server
- llamafile - Mozilla's portable LLM runtime
- proot - Lightweight Linux container for isolation
- GGUF models - Quantized models optimized for mobile
- Device: Android phone/tablet
- App: Termux from F-Droid
- Storage: 1GB+ free (varies by model)
- RAM: 512MB+ (more = better models)
```bash
pai doctor                # Diagnose all issues
```

| Issue | Solution |
|---|---|
| `pai: command not found` | Run `source ~/.pocketai_env` |
| No model active | Run `pai install qwen3` |
| Slow responses | Use a smaller model: `pai use smollm2` |
| Out of memory | Close other apps, use a smaller model |
| API offline | Run `pai api web`, not `pai api start` |
See TROUBLESHOOTING.md for more.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Mozilla llamafile - Portable LLM runtime
- Termux - Android terminal emulator
- proot-distro - Linux containers for Termux
- Model providers: Qwen, Meta (Llama), HuggingFace, Google, Microsoft
- Author: Mithun
- GitHub: @mithun50
- Issues: GitHub Issues
Star this repo if you find it useful!
Made with love for the Android AI community