This project demonstrates how to integrate LiteLLM with a FastAPI application to serve a language model locally via Ollama (originally the Gemma model from HuggingFace). It provides an API endpoint to send prompts to the model and receive responses.
- FastAPI for creating the API.
- LiteLLM for interacting with various LLMs.
- A language model served locally via Ollama (previously the Gemma model from HuggingFace).
- Unit tests for all functionality.
**Note on Ollama Setup:** This project now uses Ollama to serve the LLM locally. You need to have Ollama installed and running, and the desired model (e.g., `qwen2.5:3b`, as configured in `app/main.py`) pulled. You can change the model in `app/main.py`; ensure you have pulled it via `ollama pull <model_name>`. Visit https://ollama.com for installation instructions.
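For example, with Ollama installed, pulling and verifying the default model described above might look like this (model name taken from the note; substitute your own if you change `app/main.py`):

```shell
# Pull the model the app expects (default: qwen2.5:3b):
ollama pull qwen2.5:3b

# Confirm the model is available locally:
ollama list
```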
```
.
├── .env.example        # Example environment variables
├── .gitignore
├── app
│   ├── __init__.py
│   └── main.py
├── tests
│   ├── __init__.py
│   └── test_main.py
├── requirements.txt
└── README.md
```
This project uses a `.env` file to manage configurations, primarily the Ollama model name.

- Create a `.env` file by copying the example file: `cp .env.example .env`
- Edit `.env`: open the `.env` file and set your desired `OLLAMA_MODEL_NAME`. Ensure the model you specify is available in your local Ollama instance:

  ```
  OLLAMA_MODEL_NAME="ollama/your-chosen-model:tag"
  ```

If `OLLAMA_MODEL_NAME` is not set in the `.env` file or as an environment variable, the application will default to `ollama/qwen2.5:3b`.
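As a sketch of how that fallback could be implemented (using only the standard-library `os` module; the actual code in `app/main.py` may differ):

```python
import os

# Default model used when OLLAMA_MODEL_NAME is not set,
# matching the default stated above.
DEFAULT_MODEL = "ollama/qwen2.5:3b"


def get_model_name() -> str:
    """Return the model from the OLLAMA_MODEL_NAME environment
    variable, falling back to DEFAULT_MODEL when it is absent."""
    return os.getenv("OLLAMA_MODEL_NAME", DEFAULT_MODEL)
```

Loading the `.env` file itself is typically handled by a helper such as `python-dotenv` before this lookup runs.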
## Installation

(Instructions to be added once the project is further developed)

## Running the Application

(Instructions to be added once the project is further developed)
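While formal run instructions are pending, a typical local invocation for a FastAPI app laid out like this one might be (the endpoint path in the request is an assumption; check `app/main.py` for the actual route):

```shell
# Start the API with uvicorn (Ollama must already be running):
uvicorn app.main:app --reload --port 8000

# Example request against an assumed /generate endpoint:
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'
```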
## Running Tests
(Instructions to be added once the project is further developed)
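Until those instructions land, the test suite in `tests/` can typically be run with pytest (assuming it is included in `requirements.txt`):

```shell
# From the project root, with dependencies installed:
pytest tests/
```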