Nexa AI Inc
Bay Area, United States

Stars
Unified framework for building enterprise RAG pipelines with small, specialized models
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs.
Run the latest LLMs and VLMs across GPU, NPU, and CPU, with PC (Python/C++) and mobile (Android & iOS) support, including OpenAI gpt-oss, Granite4, Qwen3VL, Gemma 3n, and more.
An MLIR-based toolchain for AMD AI Engine-enabled devices.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Lightweight, standalone C++ inference engine for Google's Gemma models.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++
Fully open reproduction of DeepSeek-R1
On-device AI across mobile, embedded and edge for PyTorch
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…
No-code CLI designed for accelerating ONNX workflows
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App: [MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
A framework for few-shot evaluation of language models.
🚀 Generate GitHub profile README easily with the latest add-ons like visitors count, GitHub stats, etc using minimal UI.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Swift library to work with llama and other large language models.
Run llama and other large language models offline on iOS and macOS using the GGML library.