Nexa AI Inc · Bay Area, United States
Stars
Unified framework for building enterprise RAG pipelines with small, specialized models
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
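The EAGLE papers are speculative decoding methods: a lightweight draft head proposes several tokens, and the full model verifies them in one pass. Below is a conceptual sketch of that draft-and-verify loop, not the EAGLE API; the toy models and the greedy acceptance rule are simplifications of the paper's rejection-sampling scheme.

```python
# Conceptual speculative-decoding sketch (NOT the EAGLE API). The toy model
# below stands in for both a lightweight draft head and the full target LLM.

class ToyModel:
    def __init__(self, step):
        self.step = step

    def next_token(self, tokens):
        # Deterministic dummy rule standing in for a real forward pass.
        return (tokens[-1] + self.step) % 100

def speculative_step(draft, target, tokens, k=4):
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal = list(tokens)
    for _ in range(k):
        proposal.append(draft.next_token(proposal))
    # 2. Keep the longest prefix the target model agrees with (greedy
    #    acceptance; the papers use a rejection-sampling variant).
    accepted = list(tokens)
    for tok in proposal[len(tokens):]:
        if target.next_token(accepted) != tok:
            break
        accepted.append(tok)
    # 3. The target always contributes at least one verified token.
    accepted.append(target.next_token(accepted))
    return accepted

print(speculative_step(ToyModel(1), ToyModel(1), [7]))  # all drafts accepted
```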
Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs.
Run the latest LLMs and VLMs across GPU, NPU, and CPU, with PC (Python/C++) and mobile (Android & iOS) support, including OpenAI gpt-oss, Granite4, Qwen3VL, Gemma 3n, and more.
An MLIR-based toolchain for AMD AI Engine-enabled devices.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
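A minimal sketch of driving browser-use from Python, assuming the Agent-plus-LangChain-chat-model constructor shown in its README; the task string and model name are illustrative.

```python
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # The agent plans and executes browser actions using the given LLM.
    agent = Agent(
        task="Find the current top story on Hacker News and return its title",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()

asyncio.run(main())
```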
A scalable generative AI framework built for researchers and developers working on Large Language Models, multimodal models, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
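A short sketch of the Speech AI side, assuming NeMo's documented from_pretrained/transcribe API for its ASR collection; the checkpoint name and audio path are illustrative.

```python
import nemo.collections.asr as nemo_asr

# Download a pretrained ASR checkpoint from the NGC/Hugging Face catalog.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_conformer_ctc_small"
)
# transcribe() takes a list of audio file paths and returns hypotheses.
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```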
Lightweight, standalone C++ inference engine for Google's Gemma models.
chraac / llama.cpp
Forked from ggml-org/llama.cpp. LLM inference in C/C++
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
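A minimal fine-tuning setup sketch, assuming Unsloth's documented FastLanguageModel API; the model name and LoRA hyperparameters are illustrative.

```python
from unsloth import FastLanguageModel

# Load the base model with 4-bit quantized weights to cut VRAM use.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
```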
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++
Fully open reproduction of DeepSeek-R1
On-device AI across mobile, embedded and edge for PyTorch
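A rough export sketch, assuming the torch.export → to_edge → .pte flow from ExecuTorch's documentation; the tiny module here is a stand-in for a real model.

```python
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# Capture the model graph, lower it to the Edge dialect, and serialize
# a .pte file that the on-device ExecuTorch runtime can load.
exported = torch.export.export(TinyModel().eval(), (torch.randn(1, 8),))
edge = to_edge(exported)
with open("tiny_model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```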
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…
No-code CLI designed for accelerating ONNX workflows
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android app: [MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
A framework for few-shot evaluation of language models.
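A short evaluation sketch, assuming lm-eval's simple_evaluate entry point; the model, task, and example limit are illustrative choices for a quick smoke test.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    limit=32,    # evaluate only a few examples per task
)
print(results["results"]["hellaswag"])
```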
🚀 Generate GitHub profile README easily with the latest add-ons like visitors count, GitHub stats, etc using minimal UI.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Swift library to work with llama and other large language models.