Exploring AI
  • Nexa AI Inc
  • Bay Area, United States


Unified framework for building enterprise RAG pipelines with small, specialized models

Python 14,457 2,973 Updated Jul 24, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,026 226 Updated Nov 23, 2025

Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs.

C++ 479 22 Updated Dec 1, 2025

Run the latest LLMs and VLMs across GPU, NPU, and CPU with PC (Python/C++) & mobile (Android & iOS) support, running quickly with OpenAI gpt-oss, Granite4, Qwen3VL, Gemma 3n and more.

Go 6,078 791 Updated Dec 1, 2025

An MLIR-based toolchain for AMD AI Engine-enabled devices.

MLIR 531 161 Updated Nov 26, 2025

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Python 73,123 8,727 Updated Nov 30, 2025

MLX: An array framework for Apple silicon

C++ 22,940 1,410 Updated Nov 27, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,203 3,213 Updated Nov 30, 2025

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,628 576 Updated Nov 28, 2025

LLM inference in C/C++

C++ 47 5 Updated Nov 30, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,828 4,028 Updated Dec 1, 2025

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

JavaScript 51,699 5,522 Updated Nov 27, 2025

Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++

C++ 4,632 448 Updated Nov 30, 2025

Fully open reproduction of DeepSeek-R1

Python 25,702 2,402 Updated Nov 24, 2025

Tensor library for machine learning

C++ 13,642 1,412 Updated Nov 24, 2025

LLM training in simple, raw C/CUDA

Cuda 28,294 3,305 Updated Jun 26, 2025

Vulkan tutorial in Chinese | Self-written | Object-oriented

C++ 191 17 Updated Oct 5, 2025

On-device AI across mobile, embedded and edge for PyTorch

Python 3,601 741 Updated Nov 28, 2025

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…

C++ 2,392 180 Updated Nov 9, 2025

No-code CLI designed for accelerating ONNX workflows

Python 217 30 Updated Jun 11, 2025

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 13,606 2,116 Updated Nov 20, 2025

A framework for few-shot evaluation of language models.

Python 10,789 2,881 Updated Nov 27, 2025

Interface for OuteTTS models.

Python 1,412 114 Updated Jun 21, 2025

🚀 Generate GitHub profile README easily with the latest add-ons like visitors count, GitHub stats, etc using minimal UI.

TypeScript 23,797 8,158 Updated Oct 28, 2025

Python bindings for llama.cpp

Python 9,790 1,256 Updated Aug 15, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 63,326 7,654 Updated Nov 30, 2025

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

C++ 1,555 122 Updated Mar 23, 2025

Swift library to work with llama and other large language models.

C++ 275 55 Updated Sep 19, 2025

⚠️ DirectML is in maintenance mode ⚠️ DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tas…

C++ 2,534 327 Updated Sep 23, 2025