Nagharjun17

Pinned

  1. MCP-Ollama-Client (Public)

    Lightweight MCP client that uses a local Ollama LLM to query multiple MCP servers defined in config.json.

    Python · 7 stars · 2 forks

  2. CUDA-Custom-Kernels (Public)

    Contains my CUDA kernel implementations and benchmarks, such as tiled matrix multiplication, written for learning purposes.

    Cuda

  3. ECE-GY-9143---High-Performance-Machine-Learning (Public)

    Contains lab and project work for the course ECE-GY 9143: High Performance Machine Learning.

    Python · 3 stars · 1 fork

  4. Flash-Attention-Triton (Public)

    Contains the code for a Flash Attention implementation written in Triton.

    Python

  5. MLIR-to-PTX-CUDA (Public)

    Creates an MLIR dialect that fuses addition and ReLU, lowers it to NVVM and LLVM IR, and generates PTX to run the kernel on a CUDA GPU.

    C++

  6. Multimodal-Architecture-Optimisation-on-RTX3060-using-TVM (Public)

    Contains the code for optimizing a vision-to-text model for an RTX 3060 target device using Apache TVM.

    Python