The best open-source data science agent I've tried so far: Data Copilot — it can build an entire notebook workflow from a single prompt.

If you've worked in data science, you know how most AI coding tools fall short when it comes to Jupyter Notebooks. They don't handle the notebook structure well — no context, no new cells, no real understanding of the data flow.

Data Copilot changes that. It feels like Cursor, but built for data scientists. I just drop it into my Jupyter environment, and it picks up the context of my files and datasets automatically.

It's open source — install it in seconds: pip install mito-ai mitosheet

Here's what I've seen it do:
🔹 Build a full machine learning notebook from a single prompt, including data importing, data cleaning, model training, and testing
🔹 Take a notebook and swap all of the Matplotlib code for Plotly code
🔹 Automatically catch errors and debug them

We've seen great AI tools for software developers. Data Copilot is one of the first tools that is excellent for data science workflows.

Key features:
🔹 An AI agent for full notebook creation and editing
🔹 An AI chat for editing specific cells
🔹 Automatic error debugging from the AI
🔹 Visual edits for DataFrames and charts

📍 Docs here: https://lnkd.in/gSJEMshP

#productivity #datascience #machinelearning #aitools #opensource
Open Source Tools for Developers
-
🧠 12 open-source GenAI tools that actually deliver (and scale)

Not every tool with a GitHub repo deserves your trust. These ones do. 👉 If you're building real GenAI systems, not just demos, save this list. I grouped them into Build, Orchestrate, and Monitor so you know when to use what.

GenAI AgentOS (NEW)
📎 Agent registry → memory handoff → orchestration layer → HITL toggle
✅ Focused on production reliability and audit trails
⭐ https://lnkd.in/gyzMnnjw

🔧 BUILD – For devs building GenAI-powered apps

LangChain – The Swiss army knife for chains, RAG, agents, and tools.
⭐ 70k+ stars | https://lnkd.in/gun-rmdj

LlamaIndex – Clean integration layer between LLMs and your data. Great for structured docs + flexible vector backends.
⭐ 30k+ stars | https://lnkd.in/gW-iBKR2

Flowise – Drag-and-drop LLM orchestration (perfect for demos & MVPs). UI-first: deploy fast, iterate even faster.
⭐ 19k+ stars | https://lnkd.in/gA8J3Tr5

Embedchain – Minimalist RAG framework that just works. Perfect if you're tired of config overkill.
⭐ 8.5k+ stars | https://lnkd.in/g8DnHQg2

RAGFlow – Open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
🔁 ORCHESTRATE – For managing agents, workflows & system logic

LangGraph – Declarative, stateful agent workflows built on top of LangChain. Role-based agents + memory + edge control.
⭐ 2.5k+ stars | https://lnkd.in/gveKVfE4

Superagent – Plug-and-play LLM agent framework. API + UI; works with OpenAI, Claude, Mistral.
⭐ 5.5k+ stars | https://lnkd.in/gtsy5CQ3

CrewAI – Multi-agent task planning + collaboration. Gives each agent purpose, tool access, and autonomy.
⭐ 9k+ stars | https://lnkd.in/gUpwvbn9

📊 MONITOR – For logging, debugging, and scaling safely

Langfuse – Logging, tracing, and evals for GenAI pipelines. Inspect every token and decision.
⭐ 4.5k+ stars | https://lnkd.in/g6BEnVyA

Phoenix – Open-source observability for LLM workflows. Error tracking, token usage, monitoring.
⭐ 3k+ stars | https://lnkd.in/gT3ERHgm

PromptLayer – Prompt logging + analytics. Simple but powerful tracking for prompt performance.
⭐ 4k+ stars | https://lnkd.in/gGSRRBrH

Helicone – Open-source alternative to OpenAI's usage dashboard. Understand cost, latency, and user behavior.
⭐ 6k+ stars | https://lnkd.in/gCgcy7Kd

🔍 Why these matter: Too many GenAI teams waste time gluing together 20 tools, only to discover they can't scale. These 12 tools are:
✅ Well-maintained
✅ Actively used in production
✅ Community-supported
✅ Actually helpful when you go beyond a chatbot

Don't just play with LLMs. Build systems that can grow.

🔖 Save this. ♻️ Repost this.
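Under the hood, every RAG framework in this list implements the same retrieve-then-generate loop. Here is a dependency-free toy sketch of that pattern (bag-of-words overlap stands in for real embeddings, and the documents and query are made up for illustration):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # The "augmented generation" half: stuff retrieved context into a prompt.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "LangChain chains LLM calls with tools and retrievers.",
    "Langfuse traces and evaluates GenAI pipelines.",
    "FAISS performs fast vector similarity search.",
]
print(build_prompt("which tool traces my pipelines?", docs))
```

Production frameworks replace `embed` with a real embedding model and `retrieve` with a vector store, but the control flow is exactly this.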
-
The open-source AI agent ecosystem is exploding, but most market maps and guides cater to VCs rather than builders. As someone in the trenches of agent development, I've found this frustrating. That's why I've created a comprehensive list of the open-source tools I've personally found effective in production.

The overview includes 38 packages across:
-> Agent orchestration frameworks that go beyond basic LLM wrappers: CrewAI for role-playing agents, AutoGPT for autonomous workflows, Superagent for quick prototyping
-> Tools for computer control and browser automation: Open Interpreter for local machine control, Self-Operating Computer for visual automation, LaVague for web agents
-> Voice interaction capabilities beyond basic speech-to-text: Ultravox for real-time voice, Whisper for transcription, Vocode for voice-based agents
-> Memory systems that enable truly personalized experiences: Mem0 for self-improving memory, Letta for long-term context, LangChain's memory components
-> Testing and monitoring solutions for production-grade agents: AgentOps for benchmarking, openllmetry for observability, Voice Lab for evaluation

With the holiday season here, it's the perfect time to start building. Post: https://lnkd.in/gCySSuS3
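To make the memory-systems category concrete, here is a toy sketch of the core idea behind tools like Mem0 and Letta: store notes about the user, then recall the most relevant ones at inference time. Keyword overlap stands in for the embedding search real systems use, and all names are illustrative:

```python
from collections import deque

class AgentMemory:
    """Toy long-term memory: keyword-scored recall over stored notes.
    A stand-in for what Mem0/Letta do with embeddings and summarization."""

    def __init__(self, capacity: int = 100):
        # Bounded store: the oldest notes drop off once capacity is hit.
        self.notes: deque[str] = deque(maxlen=capacity)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Score each note by how many query words it shares, best first.
        terms = set(query.lower().split())
        scored = sorted(
            self.notes,
            key=lambda n: len(terms & set(n.lower().split())),
            reverse=True,
        )
        return scored[:k]

mem = AgentMemory()
mem.remember("user prefers dark mode")
mem.remember("user timezone is UTC+2")
mem.remember("project deadline is Friday")
print(mem.recall("what timezone does the user use?", k=1))
# → ['user timezone is UTC+2']
```

The recalled notes would then be injected into the agent's prompt, which is what makes the experience feel personalized across sessions.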
-
The Future of AI is Open-Source!

10 years ago, when I started in ML, building end-to-end ML applications would take you months, to say the least. But in 2025, going from idea to MVP to production happens in weeks, if not days. One of the biggest changes I am observing is "free access to the best tech," which is making ML application development faster. You don't need to work at a top tech company to have access to these tools; they are available to everyone, thanks to the open-source community!

I love this visual of the open-source AI stack by ByteByteGo. It lays out the tools/frameworks you can use (for free) to build AI applications right on your laptop. If you are an AI engineer getting started, check out the following tools:
↳ Frontend Technologies: Next.js, Vercel, Streamlit
↳ Embeddings and RAG Libraries: Nomic, Jina AI, Cognito, and LLMAware
↳ Backend and Model Access: FastAPI, LangChain, Netflix Metaflow, Ollama, Hugging Face
↳ Data and Retrieval: Postgres, Milvus, Weaviate, PGVector, FAISS
↳ Large Language Models: Llama models, Qwen models, Gemma models, Phi models, DeepSeek models, Falcon models
↳ Vision Language Models: VisionLLM v2, Falcon 2 VLM, Qwen-VL series, PaliGemma
↳ Speech-to-text & text-to-speech models: OpenAI Whisper, Wav2Vec, DeepSpeech, Tacotron 2, Kokoro TTS, Spark-TTS, Fish Speech v1.5, StyleTTS (I added more models missing from the infographic)

Plus, I would recommend checking out the following tools as well:
↳ Agent frameworks: CrewAI, AutoGen, SuperAGI, LangGraph
↳ Model Optimization & Deployment: vLLM, TensorRT, and LoRA methods for model fine-tuning

PS: I shared some ideas about portfolio projects you can build in an earlier post, so if you are curious about that, check out my past post.

Happy Learning 🚀 There is nothing stopping you from building on your idea!
-----------
If you found this useful, please do share it with your network ♻️
Follow me (Aishwarya Srinivasan) for more AI educational content and insights to help you stay up-to-date in the AI space :)
-
🦾 Great milestone for open-source robotics: pi0 & pi0.5 by Physical Intelligence are now on Hugging Face, fully ported to PyTorch in LeRobot and validated side-by-side with OpenPI, for everyone to experiment with, fine-tune, and deploy on their robots!

π₀.₅ is a Vision-Language-Action model that represents a significant evolution from π₀, addressing a big challenge in robotics: open-world generalization. While robots can perform impressive tasks in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.

Generalization must occur at multiple levels:
- Physical level: understanding how to pick up a spoon (by the handle) or a plate (by the edge), even with unseen objects in cluttered environments
- Semantic level: understanding task semantics, where to put clothes and shoes (the laundry hamper, not the bed), and what tools are appropriate for cleaning spills
- Environmental level: adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals

The breakthrough innovation in π₀.₅ is co-training on heterogeneous data sources. The model learns from:
- Multimodal web data: image captioning, visual question answering, object detection
- Verbal instructions: humans coaching robots through complex tasks step-by-step
- Subtask commands: high-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
- Cross-embodiment robot data: data from various robot platforms with different capabilities
- Multi-environment data: static robots deployed across many different homes
- Mobile manipulation data: ~400 hours of mobile robot demonstrations

This diverse training mixture creates a "curriculum" that enables generalization across physical, visual, and semantic levels simultaneously. Huge thanks to the Physical Intelligence team & contributors!

Model: https://lnkd.in/eAEr7Yk6
LeRobot: https://lnkd.in/ehzQ3Mqy
-
AI has transformed how we approach software engineering today. What has been missing from this technological shift is a comprehensive way to evaluate emerging AI coding assistants. My team has been working on a project that's going to help give customers the data they need to choose the right AI agent for their business needs. This is one that I'm personally invested in: introducing SWE-PolyBench. 🚀

SWE-PolyBench is the first industry benchmark to evaluate AI coding agents' ability to navigate and understand complex codebases, introducing rich metrics to advance AI performance in real-world scenarios. These metrics, file retrieval and node retrieval, evaluate how well AI coding assistants can identify which files need changes and pinpoint the specific functions or classes requiring modification. It's designed to provide much deeper insights than task completion alone. Beyond that, it is multilingual, supporting Java, JavaScript, TypeScript, and Python with an extensive dataset and diverse tasks.

What I'm really excited about is that we've made SWE-PolyBench open source. Advancing AI-assisted software engineering is a collective effort, and SWE-PolyBench can serve as a foundation for future work. I invite you all to explore it, use it, and help shape its future. This new benchmark will bring us closer to understanding and improving how AI coding assistants perform with complexity.

All the details about our launch are in the blog, check it out ➡️ https://lnkd.in/g5YkXUY2
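To illustrate the idea behind a retrieval metric like this (the exact SWE-PolyBench formulas are in the launch blog; this is a hypothetical sketch, not the benchmark's code), a file-retrieval score could be computed as set-overlap F1 between the files an agent modified and the files the ground-truth patch touched:

```python
def retrieval_f1(predicted: set[str], gold: set[str]) -> float:
    """Hypothetical file-retrieval score: F1 over the set of files an
    agent modified vs. the files the ground-truth patch changed."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)  # files the agent got right
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Illustrative paths, not real benchmark data.
pred = {"src/parser.py", "src/utils.py", "tests/test_parser.py"}
gold = {"src/parser.py", "tests/test_parser.py"}
print(round(retrieval_f1(pred, gold), 3))
```

A node-retrieval score would apply the same idea at a finer granularity, comparing the specific functions or classes touched rather than whole files.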
-
Last week, a junior dev messaged me: "Everyone talks about Cursor and Copilot... but are there any less hyped AI tools that are actually useful?"

Great question. Because the AI space is bigger than just the usual names. 👇

So I shared this curated list of 10 underrated AI tools for developers in 2025 — not just trendy, but genuinely useful across coding, design, automation, and workflows. Here it is 🧵

1️⃣ Relume AI (https://relume.io) -> Instantly build sitemaps and wireframes. Export to Figma, React, or HTML.
2️⃣ Amazon CodeWhisperer (https://lnkd.in/dPpRFDvG) -> Context-aware code suggestions with a security-first mindset.
3️⃣ DeepCode (https://www.deepcode.ai) -> AI code reviews that catch bugs and improve code quality.
4️⃣ Codeium (https://codeium.com) -> Lightweight AI assistant with fast, real-time coding suggestions.
5️⃣ Visual Studio IntelliCode (https://lnkd.in/dNd8EdHK) -> Smarter auto-complete and bug spotting inside VS.
6️⃣ GitHub AI Coding Agent (https://lnkd.in/dQ5XYTY8) -> Agentic coding assistant beyond Copilot's autocomplete.
7️⃣ Dataiku AI Studio (https://www.dataiku.com) -> Build ML workflows and automate with agentic AI.
8️⃣ Google Conversational Agents Console (https://lnkd.in/dAeZbB_j) -> Create AI-powered chat agents with Google Cloud.
9️⃣ ServiceNow AI Agent (https://lnkd.in/dEhKeWg9) -> Automate workflows using natural language understanding.
🔟 Snowflake Snowpark AI (https://lnkd.in/dd-ANy6b) -> AI + ML for your data engineering inside Snowflake.

These aren't the loudest tools on the internet, but they are helping devs build faster, cleaner, and smarter.

💬 Have you used any of these? Or do you have an underrated AI tool in your stack? Drop it in the comments — I'm always adding to my toolkit!

P.S. Tag a developer who's curious about AI but tired of hearing the same 3 tools everywhere.

- Ghazi Khan | Follow for more

#AItools #Developers #Productivity #CodeSmarter #BuildWithAI #UnderratedTools #DevLife2025
-
4 open-source LLM fine-tuning libraries you need to know about as an AI engineer.

Fine-tuning used to require enterprise budgets and PhD-level expertise. Not anymore. Here are 4 libraries that make LLM fine-tuning accessible to everyone:

1. Unsloth AI
• 2x faster training with 80% VRAM reduction
• Custom Triton kernels with a manual backprop engine
• Supports 4-bit to 16-bit quantization
• Works on NVIDIA GPUs from the V100 onwards (CUDA 7.0+)

2. Hugging Face TRL (Transformer Reinforcement Learning)
• Specialized trainers: SFTTrainer, DPOTrainer, RewardTrainer
• Full integration with the Transformers and PEFT ecosystems
• Scalable via Accelerate (single GPU to multi-node)
• CLI interface for quick experimentation

3. Axolotl
• Single YAML config for the entire pipeline (preprocess → train → inference)
• Advanced optimizations: Flash Attention, multipacking, sequence parallelism
• Multi-GPU/multi-node support (FSDP, DeepSpeed, Ray)
• Flexible data loading (local, Hugging Face, cloud storage)

4. LlamaFactory
• Supports 100+ models, including LLaMA, Mistral, Qwen, DeepSeek
• Multiple training methods: full fine-tuning, LoRA, QLoRA (2/4/8-bit), DPO, PPO
• Web UI and CLI interface - no coding required
• Built-in experiment tracking (TensorBoard, Wandb, MLflow)

These tools have removed the barriers. The question isn't whether you can fine-tune - it's which approach fits your use case. The best part? They're all 100% open source.

Which library are you planning to try first? Link to the GitHub repos in the comments.
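For intuition on the LoRA method these libraries implement: instead of updating a full weight matrix W during fine-tuning, you freeze W and train two small low-rank factors whose scaled product is added back. A minimal pure-Python sketch with toy 2x2 numbers (the values and rank are made up for illustration):

```python
def matmul(A, B):
    # Multiply an (m x n) matrix by an (n x p) matrix, as nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matadd(A, B):
    # Elementwise sum of two same-shape matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, s):
    # Multiply every entry by scalar s.
    return [[x * s for x in row] for row in A]

# Frozen pretrained weight W (d_out x d_in) stays untouched; only the
# low-rank factors B_lr (d_out x r) and A_lr (r x d_in) are trained.
W = [[1.0, 0.0], [0.0, 1.0]]   # d_out = d_in = 2
B_lr = [[0.5], [0.0]]          # rank r = 1
A_lr = [[0.0, 1.0]]
alpha, r = 2.0, 1              # LoRA scaling: alpha / r

# Effective weight after fine-tuning: W' = W + (alpha / r) * B_lr @ A_lr
W_eff = matadd(W, scale(matmul(B_lr, A_lr), alpha / r))
print(W_eff)  # → [[1.0, 1.0], [0.0, 1.0]]
```

The payoff: for a d x d layer you train only 2·d·r parameters instead of d², which is why rank-8 or rank-16 adapters fit on consumer GPUs.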
-
Kubernetes is Free. Docker is Free. PostgreSQL is Free. Redis is Free. RabbitMQ is Free. MongoDB is Free. Nginx is Free. Prometheus & Grafana are Free. GitHub is Free.

You keep scrolling, thinking you need courses, bootcamps, or paid tools to start. But every tool is already available to you, with world-class docs.

– Set up an API with real authentication.
– Connect services with RabbitMQ or Kafka.
– Monitor your local cluster with Prometheus and Grafana.
– Break stuff, debug it, deploy it again.

Nobody cares if you learned it from a paid course or a free README. The only thing that matters is: can you build, break, and fix systems that actually run?

Stop waiting for the perfect timing. Pick a problem, use these free tools, and start building. The rest will follow.
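Getting started is a single file. An illustrative docker-compose.yml (image tags, ports, and the password are example values; adjust to your setup) spins up Postgres, Redis, and RabbitMQ locally so you can start wiring services together:

```yaml
# docker-compose.yml — example values only; run with `docker compose up -d`
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: devpassword   # local dev only, never in prod
    ports:
      - "5432:5432"
  cache:
    image: redis:7
    ports:
      - "6379:6379"
  queue:
    image: rabbitmq:3-management       # includes the web management UI
    ports:
      - "5672:5672"    # AMQP
      - "15672:15672"  # management UI
```

From there, point your API at localhost:5432 and localhost:5672 and start breaking things.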
-
7 Python tools I use on EVERY project (and you should too)

1️⃣ uv
A no-brainer these days. It's a blazing fast Python package and project manager. I started using it instead of Poetry a few months ago and have never looked back.

2️⃣ ruff
The linter and formatter that does it all. ruff is insanely fast and combines the functionality of tools like flake8 or isort into one, with over 800 rules you can use. It's a must-have for keeping your code clean and consistent.

3️⃣ pyright
A static type checker that's faster and more accurate than many alternatives. If you're using type hints (and you should be!), pyright will catch errors before they happen and make your codebase far more robust.

4️⃣ pdoc
Documentation made simple. pdoc automatically generates beautiful, modern docs from your codebase. It's perfect for keeping your projects well-documented without spending hours writing manual guides or configuring styles.

5️⃣ commitizen
Stop writing messy commit messages! commitizen enforces a consistent commit message format (like Conventional Commits) and makes it easy to generate changelogs with automatic semantic versioning.

6️⃣ pre-commit
A tool for managing Git pre-commit hooks. It runs checks (like linting, formatting, and type checking) before you commit code, ensuring that only acceptable code makes it into your repo.

7️⃣ just
A command runner that simplifies your workflow. Instead of memorizing long commands, you define short, reusable recipes for common tasks in a Justfile.

Start integrating them one at a time. You don't have to overhaul your workflow overnight - small, consistent improvements add up! 💪
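Several of these tools read their settings from pyproject.toml, so one file configures most of the stack. An illustrative fragment (the rule selection, line length, and strictness are example choices, not required defaults):

```toml
# pyproject.toml — example settings; tune to your project
[tool.ruff]
line-length = 100

[tool.ruff.lint]
# pycodestyle, pyflakes, isort-style imports, pyupgrade
select = ["E", "F", "I", "UP"]

[tool.pyright]
typeCheckingMode = "strict"

[tool.commitizen]
name = "cz_conventional_commits"
tag_format = "v$version"
```

Check the config in once, and everyone on the team gets the same lint rules, type-checking strictness, and commit conventions for free.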