Most LLM agent demos are impressive only because they rely on cloud infrastructure, paid APIs, and a lot of hand-holding. I wanted to see how far a fully local system could go.

So I built an AI agent that decides for itself whether to answer from memory or retrieve live data. It runs locally with real tool-calling, uses llama3-groq-tool-use-8b through Ollama, and avoids hardcoded routing logic. When the user asks a product question, it fetches live Amazon data and synthesises a response from the retrieved fields. When the question is general knowledge, it answers directly from memory.

The result is a simple but realistic pattern for production agent systems: retrieval, validation, and generation working together under model control.

What I built:
→ Adaptive tool-calling architecture
→ Live product data retrieval
→ Validation safeguards
→ Test coverage for edge cases
→ $0 cloud cost

GitHub: https://lnkd.in/ecfFwJdW

#LLM #AIAgents #GenerativeAI #Python #RAG #OpenToWork
Building a Self-Contained LLM Agent with Local Infrastructure
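The routing-and-validation core of a pattern like this can be sketched without running a model at all. Below is a minimal, illustrative dispatcher; the tool name `fetch_product`, the message shape, and the required-field rules are assumptions, loosely modelled on the tool-call messages Ollama returns, not the post's actual code:

```python
import json

# Hypothetical tool registry; "fetch_product" stands in for the live Amazon fetch.
TOOLS = {
    "fetch_product": lambda query: {"title": f"Result for {query}", "price": "$19.99"},
}

REQUIRED_FIELDS = {"title", "price"}  # validation safeguard before generation

def handle(message: dict) -> str:
    """Answer from memory, or dispatch and validate a model-issued tool call."""
    calls = message.get("tool_calls")
    if not calls:
        return message["content"]  # model chose to answer from memory
    call = calls[0]
    result = TOOLS[call["name"]](**json.loads(call["arguments"]))
    missing = REQUIRED_FIELDS - result.keys()
    if missing:
        raise ValueError(f"retrieval missing fields: {missing}")
    return f"{result['title']} at {result['price']}"

# A stubbed tool-call message, roughly the shape a tool-use model emits:
msg = {"tool_calls": [{"name": "fetch_product",
                       "arguments": json.dumps({"query": "usb-c hub"})}]}
print(handle(msg))
print(handle({"content": "Paris is the capital of France."}))
```

The key design point is that the model, not a hardcoded router, decides whether `tool_calls` is present; the wrapper only validates and dispatches.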
More Relevant Posts
-
Azure skills for your coding agents. There are 19+ skills covering compute, observability, AI, compliance, storage, migration, RBAC, messaging, and more. https://lnkd.in/eieGYzPx
-
Day 111/255: Amazon CodeGuru – AI pairs with your reviewers and profilers

Engineering teams want cleaner code and faster apps, but senior reviewers are busy and manual performance profiling is often postponed until users complain. Subtle concurrency bugs, inefficient loops, or hot code paths that waste CPU can live in production for months.

Amazon CodeGuru solves this with two components: Reviewer continuously analyzes your code and pull requests to flag bugs, security issues, and bad patterns, while Profiler watches running applications to identify the most expensive methods and lines, showing how to reduce CPU, latency, and cost.

Real case: Cutting compute costs in a Java microservice
A payments team had a Java-based service that scaled fine but spent more on compute than expected. Latency SLOs were met, so performance work kept getting deprioritized.

Challenge:
✓ High EC2/Fargate cost for a simple API
✓ No time for deep profiling
✓ PR reviews ignored performance

Without Amazon CodeGuru:
✓ Ad-hoc laptop profiling
✓ No visibility into CPU-heavy methods
✓ Performance regressions slipped in

With CodeGuru:
✓ Continuous profiling in production
✓ Flame graphs exposed heavy JSON/logging paths
✓ PR suggestions for better structures and concurrency

Result:
✓ Lower CPU usage
✓ Smaller instances and autoscaling
✓ Reduced compute cost with better latency headroom

About AWS Explore
AWS Explore is a 255-day series covering one AWS service per day with simple carousels and real examples.

Tomorrow: Day 112 → Apache MXNet on AWS (open-source deep learning framework that runs efficiently on AWS—train and deploy models on EC2, ECS, and SageMaker with good multi-GPU and distributed training support, plus tight integration with S3 and other AWS services).

Using Amazon CodeGuru? Do you get more value from Reviewer comments on PRs or from Profiler’s CPU hotspot insights?

Thanks, Jay Tillu
www.jaytillu.com
-
𝗔𝗻 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁 𝗱𝗲𝗹𝗲𝘁𝗲𝗱 𝗮 𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲.

Not staging. Not test. Production!!

I came across a fascinating post by Alexey Grigorev describing how an AI coding agent accidentally wiped an entire production infrastructure while working with Terraform.

The agent didn’t have the correct Terraform state. So it assumed nothing existed. When asked to clean up duplicates, it effectively ran:

terraform destroy

Database. VPC. Load balancers. Gone!!!

The real lesson: 𝗔𝗜 𝗱𝗶𝗱𝗻’𝘁 𝗳𝗮𝗶𝗹. 𝗚𝘂𝗮𝗿𝗱𝗿𝗮𝗶𝗹𝘀 𝗱𝗶𝗱.

As AI agents start operating infrastructure, the rule should probably be simple: AI proposes. Humans approve.

Would you let an AI agent run Terraform on production?

#AI #DevOps #Terraform #CloudEngineering #AIAgents #LessonsLearned
Full story: https://lnkd.in/e9XCN3-i
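"AI proposes, humans approve" can be enforced with a very small gate in front of whatever executes the agent's commands. A minimal sketch, assuming an illustrative prefix list (the names and wrapper are mine, not from the post):

```python
# Commands that must never run without explicit human sign-off.
DESTRUCTIVE_PREFIXES = ("terraform destroy", "terraform apply", "drop ")

def execute(command: str, human_approved: bool = False) -> str:
    """Gate destructive commands behind human approval; pass the rest through."""
    if command.lower().startswith(DESTRUCTIVE_PREFIXES) and not human_approved:
        return f"BLOCKED (needs approval): {command}"
    return f"RUNNING: {command}"

print(execute("terraform plan"))                          # safe, runs
print(execute("terraform destroy"))                       # blocked
print(execute("terraform destroy", human_approved=True))  # signed off, runs
```

A prefix list is deliberately crude; in practice you would gate on `terraform plan` output (resource deletions) rather than command strings, but the approval boundary is the same.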
-
Holy Moly! If this is accurate, the root cause isn't AI — it's missing infrastructure guardrails.

RemovalPolicy.RETAIN should be the default for any stateful resource (RDS, DynamoDB, S3, etc.) in CDK. Letting a stack destroy production data on cdk destroy is a configuration failure, not an AI failure. Blame the deployment process, not the tool that executed it.

Beyond retention policies, there are layers of best practices you can apply to avoid this kind of disaster entirely: https://lnkd.in/gvTMg9ca
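For reference, applying a retention policy in CDK is a one-liner. A minimal Python fragment, assuming aws_cdk v2 is installed; the stack and bucket names are illustrative:

```python
from aws_cdk import RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        bucket = s3.Bucket(self, "CriticalData")
        # RETAIN orphans the resource on `cdk destroy` instead of deleting it.
        bucket.apply_removal_policy(RemovalPolicy.RETAIN)
```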
-
Following up on the Google Cloud AI demo I posted about last month! ☁️🤖 I promised to share the code, and it's finally live. If you're tired of LLMs just giving you terminal commands to copy-paste, I wrote a breakdown on how to build an agent that actually runs them for you via the gcloud CLI. Full Python code and architecture details are in the article. Let me know what you build with it!
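Not the article's actual code, but the core pattern (letting the agent execute proposed commands through a thin, allowlisted subprocess wrapper) fits in a few lines. Here `echo` stands in for `gcloud` so the sketch runs anywhere:

```python
import shlex
import subprocess

ALLOWED_BINARIES = ("gcloud", "echo")  # allowlist; echo is a demo stand-in

def run_proposed(cmd: str) -> str:
    """Run a model-proposed CLI command only if its binary is allowlisted."""
    parts = shlex.split(cmd)
    if not parts or parts[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"refusing to run: {cmd}")
    done = subprocess.run(parts, capture_output=True, text=True, check=True)
    return done.stdout.strip()

print(run_proposed("echo hello from the agent"))
```

Captured stdout can then be fed back to the model so it can react to command results rather than just emit them.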
-
Your app has blind spots. It just hasn't told you yet.

The new AWS Observability Kiro Power includes an automated gap analysis feature that scans your code and finds what's missing from your observability stack — unlogged errors, missing correlation IDs, absent distributed tracing... you know, the stuff you only discover at 2am during an incident.

Think of it as a code review, but instead of your grumpy senior engineer asking "why isn't this logged?" — an AI agent asks it. Politely. With recommendations.

Here's what it looks for:
- Unlogged errors (the ones silently ruining your day)
- Missing correlation IDs (because "it works on my machine" isn't a trace)
- Absent distributed tracing (your microservices are talking — you just can't hear them)

The gap analysis doesn't wait for an outage to tell you your observability is incomplete. That's the whole point. Proactive beats reactive. Every single time.

One-click install in Kiro IDE. Your future on-call self will thank you.

https://lnkd.in/eVJfKZwk

#AWSObservability #CloudWatch #Kiro #DevOps #CloudOperations #AIOps #AWS
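One of those gaps, missing correlation IDs, is also cheap to close by hand. A minimal, framework-free sketch using Python's stdlib logging; the logger name and field name are illustrative:

```python
import io
import logging
import uuid

class CorrelationFilter(logging.Filter):
    """Stamp every log record with a per-request correlation ID."""
    def __init__(self, correlation_id: str):
        super().__init__()
        self.correlation_id = correlation_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = self.correlation_id
        return True

buf = io.StringIO()  # stand-in for stdout/your log shipper
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))

logger = logging.getLogger("svc")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.addFilter(CorrelationFilter(str(uuid.uuid4())))

logger.info("payment accepted")
print(buf.getvalue().strip())  # every line now carries the same ID
```

The same ID would be read from (or injected into) an incoming request header so log lines from different services can be joined into one trace.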
-
𝗧𝗛𝗘 𝗙𝗨𝗧𝗨𝗥𝗘 𝗢𝗙 𝗖𝗟𝗢𝗨𝗗 𝗜𝗦 𝗔𝗟𝗥𝗘𝗔𝗗𝗬 𝗢𝗡 𝗬𝗢𝗨𝗥 𝗟𝗔𝗣𝗧𝗢𝗣.

Imagine testing 147 AWS services without a credit card or internet. That’s Robotocore – an MIT‑licensed AWS twin that runs locally, zero telemetry, free forever.

It listens on localhost:4566, parses the 12‑digit key, and gives you real‑world behavior for S3, Lambda, DynamoDB, and 144+ more. Spin it up in ONE Docker command, then plug it into your CI pipelines – no flaky network, no cloud bills.

Perfect for AI agents, dev‑ops testing, or giving students a safe sandbox. Multi‑account, multi‑region isolation means each team gets its own in‑memory state – just like real AWS. Over 5,500 unit tests and 11,000 compatibility checks back up its reliability.

What would you build first with a local AWS twin? Drop your ideas below. Read the full article to get the step‑by‑step guide.

#CloudDevelopment #OpenSource #Robotocore #DevOps #AI
Introducing Robotocore: Open‑Source Local AWS Twin Revolutionizes Cloud Development - UBOS ubos.tech
-
A few months ago I asked a simple question: why are we paying OpenAI for every token when we could run capable open-weight models ourselves?

Turns out the hard part isn't running the model. It's everything around it — API key management, usage tracking, multi-team access control, and making sure your existing apps don't need to be rewritten.

So I built a stack that solves all of that. Ollama runs the models locally. LiteLLM wraps it in a fully OpenAI-compatible API gateway. Postgres handles keys, teams, and spend limits. Docker Compose ties it all together.

Your developers get an API key and an endpoint. They never know — or care — what's running behind it.

I documented the whole thing: architecture, config files, and the registration script that makes it feel like a real SaaS. Link in the comments if you're building in this space 👇

#LLM #AIInfrastructure #DevOps #SelfHosting #Docker #OpenSource
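The shape of such a stack, not the author's actual files, can be sketched as a Compose fragment. Image tags, ports, and environment variable names below are assumptions to verify against the Ollama and LiteLLM docs:

```yaml
services:
  ollama:
    image: ollama/ollama              # serves the open-weight models
    volumes: ["ollama:/root/.ollama"]
  db:
    image: postgres:16                # keys, teams, spend tracking
    environment:
      POSTGRES_PASSWORD: litellm
  litellm:
    image: ghcr.io/berriai/litellm:main-latest   # OpenAI-compatible gateway
    depends_on: [ollama, db]
    environment:
      DATABASE_URL: postgresql://postgres:litellm@db:5432/postgres
    ports: ["4000:4000"]
volumes:
  ollama: {}
```

Apps then point their existing OpenAI client at the gateway's port with a LiteLLM-issued key and need no other changes.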
-
AI coding assistants are great at writing code. But cloud engineering is rarely about just writing code. The real work happens after that:
- provisioning infrastructure
- validating configurations
- handling security & governance
- deploying and operating workloads reliably

This is why the new Azure Skills Plugin announcement caught my attention. Instead of AI assistants just suggesting commands or documentation, Azure is packaging real cloud expertise into reusable “skills”. These skills allow agents to interact with Azure services through MCP servers and follow structured workflows like: prepare → validate → deploy.

In other words, AI assistants are slowly evolving from code generators to cloud workflow collaborators. If this direction continues, we may soon see a shift where engineers work with agents that understand platform operations rather than manually stitching together CLI commands and docs.

The next phase of DevOps might not just be automation. It might be agent-assisted cloud operations.

Curious how others see this evolving in real-world engineering teams.

#AI #MicrosoftAzure #MicrosoftFoundry #AIops #AIevolution #AzureOpenAI #AzureSkillsPlugin
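Illustrative only (not the Azure Skills API): a prepare → validate → deploy workflow is just an ordered pipeline in which each stage refuses to proceed on bad input. All names and the template shape here are invented:

```python
from typing import Callable

def prepare(ctx: dict) -> dict:
    # Agent drafts a deployment template from the request.
    ctx["template"] = {"resource": "webapp", "region": "eastus"}
    return ctx

def validate(ctx: dict) -> dict:
    # Structured workflows check the draft before anything touches the cloud.
    assert "template" in ctx and "region" in ctx["template"], "invalid template"
    ctx["validated"] = True
    return ctx

def deploy(ctx: dict) -> dict:
    # Deploy stage hard-fails unless validation ran.
    if not ctx.get("validated"):
        raise RuntimeError("refusing to deploy unvalidated template")
    ctx["status"] = "deployed"
    return ctx

steps: list[Callable[[dict], dict]] = [prepare, validate, deploy]
ctx: dict = {}
for step in steps:
    ctx = step(ctx)
print(ctx["status"])
```

The point of packaging this as a "skill" is that the agent cannot skip straight to deploy; the workflow structure, not the model, enforces the ordering.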
-
“Let’s give the AI agent access to production infrastructure… it will save us time.”

AI: Runs one command
💥 Production database deleted.
💥 Infrastructure gone.
💥 Engineer questioning life choices.

AI: Task completed successfully. ✅

DevOps lesson of the day: Automation is great. AI is powerful. But production still needs human brains in the approval loop. Because one wrong command can delete months of work in seconds.

https://lnkd.in/dn62u8aB

#DevOps #AI #Terraform #AWS #CloudComputing #EngineeringLessons #Automation