Choosing the Right AWS Service for Real-Time Scale

This title was summarized by AI from the post below.

7mo

🚀 𝗙𝗮𝗿𝗴𝗮𝘁𝗲 𝘃𝘀. 𝗟𝗮𝗺𝗯𝗱𝗮: 𝗖𝗵𝗼𝗼𝘀𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝗶𝗴𝗵𝘁 𝗧𝗼𝗼𝗹 𝗳𝗼𝗿 𝗥𝗲𝗮𝗹-𝗧𝗶𝗺𝗲 𝗦𝗰𝗮𝗹𝗲

Following up on the Quiz Showdown architecture challenge, the choice of compute - 𝗔𝗪𝗦 𝗙𝗮𝗿𝗴𝗮𝘁𝗲 vs 𝗔𝗪𝗦 𝗟𝗮𝗺𝗯𝗱𝗮 - was critical. We didn't just pick one; we chose the right one for each service's job. It all comes down to 𝗦𝘁𝗮𝘁𝗲𝗳𝘂𝗹𝗻𝗲𝘀𝘀 𝘃𝘀. 𝗘𝘃𝗲𝗻𝘁𝗳𝘂𝗹𝗻𝗲𝘀𝘀.

𝟭. 𝗔𝗪𝗦 𝗙𝗮𝗿𝗴𝗮𝘁𝗲: 𝗧𝗵𝗲 𝗟𝗼𝗻𝗴-𝗗𝗶𝘀𝘁𝗮𝗻𝗰𝗲 𝗥𝘂𝗻𝗻𝗲𝗿𝘀 🏃‍♂️
We chose 𝗙𝗮𝗿𝗴𝗮𝘁𝗲 (our serverless container engine) for services that need to run continuously and 𝗺𝗮𝗶𝗻𝘁𝗮𝗶𝗻 𝘀𝘁𝗮𝘁𝗲 in memory for a period of time.
• 𝗤𝘂𝗶𝘇 𝗦𝗲𝗿𝘃𝗶𝗰𝗲: This service is 𝗦𝘁𝗮𝘁𝗲𝗳𝘂𝗹. It must hold the live 𝗶𝗻-𝗺𝗲𝗺𝗼𝗿𝘆 𝗴𝗮𝗺𝗲 𝘀𝘁𝗮𝘁𝗲 (scores, connection IDs) for the entire 60-second quiz. Fargate provides the necessary 𝗣𝗲𝗿𝘀𝗶𝘀𝘁𝗲𝗻𝗰𝗲: a long-running task that stays up for the whole session. If we used Lambda here, the in-memory state would not reliably survive between invocations, and the game would break mid-session!
• 𝗠𝗮𝘁𝗰𝗵𝗺𝗮𝗸𝗶𝗻𝗴 𝗦𝗲𝗿𝘃𝗶𝗰𝗲: This requires a dedicated, 𝗮𝗹𝘄𝗮𝘆𝘀-𝗼𝗻 𝘄𝗼𝗿𝗸𝗲𝗿 𝗽𝗿𝗼𝗰𝗲𝘀𝘀 that continuously monitors the Redis queues. Fargate provides the 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗢𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻 needed to run this background task efficiently.

𝟮. 𝗔𝗪𝗦 𝗟𝗮𝗺𝗯𝗱𝗮: 𝗧𝗵𝗲 𝗦𝗽𝗿𝗶𝗻𝘁 𝗦𝗽𝗲𝗰𝗶𝗮𝗹𝗶𝘀𝘁𝘀 💨
We chose 𝗟𝗮𝗺𝗯𝗱𝗮 (our event-driven serverless functions) for services that are short-lived, triggered by a specific event, and entirely 𝘀𝘁𝗮𝘁𝗲𝗹𝗲𝘀𝘀.
• 𝗦𝗰𝗼𝗿𝗶𝗻𝗴 𝗦𝗲𝗿𝘃𝗶𝗰𝗲: This is 𝗕𝘂𝗿𝘀𝘁𝘆 & 𝗦𝘁𝗮𝘁𝗲𝗹𝗲𝘀𝘀. It's only needed for a single task: calculate the score and push it to Kinesis. Lambda provides 𝗜𝗻𝘀𝘁𝗮𝗻𝘁 𝗦𝗰𝗮𝗹𝗲 & 𝗖𝗼𝘀𝘁 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆, scaling out quickly to meet the massive burst of answers without pre-provisioning capacity, helping us meet that sub-200ms SLA.
• 𝗟𝗲𝗮𝗱𝗲𝗿𝗯𝗼𝗮𝗿𝗱 𝗦𝗲𝗿𝘃𝗶𝗰𝗲: This is 𝗔𝘀𝘆𝗻𝗰𝗵𝗿𝗼𝗻𝗼𝘂𝘀 & 𝗘𝘃𝗲𝗻𝘁-𝗗𝗿𝗶𝘃𝗲𝗻. It's triggered only when a GAME_ENDED event arrives from Kinesis. Lambda wakes up, updates the database, and shuts down. This maximizes 𝗖𝗼𝘀𝘁 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 while ensuring reliable database writes.

𝗧𝗵𝗲 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆
By combining Fargate and Lambda, we achieved the best of both worlds: 𝗰𝗼𝘀𝘁 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 and 𝗿𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆. This layered approach is the essence of building a robust, modern cloud-native system.

What trade-offs do you prioritize when choosing between containers and serverless functions for a real-time app? 🤔 #AWSServerless #Fargate #AWSLambda #SystemDesign #CloudNative #TechSeries
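To make the "stateless" half of the split concrete, here is a minimal sketch of what a scoring function of this kind could look like. The scoring rule, the event fields (`game_id`, `player_id`, `correct`, `response_ms`), the stream name, and the injected `put_record` callable are all assumptions for illustration; the post does not describe the actual implementation.

```python
import json
from typing import Callable

# Hypothetical scoring constants, chosen only for illustration.
MAX_POINTS = 1000
QUESTION_WINDOW_MS = 10_000


def calculate_score(correct: bool, response_ms: int) -> int:
    """Faster correct answers earn more points; wrong answers earn zero."""
    if not correct:
        return 0
    clamped = min(max(response_ms, 0), QUESTION_WINDOW_MS)
    # Linear decay from MAX_POINTS (instant answer) to half of it (slowest).
    return MAX_POINTS - (MAX_POINTS // 2) * clamped // QUESTION_WINDOW_MS


def make_handler(
    put_record: Callable[[str, bytes, str], None],
    stream_name: str = "quiz-scores",  # hypothetical stream name
):
    """Build a Lambda-style handler that keeps no state between invocations."""

    def handler(event: dict, context: object = None) -> dict:
        payload = {
            "game_id": event["game_id"],
            "player_id": event["player_id"],
            "score": calculate_score(event["correct"], event["response_ms"]),
        }
        # Everything needed arrives in the event and leaves via the stream,
        # so any number of copies can run in parallel during a burst.
        put_record(stream_name, json.dumps(payload).encode(), event["game_id"])
        return payload

    return handler
```

In a real deployment `put_record` would presumably wrap a Kinesis client call; injecting it keeps the sketch runnable without AWS credentials. The Quiz Service is the opposite case: it has to hold scores and connection IDs in memory across many requests, which is why it sits on a long-running container instead.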


More Relevant Posts

Sriram Manne
7mo Edited
𝗔𝗣𝗜 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝘀𝗻’𝘁 𝗷𝘂𝘀𝘁 𝗮𝗯𝗼𝘂𝘁 𝗺𝗶𝗹𝗹𝗶𝘀𝗲𝗰𝗼𝗻𝗱𝘀; 𝗶𝘁’𝘀 𝗮𝗯𝗼𝘂𝘁 𝗱𝗲𝗰𝗶𝘀𝗶𝗼𝗻𝘀.

I’ve stopped asking “how fast is this API?” and started asking “why does it take that long?” Speed in modern systems isn’t a single lever; it’s an orchestra of protocols, frameworks, caching, persistence, and concurrency models, each playing its tune, often out of sync.

Take the HTTP layer. HTTP/1.1 handles one request at a time per connection, so a slow response blocks everything queued behind it. HTTP/2 brought multiplexing. HTTP/3 dances over QUIC and UDP, a good fit for global, latency-sensitive workloads. The difference isn’t theory. It’s transaction time. Real milliseconds.

REST is clean and familiar, but it blocks like a rotary phone. GraphQL is elegant until resolvers start racing. gRPC? A performance junkie’s dream: binary payloads, HTTP/2 under the hood, lightning-fast inter-service calls. And if you’re into streaming, WebFlux or gRPC streams make even backpressure feel poetic when used right.

Most bottlenecks hide below the surface. Lazy loading and 2nd-level caching sound clever until multiple apps share them. One service updates truth while another serves yesterday’s data. Those ORM tricks don’t survive distributed reality: they add coupling, stale reads, and deployment pain.

Caching works when it’s deliberate: Redis, versioned keys, clear ownership, not “magically remembered” data. Redis can turn 60 ms DB calls into 5 ms, but forget invalidation once and you’re chasing ghosts. Lucene gives raw control and sub-3 ms reads. Self-hosted Elasticsearch adds scale; managed Elastic (AWS/OpenSearch) removes ops overhead but adds 30 ms per query, even in-region, from extra hops and control-plane latency. Performance always has a price tag; it’s just hidden in different parts of the stack.

Threads are predictable: they block, they wait, they burn memory. Reactive is elegant: it multiplexes and never sleeps. It’s not hype; it’s math. When the bottleneck is I/O, fewer waiting threads means more doing threads. If you’re CPU-bound, reactive won’t save you; it just complicates life.

The real question isn’t which one’s faster; it’s which one fits. HTTP/2 when connections are many; HTTP/3 when distance is long. REST when clarity matters; gRPC when throughput does. GraphQL for flexibility, not simplicity. Reactive scales beautifully, until you’re debugging it. Architecture is context, not correctness.

As apps evolve, that context increasingly includes machine learning. Now you’re balancing latency, inference cost, and model freshness. Sometimes the fastest response is the wrong one, and the slower, smarter path delivers real value. That’s where performance meets intelligence.

In the end, performance isn’t about the fastest stack. It’s about knowing where you spend time, why it happens, and what you trade for speed, safety, or learning. You don’t optimize APIs; you optimize choices.

#architecture #systemdesign #machinelearning #apis #reactiveprogramming #caching #microservices
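The "versioned keys, clear ownership" idea can be sketched in a few lines. This is an illustrative cache-aside wrapper, not the author's code: the class name, the key layout `namespace:id:vN`, and the in-memory dict standing in for Redis are all assumptions.

```python
from typing import Callable, Dict


class VersionedCache:
    """Cache-aside with versioned keys; a dict stands in for Redis here."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._store: Dict[str, str] = {}     # cached values by full key
        self._versions: Dict[str, int] = {}  # current version per entity

    def _key(self, entity_id: str) -> str:
        version = self._versions.get(entity_id, 0)
        return f"{self.namespace}:{entity_id}:v{version}"

    def get_or_load(self, entity_id: str, loader: Callable[[str], str]) -> str:
        key = self._key(entity_id)
        if key in self._store:
            return self._store[key]  # hit: skip the slow database call
        value = loader(entity_id)    # miss: read from the source of truth
        self._store[key] = value
        return value

    def invalidate(self, entity_id: str) -> None:
        # Bumping the version makes every reader build a new key, so the
        # stale entry is never served again even though it is not deleted.
        self._versions[entity_id] = self._versions.get(entity_id, 0) + 1
```

The design choice is that the service that owns the data calls `invalidate` on every write, so readers never have to guess whether an entry is fresh. With a real Redis the orphaned old-version keys would typically be given a TTL so they expire on their own.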

2 Comments
Todd Bergman
7mo
Redis acquired Featureform to make it even easier for developers to deliver real-time, structured data into AI agents and applications. This brings together Redis’ speed and scalability with Featureform’s feature store technology. A huge step forward for anyone building production-grade AI systems that rely on fast, reliable data. Read more here: https://lnkd.in/eYzQc5KC

Redis Acquires Featureform to Help Developers Deliver Real-time Structured Data into AI Agents globenewswire.com
Wes Oliver
7mo
🚀 Redis is becoming the real-time context engine for AI. Over the past few months, we’ve made two powerful additions to the Redis family — Decodable and Featureform. Why? Because the future of AI isn’t just about bigger models. It’s about better context. • Decodable brings streaming data pipelines that move information in real time. • Featureform adds feature orchestration — the structured, reusable signals that make models and agents smart. • Redis remains the lightning-fast database and vector engine where all that context lives and is served in milliseconds. Together, we’re building the real-time data and context layer for AI agents and applications. Imagine: Live data flows in → features are built and versioned → context is stored and served instantly → your AI systems stay fresh, accurate, and responsive. This is the foundation for intelligent, real-time systems that don’t just react — they understand. The future of AI is powered by context, and Redis is where that context lives. #AI #DataStreaming #MLOps #FeatureStore #VectorDB #RealTimeData #Redis https://lnkd.in/em-Ai6Ns

Redis for AI - Redis redis.io
Andrew Anokhin
6mo
⭐ 𝗣𝗼𝗽𝘂𝗹𝗮𝗿 𝗩𝗲𝗰𝘁𝗼𝗿 𝗦𝘁𝗼𝗿𝗲 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻𝘀 𝗳𝗼𝗿 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜

As 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜 and 𝗥𝗔𝗚 move from weekend hacks to production, picking the right vector store becomes a core architecture decision. Here’s a brief overview and when each option shines. 👇

𝗦𝗽𝗲𝗰𝗶𝗮𝗹𝗶𝘇𝗲𝗱 𝘃𝗲𝗰𝘁𝗼𝗿 𝗗𝗕𝘀

🔹 Qdrant
𝗣𝗿𝗼𝘀: Built in Rust, very fast, OSS. Supports filtering, hybrid search, offers both self-hosted and cloud options, simple REST/gRPC APIs, and easy Docker/K8s setup. Rapid innovation and an active community make it a top choice for agent memory and RAG.
𝗖𝗼𝗻𝘀: Requires careful sharding and tuning at billion-scale workloads.

🔹 Milvus, created by Zilliz
𝗣𝗿𝗼𝘀: OSS. Supports filtering, hybrid search, offers both self-hosted and cloud options. Designed for billion-scale workloads with rich index types (HNSW/IVF/DiskANN, GPU). Excellent for multi-tenant or analytics-heavy RAG.
𝗖𝗼𝗻𝘀: Complex setup with many components; heavy for small teams or single-agent environments.

🔹 Weaviate
𝗣𝗿𝗼𝘀: OSS. Supports self-hosted and cloud deployment. Schema + GraphQL API, hybrid search, good for “knowledge graph + RAG” use cases.
𝗖𝗼𝗻𝘀: Learning curve and slower startup for simple use cases.

🔹 Pinecone
𝗣𝗿𝗼𝘀: Fully managed, hybrid search, auto-scaling, no ops overhead; ideal for SaaS or production agents.
𝗖𝗼𝗻𝘀: Proprietary and cloud-only; limited control and costlier at scale.

🔹 ChromaDB
𝗣𝗿𝗼𝘀: pip install chromadb and go; great for quick prototypes or local experiments.
𝗖𝗼𝗻𝘀: Single-node, memory-hungry, lacks enterprise features.

𝗘𝘅𝗶𝘀𝘁𝗶𝗻𝗴 𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻𝘀 𝘄𝗶𝘁𝗵 𝗮𝗱𝗱𝗲𝗱 𝘀𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝘃𝗲𝗰𝘁𝗼𝗿𝘀

🔹 pgvector (Postgres)
𝗣𝗿𝗼𝘀: ACID, joins, one backup story - semantic search as a built-in feature.
𝗖𝗼𝗻𝘀: Slower ANN at scale; not suited for very large datasets.

🔹 Redis Stack (Redis Vector)
𝗣𝗿𝗼𝘀: Ultra-low latency, perfect for hot agent memory and session state.
𝗖𝗼𝗻𝘀: RAM is expensive; unsuitable for large persistent stores.

🔹 MongoDB
𝗣𝗿𝗼𝘀: Vectors next to operational data; filtering, hybrid search; multi-cloud and managed.
𝗖𝗼𝗻𝘀: Newer feature set; not as optimized for huge datasets.

🔹 Amazon Web Services (AWS) OpenSearch
𝗣𝗿𝗼𝘀: OSS; combines full-text + vector search; integrates tightly with AWS and AgentCore.
𝗖𝗼𝗻𝘀: Complex tuning, overkill for small workloads.

🔹 Snowflake Cortex Search
𝗣𝗿𝗼𝘀: Filtering, hybrid search over docs data + text fields; seamless with Cortex embeddings. Data and docs in one place.
𝗖𝗼𝗻𝘀: Best if you’re already on Snowflake; heavy for small apps.

💡 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆
• Rapid prototypes → ChromaDB
• Self-hosted → Qdrant, Milvus, Weaviate
• Enterprise-scale → Qdrant, Milvus, Pinecone, AWS OpenSearch
• Integrated product stacks → pgvector, MongoDB, Snowflake Cortex Search

I’ll share a more detailed review of each solution in the comments - stay tuned. 👇

#VectorDB #VectorStore #AI #AgenticAI #RAG
15 Comments
Syed Shoaib Noor
7mo
🚀 How We Cut RAG Latency and Cloud Costs by 40% with One Simple Architecture Shift

Retrieval-Augmented Generation (RAG) systems are powerful, but when scaled, they can become slow and expensive. The main culprits are repeated embedding computations and constant vector database queries. Here’s the architecture we used to fix it 👇

💡 Optimization Highlights
✅ Caching embeddings in Redis (ElastiCache) to reuse results instead of recomputing every time
✅ Batch processing to make fewer, larger embedding calls and reduce overhead
✅ Serverless scaling with AWS Lambda to handle bursts without paying for idle time
✅ Monitoring with CloudWatch to detect slow queries early and adjust load

The result:
⚡ 30–50% fewer compute calls
⚡ Faster end-user responses
⚡ Lower AWS bills without changing the core model

The attached diagram breaks down how the flow works, from user query → cache → vector DB → LLM → final answer.

What’s your favorite strategy for making AI pipelines cheaper and faster? Would love to hear your ideas 👇

#AI #RAG #Python #AWS #CloudComputing #Architecture #MLOps #AIAgents
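The first two highlights, caching embeddings and batching the misses, combine naturally. The sketch below is one possible shape for that, under stated assumptions: the `EmbeddingCache` class, the injected `embed_batch` callable, and the lowercase/strip key normalization are hypothetical, and a plain dict stands in for Redis.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class EmbeddingCache:
    """Reuse stored embeddings and send all misses in one batched call."""

    def __init__(self, embed_batch: Callable[[List[str]], List[Vector]]) -> None:
        self._embed_batch = embed_batch      # hypothetical model client
        self._cache: Dict[str, Vector] = {}  # stands in for Redis
        self.model_calls = 0                 # counts batched model calls

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: case and surrounding whitespace do not change meaning.
        return hashlib.sha256(text.strip().lower().encode()).hexdigest()

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        keys = [self._key(t) for t in texts]
        # Collect each uncached text once, preserving first-seen order.
        missing: Dict[str, str] = {}
        for key, text in zip(keys, texts):
            if key not in self._cache and key not in missing:
                missing[key] = text
        if missing:
            vectors = self._embed_batch(list(missing.values()))
            self.model_calls += 1
            for key, vector in zip(missing, vectors):
                self._cache[key] = vector
        return [self._cache[key] for key in keys]
```

Repeated or duplicate queries then cost a dictionary lookup instead of a model call, which is the mechanism behind the "fewer compute calls" result; the actual percentage depends entirely on how repetitive the traffic is.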
1 Comment
Walter Zwaans
7mo Edited
Are you using Amazon Bedrock and LLMs for AI development and production? Then have a look at AWS Valkey Semantic Caching for AI Workloads!

A couple of weeks ago I attended an AWS session about DBs & tools for AI. In that session I got insights into how Valkey can be used. Because I think this can be very beneficial for you, as our partner using Bedrock and LLMs, I want to share how AWS Valkey, an open-source, in-memory data store, enables blazing-fast semantic caching and search for generative AI applications.

How does the AWS Valkey semantic caching work?
Valkey is a drop-in replacement for Redis OSS (fork from 7.2), stewarded by the Linux Foundation and fully managed on AWS via ElastiCache. It supports advanced caching patterns, including semantic caching for AI workloads.

A Semantic Caching Example
Imagine your GenAI chatbot receives the query: “What adventure books do you recommend?” Valkey stores the vector embedding of this query and its response. Next time a user asks a semantically similar question, e.g., “Tell me some adventure books to read?”, Valkey performs a vector similarity search, finds the cached embedding, and instantly returns the relevant answer, with no need to invoke the LLM again. Your benefits here are a faster response and decreased use of tokens against the LLM, so it saves costs.

An Agentic Memory Example
For multi-turn conversations, Valkey can store both short-term (current session) and long-term (user preferences, previous interactions) memory. Example:
- Short-term: “Plan a 5-day trip to Dublin with a $5000 budget.”
- Long-term: “Prefers cultural sites, has Global Entry.”
Valkey enables your AI agent to recall and apply this context across sessions.

Three benefits of AWS Valkey:

1. Ultra-Low Latency & High Throughput:
- Valkey delivers microsecond response times and supports over 1 million requests per second per node.
- Example: Semantic cache hits return results in ~5ms vs. 100ms for a typical LLM call.

2. Massive Cost Savings:
- Semantic caching with Valkey can reduce LLM inference costs by up to 89%.
- Example: Monthly Bedrock LLM cost drops from $28K to $2.9K with 90% cache hit rate.
- Financial Impact: By serving repeated or similar queries from cache, you avoid expensive LLM calls. For high-traffic AI apps, this means thousands of dollars saved monthly, enabling you to scale without breaking your budget.

3. Redis Compatibility & Managed Operations:
- Supports all Redis data structures and libraries, making migration seamless.
- On AWS ElastiCache, you get auto-scaling, serverless options, enterprise-grade security, and up to 230% improved throughput compared to Redis OSS.

Explore more about AWS ElastiCache and Valkey on: https://lnkd.in/eNrH6e3N

If you would like a copy of the session slide deck, please send me a PM.

#TDSYNNEX #AWS #Valkey #SemanticCaching #AIDevelopment #GenAI #ElastiCache #Redis
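The lookup step in the semantic caching example can be sketched without any database at all. The code below is a toy, in-memory stand-in and not the Valkey API: the `SemanticCache` class, the linear scan, and the 0.9 similarity threshold are assumptions chosen for illustration, and the embeddings are assumed to come from whatever embedding model the application already uses.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Return a cached response when a new query is close enough to an old one."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # hypothetical cut-off; tune per workload
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        best_score = 0.0
        best_response: Optional[str] = None
        # A vector index would replace this linear scan in a real store.
        for cached_embedding, response in self._entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

On a miss (`None`) the application would call the LLM and `put` the new embedding and response. The threshold is the main trade-off: set it too low and users get answers to questions they did not ask; set it too high and the hit rate, and with it the savings, shrinks.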
Toby Stapleton
7mo
I spent the last month building a calculator that answers the question nobody publishes numbers for: when does self-hosting AI models actually make financial sense?

Everyone's talking about ditching OpenAI. AWS cut GPU prices 44% in June. Open-source models now match GPT-4 on benchmarks. Every founder I talk to is wondering if they should self-host.

The problem is nobody shares the actual break-even math. You'll find hundreds of blog posts saying "it depends on your use case." Zero posts with specific token thresholds that flip the economics. So I crunched the numbers across every major cloud provider, model tier, and traffic pattern.

The break-even point sits at 90-140 million tokens monthly. That's the spreadsheet answer. The honest answer is 500 million to 1 billion tokens monthly.

Why the gap? The engineer maintaining your infrastructure costs $75,000-$150,000 annually. Cold start optimization adds another 2-3x to your infrastructure costs. Compliance audits, backup systems, staging environments. None of that shows up when you're comparing GPU pricing to API rates.

Below 100 million tokens monthly, the math doesn't work. OpenAI's GPT-4.1 nano costs $0.10 per million input tokens. You can't beat that with fixed infrastructure costs, period.

100M-500M tokens is the decision zone. It depends on traffic consistency and whether you actually have ML engineering capacity. Most founders don't.

Above 500M tokens with predictable traffic patterns, self-hosting delivers 10-100x cost savings. The economics become impossible to ignore.

Here's what surprised me most: 90% of the founders who ask about self-hosting process only 2-10 million tokens monthly. They're not even close to the threshold. They're reading this content because it sounds like the smart, technical thing to do.

The actual smart move? Keep your team focused on the product. Pay the API bill that costs less than a single engineer's salary.

The data privacy argument almost never applies either. Unless you're in healthcare, finance, or defence with actual HIPAA/GDPR requirements, you're solving a problem you don't have. OpenAI doesn't train on API data anyway (hasn't since March 2023). I'm terrible at this myself: I'll catch myself justifying technical decisions with "security concerns" that don't actually exist.

Maybe that's the real question. Not whether you care about privacy or cost optimization. It's whether you're processing 500M+ tokens monthly with consistent traffic patterns and dedicated ML infrastructure capacity.

If you're not sure where you fall, I built the calculator to run the actual economics for your specific situation. Comment "CALCULATOR" and I'll send you the link.
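The gap between the "spreadsheet answer" and the "honest answer" falls out of a one-line break-even formula once staffing is added to the fixed costs. The sketch below is not the author's calculator; the function name and every dollar figure in the usage note are hypothetical inputs picked only to show the shape of the calculation.

```python
from typing import Optional


def break_even_million_tokens(
    fixed_monthly_usd: float,
    api_usd_per_million: float,
    self_host_usd_per_million: float = 0.0,
) -> Optional[float]:
    """Monthly volume (in millions of tokens) where self-hosting matches the API.

    Self-hosting wins once the per-token saving has paid off the fixed costs:
        volume * (api_price - self_host_marginal_price) >= fixed_monthly_cost
    Returns None when the API is cheaper per token at any volume.
    """
    saving_per_million = api_usd_per_million - self_host_usd_per_million
    if saving_per_million <= 0:
        return None
    return fixed_monthly_usd / saving_per_million
```

With made-up inputs of $2,000 per month for GPUs and an API price of $20 per million tokens, the break-even is 100 million tokens. Add a hypothetical $10,000 per month of engineering time ($120,000 a year, inside the salary range the post quotes) and the same formula gives 600 million tokens, which is the kind of jump the post describes. Against an API priced at cents per million tokens, the formula pushes the threshold far higher still.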
2 Comments
Tony Tomy
7mo
Exciting update for Azure Cosmos DB users! Microsoft introduces Binary Encoding, delivering up to 70% storage savings and faster query performance—especially for large datasets. Automatically enabled for new containers and rolling out to existing ones in 2025, no action needed! Bonus: SDK updates will unlock even more speed. A big step toward cost-efficient, high-performance data solutions. #AzureCosmosDB #CloudComputing #AI #NoSQL #MicrosoftAzure

Announcing Cost and Performance Improvements with Azure Cosmos DB's Binary Encoding - Azure Cosmos DB Blog https://devblogs.microsoft.com/cosmosdb
Jayas Balakrishnan
7mo
𝗔𝗜/𝗠𝗟 𝘄𝗼𝗿𝗸𝗹𝗼𝗮𝗱𝘀 𝗼𝗻 𝗔𝗪𝗦: 𝗪𝗵𝗮𝘁 𝘁𝗵𝗲𝘆 𝗱𝗼𝗻'𝘁 𝘁𝗲𝗹𝗹 𝘆𝗼𝘂 𝗮𝗯𝗼𝘂𝘁 𝘁𝗵𝗲 𝗰𝗼𝘀𝘁𝘀

Your CEO wants AI features in the product. The data science team builds a beautiful model. It works perfectly in the lab. Then you deploy it to production and AWS bills you $80K for the first month. Nobody warned you that ML infrastructure is priced differently from everything else you've run.

𝗪𝗵𝘆 𝗠𝗟 𝗰𝗼𝘀𝘁𝘀 𝗯𝗹𝗶𝗻𝗱𝘀𝗶𝗱𝗲 𝘁𝗲𝗮𝗺𝘀:
Traditional application costs are predictable. Scale linearly with traffic, optimize with caching, and right-size instances. ML workloads break all those assumptions. GPU instances cost 10x more than CPU instances. Training runs consume terabytes of data transfer. Model inference latency requirements force you into expensive instance types. Finance budgeted like it's another microservice. It's not even close.

𝗧𝗵𝗲 𝗰𝗼𝘀𝘁𝘀 𝘁𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗵𝘂𝗿𝘁:
• 𝗚𝗣𝗨 𝗶𝗻𝘀𝘁𝗮𝗻𝗰𝗲𝘀 𝗿𝘂𝗻𝗻𝗶𝗻𝗴 𝟮𝟰/𝟳: p4d.24xlarge costs $32/hour. Your data scientists spin up three for experimentation and you're at $70K monthly before shipping a feature. Nobody budgeted for keeping these running because cold starts take 30 seconds and you can't afford the latency.
• 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲: Your model runs in 100ms. Seems fast. At 10M predictions daily, you need GPU instances running continuously because ML workloads don't scale down gracefully like web servers. You're paying peak capacity prices for average load.
• 𝗙𝗮𝗶𝗹𝗲𝗱 𝗲𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁𝘀: Data scientists run 50 training experiments. 45 fail. You paid for all 50. That's $15K in compute for experiments that went nowhere. Traditional software doesn't work like this: you test cheaply and deploy what works. ML burns money during exploration.

𝗪𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝘄𝗼𝗿𝗸𝘀:
• 𝗦𝗽𝗼𝘁 𝗶𝗻𝘀𝘁𝗮𝗻𝗰𝗲𝘀 𝗳𝗼𝗿 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴: 70% discount on GPU compute. Build checkpointing into training pipelines. That $23K monthly p4d instance becomes $7K. Yes, jobs get interrupted; that's fine for training.
• 𝗕𝗮𝘁𝗰𝗵 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝘄𝗵𝗲𝗿𝗲 𝗽𝗼𝘀𝘀𝗶𝗯𝗹𝗲: Real-time inference means 24/7 infrastructure. Batch processing, whether hourly or daily, reduces costs by 60-70% because you only pay for compute time used. If your use case tolerates it, batch wins.
• 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗯𝘂𝗱𝗴𝗲𝘁𝘀 𝗽𝗲𝗿 𝗲𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁: Give data scientists $5K monthly for experiments, not unlimited access. Forces prioritization. They'll run fewer, better-designed experiments instead of brute-force hyperparameter searches.

𝗦𝘁𝗮𝗿𝘁 𝘄𝗶𝘁𝗵:
Before committing to ML in production, run a cost model for expected traffic. Most teams discover their MVP will cost 5-10x their initial estimate. Better to know that before building.

What's the most expensive surprise you've hit running ML workloads in production?

#AWS #awscommunity #kubernetes #CloudNative #DevOps #Containers #TechLeadership
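The dollar figures in the post follow from simple multiplication, and writing it down is a reasonable first step toward the "cost model" it recommends. This is a minimal sketch: the function name and the 30-day month are assumptions, while the $32/hour rate, the three instances, and the 70% Spot discount are taken from the post.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month running around the clock


def monthly_gpu_cost(
    hourly_usd: float,
    instances: int = 1,
    discount_pct: int = 0,
) -> float:
    """On-demand or discounted monthly cost for always-on instances."""
    full_price = hourly_usd * HOURS_PER_MONTH * instances
    return full_price * (100 - discount_pct) / 100


one_instance = monthly_gpu_cost(32)                 # 23040.0, the "$23K" figure
three_instances = monthly_gpu_cost(32, 3)           # 69120.0, the "$70K" figure
spot_training = monthly_gpu_cost(32, 1, 70)         # 6912.0, the "$7K" figure
```

Swapping `HOURS_PER_MONTH` for the hours a batch job actually runs gives the corresponding estimate for batch inference, which is where the post's "only pay for compute time used" saving comes from.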

2 Comments
Kingsley Uyi Idehen
7mo Edited
Large Language Models (LLMs) are phenomenal RDF clients. They are now surfacing insights from datasets long available in the Linked Open Data (LOD) cloud, often via direct access to vectorized instances of source data through MCP Servers—for example, Anthropic has released an MCP server for PubMed. What many may not fully realize is that large parts of the Semantic Web Project are being revisited through LLM-powered User Agents. This opens enormous opportunities for synergy—once properly understood. Ultimately, what we all want is a dimension of the Web that functions like a DBMS on steroids, free from the limitations of conventional siloed DBMSes: 1. Standardized Identifiers – Global entity identity, including references such as hyperlinks. 2. Fine-Grained Entity Relationships – Represented as 3-tuples rather than coarse-grained n-tuples (tables). 3. Federated Querying – Across multiple query languages, including SQL, SPARQL, and GraphQL. 4. Negotiable Result Representations – Flexible formats for query solutions. Together, these capabilities enable true data de-silo-fication at Web scale—remembering that the Web is an abstraction layer over the Internet, where hyperlinks replace machine hostnames (and IP addresses) as the fundamental unit of reference. As per usual, see comments section for links. #CDO #CDIO #CAIO #CIO #CTO #CMO #CxO #AI #GenAI #SemanticWeb #KnowledgeGraphs #LODCloud #LinkedData

3 Comments

3,004 followers

22 Posts
