Is this the end of pay-per-page OCR?
The release of the open-source DeepSeek-OCR isn't just another update; it's a fundamental shift in how we approach document processing, especially in the age of LLMs.
For years, our choice has been between:
* Traditional Open-Source (Tesseract): Free, but often brittle, requiring heavy pre-processing and struggling with complex layouts.
* Cloud APIs (Google Vision AI, Amazon Textract, Azure Read): Powerful and accurate, and they handle tables and handwriting well. But they operate on a costly pay-per-page/API-call model, which becomes a massive OpEx bottleneck for RAG and high-volume data pipelines.
DeepSeek-OCR creates a new, third category. Here’s a breakdown from a cost and features perspective.
🚀 FEATURES: Extraction vs. Compression
This is the most critical difference.
* Traditional & Cloud OCR: Their goal is EXTRACTION. They read pixels and output text (or JSON). They are designed to get text out of a document.
* DeepSeek-OCR: Its goal is COMPRESSION. It’s a multimodal model that reads a document and converts it into a highly compressed set of "vision tokens" for an LLM to read.
Why does this matter? It’s roughly 10x more token-efficient. Instead of an LLM processing 8,000 text tokens for a dense page, it can process just 800 vision tokens that represent the entire page—layout, text, charts, and all.
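The token arithmetic is easy to sanity-check. A minimal sketch using the 8,000/800 figures above (the 128k context-window size is an illustrative assumption, not a DeepSeek spec):

```python
# Back-of-the-envelope token savings per page.
# 8,000 and 800 are the figures cited above; the 128k context window is assumed.
text_tokens_per_page = 8_000     # dense page rendered as plain text tokens
vision_tokens_per_page = 800     # same page as compressed vision tokens

ratio = text_tokens_per_page / vision_tokens_per_page
print(f"Compression ratio: {ratio:.0f}x")  # → 10x

context_window = 128_000  # assumed LLM context size
print(f"Pages per context as text:   {context_window // text_tokens_per_page}")    # → 16
print(f"Pages per context as vision: {context_window // vision_tokens_per_page}")  # → 160
```

Same context window, ten times as many pages in view—that is the whole pitch in two divisions.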
💰 COST: OpEx vs. CapEx
This is where the business case gets really interesting.
* Cloud APIs (OpEx): You have $0 setup cost but pay a variable fee for every page. This is predictable but scales expensively. Processing 10 million pages means 10 million charges.
* Tesseract (OpEx/CapEx): $0 license fee and low compute cost (runs on a CPU). The real "cost" is high developer time spent compensating for low accuracy.
* DeepSeek-OCR (CapEx/Compute Cost): $0 license fee (it's open-source), but it requires a powerful GPU (like an A100) to run. The upfront/hourly compute cost is real, but the economics flip at scale.
One report states DeepSeek-OCR can process 200,000+ pages per day on a single A100 GPU.
When you do the math, the cost to self-host and process millions of documents becomes a fraction of the cost of pay-per-page cloud APIs. You are trading a variable operational expense for a fixed (or hourly) compute cost.
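Here is that math as a sketch. The throughput figure comes from the report above; the per-page API price and A100 hourly rate are illustrative placeholders (check current vendor and cloud-GPU pricing before relying on this):

```python
# Break-even sketch: pay-per-page cloud OCR vs. self-hosted GPU.
# Prices below are illustrative assumptions, not quoted vendor rates.
cloud_price_per_page = 0.0015   # assumed ~$1.50 per 1,000 pages
gpu_cost_per_hour = 2.50        # assumed hourly rate for one A100
pages_per_day_on_gpu = 200_000  # throughput figure cited above

pages = 10_000_000              # the 10M-page workload from the post

cloud_total = pages * cloud_price_per_page
gpu_days = pages / pages_per_day_on_gpu
gpu_total = gpu_days * 24 * gpu_cost_per_hour

print(f"Cloud API:   ${cloud_total:,.0f}")   # → $15,000
print(f"Self-hosted: ${gpu_total:,.0f}")     # → $3,000
```

The exact numbers will move with your prices, but the shape doesn't: the cloud bill scales linearly with pages forever, while the GPU bill scales with hours of compute.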
My Take:
Cloud OCR APIs aren't dead, but their role is changing. For simple business automation (e.g., "extract 5 fields from 1,000 invoices/month"), they are still the easiest solution.
But for any company building a serious, high-volume RAG system or LLM-native document workflow, the pay-per-page model is a financial bottleneck.
DeepSeek-OCR is the first major tool that is built for the AI-native future. It’s designed not just to read documents, but to feed them to LLMs efficiently.
What's your take? Are you feeling the pain of per-page API costs in your RAG pipelines?
#AI #OCR #DeepSeek #OpenSource #GenAI #RAG #DocumentAI #LLMs #AmazonTextract #GoogleVision #AzureAI #TechComparison #FinOps