AI Model Training Costs and Tariffs


Summary

AI model training costs and tariffs refer to the money and fees required to develop, run, and maintain artificial intelligence systems. Unlike traditional software, AI models often incur high and variable expenses for computing power, data processing, and human labor, which can fluctuate widely based on usage and scale.

  • Understand true costs: Look beyond basic pricing and consider all the expenses involved in AI projects, including infrastructure, data labeling, and ongoing management.
  • Match pricing to usage: Choose models and payment plans that fit your specific needs, data privacy requirements, and budget to avoid unexpected charges.
  • Consider broader impacts: Be aware of hidden costs like environmental effects and labor conditions that contribute to the overall price of AI model training and deployment.
Summarized by AI based on LinkedIn member posts
  • Aakash Gupta (LinkedIn Influencer)

    Helping you succeed in your career + land your next job

    313,809 followers

    $7,225 for one day of coding. And Cursor isn't even the worst example. Replit's margins went negative. Anthropic throttles its best users.

    I mapped pricing across 50 AI startups. Six distinct patterns emerged.

    The core tension: traditional SaaS has near-zero marginal cost per user. AI products pay for compute on every interaction. A casual Claude user costs pennies. A developer running Claude Code all day costs tens of thousands per month. Your best users are your most expensive users. That tension is breaking every pricing model in the market.

    Cursor charged a flat 500 requests/month. Worked fine until users leaned into multi-step agent workflows. They switched to credit pools. One developer burned 500 requests in a single day. The plan description changed from "Unlimited" to "Extended" twelve days after launch.

    Replit grew 15x in ten months ($16M to $252M ARR). But they were buying revenue with compute. When they launched a more autonomous agent, margins crashed to negative 14%. They had to invent "effort-based pricing" mid-flight.

    Anthropic played it differently. Their $17/$100/$200 tiers map to genuinely different user personas, not volume bands. A casual user and a Claude Code developer are different products with different willingness to pay.

    The lesson across all 50 companies: before you set any price, pull the cost distribution. What does your P10 user cost? P50? P90? If the ratio exceeds 10x, flat pricing will break. In AI products, it almost always exceeds 10x.

    Full guide with all 6 models, 4 case studies, and a decision tree: https://lnkd.in/gdKaQSMk
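
    The "pull the cost distribution" check can be sketched in a few lines. This is a minimal illustration, not the author's method: it assumes "the ratio" means P90 divided by P10, uses a nearest-rank percentile, and runs on made-up per-user costs.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def cost_spread(monthly_costs):
    """Return P10, P50, P90 and the P90/P10 ratio of per-user monthly cost."""
    p10, p50, p90 = (percentile(monthly_costs, p) for p in (10, 50, 90))
    ratio = p90 / p10 if p10 > 0 else math.inf
    return {"p10": p10, "p50": p50, "p90": p90, "ratio": ratio}

def flat_pricing_at_risk(monthly_costs, threshold=10.0):
    """Assumption: 'the ratio' in the post is read as P90/P10."""
    return cost_spread(monthly_costs)["ratio"] > threshold

# Hypothetical per-user compute cost in USD per month (illustrative numbers only).
costs = [0.5, 1, 2, 2, 3, 4, 8, 15, 60, 400]
spread = cost_spread(costs)
print(spread, flat_pricing_at_risk(costs))
```

    In practice the input would come from per-user usage logs joined with the provider's billing data; the 10x threshold is the one quoted in the post.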

  • Arjun Jain

    Founder & CEO, Fast Code AI | Research-grade AI for enterprises with hard problems | Dad

    37,135 followers

    #MIT's new "Radial Attention" makes Generative Video 4.4x cheaper to train and 3.7x faster to run. Here's why:

    The problem with current AI video? It's BRUTALLY expensive. Every frame must "pay attention" to every other frame. With thousands of frames, costs grow quadratically.

    Training one model? $100K+
    Running it? Painfully slow.

    Massachusetts Institute of Technology, NVIDIA, Princeton, UC Berkeley, Stanford, and First Intelligence just changed the game.

    Their breakthrough insight: Video attention works like physics.
    - Sound gets quieter with distance
    - Light dims as it travels
    - Heat dissipates over space

    Turns out, AI video tokens follow the same rules. Why waste compute power on distant, irrelevant connections?

    Enter Radial Attention. Instead of checking EVERY connection:
    • Nearby frames → full attention
    • Distant frames → sparse attention
    • Computation scales log-linearly, not quadratically

    Technical result: O(n log n) vs O(n²)
    Translation: MASSIVE efficiency gains

    Real-world results on production models:
    📊 HunyuanVideo (Tencent):
    • 2.78x training speedup
    • 2.35x inference speedup
    📊 Mochi 1:
    • 1.78x training speedup
    • 1.63x inference speedup

    Quality? Maintained or IMPROVED.

    What this unlocks:
    • 4x longer videos, same resources
    • 4.4x cheaper training costs
    • 3.7x faster generation
    • Works with existing models (no retraining!)

    And, MIT open-sourced everything: https://lnkd.in/gETYw8eT

    The bigger picture: The internet is transforming.
    BEFORE: A place to store videos from the real world
    NOW: A machine that generates synthetic content on demand

    Think about it:
    • TikTok filled with AI-generated content
    • YouTube creators using AI for entire videos
    • Streaming services producing personalized shows
    • Educational content generated for each student

    This changes everything.

    Remember when only big tech could afford image AI?
    2020: GPT-3 → Only OpenAI
    2022: Stable Diffusion → Everyone
    2024: Midjourney everywhere

    Video AI is next. Radial Attention probably just accelerated the timeline. The future isn't coming. It's here. And it's more accessible than ever.

    Want to ride this wave?
    → Follow me for weekly AI breakthroughs
    → Share if this opened your eyes
    → Try the code: https://lnkd.in/gETYw8eT

    What will YOU create when video AI costs 4x less?

    #AI #VideoGeneration #MachineLearning #TechInnovation #FutureOfContent
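
    To get a feel for the O(n log n) vs O(n²) gap quoted in the post, the sketch below counts attention pairs under each growth law. It is an illustration of the asymptotics only, with the constant factor set to 1 as an assumption; it does not implement Radial Attention, and the token counts are arbitrary.

```python
import math

def dense_pairs(n_tokens):
    """Dense attention scores every token against every token: n^2 pairs."""
    return n_tokens ** 2

def log_linear_pairs(n_tokens):
    """An n * log2(n) budget; the constant factor of 1 is an illustrative assumption."""
    return n_tokens * math.log2(n_tokens)

for n in (1_024, 16_384, 131_072):
    ratio = dense_pairs(n) / log_linear_pairs(n)
    print(f"n={n:>7}: dense / log-linear = {ratio:,.0f}x")
```

    The measured speedups reported in the post (roughly 1.6x to 3.7x) are far smaller than these raw ratios, which is expected: real systems carry constant factors and spend time outside attention.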

  • Bhavishya Pandit

    Turning AI into enterprise value | $20 M in Business Impact | Speaker - MHA/IITs/IIMs/NITs | Google AI Expert | 50 Million+ views | MS in ML - UoA

    85,667 followers

    People debate restaurants for 45 mins but pick their AI models in 45 secs. 300+ models. Millions at stake. Yet people pick the one who has the best marketing budget.

    67% of organisations globally now run LLM-powered workflows. API pricing spans from $0.15 to $60+ per million tokens, depending on what you pick. 239+ models are being actively evaluated on major benchmarks as of 2026. And the model that topped every leaderboard six months ago is already in the middle of the pack.

    So here's the decision framework and where today's most relevant models actually sit within it 👇

    🔐 STEP 1: Local or Cloud?
    Sensitive data? Go local. And local no longer means weak.
    → Low/medium tasks (extraction, simple logic): IBM Granite 4.1 (3B, free), MiniCPM-V 4.6 (1.3B, free + multimodal), NVIDIA Nemotron-4 15B
    → Complex local reasoning: DeepSeek-V4-Flash (self-hostable, 13B active parameters) paired with Chain-of-Thought prompting does heavy lifting without touching a cloud API.
    Cloud-connected? Horsepower isn't the problem. Cost and latency are.

    ⚡ STEP 2: Capability vs. Speed/Cost?
    Maximum intelligence needed:
    → Claude Mythos Preview leads on GPQA reasoning. Gemini 3.1 Pro is the strongest for coding agent workflows. GPT-5.5 hits 74.9% on SWE-Bench for software engineering tasks.
    Speed and cost matter more:
    → GPT-5.5 Instant is now the default ChatGPT model: fast and affordable.
    → Qwen3-Max at ~$1.20/M tokens matches Claude Sonnet 4.6 on Arena-Hard. DeepSeek V3.2 sits at $0.28/M input, arguably the best cost-performance ratio available right now.

    🎯 STEP 3: Does a specialised model exist for your domain?
    Almost always yes, and it almost always wins on narrow tasks.
    → Code agents: GPT-5.3 Codex (tuned for agentic IDE workflows at $1.25/M), DeepSeek-V4-Pro, Gemini 3.1 Pro
    → Budget-conscious general use: Kimi K2.6 at $0.95/M is the cheapest model in the current top 10 by GPQA Diamond. Gemini Flash for real-time agentic use cases.

    The shift nobody's pricing into their stack yet: GPT-4-level performance cost $30/M tokens in 2023. Today, you get equivalent capability for under $1/M. Intelligence is commoditising faster than teams are adapting. The open-source ecosystem is now self-hosting models that would have required expensive API access just a year ago.

    The right model isn't the most powerful one. It's the one that fits your task, your data privacy requirements, and your budget simultaneously. Every time.

    Saved the full decision tree above as your cheat sheet. Stop choosing models on vibes. Start choosing a strategy.

    If you liked this, you'll hate how much time you've been wasting on the wrong models. Follow me. I break down AI decisions that actually matter for builders.
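
    The per-million-token prices quoted above translate into a monthly bill with one multiplication. The sketch below is a rough illustration: the workload size is a hypothetical placeholder, a single blended price per model is assumed (real pricing usually separates input and output tokens), and $1.00/M stands in as an upper bound for "under $1/M".

```python
def token_cost_usd(tokens, price_per_million_usd):
    """Cost of processing `tokens` at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million_usd

MONTHLY_TOKENS = 2_000_000_000  # hypothetical workload: 2B tokens per month

# Prices in USD per million tokens, taken from the post.
prices = {
    "GPT-4-level in 2023": 30.00,
    "equivalent capability today (upper bound)": 1.00,
    "DeepSeek V3.2 input": 0.28,
}

for label, price in prices.items():
    print(f"{label}: ${token_cost_usd(MONTHLY_TOKENS, price):,.2f}/month")
```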

  • Uchechukwu Ajuzieogu

    Driving Technological Innovation and Leadership Excellence

    64,674 followers

    I spent six months investigating Retrieval-Augmented Generation economics. What I found will make you reconsider every "cheaper AI" pitch you've ever heard.

    The promise: Build intelligent systems for $15,000 instead of $200,000. Democratize AI. Level the playing field.

    The reality I documented:
    → 72% of enterprise RAG projects fail within 12 months
    → Actual costs hit $1.5 million (10x vendor quotes)
    → Kenyan workers labeling your training data earn $1.32/hour reviewing 700 traumatic cases per shift
    → Venezuelan annotators make $0.11-$0.90 hourly
    → 185 workers unionized in Kenya. All were terminated.

    While Silicon Valley celebrates Pinecone's $750 million valuation (107x revenue), I followed the money to its source.

    I found Chen, a healthcare director whose $15K RAG pilot became a $1.2M terminated project. I found Maria in São Paulo, watching her journalism get embedded in corporate systems without a cent in licensing fees. I found workers across Kenya, Venezuela, Philippines, Syria, Bulgaria, Argentina, Ghana, and Colombia. 18-20 hour workdays. PTSD from content moderation. NDAs silencing their testimony. Productivity bonuses worth 50% of wages incentivizing them to process suicide videos in 50 seconds.

    The data centers powering these systems? They'll consume 1,000 TWh by 2026, equal to Japan's entire electricity usage. Ireland's data centers already take 17% of national power. Nevada facilities compete with Pyramid Lake Paiute Tribe for water rights.

    The technical reality vendors won't tell you: Vector database costs don't scale linearly. That $5K/month at 2TB becomes $75K/month at 10TB. RAG inflates token usage from 15 to 500+ per query. LLM inference costs dominate 60-80% of total expenses. Semantic chunking costs 37.5% more but delivers 15-20% better accuracy, the difference between success and user complaints.

    The question isn't whether RAG works technically. It clearly does. The question is: who pays for "cheaper" AI?

    Right now, the answer is Kenyan workers at $1.32/hour. Publishers without licensing deals. Communities bearing environmental costs they didn't create. Enterprises discovering that 72% failure rates make "cheaper" just deferred expensive.

    This isn't inevitable. It's a choice about who benefits from human intelligence.

    I documented everything: the labor chains, the copyright battles, the environmental data, the enterprise failures, the infrastructure consolidation, and the cooperative alternatives that exist in proof-of-concept form right now. Six months. 50+ primary sources. Court documents. Financial reports. Worker testimony. Technical analysis.

    The full investigation reveals the voices Silicon Valley doesn't want you to hear, and the alternative futures they're fighting to create.

    🔗 Read the complete investigation on Aylgorith: https://lnkd.in/d2FCtMx4

    #AIEconomics #DigitalColonialism #TechAccountability
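
    Two of the technical figures in this post can be turned into quick arithmetic: the vector-database quote ($5K/month at 2TB, $75K/month at 10TB) and the per-query token inflation (15 to 500+). The sketch below fits a power law through the two price points, which is an assumption made for illustration; real vendor pricing may be tiered rather than smooth.

```python
import math

def scaling_exponent(size_a, cost_a, size_b, cost_b):
    """Exponent k in cost ∝ size**k implied by two (size, cost) points."""
    return math.log(cost_b / cost_a) / math.log(size_b / size_a)

# Sizes in TB and costs in USD/month, as quoted in the post.
k = scaling_exponent(2, 5_000, 10, 75_000)

# Tokens per query without and with retrieved context, as quoted in the post.
token_inflation = 500 / 15

print(f"implied scaling exponent: {k:.2f} (1.00 would be linear)")
print(f"token inflation per query: {token_inflation:.1f}x")
```

    An exponent above 1 means 5x the data costs 15x as much in this example, which is the "doesn't scale linearly" claim expressed as a single number.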

  • Prem N.

    AI GTM & Transformation Leader | Value Realization | Evangelist | Perplexity Fellow | 22K+ Community Builder

    23,121 followers

    Most teams underestimate AI costs. They budget for models… but forget everything around them. That’s why AI projects often look “cheap” in pilots and expensive in production.

    Real AI spend isn’t just inference. It’s spread across 12 major cost buckets every CFO and CTO should understand 👇

    1) Compute (Training + Fine-tuning)
    GPUs, clusters, distributed runs. Costs rise with experiments, retries, and large models.

    2) Inference / Runtime (Tokens)
    API usage, token billing, agent tool calls. Driven by query volume and long contexts.

    3) Data Storage
    Warehouses, lakes, vector databases, feature stores. Embeddings, duplicates, and retention drive spend.

    4) Data Labeling & Human Review
    Annotations, SMEs, RLHF, QA checks. High-quality labeling is slow and expensive.

    5) Data Pipelines & Engineering
    Ingestion, ETL/ELT, cleaning, transformations. Messy data creates ongoing maintenance costs.

    6) Model Development (People Cost)
    ML engineers, data scientists, prompt engineers. Hiring, retention, and specialist premiums add up.

    7) MLOps / LLMOps Tooling
    Model registries, prompt versioning, evaluations. Tool sprawl and enterprise licenses increase overhead.

    8) Monitoring & Observability
    Drift detection, hallucination monitoring, logging. Traces, alerts, and eval pipelines aren’t free.

    9) Security
    Access control, secrets, red teaming, threat detection. Prompt injection and data exfiltration risks require investment.

    10) Governance & Compliance
    Documentation, policies, audits, legal reviews. Regulations like GDPR and EU AI Act drive ongoing costs.

    11) Integration & Change Management
    Connecting AI to apps and workflows, training users. Adoption takes time and process redesign.

    12) Vendor & Platform Costs
    SaaS tools, orchestration platforms, marketplaces. Watch for hidden add-ons and per-seat pricing.

    The takeaway: AI budgeting isn’t a line item. It’s a system. If you only plan for tokens, you’ll miss most of the spend. If you plan across these 12 buckets, you build AI that scales sustainably.

    Save this if you’re planning AI investments. Share it with your CFO or CTO.

    ♻️ Repost this to help your network get started
    ➕ Follow Prem N. for more
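
    One way to use the 12 buckets is as a checklist against a draft budget. The sketch below is a hypothetical helper, not something from the post: the bucket keys are shorthand invented here, and the dollar figures in the example plan are placeholders.

```python
# Shorthand keys for the 12 buckets listed in the post (names are invented here).
COST_BUCKETS = (
    "compute_training", "inference_runtime", "data_storage", "labeling_review",
    "data_pipelines", "people", "mlops_tooling", "monitoring", "security",
    "governance_compliance", "integration_change_mgmt", "vendor_platform",
)

def review_budget(budget):
    """Return (total, inference_share, missing_buckets) for a {bucket: usd} plan."""
    unknown = set(budget) - set(COST_BUCKETS)
    if unknown:
        raise KeyError(f"unknown buckets: {sorted(unknown)}")
    total = sum(budget.values())
    share = budget.get("inference_runtime", 0) / total if total else 0.0
    missing = [b for b in COST_BUCKETS if b not in budget]
    return total, share, missing

# Placeholder pilot plan in USD/month; only two buckets have been budgeted.
pilot_plan = {"inference_runtime": 4_000, "people": 12_000}
total, share, missing = review_budget(pilot_plan)
print(f"total=${total:,}  inference share={share:.0%}  unbudgeted buckets={len(missing)}")
```

    The useful output is the list of unbudgeted buckets, which makes the gap between a pilot plan and a production plan explicit.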

  • Derek Snow

    Professor NYU | ML in Finance | Sov.ai | Prymer.ai

    12,038 followers

    The biggest lie about AI in finance: how much cloud you need. Mid-frequency quant firms (<100 employees) should spend thousands monthly, not hundreds of thousands.

    Myth: You need H100 fleets.
    Reality: Efficiency waves cut active compute: DeepSeek-V3 is a 671B MoE with ~37B active params/token; IBM's NorthPole in/near-memory chip shows lower latency at far higher efficiency than top GPUs in LLM inference.

    Myth: You must run on AWS/GCP/Azure.
    Reality: Hetzner dedicated starts €37–€100/mo; comparable Big-3 VMs land $276–$580+/mo. If you don't need managed services, pick dedicated for batch/inference.

    Myth: GPUs make the bill explode.
    Reality: Even GCP T4 on-demand ≈ $0.35/hr (spot/reserved lower) → a few hours/day <$50/DS/month; plenty of L4/T4 spot options at similar scale, or even better see Vast.ai

    Myth: Fine-tuning is cost-prohibitive.
    Reality: Sky-T1-32B-Preview reported ~$450 training to o1-preview-level performance on select reasoning/coding benches, and there are many more examples of this since.

    Myth: Advantage comes from training weights.
    Reality: Don't train weights; fine-tune, or even better shape the inputs (context engineering), e.g. see Perplexity's success.

    Myth: Foundation inference is pricey.
    Reality: Google's Gemini 2.5 Flash/Flash-Lite targets ultra-low token costs (cents-per-million-token pricing via Google/Vertex).

    Myth: You need BigQuery/Snowflake.
    Reality: Iceberg/Delta on object storage: ACID, time travel, schema evolution; DuckDB/Trino/Spark query in place. BigQuery/Snowflake now read Iceberg in your buckets: ~$0.023/GB-mo, no copy tax.

    Myth: Warehouses are mandatory for scale.
    Reality: DuckDB on a €3k laptop ingests ≈2 GB/s and runs TPC-H SF3,000–10,000 locally. Arrow↔DuckDB is zero-copy.

    Myth: Parquet is optimal for all workloads.
    Reality: Arrow-native engines fix time-series/object-store pain: TileDB (N-D arrays), Lance (S3 + ANN), Vortex (cascaded compression; LF AI project), F3 (Wasm decoders; SIGMOD'25).

    Myth: You need a vector-DB cluster for retrieval.
    Reality: LanceDB runs file-based on S3 with ANN indexes (serverless-friendly), avoiding hot persistent disks.

    Myth: Embeddings demand GPUs.
    Reality: Static/Model2Vec encoders deliver 100–400× faster CPU retrieval with competitive quality, ideal for hybrid (lexical+sparse+dense-rerank).

    Myth: Serving stacks are the bottleneck.
    Reality: vLLM/TGI continuous batching yields multi-k tokens/s on A-class GPUs and big throughput gains vs naïve serving.

    Myth: Banks are "piloting."
    Reality: HSBC AML: ~60% false-positive reduction (with 2–4× more true hits); BofA: 2–3B+ interactions; Morgan Stanley advisor ~98% adoption; Goldman: "~95% of an S-1 in minutes."

    The screenshot is from LightningAI. The bubble in AI will come down, not because AI doesn't work, but because the research enabling it is working so well, it's eating its own tail.
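
    The "<$50 per data scientist per month" claim for a T4 and the object-storage rate are easy to sanity-check. In the sketch below the hourly and per-GB prices come from the post; reading "a few hours/day" as 4 hours over a 30-day month, and the 10 TB dataset size, are assumptions for illustration.

```python
def monthly_gpu_cost(hourly_usd, hours_per_day, days=30):
    """On-demand GPU cost for a month of part-time use."""
    return hourly_usd * hours_per_day * days

def monthly_storage_cost(gigabytes, usd_per_gb_month=0.023):
    """Object-storage cost at the per-GB-month rate quoted in the post."""
    return gigabytes * usd_per_gb_month

t4_cost = monthly_gpu_cost(0.35, hours_per_day=4)   # assumed: 4 h/day
storage_cost = monthly_storage_cost(10_000)         # hypothetical: 10 TB

print(f"T4, 4 h/day: ${t4_cost:.2f}/month")
print(f"10 TB object storage: ${storage_cost:.2f}/month")
```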

  • Dileep Pandiya

    Engineering Leadership (AI/ML) | Enterprise GenAI Strategy & Governance | Scalable Agentic Platforms

    21,942 followers

    Understanding LLM Adaptation: Full Training vs Fine-Tuning vs LoRA/QLoRA 🚀

    Which approach really moves the needle in today’s AI landscape? As large language models (LLMs) become mainstream, I frequently get asked: “Should we train from scratch, full fine-tune, or use LoRA/QLoRA adapters for our use case?” Here’s a simple breakdown based on real-world considerations:

    🔍 1. Full Training from Scratch
    What: Building a model from the ground up with billions of parameters.
    Who: Only major labs/Big Tech (OpenAI, Google, etc.)
    Cost: 🏦 Millions; requires massive clusters and huge datasets.
    Why: Needed ONLY if you want a truly unique model architecture or foundation.

    🛠️ 2. Full Fine-Tuning
    What: Take an existing giant model and update ALL its weights for your task.
    Who: Advanced companies with deep pockets.
    Cost: 💰 Tens of thousands to millions; needs multiple high-end GPUs.
    Why: Useful if you have vast domain data and need to drastically “re-train” the model’s capabilities.

    ⚡ 3. LoRA/QLoRA (Parameter-Efficient Tuning)
    What: Plug low-rank adapters into a model, updating just 0.5-5% of weights.
    Who: Startups, researchers, almost anyone!
    Cost: 💡 From free (on Google Colab) to a few hundred dollars on cloud GPUs.
    Why: Customize powerful LLMs efficiently: think domain adaptation, brand voice, or private datasets, all without losing the model’s general smarts.

    🤔 Which one should YOU use?
    For most organizations and projects, LoRA/QLoRA is the optimal sweet spot:
    • Fast: Results in hours, not weeks
    • Affordable: Accessible to almost anyone
    • Flexible: Update or revert adapters with ease

    Full fine-tuning and from-scratch training make sense only for the biggest players; 99% of AI innovation today leverages parameter-efficient tuning!

    💬 What’s your experience? Are you using full fine-tunes, or has LoRA/QLoRA met your business needs? Share your project (or frustrations!) in comments.
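
    Why low-rank adapters touch so few weights follows from simple counting. For a weight matrix of shape d_out × d_in, LoRA trains two small matrices of shapes d_out × r and r × d_in in place of a full update, so the trainable count is r·(d_out + d_in) versus d_out·d_in. The sketch below computes that fraction for a single matrix; the 4096 × 4096 shape and the ranks are illustrative choices, and the model-wide fraction depends on which layers receive adapters.

```python
def lora_trainable_fraction(d_out, d_in, rank):
    """Fraction of one weight matrix's parameter count that a rank-`rank`
    LoRA adapter trains: rank * (d_out + d_in) / (d_out * d_in)."""
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return adapter / full

# Illustrative square projection matrix and a few common-looking ranks.
for r in (8, 16, 64):
    print(f"rank {r:>2}: {lora_trainable_fraction(4096, 4096, r):.2%} of the matrix")
```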

  • Paolo Perrone

    Shipping Production AI: Agents, Inference, GPU. Read by 1M+ AI engineers.

    131,578 followers

    GPUs Are The New Technical Debt

    Every startup in 2025: "We need GPUs!"
    Every CFO in 2026: "Why is AWS charging us $47,000/month?"

    I learned this lesson the hard way. $186,000 hard, to be exact. Here's how we created a financial nightmare:

    Month 1: "Just spin up a few A100s"
    → $3,200/month
    → "That's nothing for AI innovation!"

    Month 3: "We need more for training"
    → $12,000/month
    → "Cost of doing business"

    Month 6: "Production needs dedicated instances"
    → $31,000/month
    → CFO starts asking questions

    Month 9: The Full Horror Show
    → 12 training GPUs (mostly idle)
    → 8 inference servers (30% utilized)
    → 4 "experiment" instances nobody remembers starting
    → $47,000/month burning whether we use them or not

    The Real Technical Debt Nobody Talks About:
    ❌ Every custom model needs maintenance forever
    ❌ Every GPU cluster needs DevOps forever
    ❌ Every optimization becomes legacy code
    ❌ Every "quick experiment" becomes production

    What Actually Happened:
    • Built custom model for 2% accuracy gain
    • Llama 3.1 released 2 weeks later
    • Our model was now 15% worse
    • Still paying for the GPUs

    The Uncomfortable Math:
    Our "cutting-edge" setup:
    → $47K/month GPU costs
    → 2 ML engineers maintaining it ($35K/month)
    → Worse performance than API calls
    Claude API doing the same thing:
    → $3K/month
    → 0 maintenance
    → Gets better without us doing anything

    The Plot Twist: We shut down 90% of our GPUs. Switched to API calls. Performance went UP. Costs went down 94%.

    My framework now:
    1️⃣ Start with APIs (always)
    2️⃣ Prove you need custom models with data
    3️⃣ Rent GPUs by the hour, not month
    4️⃣ Set auto-shutdown on everything
    5️⃣ Track utilization religiously

    The harsh reality: While you're managing GPU clusters, your competitor is shipping features. Those GPUs you're "investing in"? They're depreciating faster than a new car. And unlike a car, they're costing you money while parked.

    What's your monthly GPU burn rate? And what percentage is actually being used? 💀

    P.S. Made a spreadsheet that saved us $400K/year in GPU costs. Link in comments 👇
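
    The "Uncomfortable Math" can be checked directly. The sketch below uses the monthly figures from the post; the `idle_spend` helper and its example input are hypothetical additions to illustrate step 5 (tracking utilization), since the post does not break the bill down per server.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100

def idle_spend(monthly_cost, utilization):
    """Portion of a fixed monthly bill paid for unused capacity."""
    return monthly_cost * (1 - utilization)

# Monthly figures in USD, from the post.
gpu_monthly = 47_000
engineers_monthly = 35_000
api_monthly = 3_000

gpu_only = percent_reduction(gpu_monthly, api_monthly)
with_people = percent_reduction(gpu_monthly + engineers_monthly, api_monthly)
print(f"GPU bill vs API bill: {gpu_only:.0f}% lower")
print(f"GPU bill plus engineers vs API bill: {with_people:.0f}% lower")

# Hypothetical: a $10K/month inference fleet running at the post's 30% utilization.
print(f"idle spend: ${idle_spend(10_000, 0.30):,.0f}/month")
```

    The quoted 94% matches the GPU bill falling from $47K to $3K; the post does not say which costs were included in that figure, so this is one plausible reading.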

  • Maya Mikhailov

    Founder & CEO @ Savvi AI | Accelerate AI for FinServ | ex-SVP Synchrony

    9,612 followers

    According to Sam Altman, AI costs are dropping 10x every 12 months, so why are your costs just going up? 🤔

    This question keeps the finance team and many AI enterprise leaders awake at night. GenAI token costs are indeed plummeting. Of course, the real story is a bit more complicated. So what's really driving increased spend?

    💰 Indirect costs still directly affect the bottom line
    Fine-tuning for specific use cases and/or RAGs, backend development and API integration, testing, security, compliance, and specialized model development or implementation resources. None of these things disappear with token prices decreasing. The path to production and scaling is still as challenging as ever.

    💰 Continuous optimization can be a continuous cost
    Models are not one-and-done. Continuous optimization, training and learning are the keys to long-term success with AI. But these lifetime costs rarely make it into the initial ROI calculations. Plus, as use cases expand, you might need larger models – and larger budgets.

    💰 AI Infrastructure isn’t easy or cheap
    Not knowing how to scale models creates many unforeseen costs and headaches. This is true of both machine learning models and GenAI ones. While token prices drop, cloud costs are surging 30%+ annually, driven by AI scaling. Those shiny GPU clusters? That’ll cost you.

    One last thing to think about: today's token prices are subsidized by billions in investor capital. These large language model (LLM) companies will eventually need to monetize their massive R&D investments. The real question isn't about today's costs – it's about tomorrow's sustainability.

    Being AI-enabled isn't just about paying for tokens for a GenAI model. Success requires a comprehensive strategy that accounts for the full cost of implementation, optimization, and scale. Businesses need to navigate these hidden costs while delivering real value.
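
    The apparent paradox (token prices falling 10x a year while total spend rises) can be reproduced with a toy projection. The sketch below assumes constant usage, applies the 30% annual growth quoted for cloud costs to all non-token spend, and starts from a hypothetical $100K/$400K split; every one of those is a simplifying assumption, not a figure from the post.

```python
def projected_spend(token_usd, other_usd, years, token_drop=10.0, other_growth=0.30):
    """Yearly (year, tokens, other, total) if token prices fall `token_drop`x per
    year and all other costs grow by `other_growth` per year, at constant usage."""
    rows = []
    for year in range(years + 1):
        tokens = token_usd / token_drop ** year
        other = other_usd * (1 + other_growth) ** year
        rows.append((year, tokens, other, tokens + other))
    return rows

# Hypothetical starting point: $100K/year on tokens, $400K/year on everything else.
rows = projected_spend(100_000, 400_000, years=3)
for year, tokens, other, total in rows:
    print(f"year {year}: tokens ${tokens:,.0f}  other ${other:,.0f}  total ${total:,.0f}")
```

    Under these assumptions the token line shrinks to a rounding error while the total still climbs every year, because the larger share of spend was never tied to token prices.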

  • Nikhil Haas

    Co-Founder/CEO @ BioLM | AI x Molecules | Ex-Twist Bio

    2,457 followers

    Estimated costs to train bio-sequence models, based on their published methods and cloud-compute pricing:

    ProstT5
    • 'finetuned for 10 days on 8 NVIDIA A100 each with 80GB vRAM'
    • $2,359 min (Spot instances) - $7,864 max (On-demand)

    ProtGPT2
    • 'trained on 128 NVIDIA A100s in 4 days'
    • $15,099 (min) - $50,335 (max)

    ZymCTRL
    • '48 NVIDIA A100s 80GB for about 15,000 GPU hours'
    • $18,431 (min) - $61,444 (max)

    DNABERT-2
    • '14 days using eight NVIDIA RTX 2080Ti GPUs'
    • $790 (min) - $2,628 (max)

    DNABERT
    • '25 days on 8 NVIDIA 2080Ti GPUs'
    • $1,410 (min) - $4,692 (max)

    ESM-1v
    • 'ESM-1v models [...] for 6 days on 64 V100 GPUs. Weights for the MSA Transformer [...] 13 days on 128 V100 GPUs'
    • $57,508 (min) - $191,816 (max)

    GenSLM
    • '40 A100 GPUs [...] approximately 6 hours'
    • $295 (min) - $983 (max)

    ProteinBERT
    • 4 weeks on a single GPU
    • $155 (min) - $504 (max)

    Consider adding labor costs and GPU hours to develop the models on top of that.

    Breaking down the cost, using ProstT5 as an example:
    • GPU hours (# of GPUs × hours per GPU): 1,920
    • Instance hourly cost: $9.83 (AWS Spot Instance) (8x A100)
    • Instance hourly cost: $32.77 (AWS On-Demand) (8x A100)
    • Number of instances: 1
    • Hours per instance: 240
    • Est. cloud cost: $2,359 (min, spot) - $7,864 (max, on-demand)

    Given that the cost per kWh here is about $0.35 and most of the necessary GPUs pull between 170W and 600W, it can be an order of magnitude cheaper to train with on-prem GPUs and only pay the cost of electricity. But what about the capital expense to purchase GPUs and build an on-prem server? Yes, even after that.
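
    The ProstT5 breakdown generalizes to the other A100 rows. The sketch below uses the hourly instance prices and the $0.35/kWh rate from the post; rounding GPU counts up to whole 8-GPU instances is an assumption, and the electricity estimate counts GPU power draw only (no host, cooling, or hardware purchase).

```python
import math

SPOT_USD_PER_HOUR = 9.83        # 8x A100 instance, AWS Spot (from the post)
ON_DEMAND_USD_PER_HOUR = 32.77  # 8x A100 instance, AWS On-Demand (from the post)
GPUS_PER_INSTANCE = 8

def cloud_cost_range(n_gpus, hours):
    """(spot, on_demand) cost in USD for `n_gpus` A100s running `hours` each."""
    instances = math.ceil(n_gpus / GPUS_PER_INSTANCE)  # assumed: whole instances only
    instance_hours = instances * hours
    return instance_hours * SPOT_USD_PER_HOUR, instance_hours * ON_DEMAND_USD_PER_HOUR

def electricity_cost(gpu_hours, watts_per_gpu, usd_per_kwh=0.35):
    """GPU-only electricity cost in USD."""
    return gpu_hours * watts_per_gpu / 1000 * usd_per_kwh

prost_low, prost_high = cloud_cost_range(n_gpus=8, hours=10 * 24)    # ProstT5
protgpt2_low, protgpt2_high = cloud_cost_range(n_gpus=128, hours=4 * 24)  # ProtGPT2
print(f"ProstT5 cloud:  ${prost_low:,.2f} - ${prost_high:,.2f}")
print(f"ProtGPT2 cloud: ${protgpt2_low:,.2f} - ${protgpt2_high:,.2f}")

# ProstT5's 1,920 GPU hours at the low and high ends of the quoted power range.
power_low = electricity_cost(1_920, 170)
power_high = electricity_cost(1_920, 600)
print(f"ProstT5 electricity only: ${power_low:,.2f} - ${power_high:,.2f}")
```

    These figures land within a dollar of the post's ProstT5 and ProtGPT2 estimates, and the electricity-only range sits roughly an order of magnitude below the cloud range, consistent with the closing claim.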
