Accelerate Apache Spark Workloads on Snowflake


Snowflake Developers

67,901 followers

2mo

How to accelerate your Apache Spark™ workloads on Snowflake.
In this hands-on lab, you'll learn:
- How to run existing PySpark workloads (including Spark DataFrames and UDFs) on Snowflake with minimal migration
- How Spark workloads execute on Snowflake's elastic compute via Snowpark Connect, and why this execution model improves performance and total cost of ownership
- How Toyota consolidated their Spark workloads, reducing operational complexity and costs
Watch on demand: https://lnkd.in/gw3iDrJi


More Relevant Posts

Malayna Tran
2mo
Check out the latest Databricks benchmark report! The Capital One Slingshot team ran 4,750+ TPC-DS queries over 30 days to simulate real-world production workloads across 3 compute planes. Read the analysis of Jobs Classic vs Jobs Serverless vs serverless DBSQL in this post: https://bit.ly/4upvMJh
SAMBIT KUMAR PATI
1mo
🚀 Snowflake Warehouse Essentials, Simplified
Managing compute in Snowflake isn't just about size; it's about smart scaling. Here's the essence:
- Start, stop, and resize anytime for flexible compute.
- Auto-suspend/resume saves credits when idle.
- Multi-cluster warehouses handle concurrency seamlessly.
- Right-size your workloads: bigger isn't always faster.
- Notebook optimization via SYSTEM$STREAMLIT_NOTEBOOK_WH keeps costs lean.
💡 Efficiency isn't about more power; it's about the right power at the right time.
HAPPY LEARNING 😊
#snowflake #dataengineering #spark #virtualwarehouse
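Here is a minimal Python sketch of warehouse credit consumption that makes the auto-suspend and right-sizing points concrete. It rests on several assumptions:
- standard warehouse rates of X-Small 1, Small 2, Medium 4, Large 8, and X-Large 16 credits per hour;
- per-second billing with a 60-second minimum each time a warehouse resumes;
- one charge per running cluster.

Verify these against current Snowflake pricing documentation for your edition and warehouse type. The `billed_seconds` and `credits` helpers and the burst timings are hypothetical. The model also ignores any lag between the idle timeout and the actual suspend.

```python
"""Simplified credit model for a Snowflake warehouse (illustrative sketch only)."""

# Assumed standard-warehouse rates: each size step doubles credits per hour.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
MIN_BILLED_SECONDS = 60  # assumed minimum billed per resume


def billed_seconds(bursts: list[tuple[int, int]], auto_suspend_s: int) -> int:
    """Seconds billed for sorted (start_s, end_s) query bursts.

    The warehouse resumes at the first query after a suspension, stays up
    while queries arrive within the idle window, and is billed for the idle
    window before it suspends.
    """
    total = 0
    run_start = run_end = None
    for start, end in bursts:
        if run_start is None:
            run_start, run_end = start, end
        elif start <= run_end + auto_suspend_s:
            # Next burst arrives before the idle timer fires: no suspend/resume.
            run_end = max(run_end, end)
        else:
            # The warehouse suspended; bill the finished running period.
            total += max(MIN_BILLED_SECONDS, run_end + auto_suspend_s - run_start)
            run_start, run_end = start, end
    if run_start is not None:
        total += max(MIN_BILLED_SECONDS, run_end + auto_suspend_s - run_start)
    return total


def credits(size: str, seconds: float, clusters: int = 1) -> float:
    """Credits for `seconds` of runtime; each running cluster is billed separately."""
    return CREDITS_PER_HOUR[size] * clusters * seconds / 3600


if __name__ == "__main__":
    # Three hypothetical 5-minute bursts, one per hour.
    bursts = [(0, 300), (3600, 3900), (7200, 7500)]
    for timeout in (60, 600):
        secs = billed_seconds(bursts, timeout)
        print(f"auto_suspend={timeout}s: {secs}s billed, {credits('M', secs):.2f} credits")
    print(f"always on for 24h: {credits('M', 24 * 3600):.2f} credits")
```

On a Medium warehouse these bursts bill 1,080 seconds (1.2 credits) with a 60-second timeout. A warehouse that never suspends would use 96 credits. The same arithmetic shows why bigger is not automatically better. A query that takes 1,200 s on Medium costs about 1.33 credits. Moving it to Large only breaks even if runtime halves to 600 s; at 900 s it costs 2.0 credits.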
Vinoth Chandar
1mo
For years, teams running Apache Spark have had to make a tradeoff:
👉 You want control → run Spark yourself on k8s
👉 You want performance → use a managed service
You rarely get both.
Quanton Kubernetes Operator (QKO) changes that. It is a drop-in replacement for the Kubeflow Spark operator that runs Spark workloads 3–4x faster on your own infrastructure.
Why this matters: Spark on Kubernetes has quietly become the default for serious platform teams. No one wants to manage the data platform compute as a special snowflake anymore. But once you go self-managed, you're usually stuck with vanilla OSS Spark, and you leave massive performance gains on the table.
👉 With QKO:
- Same Spark jobs
- Same Kubernetes setup
- Completely different performance envelope
It's a great fit for teams that want performance without leaving their k8s infrastructure.
With Quanton, you get:
⚡ 3–4x faster Spark
💰 40–60% lower compute costs
🔁 Zero code changes
We're opening up early access to the Quanton Operator.
Malayna Tran
2mo
The Capital One Slingshot team ran the full TPC-DS benchmark across 3 Databricks compute planes. The results might surprise you. Check out the full analysis of Jobs Classic vs. Jobs Serverless vs. serverless DBSQL and run the notebooks on your own data here: https://bit.ly/4urh2JG
Nandeeshwar Reddy Singireddy
2mo
🤔 What happens when your Airflow DAG expects 3 files… but suddenly 50 files arrive? 📂⚡
That exact situation pushed me to explore Dynamic Task Mapping and scalable orchestration patterns in Apache Airflow 🔍🚀
Static pipelines work fine when inputs are predictable. But in real-world data platforms, data volume is never constant 📈
This experience made me rethink pipeline design and move towards runtime scaling instead of fixed task structures ⚙️
Sometimes the best learning comes from real production challenges 💡
#ApacheAirflow #DataEngineering #CloudComposer #GCP #DataPipelines
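The move from fixed task structures to runtime fan-out can be sketched in plain Python. This is not Airflow code. In Airflow 2.3 and later, the same idea is expressed by calling `.expand()` on a task, so that one mapped task instance is created per input at run time. Here `list_files`, `process_file`, and the file paths are hypothetical. A thread pool stands in for the scheduler running mapped instances in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def list_files(count: int) -> list[str]:
    """Hypothetical discovery step (a real DAG would list a bucket or directory).
    The number of files is only known when the pipeline runs."""
    return [f"landing/file_{i:03d}.csv" for i in range(count)]


def process_file(path: str) -> str:
    """Hypothetical per-file work: here it only derives the output path."""
    return path.replace("landing/", "processed/", 1)


def static_pipeline(files: list[str]) -> list[str]:
    """Fixed structure: exactly three tasks wired at authoring time, so any
    other file count requires editing the pipeline definition."""
    if len(files) != 3:
        raise ValueError(f"static pipeline expects 3 files, got {len(files)}")
    return [process_file(files[0]), process_file(files[1]), process_file(files[2])]


def mapped_pipeline(files: list[str], max_parallel: int = 8) -> list[str]:
    """Runtime fan-out: one task instance per discovered file, run with bounded
    parallelism; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(process_file, files))


if __name__ == "__main__":
    for n in (3, 50):
        outputs = mapped_pipeline(list_files(n))
        print(f"{n} files -> {len(outputs)} outputs")
```

The mapped version handles 3 or 50 files without any change to its definition. The static version has to be rewritten whenever the batch size changes.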
Darryl R.
1mo
GPU autoscaling on Kubernetes is a problem that quietly drains budgets if you get it wrong. Most teams either leave expensive nodes sitting idle or over-provision because the tooling feels too complex to fine-tune. The article below discusses a practical path through that. The architecture pairs KEDA with SQS on EKS to scale GPU workloads from zero based on actual queue depth. With this approach there should be no idle nodes burning money. It covers the full implementation, from install to Pod Identity to cost tuning. Check this out from Sanath Waghela if you are running GPU workloads in production or planning to. https://lnkd.in/eU3fHhvM

Zero-to-Scale GPU Workloads on EKS with KEDA + SQS (medium.com)
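The scale-from-zero behaviour described above can be approximated with a few lines of arithmetic. This minimal Python sketch is a simplified model, not KEDA's source code. It assumes the following settings:
- the SQS scaler's `queueLength` (target messages per replica), `activationQueueLength`, and `scaleOnInFlight`;
- the ScaledObject's `minReplicaCount`/`maxReplicaCount` bounds.

The parameter defaults in the sketch are illustrative. It ignores HPA tolerance, stabilization windows, the cooldown period, and the time a new GPU node takes to join the cluster.

```python
import math


def desired_replicas(
    visible: int,
    in_flight: int,
    queue_length: int = 5,
    activation_queue_length: int = 0,
    min_replicas: int = 0,
    max_replicas: int = 10,
    scale_on_in_flight: bool = True,
) -> int:
    """Simplified model of queue-depth scaling with a KEDA-style SQS trigger.

    visible / in_flight: approximate visible and in-flight (not visible) message counts.
    queue_length: target number of messages per replica.
    """
    if queue_length <= 0:
        raise ValueError("queue_length must be positive")
    backlog = visible + (in_flight if scale_on_in_flight else 0)
    # At or below the activation threshold the workload is deactivated,
    # i.e. scaled down to min_replicas (zero for scale-to-zero setups).
    if backlog <= activation_queue_length:
        return min_replicas
    # Above it, an average-value target asks for ceil(backlog / per-replica target).
    wanted = math.ceil(backlog / queue_length)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for visible, in_flight in [(0, 0), (1, 0), (12, 4), (500, 20)]:
        print(visible, in_flight, "->", desired_replicas(visible, in_flight))
```

With `queue_length=5`, a backlog of 12 visible plus 4 in-flight messages gives ceil(16 / 5) = 4 replicas. An empty queue returns 0 replicas. That lets idle GPU nodes go away, assuming a node autoscaler such as Cluster Autoscaler or Karpenter removes empty nodes.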

Bamiyan Gobets
1mo Edited
I just showed a data engineering team how to reduce their Apache Spark compute costs by 90% using the piece of silicon I'm holding in my hand.
Their current setup: 10 full racks of Spark workers, scaling rapidly, all to prepare data for their AI models.
Using our free Spark analysis tool, I demonstrated that by accelerating their Spark with Speedata APUs, they can run the exact same workloads 20x faster on half a rack.
90% less infrastructure. Same data. Same workloads. 20x performance boost at 10% of their current monthly spend.
We can do the same for your Spark environment. DM me or leave your questions/details here: https://lnkd.in/eN-HHw-J
Speedata.io
Lefteris Karageorgiou
2mo
🚨 We just shipped two new open-source tools for AWS Lambda Managed Instances (LMI):
🧮 LMI Pricing Calculator: compare LMI costs against standard Lambda before you commit
⏰ LMI Scheduler: schedule-driven cost optimisation for your LMI capacity providers
Both are available now in our GitHub repo, alongside the Monte Carlo simulation sample from Debasis Rath.
👉 Check out the tools folder under: https://lnkd.in/dvFvaX42

GitHub - aws-samples/sample-aws-lambda-managed-instances (github.com)
CloudBoostUP

259 followers
2mo
We scaled a data platform to tens of isolated products. Each business unit gets their own storage, compute, catalog, and identity. Fully isolated. Nothing shared. Not by writing tens of configurations. By writing one Terraform module and executing it through config.
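A Python sketch of the "one module, many units" pattern shows why driving everything from config keeps units isolated. In Terraform itself, this is typically a single `module` block with `for_each` over a map of unit settings, which modules support since Terraform 0.13. The Python below is only an analogy. `UnitConfig`, `data_product_module`, the naming scheme, and the unit names are hypothetical, not the post author's module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitConfig:
    """Hypothetical per-business-unit settings: the only thing that varies."""
    name: str
    region: str
    compute_size: str


def data_product_module(unit: UnitConfig) -> dict[str, str]:
    """One reusable 'module': every resource is named after its unit, so
    storage, compute, catalog, and identity are never shared."""
    prefix = f"dp-{unit.name}"
    return {
        "storage": f"{prefix}-storage-{unit.region}",
        "compute": f"{prefix}-compute-{unit.compute_size}",
        "catalog": f"{prefix}-catalog",
        "identity": f"{prefix}-identity",
    }


def plan(units: list[UnitConfig]) -> dict[str, dict[str, str]]:
    """Apply the same module once per config entry (like for_each over a map)."""
    names = [u.name for u in units]
    if len(names) != len(set(names)):
        raise ValueError("unit names must be unique")
    return {u.name: data_product_module(u) for u in units}


def count_isolated_resources(stack: dict[str, dict[str, str]]) -> int:
    """Return the number of resources, raising if any resource appears twice."""
    seen: set[str] = set()
    for unit, resources in stack.items():
        for resource in resources.values():
            if resource in seen:
                raise ValueError(f"{resource} is shared (found again in {unit})")
            seen.add(resource)
    return len(seen)


if __name__ == "__main__":
    config = [
        UnitConfig("sales", "eu-west-1", "small"),
        UnitConfig("finance", "us-east-1", "large"),
    ]
    stack = plan(config)
    print(count_isolated_resources(stack), "isolated resources for", len(stack), "units")
```

In this sketch, adding a business unit means adding one config entry. The module code itself does not change.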

Spice AI

4,452 followers
1mo Edited
Amazon DynamoDB acceleration that doesn't break on restart!
Watch Spice engineer Viktor Yershov enable acceleration snapshots in Spice, stop the Spice runtime, delete the local acceleration cache, and keep writing to DynamoDB while Spice isn't running. On restart, the snapshot is restored, the stream catches up, and the table is ready in seconds, with no data loss or reprocessing.
Check out the complete workflow in the 3-minute demo: https://hubs.ly/Q049Wbgf0
Viktor also shared details on a customer's Spice + DynamoDB architecture in this blog: https://hubs.ly/Q049Wdz80
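The restart sequence in the demo (restore a snapshot, then let the change stream catch up) depends on recording which stream position the snapshot already includes. The Python sketch below shows that idea under simplifying assumptions: a single stream shard, integer sequence numbers, and DynamoDB Streams-style `INSERT`/`MODIFY`/`REMOVE` events. `Snapshot` and `restore_and_catch_up` are hypothetical and are not Spice's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# A change record: (sequence_number, event_name, key, new_image or None).
Record = tuple[int, str, str, Optional[dict]]


@dataclass
class Snapshot:
    """Persisted copy of the accelerated table plus the stream position it includes."""
    rows: dict[str, dict]
    last_seq: int


def restore_and_catch_up(snapshot: Snapshot, stream: list[Record]) -> tuple[dict[str, dict], int, int]:
    """Rebuild the table from the snapshot, then apply only newer stream records.

    Assumes `stream` is in sequence order. Returns
    (rows, last applied sequence number, number of records applied).
    """
    rows = {key: dict(value) for key, value in snapshot.rows.items()}  # leave snapshot untouched
    last_seq = snapshot.last_seq
    applied = 0
    for seq, event, key, image in stream:
        if seq <= last_seq:
            continue  # already reflected in the snapshot: skip, so nothing is reprocessed
        if event in ("INSERT", "MODIFY"):
            rows[key] = dict(image)
        elif event == "REMOVE":
            rows.pop(key, None)
        else:
            raise ValueError(f"unknown event {event!r}")
        last_seq = seq
        applied += 1
    return rows, last_seq, applied


if __name__ == "__main__":
    snap = Snapshot(rows={"a": {"v": 1}}, last_seq=10)
    changes = [(10, "INSERT", "a", {"v": 1}), (11, "MODIFY", "a", {"v": 2})]
    print(restore_and_catch_up(snap, changes))
```

Skipping records at or below `last_seq` avoids reprocessing. Applying everything after it avoids data loss, provided the runtime comes back before the stream discards those records (DynamoDB Streams retains records for 24 hours).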
