"If you're not using AI all day, every day, writing any code that's not written by an AI, I don't know what you're doing." - Max Beauchemin on the current state of software development. He's coding 10 hours a day at 20x speed using Claude, built Claudette to manage the workflow, and wrote a sessionization pipeline with dbt plus a custom CLI called sup that gives Claude SQL access. The people who figure out agentic coding on their large dbt projects and Airflow DAG repos will have a higher productivity multiple than their peers. The shift is already here for those paying attention. Check out the entire episode at https://lnkd.in/gN-KgdBY and subscribe to the Data Renegades podcast wherever you get your podcasts. #AIEngineering #DataEngineering #AgenticWorkflow #ProductivityHacks #ClaudeAI
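The sessionization pipeline mentioned above is built in dbt; as a rough illustration of the idea only, here is a minimal Python sketch. The 30-minute inactivity gap and the event shape are common conventions, not details from the episode:

```python
from datetime import datetime, timedelta

# Assumption: a new session starts after 30 minutes of inactivity,
# a common sessionization rule (not necessarily Max's).
SESSION_GAP = timedelta(minutes=30)

def sessionize(timestamps):
    """Assign a session index to each timestamp for a single user."""
    sessions, current, prev = [], 0, None
    for ts in sorted(timestamps):
        if prev is not None and ts - prev > SESSION_GAP:
            current += 1  # gap exceeded: start a new session
        sessions.append(current)
        prev = ts
    return sessions
```

In dbt, the same logic is typically expressed in SQL with a window function comparing each event's timestamp to the previous one.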
Max Beauchemin on AI-driven coding and productivity
Data science is one of the most talked-about roles in tech, but who is a data scientist and what do they do? In this episode [https://lnkd.in/eukumVRC], I break down the real work of data scientists, explore how the role differs from that of data analysts, discuss where data scientists work, and examine how AI is shaping the future of the field. Whether you’re curious about data science, thinking about transitioning from analytics, or trying to cut through the hype, this episode gives you a clear and honest perspective on the role. I plan to record an episode every week, covering many data topics: data analysis, data science, data engineering, data annotation. I want to talk about it all: my personal journey, the job market, the skills required, and how they are evolving. Nothing is off the table as long as it has to do with data. I will also have guests on to share their experiences, so if you are interested in being a guest on the podcast, please send a DM or drop a message in the comments.
Day 16 of my RAG journey: Today, I laid the first real foundation for my RAG pipeline. We often think of fancy prompts or clever chat flows, but it all starts with data ingestion. Here's what I achieved today: → A batch file uploader for TXT, MD, and PDF → Automatic text extraction from these documents → Clean raw text stored in Snowflake tables No embeddings. No vectors. Just solid, usable text ready for the next steps. What did I learn? 1. RAG isn't a magic trick; it begins with clean data. 2. Handling different file formats is crucial. 3. If your data ingestion process is messy, retrieval will struggle later. 4. Snowflake serves as an excellent central store for RAG pipelines. The next steps are chunking, embedding, and retrieval. Building this pipeline step-by-step is illuminating the process behind RAG for me. Deep understanding comes from doing, not just reading or following others. So, whatever your journey, remember to focus on the fundamentals. What building block are you working on today? Share your experience and tag someone on a similar path. Snowflake Streamlit Chanin Nantasenamat 👍 Follow Sudeep Kumar ✅ for more on Data Engineering & AI ♻️ Repost to help your network learn along #30DaysOfAI #Streamlit #LLM #RAG #AIEngineering #DataEngineering #SudeepKumar10x #SnowflakeSquad
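The ingestion step above (batch upload of TXT/MD/PDF, text extraction, raw text into Snowflake) can be sketched in Python. This covers only the plain-text half: the post doesn't say which PDF library it uses, so PDFs are skipped here, and the Snowflake insert is omitted. Function names are illustrative assumptions:

```python
from pathlib import Path

# Plain-text formats we can read directly; PDF extraction would need a
# dedicated library (assumption: not specified in the post).
TEXT_SUFFIXES = {".txt", ".md"}

def supported(path):
    """True if this sketch can extract text from the file."""
    return Path(path).suffix.lower() in TEXT_SUFFIXES

def extract_text(path):
    """Return the raw text of a supported plain-text file."""
    p = Path(path)
    if not supported(p):
        raise ValueError(f"unsupported file type: {p.suffix}")
    return p.read_text(encoding="utf-8", errors="replace")

def ingest_batch(paths):
    """Yield (filename, raw_text) rows ready to load into a raw table."""
    for p in map(Path, paths):
        if supported(p):
            yield p.name, extract_text(p)
```

Keeping extraction separate from loading makes the "clean data first" lesson concrete: retrieval quality later depends on what these functions emit now.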
🚀 𝐃𝐀𝐘 13 𝐂𝐎𝐌𝐏𝐋𝐄𝐓𝐄𝐃 – 𝐃𝐚𝐭𝐚𝐛𝐫𝐢𝐜𝐤𝐬 14 𝐃𝐚𝐲𝐬 𝐀𝐈 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞 📌 𝐓𝐨𝐩𝐢𝐜: 𝐌𝐨𝐝𝐞𝐥 𝐂𝐨𝐦𝐩𝐚𝐫𝐢𝐬𝐨𝐧 & 𝐅𝐞𝐚𝐭𝐮𝐫𝐞 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐖𝐡𝐚𝐭 𝐈 𝐥𝐞𝐚𝐫𝐧𝐞𝐝 𝐭𝐨𝐝𝐚𝐲 🔹 Training multiple ML models 🔹 Feature engineering using Spark 🔹 Building Spark ML Pipelines 🔹 Tracking and comparing experiments with MLflow 🔗 Notion Link: https://l1nq.com/yjENt ✨ 𝐊𝐞𝐲 𝐭𝐚𝐤𝐞𝐚𝐰𝐚𝐲: The best model is chosen through comparison and metrics, not assumptions. Codebasics | Databricks | Indian Data Club
Day 13/14 – Model Comparison & Feature Engineering 🚀 Today I worked on comparing multiple machine learning models using MLflow and building a Spark ML Pipeline in Databricks. What I did: • Trained Linear Regression, Decision Tree, and Random Forest models • Tracked metrics (R², RMSE) using MLflow • Compared model performance using MLflow visualizations • Built an end-to-end Spark ML pipeline • Selected the best model based on performance Key takeaway: Even with limited features, Random Forest performed better by capturing non-linear patterns. Databricks Codebasics Indian Data Club #Databricks #MLflow #MachineLearning #DataEngineering #LearningByDoing
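The selection step in the two Day 13 posts above (train Linear Regression, Decision Tree, and Random Forest; pick the winner by logged metrics, not assumptions) reduces to a comparison over MLflow-style run metrics. A minimal sketch with illustrative metric values, not the authors' actual numbers:

```python
# Metrics as MLflow might log them per run (values are made up for
# illustration; lower RMSE and higher R^2 are better).
runs = {
    "linear_regression": {"rmse": 4.2, "r2": 0.61},
    "decision_tree":     {"rmse": 3.8, "r2": 0.68},
    "random_forest":     {"rmse": 3.1, "r2": 0.77},
}

def best_model(runs, metric="rmse", lower_is_better=True):
    """Pick the run name with the best value for the given metric."""
    chooser = min if lower_is_better else max
    return chooser(runs, key=lambda name: runs[name][metric])
```

In a real workflow the `runs` dict would come from `mlflow.search_runs()` rather than being hard-coded.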
Observe CTO and co-founder Jacob Leverich goes beyond the usual talking points and shares how Observe actually built its platform on a data lake architecture for real-time observability at scale. On the Data Engineering Podcast, Jacob covers why open standards matter (OpenTelemetry for collection and Apache Iceberg as an open table format), and how AI-assisted workflows can work better when they’re grounded in a curated “context graph” organized around real-world use cases—helping support and engineering teams, not just SRE. He also lays out the practical design decisions required to make a data lake searchable and economically viable as telemetry grows: durable streaming ingest engineered for scale, curated columnar datasets to reduce scan/read amplification, and context-driven correlation across logs, metrics, and traces so teams can troubleshoot faster with fewer handoffs. Bottom line: Observe is engineered to consolidate telemetry, extend retention, and accelerate troubleshooting—without the usual cost curve. https://lnkd.in/gb-sCP5w #observability #opentelemetry #apacheiceberg #dataengineering #sre
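The "context-driven correlation across logs, metrics, and traces" Jacob describes can be illustrated, very loosely, as joining telemetry streams on a shared key so one lookup gathers everything related to an incident. The `trace_id` join key and field names are assumptions for illustration, not Observe's actual schema:

```python
def correlate(trace_id, logs, metrics, traces):
    """Gather all telemetry rows sharing one trace_id into a single view."""
    pick = lambda rows: [r for r in rows if r.get("trace_id") == trace_id]
    return {"logs": pick(logs), "metrics": pick(metrics), "traces": pick(traces)}
```

The point of the curated "context graph" is that this join is precomputed and organized around use cases, rather than re-derived by hand during every incident.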
Most observability stacks break at scale because telemetry gets too expensive to keep. Observe cofounder/CTO Jacob Leverich shares a rare, founder-level walkthrough of the design decisions required to deliver real-time observability on a data lake. In a crowded technical market, this kind of under-the-hood detail is the best way to understand a company's differentiation. I really enjoyed this podcast episode!
If you’re curious why Observe, Inc. is built on Snowflake, give this a listen. It just makes sense. The next generation of observability architectures needs to be data lake native.
The final part of my "Hands-on dbt with ClickHouse" series is live. As always, feedback, shares and likes are very welcome! A dbt project that needs a human to run it is not a pipeline. It is a script. In the final part of the series, we move from a manually triggered setup to a real pipeline. Scheduled runs, dependency-aware execution, and no human in the loop to remember to press a button. This article is about orchestration and about turning dbt into something that actually runs on its own. And as a bonus, it is not just about running jobs. We also cover notifications, so failures do not stay hidden in logs, but show up in Slack, where teams can react before broken data reaches dashboards. In the previous parts, we built the full foundation step by step: 1️⃣ Environment [https://lnkd.in/egn28jUf] 2️⃣ Staging [https://lnkd.in/ewryrVhh] 3️⃣ Intermediate models [https://lnkd.in/eZkAYPPE] 4️⃣ Configs [https://lnkd.in/edQmJery] 5️⃣ Seeds and Snapshots [https://lnkd.in/e53fkPBy] 6️⃣ Ad hoc Analyses [https://lnkd.in/ejRU_pyp] 7️⃣ Data Tests [https://lnkd.in/eT-Dt4X4] 8️⃣ Unit Tests [https://lnkd.in/dQvQcVyt] The final part ties everything together and shows how a dbt project moves from a local playground to a production-ready pipeline. Read it here 👉 [https://lnkd.in/ddA6-gXj] #dbt #clickhouse #airflow #dataengineering #analyticsengineering #learning
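The failure-notification idea from the article (surface broken runs in Slack instead of burying them in logs) can be sketched independently of any orchestrator. A hedged illustration, not the article's actual Airflow setup: the runner and notifier are injectable stand-ins, and the function names are assumptions:

```python
import subprocess

def run_dbt(command=("dbt", "build"), notify=print, runner=subprocess.run):
    """Run a dbt command; on failure, send a notification rather than
    letting the error hide in logs. `notify` would post to a Slack
    webhook in a real pipeline (assumption: swapped for print here)."""
    result = runner(command, capture_output=True, text=True)
    if result.returncode != 0:
        notify(f"dbt run failed: {result.stderr[-500:]}")  # last 500 chars
        return False
    return True
```

In Airflow the same effect is usually achieved with an `on_failure_callback` on the DAG or task, which is the approach an orchestrated setup would take.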
New to Data Science? Confused by all the buzz? We’ve created a free podcast series to explain what data science and machine learning really are — without jargon or hype. 5 short episodes on: • What data science is (and isn’t) • Why data quality matters • How real-world data becomes ML 🎧 Listen on Spotify: https://lnkd.in/dSeJyx9w
✅ 𝗗𝗮𝘆 11 𝗖𝗼𝗺𝗽𝗹𝗲𝘁𝗲𝗱 - 𝗗𝗮𝘁𝗮𝗯𝗿𝗶𝗰𝗸𝘀 𝟭𝟰 𝗗𝗮𝘆𝘀 𝗔𝗜 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 🎯 PHASE 3: ADVANCED ANALYTICS Today’s focus was on Statistical Analysis & ML Preparation, where I explored how data moves one step closer to machine learning readiness using Databricks. What I learned on Day 11: Descriptive statistics Hypothesis testing A/B test design Feature engineering 🔍 What I worked on Day 11: 1. Calculated statistical summaries 2. Tested hypotheses (weekday vs weekend) 3. Identified correlations 4. Engineered features for ML Sharing today’s progress below. Now I’m ready to visualize these insights on 𝐃𝐚𝐲 12 🚀 Thanks to Codebasics, Databricks, and Indian Data Club for this hands-on learning experience. #Day11 #14DayAIChallenge #DatabricksWithIDC #Databricks #DeltaLake #DataEngineering #Codebasics #IndianDataClub
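The weekday-vs-weekend hypothesis test from Day 11 typically comes down to a two-sample t statistic. A minimal standard-library sketch of Welch's form (which does not assume equal variances); the challenge itself likely uses Spark or scipy rather than this hand-rolled version:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples, e.g. a weekday
    metric vs a weekend metric. Positive means sample a's mean is higher."""
    va, vb = variance(a), variance(b)  # sample variances
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))
```

To turn the statistic into a p-value you would compare it against a t distribution with Welch-Satterthwaite degrees of freedom, which `scipy.stats.ttest_ind(a, b, equal_var=False)` handles in one call.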