DataTalksClub

E-learning provider

The community where we talk about data! Join our weekly events with practitioners: webinars, podcasts, free courses!

About

DataTalks.Club - the place to talk about data! We are a community of people who are passionate about data.

Join us:
🔸 to talk about everything related to data
🔸 to learn more about applied machine learning with our free courses and materials
🔸 to discuss the engineering aspects of data science and analytics
🔸 to chat about career options and learn tips and tricks for job interviews
🔸 to discover new things and have fun!

Our weekly events include:
👨🏼‍💻 Free courses and weekly study groups where you can start practicing within a friendly community of learners
🔧 Workshops where you can get hands-on tutorials about technical topics
⚙️ Open-Source Spotlight, where you can discover open-source tools with a short demo video
🎙 Live Podcasts with practitioners where they share their experience (and the recordings too)
📺 Webinars with slides, where we discuss technical aspects of data science

Join our Slack channel to become a part of the community. Tap the "Register" button at the top of the page!

Website
https://datatalks.club/
Industry
E-learning provider
Company size
1 employee
Headquarters
Berlin
Type
Nonprofit
Founded
2020

Locations

Employees of DataTalksClub

Updates

  • DataTalksClub

    30,600 followers

    How do you evaluate an agent beyond "did it get the right answer"?

    A useful way is to break the agent down into 3 parts:
    1. Goal
    2. Plan
    3. Action

    Then measure what happens at each boundary. Here are the main evaluation dimensions:
    1. Goal -> Action: answer correctness, relevance, grounding or faithfulness against a source
    2. Goal -> Plan: plan quality, including whether the tool sequence is likely to achieve the goal and whether the right tools were chosen
    3. Plan -> Action: tool execution quality, including whether the agent follows its plan, passes valid inputs, and uses tools in a way that fits their expected behavior
    4. Goal + Plan + Action together: logical consistency across the full trace, such as whether the agent says one thing and does another
    5. Execution efficiency: whether the agent uses unnecessary tool calls or repeats work

    The practical idea is simple: a single final answer can look fine while the agent still makes poor decisions along the way. Evaluating the trace helps catch failure modes like:
    - choosing the wrong tool
    - calling a tool with bad inputs
    - drifting away from the original goal
    - being correct but inefficient

    LM judges are useful here because they can apply rubrics to these different intersections and score the agent's behavior consistently.

    Practical takeaway: if you're evaluating an agent, don't stop at output quality. Inspect the plan and tool trace too, because that is where many real failures show up.

    For a deeper practical look at evaluating MCP-powered agents, watch our Snowflake workshop with Josh Reini: https://lnkd.in/evCiKZdD

    He walks through the Agent GPA (Goal-Plan-Action) framework, available in the open-source TruLens library.
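    Some of these trace-level checks don't even need an LM judge. Here is a minimal plain-Python sketch of two deterministic ones, plan adherence and execution efficiency. The trace format and function names are illustrative assumptions, not the TruLens Agent GPA API:

    ```python
    # Hypothetical trace format: the plan is a list of tool names the agent declared,
    # the trace is the list of tool calls it actually made. Purely illustrative.

    def score_plan_adherence(planned_tools, trace):
        """Fraction of planned tool calls that were executed in the planned order."""
        executed = [step["tool"] for step in trace]
        matched, pos = 0, 0
        for tool in planned_tools:
            if tool in executed[pos:]:
                pos = executed.index(tool, pos) + 1
                matched += 1
        return matched / len(planned_tools) if planned_tools else 1.0

    def count_redundant_calls(trace):
        """Tool calls repeated with identical inputs: a cheap efficiency signal."""
        seen, redundant = set(), 0
        for step in trace:
            key = (step["tool"], repr(step["args"]))
            if key in seen:
                redundant += 1
            seen.add(key)
        return redundant

    trace = [
        {"tool": "search", "args": {"q": "flight prices"}},
        {"tool": "search", "args": {"q": "flight prices"}},  # repeated work
        {"tool": "book", "args": {"id": 7}},
    ]
    print(score_plan_adherence(["search", "book"], trace))  # 1.0
    print(count_redundant_calls(trace))                     # 1
    ```

    Checks like these complement an LM judge: the judge scores the fuzzy dimensions (plan quality, faithfulness), while the deterministic ones catch repeated work and plan drift cheaply on every run.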

    A useful pattern from ML explainability carries over to LLM and agent evaluation:
    1. Start with the output you care about.
    2. Trace back through the system to see which inputs, steps, or tool calls influenced it.
    3. Use that trace to find failure points and debug faster.

    TruEra began with explainability for machine learning, using a SHAP-like approach to understand how feature changes affected predictions. That same idea later moved into RAG and agent workflows, where the important question became: what happened across the full trace that led to this answer?

    For LLM apps, that means looking at:
    - Retrieval results in RAG
    - Intermediate agent actions
    - Summarization steps
    - Judge feedback on output quality

    Combining traces with an LM judge helps identify where the system is drifting, missing context, or producing weak results. It also gives a more practical way to iterate on prompts, retrieval, and orchestration instead of guessing.

    Practical takeaway: when your LLM app misbehaves, inspect the full trace before changing prompts or models. The failure is often in retrieval, tool use, or step ordering, not only in the final response.

    Watch the full video for the full story and the evaluation workflow: https://lnkd.in/evCiKZdD

    You've written API extraction code before. Pagination, retries, rate limits, nested JSON: none of that is hard. The problem is writing it again for the 15th API this month.

    That's the actual bottleneck: repetitive pipeline setup that takes 2 hours every time you add a new source.

    dltHub is built for this. Instead of rewriting pagination logic, retry mechanisms, and schema normalization for every API, you define the source once:
    - Pagination and retries are handled automatically
    - Nested JSON gets flattened without custom transforms
    - Schema inference and versioning work out of the box

    Learn how to use it here: https://lnkd.in/eKhz9YWV
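    For a sense of what gets rewritten every time, here is a minimal plain-Python sketch of the pagination-plus-retries boilerplate, the kind of code a library can absorb. `fetch_page` is a hypothetical stand-in for any HTTP client call, not a dlt function:

    ```python
    import time

    def paginate(fetch_page, max_retries=3, backoff=0.0):
        """Yield items from every page of a paged API, retrying transient failures."""
        page = 1
        while True:
            for attempt in range(max_retries):
                try:
                    batch = fetch_page(page)
                    break
                except ConnectionError:
                    if attempt == max_retries - 1:
                        raise  # give up after the last retry
                    time.sleep(backoff * (attempt + 1))
            if not batch:
                return  # an empty page means we've consumed everything
            yield from batch
            page += 1

    # Fake API: two pages of results, then an empty page.
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    rows = list(paginate(lambda p: pages.get(p, [])))
    print(len(rows))  # 3
    ```

    Multiply this by cursor pagination, rate-limit headers, and auth refresh, and the "2 hours per source" estimate starts to look conservative.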

    pipeline.run() looks like a single function call. But it's actually making dozens of decisions about your data structure. Here's what's happening behind that one line of code:

    1. Extract
    The pipeline pulls raw data from the source. At this point, you usually get one table that mirrors the API structure. If your endpoint returns 100 books, you get 100 rows in a books table.

    2. Normalize
    This is where things get interesting. The pipeline automatically reshapes nested JSON into relational tables:
    - Nested arrays become separate tables
    - Parent-child relationships are created with foreign keys
    - Each nesting level gets its own table

    3. Metadata tracking
    The pipeline creates internal tables for:
    - Load history
    - Pipeline state
    - Schema versions
    This is what enables data lineage and reproducibility.

    4. Load
    Everything gets written to your warehouse.

    Learn more about data ingestion in this workshop: https://lnkd.in/eKhz9YWV
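    The normalize step is the least obvious one, so here is a toy sketch of the idea: nested lists split off into child tables linked back to the parent by a foreign key. Table and column names are illustrative assumptions; dlt's real normalizer also handles naming conventions, schema inference, and versioning:

    ```python
    def normalize(records, table_name):
        """Flatten one level of nested lists into child tables with foreign keys."""
        tables = {table_name: []}
        for i, record in enumerate(records):
            row = {"_id": i}
            for key, value in record.items():
                if isinstance(value, list):
                    # Nested array -> separate child table, linked by parent id.
                    child = f"{table_name}__{key}"
                    tables.setdefault(child, [])
                    for item in value:
                        tables[child].append({f"_{table_name}_id": i, **item})
                else:
                    row[key] = value
            tables[table_name].append(row)
        return tables

    books = [{"title": "Book A", "authors": [{"name": "X"}, {"name": "Y"}]}]
    tables = normalize(books, "books")
    print(sorted(tables))                  # ['books', 'books__authors']
    print(len(tables["books__authors"]))   # 2
    ```

    One book with two authors becomes one row in books and two rows in books__authors, which is exactly the parent-child shape described above.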

    Python data engineers write the same pipeline setup over and over:
    - Extract from an API
    - Handle schema changes
    - Normalize nested data
    - Load to a warehouse

    dltHub is an open-source library that handles the repetitive parts so you can focus on the logic that matters.

    What it does:
    🔸 Handles schema evolution automatically
    🔸 Normalizes nested JSON without custom flattening code
    🔸 Supports 30+ destinations (Snowflake, BigQuery, Postgres, DuckDB, etc.)
    🔸 Runs anywhere Python runs

    The setup:
    1. Define your source (API endpoint, database, file)
    2. Configure your destination
    3. Run it: pipeline.run()

    Example: pulling data from an API and loading to Snowflake takes ~10 lines instead of 100+.

    If you spend more time on pipeline plumbing than actual data work, it's worth checking out.

    Check out the full workshop recording to learn more about data ingestion with dltHub and using AI tools for data engineers: https://lnkd.in/eKhz9YWV
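    "Handles schema evolution automatically" is easiest to see with a toy version of schema inference: scan records, derive a column-to-type map, and widen a column's type when later records disagree. This is a simplified sketch of the idea only; dlt's actual inference covers far more (dates, decimals, nested types, versioned schema contracts):

    ```python
    # Widening order: a bool can widen to int, int to float, anything to str.
    TYPE_ORDER = {bool: 0, int: 1, float: 2, str: 3}

    def infer_schema(records):
        """Return {column: widest Python type seen across all records}."""
        schema = {}
        for record in records:
            for col, value in record.items():
                t = type(value)
                if col not in schema or TYPE_ORDER.get(t, 3) > TYPE_ORDER.get(schema[col], 3):
                    schema[col] = t
        return schema

    rows = [
        {"id": 1, "price": 10},                    # price looks like an int...
        {"id": 2, "price": 10.5, "note": "sale"},  # ...until a float shows up
    ]
    schema = infer_schema(rows)
    print(schema["price"].__name__)  # float
    ```

    The same widening logic is what lets a pipeline absorb a new column or a changed type without you editing DDL by hand.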

    90,000 people are now in the DataTalksClub Slack 🎉

    Our Slack is a great place for data and AI professionals to ask questions, share what you're building, and learn with others.

    What you'll find inside:
    - Weekly live events + workshops
    - Project feedback and troubleshooting help
    - Career questions (resumes, interviews, role transitions)
    - Job posts and team hiring announcements
    - Book discussions with authors

    How to join:
    1) Leave your email on our Slack invite page
    2) Open the invite email and join the workspace

    If the invite email doesn't arrive or the link has expired, use the help form and we'll add you manually (we check it daily).

    Good channels to start with:
    - #career-questions
    - #engineering
    - #datascience
    - #events
    - #jobs
    - #book-of-the-week

    Once you're in, say hi and share what you're working on right now (and what you want to learn next).

    Join here: https://lnkd.in/eTqriwY

    Data Engineering Open Forum 2026 is a community-driven conference for data engineers.

    📅 April 16
    ⏰ 8:30 AM - 7:00 PM
    📍 San Francisco, The Contemporary Jewish Museum

    What you can expect:
    - Practical talks from practitioners building modern data platforms
    - Onsite recruiting from companies like Airbnb, Netflix, and OpenAI
    - Small-group networking designed to help people actually meet and talk

    You'll hear talks from people working on real production systems: Apache contributors and engineers running large-scale data infrastructure.

    Agenda: https://lnkd.in/epdgx-bk

    Get 33% off your ticket with the code DATATALKSCLUB (valid until March 22): https://lnkd.in/ey93PDkY

    Join a live, hands-on workshop and build an end-to-end ETL/ELT pipeline directly in your browser.

    📅 March 26, 2026
    ⏰ 11:00 AM ET / 3:00 PM GMT

    Learn how Airflow 3 addresses scaling pipelines, handling changing data sources, and maintaining data quality:
    - Build reliable pipelines with SQL-based transformations and data quality gates
    - Create scalable workflows with asset-aware scheduling and dynamic task mapping
    - Use advanced features like human-in-the-loop approvals, DAG versioning, and backfills

    The workshop will be led by:
    - Volker Janz, Senior Developer Advocate at Astronomer
    - Kenten Danas, Senior Manager, Developer Relations at Astronomer

    Register to join: https://lnkd.in/ekepkVWf

    Thanks to the Astronomer team for partnering with us on this post and supporting the DataTalksClub community.

    Our live podcast series continues! Join us as we talk with Leonid K. about Starting a Data Conference: The Data Makers Fest Story. It's going to be a great conversation!

    We'll cover:
    🔹 Leonid's path from engineering to applied AI research
    🔹 His work on machine learning in sports and applied data systems
    🔹 How the idea for Data Makers Fest started
    🔹 What it takes to launch a conference from scratch
    🔹 Why practitioner communities are important in the data ecosystem
    🔹 His experience as one of the early members of DataTalks.Club

    Register here: https://luma.com/5jpmh3o9
