🐍 Python 3.14 now officially supports a build in which the Global Interpreter Lock (GIL) can be disabled. For 💨 Airflow users, that’s big. The free-threaded build (PEP 703) lets Python run truly parallel threads in one process. No multiprocessing overhead. No fake concurrency.

What this could mean for Airflow:
• Scheduler: parse DAGs in parallel using threads. Faster startup for large environments.
• Executors: replace process pools with thread pools. Lower memory use and faster task execution.
• Sensors: run many waiting tasks efficiently in one process.
• PythonOperators: real multi-threading inside tasks. All CPU cores finally put to use.

Single-threaded performance drops slightly (about 5–10% on the free-threaded build), but multi-threaded workloads scale far better. Airflow won’t switch overnight, but this opens new ground. Python 3.14 makes true concurrency possible, and the next Airflow releases could get faster without new hardware.
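To make the scheduler idea concrete, here is a minimal sketch of thread-parallel DAG-file parsing on a free-threaded build. parse_dag_file and the "dags" folder are hypothetical stand-ins, not Airflow's actual parsing API; the same code runs on a GIL build, but the CPU-bound parse steps serialize there.

import ast
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parse_dag_file(path: Path) -> str:
    # Stand-in for DAG-file processing: a CPU-bound parse step
    # that serializes under the GIL but parallelizes without it.
    tree = ast.parse(path.read_text())
    funcs = sum(isinstance(n, ast.FunctionDef) for n in ast.walk(tree))
    return f"{path.name}: {funcs} functions"

if __name__ == "__main__":
    dag_files = list(Path("dags").glob("*.py"))  # hypothetical DAG folder
    with ThreadPoolExecutor(max_workers=8) as pool:
        for summary in pool.map(parse_dag_file, dag_files):
            print(summary)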
I made a 1024x1024 matrix run 10x faster by just flipping two loops.

You would think looping over a 2D array is always the same. But your CPU does not agree. In C, arrays are stored row by row in memory. When you loop by row, the CPU grabs big chunks at once. It feels smooth. But if you loop by column, the CPU has to jump all over memory. Every step is a new fetch. That slows things down by a factor of 10. I am not kidding: row-major order, 5ms; column-major, 50ms. The same code, just a different loop order.

Now here’s where it gets weird. In Python (NumPy), C, and JavaScript, arrays are row-major. But in Fortran, MATLAB, and Julia, they are column-major. Get this wrong, and your fast code becomes slow for no clear reason.

I learned this the hard way. I once wrote a NumPy loop that seemed fine, but it was crawling. The secret? I was looping the wrong way for the memory layout.

The fix is simple:
- Match your loop order to how your array is stored (row-major or column-major).
- In NumPy, use order='C' for row-major, order='F' for column-major.
- For 3D arrays, always keep the innermost loop on the last dimension.

Another trap: copying data in slices or chaining too many array methods. Each copy eats memory and time. Use views or generators when you can.

Pro developers profile first, optimize only what matters, and always think about data layout. That is where the real speed comes from. Bookmark it. Master it. Your CPU will thank you.
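A minimal sketch of the effect in NumPy (timings are machine-dependent; the 10x figure above is the author's own measurement):

import time
import numpy as np

a = np.zeros((1024, 1024))  # C (row-major) order by default

def sum_by_rows(m):
    # Inner access walks contiguous memory: cache-friendly.
    total = 0.0
    for i in range(m.shape[0]):
        total += m[i, :].sum()
    return total

def sum_by_cols(m):
    # Inner access strides across rows: cache-hostile on C-order data.
    total = 0.0
    for j in range(m.shape[1]):
        total += m[:, j].sum()
    return total

for fn in (sum_by_rows, sum_by_cols):
    t0 = time.perf_counter()
    fn(a)
    print(fn.__name__, f"{time.perf_counter() - t0:.4f}s")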
🚀 Python 3.14 is here — and it’s a game changer! This release brings several powerful upgrades:
✅ Sub-interpreters accessible directly from Python
✅ T-strings for advanced string templating
✅ An experimental JIT compiler
And the headline: the arrival of GIL-free Python 🔥

For decades, the Global Interpreter Lock (GIL) has simplified memory management—but it also limited true parallelism in CPU-bound tasks. With the GIL disabled (via the now officially supported free-threaded build), Python can finally tap into all CPU cores simultaneously within a single process. This opens doors to:
⚡ Massive performance boosts for ML, data preprocessing & model training
⚙️ Simplified and more powerful implementations of existing libraries
🧠 New frameworks designed for real parallel execution
🚀 Better utilisation of modern multi-core hardware

As developers, data scientists, and engineers, we’re looking at a transformational moment. The ecosystem will evolve quickly—tools, libraries, and architectures included.

#Python #GIL #Multithreading #Performance #ML #Developers #Python314 #Innovation
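Of the three upgrades, t-strings are the easiest to show in a few lines. A minimal sketch assuming the final PEP 750 API (string.templatelib, Python 3.14+): unlike an f-string, a t-string does not render to text, it yields an inspectable Template.

from string.templatelib import Template, Interpolation

name = "world"
tmpl: Template = t"Hello {name}!"

# Iterating a Template yields static strings and Interpolation parts,
# so a library can validate or escape values before rendering.
for part in tmpl:
    if isinstance(part, Interpolation):
        print("interpolation:", part.expression, "=", part.value)
    else:
        print("static text:", repr(part))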
⚡️Our CANedge XCP Python tool now also supports CCP! https://lnkd.in/djDeUK4X

The tool lets you provide your CCP/XCP A2L file and a CSV with the measurements/signals you wish to record. Using this, the tool auto-generates a CANedge transmit list and DBC file. This lets you easily log your ECU data with the CANedge and analyze it with e.g. the asammdf GUI, Grafana dashboards, python-can, Vector tools, MATLAB and more.

For CCP, the tool optionally supports the 'BYTES_ONLY' A2L flag (where initialization of multi-byte signals has to be done 1 byte at a time). For XCP on CAN, the tool supports CAN FD (incl. WRITE_DAQ_MULTIPLE). The Python tool also lets you add your DAQ transmit list to an existing CANedge Configuration File (incl. validation) and even lets you create transmit lists for multi-ECU communication.

Today, the canedge-ccp-xcp tool is actively used by OEM engineers for CCP/XCP data acquisition in prototype vehicles. If you have questions on using the tool, contact us - we are happy to provide detailed technical guidance to get you started.

Learn more below:
- CANedge: https://lnkd.in/eWf6ZFka
- CCP/XCP intro: https://lnkd.in/ep9xYQ9p
- A2L intro: https://lnkd.in/dCAbwk5C
- canedge-ccp-xcp github: https://lnkd.in/dW-j2sbh
- Contact us: https://lnkd.in/eK7dmrYy

#ccp #xcp #canbus #diagnostics #automotiveengineering
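For the downstream analysis step, here is a generic sketch of decoding logged CAN frames against a DBC with python-can and cantools (two of the analysis options mentioned above). The file names are placeholders, and this snippet is not part of the canedge-ccp-xcp tool itself:

import can
import cantools

db = cantools.database.load_file("generated_signals.dbc")  # placeholder DBC

# Replay a recorded log offline; python-can also supports live buses.
for msg in can.LogReader("recording.blf"):  # placeholder log file
    try:
        print(msg.timestamp, db.decode_message(msg.arbitration_id, msg.data))
    except KeyError:
        pass  # frame ID not described in the DBC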
'Long live the GIL, you will be missed.'

The option to disable the GIL in Python 3.14 is a potential game-changer for performance. The free-threaded build (e.g. the python3.14t executable) runs with the GIL disabled; you can also control it explicitly with the `-X gil=0` / `-X gil=1` flag or the PYTHON_GIL environment variable.

🔒 With the GIL
Think of the GIL as a master key for your program. Only one thread can hold this key to execute Python code at any given time. This creates a bottleneck for CPU-bound tasks, even on multi-core processors.

🚀 Without the GIL
The "master key" is gone. Multiple threads can now execute Python code on separate CPU cores simultaneously. This provides true parallelism and can significantly speed up your CPU-bound code.

The Big Catch: Race Conditions
The GIL inadvertently protected us from many concurrency bugs. Without it, we are fully responsible for ensuring thread safety. This code is NOT safe in no-GIL mode:

import threading

n = 0  # shared counter

def increment():
    global n
    # This looks like one step, but it's three:
    # 1. Read the value of n
    # 2. Add 1 to the value
    # 3. Write the new value back to n
    n += 1

Two threads will race to update n, and updates will be lost. The final result can be incorrect.

✅ Fix: Use a Lock
You must explicitly protect shared data with a `threading.Lock` to make the update atomic, meaning uninterruptible.

import threading

n = 0
lock = threading.Lock()  # Lock to protect shared data

def safe_increment():
    global n
    with lock:  # Only one thread can be in this block at a time
        n += 1

#Python #GIL #Performance #Concurrency
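A quick way to confirm which mode you are actually running in (a minimal sketch; sys._is_gil_enabled() is a CPython 3.13+ introspection helper):

import sys
import sysconfig

# 1 on free-threaded builds, 0 on standard builds (None on older Pythons).
print("free-threaded build:", sysconfig.get_config_var("Py_GIL_DISABLED"))

# Whether the GIL is actually active in this process right now.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled at runtime:", sys._is_gil_enabled())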
Python 3.14 introduces a significant development called free-threading, which allows disabling the Global Interpreter Lock (GIL) as an opt-in mode. The GIL has been a core bottleneck for CPU-bound multi-threaded programs, since only one thread could execute Python bytecode at a time. With free-threaded support, Python now enables more straightforward multi-core concurrency with threads, eliminating the need to always rely on multiprocessing or external processes.

The key advantages (see the sketch after this post):

1. True parallelism: with the GIL disabled, Python threads can execute in parallel across multiple CPU cores, allowing genuine multi-threading performance gains in CPU-bound tasks.

2. Simpler concurrency: developers can write concurrent applications more easily without switching to multiprocessing or external parallel frameworks.

3. Improved scalability: applications can scale better on multi-core systems by distributing workloads efficiently across multiple threads, leading to better resource utilization and potentially faster processing for large-scale tasks.

For fields like machine learning, data engineering and high-throughput backend services, this change can pave the way for more scalable Python threading designs. It's exciting to see Python take this leap toward true parallelism. How do you think free-threading will impact your work or projects?

#Python314 #Threading #Developers #ParallelComputing
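To illustrate point 1, a minimal sketch of a CPU-bound workload fanned out over threads. On a free-threaded build the four tasks can run on separate cores; with the GIL they time-slice on one core:

import time
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit: int) -> int:
    # Deliberately CPU-bound pure-Python work.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(count_primes, [50_000] * 4))
    print(results, f"{time.perf_counter() - start:.2f}s")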
𝗣𝘆𝘁𝗵𝗼𝗻 𝗰𝗮𝗻 𝗳𝗶𝗻𝗮𝗹𝗹𝘆 𝘂𝘀𝗲 𝗮𝗹𝗹 𝘆𝗼𝘂𝗿 𝗖𝗣𝗨 𝗰𝗼𝗿𝗲𝘀.

For years, the Global Interpreter Lock (𝗚𝗜𝗟) was Python’s biggest limitation for CPU-bound tasks. Even if you had 8 cores, only one thread could truly execute Python code at a time; the others just waited for their turn. Now, with 𝗣𝘆𝘁𝗵𝗼𝗻 𝟯.𝟭𝟰, that changes. The new free-threaded (no-GIL) interpreter finally lets multiple threads run Python code simultaneously, even for CPU-heavy workloads.

So what actually changed inside 𝗖𝗣𝘆𝘁𝗵𝗼𝗻 to make this possible? Let’s look at the differences.

𝗢𝗯𝗷𝗲𝗰𝘁 𝗹𝗶𝗳𝗲𝘁𝗶𝗺𝗲:
• 𝗢𝗹𝗱: one thread at a time, so refcounts were safe by default.
• 𝗡𝗲𝘄: refcounts use thread-safe (biased and atomic) updates. Common objects like None and True are immortal: no locking, no slowdown.

𝗟𝗼𝗰𝗸𝗶𝗻𝗴 𝘀𝘁𝗿𝗮𝘁𝗲𝗴𝘆:
• 𝗢𝗹𝗱: one giant GIL for everything.
• 𝗡𝗲𝘄: many tiny locks. Each subsystem guards itself: type caches, allocators, GC. Threads finally run side by side.

𝗚𝗮𝗿𝗯𝗮𝗴𝗲 𝗰𝗼𝗹𝗹𝗲𝗰𝘁𝗶𝗼𝗻:
• 𝗢𝗹𝗱: the GIL implicitly serialized collection: stop the world, clean up, resume.
• 𝗡𝗲𝘄: the free-threaded build still briefly stops the world for cycle detection, but reference counting does most lifetime work without a global lock, keeping pauses short.

𝗜𝗻𝘁𝗲𝗿𝗽𝗿𝗲𝘁𝗲𝗿 𝘀𝘁𝗮𝘁𝗲:
• 𝗢𝗹𝗱: shared global state like builtins, modules, and caches, all tangled together.
• 𝗡𝗲𝘄: each interpreter has isolated state. Subinterpreters can run truly in parallel.

𝗖 𝗲𝘅𝘁𝗲𝗻𝘀𝗶𝗼𝗻𝘀:
• 𝗢𝗹𝗱: every extension assumed the GIL existed.
• 𝗡𝗲𝘄: a free-threaded C API and atomic helpers let extensions declare themselves thread-safe.

𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲:
• 𝗢𝗹𝗱: one running thread per process, no matter how many cores.
• 𝗡𝗲𝘄: slightly slower single-thread, but real parallel speedups for CPU-bound workloads.

Now, Python can finally breathe across all cores.

— 𝐏𝐲𝐂𝐨𝐝𝐞𝐓𝐞𝐜𝐡

#Python
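One of these changes is easy to observe directly. Since PEP 683 (CPython 3.12+), immortal objects carry a fixed sentinel refcount, so sys.getrefcount reports an implausibly large constant for them (the exact value is an implementation detail):

import sys

print(sys.getrefcount(None))   # huge sentinel value: None is immortal
print(sys.getrefcount(True))   # likewise immortal
x = object()
print(sys.getrefcount(x))      # small, ordinary refcount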
𝗣𝘆𝘁𝗵𝗼𝗻 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝘁𝗵𝗲 𝗚𝗜𝗟, 𝘄𝗵𝘆 𝗲𝘃𝗲𝗿𝘆𝗼𝗻𝗲’𝘀 𝘁𝗮𝗹𝗸𝗶𝗻𝗴 𝗮𝗯𝗼𝘂𝘁 𝗶𝘁

If, like me, you had heard of the “GIL” for years but never quite understood why people complained about it… here’s the simple version 👇

Until now, 𝗣𝘆𝘁𝗵𝗼𝗻 𝗵𝗮𝗱 𝘀𝗼𝗺𝗲𝘁𝗵𝗶𝗻𝗴 𝗰𝗮𝗹𝗹𝗲𝗱 𝘁𝗵𝗲 𝗚𝗹𝗼𝗯𝗮𝗹 𝗜𝗻𝘁𝗲𝗿𝗽𝗿𝗲𝘁𝗲𝗿 𝗟𝗼𝗰𝗸 (𝗚𝗜𝗟). It’s basically a mutex that ensures that 𝘰𝘯𝘭𝘺 𝘰𝘯𝘦 𝘵𝘩𝘳𝘦𝘢𝘥 𝘦𝘹𝘦𝘤𝘶𝘵𝘦𝘴 𝘗𝘺𝘵𝘩𝘰𝘯 𝘣𝘺𝘵𝘦𝘤𝘰𝘥𝘦 𝘢𝘵 𝘢 𝘵𝘪𝘮𝘦, even on multi-core CPUs. So when you were running “multi-threaded” code in Python, you actually weren’t using multiple cores efficiently. You could have multiple threads, yes, but only one active at a time. That’s why people went for multiprocessing or async I/O to achieve real parallelism.

🐍 𝗣𝘆𝘁𝗵𝗼𝗻 𝟯.𝟭𝟰 finally makes it optional, via the officially supported free-threaded build. That means:
🟢 True parallelism inside a single Python process
🟢 Major performance gains for data-intensive workloads
🟢 Simpler concurrency models — less boilerplate, fewer workarounds

The technical challenge behind this change is huge: the GIL was deeply woven into CPython internals. The new design introduces per-interpreter memory management, atomic reference counting, and fine-grained locks to make it work safely.

💡 𝗜𝗻 𝘀𝗵𝗼𝗿𝘁: Python is finally catching up with the hardware it’s been running on for decades.

For data scientists, ML engineers, and backend devs, this could mean faster model training, snappier APIs, and simpler concurrency code. What do you think, will you upgrade your Python version to taste true parallelism?

#Python #SoftwareEngineering #DataEngineering #Performance #Concurrency
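The post mentions multiprocessing as the classic workaround; the free-threaded build lets the same pattern use threads instead. A minimal sketch of the swap, possible because both pools share the concurrent.futures interface (the thread version is only truly parallel when the GIL is disabled):

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_work(n: int) -> int:
    # Pure-Python CPU-bound task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # GIL-era workaround: processes sidestep the GIL but pay
    # interpreter startup plus pickling of arguments and results.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(cpu_work, [10**6] * 4)))

    # Free-threaded pattern: identical interface, shared memory, no pickling.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(cpu_work, [10**6] * 4)))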
🚀 Python 3.14 has officially arrived this month, and three key features stand out:

🔹 Deferred annotations (PEP 649): type hints are now evaluated lazily, simplifying forward references and reducing startup costs.
🔹 Official free-threaded support and improved concurrency: the “no-GIL” build is now officially supported, allowing for greater parallelism in CPU-intensive workloads.
🔹 Template string literals (“t-strings”, PEP 750): this new templating syntax enables interception or validation of interpolation at runtime.

Additionally, there are several bonus improvements, including smarter error messages, standard-library support for multiple subinterpreters, safer debugging hooks, and internal interpreter enhancements (a tail-call-style interpreter) that promise approximately 10–15% faster execution in many benchmarks.

💸 FinOps angle: even a modest 10% runtime gain can translate directly into lower GB-seconds (and $) on Lambda, especially on Graviton. A/B test 3.13 vs 3.14 with the same memory, then right-size using Lambda Power Tuning and trim package size to reduce cold starts. Small duration drops at scale ⇒ double-digit cost savings with no app changes.

#python #serverless #finops
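To show what PEP 649's lazy evaluation buys you, a minimal sketch: a forward reference to the enclosing class now works without string quotes or `from __future__ import annotations`, because the annotation is only evaluated when something asks for it (the class here is illustrative):

class Node:
    # `Node` is not yet bound while the class body executes, but under
    # PEP 649 the annotation is compiled into a deferred function and
    # only evaluated on access, by which time the name resolves.
    def next_node(self) -> Node:
        ...

print(Node.next_node.__annotations__)  # evaluated only now; resolves to the Node class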
Python’s 30-year limitation, finally SOLVED! 🐍🔥

Python 3.14 💥 lets you disable the Global Interpreter Lock (GIL) via the optional free-threaded build (it is not removed from the default build), unlocking TRUE parallelism across multiple CPU cores 🧠⚙️

Before 🧱: threads blocked by the GIL
After ⚡: threads running truly in parallel

💡 Example:

# Before (Python ≤ 3.13): only one thread runs at a time 😩
import threading

def work():
    for _ in range(10**7):
        pass

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# ~1.2s runtime

# After (Python 3.14 free-threaded build 🚀):
# the same code, all 4 threads on real cores 💪
# ~0.47s runtime 🎯

📈 Results: about 2.6x faster on these timings, real concurrency, no GIL bottleneck! 💬 Python just entered the multithreaded era! 🧩

Python Developer Community
#Python #GIL #Multithreading #Performance #AI #Developers #Innovation #ParallelComputing 🚀🐍
Using async in Python DOES NOT always make your code run faster...

Synchronous code is like this: it 𝘮𝘶𝘴𝘵 finish task A before it can even 𝘵𝘩𝘪𝘯𝘬 about starting task B. If your code makes a network request, it may wait 2–3 seconds with your CPU basically sitting idle. This is what asyncio is for. It lets you "pause" a task that's waiting and immediately switch to another task that's ready to run.

But here's the catch: asyncio 𝗺𝗮𝗸𝗲𝘀 𝘆𝗼𝘂𝗿 𝗰𝗼𝗱𝗲 𝗳𝗮𝘀𝘁𝗲𝗿 𝗼𝗻𝗹𝘆 𝘄𝗵𝗲𝗻 𝗶𝘁'𝘀 𝗜/𝗢-𝗯𝗼𝘂𝗻𝗱 (𝘄𝗮𝗶𝘁𝗶𝗻𝗴), 𝗻𝗼𝘁 𝘄𝗵𝗲𝗻 𝗶𝘁'𝘀 𝗖𝗣𝗨-𝗯𝗼𝘂𝗻𝗱 (𝗰𝗮𝗹𝗰𝘂𝗹𝗮𝘁𝗶𝗻𝗴).

• I/O-bound: your program is waiting for something EXTERNAL. Like:
- Making different API calls
- Reading multiple files from a network drive

• CPU-bound: your program is actively calculating something. Like:
- Training a machine learning model
- Running complex transformations on a large pandas DataFrame

Also, asyncio operates on a single CPU core, so it provides concurrency but not true parallelism. If your bottleneck is the CPU, asyncio won't help. You're looking for parallelism: you need to leverage multiple CPU cores, and that's a job for Python's multiprocessing library.

#python #asyncio #concurrency #parallelism #softwarearchitecture #backend #apis
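A minimal sketch of the I/O-bound case where asyncio shines; asyncio.sleep stands in for real network latency:

import asyncio

async def fetch(name: str) -> str:
    # Stand-in for an API call: while this task "waits",
    # the event loop runs the other tasks.
    await asyncio.sleep(1)
    return f"{name}: done"

async def main():
    # Three 1-second waits overlap: ~1s total instead of ~3s.
    results = await asyncio.gather(*(fetch(f"call-{i}") for i in range(3)))
    print(results)

asyncio.run(main())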
Python 3.14's optional GIL-free mode is really going to change how we think about Airflow architecture. The scheduler and executor changes will need some thoughtful testing before rolling out.