Thrilled to unveil our latest work: multi-modal machine learning to forecast localized weather! We construct a graph neural network to learn dynamics at point locations, where typical gridded forecasts miss significant variation. Paper: https://lnkd.in/eBmfsJin Weather dataset: https://lnkd.in/ejCG8bKs Code: https://lnkd.in/eQg-JzQJ AI weather models have made huge strides, but most still emulate products like ERA5, which struggle to capture near-surface wind dynamics. The correlation between ERA5 and ground weather station data is low due to topography, buildings, vegetation, and other local factors. In this work, we forecast near-surface wind at localized off-grid locations using a message-passing graph neural network ("MPNN"). The graph is heterogeneous, integrating both global forecasts (ERA5) and historical local weather station data as different nodes. What do we find? First off, ERA5 interpolation performs poorly, failing to capture local wind variations, especially in coastal and inland regions with complex conditions. An MLP trained on historical data at a location performs better than ERA5 interpolation, as it learns from the station's past observations. However, it struggles with longer lead times and lacks the spatial context necessary to capture weather patterns. Meanwhile, our MPNN dramatically improves performance, reducing the error by over 50% compared to the MLP. This is because the MPNN incorporates spatial information through message passing, allowing it to learn local weather dynamics from both station data and global forecasts. Interestingly, adding ERA5 data to the MLP does not improve its performance significantly. The MLP struggles to integrate spatial information from global forecasts, while the MPNN excels, highlighting the importance of combining global and local data. Large improvements in forecast accuracy occur at both coastal and inland locations. Our model shows a 92% reduction in MSE relative to ERA5 interpolation overall. 
This work showcases the strength of machine learning in combining multi-modal data. By using a graph to integrate global and local weather data, we were able to generate much more accurate localized weather forecasts! Congrats to Qidong Yang and Jonathan Giezendanner for the great work, and thanks to Campbell Watson, Daniel Salles Chevitarese, Johannes Jakubik, Eric Schmitt, Anirban C., Jeremy Vila, Detlef Hohl, and Chris Hill for a wonderful collaboration. Thanks also to our partners at Amazon Web Services (AWS) for providing cloud computing and technical support!
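To make the message-passing idea concrete, here is a minimal NumPy sketch of one update round on a heterogeneous graph with station nodes and ERA5 grid nodes. It is not the paper's model: the function names (`mean_aggregate`, `message_passing_step`), the mean aggregation, the `tanh` update, the random weights, and the toy edges are all assumptions chosen for illustration.

```python
import numpy as np


def mean_aggregate(src_feats, edges, num_dst):
    """Average source-node features over the incoming edges of each destination node.

    `edges` is a list of (source_index, destination_index) pairs.
    """
    out = np.zeros((num_dst, src_feats.shape[1]))
    counts = np.zeros(num_dst)
    for src, dst in edges:
        out[dst] += src_feats[src]
        counts[dst] += 1
    counts[counts == 0] = 1  # a node with no incoming edges receives a zero message
    return out / counts[:, None]


def message_passing_step(station, era5, s2s_edges, e2s_edges, w_self, w_s2s, w_e2s):
    """One round of message passing that updates station nodes only (hypothetical update rule)."""
    n = station.shape[0]
    from_stations = mean_aggregate(station, s2s_edges, n)  # station -> station messages
    from_era5 = mean_aggregate(era5, e2s_edges, n)         # ERA5 -> station messages
    return np.tanh(station @ w_self + from_stations @ w_s2s + from_era5 @ w_e2s)


# Toy data (made up): 2 stations and 3 ERA5 grid nodes, each with (u, v) wind components.
station = np.array([[1.0, 2.0], [3.0, 4.0]])
era5 = np.array([[0.5, 0.5], [1.5, 2.5], [2.0, 3.0]])
s2s_edges = [(0, 1), (1, 0)]                  # each station hears from the other
e2s_edges = [(0, 0), (1, 0), (1, 1), (2, 1)]  # nearby grid nodes feed each station

rng = np.random.default_rng(0)
w_self, w_s2s, w_e2s = (0.1 * rng.normal(size=(2, 2)) for _ in range(3))

updated = message_passing_step(station, era5, s2s_edges, e2s_edges, w_self, w_s2s, w_e2s)
print(updated.shape)  # (2, 2)
```

Keeping separate weight matrices per edge type is what makes the graph heterogeneous in this sketch: station-to-station and ERA5-to-station messages are transformed differently before being combined. A plain MLP on a single station sees neither kind of neighbour.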
Weather model accuracy and data shortages
Summary
Weather model accuracy refers to how closely weather predictions match real-world conditions, while data shortages highlight the lack of information needed for reliable forecasting. Improving both is crucial for creating more trustworthy weather forecasts, especially in regions and situations where data is limited or local factors complicate predictions.
- Invest in data: Prioritize collecting and maintaining quality weather measurements to strengthen the foundation for model training and improve forecast reliability.
- Combine local and global: Integrate local station data with broader weather models to address gaps and capture unique regional patterns, especially in areas with complex terrain.
- Assess uncertainty: Regularly evaluate model performance and acknowledge uncertainty, especially when working in data-sparse regions or predicting extreme events.
-
You don’t often get second chances in project development. But here is a challenge: our wind farms are designed for a lifetime of over 25 years. However, we typically only have a few years of wind measurement data… are those years representative? So, we blend real measurement data with modelled data from historical weather models. At RWE, we wanted to better understand the reliability of the modelled data. Thanks to a digitalisation and automation initiative from our Smart Data Pipeline team, colleagues Sam Williams and Gibson Kersting led one of the most thorough benchmarks of modelled wind data in our industry. We tested 9 datasets, including reanalysis, mesoscale and large-eddy simulation (LES), against 370+ wind measurements across 190+ sites in every major wind market. Each dataset was standardised, cleaned through our Smart Data Pipeline, and assessed using robust statistical metrics. The results provided valuable insight: ERA5, the most widely used reanalysis dataset, performed more reliably than often assumed, particularly offshore and in simple terrain. Mesoscale models offer added resolution and typically improve significantly on reanalysis, but accuracy varies by provider and setup. LES (shown in the animation, where the generated winds capture the complex atmospheric phenomena that govern the weather) demonstrates clear benefits for large offshore clusters, for complex onshore sites where small-scale atmospheric effects become decisive, and for high-quality turbulence estimates. However, for simpler sites, the added value is limited. This wasn’t an academic exercise. It was about understanding the tools we depend on, knowing when a model is good enough and when it isn’t. Modelled wind data is incredibly powerful, but like any tool, its value depends on how and where it’s applied. With this benchmarking, we’ve taken a major step toward using it with greater precision and confidence across our global portfolio.
In a data-driven industry, precision isn’t a luxury. It’s a competitive edge. And that edge depends not just on having more data but on understanding it deeply.
-
DeepSeek’s AI breakthrough carries a powerful lesson, not just for the AI industry but for any field reliant on predictive modeling, including weather and climate. What DeepSeek has revealed is something many people miss: models are commodities, but data is the true differentiator. In the weather industry, we see parallels every day. Big tech and startups alike believe that throwing more AI models at the problem will unlock better forecasting, but the reality is that no matter how advanced your AI is, it’s only as good as the data feeding it (both for training and for initializing the models). So, while the AI industry debates GPUs and training costs, our real takeaway here is that data will always be the ultimate moat. And in industries like weather, where the stakes couldn’t be higher, that moat is what sets leaders apart. The future of forecasting isn’t just about smarter models; it’s also about the data you need. So instead of deploying $ on compute CAPEX while being scared to do the same on data strategy, perhaps we should think about where we invest our money to advance weather forecasting as well as other domains.
-
Estimating extreme rainfall, like the 100-year 24-hour event used in infrastructure design, is genuinely hard. Station records are sparse in many countries. Reanalysis and satellite products carry large biases. NOAA Atlas doesn't exist for most of the planet. So we tried a different approach: train a machine learning model on 70,000+ weather station records to predict the parameters of a non-stationary extreme value distribution at ~1-km resolution, everywhere. Held-out validation (drop stations, retrain, compare) shows the model reproduces observed return levels well, except where you'd expect it to struggle: arid regions and data-sparse areas. And it's worth remembering that station-level extreme estimates carry irreducible uncertainty no downstream model can overcome. Explore the map: https://lnkd.in/g4hNCPhd (free, no ads) While you're there, you can compare the ML return levels against station-level fits with uncertainty bounds, against NOAA Atlas where it exists, and see how design return levels are projected to shift under different global warming levels. Read the methodology and see the model benchmarks: https://lnkd.in/g-E89CyQ Try finding the heaviest rainfall places on Earth. Northeast India and tropical islands with high mountains in cyclone paths are good places to start. Degree Day builds engineering-grade climate analytics for infrastructure. We were founded by scientists with over 15 years of publishing in top climate journals, so we know what these models can and cannot support. Our methods are public. Our uncertainty is visible. When the data won't support a defensible answer, we tell you, even when that's not what you want to hear. Read more about Degree Day: https://www.degreeday.org
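As a rough illustration of how a "100-year" return level follows from extreme value parameters, the sketch below evaluates the standard generalized extreme value (GEV) quantile formula. The function names and the parameter values are made up for the example; this is not Degree Day's model, and in a non-stationary fit the location, scale, and shape would themselves vary with a covariate such as global warming level.

```python
import math


def gev_return_level(mu, sigma, xi, return_period_years):
    """Level exceeded on average once every `return_period_years` under a GEV fit.

    mu: location, sigma: scale (> 0), xi: shape (xi > 0 means a heavy upper tail).
    """
    p = 1.0 / return_period_years  # annual exceedance probability
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-9:
        # Gumbel limit of the GEV as the shape parameter goes to zero
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


def gev_cdf(z, mu, sigma, xi):
    """GEV cumulative distribution for xi != 0, used here only as a consistency check."""
    return math.exp(-((1.0 + xi * (z - mu) / sigma) ** (-1.0 / xi)))


# Hypothetical annual-maximum 24-hour rainfall parameters in mm (illustrative only).
mu, sigma, xi = 50.0, 10.0, 0.1
level_100 = gev_return_level(mu, sigma, xi, 100)

# The non-exceedance probability at the 100-year level should be 1 - 1/100.
print(round(gev_cdf(level_100, mu, sigma, xi), 6))  # 0.99
```

Because the return level depends on `xi` through a power of a small number, modest uncertainty in the shape parameter translates into large uncertainty at long return periods, which is one reason station-level estimates of rare events stay uncertain.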
-
Global #precipitation products leave gaps and biases, especially over mountains and at high rain rates. Check this new paper in Communications Earth & Environment where we introduce an open-access framework that merges multi-source observations using regional-scale intelligent optimization and explicit #topographic factors within an end-to-end neural pipeline. The approach reconstructs missing values and corrects biases in global, time-varying precipitation fields, yielding stronger correlations and reduced errors versus standard satellite estimates. Results suggest immediate value for #storm monitoring and #flood forecasting in data-sparse, complex terrain. Paper: Communications Earth & Environment (2025), doi:10.1038/s43247-025-02624-3. https://lnkd.in/dKp6NsyY Hohai University Consiglio Nazionale delle Ricerche
-
Can we predict rainfall everywhere on Earth, even where weather radars don’t exist? The research paper (Agrawal et al. 2025) introduces Global MetNet, an operational deep-learning system designed to produce high-resolution global precipitation nowcasts using satellite observations and numerical weather prediction data.

Why is this important? Because the global weather observation network is extremely uneven. Today, most advanced short-term rainfall forecasting systems rely heavily on weather radar. But radar coverage is concentrated mainly in North America, Europe, Japan, and a few other regions. Large parts of the Global South (Africa, South America, and parts of Asia) have very sparse radar coverage, which means the quality of short-term precipitation forecasts there is often much lower.

The new system aims to close that gap. Global MetNet combines several global data sources:
• Geostationary satellite observations
• Global Precipitation Measurement (GPM) CORRA datasets
• Numerical weather prediction (NWP) model outputs
• Radar observations where they are available

Using these inputs, the AI model produces global precipitation forecasts up to 12 hours ahead, with a spatial resolution of about 0.05° (~5 km) and a temporal resolution of 15 minutes.

Some impressive facts from the study:
• The model can generate forecasts in under one minute, making it suitable for real-time operational use.
• It shows significantly higher skill than standard hourly forecast products across multiple evaluation metrics such as the Critical Success Index (CSI) and Fractions Skill Score (FSS).
• In regions with sparse observations, the system can outperform even high-resolution numerical models used in data-rich regions like the United States.

Perhaps the most surprising fact: this research is not just experimental. The system is already running operationally and powering precipitation forecasts seen by millions of users in Google Search. The broader implication is fascinating.
Weather forecasting is entering a new era where AI models integrate satellite observations, numerical models, and sparse ground observations to produce global real-time predictions. This could significantly improve early warnings for floods and extreme rainfall in regions where observational infrastructure is limited. And in the context of a warming climate—with more frequent high-impact rainfall events—such tools could become increasingly critical for protecting lives and infrastructure. Paper: https://lnkd.in/dySbu-gM #Meteorology #AI #WeatherForecasting #Nowcasting #ClimateScience #EarthObservation
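For readers unfamiliar with the Critical Success Index mentioned above, here is a small sketch of how it is commonly computed for a rain-rate threshold: hits divided by hits plus misses plus false alarms. The function name, the threshold, and the sample values are invented for illustration and are not taken from the paper.

```python
import numpy as np


def critical_success_index(forecast, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) for events at or above `threshold`."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)           # event forecast and observed
    misses = np.sum(~f & o)        # event observed but not forecast
    false_alarms = np.sum(f & ~o)  # event forecast but not observed
    denom = hits + misses + false_alarms
    # With no forecast or observed events at all, the score is undefined.
    return float(hits / denom) if denom > 0 else float("nan")


# Made-up rain rates (mm/h) at five grid cells.
forecast = [0.0, 2.0, 5.0, 0.0, 3.0]
observed = [0.0, 3.0, 0.0, 4.0, 2.0]
print(critical_success_index(forecast, observed, threshold=1.0))  # 0.5
```

Correct forecasts of "no rain" do not enter the score, so CSI is not inflated by the many dry cells in a typical precipitation field.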
-
Weather forecasting has changed more in the past two years than in the previous two decades, and the pace continues. More progress here this week towards the milestone of AI weather model initialisation directly from the raw observations. 🛰️📡🌡️ This matters because the state-of-the-art AI weather models still rely on the incumbent physical models for their training, and for their operational initialisation (called data assimilation). This reliance hampers AI models with the weaknesses of the physical models, including a slow update speed and challenges in gaining information from important #satellite and radar data. The details are a bit thin in this early access paper, the model resolution is coarse, and there’s still a reliance on past physical model data for training. However, the authors do claim to more or less match the accuracy of global physical model forecasts from European Centre for Medium-Range Weather Forecasts - ECMWF and NOAA: National Oceanic & Atmospheric Administration. This result was achieved using an order of magnitude less observation data than the physical models. This suggests that once more observation data can be utilised, there will be more gains. No doubt we will see an avalanche of research published through the rest of this year on these topics of end-to-end forecasting, AI data assimilation, and hybrid physics+AI modelling. Ryan Keisler Andreas Schlueter Jesper Dramsch Daniel Rothenberg Kate Duffy Thomas Vandal
-
Each week brings another extreme weather event - but behind the headlines is a quieter, systemic crisis: a global shortage of high-frequency, high-quality atmospheric observations to power accurate forecasts where they’re needed most. Forecast accuracy starts with assimilating the right data into numerical weather prediction models - a process often misunderstood, but foundational to effective early warning systems worldwide. Hear from Ryan Honeyager, Tomorrow.io’s Senior Data Assimilation Scientist, on how this process works and why Tomorrow.io’s satellite-derived microwave data is uniquely positioned to close critical observational gaps for national meteorological agencies and resilience teams across the globe. Watch now: https://okt.to/dGZq6j #DataAssimilation #SatelliteData #ForecastAccuracy #DaaS #WeatherResilience #GlobalForecasting #EarlyWarningSystems