“It’s not whether you can build the system. It’s whether you can build the right one.” Modern tooling makes it easy to stand up production-grade data systems. The harder problem is defining shared concepts across an entire firm and ensuring those definitions hold as the organization grows. In our latest Substack post, we write about how data strategy has shifted from a technical challenge to a business one, and what that means in practice. 🔗 https://lnkd.in/ej3caCXp
Data Strategy Shifts from Technical to Business Challenge
More Relevant Posts
-
Gartner estimates that 80% of organisational data is unstructured, with a large portion locked away in PDFs and reports. This makes data extraction from unstructured sources critical for organisations. However, it remains one of the hardest problems to solve due to extreme variability in document layouts and formats. Even with recent advances in LLMs, achieving enterprise-grade reliability is still a major challenge. We recently built a data extraction pipeline to unlock data trapped in thousands of financial research PDFs, and it reinforced how different production reality is from demos. Brittany Bafandeh shares practical takeaways on how enterprise-grade reliability was achieved, and why plug-and-play PDF extraction tools often fail in production: https://lnkd.in/diHybXMq
-
I once bragged about a database with 150 hand-crafted edit checks. In 2026, an AI-driven system would generate most of those automatically. That insight is at the heart of a new blog post from Herb Blecher at Enterprise Management Associates (EMA) on building a modern data and analytics practice in 2026. Data quality has evolved from manual rules to observability and now to AI-driven data reliability. The real question is no longer “Did it pass my rules?” but “Is the data behaving normally, and how fast can we fix it?” Herb explores how quality, observability, governance, lineage, security, and cost are rapidly converging and what that means for data teams this year. Read the full blog: https://lnkd.in/gN3CehMZ If you’re a vendor or practitioner navigating this shift, we’d love to hear what you’re seeing. What’s working? What’s falling short? What problems still don’t have real answers?
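To make the shift from "Did it pass my rules?" to "Is the data behaving normally?" concrete, here is a minimal sketch. It is not taken from the EMA post: the function names, the daily row-count metric, and the 3-sigma threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def passes_rule(row_count: int, minimum: int = 1) -> bool:
    """Hand-crafted edit check (hypothetical): the load must not be empty."""
    return row_count >= minimum


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Behavioural check (illustrative): flag today's value if it sits more than
    `threshold` sample standard deviations away from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical points
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


if __name__ == "__main__":
    daily_row_counts = [1000, 1020, 980, 1010, 990]
    today = 400
    # The static rule is satisfied, yet the volume is far outside the normal range.
    print("passes rule:", passes_rule(today))
    print("anomalous:", is_anomalous(daily_row_counts, today))
```

A load of 400 rows passes the static rule but is flagged by the behavioural check, which is the gap the post describes.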
-
Flexible data models feel like a safe choice early on. They promise adaptability, speed, and fewer upfront decisions. In practice, they often become one of the most expensive parts of a system. At the beginning, flexibility looks like freedom. Schemas are loose, relationships are implicit, and edge cases are deferred “until later.” And for a while, it works. But as the system grows, that flexibility starts leaking into places where it’s hard to control. Business rules move into application code. Assumptions live in multiple services. Data meaning becomes contextual instead of explicit. The cost doesn’t show up as a single failure. It shows up as hesitation. Teams slow down because every change requires rediscovering what the data actually represents. Queries become harder to reason about. Migrations feel risky, not because they’re complex, but because nobody fully trusts the model anymore. In most systems I’ve seen, the problem wasn’t that the data model was wrong. It was that it stayed flexible for too long. Flexibility is valuable early. Clarity is valuable forever. A data model doesn’t just store information. It encodes decisions — about boundaries, ownership, and invariants. When those decisions remain implicit, the system pays interest on that ambiguity every time it changes. The question isn’t whether your data model is flexible. It’s whether it still tells the truth about how your system actually works.
-
Most data teams are solving the wrong problem, which is quietly eroding trust. The focus often lies on lakehouses, pipelines, and tools, while the critical question remains unaddressed: what actually breaks when a number is wrong? If a metric can change definition without prompting a serious conversation, it isn’t an asset; it’s an organisational risk. Good data architecture should not merely be about moving data faster. Instead, it should prioritise designing numbers that carry real consequences. Until we create systems centred on decisions rather than just dashboards, we will continue to deliver impressive platforms that lack genuine reliability.
-
Data teams are bleeding money and most don't even know it. This article nails the three hidden cost drivers crippling data engineering today: fragmented infrastructure with no owner, the "just add more compute" mentality, and wasteful transformations nobody audits. The fix isn't new tech, it's organizational discipline. Platform teams, cost attribution, and ruthless pipeline audits. Teams implementing these strategies are seeing 40-60% reductions in infrastructure spend. Worth the 6-minute read: https://lnkd.in/gvX2WZxd
-
Excellent blog post by our CEO Elliot Shmukler! Anomalo is uniquely positioned to handle these four key data quality requirements for enterprises:
- Data that users can trust and access at all times
- Systems that allow data teams to proactively address failures and issues, rather than being reactive
- Governance and visibility across the entire data estate, including all data types
- Interfaces and experiences that make trusted data usable by every team, not just technical experts
Read more about it here: https://lnkd.in/eeSHR-Wh
-
#DataVault is sold as a “simple” methodology: separate Hubs (business keys), Links (relationships), and Satellites (descriptive history) and you get scalable, auditable enterprise data. In practice, the first real implementation teaches a different lesson: #DataVault is conceptually clean, operationally complex.
Here’s where the complexity actually comes from:
1) Modeling isn’t the hard part. Interpretation is. The team debates what a true business key is, how stable it is, and how to handle key collisions, late arriving data, and survivorship.
2) Links explode faster than you expect. Once you model relationships properly (many-to-many, temporal, role-based, hierarchical), Links multiply.
3) Satellites become a governance problem, not a table. Satellite splitting (rate-of-change, source-based, privacy-based), multi-active satellites, schema drift, and attribute lineage create real overhead.
4) Loading patterns are unforgiving. Hash keys vs. natural keys, hashdiff logic, CDC, idempotency, reprocessing windows, error handling, reconciliation, and audit trails all need disciplined engineering.
5) The ‘#semantic gap’ is real. A Raw Vault is not what business users query. Without a Business Vault / Information Marts layer, adoption stalls because the model is correct but not consumable.
Where #MedallionArchitecture helps
Medallion (Bronze → Silver → Gold) doesn’t replace Data Vault. It organizes the journey:
• Bronze = ingestion truth: land source data as-is with metadata and lineage. Great for replays and audits.
• Silver = standardized integration: apply conformance rules, deduplication, survivorship, canonical formats. This is where Vault loading becomes deterministic.
• Gold = consumption: publish marts, metrics, and semantic models without forcing users to understand Hubs/Links/Sats.
Where #AI simplifies (and de-risks) #DataVault
#AI doesn’t “auto-build” a Vault. But it can eliminate the most expensive friction points:
• Automated mapping & pattern generation: source-to-vault mappings, load templates, hashdiff logic, and CDC handling created consistently.
• Schema drift detection: identify breaking changes early, suggest Satellite evolution strategies, and generate migration scripts.
• Entity resolution assistance: recommend business keys, matching rules, and confidence scoring for identity across systems.
• Metadata & documentation at scale: attribute definitions, lineage narratives, test cases, and data quality rules produced as living artifacts.
• Test acceleration: synthetic edge cases (late-arriving, duplicates, missing keys), reconciliation checks, and anomaly detection.
#DataVault remains one of the best architectures for traceability and change resilience. The winning approach in 2026+ is not “Vault vs. Medallion vs. AI”; it’s using #Medallion to structure delivery, and #AI to industrialize the repetitive engineering and governance work that makes Vault hard.
EXL Data Management
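For readers who have not met the loading terms in point 4, here is a minimal sketch of hash keys, hashdiff logic, and an idempotent Satellite load. It is one possible shape under stated assumptions, not a reference implementation from the post: the normalization (trim and uppercase), the `||` delimiter, SHA-256, and every name (`hash_key`, `hashdiff`, `satellite_rows_to_insert`, `business_key`, `attributes`) are hypothetical.

```python
import hashlib

DELIMITER = "||"  # assumed separator between hashed parts


def hash_key(*business_key_parts: str) -> str:
    """Hub hash key: hash of the normalized business key parts."""
    normalized = DELIMITER.join(part.strip().upper() for part in business_key_parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hashdiff(attributes: dict) -> str:
    """Satellite hashdiff: hash of the descriptive attributes in a fixed
    (alphabetical) order, so column order never changes the result."""
    parts = []
    for name in sorted(attributes):
        value = attributes[name]
        parts.append("" if value is None else str(value).strip())
    return hashlib.sha256(DELIMITER.join(parts).encode("utf-8")).hexdigest()


def satellite_rows_to_insert(current_hashdiffs: dict, incoming: list) -> list:
    """Idempotent load: emit a Satellite row only when the hashdiff differs from
    the latest one stored for that hub key. `current_hashdiffs` maps
    hub hash key -> latest hashdiff and is updated in place."""
    rows = []
    for record in incoming:
        hk = hash_key(record["business_key"])
        hd = hashdiff(record["attributes"])
        if current_hashdiffs.get(hk) != hd:
            rows.append({"hub_hash_key": hk, "hashdiff": hd, **record["attributes"]})
            current_hashdiffs[hk] = hd
    return rows


if __name__ == "__main__":
    state = {}
    batch = [{"business_key": "C-001", "attributes": {"city": "Oslo", "tier": "gold"}}]
    print(len(satellite_rows_to_insert(state, batch)))  # first load inserts the row
    print(len(satellite_rows_to_insert(state, batch)))  # replaying the batch inserts nothing
```

Replaying the same batch produces no new rows, which is the property that makes reprocessing windows safe. A production load would also carry load timestamps and record source, which are omitted here.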
-
𝗧𝗵𝗲 𝗦𝗵𝗶𝗳𝘁 𝘁𝗼 𝗔𝗴𝗲𝗻𝘁-𝗡𝗮𝘁𝗶𝘃𝗲 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 2026 won’t be decided by “𝘸𝘩𝘰 𝘩𝘢𝘴 𝘵𝘩𝘦 𝘣𝘦𝘴𝘵 𝘮𝘰𝘥𝘦𝘭.” It’ll be decided by who builds the best 𝗮𝗴𝗲𝗻𝘁–𝗻𝗮𝘁𝗶𝘃𝗲 𝗶𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲. Agents don’t care about dashboards. They care about callable interfaces, deterministic semantics, and governed execution at scale. The control plane becomes the product: routing, state, policy enforcement, concurrency, and latency variance under thundering herd workloads. According to a recent article by Hariharan and colleagues, recent enterprise projects have shown that standardising business logic behind APIs enables agents to deliver major efficiency gains, such as an 85 percent reduction in testing timelines and significant improvements in workflow automation accuracy. However, there is still a fundamental challenge: we expect these agents to handle complex multi-store data joins, combining graphs, SQL, vectors, and documents, while delivering responses in under a second. The good news is that modern, purpose-built databases show that low-latency graph workloads can run in the sub-150ms p99 range in public benchmarks, while legacy approaches degrade by orders of magnitude. That gap isn’t just performance; it’s reliability, safety, and trust. And 𝘁𝗿𝘂𝘀𝘁 𝗶𝘀 𝘁𝗵𝗲 𝗿𝗲𝗮𝗹 𝗺𝗼𝗮𝘁. If your platform can’t anchor data in persistent identity and time, including what was true versus what we knew at the time, you don’t get intelligence. You get confident-sounding fragmentation. That’s how you end up with 𝗔𝗜 𝘄𝗼𝗿𝗸𝘀𝗹𝗼𝗽: polished output that creates downstream rework and reputational risk. My takeaway: The winners over the next few years will build integrated platforms where identity + bitemporality + policy are first-class primitives, so agents can act quickly and defensibly. Curious how others are designing for agent-scale joins, entitlements, and auditability in 2026.
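The distinction between "what was true" and "what we knew at the time" can be sketched as a bitemporal lookup. This is a simplified illustration, not a description of any particular platform: the `Fact` type, the `as_of` function, and the credit-rating example are hypothetical, and transaction time is reduced to a single `recorded_at` date where the latest record known at query time wins.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    entity_id: str
    value: str
    valid_from: date   # when the value became true in the world
    valid_to: date     # exclusive end of validity
    recorded_at: date  # when the system learned about it


def as_of(facts: list, entity_id: str, valid_on: date, known_on: date) -> Optional[Fact]:
    """Return the fact that was valid on `valid_on`, using only what had been
    recorded by `known_on`. Later recordings supersede earlier ones."""
    candidates = [
        f for f in facts
        if f.entity_id == entity_id
        and f.valid_from <= valid_on < f.valid_to
        and f.recorded_at <= known_on
    ]
    return max(candidates, key=lambda f: f.recorded_at, default=None)


if __name__ == "__main__":
    facts = [
        # Original belief, recorded in January.
        Fact("acme", "rating=A", date(2025, 1, 1), date.max, date(2025, 1, 2)),
        # Correction recorded in March, back-dated to the same valid period.
        Fact("acme", "rating=B", date(2025, 1, 1), date.max, date(2025, 3, 1)),
    ]
    # What did we believe in February about February?
    print(as_of(facts, "acme", date(2025, 2, 1), date(2025, 2, 1)).value)
    # What do we now know was true in February?
    print(as_of(facts, "acme", date(2025, 2, 1), date(2025, 3, 15)).value)
```

Keeping both answers retrievable is what lets an agent's past action be audited against the information it actually had.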
-
Traditional data handling isn't slow - your decision-making framework is. Most teams blame their data systems for delays. But the real bottleneck? Static frameworks that can't adapt to evolving patterns. Recursive Language Models (RLM) shift this entirely. Instead of waiting for manual analysis, RLM processes data dynamically - identifying patterns, surfacing insights, and enabling real-time decisions. The difference: → Traditional systems store and retrieve → RLM analyzes, adapts, and informs Companies that moved to AI-driven data management saw this shift firsthand. Real-time pattern recognition replaced static reports. Decision cycles compressed. Operational efficiency improved - not because data moved faster, but because the framework became responsive. The lesson: efficient data management isn't about speed. It's about building systems that learn, adapt, and support better decisions as your data evolves. If your team is still waiting on reports to act, you're not solving a data problem. You're solving a decision architecture problem.
-
Data Problems Are Usually Business Problems
Most data problems don’t start as technical problems. They start as business problems.
I often see situations where:
- Finance doesn’t trust operational numbers
- Operations challenge financial reports
- Management delays decisions because data is “still being checked”
- Meetings focus on whose numbers are right, not what to do next
Technology is rarely the root cause. The real causes are usually:
✔️ Unclear definitions
✔️ No single source of truth
✔️ Fragile data flows
✔️ Lack of ownership
✔️ Architecture built around tools, not decisions
When data is trusted, discussions change:
- Fewer arguments
- Faster decisions
- Clear accountability
- Better outcomes
Modern data platforms exist to support better decisions, not prettier dashboards.
#DataStrategy #Analytics #Azure #Fabric #Modernization #Shipping #Energy