Data platforms shouldn’t rely on reactive firefighting. 🚫🔥 That’s why we’re excited to introduce Apache Gravitino 1.2.0.
This release brings:
• Automated lakehouse maintenance with Table Maintenance Service
• ClickHouse catalog for real-time analytics governance
• End-to-end UDF management
• Scan planning offload for faster queries
• Expanded ecosystem across Trino, Apache Flink, and Delta Lake
We are building the open metadata layer for the modern data stack. 🏗️
🔗 Check out the full release notes here: https://lnkd.in/gdyiENq5
#ApacheGravitino #OpenSource #DataGovernance #ModernDataStack #DataEngineering #Lakehouse #BigData #Metadata #ClickHouse #Apache
Datastrato
Software Development
San Mateo, CA 1,404 followers
Original creator of Apache Gravitino. Unified metadata platform for AI - multi-cloud, multi-engine and multi-modal.
About us
Datastrato is building the open data fabric platform to accelerate trusted AI. The company is the original creator of Apache Gravitino, a unified metadata platform for AI that is multi-cloud, multi-engine, and multi-modal.
- Website
- https://datastrato.ai/
- Industry
- Software Development
- Company size
- 11-50 employees
- Headquarters
- San Mateo, CA
- Type
- Privately Held
- Founded
- 2023
- Specialties
- Apache Gravitino, Universal Data Catalog, Federated Metadata Lake, Unified Metadata Management, Iceberg REST Catalog, Agentic Data Architecture, Data Agents, AI Infrastructure, Model Registry, Multi-Modal Data Management, AI Data Fabric, Unstructured Data Management, Data Governance, Data Lakehouse, Single Source of Truth, Multi-cloud Data Management, Open Source Software, AI-Native Data Lakehouse, and Modern Data System
Locations
- San Mateo, CA 94402, US (Primary)
- Palo Alto, US
Updates
-
Most data systems were built for humans, not AI. That’s why agents hallucinate and fail in production. Datastrato has just unveiled ADP (Agentic Data Protocol) — the "Missing Layer" that gives AI agents the eyes to understand business context and the governed interfaces to act safely. The Agentic Data era starts now. 🚀
-
It’s 2:00 AM. A critical pipeline breaks. Who fixes it? 🛠️ Traditionally, an engineer wakes up. In the "Agentic" future, an AI agent detects, analyzes, and resolves the issue before you even wake up. But for this to work, our data systems need a radical redesign. In his latest article, our founder Junping Du explores how the intersection of OpenClaw and open data architecture is paving the way for autonomous data operations. Discover why metadata and a unified control plane (like Apache Gravitino) are the keys to unlocking the next decade of data innovation. Read the full vision here: 🔗 https://lnkd.in/grNsSi5w #AI #DataEngineering #OpenSource #GenerativeAI #Datastrato #ApacheGravitino #AgenticAI
-
Streaming data management just got a whole lot easier. Learn how Apache Gravitino meets Apache Flink to empower your streaming pipelines with seamless metadata access and unified control. Shout out to FANNG 1 for this deep dive into streamlining the streaming stack! 🙌 If you're building modern data platforms, this is a must-read. Check it out: https://lnkd.in/gXgwbvcT #Datastrato #MetadataManagement #RealTimeData #DataEngineering #OSS
Empower Your Streaming Pipelines: Apache Gravitino meets Apache Flink! 🚀
🔗 https://lnkd.in/gXgwbvcT
Managing metadata across diverse sources in streaming architectures can be a major headache. In our latest Gravitino 101 series, we dive deep into how Apache Gravitino integrates with Apache Flink to simplify streaming data management.
What’s inside:
✅ Seamlessly managing Flink catalogs with Gravitino.
✅ Enabling unified metadata access for real-time pipelines.
✅ Step-by-step guide to streamlining your streaming stack.
Check out the full blog post to see how you can bring better governance to your streaming data!
#ApacheGravitino #ApacheFlink #DataInfrastructure #Streaming #OpenSource #DataEngineering
-
What can a 1989 episode of Star Trek teach us about 2026 AI? 🛰️ Datastrato CEO Junping Du explains why the secret to successful AI Agent systems lies in the balance between: 🤖 Data: Computation, Precision, Automation 🧠 Picard: Judgment, Ethics, Governance It’s not Human vs. Machine; it’s Human + Machine. #AIAgents #DataScience #OpenClaw #StarTrek #Datastrato
-
👏The Linux Foundation Member Summit wrapped up yesterday in Napa. 🥂We’re grateful to have taken part in the conversations alongside engineers, maintainers, and community leaders from across the Bay Area and beyond. For us, showing up matters — listening, learning, and engaging in thoughtful discussions around open collaboration, governance, and long-term sustainability in open source. 🎉Thank you to everyone who shared insights and perspectives. We value the opportunity to participate and look forward to continuing the dialogue. #OpenSource #LFMemberSummit #BayAreaTech
-
Everyone talks about data lakes. No one talks about catalog sprawl. That’s the real bottleneck. Apache Gravitino introduces a new layer: A control plane across multimodal catalogs. Not another engine. A catalog of catalogs. 🎥 Watch the short: https://lnkd.in/gmuphqtR 💻 Explore the project on GitHub: https://lnkd.in/g6YpuiGu #AIAgents #AgenticAI #AIInfrastructure #AIDataStack #Metadata #DataGovernance #OpenSource
-
🚀 New Deployment Guide is Live! We’ve published a step-by-step guide on deploying Iceberg REST Catalog with access control using Apache Gravitino.
👉 Read the full article on DEV: https://lnkd.in/gnKKubMn
Learn how to:
• Enable RBAC for Iceberg tables
• Configure Gravitino + REST service
• Integrate with Spark
• Implement table-level permissions
If you're running Apache Iceberg in production, this guide is for you!
#datacatalog #ApacheGravitino #ApacheIceberg #Lakehouse #DataGovernance #OpenSource #DataPlatform
-
💡A quick walkthrough on how Apache Gravitino can simplify metadata management for Apache Spark–based ETL pipelines.
🚀 Apache Gravitino 101: A New Tutorial on Apache Gravitino + Apache Spark for ETL
🔗 Read now: https://lnkd.in/gatnZvbq
ETL pipelines are only as good as the metadata behind them. In our latest Apache Gravitino 101 tutorial, we explore how Apache Gravitino integrates with Apache Spark to enable more consistent, scalable ETL workflows.
💡 In this tutorial, you’ll learn how to:
- Integrate Apache Gravitino with Apache Spark for ETL workloads
- Use a centralized catalog to manage schemas and tables
- Reduce metadata inconsistency across Spark jobs
- Design ETL pipelines with governance in mind
We’ll continue sharing more hands-on Apache Gravitino tutorials focused on real-world data platform challenges. Feel free to share your thoughts or questions in the comments 👇
#ApacheGravitino #OpenSource #ApacheSpark #ETL #DataEngineering #Metadata #DataPlatform
-
📣 Join Us at Iceberg Summit 2026 Iceberg Summit 2026 is just around the corner — and we’d love to see you there. 👉 Register here: https://lnkd.in/gpmY4gAr 📍 San Francisco 📅 April 8–9, 2026 This year, Datastrato will be attending the Summit to connect with data engineers, architects, and industry leaders, and to exchange ideas on modern data platforms and real-world production use cases. Whether you’re looking to learn, share experiences, or expand your network, Iceberg Summit is a great place to engage with the data community and explore what’s next. Special thanks to the Iceberg Summit team and the Apache Iceberg community for bringing everyone together. #IcebergSummit #ApacheIceberg #DataEngineering #DataInfrastructure #OpenData #ModernDataStack #Datastrato
-