A data framework for biology. Makes your data queryable, traceable, reproducible, and FAIR. One API: lakehouse, lineage, feature store, ontologies, LIMS, ELN.
Updated Nov 2, 2025 - Python
Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store. Supports Data Lakehouse Architecture. Clean Once and Forget.
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
Complete open-source data platform with Airbyte, Dremio, dbt, and Apache Superset - Documented in 18 languages
This project implements an end-to-end tech stack for a data platform, intended for local development.
🚀 Scalable near-real-time data pipeline using Apache Iceberg, Spark, Kafka, and Trino. ACID-compliant JSON ingestion, processing, and analytics. Dockerized for easy deployment. #DataEngineering #DataLake
Infrastructure for data engineers: S3
STEDI project
This is my bachelor's degree graduation project at HUST: a mini data lakehouse. Just got an A on it.
Diploma thesis for ECE NTUA in the course 3189 Advanced Topics in Database Systems under the supervision of Prof. Dimitrios Tsoumakos.
Pipeline from MongoDB to an Amazon S3 data lake, creation of a data warehouse in Amazon Redshift, and visualization in Tableau.
Real‑time streaming lakehouse stack: Kafka produces JSON events that Kafka Connect (Iceberg Sink) writes transactionally into Apache Iceberg tables versioned by Nessie, stored in MinIO (S3 compatible), and queried with Trino. Includes automatic schema evolution (additive), table bootstrap, and a Python data simulator.
This project originated from my graduation thesis. It will be refactored, properly formatted, and enhanced with additional functionalities to evolve into a fully developed big data system.
A data lakehouse (DLH) about NASA exoplanets