OpenRefine is a free, open source power tool for working with messy data and improving it
Updated May 1, 2025 - Java
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Statistical Machine Intelligence & Learning Engine
Java dataframe and visualization library
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Hopsworks - Data-Intensive AI platform with a Feature Store
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
ELKI Data Mining Toolkit
The premier open source Data Quality solution
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Categorical Query Language IDE
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Roadmap for Data Engineering
⛈️ RumbleDB 1.23.0 "Mountain Ash" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Blockchain2graph extracts blockchain data (Bitcoin) and inserts it into a graph database (Neo4j).
An introduction to data analysis with R and R Studio
🔥 One of the most comprehensive open-source data annotation platforms.
A Java Toolbox for Scalable Probabilistic Machine Learning
A point-and-click tool for creating and analyzing topic models produced by MALLET.