apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
♞ lichess.org: the forever free, adless and open source chess server ♞
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Scala language server with rich IDE features 🚀
Source code for Twitter's Recommendation Algorithm
Modern Load Testing as Code
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
An open protocol for secure data sharing
A 'new look' for database access in Scala
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Chisel: A Modern Hardware Design Language
sbt, the interactive build tool
State of the Art Natural Language Processing
The pure asynchronous runtime for Scala
Mill is a fast build tool that supports Java, Scala, Kotlin and many other languages. 2-4x faster than Gradle and 4-10x faster than Maven for common workflows, Mill aims to make your project’s build process performant, maintainable, and flexible.
A platform to build and run apps that are elastic, agile, and resilient. SDK, libraries, and hosted environments.
Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala.
An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more