A RAG (Retrieval-Augmented Generation) project implemented in pure Java, without relying on frameworks such as JFinal or Spring Boot. It provides a RAG pipeline and the Agent pattern, which makes it easier to adapt to enterprise environments and to extend through secondary development.
public void demoNaiveRAG() {
    NaiveRAG naiveRAG = new NaiveRAG(
            new Document("./202X Enterprise Plan.pdf"),
            "Briefly summarize this article");
    try {
        naiveRAG
                // Parsing
                .parsering()
                // Chunking
                .chunking()
                // Vectorization
                .embedding()
                // Sorting
                .sorting()
                // LLM response
                .LLMChat();
    } catch (Exception e) {
        e.printStackTrace();
        assert false : "error stack trace";
    }
    System.out.println(naiveRAG.getResponse());
}

Database Storage
- Multi-turn conversation reads and writes in Redis
- File storage in MinIO
- Search engine with Elasticsearch
- OpenAI chat interface
- Ollama chat interface
- Chat with multi-turn conversations
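
To illustrate what a multi-turn conversation involves, here is a minimal sketch of a bounded message history. It is not the project's implementation: the class `ConversationSketch` and its methods are hypothetical, and an in-memory `Deque` stands in for the Redis storage mentioned above. The role names `system`, `user` and `assistant` follow the OpenAI-style chat format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: keeps the system prompt plus the most recent messages. */
final class ConversationSketch {

    /** One chat message in the OpenAI-style role/content shape. */
    static final class Message {
        final String role;    // "system", "user" or "assistant"
        final String content;

        Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    private final Message system;
    private final Deque<Message> history = new ArrayDeque<>();
    private final int maxMessages;

    ConversationSketch(String systemPrompt, int maxMessages) {
        if (maxMessages <= 0) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        this.system = new Message("system", systemPrompt);
        this.maxMessages = maxMessages;
    }

    /** Appends a message and drops the oldest ones beyond the window. */
    void add(String role, String content) {
        history.addLast(new Message(role, content));
        while (history.size() > maxMessages) {
            history.removeFirst();
        }
    }

    /** Messages for the next request: system prompt first, then recent history. */
    List<Message> toRequestMessages() {
        List<Message> out = new ArrayList<>();
        out.add(system);
        out.addAll(history);
        return out;
    }
}
```

Bounding the history keeps the prompt within the model's context window; a Redis-backed version would presumably store the same messages under a per-conversation key.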
Document Parsing
- Word
- PPT
- Excel
- Markdown, HTML
Chunking
- Fixed size
- Sentence splitting
- Recursive splitting
- Semantic chunking
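
As a rough illustration of the first two strategies, the sketch below shows fixed-size chunking with overlap and a simple regex-based sentence splitter. The class `ChunkingSketch` and its method names are hypothetical and are not taken from the project's `chunk` package; the sentence rule (split after `.`, `!` or `?` followed by whitespace) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating two chunking strategies. */
final class ChunkingSketch {

    /** Fixed size: windows of `size` characters, each sharing `overlap` characters with the previous one. */
    static List<String> fixedSize(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    /** Sentence splitting: break after '.', '!' or '?' when followed by whitespace. */
    static List<String> bySentence(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(fixedSize("abcdefghij", 4, 1));
        System.out.println(bySentence("RAG retrieves context. Then the LLM answers! Is that all?"));
    }
}
```

Recursive splitting typically applies splitters like these in order of granularity (paragraph, sentence, fixed size) until every chunk fits the size limit.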
Vectorization Models
- Jina-ColBERT
- Baichuan
Search
- Recall
- Sorting
- Re-ranking
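
The recall and re-ranking stages can be sketched as follows. This is an illustrative example under stated assumptions, not the project's `search` package: `SearchSketch`, `Chunk`, `recall` and `rerank` are hypothetical names, recall is done by cosine similarity over in-memory vectors rather than Elasticsearch, and the re-ranking scorer is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of a two-stage search: vector recall, then re-ranking. */
final class SearchSketch {

    /** A text chunk together with its embedding vector. */
    static final class Chunk {
        final String text;
        final double[] vector;

        Chunk(String text, double[] vector) {
            this.text = text;
            this.vector = vector;
        }
    }

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Recall: keep the k chunks most similar to the query vector. */
    static List<Chunk> recall(List<Chunk> corpus, double[] query, int k) {
        List<Chunk> sorted = new ArrayList<>(corpus);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> cosine(c.vector, query)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Re-ranking: reorder the recalled candidates by a second scorer, highest score first. */
    static List<Chunk> rerank(List<Chunk> candidates, ToDoubleFunction<Chunk> scorer) {
        List<Chunk> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.<Chunk>comparingDouble(scorer).reversed());
        return sorted;
    }
}
```

Keeping recall cheap and broad, and applying the more expensive scorer only to the recalled candidates, is the usual reason for splitting search into these stages.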
More Pipelines
- Advanced RAG
- Modular RAG
- MASExample.java
Load Balancing
- RoundRobinLoadBalancer
- WeightedRandomLoadBalancer
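
For readers unfamiliar with these two strategies, here is a minimal sketch of each. It does not reproduce the project's `RoundRobinLoadBalancer` or `WeightedRandomLoadBalancer`; the names `BalanceSketch`, `RoundRobin`, `WeightedRandom` and `pick` are hypothetical, and endpoints are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketches of two load-balancing strategies over LLM endpoints. */
final class BalanceSketch {

    /** Round robin: hand out endpoints in a fixed cyclic order. */
    static final class RoundRobin {
        private final List<String> endpoints;
        private final AtomicInteger counter = new AtomicInteger();

        RoundRobin(List<String> endpoints) {
            if (endpoints.isEmpty()) {
                throw new IllegalArgumentException("no endpoints");
            }
            this.endpoints = new ArrayList<>(endpoints);
        }

        String pick() {
            // floorMod keeps the index non-negative even after the counter overflows
            int index = Math.floorMod(counter.getAndIncrement(), endpoints.size());
            return endpoints.get(index);
        }
    }

    /** Weighted random: pick endpoint i with probability weights[i] / sum(weights). */
    static final class WeightedRandom {
        private final List<String> endpoints;
        private final int[] weights;
        private final int total;
        private final Random random;

        WeightedRandom(List<String> endpoints, int[] weights, Random random) {
            if (endpoints.isEmpty() || endpoints.size() != weights.length) {
                throw new IllegalArgumentException("need one weight per endpoint");
            }
            int sum = 0;
            for (int w : weights) {
                if (w < 0) {
                    throw new IllegalArgumentException("weights must be non-negative");
                }
                sum += w;
            }
            if (sum <= 0) {
                throw new IllegalArgumentException("at least one weight must be positive");
            }
            this.endpoints = new ArrayList<>(endpoints);
            this.weights = weights.clone();
            this.total = sum;
            this.random = random;
        }

        String pick() {
            int r = random.nextInt(total); // uniform in [0, total)
            for (int i = 0; i < weights.length; i++) {
                r -= weights[i];
                if (r < 0) {
                    return endpoints.get(i);
                }
            }
            throw new IllegalStateException("unreachable: weights sum to total");
        }
    }
}
```

Round robin spreads requests evenly across identical backends, while weighted random lets a stronger or cheaper endpoint receive a proportionally larger share.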
Explanation
├── agent
├── chunk
├── constant
├── controler
├── demo
├── entity
├── parser
├── rag
├── search
├── service
│   ├── LLM
│   ├── balance
│   ├── db
│   └── embedding
├── utils
└── web

- Clone the code
git clone https://github.com/ChinaYiqun/java-rag.git
- Enter the project directory
cd java-rag
- Configure Maven dependencies
mvn clean install
- Create relevant databases
sysctl -w vm.max_map_count=262144
# Create a docker network
docker network create elastic
# Pull Elasticsearch
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.11.4
# Run Elasticsearch
docker run --name es01 --net elastic -p 9200:9200 -it -m 2GB docker.elastic.co/elasticsearch/elasticsearch:8.11.4
# Reset the elastic user's password and create a Kibana enrollment token
docker exec -it es01 /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
docker exec -it es01 /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana
# Create a data directory and run MinIO
mkdir -p ~/minio/data
docker run \
  -p 9000:9000 \
  -p 9090:9090 \
  --name minio \
  -v ~/minio/data:/data \
  -e "MINIO_ROOT_USER=ROOTNAME" \
  -e "MINIO_ROOT_PASSWORD=CHANGEME123" \
  quay.io/minio/minio server /data --console-address ":9090"
- See Link for details.
- OpenAI-style LLM/Embedding interfaces
- Simple dependency management with Maven (pom.xml)
- Support for multi-user and multi-knowledge-base management
- Freely composable search strategies: multi-channel recall, coarse ranking, fine ranking, re-ranking
- Freely composable file chunking: fixed size, sentence splitting, recursive splitting, semantic chunking
- Support for mainstream file parsing with Apache POI
- Integration with mainstream databases: Elasticsearch, Redis, MySQL, MinIO
- Highly customizable configuration with Nacos
