Command line tool useful for migration, transformations, backup, and restore of documents stored inside cores of Apache Solr.
- Should work well in most common cases.
- If you find a bug or need a new feature, please check the issues on GitHub and create one if it doesn't exist yet.
- Patches welcome!
- Use the command `solrcopy backup` for dumping documents from a Solr core into local zip files.
  - Use the switch `--query` for filtering the documents extracted by using a Solr Query.
  - Use the switch `--order` for specifying the sorting of documents extracted.
  - Use the switches `--limit` and `--skip` for restricting the number of documents extracted.
  - Use the switches `--select` and `--exclude` for restricting the columns extracted.
- Use the command `solrcopy restore` for uploading the extracted documents from local zip files into the same Solr core or another with same field names as extracted.
  - The documents are updated in the target core in the same format that they were extracted.
  - The documents are inserted/updated based on their `uniqueKey` field defined in core.
  - If you want to change the documents/columns use the switches in `solrcopy backup` for extracting more than one slice of documents to be updated.
The following environment variables can be used for common parameters:
- `SOLR_COPY_URL` for the URL pointing to the Solr cluster
- `SOLR_COPY_DIR` for the existing folder where the zip backup files containing the extracted documents are stored
These variables can also be stored in a .env file alongside the solrcopy binary. See .env.example
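As a rough illustration of the variables above, a `.env` file might look like the sketch below. The values are assumptions borrowed from the examples in this README (a local Solr server and a `./tmp` folder); the `.env.example` file in the repository remains the authoritative template.

```shell
# .env (illustrative values only, adjust them to your environment)
# Assumed: a local Solr server and an existing ./tmp folder for the archives.
SOLR_COPY_URL=http://localhost:8983/solr
SOLR_COPY_DIR=./tmp
```

With both variables set, `--url` and `--dir` should no longer be needed on the command line, for example `solrcopy backup --core demo`.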
Extracting and updating documents in huge cores can be challenging. It can take too much time and can fail at any time.
Below are some tricks for dealing with such cores:
- To reduce the time, use the switches `--readers` and `--writers` for executing operations in parallel.
- When the number of docs to extract is huge, the `backup` subcommand tends to slow down as time goes by and eventually fails. This is because Solr struggles to fetch batches of docs with high skip/start parameters. For dealing with this:
  - Use the parameters `--iterate-by`, `--between` and `--step` for iterating through the parameter `--query` with the variables `{begin}` and `{end}`.
  - This way it will iterate and restrict the docs being downloaded by hour, day or range.
  - For example: `--query 'date:[{begin} TO {end}]' --iterate-by day --between '2020-04-01' '2020-04-30T23:59:59'`
- Use the parameter `--params shards=shard1` for copying each shard by name in the `backup` subcommand.
- Use the `--delay-before`, `--delay-per-request` and `--delay-after` parameters to avoid overloading the Solr server.
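The tricks above can be combined in a single invocation. The sketch below wraps one in a shell function so it can be reviewed before being run. The function name `backup_april_by_day`, the core `demo`, the field `date`, the thread counts and the delay value are all assumptions chosen for illustration; only the switches themselves come from the `backup` help text.

```shell
# Sketch: back up one month of a large core in daily slices.
# Assumptions: core 'demo' has a date field named 'date', and ./tmp already exists.
backup_april_by_day() {
    # {begin} and {end} are replaced by solrcopy for each daily slice.
    # Readers and writers run in parallel; the delay throttles requests to Solr.
    solrcopy backup \
        --url http://localhost:8983/solr \
        --core demo \
        --dir ./tmp \
        --query 'date:[{begin} TO {end}]' \
        --iterate-by day \
        --between '2020-04-01' '2020-04-30T23:59:59' \
        --readers 4 \
        --writers 2 \
        --delay-per-request 500ms
}
```

Call `backup_april_by_day` once the values match your core. Suitable numbers for `--readers`, `--writers` and the delay depend on how much load the Solr server can take, so start low and increase gradually.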
$ solrcopy --help
Command line tool for backup and restore of documents stored in cores of Apache Solr.
Solrcopy is a command for doing backup and restore of documents stored on Solr cores. It let you filter docs by using a expression, limit quantity, define order and desired columns to export. The data is stored as json inside local zip files. It is agnostic to data format, content and storage place. Because of this data is restored exactly as extracted and your responsible for extracting, storing and updating the correct data from and into correct cores.
Usage: solrcopy <COMMAND>
Commands:
backup Dumps documents from a Apache Solr core into local backup files
restore Restore documents from local backup files into a Apache Solr core
commit Perform a commit in the Solr core index for persisting documents in disk/memory
delete Removes documents from the Solr core definitively
generate Generates man page and completion scripts for different shells
help Print this message or the help of the given subcommand(s)
Options:
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
$ solrcopy backup --help
Dumps documents from a Apache Solr core into local backup files
Usage: solrcopy backup [OPTIONS] --core <core> --dir </path/to/output>
Options:
-u, --url <URL>
Url pointing to the Solr cluster
[env: SOLR_COPY_URL=]
[default: http://localhost:8983/solr]
-c, --core <core>
Case sensitive name of the core in the Solr server
-d, --dir </path/to/output>
Existing folder where the backuped files containing the extracted documents are stored
[env: SOLR_COPY_DIR=]
-q, --query <'f1:vl1 AND f2:vl2'>
Solr Query param 'q' for filtering which documents are retrieved See: <https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html>
-f, --fq <'f1:vl1 AND f2:vl2'>
Solr Filter Query param 'fq' for filtering which documents are retrieved
-o, --order <f1:asc,f2:desc,...>
Solr core fields names for sorting documents for retrieval
-k, --skip <quantity>
Skip this quantity of documents in the Solr Query
[default: 0]
-l, --limit <quantity>
Maximum quantity of documents for retrieving from the core (like 100M)
-s, --select <field1,field2,...>
Names of core fields retrieved in each document [default: all but _*]
-e, --exclude <field1,field2,...>
Names of core fields excluded in each document [default: none]
-i, --iterate-by <mode>
Slice the queries by using the variables {begin} and {end} for iterating in `--query` Used in bigger solr cores with huge number of docs because querying the end of docs is expensive and fails frequently
[default: day]
Possible values:
- none
- minute: Break the query in slices by a first ordered date field repeating between {begin} and {end} in the query parameters
- hour
- day
- range: Break the query in slices by a first ordered integer field repeating between {begin} and {end} in the query parameters
-b, --between <begin> <end> <begin> <end>
The range of dates/numbers for iterating the queries throught slices. Requires that the query parameter contains the variables {begin} and {end} for creating the slices. Use numbers or dates in ISO 8601 format (yyyy-mm-ddTHH:MM:SS)
--step <num>
Number to increment each step in iterative mode
[default: 1]
-p, --params <useParams=mypars>
Extra parameter for Solr Update Handler. See: <https://lucene.apache.org/solr/guide/transforming-and-indexing-custom-json.html>
-m, --max-errors <count>
How many times should continue on source document errors
[default: 0]
--delay-before <time>
Delay before any processing in solr server. Format as: 30s, 15min, 1h
--delay-per-request <time>
Delay between each http operations in solr server. Format as: 3s, 500ms, 1min
--delay-after <time>
Delay after all processing. Usefull for letting Solr breath
--num-docs <quantity>
Number of documents to retrieve from solr in each reader step
[default: 4k]
--archive-files <quantity>
Max number of files of documents stored in each archive file
[default: 40]
--archive-prefix <name>
Optional prefix for naming the archive backup files when storing documents
--archive-compression <compression>
Compression method to use for compressing the archive files [possible values: stored, zip, zstd ]
[default: zip]
--workaround-shards <count>
Use only when your Solr Cloud returns a distinct count of docs for some queries in a row. This may be caused by replication problems between cluster nodes of shard replicas of a core. Response with 'num_found' bellow the greatest value are ignored for getting all possible docs. Use with `--params shards=shard_name` for retrieving all docs for each shard of the core
[default: 0]
-r, --readers <count>
Number parallel threads exchanging documents with the solr core
[default: 1]
-w, --writers <count>
Number parallel threads syncing documents with the archives files
[default: 1]
--log-level <level>
What level of detail should print messages
[default: INFO]
--log-mode <mode>
Terminal output to print messages
[default: mixed]
--log-file-path <path>
Write messages to a local file
--log-file-level <level>
What level of detail should write messages to the file
[default: DEBUG]
-h, --help
Print help (see a summary with '-h')
$ solrcopy backup --url http://localhost:8983/solr --core demo --query 'price:[1 TO 400] AND NOT popularity:10' --order price:desc,weight:asc --limit 10000 --select id,date,name,price,weight,popularity,manu,cat,store,features --dir ./tmp
$ solrcopy restore --help
Restore documents from local backup files into a Apache Solr core
Usage: solrcopy restore [OPTIONS] --url <localhost:8983/solr> --core <core> --dir </path/to/output>
Options:
-u, --url <localhost:8983/solr> Url pointing to the Solr cluster [env: SOLR_COPY_URL=]
-c, --core <core> Case sensitive name of the core in the Solr server
-d, --dir </path/to/output> Existing folder where the zip backup files containing the extracted documents are stored [env: SOLR_COPY_DIR=]
-f, --flush <mode> Mode to perform commits of the documents transaction log while updating the core [possible values: none, soft, hard, <interval>] [default: hard]
--no-final-commit Do not perform a final hard commit before finishing
--disable-replication Disable core replication at start and enable again at end
-p, --params <useParams=mypars> Extra parameter for Solr Update Handler. See: https://lucene.apache.org/solr/guide/transforming-and-indexing-custom-json.html
-m, --max-errors <count> How many times should continue on source document errors [default: 0]
--delay-before <time> Delay before any processing in solr server. Format as: 30s, 15min, 1h
--delay-per-request <time> Delay between each http operations in solr server. Format as: 3s, 500ms, 1min
--delay-after <time> Delay after all processing. Usefull for letting Solr breath
-s, --search <core*.zip> Search pattern for matching names of the zip backup files
--order <asc | desc> Optional order for searching the zip archives
-r, --readers <count> Number parallel threads exchanging documents with the solr core [default: 1]
-w, --writers <count> Number parallel threads syncing documents with the zip archives [default: 1]
--log-level <level> What level of detail should print messages [default: info]
--log-mode <mode> Terminal output to print messages [default: mixed]
--log-file-path <path> Write messages to a local file
--log-file-level <level> What level of detail should write messages to the file [default: debug]
-h, --help Print help
$ solrcopy restore --url http://localhost:8983/solr --dir ./tmp --core demo
$ solrcopy delete --help
Removes documents from the Solr core definitively
Usage: solrcopy delete [OPTIONS] --query <f1:val1 AND f2:val2> --url <localhost:8983/solr> --core <core>
Options:
-u, --url <localhost:8983/solr> Url pointing to the Solr cluster [env: SOLR_COPY_URL=]
-c, --core <core> Case sensitive name of the core in the Solr server
-q, --query <f1:val1 AND f2:val2> Solr Query for filtering which documents are removed in the core.
Use '*:*' for excluding all documents in the core. There are no way of recovering excluded docs.
Use with caution and check twice
-f, --flush <mode> Wether to perform a commits of transaction log after removing the documents [default: soft]
--log-level <level> What level of detail should print messages [default: info]
--log-mode <mode> Terminal output to print messages [default: mixed]
--log-file-path <path> Write messages to a local file
--log-file-level <level> What level of detail should write messages to the file [default: debug]
-h, --help Print help
$ solrcopy delete --url http://localhost:8983/solr --core demo --query '*:*'
$ solrcopy commit --help
Perform a commit in the Solr core index for persisting documents in disk/memory
Usage: solrcopy commit [OPTIONS] --url <localhost:8983/solr> --core <core>
Options:
-u, --url <localhost:8983/solr> Url pointing to the Solr cluster [env: SOLR_COPY_URL=]
-c, --core <core> Case sensitive name of the core in the Solr server
--log-level <level> What level of detail should print messages [default: info]
--log-mode <mode> Terminal output to print messages [default: mixed]
--log-file-path <path> Write messages to a local file
--log-file-level <level> What level of detail should write messages to the file [default: debug]
-h, --help Print help
$ solrcopy commit --url http://localhost:8983/solr --core demo
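The `--flush` and `--no-final-commit` options of `restore` can be paired with the `commit` subcommand to defer all commits to the end of a load. The following is only a sketch of that idea: the function name `restore_then_commit` is hypothetical, and whether deferring commits actually helps depends on the commit settings of your Solr core.

```shell
# Sketch: restore without intermediate commits, then commit once at the end.
# Arguments (all supplied by the caller): $1 = Solr URL, $2 = target core, $3 = backup folder.
restore_then_commit() {
    # Upload the documents without flushing and without the final hard commit,
    # then issue a single explicit commit only if the restore succeeded.
    solrcopy restore --url "$1" --core "$2" --dir "$3" --flush none --no-final-commit &&
        solrcopy commit --url "$1" --core "$2"
}
```

For example: `restore_then_commit http://localhost:8983/solr target ./tmp`.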
- Error extracting documents from a Solr cloud cluster with corrupted shards or unreplicated replicas:
  - Cause: Solr reports a different document count each time it answers the query.
  - Fix: extract the data by pointing directly to the shard instance address, not to the cloud address.
  - You can also pass custom params to Solr, such as `--params 'timeAllowed=15000&segmentTerminatedEarly=false&cache=false&shards=shard1'`
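When passing several Solr params at once, the value needs quoting, because an unquoted `&` is interpreted by the shell. The sketch below loops over shard names and backs up each one separately. The function name `backup_each_shard`, the core `demo`, the folder `./tmp` and the use of `--archive-prefix` to keep the archives of each shard apart are assumptions for illustration, not a documented procedure.

```shell
# Sketch: back up each shard of a core in turn.
# Arguments: the shard names, e.g. shard1 shard2 (names depend on your cluster).
backup_each_shard() {
    for shard in "$@"; do
        # The params string is double-quoted so '&' is not treated as a shell operator.
        # The per-shard archive prefix is an assumption to tell the archives apart.
        solrcopy backup \
            --url http://localhost:8983/solr \
            --core demo \
            --dir ./tmp \
            --archive-prefix "demo_${shard}" \
            --params "timeAllowed=15000&segmentTerminatedEarly=false&cache=false&shards=${shard}" \
            || return 1
    done
}
```

For example: `backup_each_shard shard1 shard2`. The loop stops at the first shard that fails, so a partial run is easy to spot.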
For compiling a version from source:
- Install Rust following the instructions on https://rustup.rs
- Build with the command: `cargo build --release`
- Install locally with the command: `cargo install --path .`
For setting up a development environment:
For using Visual Studio Code:
- Install rust following the instructions on https://rustup.rs
- Install Visual Studio Code following the instructions on the Microsoft site
- Install the following extensions in VS Code:
- vadimcn.vscode-lldb
- rust-lang.rust
- swellaby.vscode-rust-test-adapter
You can also use IntelliJ IDEA, Vim, Emacs or your preferred IDE.
See also the testing in Visual Studio Code below.
For setting up a testing environment you will need:
- A server instance of Apache Solr
- A source core with some documents for testing the `solrcopy backup` command.
- A target core with same schema for testing the `solrcopy restore` command.
- Setting the server address and core names for the `solrcopy` parameters in command line or IDE launch configuration.
- Select on your Solr server an existing source core or create a new one and fill it with some documents.
- Clone a new target core with the same schema as the previous one but without documents.
Check the Solr docker documentation for help in how to create a Solr container.
You can use cargo make to run all tasks to set up a Solr server, test the source code against the Solr server, and clean up.
To create a local container using docker, run the following cargo make command:
cargo make test-start
After this you can test the source code against the Solr server by running the following cargo command:
cargo test --features testsolr
To create the local container, test the source code and clean up, run the following cargo make command:
cargo make test
Please also check all available tasks.
- Install docker stable for your platform
- Create the container and the cores for testing with the commands below.
- Check the cores created in the admin UI at http://localhost:8983/solr
# This command creates the container with a solr server with two cores: 'demo' and 'target'
$ docker compose -f docker/docker-compose.yml up -d
# Run this command to insert some data into the cores
$ docker compose exec solr solr-ingest-all
# Run this command to test backup
$ cargo run -- backup --url http://localhost:8983/solr --core demo --dir $PWD
# Run this command to test restoring the backup data into an existing empty core
$ cargo run -- restore --url http://localhost:8983/solr --search demo --core target --dir $PWD
It's possible to create the Solr container using just docker instead of docker compose.
Follow these instructions if you prefer this way:
$ cd docker
# Pull the latest Solr image from Docker Hub
$ docker pull solr:slim
...
# 1. Create a container running solr and after
# 2. Create the **source** core with the name 'demo'
# 3. Import some docs into the 'demo' core
$ docker run -d --name solr4test -p 8983:8983 solr:slim solr-demo
...
# Create an empty **target** core named 'target'
$ docker exec -it solr4test solr create_core -c target
You can use Cargo Make to execute the most common sequences required for testing, linting, and preparing for committing.
To use it, install it first by running the following command:
cargo install --force cargo-make
Once installed, you can check all available tasks by running the following command:
$ cargo make --list-all-steps --quiet --hide-uninteresting
Basic
----------
all - Runs all lint checks and runs all tests against a local Solr container
check - Runs all lint checks and runs all basic tests possible without a Solr Server
lint - Verify the source code using all the checks configured
list - List all available tasks [aliases: default]
test - Runs tests against a local solr server created using docker compose
Lint
----------
check-compile - Check if the source code compiles
check-doc - Check if the source code has any documentation issues
check-fmt - Check if the source code follows the formatting rules
check-future - Check if the source code has any future incompatibilities
check-lint - Check if the source code has any language issues
check-msrv - Verify the minimum supported rust version
check-unused - Check if the source code has any unused dependencies
clean - Clean all compiled artifacts
Security
----------
audit - Check if the release build has any security issues and clean the compiled artifacts after
audit-release - Check if the release build has any security issues
Test
----------
test-basic - Runs tests that do not require a Solr container
test-cleanup - Cleanup the local Solr container after testing
test-solr - Runs tests against an existing local solr server
test-start - Setup a local Solr container and ingest some documents allowing to run tests manually after
Upgrade
----------
show - Show the installed and current rust toolchains
upgrade - Upgrade rustup, rust andthe rust toolchain
upgrade-check - Check if the rust toolchain is up to date
upgrade-rustup - Upgrade the rustup tool
upgrade-toolchain - Upgrade the stable rust toolchain
There are some pre-configured launch configurations in this repository for debugging solrcopy.
- Start the Solr docker container with the procedures above.
- Run solrcopy using one of the predefined launch configurations.
  - You will be asked for the program arguments like:
    - SolrURL
    - Query
- You can also edit the settings file `.vscode/launch.json` if you prefer:
  - Set the following parameters for specifying a query to extract documents: `--query`, `--order`, `--select`, `--batch`, `--skip`, `--limit`
  - Check the Solr Query docs for understanding these parameters.
- You can also run any query in the Solr admin UI
Related projects and documentation: