Description
Is your feature request related to a problem or challenge?
The size of DataFusion's binary has grown significantly over the last few releases.
This likely leads to longer compile times as well as a larger overall binary size.
| version | size of datafusion-cli binary |
|---|---|
| main at 57d1309 | 92M |
| 43.0.0 | 87M |
| 42.0.0 | 83M |
| 41.0.0 | 72M |
| 40.0.0 | 69M |
| 39.0.0 | 68M |
The sizes are measured like this:
git checkout version
cd datafusion-cli
cargo build --release
du -h target/release/datafusion-cli
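To put the table in relative terms, here is a small sketch that computes the release-over-release growth from the sizes listed above. The helper name `growth_pct` is made up for illustration, and the inputs are the rounded `du -h` figures, so the percentages are only approximate.

```rust
/// Percentage growth from `old` to `new` (both in the same unit).
/// Hypothetical helper, not part of DataFusion.
fn growth_pct(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

fn main() {
    // Sizes in MB as reported by `du -h` in the table above, oldest first.
    let sizes = [
        ("39.0.0", 68.0),
        ("40.0.0", 69.0),
        ("41.0.0", 72.0),
        ("42.0.0", 83.0),
        ("43.0.0", 87.0),
        ("main at 57d1309", 92.0),
    ];

    // Growth between each pair of consecutive versions.
    for pair in sizes.windows(2) {
        let (prev, cur) = (pair[0], pair[1]);
        println!("{} -> {}: {:+.1}%", prev.0, cur.0, growth_pct(prev.1, cur.1));
    }

    // Growth across the whole measured range.
    let (first, last) = (sizes[0], sizes[sizes.len() - 1]);
    println!("{} -> {}: {:+.1}%", first.0, last.0, growth_pct(first.1, last.1));
}
```

By this arithmetic the largest single step is 41.0.0 to 42.0.0 (roughly 15%), and the overall growth from 39.0.0 to main is roughly 35%.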
Also, people such as @g3blv have noticed that the size of the WASM build has increased by 50%:
#9834 (comment)
Describe the solution you'd like
I would like to reduce the binary size of DataFusion if possible.
At a minimum, I would like to understand where the code size comes from and offer hints about how to reduce it if needed.
Describe alternatives you've considered
A common source of code size is templated (generic) functions, as each distinct instantiation generates another copy of the same function(s).
Here is some fascinating information from running cargo bloat -p datafusion:
File .text Size Crate Name
0.1% 0.3% 79.7KiB blake2 blake2::Blake2bVarCore::compress
0.1% 0.2% 70.7KiB blake2 blake2::Blake2sVarCore::compress
0.1% 0.2% 67.1KiB sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
0.1% 0.2% 61.4KiB blake3 _blake3_hash4_neon
0.1% 0.2% 56.4KiB chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
0.1% 0.2% 44.7KiB arrow_cast <i64 as lexical_write_integer::api::ToLexical>::to_lexical
0.1% 0.1% 42.8KiB arrow_cast arrow_cast::cast::cast_with_options
0.0% 0.1% 35.9KiB rand <rand_chacha::chacha::ChaCha12Core as rand_core::block::BlockRngCore>::generate
0.0% 0.1% 34.9KiB arrow_cast lexical_parse_float::slow::parse_mantissa
0.0% 0.1% 33.1KiB arrow_cast lexical_parse_float::parse::parse_complete
0.0% 0.1% 33.1KiB arrow_cast lexical_parse_float::parse::parse_complete
0.0% 0.1% 29.0KiB regex_automata regex_automata::hybrid::search::find_fwd
0.0% 0.1% 27.6KiB blake3 blake3::portable::compress_in_place
0.0% 0.1% 27.1KiB aho_corasick aho_corasick::automaton::try_find_fwd
0.0% 0.1% 25.2KiB sqlparser <sqlparser::ast::Expr as core::fmt::Display>::fmt
0.0% 0.1% 23.8KiB datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
0.0% 0.1% 23.7KiB datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
0.0% 0.1% 23.7KiB datafusion_physical_expr datafusion_common::scalar::ScalarValue::iter_to_array
0.0% 0.1% 23.7KiB datafusion_functions_aggregate datafusion_common::scalar::ScalarValue::iter_to_array
0.0% 0.1% 22.0KiB arrow_cast <u64 as lexical_write_integer::api::ToLexical>::to_lexical
36.7% 97.4% 27.7MiB And 139272 smaller methods. Use -n N to show more.
37.7% 100.0% 28.4MiB .text section size, the file size is 75.4MiB
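The repeated ScalarValue::iter_to_array rows above look consistent with the generic-function duplication described earlier, though that is only a guess from the output. As a minimal sketch of one commonly used mitigation, and not DataFusion's actual code, the example below contrasts a fully generic function with a thin generic wrapper that forwards to a non-generic inner function. All function names here are hypothetical.

```rust
use std::fmt::Display;

// Fully generic: the compiler emits a separate copy of the whole body
// for every concrete `T` this is called with.
fn join_generic<T: Display>(items: &[T]) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push_str(", ");
        }
        out.push_str(&item.to_string());
    }
    out
}

// Thin generic wrapper: only the per-type conversion is duplicated.
// The bulk of the work lives in `join_inner`, which is compiled once.
fn join_thin<T: Display>(items: &[T]) -> String {
    let strings: Vec<String> = items.iter().map(|x| x.to_string()).collect();
    join_inner(&strings)
}

// Non-generic inner function: a single copy regardless of caller types.
fn join_inner(items: &[String]) -> String {
    items.join(", ")
}

fn main() {
    // Two instantiations of each generic function (i32 and &str).
    println!("{}", join_generic(&[1, 2, 3]));
    println!("{}", join_generic(&["a", "b"]));
    println!("{}", join_thin(&[1, 2, 3]));
    println!("{}", join_thin(&["a", "b"]));
}
```

The trade-off is that the wrapper may add an allocation or conversion per call, so whether this is worthwhile depends on how large the shared body is and how many types it is instantiated for.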
Additional context
No response