Datafusion binary size has been getting bigger #13816

Closed
@alamb

Is your feature request related to a problem or challenge?

The size of DataFusion's binary has grown significantly over the last few releases.

This likely leads to longer compile times as well as a larger overall binary size.

version            size of datafusion-cli binary
main at 57d1309    92M
43.0.0             87M
42.0.0             83M
41.0.0             72M
40.0.0             69M
39.0.0             68M
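
To put the table in relative terms, here is a small Rust sketch that computes the release-over-release growth from the sizes listed above. The helper name `growth_percent` is illustrative only, and the inputs are the rounded `du -h` values from the table, so the percentages are approximate.

```rust
/// Percentage growth from `old` to `new`.
fn growth_percent(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

fn main() {
    // (version, size as reported by `du -h`), copied from the table, oldest first.
    let sizes = [
        ("39.0.0", 68.0),
        ("40.0.0", 69.0),
        ("41.0.0", 72.0),
        ("42.0.0", 83.0),
        ("43.0.0", 87.0),
        ("main at 57d1309", 92.0),
    ];

    // Growth between each pair of consecutive versions.
    for pair in sizes.windows(2) {
        let (prev, cur) = (pair[0], pair[1]);
        println!("{} -> {}: {:+.1}%", prev.0, cur.0, growth_percent(prev.1, cur.1));
    }

    // Growth across the whole range.
    let (first, last) = (sizes[0], sizes[sizes.len() - 1]);
    println!("overall: {:+.1}%", growth_percent(first.1, last.1));
}
```

By this measure the binary grew by roughly 35% between 39.0.0 and main, with the largest single step between 41.0.0 and 42.0.0.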

The sizes are measured like this:

git checkout <version>
cd datafusion-cli
cargo build --release
du -h target/release/datafusion-cli

Others, such as @g3blv, have also noticed that the WASM build has grown by 50%:
#9834 (comment)

Describe the solution you'd like

I would like to reduce the binary size of DataFusion if possible.

At a minimum, I would like to understand where the code size comes from and offer hints about how to reduce it if needed.

Describe alternatives you've considered

A common source of code size is templated (generic) functions, because the compiler generates a separate copy of the function for each set of concrete types it is used with.
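
As a rough illustration of this effect and one common mitigation, the Rust sketch below contrasts a fully generic function with a thin generic wrapper that forwards to a single non-generic body. All function names are hypothetical and are not taken from DataFusion. The trade-off is that the non-generic body uses dynamic dispatch for each item, which may cost some speed.

```rust
/// Fully generic: the compiler emits a separate copy of the whole body
/// for every concrete iterator type `I` that callers pass in.
pub fn sum_lengths_generic<I>(values: I) -> usize
where
    I: IntoIterator<Item = String>,
{
    let mut total = 0;
    for v in values {
        total += v.trim().len();
    }
    total
}

/// Thin generic wrapper: still one copy per `I`, but each copy only
/// converts the argument and calls the shared body below.
pub fn sum_lengths<I>(values: I) -> usize
where
    I: IntoIterator<Item = String>,
{
    sum_lengths_inner(&mut values.into_iter())
}

/// Non-generic body: compiled once, regardless of how many iterator
/// types are used at call sites. Items are pulled via dynamic dispatch.
fn sum_lengths_inner(values: &mut dyn Iterator<Item = String>) -> usize {
    let mut total = 0;
    for v in values {
        total += v.trim().len();
    }
    total
}

fn main() {
    // Two different iterator types share the single `sum_lengths_inner` body.
    let from_vec = sum_lengths(vec![" a ".to_string(), "bc".to_string()]);
    let from_map = sum_lengths(["x", "yz "].iter().map(|s| s.to_string()));
    println!("{from_vec} {from_map}");
}
```

Whether this pattern pays off depends on how large the body is and how many distinct types reach it, so it is worth measuring before and after.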

Here is some fascinating information from running cargo bloat -p datafusion:

 File  .text    Size                          Crate Name
 0.1%   0.3% 79.7KiB                         blake2 blake2::Blake2bVarCore::compress
 0.1%   0.2% 70.7KiB                         blake2 blake2::Blake2sVarCore::compress
 0.1%   0.2% 67.1KiB                      sqlparser <sqlparser::ast::Statement as core::fmt::Display>::fmt
 0.1%   0.2% 61.4KiB                         blake3 _blake3_hash4_neon
 0.1%   0.2% 56.4KiB                      chrono_tz <chrono_tz::timezones::Tz as chrono_tz::timezone_impl::TimeSpans>::timespans
 0.1%   0.2% 44.7KiB                     arrow_cast <i64 as lexical_write_integer::api::ToLexical>::to_lexical
 0.1%   0.1% 42.8KiB                     arrow_cast arrow_cast::cast::cast_with_options
 0.0%   0.1% 35.9KiB                           rand <rand_chacha::chacha::ChaCha12Core as rand_core::block::BlockRngCore>::generate
 0.0%   0.1% 34.9KiB                     arrow_cast lexical_parse_float::slow::parse_mantissa
 0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
 0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
 0.0%   0.1% 29.0KiB                 regex_automata regex_automata::hybrid::search::find_fwd
 0.0%   0.1% 27.6KiB                         blake3 blake3::portable::compress_in_place
 0.0%   0.1% 27.1KiB                   aho_corasick aho_corasick::automaton::try_find_fwd
 0.0%   0.1% 25.2KiB                      sqlparser <sqlparser::ast::Expr as core::fmt::Display>::fmt
 0.0%   0.1% 23.8KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB       datafusion_physical_expr datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB datafusion_functions_aggregate datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 22.0KiB                     arrow_cast <u64 as lexical_write_integer::api::ToLexical>::to_lexical
36.7%  97.4% 27.7MiB                                And 139272 smaller methods. Use -n N to show more.
37.7% 100.0% 28.4MiB                                .text section size, the file size is 75.4MiB
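
Several rows in the output above carry the same symbol name (for example `ScalarValue::iter_to_array` appears four times), which is consistent with multiple compiled copies of one function. The Rust sketch below groups rows by symbol name to surface such repeats. It assumes the whitespace-separated column layout shown above; `parse_kib` and `duplicated_symbols` are hypothetical helpers, not part of cargo-bloat.

```rust
use std::collections::BTreeMap;

/// A few rows copied from the report above.
const REPORT: &str = r"
 0.1%   0.1% 42.8KiB                     arrow_cast arrow_cast::cast::cast_with_options
 0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
 0.0%   0.1% 33.1KiB                     arrow_cast lexical_parse_float::parse::parse_complete
 0.0%   0.1% 23.8KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB              datafusion_common datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB       datafusion_physical_expr datafusion_common::scalar::ScalarValue::iter_to_array
 0.0%   0.1% 23.7KiB datafusion_functions_aggregate datafusion_common::scalar::ScalarValue::iter_to_array
";

/// Parse a size such as "23.8KiB" or "27.7MiB" into KiB.
fn parse_kib(s: &str) -> Option<f64> {
    if let Some(n) = s.strip_suffix("KiB") {
        n.parse().ok()
    } else if let Some(n) = s.strip_suffix("MiB") {
        n.parse::<f64>().ok().map(|v| v * 1024.0)
    } else if let Some(n) = s.strip_suffix('B') {
        n.parse::<f64>().ok().map(|v| v / 1024.0)
    } else {
        None
    }
}

/// Map each symbol name that occurs more than once to (copies, total KiB).
fn duplicated_symbols(report: &str) -> BTreeMap<String, (usize, f64)> {
    let mut by_name: BTreeMap<String, (usize, f64)> = BTreeMap::new();
    for line in report.lines() {
        let mut fields = line.split_whitespace();
        // Columns: File %, .text %, Size, Crate, then the symbol name,
        // which may itself contain spaces.
        let (Some(_file), Some(_text), Some(size), Some(_krate)) =
            (fields.next(), fields.next(), fields.next(), fields.next())
        else {
            continue;
        };
        // Skips the header row, whose Size column is not a number.
        let Some(kib) = parse_kib(size) else { continue };
        let name = fields.collect::<Vec<_>>().join(" ");
        if name.is_empty() {
            continue;
        }
        let entry = by_name.entry(name).or_insert((0, 0.0));
        entry.0 += 1;
        entry.1 += kib;
    }
    // Keep only names that appear on more than one row.
    by_name.retain(|_, v| v.0 > 1);
    by_name
}

fn main() {
    for (name, (copies, kib)) in duplicated_symbols(REPORT) {
        println!("{copies} copies, {kib:.1} KiB total: {name}");
    }
}
```

Run over a longer report (for example with a larger `-n`), a grouping like this can help rank which repeated functions account for the most space.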

Additional context

No response

Labels: enhancement
