Namespaces
Variants
Views
Actions

Talk:cpp/algorithm/sample

From cppreference.com

[edit] Misleading example output?

The example output of 'cdefg' being an exact in-order subset of the input range might be misleading? I think it might be more illustrative if the example had output that wasn't in order?

204.77.163.55 11:28, 5 November 2018 (PST)

good point, updated with a sample output I just got on wandbox. --Cubbi (talk) 11:39, 5 November 2018 (PST)

[edit] The algorithm is extremely slow

The complexity requirement of std::sample (and of std::ranges::sample) is insufficient when one wants to sample a small number of samples from a large population using random access iterators. As the population size tends to infinity, a naive implementation could easily provide the samples in amortized linear time with respect to the number of samples, whereas the standard requires the time to be linear only in the size of the population. In particular, as the population size grows, the naive sampling actually becomes faster, because resampling becomes less likely. In many applications, such as RANSAC, the population size can be several orders of magnitude larger than the sample size, rendering all major implementations of std::sample practically useless.

An example of a reasonably fast and flexible random sampling implementation can be found, e.g., from CPython's random module. Presumably, the complexity requirement in the C++ standard was left weak, because reasonable sampling generally requires some use of additional memory. However, I would argue that for the vast majority of users, the additional memory requirement for small sample sizes would be insignificant compared to the slowness that we currently experience, and for very small sample sizes the implementation could be specialized to use statically allocated arrays.

I would consider adding a warning to the article about the problems encountering when sampling a small number of samples from a large population.

--71.219.70.171 15:25, 16 September 2023 (PDT)