Generally speaking, I have to admit that I don't understand everything the code is doing and that I could use a little more time in ordeerorder to understand it. However, I hope I can still give you some hopefully useful pieces of advice:
You could use a little more C++ in your code, which could be considered a bit too C-ish. What comes to my mind is
BITS_PER_WORDwhich can actually be obtained withstd::numeric_limits:enum { BITS_PER_WORD = std::numeric_limits<word_t>::digits };The same goes for several other constants such as
UINT32_MAXwhich could bestd::numeric_limits<uint32_t>::max(). I have to admit that it makes the code longer too, but it may be easier to change the integer type with a basic find-and-replace if you ever need to.You could also use the C++ algorithms instead of the C library. For example, replace
std::memsetbystd::fill_n, which is likely to be optimized back into anstd::memsetanyway while being clearer:std::fill_n(target_segment, std::size_t((max_bit + CHAR_BIT) / CHAR_BIT), 0);It is likely that it has already been optimized by the compiler but since you compute both
index / BITS_PER_WORDandindex % BITS_PER_WORD, you may as well make even more sure that these two values are computed together by usingstd::divstd::div. It also applies tobits_to_sieve % segment_sizeandbits_to_sieve / segment_sizesomewhere else in the program.If you have access to a C++11 compiler, don't hesitate to declare some variables
constepxrconstexpr, likeSIEVE_BITSwhich can totally be computed at compile time.