I’m using ParallelMap on a very large dataset (millions of elements), but it quickly consumes all available RAM.
For example:
result = ParallelMap[func, data]
This fills the RAM completely almost immediately.
To reduce memory usage, I tried splitting the data into 20 or more parts, but it does not seem to help. I am not sure whether this is the right approach and I simply need more chunks, or whether there is something fundamentally wrong with the method.
blockSize = Ceiling[Length[data]/20];                       (* aim for ~20 chunks *)
parts = Partition[data, blockSize, blockSize, {1, 1}, {}];  (* last chunk may be shorter *)
result = Join @@ First @ Last @ Reap[      (* flatten the sown per-chunk results into one list *)
    Do[
     partialRes = ParallelMap[func, part, Method -> "FinestGrained"];
     Sow[partialRes];
     Clear[partialRes],
     {part, parts}
     ]
    ];
Even with the data split into 20 or more chunks, the RAM still fills up completely.
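In case it helps narrow down where the memory ends up (main kernel vs. subkernels), I check it with the standard built-ins:

MemoryInUse[]                     (* bytes in use on the main kernel *)
ParallelEvaluate[MemoryInUse[]]   (* bytes in use on each parallel subkernel *)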
Is there a better way to manage memory when using ParallelMap on large datasets?
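For reference, a variant I have considered but not tested would write each chunk's result to disk instead of accumulating it with Sow (the file names here are just placeholders, and the chunks would be reassembled from the files afterwards):

Do[
  partialRes = ParallelMap[func, parts[[i]]];
  Export["chunk" <> ToString[i] <> ".mx", partialRes];   (* one .mx file per chunk *)
  Clear[partialRes],
  {i, Length[parts]}
 ]
result = Join @@ Table[Import["chunk" <> ToString[i] <> ".mx"], {i, Length[parts]}];

I do not know whether this would actually keep the memory of the main kernel and the subkernels in check, which is part of what I am asking.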