21,954 questions
2 votes, 1 answer, 55 views
Dask client connects successfully but no workers are available
I am using Dask for some processing. The client starts successfully, but I am seeing zero workers.
This is how I am creating the client:
client = Client("tls://localhost:xxxx")
This is the ...
-3 votes, 0 answers, 47 views
How do multiple kernels and streams affect GPU utilisation if a single kernel is not enough? [closed]
I’m trying to reason about GPU utilisation and I feel like I’m missing something. If kernels in the default stream run sequentially, then how do we actually fully utilise the GPU? A single kernel ...
1 vote, 1 answer, 148 views
Why does placing an I/O task before a CPU-bound task run faster than placing it after the CPU-bound task?
using System.Diagnostics;
const int TASKS = 100;
var mainSw = Stopwatch.StartNew();
var tasks = Enumerable.Range(0, TASKS).Select(i =>
Task.Run(async () =>
{
await Task.Delay(...
Best practices
0 votes, 5 replies, 111 views
Parallel.ForEach Returning Values
I need to process a list of objects (not the same as those shown in the sample), which I thought could be greatly improved by running it in a Parallel.ForEach loop. However, the result is not what I expected. ...
Advice
1 vote, 0 replies, 45 views
What is the best pattern for triggering N sub-workflows in parallel and resuming main workflow when all complete?
I need to trigger a dynamic number of sub-workflows in parallel (around 100)
and wait for ALL of them to complete before continuing the main workflow.
I’ve implemented a solution but I’m wondering if ...
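The excerpt does not name the workflow engine, so the following is only a generic fan-out/fan-in sketch in Python's `asyncio`, not the asker's solution. `run_sub_workflow` and `run_main_workflow` are hypothetical names, and the sub-workflow body is a placeholder for whatever call actually triggers and awaits one sub-workflow.

```python
import asyncio


async def run_sub_workflow(index: int) -> int:
    # Hypothetical stand-in for triggering one sub-workflow and
    # waiting for it to complete.
    await asyncio.sleep(0)
    return index * 2


async def run_main_workflow(count: int) -> list[int]:
    # Fan out: schedule every sub-workflow concurrently.
    # Fan in: gather() resumes this coroutine only once all have finished,
    # and returns results in the order the sub-workflows were passed in.
    results = await asyncio.gather(*(run_sub_workflow(i) for i in range(count)))
    return list(results)


if __name__ == "__main__":
    results = asyncio.run(run_main_workflow(100))
    print(len(results), "sub-workflows completed")
```

By default `gather()` raises the first exception it sees; passing `return_exceptions=True` would instead collect failures alongside successful results, which may suit a run of around 100 sub-workflows better.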
-4 votes, 0 answers, 44 views
What is the Global Interpreter Lock (GIL) in Python and why does it prevent true multithreading? [duplicate]
I’ve been reading about Python’s Global Interpreter Lock (GIL), and I’m a bit confused about how it actually works behind the scenes.
From what I understand, the GIL allows only one thread to execute ...
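A minimal sketch of the effect the question describes, assuming a standard CPython build with the GIL enabled (not a free-threaded build). `count_down`, `run_serial` and `run_threaded` are hypothetical names. Both versions compute the same result; on such a build the threaded version is not expected to be meaningfully faster for this pure-Python CPU-bound loop, though exact timings depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n: int) -> int:
    # Pure-Python CPU-bound loop: the thread running it must hold the GIL
    # to execute bytecode.
    while n > 0:
        n -= 1
    return n


def run_serial(n: int, jobs: int) -> list[int]:
    # Run the jobs one after another in the calling thread.
    return [count_down(n) for _ in range(jobs)]


def run_threaded(n: int, jobs: int) -> list[int]:
    # Run the same jobs on a thread pool, one worker thread per job.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(count_down, [n] * jobs))


if __name__ == "__main__":
    for runner in (run_serial, run_threaded):
        start = time.perf_counter()
        runner(2_000_000, 4)
        print(runner.__name__, round(time.perf_counter() - start, 3), "s")
```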
Advice
2 votes, 2 replies, 63 views
Efficient MPI Parallelization Strategies for Localized PDE Subproblems within a Globally Decomposed Domain
I am working on a global PDE problem that is solved using a standard domain-decomposition strategy (e.g., Scotch, METIS). This part of the computation is well balanced across all MPI processes.
...
Tooling
1 vote, 3 replies, 77 views
Using persistent-memory gawk, how can variables be created so that they are local and isolated from other execution instances?
The idea of Persistent-Memory gawk is fabulous because it improves the performance, size, and clarity of many scripts on static and reference data.
However, I have a significant problem in adopting ...
1 vote, 0 answers, 84 views
How to share a large CustomObject to workers in Python multiprocessing on Windows (spawn)?
I'm trying to run calculations using multiple cores in Python on multiple platforms (Linux, macOS, Windows). I need to pass a large CustomClass object and a dict (both read-only) to all workers. So far ...
0 votes, 0 answers, 46 views
Attribution Error when using Huggingface transformers Trainer with FSDP
I am now trying to use FSDP in Huggingface transformers Trainer. The training script is something like
train_dataset = Mydataset(...)
args = TrainingArguments(...)
model = LlamaForCausalLM....
0 votes, 0 answers, 22 views
OptimisticLockingException when using multiInstanceLoopCharacteristics for parallel execution of subprocess
I have the following process definition that I am trying to execute on Camunda 7.24 / CibSeven 2.1, which currently logs many OptimisticLockingException errors during execution. I could already trace it down that it ...
0 votes, 1 answer, 126 views
Why are items not written to console immediately after being processed?
I have the following C# code:
var rand = new Random(1);
var range = Enumerable.Range(1, 8);
var partition = Partitioner.Create(range, EnumerablePartitionerOptions.NoBuffering);
foreach (var x in ...
0 votes, 1 answer, 100 views
Taking advantage of memory contiguousness in HLSL
This is a bit of a slog, so bear with me.
I'm currently writing a 3D S(moothed) P(article) H(ydrodynamics) simulation in Unity with a parallel HLSL backend. It's a Lagrangian method of fluid simulation,...
Tooling
0 votes, 0 replies, 36 views
ComfyUI + Flux 1 dev + limited RAM + same workflow: With 2 GPUs?
I am running the Flux 1 dev text-to-image model through ComfyUI in Kaggle. Everything works, but I noticed that Kaggle offers a second GPU inside the notebook. If I try to run two instances of the ComfyUI ...
1 vote, 0 answers, 81 views
Intuition over TBB parallel scan/parallel prefix requirements
I am reading a paragraph about the tbb::parallel_scan algorithm from the book Intel Threading Building Blocks, and I understand what the operation does serially, but I do not understand what are ...
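As a rough illustration of the idea behind a parallel prefix, here is a blocked two-pass scan written serially in Python. It is not the TBB API, and `blocked_inclusive_scan` is a hypothetical name. It does show the key requirement a parallel scan places on the operation: because each block is reduced separately and the partial results are recombined afterwards, the operation must be associative (it need not be commutative).

```python
from itertools import accumulate
from operator import add


def blocked_inclusive_scan(values, op=add, block_size=4):
    # Split the input into contiguous blocks; in a parallel implementation
    # each block would be handled by a different worker.
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]

    # Pass 1 (parallelisable): reduce each block to a single total.
    totals = []
    for block in blocks:
        total = block[0]
        for x in block[1:]:
            total = op(total, x)
        totals.append(total)

    # Short serial step: carries[k] combines the totals of blocks 0..k.
    carries = list(accumulate(totals, op))

    # Pass 2 (parallelisable): scan each block, seeded with the combined
    # total of all preceding blocks.
    out = []
    for k, block in enumerate(blocks):
        has_running = k > 0
        running = carries[k - 1] if has_running else None
        for x in block:
            running = op(running, x) if has_running else x
            has_running = True
            out.append(running)
    return out


if __name__ == "__main__":
    data = list(range(1, 10))
    # The blocked result matches a plain serial scan for an associative op.
    print(blocked_inclusive_scan(data) == list(accumulate(data, add)))
```

Regrouping the work into blocks changes the order in which partial results are combined, which is only safe when `op(op(a, b), c)` equals `op(a, op(b, c))`.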