2,233 questions
-3
votes
0
answers
43
views
Avoiding counter update contention under high write throughput [closed]
We maintain multiple counters, and each incoming request increments or decrements one or more of them. The counters are bounded by a maximum value; once it is reached, we reject requests.
...
0
votes
0
answers
28
views
Why does MixCoord keep routing requests to stale QueryNodes after a Kubernetes node reboot in Milvus?
MixCoord keeps routing requests to non-existent QueryNodes after a Kubernetes worker node reboot in Milvus
I’m running a Milvus 2.5.x cluster on Kubernetes, where each worker node hosts a full set of ...
0
votes
0
answers
52
views
Distributed TensorFlow with multiple GPUs training MNIST with Optuna is stuck when training
I created a 5-GPU cluster across three local nodes/machines using tf.distribute.MultiWorkerMirroredStrategy. One machine has an Apple M1 Pro (Metal) GPU; the other two nodes have NVIDIA ...
0
votes
0
answers
44
views
How do clusters work in TensorFlow's ParameterServerStrategy?
I don't understand how clusters work in TensorFlow's ParameterServerStrategy, and I need some clarification.
I have read this tutorial, but it doesn't mention or clearly explain how to ...
0
votes
0
answers
82
views
Low CIFAR-10 Accuracy (60%) in Decentralized Federated Learning (DFL) - Seeking Improvement
I implemented an algorithm in a Decentralized Federated Learning (DFL) environment. When I experimented with MNIST and Fashion-MNIST, I achieved an accuracy of 80–90%. However, when testing with CIFAR-...
0
votes
0
answers
461
views
Issue connecting to socket with DDP and PyTorch (single node, multi-GPU communication)
I am completely new to distributed programming, and I have been trying to port code that originally ran on a multi-node cluster to a single-node cluster with multiple GPUs. My goal is to simulate a ...
0
votes
0
answers
38
views
Better way to integrate Kafka with Akka Cluster Sharding
We use Kafka as the bus and Akka Cluster Sharding as the distributed application cluster, so we need to consume data from Kafka and process it in the Akka cluster.
For now we implement separate ...
0
votes
0
answers
92
views
SLURM: torch distributed: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED
I am new to pytorch-distributed, and any input will help. I have code that works with a single GPU, and I am trying to make it distributed. I am getting a socket connect error. Below is the code (I am ...
0
votes
1
answer
42
views
How to maintain synchronization between distributed python processes?
I have a number of workstations that run long processes containing sequences like this:
x = wait_while_current_is_set
y = read_voltage
z = z + y
The workstations must maintain synchronization with a ...
0
votes
0
answers
143
views
PyTorch Distributed and Multiprocessing: Processes execute from the start of the whole script, not from the place where they are created
Introduction
I'm new to PyTorch distributed and multiprocessing, and I ran into an unexpected problem:
I have learned that processes created by spawn will execute the given function, but my processes ...
2
votes
2
answers
230
views
Client request failure in Raft
Imagine a 3-node Raft cluster. Each node is in sync and has log [1,2,3], and entry 3 is committed by the leader.
Now the leader receives an entry 4 but fails to commit it because of an unreliable network and ...
1
vote
0
answers
211
views
Distributed SQL Caching in .NET 4.7.2
Has anyone used distributed SQL caching in .NET 4.7.2? I have seen plenty of sample code for SQL caching with .NET Core, but none for .NET Framework 4.7.2.
We are currently using Redis cache in the ...
-1
votes
1
answer
394
views
SeaweedFS S3 Gateway Stuck Connecting to Incorrect gRPC Port [closed]
I've been setting up SeaweedFS on a cluster of three nodes and encountered issues when configuring the S3 gateway. The S3 gateway tries to connect to the incorrect gRPC port 28888 instead of the ...
0
votes
1
answer
483
views
INSERT INTO local table SELECT FROM distributed table in ClickHouse causes "default.local_table does not exist" error on another node
I need to select data from some distributed and local tables and insert it into another standalone local table. I use SQL like this: INSERT INTO local_table SELECT FROM distributed_table WHERE ... . The ...
1
vote
2
answers
371
views
Using the Uniswap SDK to get historical rates (and current rate)
I am trying to use the Uniswap SDK to get historical rates between two coins in a pool. I believe the rate simply follows xy = k, where k is a constant. If someone buys n coins of x, the cost in terms ...