0 votes
0 answers
28 views

MixCoord keeps routing requests to non-existent QueryNodes after a Kubernetes worker node reboot in Milvus
I’m running a Milvus 2.5.x cluster on Kubernetes, where each worker node hosts a full set of ...
0 votes
0 answers
52 views

I created a 5-GPU cluster from three local machines using tf.distribute.MultiWorkerMirroredStrategy. One machine has an Apple M1 Pro (Metal) GPU; the other two nodes have NVIDIA ...
0 votes
0 answers
44 views

I don't understand how clusters work with ParameterServerStrategy in TensorFlow, and I need some clarification. I have read this tutorial, but it doesn't mention or clearly explain how to ...
0 votes
0 answers
82 views

I implemented an algorithm in a Decentralized Federated Learning (DFL) environment. When I experimented with MNIST and Fashion-MNIST, I achieved an accuracy of 80–90%. However, when testing with CIFAR-...
0 votes
0 answers
461 views

I am completely new to distributed programming, and I have been trying to port code that originally ran on a multi-node cluster to a single-node cluster with multiple GPUs. My goal is to simulate a ...
0 votes
0 answers
38 views

We use Kafka as the message bus and Akka Cluster Sharding for the distributed application cluster, so we need to consume data from Kafka and process it in the Akka cluster. For now we have implemented a separate ...
0 votes
0 answers
92 views

I am new to PyTorch distributed, and any input will help. I have code that works on a single GPU, and I am trying to make it distributed. I am getting a socket connect error. Below is the code (I am ...
0 votes
0 answers
143 views

Introduction: I'm new to PyTorch distributed and multiprocessing, and I ran into some unexpected problems. I have learned that processes created by spawn will execute the given function, but my processes ...
1 vote
0 answers
211 views

Has anyone used distributed SQL caching in .NET Framework 4.7.2? I have seen many code samples for SQL caching with .NET Core, but none for .NET Framework 4.7.2. We are currently using Redis cache in the ...
0 votes
0 answers
106 views

I'm attempting to develop a broker in Java. Currently, I have created a server capable of posting its services within the broker. Additionally, I have implemented a client that can invoke methods on ...
0 votes
0 answers
87 views

I want to be able to take a snapshot of a running JavaScript program, save it to a database, wait for some period of time, load the state back, and continue execution. Is it possible?
0 votes
0 answers
52 views

I have a system that is deployed as a cluster on k8s, so it runs as multiple instances. My code looks like this: import akka.actor.typed.pubsub.Topic import akka.actor.typed.scaladsl....
0 votes
0 answers
390 views

I installed the Qdrant Helm chart in a cluster with 1 master node and 4 worker nodes, and I created a collection with shard_number=2, replication_factor=2. When I get the cluster info with the command: curl http://192....
1 vote
0 answers
407 views

I want to analyse the PyTorch distributed backend interface, but I don't know how to debug it. I have tried VS Code Python debugging with gdb attach and the Python C++ Debugger. Can't subprocesses be debugged? I'm wondering if ...
1 vote
0 answers
3k views

I encountered the following issues while using the device_map provided by Hugging Face for model parallel inference: I am running the code from the example code provided by Hugging Face, which can be ...
