- Python
- Thread
- IO Bound tasks
- Context Switching
- Race Condition
```mermaid
gitGraph
  commit id: "main function"
  commit id: "init t1"
  branch thread1
  checkout thread1
  commit id: "load image 1"
  checkout main
  commit id: "init t2"
  checkout main
  branch thread2
  commit id: "load image 2"
  checkout main
  merge thread1 id: "gil 1"
  merge thread2 id: "gil 2"
```
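A minimal sketch of the diagram above, assuming the "load image" work is an IO-bound wait; `load_image` and its one-second sleep are placeholders, not code from this repo.

```python
import threading
import time

def load_image(name: str) -> None:
    # Stand-in for an IO-bound read (disk/network); the GIL is released while sleeping.
    time.sleep(1)
    print(f"loaded {name}")

def main() -> None:
    t1 = threading.Thread(target=load_image, args=("image 1",))  # init t1
    t2 = threading.Thread(target=load_image, args=("image 2",))  # init t2
    t1.start()
    t2.start()
    t1.join()   # main waits for both workers before continuing
    t2.join()

if __name__ == "__main__":
    main()
```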
- Multiprocessing
- CPU Bound tasks
- May cause overhead
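A hedged sketch of a CPU-bound workload on a process pool; the function and problem size are illustrative. Spawning processes and pickling arguments is the overhead mentioned above, so very small tasks can run slower than a plain loop.

```python
from multiprocessing import Pool

def cpu_heavy(n: int) -> int:
    # Pure-Python arithmetic: CPU bound, so separate processes sidestep the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(cpu_heavy, [10**6] * 8)
    print(sum(results))
```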
- Concurrent.futures
- Provides one executor interface over both the threading and multiprocessing libraries (see the sketch below)
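A minimal sketch of that shared interface; `work` is a placeholder. The same `map`/`submit` calls run on threads (IO bound) or processes (CPU bound).

```python
from concurrent.futures import ThreadPoolExecutor

def work(x: int) -> int:
    return x * x

if __name__ == "__main__":
    # Swap ThreadPoolExecutor for ProcessPoolExecutor without changing the call sites.
    with ThreadPoolExecutor(max_workers=4) as ex:
        print(list(ex.map(work, range(8))))
```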
- Asyncio
- IO Bound tasks
- Similar to Threading, but everything runs cooperatively on one thread in one process, with the event loop switching between coroutines at each await
```mermaid
gitGraph
  commit id: "main function"
  commit id: "async call 1"
  commit id: "async call 2"
  commit id: "await" type: HIGHLIGHT
  commit id: "complete"
```
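A minimal sketch of the diagram: two coroutines are scheduled, the highlighted `await` yields to the event loop, and both complete on a single thread; `fetch` and its sleep are placeholders.

```python
import asyncio

async def fetch(name: str) -> str:
    await asyncio.sleep(1)   # IO-bound wait; the event loop runs other coroutines meanwhile
    return f"{name} done"

async def main() -> None:
    results = await asyncio.gather(fetch("async call 1"), fetch("async call 2"))
    print(results)           # "complete"

if __name__ == "__main__":
    asyncio.run(main())
```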
- Spark
```bash
git clone https://github.com/zcemycl/systemDeploy.git
cd systemDeploy/src/containers/docker/spark/
docker build -t cluster-apache-spark:latest .
docker-compose up -d
mkdir /tmp/data
mkdir /tmp/apps
cp *.csv /tmp/data
cp *.py /tmp/apps
mkdir /tmp/data/data-output
docker exec -it spark-spark-master-1 bash -c "bin/spark-submit /opt/spark-apps/test.py"
```

```mermaid
flowchart LR;
  A[main] --> B[master node];
  B --> C[worker node 1] & D[worker node 2]
```
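For reference, a hedged sketch of what a submitted job like `/opt/spark-apps/test.py` might look like (the actual script lives in the systemDeploy repo); the CSV file, column name and output path are hypothetical, assuming `/tmp/data` and `/tmp/apps` are mounted into the containers as `/opt/spark-data` and `/opt/spark-apps`.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()

# Read a CSV copied into /tmp/data on the host (hypothetical file/column names).
df = spark.read.csv("/opt/spark-data/example.csv", header=True, inferSchema=True)

# A simple aggregation, distributed across the worker nodes.
df.groupBy("category").count().write.mode("overwrite").csv("/opt/spark-data/data-output")

spark.stop()
```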
- Ray
- Thread
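A minimal Ray sketch, not tied to this repo: remote tasks are dispatched to worker processes and `ray.get` collects the results.

```python
import ray

ray.init()

@ray.remote
def square(x: int) -> int:
    return x * x

# Each .remote() call returns a future immediately; ray.get blocks until all are done.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))

ray.shutdown()
```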
- Cuda (Compute Unified Device Architecture)
- CPU: latency device with high clock speed, a small number of cores, and optimisation hardware (large caches, branch prediction).
- GPU: throughput device with low clock speed, thousands of cores, and no such optimisation hardware. Context switching is done by hardware; thread schedulers and dispatch units are implemented in hardware.
```mermaid
gitGraph
  commit id: "init resources"
  branch cpu
  commit id: "init data"
  branch gpu
  commit id: "transfer data"
  commit id: "kernel launch (grid/block)"
  checkout cpu
  merge gpu id: "transfer result"
  checkout main
  merge cpu id: "reclaim memory"
```
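A minimal CUDA sketch of that flow (allocate, transfer in, launch, transfer back, reclaim); the doubling kernel and sizes are illustrative, not code from this repo.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void doubleElements(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);                    // init data (CPU)
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);                                // init resources (GPU)
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);      // transfer data

    doubleElements<<<(n + 255) / 256, 256>>>(d, n);       // kernel launch (grid/block)
    cudaDeviceSynchronize();

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);      // transfer result
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);                                          // reclaim memory
    free(h);
    return 0;
}
```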
- Thread, Block, Grid
- The grid is laid out as blocks B1x-B4x along x, B1y-B2y along y, and B1z-B2z along z, each containing a 3x2 arrangement of threads (T).
- Each T is a thread in a block.
- Each Block has (x,y,z) = (3,2,1) threads
- Grid dimension is (x,y,z) = (4,2,2)
- Global thread index: `tid = tx + Bx*ty + Bx*By*bx + Bx*By*Gx*by + Bx*By*Gx*Gy*bz`, where (tx, ty) is the thread index within its block, (Bx, By) the block dimensions, (bx, by, bz) the block index, and (Gx, Gy) the grid dimensions (the block z-dimension is 1 here).
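The same index written with CUDA's built-in variables (tx maps to `threadIdx.x`, Bx to `blockDim.x`, bx to `blockIdx.x`, Gx to `gridDim.x`, and so on); a sketch assuming the block z-dimension is 1, as in the example above.

```cpp
__global__ void writeLinearIndex(int *out) {
    int tid = threadIdx.x
            + blockDim.x * threadIdx.y
            + blockDim.x * blockDim.y * blockIdx.x
            + blockDim.x * blockDim.y * gridDim.x * blockIdx.y
            + blockDim.x * blockDim.y * gridDim.x * gridDim.y * blockIdx.z;
    out[tid] = tid;   // each of the 4*2*2 * 3*2*1 = 96 threads writes its own slot
}

// Launch matching the example: a (4,2,2) grid of (3,2,1) blocks.
// writeLinearIndex<<<dim3(4, 2, 2), dim3(3, 2, 1)>>>(d_out);
```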
- Warps (32 threads)
- Software: two blocks of 40 threads, numbered 0 1 ... 38 39 and 0 1 ... 38 39.
- Hardware: threads are packed into warps of 32, i.e. 0 ... 31 | 32 ... 39 (24 idle) | 40 ... 71 | 72 ... 79 (24 idle).
- Therefore, a block size that is a multiple of 32 avoids wasting resources.
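A quick check of those numbers (illustrative host code, not from the repo): a 40-thread block occupies two warps and leaves 24 lanes idle, whereas a multiple of 32 leaves none.

```cpp
#include <cstdio>

int main() {
    const int warpSize = 32;
    for (int blockSize : {40, 64}) {
        int warps = (blockSize + warpSize - 1) / warpSize;   // ceil division
        int idle  = warps * warpSize - blockSize;            // unused lanes
        printf("blockSize=%d -> warps=%d, idle lanes=%d\n", blockSize, warps, idle);
    }
    return 0;
}
```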
- Nsight Systems
```bash
__PREFETCH=off /media/yui/Disk/nsight-systems-2021.3.1/bin/nsys profile -o noprefetch --stats=true ./runTutorials
```
- Nsight Compute
```bash
/NVIDIA-Nsight-Compute-2021.2/ncu -o profile_test_div -f -k "divergence_code" --target-processes all --section "WarpStateStats" --section "SourceCounters" --launch-count 1 ./div.out
```
- C++ Parallelism
- Thread Guard, Lock Guard, Race Condition, Deadlock, Unique Lock, Async Future, Promise
- JThread, Stop Token (macOS clang does not support these; the alternative is to run with the Dockerfile), Coroutines
- Boost (Dockerfile)
- OpenMP
- OpenMPI
- Join vs Detach
```mermaid
gantt
  title Bar 2000 Detach, Foo 1000 Join
  dateFormat ss-SSS
  axisFormat %S-%L
  section t1 detach
  "print bar 1": 00-000, 0.005s
  sleep: 00-005, 2s
  "print bar 2": 02-005, 0.005s
  section t2 join
  "print bar 1": 00-005, 0.005s
  sleep: 00-010, 1s
  "print bar 2": 01-010, 0.005s
  section stdout
  1. bar: 00-000, 0.005s
  2. foo: 00-005, 0.005s
  bar: crit, 02-005, 0.005s
  3. foo: 01-010, 0.005s
  section main
  program: a10, 00-000, 1.015s
```
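A hedged C++ sketch of that timeline: the detached "bar" thread (2 s) runs on its own, while main blocks on the joined "foo" thread (1 s); the names and sleeps mirror the chart rather than repo code.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

void worker(const char *name, int seconds) {
    std::cout << name << " 1\n";
    std::this_thread::sleep_for(std::chrono::seconds(seconds));
    std::cout << name << " 2\n";
}

int main() {
    std::thread t1(worker, "bar", 2);
    t1.detach();                      // main does not wait for t1
    std::thread t2(worker, "foo", 1);
    t2.join();                        // main blocks here for roughly 1 s
    // main can exit before the detached thread ever prints "bar 2".
    return 0;
}
```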
- Locks
```mermaid
gantt
  title Double locks
  dateFormat ss-SSS
  axisFormat %S-%L
  section t1
  m1 lock: 00-000, 0.005s
  t1m1: 00-005, 0.005s
  sleep: 00-010, 1s
  wait m2 unlock: crit, 01-010, 0.51s
  m2 lock: 01-520, 0.005s
  t1m2: 01-525, 0.005s
  m1 m2 unlock: 01-530, 0.005s
  section t2
  m2 lock: 00-005, 0.005s
  t2m2: 00-010, 0.005s
  sleep: 00-015, 1.5s
  m2 unlock: 01-515, 0.005s
  section stdout
  1. t1m1: 00-005, 0.005s
  2. t2m2: 00-010, 0.005s
  3. t1m2: 01-525, 0.005s
```
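A hedged sketch of the double-lock timeline: t1 takes m1, sleeps, then has to wait on m2 until t2 releases it; when both threads need both mutexes, locking them together with `std::scoped_lock` is the usual way to avoid the deadlock this pattern risks.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m1, m2;

void t1_work() {
    std::lock_guard<std::mutex> l1(m1);
    std::cout << "t1m1\n";
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::lock_guard<std::mutex> l2(m2);   // blocks until t2 releases m2
    std::cout << "t1m2\n";
}                                         // m1 and m2 released here

void t2_work() {
    std::lock_guard<std::mutex> l2(m2);
    std::cout << "t2m2\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(1500));
}                                         // m2 released here

int main() {
    std::thread a(t1_work), b(t2_work);
    a.join();
    b.join();
    return 0;
}
```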
- Racing Conditions
```mermaid
gantt
  title Racing Condition
  dateFormat ss-SSS
  axisFormat %S-%L
  section Thread 1
  op1: 00-000, 0.002s
  op2: 00-002, 0.002s
  op3: 00-004, 0.002s
  section Thread 2
  op1: 00-001, 0.002s
  op2: 00-003, 0.002s
  op3: 00-005, 0.002s
  section Linked List
  [4][1,2,3],s-1: 00-000, 0.002s
  [5][1,2,3],s-1: 00-001, 0.002s
  [4,1,2,3],s-1: 00-002, 0.002s
  [5,1,2,3],s-1: 00-003, 0.002s
  [4,1,2,3],s-4: 00-004, 0.002s
  [5,1,2,3],s-5: 00-005, 0.002s
  section expected
  [5,4,1,2,3],s-5: 00-000, 0.007s
  section final
  [5,1,2,3],s-5: 00-000, 0.007s
```
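A hedged C++ sketch of the linked-list race in the chart: both threads push to the head at once, the read-modify-write of `head` interleaves, and one node can be lost; wrapping `push_front` in a `std::mutex`/`std::lock_guard` removes the race.

```cpp
#include <iostream>
#include <thread>

struct Node { int value; Node *next; };

Node *head = new Node{1, new Node{2, new Node{3, nullptr}}};

void push_front(int value) {
    Node *n = new Node{value, nullptr};   // op1: allocate the new node
    n->next = head;                       // op2: read the current head
    head = n;                             // op3: publish the new head (unsynchronised)
}

int main() {
    std::thread t1(push_front, 4);
    std::thread t2(push_front, 5);
    t1.join();
    t2.join();
    for (Node *p = head; p; p = p->next) std::cout << p->value << " ";
    std::cout << "\n";   // expected 5 4 1 2 3, but 5 1 2 3 (or 4 1 2 3) can appear
    return 0;
}
```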
About
Parallel Computing and Distributed Computing with C++ threads, Python threads+asyncio+multiprocessing and Spark, and Cuda.