I used to work with ordinary sync programming, and the architecture implied that if you need anything to run in parallel, you queue it in a message system and spawn an extra process on the same or another VM to consume it: quite mild resource usage spikes, especially when you limit the number of such processes per VM.
I have also seen asynchronous programming, where you parallelize (at least potentially) your actions/work within the process, which in turn leads to less controllable bursts of resource usage (CPU, memory).
Am I wrong? Are there any criteria/guidelines for preferring the 2nd over the 1st?
The question is about async programming on the backend side.
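For concreteness, here is a rough sketch of the two styles I mean, using only the Python standard library as a stand-in (in reality the queue would be a message broker, and the job bodies are placeholders):

```python
import asyncio
import multiprocessing as mp

def worker(queue):
    # Style 1: a dedicated process consumes jobs from a queue
    # (in production this would usually be a message broker, not mp.Queue).
    while True:
        job = queue.get()
        if job is None:            # sentinel to shut the worker down
            break
        print("worker handled", job)

async def handle(job):
    # Style 2: concurrency inside one process with async tasks.
    await asyncio.sleep(0.1)       # placeholder for non-blocking I/O
    return "async handled " + job

async def handle_all(jobs):
    return await asyncio.gather(*(handle(j) for j in jobs))

if __name__ == "__main__":
    # Style 1: queue + separate worker process
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    for job in ("a", "b", "c"):
        q.put(job)
    q.put(None)
    p.join()

    # Style 2: many in-process tasks run concurrently
    print(asyncio.run(handle_all(("a", "b", "c"))))
```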
The textbook answer is that threads share the memory context of the parent process. This makes them faster to spawn and lets them share memory and exchange information very quickly.
That also means that a failure or a memory leak in a single thread affects, and persists for, all of them. Depending on your code, you could have threads fighting over the memory bus and the CPU caches, plus a lot of context switches, which can lead to significant slowdown.
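A minimal sketch of what that sharing looks like in practice, assuming CPython's threading module (the counter and loop are just illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:                 # without the lock, concurrent updates can be lost
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)                     # 400000 -- every thread wrote to the same variable
```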
Processes, on the other hand, are given a new, fresh, private memory space. This means that they take longer to start, but they are a lot more isolated. Linux also lets you control processes much more, limiting memory and CPU for example, which can make them a lot easier to debug.
On the other hand, their inter-process communication (IPC) is slower and may require shared memory, which can be complicated to set up. That said, because they tend to share minimal data, they also tend to lead to fewer synchronization errors (mutexes, locks, etc.).
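A minimal sketch of that isolation, using Python's multiprocessing as an example (the dictionary and values are made up for illustration):

```python
import multiprocessing as mp

data = {"value": 0}

def child(q):
    data["value"] = 42             # modifies the child's private copy only
    q.put(data["value"])           # explicit IPC to hand the result back

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=child, args=(q,))
    p.start()
    result = q.get()               # 42, received through the queue
    p.join()
    print(data["value"])           # still 0 in the parent
    print(result)
```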
Outside of that, a couple of things that are not usually mentioned:
Processes are your only option for distributed, multi-node deployment.
Processes can give you a lot more control and better utilization on NUMA CPUs, which are a lot more common in servers (see the sketch right after this list).
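As an example of that control, here is a hedged, Linux-only sketch of pinning each worker process to its own set of cores (the core ranges are hypothetical; check your actual topology, e.g. with lscpu, before pinning for real):

```python
import multiprocessing as mp
import os

def worker(cores):
    os.sched_setaffinity(0, cores)             # pin this process to the given cores
    print(os.getpid(), "pinned to", os.sched_getaffinity(0))
    # ... do the actual work here ...

if __name__ == "__main__":
    # Hypothetical split: cores 0-3 on NUMA node 0, cores 4-7 on node 1.
    workers = [mp.Process(target=worker, args=(cores,))
               for cores in ({0, 1, 2, 3}, {4, 5, 6, 7})]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```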
In general though, if you have to parallelize a section of your code (a function, a loop, etc.), you use threads. If you need to parallelize a problem space, where every worker does the same work on different chunks of data, you tend to use processes.
That is just a rule of thumb; people can disagree with it.
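A minimal sketch of that rule of thumb in Python (the URLs and the number-crunching function are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import urllib.request

URLS = ["https://example.com", "https://example.org"]     # placeholder URLs

def fetch(url):
    # I/O-bound: the thread mostly waits on the network.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return len(resp.read())

def crunch(chunk):
    # CPU-bound placeholder: the same work applied to different chunks of data.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    # Parallelize a section of code: overlap the network waits with threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        sizes = list(pool.map(fetch, URLS))

    # Parallelize a problem space: same function, different chunks, separate processes.
    chunks = [range(0, 1_000_000), range(1_000_000, 2_000_000)]
    with ProcessPoolExecutor(max_workers=2) as pool:
        totals = list(pool.map(crunch, chunks))

    print(sizes, totals)
```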
P.S.: if you are looking at Python stuff, this changes a bit, as Python threads are not really parallel under most Python implementations because of something called the GIL (global interpreter lock).
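A quick, rough way to see that effect on CPython (timings will vary by machine and interpreter; the workload below is just a made-up CPU-bound loop):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n):
    # made-up CPU-bound work
    total = 0
    for i in range(n):
        total += i
    return total

def timed(executor_cls):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(busy, [5_000_000] * 4))
    return time.perf_counter() - start

if __name__ == "__main__":
    print("threads:  ", timed(ThreadPoolExecutor))    # roughly serial on CPython because of the GIL
    print("processes:", timed(ProcessPoolExecutor))   # scales with cores (minus process startup cost)
```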