I used to work with ordinary sync programming, and the architecture implied that if you need anything to run in parallel, you queue it in a message system and spawn an extra process on the same or another VM to consume it: quite mild resource usage spikes, especially when you limit the number of such processes per VM.
I have also seen asynchronous programming, where you parallelize (at least potentially) your actions/work within the process, which in turn leads to less controllable bursts of resource usage (CPU, memory).
Am I wrong? Are there any criteria/guidelines for preferring the 2nd over the 1st?
The question is about async programming on the backend side.
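For concreteness, here is a rough sketch of the two styles I mean, using only the Python standard library as a stand-in (in reality the queue would be a message broker, and the job bodies are placeholders):

```python
import asyncio
import multiprocessing as mp

def worker(queue):
    # Style 1: a dedicated process consumes jobs from a queue
    # (in production this would usually be a message broker, not mp.Queue).
    while True:
        job = queue.get()
        if job is None:            # sentinel to shut the worker down
            break
        print("worker handled", job)

async def handle(job):
    # Style 2: concurrency inside one process with async tasks.
    await asyncio.sleep(0.1)       # placeholder for non-blocking I/O
    return "async handled " + job

async def handle_all(jobs):
    return await asyncio.gather(*(handle(j) for j in jobs))

if __name__ == "__main__":
    # Style 1: queue + separate worker process
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    for job in ("a", "b", "c"):
        q.put(job)
    q.put(None)
    p.join()

    # Style 2: many in-process tasks run concurrently
    print(asyncio.run(handle_all(("a", "b", "c"))))
```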
The textbook answer is that threads share the memory context of the parent process. This makes them faster to spawn and lets them share memory and exchange information very quickly.
That also means that a failure or a memory leak in a single thread affects, and persists for, all of them. Depending on your code, you could have threads fighting over the memory bus and the CPU caches, plus a lot of context switches, which can lead to significant slowdown.
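A minimal sketch of what that sharing looks like in practice, assuming CPython's threading module (the counter and loop are just illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:                 # without the lock, concurrent updates can be lost
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)                     # 400000 -- every thread wrote to the same variable
```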
Processes, on the other hand, are given a new, fresh, private memory space. This means that they take longer to start, but they are a lot more isolated. Linux also lets you control processes much more, limiting memory and CPU for example, which can make them a lot easier to debug.
On the other hand, their inter-process communication (IPC) is slower and may require shared memory, which can be complicated to set up. That said, because they tend to share minimal data, they also tend to lead to fewer synchronization errors (mutexes, locks, etc.).
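A minimal sketch of that isolation, using Python's multiprocessing as an example (the dictionary and values are made up for illustration):

```python
import multiprocessing as mp

data = {"value": 0}

def child(q):
    data["value"] = 42             # modifies the child's private copy only
    q.put(data["value"])           # explicit IPC to hand the result back

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=child, args=(q,))
    p.start()
    result = q.get()               # 42, received through the queue
    p.join()
    print(data["value"])           # still 0 in the parent
    print(result)
```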
Outside of that, a couple of things that are not usually mentioned:
Processes are your only option for distributed, multi-node deployment.
Processes can give you a lot more control and better utilization on NUMA CPUs, which are a lot more common in servers (see the sketch right after this list).
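As an example of that control, here is a hedged, Linux-only sketch of pinning each worker process to its own set of cores (the core ranges are hypothetical; check your actual topology, e.g. with lscpu, before pinning for real):

```python
import multiprocessing as mp
import os

def worker(cores):
    os.sched_setaffinity(0, cores)             # pin this process to the given cores
    print(os.getpid(), "pinned to", os.sched_getaffinity(0))
    # ... do the actual work here ...

if __name__ == "__main__":
    # Hypothetical split: cores 0-3 on NUMA node 0, cores 4-7 on node 1.
    workers = [mp.Process(target=worker, args=(cores,))
               for cores in ({0, 1, 2, 3}, {4, 5, 6, 7})]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```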
In general though, if you have to parallelize a section of your code (a function, a loop, etc.), you use threads. If you need to parallelize a problem space, where every worker does the same work on different chunks of data, you tend to use processes.
That is just a rule of thumb; people can disagree with it.
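A minimal sketch of that rule of thumb in Python (the URLs and the number-crunching function are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import urllib.request

URLS = ["https://example.com", "https://example.org"]     # placeholder URLs

def fetch(url):
    # I/O-bound: the thread mostly waits on the network.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return len(resp.read())

def crunch(chunk):
    # CPU-bound placeholder: the same work applied to different chunks of data.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    # Parallelize a section of code: overlap the network waits with threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        sizes = list(pool.map(fetch, URLS))

    # Parallelize a problem space: same function, different chunks, separate processes.
    chunks = [range(0, 1_000_000), range(1_000_000, 2_000_000)]
    with ProcessPoolExecutor(max_workers=2) as pool:
        totals = list(pool.map(crunch, chunks))

    print(sizes, totals)
```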
P.S.: if you are looking at Python stuff, this changes a bit, as Python threads are not really parallel under most Python implementations because of something called the GIL (global interpreter lock).
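A quick, rough way to see that effect on CPython (timings will vary by machine and interpreter; the workload below is just a made-up CPU-bound loop):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n):
    # made-up CPU-bound work
    total = 0
    for i in range(n):
        total += i
    return total

def timed(executor_cls):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(busy, [5_000_000] * 4))
    return time.perf_counter() - start

if __name__ == "__main__":
    print("threads:  ", timed(ThreadPoolExecutor))    # roughly serial on CPython because of the GIL
    print("processes:", timed(ProcessPoolExecutor))   # scales with cores (minus process startup cost)
```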