So, you want to run your code in parallel so that it finishes faster and you get better performance out of it.

Python provides two ways to achieve this:

  1. Multithreading
  2. Multiprocessing

The basic difference between a thread and a process is that a process has a completely isolated memory space of its own. By default, no two processes share memory (whether heap or stack). Threads, on the other hand, share the heap memory of their process, but each thread has its own stack. So chances are higher that threads try to access the same variables/data at the same time. This is where the operating system concepts of locks and semaphores come in.
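
To make the shared-heap point concrete, here is a minimal sketch in which several threads update one shared variable and a lock keeps the updates from interleaving. The names counter, counter_lock, add_many and run are made up for this illustration.

```python
import threading

counter = 0                      # shared by every thread in this process
counter_lock = threading.Lock()  # guards every update to `counter`

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread can hold the lock at once
            counter += 1

def run(num_threads=4, times=10_000):
    """Start the threads, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run())  # 4 threads x 10,000 increments = 40000
```

Without the lock, counter += 1 is a read-modify-write sequence that two threads can interleave, so the final total may come up short.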

Python GIL

Many other languages, like Java, have great support for multithreading and provide lock mechanisms. But in Python (specifically CPython, the standard interpreter) there is the GIL (Global Interpreter Lock), which allows only one thread at a time to execute Python bytecode. So even on a multi-core CPU, you will not get a real speedup from multithreading for CPU-bound code. In simpler terms, at any point in time the interpreter is held by a single thread. Threads still help when they spend most of their time waiting, because the GIL is released during blocking I/O.
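
One rough way to observe this is to time the same CPU-bound loop run back to back and then in two threads. This is only a sketch: count_down, time_sequential and time_threaded are illustrative names, and the actual timings depend on your machine and Python build. On a standard CPython build with the GIL enabled, the threaded run is typically no faster than the sequential one.

```python
import time
from threading import Thread

def count_down(n):
    """Pure-Python busy loop: CPU-bound, no I/O and no sleeping."""
    while n > 0:
        n -= 1

def time_sequential(n, jobs=2):
    """Run the loop `jobs` times one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start

def time_threaded(n, jobs=2):
    """Run the loop in `jobs` threads at once; return elapsed seconds."""
    start = time.perf_counter()
    threads = [Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 5_000_000
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s")
```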

Multithreading and Multiprocessing

So the Python developers provided another way to get parallelism: multiprocessing. It lets you create multiple processes from your program and gives you behavior similar to multithreading. Since multiple processes are running, each one has its own interpreter and therefore its own GIL.
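
Because each process is a separate interpreter with its own memory, a change made in a child process does not show up in the parent. A small sketch of that, where state and mutate are hypothetical names chosen for the example:

```python
from multiprocessing import Process

state = {"value": 0}   # module-level data; each process works on its own copy

def mutate():
    """Runs in the child process and changes only the child's copy."""
    state["value"] = 42
    print("child sees: ", state["value"])

if __name__ == "__main__":
    p = Process(target=mutate)
    p.start()
    p.join()
    # The parent's dictionary is untouched by what the child did.
    print("parent sees:", state["value"])
```

The child prints 42, while the parent still prints 0.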

Multithreading OR Multiprocessing - Which one to choose

This is a million-dollar question, and it is quite easy to answer.

  • If your program is CPU bound, then you should go for multiprocessing.
  • If your program is IO bound, then you should go for multithreading. IO-bound work includes waiting for a file transfer, making HTTP calls and waiting for the result, etc.
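
As a sketch of the IO-bound case, the example below uses time.sleep as a stand-in for waiting on a network or disk. fake_download and run_io_tasks are hypothetical names, not part of any library. Five waits of 0.2 seconds each would take about 1 second one after another, but in a thread pool they overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(delay):
    """Hypothetical IO-bound task: sleeping stands in for waiting on a network."""
    time.sleep(delay)
    return delay

def run_io_tasks(delays, max_workers=5):
    """Run the waits in a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_download, delays))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run_io_tasks([0.2] * 5)
    # Elapsed time should be close to 0.2s rather than 1.0s.
    print(results, f"{elapsed:.2f}s")
```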

Pros and Cons

Multiprocessing Pros

  • Each process has its own, fully separate heap space
  • Code is simple and easy to understand
  • Since each process has its own GIL, the GIL is not a bottleneck
  • Can take advantage of multiple CPU cores
  • You can see each process with the ps command, and you can kill those processes too

Multiprocessing Cons

  • Creating a process is heavyweight and requires more resources than creating a thread
  • This results in higher memory consumption for your program as a whole
  • Communicating between your main process and the child processes is usually a bit tedious and complex
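
To give a feel for the communication overhead mentioned in the last point, here is a minimal sketch that sends results from child processes back to the parent through a multiprocessing.Queue. The worker function and the data it sends are made up for the illustration.

```python
from multiprocessing import Process, Queue

def worker(queue, n):
    """Compute in the child and send the result back through the queue."""
    queue.put((n, n * n))

if __name__ == "__main__":
    queue = Queue()
    procs = [Process(target=worker, args=(queue, n)) for n in range(3)]
    for p in procs:
        p.start()
    # Read one result per child before joining, so no child is left
    # blocked while trying to flush its data into the queue.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    print(sorted(results.items()))
```

Everything that goes through the queue is pickled and copied between processes, which is part of why this is heavier than sharing a variable between threads.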

Multithreading Pros

  • Low memory usage, since everything is in the same process
  • Every thread shares the heap memory and hence can access the state of the program. Note: This can be a disadvantage too
  • Great for IO-bound processing

Multithreading Cons

  • Because of the GIL, threads cannot run Python code in parallel on multiple cores
  • Novice programmers find it hard to write thread-safe code
  • Threads cannot be killed from outside
  • Synchronization issues and deadlocks can happen
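
Since a thread cannot be killed from outside, a common workaround is to ask it to stop and let it exit on its own. The sketch below does this with a threading.Event. The worker and run_for functions are hypothetical, and this is one possible pattern rather than the only one.

```python
import threading
import time

def worker(stop_event, ticks):
    """Loop until asked to stop; record each iteration in `ticks`."""
    while not stop_event.is_set():
        ticks.append(time.monotonic())
        stop_event.wait(0.05)   # sleep, but wake up early if the event is set

def run_for(seconds):
    """Let the worker run for `seconds`, then request a stop and wait for it."""
    stop_event = threading.Event()
    ticks = []
    t = threading.Thread(target=worker, args=(stop_event, ticks))
    t.start()
    time.sleep(seconds)
    stop_event.set()            # a request to stop, not a forced kill
    t.join()
    return len(ticks), t.is_alive()

if __name__ == "__main__":
    print(run_for(0.5))
```

The thread only stops because its own loop checks the event. A thread stuck in a long blocking call would not notice the request until that call returns.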

Multithreading Code

from threading import Thread
from time import sleep

def foo(n):
    for i in range(n):
        print('foo ', i)
        sleep(1)

def bar(n):
    for i in range(n):
        print('bar ', i)
        sleep(1)

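# Create one thread per function; args must be a tuple.
t1 = Thread(target=foo, args=(3,))
t2 = Thread(target=bar, args=(5,))

# start() returns immediately, so both threads run concurrently.
t1.start()
t2.start()

# join() blocks until the corresponding thread has finished.
t1.join()
print('foo finish')
t2.join()
print('bar finish')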

Output

foo  0
bar  0
foo  1
bar  1
foo  2
bar  2
bar  3
foo finish
bar  4
bar finish

Note: If you do not call join(), you will see the foo finish and bar finish statements printed early, before the threads are done. The join() method is used to wait for a thread to finish processing its task.

Problem with Many threads

The above code is simple when you are dealing with 1 or 2 threads. But what if you are working with 10 or 20 threads? You would probably keep the thread instances in a list and then call join() on each one individually. That can become complex.
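
A sketch of that manual approach, with a hypothetical task function that stores its result in a shared dictionary:

```python
from threading import Thread
from time import sleep

def task(worker_id, results):
    """Pretend to do some work, then store a result under this worker's id."""
    sleep(0.1)
    results[worker_id] = worker_id * worker_id

def run_many(num_threads=10):
    """Start `num_threads` threads, keep them in a list, and join each one."""
    results = {}
    threads = []
    for worker_id in range(num_threads):
        t = Thread(target=task, args=(worker_id, results))
        t.start()
        threads.append(t)
    for t in threads:   # every thread has to be joined individually
        t.join()
    return results

if __name__ == "__main__":
    print(run_many())
```

You have to do the bookkeeping yourself: creating the threads, storing them, joining them and collecting the results.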

Thread Pool - ThreadPoolExecutor

For reference visit: https://docs.python.org/3/library/concurrent.futures.html

Python provides a manager-style class, ThreadPoolExecutor, which manages a pool of n threads. You only have to manage a single instance of the pool executor.


import time
from concurrent.futures import ThreadPoolExecutor as Executor

def square(a):
    time.sleep(1)
    return a*a

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

with Executor(max_workers=8) as workers:
    res = workers.map(square, data)
    print(list(res))

print('Finish')

Output

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Finish
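
map() returns results in input order. If you would rather handle each result as soon as it is ready, concurrent.futures also provides submit() and as_completed(). The sketch below is one way to use them; squares_as_completed is an illustrative name and the sleep is shortened to keep the example quick.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(a):
    time.sleep(0.1)
    return a * a

def squares_as_completed(data, max_workers=8):
    """Submit one task per item and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input so we know which result is which.
        futures = {pool.submit(square, a): a for a in data}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results

if __name__ == "__main__":
    print(squares_as_completed([1, 2, 3, 4, 5]))
```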

Multiprocessing

from multiprocessing import Process
from time import sleep

def foo(n):
    for i in range(n):
        print('foo ', i)
        sleep(1)

def bar(n):
    for i in range(n):
        print('bar ', i)
        sleep(1)

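if __name__ == '__main__':
    # This guard is required when processes are started with the spawn
    # method (the default on Windows and macOS): each child re-imports
    # this module and must not start processes of its own.
    p1 = Process(target=foo, args=(3,))
    p2 = Process(target=bar, args=(5,))

    p1.start()
    p2.start()

    p1.join()
    print('foo finish')
    p2.join()
    print('bar finish')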

This code is almost identical to the thread code. The only difference is that we have used the multiprocessing module.

ProcessPoolExecutor

In the ThreadPoolExecutor code above, just replace ThreadPoolExecutor with ProcessPoolExecutor.

import time
from concurrent.futures import ProcessPoolExecutor as Executor

def square(a):
    time.sleep(1)
    return a*a

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

if __name__ == '__main__':
    # Guard the pool so worker processes can import this module safely.
    with Executor(max_workers=8) as workers:
        res = workers.map(square, data)
        print(list(res))

    print("Done")

The output will be the same as with the thread pool, except that the last line is Done. The difference is that multiple processes are launched instead of threads.
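
If you want to see those processes for yourself, one option is to have each task report the id of the process it ran in. This is a sketch: square_with_pid is a hypothetical helper, and the process ids you get will differ from run to run.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def square_with_pid(a):
    """Return the square together with the id of the process that computed it."""
    return a * a, os.getpid()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as workers:
        results = list(workers.map(square_with_pid, range(1, 9)))
    print("parent pid:", os.getpid())
    for value, pid in results:
        print(value, "computed in process", pid)
```

The worker pids should differ from the parent pid, and several different worker pids will usually appear.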