Pythongasm - Concurrency and Parallelism in Python

Copied to clipboard

Concurrency and Parallelism in Python

Jun 16, 2021 • 10 minutes • 2395 views

Introduction

Most of us have come across terms like multithreading, parallel processing, multiprocessing, concurrency, etc., though each of these terms has its own meaning. In a broader sense, we need these because we want to avoid some kind of latency (or have an illusion of doing so) in the execution of regular programs.

To do this, we try to write code that doesn’t necessarily runs in order (non-sequential) and all this boils down to two different concepts — concurrency and parallelism.

Concurrency

I came across concurrency when I was trying to download ~5000 images from the web. I had collected image URLs from Flickr, and these images had to be passed on to a team doing annotation (labelling).

This is how a sequential program to download images would look like:

import requests

def download_image(URL, filename):
    image = requests.get(URL)

    with open(f'{filename}.png','wb') as f:
        f.write(image.content)


flickr_URLs = [
    'https://live.staticflickr.com/6022/5941812700_0f634b136e_b.jpg',
    'https://live.staticflickr.com/3379/3492581209_485d2bfafc_b.jpg',
    'https://live.staticflickr.com/7309/27729662905_e896a3f604_b.jpg',
    'https://live.staticflickr.com/8479/8238430093_eb19b654e0_b.jpg',
    'https://live.staticflickr.com/5064/5618934898_659bc060cd_b.jpg',
    'https://live.staticflickr.com/3885/14877957549_ccb7e55494_b.jpg',
    'https://live.staticflickr.com/5473/11720191564_76f3f56f12_b.jpg',
    'https://live.staticflickr.com/2837/13546560344_835fc79871_b.jpg',
    'https://live.staticflickr.com/140/389494506_55bcdc3664_b.jpg',
    'https://live.staticflickr.com/5597/15253681909_0cc91c77d5_b.jpg',
    'https://live.staticflickr.com/1552/24085836504_3d850f03e7_b.jpg',
    'https://live.staticflickr.com/7787/26655921703_ee95e3be8e_b.jpg',
    'https://live.staticflickr.com/423/32290997650_416303457b_b.jpg',
    'https://live.staticflickr.com/4580/37683207504_053315d23f_b.jpg',
    'https://live.staticflickr.com/3225/2454495359_92828d8542_b.jpg',
    'https://live.staticflickr.com/7018/6640810853_22634c6667_b.jpg',
    'https://live.staticflickr.com/7681/17285538029_363c8760ea_b.jpg',
    'https://live.staticflickr.com/7630/16622584999_0654c8d564_b.jpg',
    'https://live.staticflickr.com/6160/6207543047_da2c66c2f6_b.jpg',
    'https://live.staticflickr.com/2921/14251116921_a97d7a46ce_b.jpg'
]

for url in flickr_URLs:
    filename = url.split(‘/')[-1]
    download_image(url, filename)

It does the job but we spent most of the time waiting for the source URLs to respond. When we scale this program to 5,000 images this “wait time” becomes humongous.

The above program sends a request to a URL, waits until the image loads (gets response from server), writes it to disk, and only then sends a new request until the list exhausts.

However, rather than waiting for the first URL to load, shouldn’t we send a new request in the meantime? Once we receive some response from a previously sent request, we can write the corresponding image to the disk. By doing this we are not letting the latency block our main program.

We can achieve this by starting a new “thread”, along with the main thread using built-in Python module called threading.

Here’s how you create a thread:

thread = threading.Thread(download_image, args=[url, filename])
thread.start()

creating-threads

So that’s one thread. You need to create many threads, so let’s loop over. Here’s what’s the threaded version of this program would look like:

import threading
import requests

def download_image(URL, filename):
    ...

flickr_URLs = [...]

threads = []
for url in flickr_URLs:
    filename = url.split('/')[-1]
    thread = threading.Thread(target=download_image, args=[url, filename])
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

We call .join() on a thread to join it to the main thread — telling Python to wait for a thread to terminate before moving further down in the file.

In this program, we created 20 threads. But how many threads are too many? If we have 5,000 URLs, should we start 5000 threads?

First of all, you should know that you can start multiple threads but they won’t be running simultaneously. It’s just that while one thread is waiting for some I/O operation, another one starts working in the meantime. juggler It might look like the juggler is juggling two balls but in reality, at any given point he only has one ball in his hand. Source: Library of Juggling

Since your OS continuously switches thread to thread, deciding which one should run at a given time, managing hundreds of threads will eat up a big chunk of resources.

So a simple workaround is this:

MAX_THREADS = 10
threads = []
for url in flickr_URLs:
    if len(threads)>MAX_THREADS:
        for thread in threads:
            thread.join()

        threads = []

    filename = url.split('/')[-1]   
    thread = threading.Thread(target=download_image, args=[url, filename])
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

We’re only starting MAX_THREADS at once, waiting for them to terminate (we're doing that by calling.join()), and then starting the next MAX_THREADS threads.

However, there’s a modern way of doing this which we will see later in this article.

Some implementations of Python like PyPy, IronPython can run multiple threads simultaneously but this isn’t the case with the default implementation, that’s CPython.

Notice how we didn’t need CPU power to speed up this task of downloading images. These are I/O bound tasks — it’s like cooking food, for example. 
If you’re preparing a dish, and at some point you need to preheat your microwave oven, you won’t be looking for more manpower to speed up your cooking. You’re better off utilising the preheating time into…maybe chopping down some veggies.

juggler Python cutting down veggies while the oven finishes up preheating

That’s concurrency. But what if a task is bottlenecked by the CPU, rather than networking and IO? That brings us to parallelism.

Parallelism

Now suppose you’re done with cooking, and it’s time to do the dishes. Can you apply the concept of concurrency here? Pick the knife and start cleaning it, switch over to the bowl you used for pouring milk, start washing it, and then move on to the plates, then back to the knife you were washing some time ago.

At best, it won’t make any difference to the execution time of this task. And in most of the cases, doing this will make the process slower as you are taking some time in switching back and forth to different utensils (yes, multithreading can also slow down tasks).

If you’re looking to expedite this task, you need manpower. Some friend who washes the bowl while you cleanse knives and forks — more friends, the better.

This is how a CPU heavy task looks like, where you can use multiple CPU cores to speed up the task. Let’s try to apply this using Python.

We’re trying to find out product of prime numbers upto a number n using a function productOfPrimes, such that productOfPrimes(10) -> 210 (product of 2, 3, 5, 7)

import time

def productOfPrimes(n):

    ALL_PRIMES_UPTO_N = []
    
    for i in range(2, n):

        PRIME = True
        for j in range(2, int(i/2)+1):
            if i%j==0:
                PRIME = False
                break

        if PRIME:
            ALL_PRIMES_UPTO_N.append(i)


    print(f"{len(ALL_PRIMES_UPTO_N)} PRIME NUMBERS FOUND")

    product = 1
    for prime in ALL_PRIMES_UPTO_N:
        product = product*prime

    return product

init_time = time.time()
LIMITS = [50328,22756,39371,44832]

for lim in LIMITS: 
    productOfPrimes(lim)
    
fin_time = time.time()
print(f"TIME TAKEN: {fin_time-init_time}")

Nested loops, lots of division — pretty heavy on the CPU. The execution took ~10 seconds on 1.1 GHz quad core Intel i5 processor. However, out of 4 cores, we just used one.

To manage multiple cores and processes, we use this module called multiprocessing in python.

Let’s see if multiprocessing improves this result. Syntactically, it’s quite similar to how we started threads:

import multiprocessing

def productOfPrimes():
    …

if __name__ == "__main__":  
    processes = []
    LIMITS = [50328,22756,39371,44832] 

    for lim in LIMITS:
        process = multiprocessing.Process(target=productOfPrimes, args=[lim])
        process.start()
        processes.append(process)

    for process in processes:
        process.join()

This does the same thing in a little over 4 seconds. productOfPrimes was simulatenously executed on multiple cores available in the CPU.

Now let's talk a bit about numbers. How many processes can a quad-core CPU can execute simultaneously? Shouldn't it be only 4? Yes, but that doesn't mean the OS can't hold more than 4 processes in memory. There's a difference in executing processes and just holding them in memory.

Run this bash command to see the number of processes running on your system:

ps -e | wc -l

So if you start 20 processes from Python, it won't throw an error saying you don't have enough cores. The OS will just manage these 20 processes over whatever cores are available.

Pool of Threads and Processes

We’ve seen how we can implement the concepts of concurrency and parallelism using threading and multiprocessing modules. However, we’ve a sexier, more Pythonic way of doing this using the concurrent.futures module (ships with Python).

import concurrent.futures

def download_image(URL, filename):
    ...

flickr_URLs = [...]
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(download_image, flickr_URLs)

for result in results:
        print(result)

We can do something similar using ProcessPoolExecutor:

import concurrent.futures

def productOfPrimes(n):
    ...

LIMITS = [...]
    
if __name__ == "__main__": 
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(productOfPrimes, LIMITS)

for result in results:
    print(result)

concurrent.futures

Threads, Processes, OS

Technically, threads run inside a process. When we create 4 threads, they share the same process, thus the same memory, and a lot of other OS level stuff (process control block, address space, etc.). The same is not true for processes. Each process has memory space of its own and run independently.

On the other hand, processes can run simultaneously — unlike in multithreading where the OS just keeps switching over and over to manage latency inside the same process.

Conclusion

We’ve seen how we can implement concurrency and parallelism in Python which are fundamentally very different, and have use cases of their own. There are more things to talk about like problems with threading, GIL, asynchronicity, etc.

Further Readings and Attributions


Permanent link to this article Read Next
pygs.me/007 Introduction to Scrapy: Web Scraping in Python

Comments

styling code quotes links  

Adam

Excellent article. Great job.

Shiva

Great. Short and easy to understand

Joseph

In this code on the "Pool of Threads and Processes" section, on the line that executes the download_image function:

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(download_image, flickr_URLs)

Where does the download_image function get the filename variable from in this instance?

[Author]

@Joseph, thanks for pointing out. Since we are taking the filename from the URL itself, the simpler way would be tweaking our function like this:
def download_image(URL):
    filename = url.split(/')[-1]
    ...

However, if you really want to pass multiple arguments from .map(), have a look at this thread.