Class 5: Multithreading and Multiprocessing
In this class, we will explore how to use multithreading and multiprocessing in Python to achieve parallelism and improve performance in our programs.
Introduction to Concurrency in Python:
Concurrency refers to the ability of a program to make progress on multiple tasks over overlapping periods of time, rather than finishing each task before starting the next. Parallelism is the stricter case in which tasks run at the same instant, for example on separate CPU cores. Python's standard library provides several modules for concurrency, including threading and multiprocessing. These modules are useful for speeding up time-consuming operations, whether I/O-bound or CPU-bound, although the two kinds of work call for different tools.
Using Threads to Achieve Parallelism:
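Threads are a lightweight way to achieve concurrency in Python. They run within the same process and share its memory. In the standard CPython interpreter, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so threads do not speed up pure-Python CPU-bound code. They do improve the performance of I/O-bound tasks, such as downloading data from the internet, because a thread that is waiting for an I/O operation releases the GIL and lets other threads run.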
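As a minimal sketch of the I/O-bound case, the example below uses ThreadPoolExecutor from the standard concurrent.futures module. The fake_download and download_all functions are hypothetical names introduced here for illustration, and time.sleep stands in for a real network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id):
    """Hypothetical stand-in for a network request: sleep instead of real I/O."""
    time.sleep(0.2)  # the thread is idle here, so other threads can run
    return f"item-{item_id}"


def download_all(item_ids, max_workers=4):
    """Run fake_download for every id on a small pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs
        return list(pool.map(fake_download, item_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = download_all(range(4))
    elapsed = time.perf_counter() - start
    print(results)
    print(f"4 simulated downloads took {elapsed:.2f} s")
```

With four worker threads the four simulated downloads overlap, so the elapsed time should be close to one 0.2-second wait rather than four.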
Using Processes to Achieve Parallelism:
Processes are another way to achieve concurrency in Python. Unlike threads, each process has its own memory space and its own Python interpreter, so processes are not limited by a shared GIL. Because of this, processes are better suited for CPU-bound tasks, such as running machine learning algorithms, that require a lot of computation power. By using processes, we can distribute the workload across multiple CPU cores and achieve faster processing times. The trade-off is that processes are more expensive to start than threads, and data passed between them must be serialized.
Practice:
Create a Program that Uses Threads or Processes to Perform a Long-Running Computation:
For practice, we will create a program that performs a long-running computation using threads or processes. This program will demonstrate how to use multithreading or multiprocessing in Python to improve performance.
To get started, we first need to identify a task that can benefit from parallel processing. For example, we could create a program that calculates the sum of a large list of numbers. We can then split the list into smaller chunks and process each chunk in parallel using threads or processes.
We can use the threading module to create and manage threads in Python. Similarly, we can use the multiprocessing module to create and manage processes in Python. We can then use these modules to perform the long-running computation in parallel.
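The chunked-sum idea described above could be sketched as follows, assuming multiprocessing.Pool is used to distribute the chunks. The helper names split_into_chunks and parallel_sum are hypothetical and introduced only for this example.

```python
from multiprocessing import Pool


def split_into_chunks(numbers, n_chunks):
    """Split a list into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(numbers), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element
        end = start + size + (1 if i < extra else 0)
        chunks.append(numbers[start:end])
        start = end
    return chunks


def parallel_sum(numbers, n_workers=4):
    """Sum each chunk in a separate worker process, then combine the results."""
    chunks = split_into_chunks(numbers, n_workers)
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1_000_000))
    total = parallel_sum(data)
    print(total == sum(data))
```

For a task as cheap as addition, the cost of sending each chunk to a worker process may outweigh the gain, so this sketch illustrates the pattern rather than a guaranteed speedup. The pattern pays off when the work done per chunk is expensive.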
Once we have implemented the program, we can measure its performance using tools like the time module or profiling tools like cProfile. We can compare the performance of the program with and without multithreading or multiprocessing to see the performance improvement achieved by parallel processing.
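One possible way to make this comparison is with time.perf_counter, as in the sketch below. The functions wait_task, time_sequential, and time_threaded are hypothetical helpers written for this example, and the task is a simulated wait rather than real work.

```python
import threading
import time


def wait_task(duration=0.2):
    """Simulated I/O-bound task: just wait for `duration` seconds."""
    time.sleep(duration)


def time_sequential(n_tasks):
    """Run the tasks one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n_tasks):
        wait_task()
    return time.perf_counter() - start


def time_threaded(n_tasks):
    """Run each task in its own thread and return the elapsed seconds."""
    threads = [threading.Thread(target=wait_task) for _ in range(n_tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {time_sequential(4):.2f} s")
    print(f"threaded:   {time_threaded(4):.2f} s")
```

The sequential time should grow with the number of tasks, while the threaded time should stay close to the duration of a single task. Exact figures vary between machines and runs.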
Here’s an example of a program that performs a long-running computation using threads:
import threading
import time


class ComputeThread(threading.Thread):
    def __init__(self, name, first, last):
        super().__init__(name=name)
        # Do not name these attributes "start" or "end":
        # "self.start" would shadow the Thread.start() method.
        self.first = first
        self.last = last

    def run(self):
        print(f"{self.name} starting...")
        for i in range(self.first, self.last):
            time.sleep(0.1)  # simulate one step of slow work
        print(f"{self.name} finished.")


def main():
    # Define the computation ranges
    range1 = (1, 10)
    range2 = (11, 20)
    # Create the threads
    thread1 = ComputeThread("Thread 1", *range1)
    thread2 = ComputeThread("Thread 2", *range2)
    # Start the threads
    thread1.start()
    thread2.start()
    # Wait for the threads to finish
    thread1.join()
    thread2.join()
    print("Done.")


if __name__ == "__main__":
    main()
This program creates two threads that each take about 0.9 seconds to complete (nine steps of 0.1 seconds). The work is simulated with time.sleep, which releases the GIL while it waits, so the two threads wait concurrently. As a result, the total time required is about 0.9 seconds instead of the roughly 1.8 seconds needed to run them one after the other.
To adapt this program to use processes instead of threads, you can replace the ComputeThread class with a function that performs the computation and use the Process class from the multiprocessing module to create and start the processes.
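A minimal sketch of that adaptation might look like the following. The function name compute and its parameters are assumptions made for this example, and the work is still simulated with time.sleep.

```python
import time
from multiprocessing import Process


def compute(name, first, last):
    """Simulate slow work over a range; returns the number of steps taken."""
    print(f"{name} starting...")
    steps = 0
    for _ in range(first, last):
        time.sleep(0.1)  # simulate one step of slow work
        steps += 1
    print(f"{name} finished.")
    return steps  # note: Process discards the target's return value


def main():
    # Create one process per computation range
    process1 = Process(target=compute, args=("Process 1", 1, 10))
    process2 = Process(target=compute, args=("Process 2", 11, 20))
    # Start the processes
    process1.start()
    process2.start()
    # Wait for the processes to finish
    process1.join()
    process2.join()
    print("Done.")


if __name__ == "__main__":
    main()
```

The if __name__ == "__main__" guard matters more here than in the threaded version: on platforms that start child processes by re-importing the main module, unguarded process creation at import time would fail. To get results back from a worker, a Queue or a Pool from the multiprocessing module is one option.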