Python multithreading and multiprocessing

Author : xuzhiping   2022-11-24 14:07:49 Browse: 1185
Category : Python

In practice, using multithreading and multiprocessing takes some care. This article discusses the issues from the perspective of Python.

GIL Mechanism of Python

Any language that introduces multithreading has to face the problem of synchronization between threads. This covers not only synchronization in application code but also the synchronization and management of the internal data structures of the language runtime. Almost every modern programming language has some kind of runtime (even C!), which typically handles low-level work such as resource initialization, error handling, and final resource cleanup. Garbage collection is probably the best-known and most widely used runtime feature.

Managing the runtime's data is already fairly complex. Introducing multithreading makes it easy to get wrong, and even when synchronization primitives (locks, atomic operations) make parallel execution safe, they can noticeably reduce efficiency: in most cases there is no actual conflict, yet the CPU still has to execute the slower atomic instructions. Take garbage collection as an example. Python (CPython) uses reference counting, so it maintains a reference count for every object. Each time a variable that refers to an object goes out of scope, the object's count is decremented by 1. That is simple in a single thread, but what happens when multiple threads run in parallel? Performing a synchronization operation on every increment or decrement would significantly slow the program down, so that approach is not advisable.
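As a rough illustration of the reference counting described above, the sketch below uses sys.getrefcount, which CPython provides. The variable names are made up for the example, and the absolute counts are an implementation detail (the call itself temporarily adds a reference), so only the differences between the counts are meaningful.

```python
import sys

# A fresh object with a single name bound to it.
data = []
before = sys.getrefcount(data)

# Binding a second name to the same object adds one reference.
alias = data
with_alias = sys.getrefcount(data)

# Removing that name drops the reference again.
del alias
after_del = sys.getrefcount(data)

# On CPython the first difference is expected to be 1, the second 0.
print(with_alias - before)
print(after_del - before)
```

Every one of these increments and decrements would need to be synchronized if several threads could touch the same object at once, which is the cost the paragraph above refers to.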

Garbage collection is just one example. To avoid the problems of parallel access to data, and to keep the source code clear and simple (which sounds a bit like an excuse), Python provides only a "fake" multithreading mechanism: a global interpreter lock (GIL) ensures that only one thread executes Python bytecode at any time. This removes the parallel-access problem, but it also means that a multithreaded Python program performs about the same on a multi-core machine as on a single-core one. In practice this often does not matter, because raw performance is rarely why people choose Python and the bottleneck of most applications is not computing speed. If you really need multiple cores, consider the multiprocessing module (which has problems of its own) or a library with a parallel implementation in C or C++ (Python's scientific computing libraries generally take this approach).
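One way to observe the effect of the GIL is to time a CPU-bound loop run twice in sequence and then in two threads. This is a minimal sketch, not a benchmark: the function names and the loop size are arbitrary choices for illustration, and the actual timings depend on the machine and the interpreter build.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, repeats):
    """Run the loop `repeats` times one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats):
    """Run the loop in `repeats` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size
    print(f"sequential:  {run_sequential(n, 2):.2f}s")
    print(f"two threads: {run_threaded(n, 2):.2f}s")
```

On a standard CPython build with the GIL, the two numbers are expected to be of similar size, since the threads take turns instead of running in parallel.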

Multi-thread or multi-process

Python provides both multiprocessing and multithreading. As mentioned above, its multithreading is "fake" in the sense that the threads effectively run on only one core at a time. That is not a big problem for I/O-bound applications such as web servers. Only compute-intensive applications need to worry about making full use of the computing power of multiple cores.
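The I/O-bound case can be sketched with a thread pool. Here time.sleep stands in for a blocking network or disk call (an assumption made so the example runs anywhere), and fake_request and fetch_all are hypothetical names. While a thread is blocked in the sleep it releases the GIL, so the other threads can proceed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i, delay=0.2):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return i * 2


def fetch_all(count, delay=0.2):
    """Run `count` simulated requests in a thread pool; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(lambda i: fake_request(i, delay), range(count)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = fetch_all(8)
    print(results)
    # Expected to be close to a single delay rather than eight of them.
    print(f"{elapsed:.2f}s")
```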

Generally speaking, to use multiple cores from Python code itself you have to use the multiprocessing library. When using it, pay attention to how new processes are started. According to the official Python documentation there are two main start methods, spawn and fork (a third, forkserver, is also available on some Unix platforms). The spawn method starts a fresh Python interpreter and runs the function you give it, while the fork method is only available on POSIX systems such as Linux, not on Windows. Its implementation is presumably close to calling fork() directly and then running the given function. spawn is therefore closer to what we usually mean by starting a new process. Because a process created by fork inherits the parent's file descriptors and performs no cleanup, problems can occur in practice. The official documentation specifically notes that fork can be problematic when combined with multiple threads, but gives little further detail. The advantage of fork is that it is very fast (for reasons related to the internal implementation of Linux). If you are sure that fork will not cause problems such as resource leaks, you can use it; the safer choice is spawn.
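Selecting the start method explicitly might look like the sketch below. It uses multiprocessing.get_context, which returns a context object bound to one start method; the square function and the pool size are placeholders for illustration. With spawn the child re-imports the main module, so the worker function has to live at module level and the entry point needs the __main__ guard.

```python
import multiprocessing


def square(x):
    """Work function; defined at module level so a spawned child can import it."""
    return x * x


def main():
    # Request the spawn start method explicitly instead of the platform default.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))


if __name__ == "__main__":
    # Without this guard, each spawned child would run main() again on import.
    main()
```

Using a context object rather than multiprocessing.set_start_method keeps the choice local to this code, which can be convenient when a library and its caller prefer different start methods.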

It should also be mentioned that although multithreading works for I/O-bound programs, it is often not the optimal solution. A Python thread corresponds to an operating system thread, and system threads are relatively heavy. On Linux, threads and processes are implemented in almost the same way, and a thread is in a sense just a "lightweight" process. For the web server case mentioned earlier, a better choice is a lightweight thread library such as gevent. Its "threads" are scheduled entirely by the library. Because switching between them happens in user space rather than in the kernel and each one uses very few system resources, it can support a very large number of "threads" at the same time.
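gevent is a third-party package, so the sketch below swaps in the standard library's asyncio to show the same general idea: many concurrent tasks scheduled in user space on a single operating system thread. This is not gevent's API (gevent uses greenlets rather than async/await), and handle, serve_many, and run are hypothetical names.

```python
import asyncio
import threading


async def handle(i, delay=0.1):
    """Simulated request handler; awaiting hands control to other tasks."""
    await asyncio.sleep(delay)
    return i, threading.get_ident()


async def serve_many(count, delay=0.1):
    """Run `count` handlers concurrently on the current event loop."""
    return await asyncio.gather(*(handle(i, delay) for i in range(count)))


def run(count, delay=0.1):
    """Return (number of completed tasks, number of distinct OS threads used)."""
    results = asyncio.run(serve_many(count, delay))
    thread_ids = {tid for _, tid in results}
    return len(results), len(thread_ids)


if __name__ == "__main__":
    # Thousands of tasks share one OS thread.
    print(run(10_000))
```

Starting the same number of system threads would be far more expensive, which is the motivation for lightweight thread libraries given above.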

While looking up material on Python's multithreading and multiprocessing, I realized once again that although the Internet has plenty of material on basic topics, there is very little on more "niche" subjects such as operating systems, programming language internals, and implementation details. At present, if an answer is not on Stack Overflow, better material is hard to find. In Python, process-related behavior depends not only on the language but even more on the operating system. The methods Python provides can basically be regarded as an interface. For full control, you need to start from the behavior of the operating system itself. Perhaps this is why computer fundamentals matter. No matter how hard a high-level language tries to encapsulate the underlying details, when it comes to efficiency and certain real-world scenarios we always have to go back to the lower layers and understand the implementation more thoroughly, so that we can write programs that behave as we expect.
