Why isn't it recommended to deploy the database inside a Docker container?

Author : xuzhiping   2022-11-08 15:09:48 Browse: 1392
Category : Python

Preface

Over the past two years, Docker has become so popular that developers want to deploy every application and piece of software in containers. But are you sure you want to deploy the database in a container as well?

The question is a real one: operation manuals and video tutorials for running databases in containers are easy to find online. Below are some reasons why databases are a poor fit for containerization, offered for reference, in the hope that you will be more careful when doing it.

For now, containerizing the database is hard to justify, even though every developer has tasted the benefits of containerization. Hopefully a better solution will emerge as the technology matures.

7 reasons why Docker is not suitable for deploying databases

1. Data security issues

Do not store data in containers: this is one of the official Docker tips for working with containers. A container can be stopped or deleted at any time, and when it is removed with rm, the data inside it is lost. To avoid this, users can store data in a mounted data volume. However, volumes were designed to provide persistent storage around the Union FS image layers, and they do not guarantee data safety. If the container crashes suddenly and the database is not shut down properly, the data may be corrupted. In addition, sharing volume groups between containers and the host puts considerable strain on the physical machine.

Storing Docker data on the host does not guarantee against data loss either. With the current storage drivers, Docker still carries a risk of being unreliable.
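As a minimal sketch of the volume approach mentioned above, the following Python builds a docker run command line that mounts a named volume at the MySQL image's data directory. The container name, volume name, image tag, and password value are placeholders chosen for illustration, not values from this article, and nothing is executed. A named volume outlives a removed container, but as noted above it does not protect against corruption after an unclean shutdown.

```python
import shlex


def mysql_run_command(container_name, volume_name, image="mysql:8.0"):
    """Build a `docker run` argv that keeps MySQL's data on a named volume."""
    return [
        "docker", "run", "-d",
        "--name", container_name,
        # Named volume mounted at the MySQL image's default data directory.
        "-v", f"{volume_name}:/var/lib/mysql",
        # Placeholder only; supply a real secret through your own mechanism.
        "-e", "MYSQL_ROOT_PASSWORD=change-me",
        image,
    ]


# Hypothetical names, used only to show the shape of the command.
cmd = mysql_run_command("db1", "db1-data")
print(shlex.join(cmd))
```

The command is only printed here; running it is left to the reader.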

2. Performance issues

As we all know, MySQL is a relational database with high IO requirements. When one physical machine runs multiple instances, their IO adds up, creating an IO bottleneck and greatly reducing MySQL's read and write performance.

In a special session on the top ten difficulties of Docker applications, an architect at a state-owned bank made the same point: "The performance bottleneck of a database generally appears in IO. If you follow Docker's approach, the IO requests of multiple containers all end up on the same storage. Internet-facing databases today are mostly shared-nothing architectures, which may be one reason migration to Docker is not being considered."
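The arithmetic behind this bottleneck can be sketched in a few lines. The device budget below is a made-up figure, and the even split is a simplifying assumption; real workloads rarely share a disk this neatly.

```python
def per_instance_iops(device_iops, instances):
    """Even share of one device's IOPS budget across co-located instances."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return device_iops / instances


DEVICE_IOPS = 20_000  # hypothetical budget for one shared disk

for n in (1, 2, 4, 8):
    share = per_instance_iops(DEVICE_IOPS, n)
    print(f"{n} instance(s): about {share:,.0f} IOPS each")
```

Every additional database instance on the same storage shrinks the share available to each of the others.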

Some readers may have their own ways of addressing these performance problems:

(1) Separation of database programs from data

If you run MySQL in Docker, separate the database program from its data: keep the data on shared storage and put only the program in the container. If the container or the MySQL service fails, a brand-new container is started automatically. It is also advisable not to store the data on the host, because the host and the container share the volume group, so a failure has a larger impact on the host.

(2) Run lightweight or distributed databases

Lightweight or distributed databases can be deployed in Docker. Docker's own approach is that when a service goes down, a new container is started automatically, rather than the failed container being restarted again and again.

(3) Reasonable layout and application

For applications or services with high IO requirements, it is more appropriate to deploy the database on a physical machine or a KVM virtual machine. At present, Tencent Cloud's TDSQL and Alibaba's OceanBase are deployed directly on physical machines, not in Docker.

3. Network problems

To understand Docker networking, you must have a deep understanding of network virtualization. You must also be prepared to deal with the unexpected. You may need to make bug fixes with no support or additional tools.

We know that databases need dedicated, sustained throughput to handle high loads. We also know that a container is one more isolation layer behind the hypervisor and the host virtual machine. The network matters a great deal for database replication, which requires a stable connection between the master and slave databases. Docker's networking issues were still unresolved as of version 1.9.

Taken together, these issues make containerized databases difficult to manage. You may be a top engineer who can solve any problem, but how much time will solving Docker's network problems take? Wouldn't it be better to put the database in a dedicated environment and save that time for the business goals that really matter?

4. State

Packaging stateless services in Docker works well: you can orchestrate containers and eliminate single points of failure. But what about the database? If you put the database in the same environment, that environment becomes stateful and the scope of a system failure widens. The next time an application instance crashes, the database may be affected too.

Key point: In Docker, horizontal scaling works only for stateless compute services, not for databases.

Docker's rapid scaling depends on services being stateless, so anything that holds data state is not suited to being placed directly in Docker. If a database is installed in Docker, its storage must be provided as a separate service.

At present, Tencent Cloud's TDSQL (a distributed database for financial workloads) and Alibaba Cloud's OceanBase (a distributed database system) both run directly on physical machines rather than in the easier-to-manage Docker.

5. Resource isolation

In terms of resource isolation, Docker is not as strong as a KVM virtual machine. Docker uses cgroups to enforce resource limits, which can cap the maximum a container consumes but cannot stop other programs from taking the resources that container needs. If other applications consume too much of the physical machine's resources, the read and write performance of MySQL in the container suffers.
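The difference between a cap and a guarantee can be illustrated with a toy model. This is not how the kernel scheduler or cgroups actually divide resources; it simply assumes each workload is capped at its limit and that the host's capacity is shared proportionally when the capped demands exceed it. The workload names and all numbers are hypothetical.

```python
def effective_allocation(demands, limits, host_capacity):
    """Toy model: cap each workload at its limit, then share the host's
    capacity proportionally if the capped demands exceed it."""
    capped = {name: min(demand, limits[name]) for name, demand in demands.items()}
    total = sum(capped.values())
    if total <= host_capacity:
        return capped
    scale = host_capacity / total
    return {name: amount * scale for name, amount in capped.items()}


HOST_CAPACITY = 8                    # hypothetical resource units on the host
LIMITS = {"mysql": 4, "batch": 8}    # hypothetical per-container caps

quiet = effective_allocation({"mysql": 4, "batch": 1}, LIMITS, HOST_CAPACITY)
busy = effective_allocation({"mysql": 4, "batch": 8}, LIMITS, HOST_CAPACITY)

print("mysql on a quiet host:", quiet["mysql"])
print("mysql on a busy host: ", round(busy["mysql"], 2))
```

In this model the MySQL container's limit is identical in both runs, yet it receives less on the busy host, which is the situation the paragraph above describes.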

The more layers of isolation you need, the more resource overhead you pay. Compared with a dedicated environment, Docker has the advantage of easy horizontal scaling. In Docker, however, horizontal scaling works only for stateless compute services, not for databases.

We have not seen any isolation feature that benefits the database, so why put it in a container?

6. Inapplicability of cloud platform

Most people start their projects on a shared cloud. The cloud removes the complexity of operating and replacing virtual machines, so there is no need to test new hardware environments at night or on weekends when no one is working. When we can start an instance quickly, why worry about the environment that instance runs in?

That is why we pay cloud providers so much. Once we put database containers on those instances, these conveniences disappear: because the data does not match, a new instance is not compatible with the existing one. If instances have to be restricted to stand-alone services anyway, the database should run in a non-containerized environment, and elastic scaling only needs to be kept for the compute service layer.

7. Environment requirements for running a database

It is common to see DBMS containers and other services running on the same host. However, the hardware requirements for these services are very different.

Databases, especially relational databases, have high IO requirements. Database engines generally run in a dedicated environment to avoid competing with other workloads for resources. If you put your database in a container, you waste your project's resources, because you have to provision a lot of extra capacity for that instance. In the public cloud, when you need 34 GB of memory, the instance you launch has to come with 64 GB. In practice, these resources are never fully used.
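The 34 GB example can be worked through in code. The list of instance sizes is a hypothetical offering that doubles at each step; real providers' catalogs differ.

```python
INSTANCE_SIZES_GB = [8, 16, 32, 64, 128]  # hypothetical instance offering


def smallest_fitting_instance(required_gb, sizes=INSTANCE_SIZES_GB):
    """Return the smallest offered memory size that covers the requirement."""
    for size in sorted(sizes):
        if size >= required_gb:
            return size
    raise ValueError(f"no instance offers {required_gb} GB")


required = 34
size = smallest_fitting_instance(required)
idle = size - required
print(f"need {required} GB -> launch {size} GB, leaving {idle} GB ({idle / size:.0%}) idle")
```

With these assumed sizes, a requirement just above 32 GB forces the jump to 64 GB, and close to half of the memory paid for goes unused.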

How can this be solved? Design the system in tiers and launch multiple instances with fixed resources at each tier. Horizontal scaling is always better than vertical scaling.

Summary

Does all of this mean that databases must never be deployed in containers? The answer is no. Services that are insensitive to data loss (search, event tracking) can be containerized, and database sharding can be used to increase the number of instances and thereby the throughput.
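As one illustration of sharding, the sketch below routes each key to a shard by hashing it and taking the result modulo the shard count. This is only one simple scheme, chosen here for brevity; the article does not say which sharding method to use, and the key format and shard count are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash (hash-modulo routing)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 4  # hypothetical number of database instances

for user_id in ("user:1", "user:2", "user:3", "user:4"):
    print(user_id, "-> shard", shard_for(user_id, SHARD_COUNT))
```

Because the hash is stable, the same key always lands on the same instance. Note that changing the shard count remaps most keys under this scheme, which is a known trade-off of plain hash-modulo routing.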

Docker is suitable for running lightweight or distributed databases: when a Docker service goes down, a new container is started automatically instead of the failed one being restarted again and again. With middleware and container orchestration systems, a containerized database can scale, recover, and fail over automatically, and can run across multiple nodes.
