Deploy your containerized AI applications with nvidia docker

More and more products and services take advantage of AI's modeling and prediction capabilities. This article introduces the nvidia-docker tool for integrating artificial intelligence (AI) software into a microservices architecture. The main benefit explored here is the use of the host system's GPU (Graphics Processing Unit) resources to accelerate multiple containerized AI applications.

To understand the usefulness of nvidia docker, we'll start by describing what kind of AI can benefit from GPU acceleration. Second, we will present how to implement the nvidia-docker tool. Finally, we will describe what tools are available to use GPU acceleration in your applications and how to use them.

Why use GPUs in AI applications?

Artificial intelligence has two main subfields in common use: machine learning and deep learning. The latter belongs to a larger family of machine learning methods based on artificial neural networks.

In the context of deep learning, where operations are essentially matrix multiplications, GPUs are more efficient than CPUs (Central Processing Units). This is why the use of GPUs has increased in recent years. In fact, GPUs are considered the heart of deep learning due to their massively parallel architecture.
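
To make the point about matrix multiplication concrete, here is a minimal sketch in plain Python (no GPU involved). The function name matmul and the sample values are illustrative assumptions and do not come from any library. Each output cell is computed independently of the others, and that independence is what a GPU exploits by spreading the cells across many parallel threads.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    # Each (i, j) cell depends only on row i of a and column j of b,
    # so all cells could be computed at the same time.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# A dense layer without bias: 2 input samples, 3 features, 2 output units.
inputs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = matmul(inputs, weights)
print(outputs)
```

On a CPU the cells are computed one after another. Deep learning libraries hand the same operation to GPU libraries so that the cells are computed concurrently.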

However, GPUs cannot run just any program. They actually use a specific language (CUDA for NVIDIA) to take advantage of their architecture. So how do you use and communicate with GPUs from your applications?

The NVIDIA CUDA technology

NVIDIA CUDA (Compute Unified Device Architecture) is a parallel computing architecture combined with an API for programming GPUs. CUDA translates application code into an instruction set that GPUs can execute.

A CUDA SDK and libraries such as cuBLAS (Basic Linear Algebra Subroutines) and cuDNN (Deep Neural Network) have been developed to communicate easily and efficiently with a GPU. CUDA is available in C, C++ and Fortran. There are wrappers for other languages, including Java, Python, and R. For example, deep learning libraries such as TensorFlow and Keras are based on these technologies.

Why use nvidia docker?

Nvidia Docker addresses the needs of developers who want to add AI functionality to their applications, maintain them, and deploy them on servers powered by NVIDIA GPUs.

The goal is to set up an architecture that allows the development and deployment of deep learning models in services accessible via an API. Thus, the utilization rate of GPU resources is optimized by making them available to multiple application instances.

In addition, we benefit from the advantages of containerized environments:

  • Isolation of instances of each AI model.
  • Collocation of multiple models with their specific dependencies.
  • Collocation of the same model across multiple versions.
  • Consistent distribution of models.
  • Model performance monitoring.

Using a GPU in a container requires installing CUDA in the container and giving the container access to the device. The nvidia-docker tool was developed with this in mind: it exposes NVIDIA GPUs to containers in an isolated and secure manner.

As I write this article, the latest version of nvidia-docker is v2. This version differs greatly from v1 in the following ways:

  • Version 1: nvidia-docker is implemented as an overlay on top of Docker. To create a container, you had to call nvidia-docker itself (e.g. nvidia-docker run ...), which performed the operations (including the creation of volumes) that made the GPUs visible in the container.
  • Version 2: Deployment is simplified by replacing Docker volumes with Docker runtimes. To start a container, you now select the NVIDIA runtime through Docker (e.g. docker run --runtime nvidia ...).

Note that the two versions are not compatible due to their different architectures. An application written in v1 must be rewritten for v2.
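
As a rough illustration of the v2 invocation, the sketch below assembles a docker run command that selects the NVIDIA runtime. It only builds and prints the command and does not start anything. The helper name build_gpu_run_command and the image name my-cuda-image are hypothetical. The NVIDIA_VISIBLE_DEVICES environment variable, which the NVIDIA runtime reads to decide which GPUs to expose, is not mentioned above and is included here as an assumption about your setup.

```python
import shlex


def build_gpu_run_command(image, command, visible_devices="all"):
    """Return the argv list for running `command` in `image` with the NVIDIA runtime."""
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",  # v2 style: select the NVIDIA runtime through Docker
        "-e", f"NVIDIA_VISIBLE_DEVICES={visible_devices}",
        image,
        *command,
    ]


# "my-cuda-image" is a placeholder image name, not a real published image.
argv = build_gpu_run_command("my-cuda-image", ["nvidia-smi"], visible_devices="0")
print(shlex.join(argv))
```

Building the command as a list rather than a single string makes it safe to pass to subprocess.run later without shell quoting problems.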

Configure nvidia docker

The required elements to use nvidia docker are:

  • A container runtime.
  • An available GPU.
  • The NVIDIA Container Toolkit (main part of nvidia-docker).


Docker

A container runtime is required to run the NVIDIA Container Toolkit. Docker is the recommended runtime, but Podman and containerd are also supported.

The official documentation provides the Docker installation procedure.

The NVIDIA driver

Drivers are required to use a GPU device. For NVIDIA GPUs, the drivers for a given operating system can be obtained from the NVIDIA driver download page by filling in the GPU model information.

The installation of the drivers is done via the executable file. For Linux, use the following commands, substituting the name of the downloaded file:

chmod +x <downloaded-file>.run
sudo ./<downloaded-file>.run

Reboot the host computer at the end of the installation to take into account the installed drivers.
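
After the reboot, one way to confirm that the driver is loaded is to query nvidia-smi, the command-line utility that ships with the NVIDIA driver. The following is a minimal sketch rather than an official procedure: the function names are hypothetical, and the sample row used in the usage note is made up to show the expected shape.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(text):
    """Parse 'name, driver_version, memory.total' CSV rows into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus


def list_gpus():
    """Return the GPUs the driver reports, or [] when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


print(list_gpus())
```

An empty list means either that nvidia-smi is not on the PATH or that it failed, which usually points to a driver that is missing or not yet loaded. For a made-up row such as "Example GPU, 000.00, 16384 MiB", parse_gpu_rows returns one dictionary with the three fields separated.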

Installing nvidia docker

nvidia-docker is available on its GitHub project page. To install it, follow the installation guide that matches your server and architecture specifications.
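
Once the installation is done, you may want to check that Docker knows about the NVIDIA runtime. The sketch below asks the Docker daemon for its registered runtimes with docker info and looks for an entry named nvidia. The function names are hypothetical, the sample JSON is illustrative rather than captured from a real host, and the check assumes the runtime was registered under the name nvidia.

```python
import json
import shutil
import subprocess


def has_nvidia_runtime(runtimes_json):
    """Return True if the JSON map of Docker runtimes has an "nvidia" entry."""
    runtimes = json.loads(runtimes_json) or {}
    return "nvidia" in runtimes


def docker_runtimes_json():
    """Ask the local Docker daemon for its runtimes; None if that is not possible."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else None


# Illustrative sample of the shape Docker reports, not captured from a real host.
sample = '{"nvidia": {"path": "nvidia-container-runtime"}, "runc": {"path": "runc"}}'
print(has_nvidia_runtime(sample))

live = docker_runtimes_json()
if live is not None:
    print(has_nvidia_runtime(live))
```

Keeping the parsing separate from the call to Docker lets you test the check on machines that have neither Docker nor a GPU.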

We now have an infrastructure that allows us to have isolated environments that provide access to GPU resources. To use GPU acceleration in applications, several tools have been developed by NVIDIA (non-exhaustive list):

  • CUDA Toolkit: a set of tools for developing software that performs calculations using the CPU, RAM and GPU together. It can be used on x86, Arm and POWER platforms.
  • NVIDIA cuDNN: a library of primitives that accelerates deep learning networks and optimizes GPU performance for major frameworks such as TensorFlow and Keras.
  • NVIDIA cuBLAS: a library of GPU-accelerated linear algebra subroutines.

Using these tools in application code accelerates AI and linear algebra. With the GPUs now visible, the application can send data and operations to be processed on the GPU.

CUDA Toolkit is the lowest level option. It provides the most control (memory and instructions) for building custom applications. Libraries provide an abstraction of CUDA functionality. They allow you to focus on the application development rather than the CUDA implementation.
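
At the library level, checking that the GPU is reachable from application code can take only a few lines. The sketch below uses TensorFlow's tf.config.list_physical_devices call and assumes TensorFlow 2.x may or may not be installed in the container. The helper name visible_gpus is hypothetical, and the fallback to an empty list is a choice made for this example.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if TensorFlow is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("GPU acceleration available" if gpus else "Running on CPU only")
```

Run inside a container started with the NVIDIA runtime, the list should contain one entry per exposed GPU. Run in a container started without it, the list should be empty even on a host that has GPUs.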

Once all these elements are implemented, the architecture using the nvidia-docker service is ready to use.

Here's a chart to summarize everything we've seen:



We have set up an architecture that allows the use of GPU resources from our applications in isolated environments. To summarize, the architecture is composed of the following bricks:

  • Operating system: Linux, Windows…
  • Docker: isolating the environment with Linux containers
  • NVIDIA driver: installation of the driver for the current hardware
  • NVIDIA container runtime: orchestration of the previous three
  • Applications on Docker containers:
    • CUDA
    • cuDNN
    • cuBLAS
    • TensorFlow/Keras

NVIDIA continues to develop tools and libraries around AI technology, with the goal of establishing itself as a leader. Other technologies may complement nvidia docker or may be more appropriate than nvidia docker depending on the use case.
