PyTorch

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. Automatic differentiation is done with a tape-based system at both the functional and the neural network layer level. This gives PyTorch a high level of flexibility and speed as a deep learning framework, and it provides accelerated NumPy-like functionality. NGC Containers are the easiest way to get started with PyTorch. The PyTorch NGC Container comes with all dependencies included, providing an easy place to start developing common applications, such as conversational AI, natural language processing (NLP), recommenders, and computer vision.
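To make "tape-based" concrete, the following is a toy sketch of reverse-mode automatic differentiation in plain Python. It is not PyTorch's implementation; the names `Var`, `tape`, and `backprop` are hypothetical and exist only for this illustration. Each operation appends a small backward step to a list (the tape), and gradients are obtained by replaying that list in reverse.

```python
# Toy illustration of tape-based reverse-mode autodiff (not PyTorch internals).
# Var, tape, and backprop are hypothetical names used only for this sketch.


class Var:
    """A scalar value that records every operation on a shared tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)

        def backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        self.tape.append(backward)
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)

        def backward():
            # Addition passes the incoming gradient through unchanged.
            self.grad += out.grad
            other.grad += out.grad

        self.tape.append(backward)
        return out


def backprop(output):
    """Seed the output gradient and replay the tape in reverse order."""
    output.grad = 1.0
    for step in reversed(output.tape):
        step()


tape = []
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
backprop(z)
print(x.grad, y.grad)
```

In PyTorch itself, the same idea is exposed through tensors created with `requires_grad=True` and a call to `.backward()` on the result.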

The PyTorch NGC Container is optimized for GPU acceleration, and contains a validated set of libraries that enable and optimize GPU performance. This container also contains software for accelerating ETL (DALI, RAPIDS), Training (cuDNN, NCCL), and Inference (TensorRT) workloads.

Prerequisites

Using the PyTorch NGC Container requires the host system to have the following installed:

Docker Engine
NVIDIA GPU Drivers
NVIDIA Container Toolkit

For supported versions, see the Framework Containers Support Matrix and the NVIDIA Container Toolkit Documentation.

No other installation, compilation, or dependency management is required. It is not necessary to install the NVIDIA CUDA Toolkit.

The PyTorch NGC Container is optimized to run on NVIDIA DGX Foundry and NVIDIA DGX SuperPOD managed by NVIDIA Base Command Platform. Please refer to the Base Command Platform User Guide to learn more about running workloads on BCP clusters.

Running PyTorch Using Docker

To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.
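As a rough sketch of what such a command can look like, the snippet below assembles a `docker run` invocation for the image tag used elsewhere on this page. It assumes a Docker version that supports `--gpus all` together with the NVIDIA Container Toolkit, and it assumes the image is pulled from the `nvcr.io` registry. The helper name `docker_run_command` and the mount target `/workspace/host` are hypothetical; the User's Guide remains the authoritative source for the exact command.

```python
import shlex


def docker_run_command(tag, host_dir=None):
    """Build a `docker run` argument list for the PyTorch NGC image.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["docker", "run", "--gpus", "all", "-it", "--rm"]
    if host_dir is not None:
        # Optional bind mount; the container-side path is a placeholder.
        cmd += ["-v", f"{host_dir}:/workspace/host"]
    # Registry, repository, and tag, as the paragraph above describes.
    cmd.append(f"nvcr.io/nvidia/pytorch:{tag}")
    return cmd


command = docker_run_command("22.08-py3")
print(shlex.join(command))
```

Building the command as a list keeps quoting explicit, and the same list can be passed to `subprocess.run` on a host that has Docker installed.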

Running PyTorch Using Base Command Platform

Jobs using the PyTorch NGC Container on Base Command Platform clusters can be launched either with the NGC CLI tool or through the Base Command Platform Web UI. To use the NGC CLI tool, first configure the Base Command Platform user, team, organization, and cluster information with the ngc config command, as described in the Base Command Platform documentation.

An example command to launch the container on a single-GPU instance is:

ngc batch run --name "My-1-GPU-pytorch-job" --instance dgxa100.80g.1.norm --commandline "sleep infinity" --result /results --image "nvidia/pytorch:22.08-py3"

An example command to launch a two-node distributed job with a total runtime of 10 minutes (600 seconds) is:

ngc batch run --name "My-2-node-pytorch-job" --preempt RUNONCE --total-runtime 600s --instance dgxa100.80g.8.norm --commandline "sleep infinity" --result /results --array-type "PYTORCH" --replicas "2" --image "nvidia/pytorch:22.08-py3"

The PyTorch container includes JupyterLab, which can be launched as part of the job command for easy access to the container and for exploring its capabilities. An example command that launches JupyterLab as part of a job on a single DGX node is:

ngc batch run --name "My-1-node-pytorch-jupyterlab-job" --instance dgxa100.80g.8.norm --commandline "jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/ & sleep infinity" --result /results --image "nvidia/pytorch:22.08-py3"
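The three commands above share one shape and differ only in a few options. The sketch below builds such a command from parameters, using only the flags that appear in the examples on this page. The function name `ngc_batch_run` and its parameter names are hypothetical, and the snippet only assembles the argument list without contacting NGC.

```python
import shlex


def ngc_batch_run(name, instance, image, commandline="sleep infinity",
                  result="/results", preempt=None, total_runtime=None,
                  replicas=None):
    """Assemble an `ngc batch run` argument list (hypothetical helper)."""
    cmd = ["ngc", "batch", "run", "--name", name]
    if preempt is not None:
        cmd += ["--preempt", preempt]
    if total_runtime is not None:
        cmd += ["--total-runtime", total_runtime]
    cmd += ["--instance", instance,
            "--commandline", commandline,
            "--result", result]
    if replicas is not None:
        # Multi-node jobs in the example above use the PYTORCH array type.
        cmd += ["--array-type", "PYTORCH", "--replicas", str(replicas)]
    cmd += ["--image", image]
    return cmd


single_gpu = ngc_batch_run("My-1-GPU-pytorch-job", "dgxa100.80g.1.norm",
                           "nvidia/pytorch:22.08-py3")
two_node = ngc_batch_run("My-2-node-pytorch-job", "dgxa100.80g.8.norm",
                         "nvidia/pytorch:22.08-py3", preempt="RUNONCE",
                         total_runtime="600s", replicas=2)
print(shlex.join(single_gpu))
print(shlex.join(two_node))
```

Passing a different `commandline` value, such as the JupyterLab invocation shown above, yields the third variant without changing the helper.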

What Is In This Container?

For the full list of contents, see the PyTorch Container Release Notes.

This container image contains the complete source of its PyTorch version in /opt/pytorch. PyTorch is pre-built and installed in the default Conda environment (/opt/conda/lib/python3.8/site-packages/torch/) in the container image. Visit pytorch.org to learn more about PyTorch.
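One way to confirm where a package would be loaded from, without importing it, is the standard library's `importlib.util.find_spec`. The sketch below is a generic check rather than an NVIDIA-provided tool, and the helper name `package_location` is hypothetical. Inside the container it would be expected to report a path under the site-packages directory mentioned above; outside the container it prints `None` when PyTorch is not installed.

```python
import importlib.util


def package_location(name):
    """Return the file that `import name` would load, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the resolved path of the torch package, or None if it is missing.
print("torch:", package_location("torch"))
```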

The NVIDIA PyTorch Container is optimized for use with NVIDIA GPUs, and contains the following software for GPU acceleration:

CUDA
cuBLAS
NVIDIA cuDNN
NVIDIA NCCL (optimized for NVLink)
RAPIDS
NVIDIA Data Loading Library (DALI)
TensorRT
Torch-TensorRT

The software stack in this container has been validated for compatibility, and does not require any additional installation or compilation from the end user. This container can help accelerate your deep learning workflow from end to end.
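To see which parts of this stack the installed PyTorch build reports, a short query such as the one below can be run inside the container. It is a sketch that relies only on attributes PyTorch exposes publicly (`torch.version.cuda`, `torch.backends.cudnn.version()`, `torch.cuda.is_available()`, `torch.distributed.is_nccl_available()`). The function name `gpu_stack_report` is hypothetical, and the values depend on the container release, so none are shown here.

```python
def gpu_stack_report():
    """Collect the versions that the installed PyTorch build reports."""
    try:
        import torch
    except ImportError:
        # Outside the container PyTorch may simply not be installed.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "gpu_available": torch.cuda.is_available(),
        "nccl_backend": (torch.distributed.is_available()
                         and torch.distributed.is_nccl_available()),
    }


report = gpu_stack_report()
for key, value in report.items():
    print(f"{key}: {value}")
```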

Link to Open Source Code

ETL

NVIDIA Data Loading Library (DALI) is designed to accelerate data loading and preprocessing pipelines for deep learning applications by offloading them to the GPU. DALI primarily focuses on building data preprocessing pipelines for image, video, and audio data. These pipelines are typically complex and include multiple stages, leading to bottlenecks when run on the CPU. Use this container to get started on accelerating data loading with DALI.

RAPIDS is a suite of open source software libraries and APIs that gives you the ability to execute end-to-end data science and analytics pipelines entirely on the GPU. RAPIDS focuses on common data preparation tasks for analytics and data science. The RAPIDS API is built to mirror commonly used data processing libraries like pandas, so large speedups are possible with only minor changes to an existing codebase. Use this container to get started on accelerating your data science pipelines with RAPIDS.

Training

NVIDIA CUDA Deep Neural Network Library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. The version of PyTorch in this container is precompiled with cuDNN support, and does not require any additional configuration.

NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives for NVIDIA GPUs and networking that take into account system and network topology. NCCL is integrated with PyTorch as a torch.distributed backend, providing implementations for broadcast, all_reduce, and other algorithms.
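A minimal sketch of using NCCL through `torch.distributed` follows. It assumes the launcher sets the conventional `RANK`, `WORLD_SIZE`, `LOCAL_RANK`, `MASTER_ADDR`, and `MASTER_PORT` environment variables, as `torchrun` does; whether a given Base Command Platform job sets all of them is an assumption to verify. The names `DistEnv`, `read_dist_env`, and `all_reduce_demo` are hypothetical. PyTorch is imported inside `all_reduce_demo`, so the environment parsing can be tried on a machine without GPUs.

```python
import os
from dataclasses import dataclass


@dataclass
class DistEnv:
    rank: int
    world_size: int
    local_rank: int


def read_dist_env(env=os.environ):
    """Read the rank variables conventionally set by launchers like torchrun."""
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
    )


def all_reduce_demo():
    """Sum each rank's index across all ranks with the NCCL backend.

    Requires NVIDIA GPUs and one process per GPU.
    """
    import torch
    import torch.distributed as dist

    cfg = read_dist_env()
    torch.cuda.set_device(cfg.local_rank)
    # The default env:// rendezvous also reads MASTER_ADDR and MASTER_PORT.
    dist.init_process_group(backend="nccl", rank=cfg.rank,
                            world_size=cfg.world_size)
    value = torch.full((1,), float(cfg.rank), device="cuda")
    dist.all_reduce(value, op=dist.ReduceOp.SUM)
    dist.destroy_process_group()
    return value.item()


print(read_dist_env({"RANK": "1", "WORLD_SIZE": "16", "LOCAL_RANK": "1"}))
```

On a multi-GPU node, every process would call `all_reduce_demo()` and receive the same sum of all rank indices.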

Inference

TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and a runtime that deliver low latency and high throughput for inference applications. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate seamlessly into the JIT runtime. After compilation, using the optimized graph should feel no different from running a TorchScript module.

Security CVEs

To review known CVEs on this image, refer to the Security Scanning tab on this page.

License

By pulling and using the container, you accept the terms and conditions of this End User License Agreement.

PyTorch | NVIDIA NGC (2024)