Story #17240
Updated by Peter Amstutz almost 4 years ago
https://docs.nvidia.com/deploy/cuda-compatibility/index.html
h1. What NVIDIA says
The CUDA software environment consists of three parts:
* CUDA Toolkit (libraries, CUDA runtime and developer tools) - User-mode SDK used to build CUDA applications
* CUDA driver - User-mode driver component used to run CUDA applications (such as libcuda.so on Linux systems)
* NVIDIA GPU device driver - Kernel-mode driver component for NVIDIA GPUs
On Linux systems, the CUDA driver and kernel mode components are delivered together in the NVIDIA display driver package. This is shown in Figure 1.
...
1.3. Binary Compatibility
We define binary compatibility as a set of guarantees provided by the library, where an application targeting the said library will continue to work when dynamically linked against a different version of the library.
The CUDA Driver API has a versioned C-style ABI, which guarantees that applications that were running against an older driver (for example CUDA 3.2) will still run and function correctly against a modern driver (for example one shipped with CUDA 11.0). This is a stronger contract than an API guarantee - an application might need to change its source when recompiling against a newer SDK, but replacing the driver with a newer version will always work.
The CUDA Driver API thus is binary-compatible (the OS loader can pick up a newer version and the application continues to work) but not source-compatible (rebuilding your application against a newer SDK might require source changes). In addition, the binary-compatibility is in one direction: backwards.
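The one-way direction can be written as a simple check. Below is a minimal Python sketch. It assumes the version encoding returned by @cuDriverGetVersion@ / @cudaRuntimeGetVersion@ (1000 * major + 10 * minor). It deliberately ignores CUDA 11+ minor version compatibility and the forward-compatibility packages covered elsewhere on the NVIDIA page. The helper names are hypothetical.

```python
from typing import Tuple


def decode_cuda_version(v: int) -> Tuple[int, int]:
    """Split an encoded CUDA version (1000 * major + 10 * minor) into (major, minor)."""
    return v // 1000, (v % 1000) // 10


def driver_can_run(built_against: int, driver_version: int) -> bool:
    """Backward-only binary compatibility (simplified assumption):
    a driver runs applications built against its own CUDA version or an older one,
    never the other way round."""
    return driver_version >= built_against
```

For example, an application built against CUDA 3.2 (encoded as 3020) passes the check on a driver reporting 11000 (CUDA 11.0). The reverse combination fails the check.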
...
Each version of the CUDA Toolkit (and runtime) requires a minimum version of the NVIDIA driver. The CUDA driver (libcuda.so on Linux for example) included in the NVIDIA driver package, provides binary backward compatibility. For example, an application built against the CUDA 3.2 SDK will continue to function even on today’s driver stack. On the other hand, the CUDA runtime has not provided either source or binary compatibility guarantees. Newer major and minor versions of the CUDA runtime have frequently changed the exported symbols, including their version or even their availability, and the dynamic form of the library has its shared object name (.SONAME in Linux-based systems) change every minor version.
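The driver-reported version can be read at run time without a CUDA toolkit in the image. Below is a hedged sketch using Python's @ctypes@. It loads the driver by the SONAME @libcuda.so.1@ and calls @cuDriverGetVersion@; @query_version@ and @query_driver_version@ are hypothetical helpers. The runtime's SONAME changes between versions, so the caller has to supply the exact @libcudart@ file name to query @cudaRuntimeGetVersion@ the same way.

```python
import ctypes
from typing import Optional


def query_version(libname: str, symbol: str) -> Optional[int]:
    """Call an `int fn(int *version)` CUDA entry point (e.g. cuDriverGetVersion or
    cudaRuntimeGetVersion) and return the encoded version, or None on any failure."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None  # library not present (no GPU driver, or not bind-mounted)
    fn = getattr(lib, symbol, None)  # None if the symbol is not exported
    if fn is None:
        return None
    version = ctypes.c_int(0)
    # Both CUDA_SUCCESS and cudaSuccess are 0; anything else is an error code.
    if fn(ctypes.byref(version)) != 0:
        return None
    return version.value


def query_driver_version() -> Optional[int]:
    """Encoded CUDA version supported by the installed driver, e.g. 11020 for 11.2."""
    return query_version("libcuda.so.1", "cuDriverGetVersion")
```

For example, @query_version("libcudart.so.11.0", "cudaRuntimeGetVersion")@ would query a CUDA 11 runtime, provided that is the SONAME actually present in the image.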
h1. Notes
Inside the container, the matching CUDA runtime (@libcudart@) must be included if the application is dynamically linked; otherwise the application must be statically linked against it.
* the CUDA runtime requires a minimum driver version; this should be declared as a requirement
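* @nvidia-smi@ reports the installed driver version and the highest CUDA version that driver supports; could it be used to check the requirement above?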
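* cubins (binaries compiled directly for a specific GPU architecture) target a specific compute capability and are compatible only across minor revisions, in one direction: a cubin built for compute capability X.y runs only on devices of compute capability X.z with z >= y.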
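* required libraries: @libcuda.so.*@ (the CUDA driver)
* required libraries: @libnvidia-ptxjitcompiler.so.*@ (the driver's PTX JIT compiler, used when a program ships PTX rather than a cubin matching the GPU)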
* Singularity supports bind mounting the NVIDIA driver libraries into the container (@--nv@). It apparently requires @nvidia-container-cli@: https://github.com/NVIDIA/libnvidia-container
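Two of the notes above can be turned into mechanical checks. Below is a minimal Python sketch. It assumes the cubin rule as stated in the notes (a cubin for compute capability X.y runs only on capability X.z with z >= y). The list of host library directories is hypothetical and varies by distribution; @nvidia-container-cli@ is the more reliable way to locate these files. All function and constant names are illustrative only.

```python
import glob
import os
from typing import Dict, List, Sequence, Tuple

# Hypothetical default search path; real driver locations vary by distribution.
DEFAULT_LIB_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/lib")

# Library name patterns taken from the notes above.
REQUIRED_LIB_PATTERNS = ("libcuda.so.*", "libnvidia-ptxjitcompiler.so.*")


def cubin_runs_on(cubin_cc: Tuple[int, int], device_cc: Tuple[int, int]) -> bool:
    """True if a cubin built for compute capability X.y can run on a device of X.z:
    same major version and a device minor revision >= the cubin's."""
    return cubin_cc[0] == device_cc[0] and device_cc[1] >= cubin_cc[1]


def find_driver_libs(lib_dirs: Sequence[str] = DEFAULT_LIB_DIRS) -> Dict[str, List[str]]:
    """Map each required library pattern to the matching host files.
    An empty list means the library was not found in any of lib_dirs."""
    found: Dict[str, List[str]] = {}
    for pattern in REQUIRED_LIB_PATTERNS:
        matches: List[str] = []
        for d in lib_dirs:
            matches.extend(sorted(glob.glob(os.path.join(d, pattern))))
        found[pattern] = matches
    return found
```

An empty list for either pattern means that library would have to be located some other way before it could be bind mounted into the container.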