Support #18606
GPU support on tordo cluster
Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -
Description
Support GPUs on tordo:
- Build compute image with nvidia support
- Add instance type g4dn.xlarge to configuration
  instance / GPUs / vCPU / Mem (GiB) / GPU Mem (GiB) / storage (GB) / network (Gbps)
  g4dn.xlarge / 1 / 4 / 16 / 16 / 125 / Up to 25
- CUDA section
- DeviceCount: 1
- DriverVersion: "11.4" (the version that was installed in the compute image)
- HardwareCapability: "7.5"
- g4 nodes have a T4 GPU, per https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html
- T4 GPUs have a compute capability of 7.5, per https://developer.nvidia.com/cuda-gpus
- Check that "container shell" feature is enabled on tordo
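For reference, a minimal sketch of how the three CUDA fields above could be matched against a container's GPU request. This is not Arvados code: the `CUDA` class and the `parse_version` and `satisfies` helpers are hypothetical names made up for illustration, and the matching rule (offered value must be at least the requested value) is an assumption. It mainly shows why the quoted version strings should be compared numerically, not as text.

```python
# Hypothetical sketch, not the Arvados scheduler. Field names mirror the
# CUDA section of the instance type entry described above.
from dataclasses import dataclass
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "11.4" into (11, 4) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class CUDA:
    device_count: int
    driver_version: str
    hardware_capability: str


def satisfies(offered: CUDA, wanted: CUDA) -> bool:
    """Assumed rule: every offered value must be at least the requested one."""
    return (
        offered.device_count >= wanted.device_count
        and parse_version(offered.driver_version)
        >= parse_version(wanted.driver_version)
        and parse_version(offered.hardware_capability)
        >= parse_version(wanted.hardware_capability)
    )


# Values taken from the g4dn.xlarge description above.
g4dn_xlarge = CUDA(device_count=1, driver_version="11.4", hardware_capability="7.5")

print(satisfies(g4dn_xlarge, CUDA(1, "11.0", "7.0")))   # True
print(satisfies(g4dn_xlarge, CUDA(1, "11.10", "7.5")))  # False: 11.10 is newer than 11.4
print(satisfies(g4dn_xlarge, CUDA(2, "11.4", "7.5")))   # False: only one GPU
```

The second case is the one a plain string comparison would get wrong, since "11.10" sorts before "11.4" as text.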
Updated by Peter Amstutz about 3 years ago
- Description updated (diff)
- Subject changed from GPU support on dev cluster to GPU support on tordo cluster
Updated by Ward Vandewege about 3 years ago
- Status changed from New to In Progress
Updated by Ward Vandewege about 3 years ago
- Related to Feature #18325: Option to include CUDA tooling in cloud compute image added
Updated by Ward Vandewege about 3 years ago
- Related to Story #15957: GPU support added
Updated by Ward Vandewege about 3 years ago
The new image is in place, and this entry was added to `config.yml`:
    g4dnxlarge:
      ProviderType: g4dn.xlarge
      VCPUs: 4
      RAM: 16GiB
      IncludedScratch: 125GB
      Price: 0.526
      CUDA:
        DriverVersion: "11.4"
        HardwareCapability: "7.5"
        DeviceCount: 1
Diagnostics are running at
https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-g5nv9tz7ez9vn2a#Status
Container shell support was already enabled.
Updated by Ward Vandewege about 3 years ago
- Status changed from In Progress to Resolved
The new image is in place, and this entry was added to `config.yml`:
    g4dnxlarge:
      ProviderType: g4dn.xlarge
      VCPUs: 4
      RAM: 16GiB
      IncludedScratch: 125GB
      Price: 0.526
      CUDA:
        DriverVersion: "11.4"
        HardwareCapability: "7.5"
        DeviceCount: 1
Diagnostics completed successfully at
https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-g5nv9tz7ez9vn2a#Status
Container shell support was already enabled.