Using the Whole NVIDIA GPU Card for an Application

This section describes how to allocate the entire NVIDIA GPU card to a single application on the DCE 5.0 platform.

Prerequisites

The DCE 5.0 container management platform has been deployed and is running properly.
The container management module has been connected to a Kubernetes cluster, or a Kubernetes cluster has been created, and you can access the cluster's UI.
GPU Operator has been installed offline and NVIDIA DevicePlugin has been enabled on the current cluster. Refer to Offline Installation of GPU Operator for instructions.
The GPU cards in the current cluster have not been virtualized and are not occupied by other applications.

Procedure

Configuring via the User Interface

Check whether the cluster has detected the GPU cards. For the target cluster, go to Clusters -> Cluster Settings -> Addon Plugins and confirm that GPU has been enabled and the GPU type has been detected. Currently, the cluster automatically enables GPU and sets the GPU Type to Nvidia GPU.
Deploy a workload. For the target cluster, go to Clusters -> Workloads and create a workload from an image. After selecting the GPU type (Nvidia GPU), configure the number of physical cards used by the application:

Physical Card Count (nvidia.com/gpu): The number of physical cards that the current pod needs to mount. The value must be an integer less than or equal to the number of cards on the host machine.

If the above value is configured incorrectly, scheduling failures and resource allocation issues may occur.

Configuring via YAML

To request GPU resources for a workload, add nvidia.com/gpu: 1 to both the resource requests and the resource limits in the YAML file. This parameter sets the number of physical cards used by the application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: full-gpu-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: full-gpu-demo
  template:
    metadata:
      labels:
        app: full-gpu-demo
    spec:
      containers:
      - image: chrstnhntschl/gpu_burn
        name: container-0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # Number of GPUs requested
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # Upper limit of GPU usage
      imagePullSecrets:
      - name: default-secret

Note

When you use the nvidia.com/gpu parameter to specify the number of GPUs, the values in requests and limits must be equal, because Kubernetes does not allow extended resources such as GPUs to be overcommitted.
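
To check that a whole card is actually exposed to a container, one option is a short-lived Pod that requests one card and lists the GPUs it can see. The following is a minimal sketch rather than an official platform example: the Pod name gpu-check and the image tag are assumptions, and in an offline environment the image would need to be replaced with a CUDA base image available in your registry. The sketch sets only limits, which Kubernetes treats as an equal request for extended resources such as nvidia.com/gpu.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-check              # hypothetical name, chosen for illustration
  namespace: default
spec:
  restartPolicy: Never         # run once and exit
  containers:
  - name: gpu-check
    # Assumed image; substitute any CUDA base image reachable from the cluster
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    # List the GPUs visible inside the container
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/gpu: 1      # one whole physical card; the request defaults to the same value
```

If allocation worked as expected, the Pod log (for example via kubectl logs gpu-check) should list a single GPU. A Pod that stays in Pending usually indicates that no node has a free whole card.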