This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Workload management

Common tasks for managing workloads.

1: Deploy test workload
2: Add an ingress controller
3: Using NVIDIA GPU Operator with EKS Anywhere
4:
5:

1 - Deploy test workload

How to deploy a workload to check that your cluster is working properly

We’ve created a simple test application for you to verify your cluster is working properly. You can deploy it with the following command:

kubectl apply -f "https://anywhere.eks.amazonaws.com/manifests/hello-eks-a.yaml"

To see the new pod running in your cluster, type:

kubectl get pods -l app=hello-eks-a

Example output:

NAME                                     READY   STATUS    RESTARTS   AGE
hello-eks-a-745bfcd586-6zx6b   1/1     Running   0          22m

To check the logs of the container to make sure it started successfully, type:

kubectl logs -l app=hello-eks-a

There is also a default web page being served from the container. You can forward the deployment port to your local machine with

kubectl port-forward deploy/hello-eks-a 8000:80

Now you should be able to open your browser or use curl to http://localhost:8000 to view the page example application.

curl localhost:8000

Example output:

⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢

Thank you for using

███████╗██╗  ██╗███████╗
██╔════╝██║ ██╔╝██╔════╝
█████╗  █████╔╝ ███████╗
██╔══╝  ██╔═██╗ ╚════██║
███████╗██║  ██╗███████║
╚══════╝╚═╝  ╚═╝╚══════╝

 █████╗ ███╗   ██╗██╗   ██╗██╗    ██╗██╗  ██╗███████╗██████╗ ███████╗
██╔══██╗████╗  ██║╚██╗ ██╔╝██║    ██║██║  ██║██╔════╝██╔══██╗██╔════╝
███████║██╔██╗ ██║ ╚████╔╝ ██║ █╗ ██║███████║█████╗  ██████╔╝█████╗  
██╔══██║██║╚██╗██║  ╚██╔╝  ██║███╗██║██╔══██║██╔══╝  ██╔══██╗██╔══╝  
██║  ██║██║ ╚████║   ██║   ╚███╔███╔╝██║  ██║███████╗██║  ██║███████╗
╚═╝  ╚═╝╚═╝  ╚═══╝   ╚═╝    ╚══╝╚══╝ ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚══════╝

You have successfully deployed the hello-eks-a pod hello-eks-a-c5b9bc9d8-qp6bg

For more information check out
https://anywhere.eks.amazonaws.com

⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢

If you would like to expose your applications with an external load balancer or an ingress controller, you can follow the steps in Adding an external load balancer .

2 - Add an ingress controller

How to deploy an ingress controller for simple host or URL-based HTTP routing into workload running in EKS-A

While you are free to use any Ingress Controller you like with your EKS Anywhere cluster, AWS currently only supports Emissary Ingress. For information on how to configure a Emissary Ingress curated package for EKS Anywhere, see the Add Emissary Ingress page. You may also reference the official emissary documentation for further configuration details. Operators can also leverage the CNI chaining feature from Isovalent where in both Cilium as the CNI and another CNI can work in a chain mode .

Setting up Emissary-ingress for Ingress Controller

Deploy the Hello EKS Anywhere test application.

kubectl apply -f "https://anywhere.eks.amazonaws.com/manifests/hello-eks-a.yaml"

Set up a load balancer: Set up MetalLB Load Balancer by following the instructions here
Install Emissary Ingress: Follow the instructions here Add Emissary Ingress

Create Emissary Listeners on your cluster (This is a one time setup).

kubectl apply -f - <<EOF
---
apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: http-listener
  namespace: default
spec:
  port: 8080
  protocol: HTTP
  securityModel: XFP
  hostBinding:
    namespace:
      from: ALL
---
apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: https-listener
  namespace: default
spec:
  port: 8443
  protocol: HTTPS
  securityModel: XFP
  hostBinding:
    namespace:
      from: ALL
EOF

Create a Mapping, and Host for your cluster. This Mapping tells Emissary-ingress to route all traffic inbound to the /hello/ path to the Hello EKS Anywhere Service. The name of your hello-eks-anywhere service will be the same as the package name.
```
kubectl apply -f - <<EOF
---
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: hello-backend
  labels:
    examplehost: host 
spec:
  prefix: /hello/
  service: hello-eks-a
  hostname: "*"
EOF
```
Store the Emissary-ingress load balancer IP address to a local environment variable. You will use this variable to test accessing your service. You can find this if you’re using a setup with MetalLB by finding the namespace you launched your emissary service in, and finding the external IP from the service.
```
emissary-cluster        LoadBalancer   10.100.71.222   195.16.99.64   80:31794/TCP,443:31200/TCP

export EMISSARY_LB_ENDPOINT=195.16.99.64
```
Test the configuration by accessing the service through the Emissary-ingress load balancer.
```
curl -Lk http://$EMISSARY_LB_ENDPOINT/hello/
```
⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢

Thank you for using

███████╗██╗ ██╗███████╗ ██╔════╝██║ ██╔╝██╔════╝ █████╗ █████╔╝ ███████╗ ██╔══╝ ██╔═██╗ ╚════██║ ███████╗██║ ██╗███████║ ╚══════╝╚═╝ ╚═╝╚══════╝

█████╗ ███╗ ██╗██╗ ██╗██╗ ██╗██╗ ██╗███████╗██████╗ ███████╗ ██╔══██╗████╗ ██║╚██╗ ██╔╝██║ ██║██║ ██║██╔════╝██╔══██╗██╔════╝ ███████║██╔██╗ ██║ ╚████╔╝ ██║ █╗ ██║███████║█████╗ ██████╔╝█████╗ ██╔══██║██║╚██╗██║ ╚██╔╝ ██║███╗██║██╔══██║██╔══╝ ██╔══██╗██╔══╝ ██║ ██║██║ ╚████║ ██║ ╚███╔███╔╝██║ ██║███████╗██║ ██║███████╗ ╚═╝ ╚═╝╚═╝ ╚═══╝ ╚═╝ ╚══╝╚══╝ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚══════╝

You have successfully deployed the hello-eks-a pod hello-eks-anywhere-95fb65657-vk9rz

For more information check out https://anywhere.eks.amazonaws.com

Amazon EKS Anywhere Run EKS in your datacenter version: v0.1.2-11d92fc1e01c17601e81c7c29ea4a3db232068a8

⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢⬡⬢

3 - Using NVIDIA GPU Operator with EKS Anywhere

How to use the NVIDIA GPU Operator with EKS Anywhere on bare metal

The NVIDIA GPU Operator allows GPUs to be exposed to applications in Kubernetes clusters much like CPUs. Instead of provisioning a special OS image for GPU nodes with the required drivers and dependencies, a standard OS image can be used for both CPU and GPU nodes. The NVIDIA GPU Operator can be used to provision the required software components for GPUs such as the NVIDIA drivers, Kubernetes device plugin for GPUs, and the NVIDIA Container Toolkit. See the licensing section of the NVIDIA GPU Operator documentation for information on the NVIDIA End User License Agreements.

In the example on this page, a single-node EKS Anywhere cluster on bare metal is used with an Ubuntu 20.04 image produced from image-builder without modifications and Kubernetes version 1.27.

1. Configure an EKS Anywhere cluster spec and hardware inventory

See the Configure for Bare Metal page and the Prepare hardware inventory page for details. If you use cluster spec sample below is used, your hardware inventory definition must have type=cp for the labels field in the hardware inventory for your server.

Expand for a sample cluster spec

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: gpu-test
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneConfiguration:
    count: 1
    endpoint:
      host: "<my-cp-ip>"
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: gpu-test-cp
  datacenterRef:
    kind: TinkerbellDatacenterConfig
    name: gpu-test
  kubernetesVersion: "1.27"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
  name: gpu-test
spec:
  tinkerbellIP: "<my-tb-ip>"
  osImageURL: "https://<url-for-image>/ubuntu.gz"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: gpu-test-cp
spec:
  hardwareSelector: {type: "cp"}
  osFamily: ubuntu
  templateRef: {}

2. Create a single-node EKS Anywhere cluster

Replace hardware.csv with the name of your hardware inventory file
Replace cluster.yaml with the name of your cluster spec file

eksctl anywhere create cluster --hardware hardware.csv -f cluster.yaml

Expand for sample output

Warning: The recommended number of control plane nodes is 3 or 5
Warning: No configurations provided for worker node groups, pods will be scheduled on control-plane nodes
Performing setup and validations
Private key saved to gpu-test/eks-a-id_rsa. Use 'ssh -i gpu-test/eks-a-id_rsa <username>@<Node-IP-Address>' to login to your cluster node
✅ Tinkerbell Provider setup is valid
✅ Validate OS is compatible with registry mirror configuration
✅ Validate certificate for registry mirror
✅ Validate authentication for git provider
Creating new bootstrap cluster
Provider specific pre-capi-install-setup on bootstrap cluster
Installing cluster-api providers on bootstrap cluster
Provider specific post-setup
Creating new workload cluster
Installing networking on workload cluster
Creating EKS-A namespace
Installing cluster-api providers on workload cluster
Installing EKS-A secrets on workload cluster
Installing resources on management cluster
Moving cluster management from bootstrap to workload cluster
Installing EKS-A custom components (CRD and controller) on workload cluster
Installing EKS-D components on workload cluster
Creating EKS-A CRDs instances on workload cluster
Installing GitOps Toolkit on workload cluster
GitOps field not specified, bootstrap flux skipped
Writing cluster config file
Deleting bootstrap cluster
🎉 Cluster created!

3. Install Helm

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \ 
  && chmod 700 get_helm.sh \ 
  && ./get_helm.sh

4. Add NVIDIA Helm Repository

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \ 
  && helm repo update

5. Configure kubectl to use EKS Anywhere cluster

Replace <path-to-cluster-folder> with the directory location where your EKS Anywhere cluster folder is located. This is typically in the same directory in which the eksctl anywhere command was run.
Replace <cluster-name> with the name of your cluster.

KUBECONFIG=<path-to-cluster-folder>/<cluster-name>-eks-a-cluster.kubeconfig

6. Install NVIDIA GPU Operator

helm install --wait --generate-name \ 
  -n gpu-operator --create-namespace \ 
  nvidia/gpu-operator

7. Validate the operator was installed successfully

kubectl get pods -n gpu-operator

NAME                                                              READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-6djnw                                       1/1     Running     0          5m25s
gpu-operator-1691443998-node-feature-discovery-master-55cfkzbl5   1/1     Running     0          5m55s
gpu-operator-1691443998-node-feature-discovery-worker-dw8m7       1/1     Running     0          5m55s
gpu-operator-59f96d7646-7zcn4                                     1/1     Running     0          5m55s
nvidia-container-toolkit-daemonset-c2mdf                          1/1     Running     0          5m25s
nvidia-cuda-validator-6m4kg                                       0/1     Completed   0          3m41s
nvidia-dcgm-exporter-jw5wz                                        1/1     Running     0          5m25s
nvidia-device-plugin-daemonset-8vjrn                              1/1     Running     0          5m25s
nvidia-driver-daemonset-6hklg                                     1/1     Running     0          5m36s
nvidia-operator-validator-2pvzx                                   1/1     Running     0          5m25s

8. Validate GPU specs

kubectl get node -o json | jq '.items[].metadata.labels'

{
... 
  "nvidia.com/cuda.driver.major": "535",
  "nvidia.com/cuda.driver.minor": "86",
  "nvidia.com/cuda.driver.rev": "10",
  "nvidia.com/cuda.runtime.major": "12",
  "nvidia.com/cuda.runtime.minor": "2",
  "nvidia.com/gfd.timestamp": "1691444179",
  "nvidia.com/gpu-driver-upgrade-state": "upgrade-done",
  "nvidia.com/gpu.compute.major": "7",
  "nvidia.com/gpu.compute.minor": "5",
  "nvidia.com/gpu.count": "2",
  "nvidia.com/gpu.deploy.container-toolkit": "true",
  "nvidia.com/gpu.deploy.dcgm": "true",
  "nvidia.com/gpu.deploy.dcgm-exporter": "true",
  "nvidia.com/gpu.deploy.device-plugin": "true",
  "nvidia.com/gpu.deploy.driver": "true",
  "nvidia.com/gpu.deploy.gpu-feature-discovery": "true",
  "nvidia.com/gpu.deploy.node-status-exporter": "true",
  "nvidia.com/gpu.deploy.nvsm": "",
  "nvidia.com/gpu.deploy.operator-validator": "true",
  "nvidia.com/gpu.family": "turing",
  "nvidia.com/gpu.machine": "PowerEdge-R7525",
  "nvidia.com/gpu.memory": "15360",
  "nvidia.com/gpu.present": "true",
  "nvidia.com/gpu.product": "Tesla-T4",
  "nvidia.com/gpu.replicas": "1",
  "nvidia.com/mig.capable": "false",
  "nvidia.com/mig.strategy": "single"
}

9. Run Sample App

Create a gpu-pod.yaml file with the following and apply it to the cluster

apiVersion: v1 
kind: Pod 
metadata: 
  name: gpu-pod 
spec: 
  restartPolicy: Never 
  containers: 
   - name: cuda-container 
     image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 
     resources: 
       limits: 
         nvidia.com/gpu: 1 # requesting 1 GPU 
  tolerations: 
    - key: nvidia.com/gpu operator: Exists 
      effect: NoSchedule

kubectl apply -f gpu-pod.yaml

10. Confirm Sample App Succeeded

kubectl logs gpu-pod

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

4 -

Warning: The recommended number of control plane nodes is 3 or 5
Warning: No configurations provided for worker node groups, pods will be scheduled on control-plane nodes
Performing setup and validations
Private key saved to gpu-test/eks-a-id_rsa. Use 'ssh -i gpu-test/eks-a-id_rsa <username>@<Node-IP-Address>' to login to your cluster node
✅ Tinkerbell Provider setup is valid
✅ Validate OS is compatible with registry mirror configuration
✅ Validate certificate for registry mirror
✅ Validate authentication for git provider
Creating new bootstrap cluster
Provider specific pre-capi-install-setup on bootstrap cluster
Installing cluster-api providers on bootstrap cluster
Provider specific post-setup
Creating new workload cluster
Installing networking on workload cluster
Creating EKS-A namespace
Installing cluster-api providers on workload cluster
Installing EKS-A secrets on workload cluster
Installing resources on management cluster
Moving cluster management from bootstrap to workload cluster
Installing EKS-A custom components (CRD and controller) on workload cluster
Installing EKS-D components on workload cluster
Creating EKS-A CRDs instances on workload cluster
Installing GitOps Toolkit on workload cluster
GitOps field not specified, bootstrap flux skipped
Writing cluster config file
Deleting bootstrap cluster
🎉 Cluster created!

5 -

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: gpu-test
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneConfiguration:
    count: 1
    endpoint:
      host: "<my-cp-ip>"
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: gpu-test-cp
  datacenterRef:
    kind: TinkerbellDatacenterConfig
    name: gpu-test
  kubernetesVersion: "1.27"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
  name: gpu-test
spec:
  tinkerbellIP: "<my-tb-ip>"
  osImageURL: "https://<url-for-image>/ubuntu.gz"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: gpu-test-cp
spec:
  hardwareSelector: {type: "cp"}
  osFamily: ubuntu
  templateRef: {}