Release Notes

This document provides information about the 3.6.2 release of the Ultima Enterprise product running on Baremetal, VMware vSphere, AWS, and GCP.

What’s New

The release has the following software versions:

  • Kubernetes 1.25.14

  • CRI-O version 1.25

  • Kubevirt version 0.58.0

  • Rocky OS 8.6

Resource Maximums

The release supports the following maximum number of resources (per node):

Resource                       Maximum (per node)
--------                       ------------------
Pods                           110
Volumes                        2048 (total) / 64 (active)*
Storage controllers            64
Remote storage controllers     64

The maximum number of volumes on a node is 2048. This count is the sum of all mirrors of all volumes, plus snapshots, linked clones, and so on; a snapshot is itself counted as a volume.

The maximum number of snapshots per volume is 16.

The maximum number of linked-clone volumes per snapshot is limited only by the maximum number of volumes on the node.

A node can expose up to 64 volumes to the host (active volumes) and can simultaneously serve up to 64 volumes as a target. For example, in a three-node cluster:

  • The number of simple volumes (single mirror): 6K

  • The number of 2-way mirrored volumes: 3K

  • The number of 3-way mirrored volumes: 2K
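The bullet figures above follow from the per-node budget: with 2048 volume entries per node and each mirror counted on the node that hosts it, a three-node cluster has 3 × 2048 = 6144 plex slots in total. A quick sketch of the arithmetic, using the cluster size from the example:

```shell
# Back-of-envelope check of the example above: each node holds at most
# 2048 volume entries (plexes), so a three-node cluster has 6144 in total.
nodes=3
per_node=2048
total=$((nodes * per_node))
echo "single-mirror volumes: $((total / 1))"   # ~6K
echo "2-way mirrored:        $((total / 2))"   # ~3K
echo "3-way mirrored:        $((total / 3))"   # ~2K
```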

Release Requirements

Diamanti 3.6.2 uses Diamanti OS Release rocky 8.6.0-44 on:

  • Baremetal machine

  • AWS AMI

  • GCP VM Image

  • VMware vSphere OVA package

The following machine types are supported as Diamanti cluster nodes:

AWS

Machine Type     vCPU   Memory
------------     ----   ------
m5d.16xlarge     64     256 GiB
i4i.16xlarge     64     512 GiB

GCP

Machine Type      vCPU   Memory
------------      ----   ------
n1-highmem-32     32     208 GB
n1-standard-32    32     120 GB

Supported Regions

This section lists the supported cloud regions for the Diamanti 3.6.2 release.

AWS

us-east-1    ca-central-1   eu-west-3        ap-southeast-2
us-east-2    eu-central-1   eu-north-1       ap-northeast-1
us-west-1    eu-west-1      ap-south-1       ap-northeast-2
us-west-2    eu-west-2      ap-southeast-1   ap-northeast-3

GCP

us-central1   europe-central2   asia-east1        asia-southeast1
us-east1      europe-north1     asia-east2        asia-southeast2
us-east4      europe-west1      asia-northeast1   australia-southeast1
us-west1      europe-west2      asia-northeast2   australia-southeast2
us-west2      europe-west3      asia-northeast3   northamerica-northeast1
us-west3      europe-west4      asia-south1       northamerica-northeast2
us-west4      europe-west6      asia-south2       southamerica-east1

Known Issues

This section lists the known issues for the Diamanti 3.6.2 release.

Summary: A pod with a volume may get stuck during termination.

Description: On pod termination, kubelet sometimes fails to unmount CSI volumes and the following error is seen in the kubelet log:

E0422 13:28:12.214727 1861 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "persistentvolumemount-6wg4ywt2xwwgkyygj6tyo2xhrm" (UniqueName: "kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-97172dfa-9e15-11eb-8ecd-26ee97ae76aa") pod "dc3f11b7-d671-464c-a734-65e1760ed7df" (UID: "dc3f11b7-d671-464c-a734-65e1760ed7df") : UnmountVolume.NewUnmounter failed for volume "persistentvolumemount-6wg4ywt2xwwgkyygj6tyo2xhrm" (UniqueName: "kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-97172dfa-9e15-11eb-8ecd-26ee97ae76aa") pod "dc3f11b7-d671-464c-a734-65e1760ed7df" (UID: "dc3f11b7-d671-464c-a734-65e1760ed7df") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/vol_data.json]: open /var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/vol_data.json: no such file or directory

This is a known Kubernetes issue (https://github.com/kubernetes/kubernetes/issues/101378); the kubelet log shows “UnmountVolume.NewUnmounter failed for volume” (https://github.com/kubernetes/kubernetes/issues/101911). Once this bug is triggered, the pod is stuck in the Terminating state and the above error message appears continuously in the kubelet log.

Workaround: Delete the pod forcefully. This does not affect the volume, since the volume is no longer in use.
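Before force-deleting, it can help to confirm which pod and PVC the error refers to: the pod UID and PVC name can be read straight out of the failing path. A small sketch, using the example log path from the error above:

```shell
# Sketch: extract the missing vol_data.json path from a kubelet log line,
# then pull the pod UID and PVC name out of its path components.
LOGLINE='... failed to open volume data file [/var/lib/kubelet/pods/dc3f11b7-d671-464c-a734-65e1760ed7df/volumes/kubernetes.io~csi/pvc-b27ae1f5-bfa6-4951-aebb-ff33280796df/vol_data.json]: no such file or directory'
path=$(echo "$LOGLINE" | grep -o '/var/lib/kubelet/pods/[^]]*vol_data.json')
pod_uid=$(echo "$path" | cut -d/ -f6)   # 6th path component is the pod UID
pvc=$(echo "$path" | cut -d/ -f9)       # 9th is the PVC directory name
echo "pod UID: $pod_uid"
echo "PVC:     $pvc"
```

The extracted UID can then be matched against a pod's metadata.uid to identify the pod to force-delete.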

Summary: When a node is rebooted or restarted after a failure, the Kubevirt VM may take longer to terminate before it comes back up.

Description: With the current Kubevirt behavior, the VM does not move to another node unless there is a clean shutdown or reboot.

Workaround: In this case, the VM pod must be force-deleted.

You can take the following steps to recover. First, check that the virtual machine pod is stuck in the Terminating state:

$ kubectl get pod

NAME                         READY   STATUS        RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
virt-launcher-centos-kbd95   1/1     Terminating   0          41m   172.46.0.10   static-n3   <none>           1/1

Run the following command to force-delete the VM pod. After the forced deletion, a replacement pod enters the ContainerCreating state.

$ kubectl delete pod virt-launcher-centos-kbd95 --force

Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "virt-launcher-centos-kbd95" force deleted
$ kubectl get pod
NAME                         READY   STATUS              RESTARTS   AGE   IP       NODE        NOMINATED NODE   READINESS GATES
virt-launcher-centos-zh7sw   0/1     ContainerCreating   0          6s    <none>   static-n1   <none>           1/1

During container creation, the volume attachment may stay in the creating state for up to 6 minutes until it is deleted by Kubernetes. Manually deleting the volume attachment speeds up pod startup.

Note

Delete the volume attachment only if its node is the failed node. In this example, static-n3 is the failed node and static-n1 is the new node on which the new VM pod is scheduled, so it is safe to delete the attachment.

$ kubectl delete volumeattachment csi-d5f636881867fd7be98e44eea6fe5ce5a0fe68fe4c9beef19da829128d17a892

On node static-n1, the pod enters the running state.

$ kubectl get pod

NAME                         READY   STATUS    RESTARTS   AGE     IP           NODE        NOMINATED NODE   READINESS GATES
virt-launcher-centos-zh7sw   1/1     Running   0          6m31s   172.46.8.7   static-n1   <none>           1/1

Summary: A kubevirt/virtio-container-disk image does not load by default in an air-gapped cluster.

Description: The kubevirt/virtio-container-disk image for the virtio drivers is not loaded by default in an air-gapped cluster when a VM is created from an ISO file in Kubevirt.

Workaround: Manually load the kubevirt/virtio-container-disk image on all the nodes that have Kubevirt enabled.
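One way to script the manual load, as a sketch only: it prints the per-node commands so they can be reviewed before running. The node names, the scp/ssh transfer, and the use of podman (with the image tar created beforehand on a connected host via `podman save`) are assumptions, not a documented procedure:

```shell
# Sketch (assumptions: the image tar was created on a connected host, and
# podman is available on the nodes). Prints the commands instead of
# running them, so they can be reviewed first.
IMAGE_TAR=virtio-container-disk.tar
NODES="static-n1 static-n2 static-n3"   # example node names
for node in $NODES; do
  echo "scp ${IMAGE_TAR} diamanti@${node}:/tmp/"
  echo "ssh diamanti@${node} sudo podman load -i /tmp/${IMAGE_TAR}"
done
```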

Summary: Node shutdown is not supported; if a node is shut down, a plex becomes unusable.

Description: Because shutdown is not supported, the mirror plex on the shut-down node must be removed from the volume and then added back.

Workaround: A node shutting down may leave unusable plexes that need to be removed from the mirrored volume, and drives that need to be formatted. A new plex can then be added back to maintain the number of plexes on the volume. For more information, see Recovery from Node Shutdown for AWS and Recovery from Node Shutdown for GCP.

Summary: Rebooting may cause a target plex of a mirrored volume to go out of sync.

Description: Target plexes of mirrored volumes go out of sync if one of the target nodes is rebooted, owing to inconsistencies with the other plexes of the same mirrored volume.

Workaround: Detach the plex, attach it back, and let it resynchronize.

This example shows how to find the plex that is out of sync and how to detach and reattach it. First, identify the out-of-sync plex by describing the volume:

 $ dctl volume describe test-vol8

 Name                                          : test-vol8
 Size                                          : 38.3GB
 Encryption                                    : false
 Node                                          : [ip-172-31-1-245.ec2.internal ip-172-31-3-166.ec2.internal ip-172-31-2-171.ec2.internal]
 Label                                         : diamanti.com/pod-name=default/test-vol8-attached-manually
 Node Selector                                 : mirror=true
 Phase                                         : Available
 Status                                        : Down
 Attached-To                                   : ip-172-31-3-166.ec2.internal
 Device Path                                   : /dev/nvme8n1
 Age                                           : 0d:2h:27m
 Perf-Tier                                     : best-effort
 Mode                                          : Filesystem
 Fs-Type                                       : ext4
 Scheduled Plexes / Actual Plexes              : 3/3

Plexes:

    NAME           NODES                          STATE     CONDITION   OUT-OF-SYNC-AGE   RESYNC-PROGRESS   DELETE-PROGRESS

    ----           -----                          -----     ---------   ---------------   ---------------   ---------------
    test-vol8.p0   ip-172-31-1-245.ec2.internal   Up        OutOfSync    0d:0h:13m
    test-vol8.p1   ip-172-31-3-166.ec2.internal   Up        InSync
    test-vol8.p2   ip-172-31-2-171.ec2.internal   Up        InSync

Detach the plex that is out of sync:

$ dctl volume plex-detach test-vol8 p0

Attach the plex back and let it resynchronize:

$ dctl volume plex-attach test-vol8 p0