June 27, 2017 by James

VMware consolidation issue – Could not open/create change tracking file.

Troubleshooting a VM that was failing snapshot consolidation with the error:

b372d49a-dfc9-b8ac6f98410c/GUESTVM/GUESTVM_1-ctk.vmdk: Could not open/create change tracking file. 2017-05-23T07:51:36.671Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/4ea7e864-b372d49a-dfc9-b8ac6f98410c/GUESTVM/GUESTVM_1-flat.vmdk" : closed.

Ordinarily, the fix is to move the VM's change tracking (ctk) file into a temporary directory, which allows disk consolidation to complete, as detailed here: VMware: Unable to consolidate VM
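A minimal sketch of that move, assuming the VM is powered off and the commands are run from the ESXi shell. The function name move_ctk_aside and the timestamped ctk-backup-* directory are hypothetical, not taken from the linked article.

```shell
# Sketch: move a VM's change tracking (*-ctk.vmdk) files into a timestamped
# subdirectory so that consolidation can proceed without them.
# Assumptions: the VM is powered off and this runs in the ESXi shell.
# The function name and "ctk-backup-<timestamp>" directory are hypothetical.
move_ctk_aside() {
    vmdir=$1
    [ -d "$vmdir" ] || { echo "usage: move_ctk_aside VM_DIRECTORY" >&2; return 1; }
    backup="$vmdir/ctk-backup-$(date +%Y%m%d%H%M%S)"
    mkdir "$backup" || return 1
    moved=0
    # The glob only matches names ending in -ctk.vmdk (not -ctk.vmdk.bak).
    for f in "$vmdir"/*-ctk.vmdk; do
        [ -e "$f" ] || continue          # glob matched nothing
        mv "$f" "$backup"/ || return 1
        echo "moved ${f##*/} -> $backup/"
        moved=$((moved + 1))
    done
    if [ "$moved" -eq 0 ]; then
        rmdir "$backup"                  # nothing was moved; remove empty dir
        echo "no *-ctk.vmdk files in $vmdir" >&2
        return 1
    fi
}
```

For example, move_ctk_aside /vmfs/volumes/LUN/GUESTVM. Keeping the files in a subdirectory of the VM folder, rather than deleting them, means they can be put back if needed.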

However, on this occasion, the VM still wouldn’t start:

An error was received from the ESX host while powering on VM GUESTVM.
Failed to start the virtual machine.
Module DiskEarly power on failed.
Cannot open the disk '/vmfs/volumes/4ea7e864-b372d49a-dfc9-b8ac6f98410c/GUESTVM/GUESTVM_2.vmdk' or one of the snapshot disks it depends on.
Could not open/create change tracking file.

After SSHing onto the host and looking in the VM's folder on the datastore (/vmfs/volumes/LUN/GUESTVM), there were far fewer files as a result of the consolidation.

However, one ctk file remained, and a harmless attempt to modify it with touch (which only updates timestamps) failed, as it did for two of the delta disks:

/vmfs/volumes/4ea7e864-b372d49a-dfc9-b8ac6f98410c/GUESTVM # touch *
touch: GUESTVM-ctk.vmdk.bak: Device or resource busy
touch: GUESTVM_1-000001-delta.vmdk: Device or resource busy
touch: GUESTVM_2-000001-delta.vmdk: Device or resource busy
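The same check can be run per file so that only the failing files are listed, each with its error. This is a sketch; list_busy_files is a hypothetical helper name, and like touch * it still updates the modification time of every file it can open.

```shell
# Sketch: list files in a VM directory that cannot be touched, with the error.
# On VMFS, "Device or resource busy" typically indicates a lock held by a host.
# Note: touch updates the modification time of every file it *can* open.
# The function name is hypothetical.
list_busy_files() {
    dir=${1:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip directories and empty globs
        if ! err=$(touch "$f" 2>&1); then
            printf '%s\t%s\n' "${f##*/}" "$err"
        fi
    done
}
```

Run from the VM's folder as list_busy_files . to print one tab-separated line per busy file.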

Neither ps nor lsof offered any clues, but vmkfstools -D, which dumps a file's VMFS metadata including its lock, revealed who owned each lock:

/vmfs/volumes/4ea7e864-b372d49a-dfc9-b8ac6f98410c/GUESTVM # vmkfstools -D GUESTVM_2-000001-delta.vmdk

Lock [type 10c00001 offset 33779712 v 37184, hb offset 3207168 gen 15, mode 2, owner 00000000-00000000-0000-000000000000 mtime 3479469 num 1 gblnum 0 gblgen 0 gblbrk 0]
RO Owner[0] HB Offset 3448832 5787aab6-a86eeafb-d2d5-d067e5f051a2
Addr <4, 41, 38>, gen 36627, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 16883712, nb 17 tbz 0, cow 0, newSinceEpoch 17, zla 1, bs 1048576

/vmfs/volumes/4ea7e864-b372d49a-dfc9-b8ac6f98410c/GUESTVM # vmkfstools -D GUESTVM_1-000001-delta.vmdk

Lock [type 10c00001 offset 33804288 v 36698, hb offset 3207168 gen 15, mode 1, owner 5787befc-b348adf2-2ea5-001018f4ef3c mtime 30845193 num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 41, 50>, gen 36689, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 18878464, nb 19 tbz 0, cow 0, newSinceEpoch 19, zla 1, bs 1048576

/vmfs/volumes/4ea7e864-b372d49a-dfc9-b8ac6f98410c/GUESTVM # vmkfstools -D GUESTVM-ctk.vmdk.bak

Lock [type 10c00001 offset 33886208 v 79047, hb offset 3207168 gen 15, mode 1, owner 5787befc-b348adf2-2ea5-001018f4ef3c mtime 107885874 num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 41, 90>, gen 79036, links 1, type reg, flags 0, uid 0, gid 0, mode 600 len 2621952, nb 3 tbz 0, cow 0, newSinceEpoch 3, zla 1, bs 1048576
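Picking the owners out of several of these dumps by eye is error-prone, so here is a sketch that extracts them from vmkfstools -D output. It is based only on the output format shown above: exclusive locks (mode 1) name the holder after "owner", while read-only locks (mode 2) show an all-zero owner and list holders as "RO Owner[n] HB Offset <n> <uuid>". The function name lock_owners is hypothetical.

```shell
# Sketch: print the unique lock-owner UUIDs found in `vmkfstools -D` output
# read from stdin. A UUID here is 35 characters: 8-8-4-12 hex digits.
# The all-zero owner of a read-only lock is dropped; its real holders are
# taken from the "RO Owner[n] HB Offset <n> <uuid>" entries.
# The function name is hypothetical.
lock_owners() {
    grep -oE 'owner [0-9a-f-]{35}|HB Offset [0-9]+ [0-9a-f-]{35}' |
        awk '{ print $NF }' |
        grep -v '^00000000-00000000-0000-000000000000$' |
        sort -u
}
```

From the VM's folder, something like: for f in *; do echo "$f: $(vmkfstools -D "$f" 2>&1 | lock_owners)"; done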

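The last 12 hex digits of each owner UUID are the MAC address of the host holding the lock: 001018f4ef3c holds GUESTVM_1-000001-delta.vmdk and GUESTVM-ctk.vmdk.bak exclusively, while GUESTVM_2-000001-delta.vmdk has an all-zero owner and a read-only owner ending d067e5f051a2. Using the vSphere C# client's Host > Configuration > Network Adapters page, it was possible to identify the host with the lock. The host was then evacuated and rebooted.

One lock remained, held by another host. Rather than evacuating and rebooting that host as well, we simply powered the VM up on it. This worked, and the VM started successfully.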
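The UUID-to-MAC step can be scripted. A minimal sketch, assuming (as VMware's documentation on investigating virtual machine file locks describes) that the last 12 hex digits of an owner UUID are the MAC address of a physical NIC on the locking host; uuid_to_mac is a hypothetical helper. The result can be compared with each host's Network Adapters page, or with the MAC Address column of esxcli network nic list run on each host.

```shell
# Sketch: convert a VMFS lock-owner UUID into the MAC address it ends with.
# Assumption: the final 12 hex digits are a physical NIC MAC of the ESXi host
# holding the lock. The function name is hypothetical.
uuid_to_mac() {
    # Keep the text after the last "-", insert ":" after every two
    # characters, then drop the trailing ":".
    printf '%s\n' "$1" | sed 's/.*-//; s/\(..\)/\1:/g; s/:$//'
}
```

For example, uuid_to_mac 5787befc-b348adf2-2ea5-001018f4ef3c prints 00:10:18:f4:ef:3c.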

Manual VM load rebalancing was then performed.