Compute

Unable to install patch on Cisco UCS Server and Pure Storage environment

haewon83 2023. 5. 15. 16:00

 

안녕하세요. 이전에 공유드렸던 Locking Mechanism인 ATS와 관련된 내용으로 Storage 이슈로 인해 ESXi Update에 실패했던 케이스를 공유합니다.

 

[문제 증상]

vCenter에서 vSphere Lifecycle Manager를 통해 ESXi 7.0U2e 19290878에서 7.0U3d 19482537로 패치 시도 시 실패

 

[진행 내역]

1. 먼저 Update에 실패한 사례이기 때문에, 실패한 Host에서 esxupdate.log 를 확인

esxupdate.log 확인 시, 특정 Volume의 Filesystem Access 관련 오류 기록

2023-03-02T04:57:50Z esxupdate: 116246091: vmware.runcommand: INFO: runcommand called with: args = 'localcli system visorfs ramdisk list | grep /esx-update-6246091 && localcli system visorfs ramdisk remove -t /tmp/esx-update-6246091', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/esxupdt-6246091
2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/tpmesxup-6246091
2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/weaselin-6246091
2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/loadesx-6246091
 
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Traceback (most recent call last):
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Utils/HostInfo.py", line 217, in _getFileSystems
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/lib64/python3.8/site-packages/esxutils.py", line 220, in runCli
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: esxutils.EsxcliError: Errors:
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110, skipping. ### <--
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110, skipping. ### <--
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: During handling of the above exception, another exception occurred:
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Traceback (most recent call last):
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/usr/lib/vmware/esxcli-software", line 794, in <module>
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:     main()
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/usr/lib/vmware/esxcli-software", line 785, in main
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:     ret = CMDTABLE[command](options)
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/usr/lib/vmware/esxcli-software", line 252, in ProfileUpdateCmd
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:     res = t.UpdateProfileFromDepot([], opts.profile, depotUrls=depoturls,
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 365, in UpdateProfileFromDepot
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:     return t.InstallVibsFromProfile(metadataUrls, profileName, depotUrls,
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1771, in InstallVibsFromProfile
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1215, in _installVibs
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1384, in _validateAndInstallProfile
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/ImageManager/HostSeeding.py", line 995, in __init__
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/ImageManager/HostSeeding.py", line 1032, in _loadCachedVibs
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Utils/HostInfo.py", line 260, in GetVmfslFileSystems
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:   File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Utils/HostInfo.py", line 220, in _getFileSystems
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: esximage.Errors.InstallationError: Failed to query file system stats: Errors:
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110, skipping.
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110, skipping.
2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR:

 

2. 수집된 자료에서 df 결과 확인 시, esxupdate.log 메시지와 유사하게 올바르게 Filesystem 정보를 획득하지 못하는 결과 확인

df.txt
VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/60939c4a-1389d9ec-a3f8-0025b5120110
VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/60939c76-9f8ffda2-3d7e-0025b5120110
VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110
VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/630c69c1-cda00a36-05b2-0025b5120110
VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/630c69cf-30dc3244-8913-0025b5120110
Error when running esxcli, return status was: 1
Errors:
Error getting data for filesystem on '/vmfs/volumes/60939c4a-1389d9ec-a3f8-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/60939c4a-1389d9ec-a3f8-0025b5120110, skipping.
Error getting data for filesystem on '/vmfs/volumes/60939c76-9f8ffda2-3d7e-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/60939c76-9f8ffda2-3d7e-0025b5120110, skipping.
Error getting data for filesystem on '/vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110, skipping.
Error getting data for filesystem on '/vmfs/volumes/630c69c1-cda00a36-05b2-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69c1-cda00a36-05b2-0025b5120110, skipping.
Error getting data for filesystem on '/vmfs/volumes/630c69cf-30dc3244-8913-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69cf-30dc3244-8913-0025b5120110, skipping.

 

3. 최초에는 Filesystem 자체 이슈로 의심하여 Filesystem의 정합성을 확인하려 했으나, 다음과 같이 전체 Disk Device에 대해 동일한 오류가 기록된 것을 확인

# grep -B 1 "nmp_ThrottleLogForDevice:3869: H:0x7 D:0x0 P:0x0" vmkernel.log
2023-03-07T22:04:27.586Z cpu14:2098047)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0x28 (0x45b9494ac500, 2099967) to dev "naa.624a9370e793c401a188430c00014dac" on path "vmhba2:C0:T2:L252" Failed:
2023-03-07T22:04:27.586Z cpu14:2098047)NMP: nmp_ThrottleLogForDevice:3869: H:0x7 D:0x0 P:0x0 . Act:EVAL. cmdId.initiator=0x4306b6cd9b40 CmdSN 0xe3910
--
2023-03-07T22:04:57.796Z cpu0:2098047)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0x28 (0x45b919ad2f40, 2099967) to dev "naa.624a9370e793c401a188430c00019383" on path "vmhba2:C0:T2:L247" Failed:
2023-03-07T22:04:57.796Z cpu0:2098047)NMP: nmp_ThrottleLogForDevice:3869: H:0x7 D:0x0 P:0x0 . Act:EVAL. cmdId.initiator=0x4306b6f583c0 CmdSN 0xdbeee
 
오류 메시지가 발생하는 총 10개의 Physical Disk Device 목록
naa.624a9370e793c401a188430c00014dac
naa.624a9370e793c401a188430c00019383
naa.624a9370e793c401a188430c00019384
naa.624a9370e793c401a188430c00019385
naa.624a9370e793c401a188430c00014db6
naa.624a9370e793c401a188430c00019382
naa.624a9370e793c401a188430c00014da2
naa.624a9370e793c401a188430c00014daa
naa.624a9370e793c401a188430c00014daf
naa.624a9370e793c401a188430c00014dad

 

4. 추가로 Cisco UCS 서버에서 사용하는 FCoE Adapter Driver인 fnic에서 다음과 같은 오류 확인

# grep nfnic vmkernel.log
2023-03-07T22:04:27.586Z cpu2:2097472)nfnic: <2>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1829: in mismatch actual len: 1048576 residual: 0
2023-03-07T22:04:27.586Z cpu2:2097472)WARNING: nfnic: <2>: fnic_fcpio_icmnd_cmpl_handler: 1872: sc: 0x45b9494ac500 tag: 0x1e7 hdr status: FCPIO_DATA_CNT_MISMATCH IO failure!

 

5. Cisco 측에서 해당 오류 메시지를 조치하기 위해 Locking Mode를 ATS+SCSI 에서 ATS로 변경 요청

 

6. 문제는 특이하게도 아래 명령어로 조회 시 계속해서 "Locking Mode" 컬럼의 결과 값이 ATS 또는 ATS+SCSI로 변경되는 이슈를 확인

# esxcli storage vmfs lockmode list

 

7. 자세한 로그를 추가로 확인은 못하였으나, 서로 다른 버전의 Pure Storage 중 한 Pure Storage와 연결된 ESXi Host에서만 Locking Mode가 조회시 마다 변경

A. 스토리지 버전: 6.3.6 ### <-- 이슈 Storage 버전

B. 스토리지 버전: 5.3.15

 

8. 추가적인 분석은 제외하고 ESXi를 Update하기 위해서 사용하는 Datastore를 모두 비운 상태에서 Unmount 한 후, Locking Mode를 ATS로 변경하고 정상적으로 Update 진행 완료