Unable to install patch on Cisco UCS Server and Pure Storage environment
안녕하세요. 이전에 공유드렸던 Locking Mechanism인 ATS와 관련된 내용으로 Storage 이슈로 인해 ESXi Update에 실패했던 케이스를 공유합니다.
[문제 증상]
vCenter에서 vSphere Lifecycle Manager를 통해 ESXi 7.0U2e 19290878에서 7.0U3d 19482537로 패치 시도 시 실패
[진행 내역]
1. 먼저 Update에 실패한 사례이기 때문에, 실패한 Host에서 esxupdate.log 를 확인
esxupdate.log 확인 시, 특정 Volume의 Filesystem Access 관련 오류 기록
2023-03-02T04:57:50Z esxupdate: 116246091: vmware.runcommand: INFO: runcommand called with: args = 'localcli system visorfs ramdisk list | grep /esx-update-6246091 && localcli system visorfs ramdisk remove -t /tmp/esx-update-6246091', outfile = 'None', returnoutput = 'True', timeout = '0.0'. 2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/esxupdt-6246091 2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/tpmesxup-6246091 2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/weaselin-6246091 2023-03-02T04:57:51Z esxupdate: 116246091: Ramdisk: INFO: Unmounting manual tardisk /tardisks.noauto/loadesx-6246091 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Traceback (most recent call last): 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Utils/HostInfo.py", line 217, in _getFileSystems 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/lib64/python3.8/site-packages/esxutils.py", line 220, in runCli 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: esxutils.EsxcliError: Errors: 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110, skipping. ### <-- 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110, skipping. ### <-- 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: During handling of the above exception, another exception occurred: 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Traceback (most recent call last): 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 794, in <module> 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: main() 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 785, in main 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: ret = CMDTABLE[command](options) 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 252, in ProfileUpdateCmd 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: res = t.UpdateProfileFromDepot([], opts.profile, depotUrls=depoturls, 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 365, in UpdateProfileFromDepot 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: return t.InstallVibsFromProfile(metadataUrls, profileName, depotUrls, 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1771, in InstallVibsFromProfile 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1215, in _installVibs 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1384, in _validateAndInstallProfile 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/ImageManager/HostSeeding.py", line 995, in __init__ 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/ImageManager/HostSeeding.py", line 1032, in _loadCachedVibs 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Utils/HostInfo.py", line 260, in GetVmfslFileSystems 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: File "/tmp/esx-update-6246091/lib64/python3.8/site-packages/vmware/esximage/Utils/HostInfo.py", line 220, in _getFileSystems 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: esximage.Errors.InstallationError: Failed to query file system stats: Errors: 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110, skipping. 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: Error getting data for filesystem on '/vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69df-101cb7ae-805f-0025b5120110, skipping. 2023-03-02T04:57:51Z esxupdate: 116246091: root: ERROR: |
2. 수집된 자료에서 df 결과 확인 시, esxupdate.log 메시지와 유사하게 올바르게 Filesystem 정보를 획득하지 못하는 결과 확인
df.txt VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/60939c4a-1389d9ec-a3f8-0025b5120110 VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/60939c76-9f8ffda2-3d7e-0025b5120110 VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110 VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/630c69c1-cda00a36-05b2-0025b5120110 VmFileSystem: Slow refresh failed: Unable to get FS Attrs for /vmfs/volumes/630c69cf-30dc3244-8913-0025b5120110 Error when running esxcli, return status was: 1 Errors: Error getting data for filesystem on '/vmfs/volumes/60939c4a-1389d9ec-a3f8-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/60939c4a-1389d9ec-a3f8-0025b5120110, skipping. Error getting data for filesystem on '/vmfs/volumes/60939c76-9f8ffda2-3d7e-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/60939c76-9f8ffda2-3d7e-0025b5120110, skipping. Error getting data for filesystem on '/vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69b0-b367e957-e0e3-0025b5120110, skipping. Error getting data for filesystem on '/vmfs/volumes/630c69c1-cda00a36-05b2-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69c1-cda00a36-05b2-0025b5120110, skipping. Error getting data for filesystem on '/vmfs/volumes/630c69cf-30dc3244-8913-0025b5120110': Unable to get FS Attrs for /vmfs/volumes/630c69cf-30dc3244-8913-0025b5120110, skipping. |
3. 최초에는 Filesystem 자체 이슈로 의심하여 Filesystem의 정합성을 확인하려 했으나, 다음과 같이 전체 Disk Device에 대해 동일한 오류가 기록된 것을 확인
# grep -B 1 "nmp_ThrottleLogForDevice:3869: H:0x7 D:0x0 P:0x0" vmkernel.log 2023-03-07T22:04:27.586Z cpu14:2098047)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0x28 (0x45b9494ac500, 2099967) to dev "naa.624a9370e793c401a188430c00014dac" on path "vmhba2:C0:T2:L252" Failed: 2023-03-07T22:04:27.586Z cpu14:2098047)NMP: nmp_ThrottleLogForDevice:3869: H:0x7 D:0x0 P:0x0 . Act:EVAL. cmdId.initiator=0x4306b6cd9b40 CmdSN 0xe3910 -- 2023-03-07T22:04:57.796Z cpu0:2098047)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0x28 (0x45b919ad2f40, 2099967) to dev "naa.624a9370e793c401a188430c00019383" on path "vmhba2:C0:T2:L247" Failed: 2023-03-07T22:04:57.796Z cpu0:2098047)NMP: nmp_ThrottleLogForDevice:3869: H:0x7 D:0x0 P:0x0 . Act:EVAL. cmdId.initiator=0x4306b6f583c0 CmdSN 0xdbeee 오류 메시지가 발생하는 총 10개의 Physical Disk Device 목록 naa.624a9370e793c401a188430c00014dac naa.624a9370e793c401a188430c00019383 naa.624a9370e793c401a188430c00019384 naa.624a9370e793c401a188430c00019385 naa.624a9370e793c401a188430c00014db6 naa.624a9370e793c401a188430c00019382 naa.624a9370e793c401a188430c00014da2 naa.624a9370e793c401a188430c00014daa naa.624a9370e793c401a188430c00014daf naa.624a9370e793c401a188430c00014dad |
4. 추가로 Cisco UCS 서버에서 사용하는 FCoE Adapter Driver인 fnic에서 다음과 같은 오류 확인
# grep nfnic vmkernel.log 2023-03-07T22:04:27.586Z cpu2:2097472)nfnic: <2>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1829: in mismatch actual len: 1048576 residual: 0 2023-03-07T22:04:27.586Z cpu2:2097472)WARNING: nfnic: <2>: fnic_fcpio_icmnd_cmpl_handler: 1872: sc: 0x45b9494ac500 tag: 0x1e7 hdr status: FCPIO_DATA_CNT_MISMATCH IO failure! |
5. Cisco 측에서 해당 오류 메시지를 조치하기 위해 Locking Mode를 ATS+SCSI 에서 ATS로 변경 요청
6. 문제는 특이하게도 아래 명령어로 조회 시 계속해서 "Locking Mode" 컬럼의 결과 값이 ATS 또는 ATS+SCSI로 변경되는 이슈를 확인
# esxcli storage vmfs lockmode list
7. 자세한 로그를 추가로 확인은 못하였으나, 서로 다른 버전의 Pure Storage 중 한 Pure Storage와 연결된 ESXi Host에서만 Locking Mode가 조회시 마다 변경
A. 스토리지 버전: 6.3.6 ### <-- 이슈 Storage 버전
B. 스토리지 버전: 5.3.15
8. 추가적인 분석은 제외하고 ESXi를 Update하기 위해서 사용하는 Datastore를 모두 비운 상태에서 Unmount 한 후, Locking Mode를 ATS로 변경하고 정상적으로 Update 진행 완료