ESXi host was unresponsive due to missing DNS records
haewon83
2023. 8. 29. 13:51
[Environment]
vCenter : 192.168.1.3
ESXi Host : 192.168.1.10
ESXi FQDN : abc.contoso.com
DNS Server : 192.168.1.1, 192.168.1.2
[Symptom]
In vCenter, newly added ESXi hosts intermittently changed to a disconnected (not responding) state
[Log Analysis]
1. Search vpxd.log for the NO_RESPONSE keyword to identify when the ESXi host changed to the disconnected state
/var/log/vmware/vpxd/vpxd.log
2023-08-22T03:58:09.177Z warning vpxd[09040] [Originator@6876 sub=IO.Connection opID=HostSync-host-11937-5da8183] Failed to connect; <io_obj p:0x00007f3448430658, h:58, <TCP '192.168.1.3 : 53728'>, <TCP '192.168.1.10 : 443'>>, e: 113(No route to host), duration: 3061msec
2023-08-22T03:58:09.178Z warning vpxd[09040] [Originator@6876 sub=HttpConnectionPool-000000 opID=HostSync-host-11937-5da8183] Failed to get pooled connection; <cs p:00007f3434008940, TCP:abc.contoso.com:443>, (null), duration: 3062msec, N7Vmacore15SystemExceptionE(No route to host)
2023-08-22T03:58:09.179Z info vpxd[09040] [Originator@6876 sub=IO.Http opID=HostSync-host-11937-5da8183] Set user agent error; state: 1, (null), N7Vmacore15SystemExceptionE(No route to host)
2023-08-22T03:58:09.181Z error vpxd[09051] [Originator@6876 sub=Vmomi opID=HostSync-host-11937-5da8183] Got vmacore exception when invoking VMOMI method; <<last binding: <<TCP '192.168.1.3 : 40670'>, <TCP '192.168.1.10 : 443'>>>, /vpxa>, vpxapi.VpxaService.getChanges, N7Vmacore15SystemExceptionE(No route to host)
2023-08-22T03:58:09.186Z warning vpxd[09051] [Originator@6876 sub=VpxProfiler opID=HostSync-host-11937-5da8183] DoHostSync:host-11937 [GetChangesTime] took 3072 ms
2023-08-22T03:58:09.186Z warning vpxd[09051] [Originator@6876 sub=VpxProfiler opID=HostSync-host-11937-5da8183] DoHostSync:host-11937 [DoHostSyncTime] took 3072 ms
2023-08-22T03:58:09.186Z warning vpxd[09051] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-11937-5da8183] Exception occurred during host sync; Host communication failed; [vim.HostSystem:host-11937,abc.contoso.com], e: N5Vmomi5Fault17HostCommunication9ExceptionE(Fault cause: vmodl.fault.HostCommunication
2023-08-22T03:58:09.192Z info vpxd[09051] [Originator@6876 sub=QuickStats opID=HostSync-host-11937-5da8183] Host [vim.HostSystem:host-11937,abc.contoso.com] should not be polled
2023-08-22T03:58:09.194Z warning vpxd[09051] [Originator@6876 sub=MoHost opID=HostSync-host-11937-5da8183] host [vim.HostSystem:host-11937,abc.contoso.com] connection state changed to NO_RESPONSE
2023-08-22T03:58:09.194Z info vpxd[09051] [Originator@6876 sub=QuickStats opID=HostSync-host-11937-5da8183] Host [vim.HostSystem:host-11937,abc.contoso.com] should not be polled
2023-08-22T03:58:09.181Z error vpxd[09040] [Originator@6876 sub=IO.Http opID=HostSync-host-11937-5da8183] User agent failed to send request; (null), N7Vmacore15SystemExceptionE(No route to host)
2023-08-22T03:58:09.200Z info vpxd[09051] [Originator@6876 sub=QuickStats opID=HostSync-host-11937-5da8183] Host [vim.HostSystem:host-11937,abc.contoso.com] should not be polled
2023-08-22T03:58:09.200Z warning vpxd[09051] [Originator@6876 sub=VpxProfiler opID=HostSync-host-11937-5da8183] InvtHostSyncLRO::StartWork [HostSyncTime] took 3086 ms
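The keyword search in step 1 can also be scripted when many hosts are involved. The following is a minimal sketch, assuming the state-change line always has the shape shown in the excerpt above; the function name `find_no_response` is hypothetical and not part of any VMware tooling.

```python
import re

# Matches the vpxd.log line that records the state change, for example:
# "... host [vim.HostSystem:host-11937,abc.contoso.com] connection state changed to NO_RESPONSE"
# Assumption: the line format is the one seen in the excerpt above.
STATE_CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)"
    r".*\[vim\.HostSystem:(?P<moid>[^,\]]+),(?P<name>[^\]]+)\]"
    r" connection state changed to NO_RESPONSE"
)

def find_no_response(lines):
    """Return (timestamp, host MoID, host name) for each NO_RESPONSE transition."""
    events = []
    for line in lines:
        m = STATE_CHANGE.match(line)
        if m:
            events.append((m.group("ts"), m.group("moid"), m.group("name")))
    return events
```

To run it against the real log, pass an open file handle for /var/log/vmware/vpxd/vpxd.log as `lines`; each returned tuple gives the time at which a host was marked NO_RESPONSE.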
2. vpxd.log shows that a "Failed to resolve address" message for abc.contoso.com, the FQDN of the ESXi host, was logged each time the host changed to the disconnected state
2023-08-22T03:28:05.240Z warning vpxd[08464] [Originator@6876 sub=IO.Connection opID=TaskLoop-host-11937] Failed to resolve address; <resolver p:0x00007f347c726d80, 'abc.contoso.com:443', next:(null)>, e: 1(Host not found (authoritative)), async: true, duration: 2msec
<snip>
2023-08-23T05:43:09.665Z warning vpxd[08029] [Originator@6876 sub=IO.Connection opID=HB-host-11937@2272-49f7df80-SWI-3a822666] Failed to resolve address; <resolver p:0x00007f3574043050, 'abc.contoso.com:443', next:(null)>, e: 1(Host not found (authoritative)), async: true, duration: 2msec
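To see how often name resolution failed and for which FQDNs, the "Failed to resolve address" entries can be extracted and counted per host. This is a sketch under the assumption that every such entry follows the format of the two lines above; `resolve_failures` and `failures_per_host` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumption: resolver failures are logged in the format shown above, with the
# target in single quotes as 'fqdn:port' and the error text after "e: <code>(".
RESOLVE_FAIL = re.compile(
    r"^(?P<ts>\S+) .*Failed to resolve address; "
    r"<resolver p:\S+ '(?P<target>[^']+)'.*?e: \d+\((?P<err>.*?)\), async"
)

def resolve_failures(lines):
    """Return (timestamp, fqdn, error text) for each resolver failure."""
    failures = []
    for line in lines:
        m = RESOLVE_FAIL.match(line)
        if m:
            fqdn = m.group("target").rsplit(":", 1)[0]  # drop the ":443" port
            failures.append((m.group("ts"), fqdn, m.group("err")))
    return failures

def failures_per_host(failures):
    """Count resolver failures per FQDN."""
    return Counter(fqdn for _ts, fqdn, _err in failures)
```

Comparing these timestamps with the NO_RESPONSE timestamps from step 1 is one way to check whether the two events occur together.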
3. Check the configured DNS server addresses
/etc/systemd/resolved.conf
[Resolve]
LLMNR=false
DNS=127.0.0.1 192.168.1.1 192.168.1.2
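The list of DNS servers can be read from the file programmatically, which is handy as input for a per-server check. The sketch below parses the INI-style content with Python's standard `configparser`; the function name `upstream_dns` is hypothetical, and treating the loopback entry (127.0.0.1) as a local forwarder rather than a real upstream server is an assumption.

```python
import configparser
import ipaddress

def upstream_dns(conf_text):
    """Return the non-loopback DNS servers from resolved.conf content."""
    # strict=False tolerates repeated keys, which resolved.conf permits.
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(conf_text)
    servers = parser.get("Resolve", "DNS", fallback="").split()
    # Assumption: a loopback address is a local cache/forwarder, so it is skipped.
    return [s for s in servers if not ipaddress.ip_address(s).is_loopback]
```

For the configuration above, this leaves 192.168.1.1 and 192.168.1.2 as the servers whose records need to be compared.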
4. Checking the DNS servers showed that 192.168.1.1 has the A/PTR records for abc.contoso.com, but 192.168.1.2 does not
After the A/PTR records for abc.contoso.com were added to the 192.168.1.2 DNS server, the symptom was resolved
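A record mismatch like this one only shows up when each DNS server is queried directly rather than through the system resolver. The following is a rough sketch that sends a minimal DNS query (RFC 1035 wire format) over UDP to one specific server using only the standard library. All function names are hypothetical; it assumes the servers answer on UDP port 53 and reports only the response code and answer count, without decoding the records themselves.

```python
import socket
import struct

QTYPE_A, QTYPE_PTR = 1, 12
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}

def build_query(name, qtype, txid=0x1234):
    """Build a single-question DNS query with the recursion-desired flag set."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN

def parse_status(response):
    """Return (response code name, answer count) from a DNS response header."""
    _txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), ancount

def ptr_name(ipv4):
    """Return the in-addr.arpa name used for a reverse (PTR) lookup."""
    return ".".join(reversed(ipv4.split("."))) + ".in-addr.arpa"

def query_server(server, name, qtype, timeout=2.0):
    """Ask one specific DNS server and return (rcode name, answer count)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        data, _addr = sock.recvfrom(512)
    return parse_status(data)

def compare_servers(servers, fqdn, ipv4):
    """Return {server: (A result, PTR result)} so mismatches stand out."""
    return {
        s: (query_server(s, fqdn, QTYPE_A),
            query_server(s, ptr_name(ipv4), QTYPE_PTR))
        for s in servers
    }
```

Calling `compare_servers(["192.168.1.1", "192.168.1.2"], "abc.contoso.com", "192.168.1.10")` returns one result pair per server; a server that reports NXDOMAIN or zero answers while the other reports NOERROR with answers is the one missing the records.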