Following up on the earlier post on "No-neighbor" (https://haewon83.tistory.com/205), this post looks at what other symptoms can be observed when the "No-neighbor" count increases.
[Symptom]
After moving into a specific logical router on the Edge, running the "get neighbor" command shows its ARP entries
Running "get neighbor" on the customer's logical router showed entries in the "incomp" state
The customer confirmed that the IP addresses in the "incomp" entries, xxx.xxx.xxx.31 and xxx.xxx.xxx.101, had been used by VMs that were removed around December of last year
[Troubleshooting Notes]
1. Verify the customer's findings in the Edge support bundle
The arp section of the edge/logical-routers file shows the same two IP addresses in the "incomp" state
./edge/logical-routers
{
    "uuid": "480aceea-64fe-4b53-8b12-e6aad505e3a1",
    "mp_router_id": "480aceea-64fe-4b53-8b12-e6aad505e3a1",
    "name": "DR-xxxxxx-xxxxxx-gw", >>>
    "vrf": 3,
    "peer_vrf": 2,
    "vdr": 18,
    "type": "DISTRIBUTED_ROUTER_TIER1",
    ...
        {
            "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997", >>>
            "ifuid": 298,
            "type": "lif",
            "ptype": "downlink",
            "name": "infra-332509b1-0b26-4415-aac8-4",
            "lrouter": "480aceea-64fe-4b53-8b12-e6aad505e3a1",
            "mac": "",
            "admin": "up",
            "internal_operation": "up",
            "overlay_vni": xxxxx,
            "ipns": [
                "xxx.xxx.xxx.1/24" >>>
            ],
            ...
    "arp": [
        {
            "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997",
            "ip": "xxx.xxx.xxx.101", >>>
            "vlan": xxxx,
            "mac": "00:00:00:00:00:00",
            "state": "incomp", >>>
            "mheld_cnt": 2,
            "timeout": 1,
            "last event": "2024-02-13 05:23:34.859",
            "stats": {
                "pkt_out": 0,
                "icmp_out": 0,
                "pkt_out_fail": 0,
                "solicit_out": 3,
                "solicit_out_fail": 0,
                "solicit_in": 0,
                "unsolicit_in": 0,
                "ip_solicit_out": 0,
                "ip_solicit_out_fail": 0,
                "announce_out": 0,
                "announce_out_fail": 0
            }
        },
        {
            "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997", >>>
            "ip": "xxx.xxx.xxx.31", >>>
            "vlan": xxxx,
            "mac": "00:00:00:00:00:00",
            "state": "incomp", >>>
            "mheld_cnt": 2,
            "timeout": 1,
            "last event": "2024-02-13 05:23:34.858",
            "stats": {
                "pkt_out": 0,
                "icmp_out": 0,
                "pkt_out_fail": 0,
                "solicit_out": 3,
                "solicit_out_fail": 0,
                "solicit_in": 0,
                "unsolicit_in": 0,
                "ip_solicit_out": 0,
                "ip_solicit_out_fail": 0,
                "announce_out": 0,
                "announce_out_fail": 0
            }
        },
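When a support bundle contains many logical routers, the same check can be scripted. The following is a minimal Python sketch, assuming the edge/logical-routers file parses as JSON and that ARP entries are dictionaries carrying "ip" and "state" keys as in the excerpt; the helper name `find_incomplete_arp` and the sample data are hypothetical, and the real file layout may differ between NSX versions.

```python
import json

def find_incomplete_arp(node, state="incomp"):
    """Recursively collect ARP-like entries whose "state" equals `state`."""
    found = []
    if isinstance(node, dict):
        # Treat any dict with both "ip" and a matching "state" as an ARP entry (assumption).
        if node.get("state") == state and "ip" in node:
            found.append((node.get("ifuuid"), node["ip"], node.get("last event")))
        for value in node.values():
            found.extend(find_incomplete_arp(value, state))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_incomplete_arp(item, state))
    return found

# Hypothetical, trimmed-down sample shaped like the bundle excerpt.
sample = json.loads("""
{"name": "DR-example", "arp": [
  {"ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997", "ip": "10.0.0.101",
   "mac": "00:00:00:00:00:00", "state": "incomp",
   "last event": "2024-02-13 05:23:34.859"},
  {"ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997", "ip": "10.0.0.40",
   "mac": "02:50:56:00:30:00", "state": "reach",
   "last event": "2024-02-13 05:20:00.000"}
]}
""")

incomplete = find_incomplete_arp(sample)
for ifuuid, ip, last_event in incomplete:
    print(f"{ip} on {ifuuid} is incomp (last event {last_event})")
```

Walking the structure recursively avoids depending on the exact nesting, which is elided in the excerpt.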
2. The following logs appear in the Edge syslog
Because the interface associated with both "incomp" entries is a downlink interface, search for that interface's UUID, afd1d0c1-7962-43ef-87bb-7e6bae217997
ARP entry creation is attempted repeatedly, and each entry's state transitions from incomp to failed
./var/log/syslog
$ grep "afd1d0c1-7962-43ef-87bb-7e6bae217997" syslog
2024-02-12T01:27:05.249Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.31) state incomp -> failed
2024-02-12T01:27:06.616Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.101) state incomp -> failed
2024-02-12T01:27:07.206Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.31) is created
2024-02-12T01:27:08.652Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.101) is created
...
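To see how often each address cycles through this create/fail loop, the grep output can be tallied per interface and IP. The sketch below is a hedged example: the two regular expressions are written against the message text shown in this post only, the function name `summarize` is hypothetical, and the sample lines are shortened copies of the lab log.

```python
import re
from collections import Counter

# Patterns derived from the message text in this post; other NSX versions may log differently.
TRANSITION = re.compile(
    r"entry\((?P<ifuuid>[0-9a-f-]+), (?P<ip>[^)]+)\) state (?P<old>\w+) -> (?P<new>\w+)"
)
CREATED = re.compile(
    r"dynamic arp entry\((?P<ifuuid>[0-9a-f-]+), (?P<ip>[^)]+)\) is created"
)

def summarize(lines):
    """Count 'incomp -> failed' transitions and entry creations per (ifuuid, ip)."""
    failed, created = Counter(), Counter()
    for line in lines:
        m = CREATED.search(line)
        if m:
            created[(m["ifuuid"], m["ip"])] += 1
            continue
        m = TRANSITION.search(line)
        if m and (m["old"], m["new"]) == ("incomp", "failed"):
            failed[(m["ifuuid"], m["ip"])] += 1
    return failed, created

sample_lines = [
    '2024-02-27T10:47:24.909Z edge NSX 5320 SWITCHING [level="INFO"] '
    "entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) state incomp -> failed",
    '2024-02-27T10:47:24.914Z edge NSX 5320 SWITCHING [level="INFO"] '
    "dynamic arp entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) is created",
    '2024-02-27T10:47:27.915Z edge NSX 5320 SWITCHING [level="INFO"] '
    "entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) state incomp -> failed",
]

failed, created = summarize(sample_lines)
for (ifuuid, ip), count in failed.items():
    print(f"{ip} on {ifuuid}: {count} failed, {created[(ifuuid, ip)]} created")
```

A steadily growing count for one address suggests that some host keeps sending traffic to it.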
3. Check the definition of the "incomp" state
Edge DP LRP stats description
https://kb.vmware.com/s/article/96507
"Packet dropped due to ARP failure"
4. Based on that definition, running the following test in an internal lab environment produces the same "incomp" entry
4-1. Ping 172.31.1.20, an unused IP address in the 172.31.1.0/24 subnet
C:\>ping 172.31.1.20 -t

Pinging 172.31.1.20 with 32 bytes of data:
Reply from 172.31.1.1: Destination host unreachable.
Reply from 172.31.1.1: Destination host unreachable.
4-2. While the ping is running, check the neighbor entries on the Tier-1 DR (Distributed Router) to which the 172.31.1.0/24 overlay segment is connected
edge-node-01(vrf[5])> get neighbor
Tue Feb 27 2024 UTC 10:35:57.371
Logical Router
UUID        : 5bc895a7-8cc3-4332-9b1a-abe3bd64fa22
VRF         : 5
LR-ID       : 8
Name        : DR-tier1-01
Type        : DISTRIBUTED_ROUTER_TIER1
Neighbor
    Interface   : a62f9b69-c532-44a8-89a0-3e42c6292d94
    IP          : 172.31.1.20 >>>
    MAC         : 00:00:00:00:00:00
    State       : incomp >>>
    Timeout     : 1

    Interface   : a62f9b69-c532-44a8-89a0-3e42c6292d94
    IP          : 172.31.1.40
    MAC         : 02:50:56:00:30:00
    State       : reach
    Timeout     : 1092
4-3. Capture packets on the Tier-1 DR (Distributed Router) interface connected to the 172.31.1.0/24 overlay segment
edge-node-01(vrf[5])> get interfaces
Tue Feb 27 2024 UTC 10:41:02.472
Logical Router
UUID                                   VRF    LR-ID  Name           Type
5bc895a7-8cc3-4332-9b1a-abe3bd64fa22   5      8      DR-tier1-01    DISTRIBUTED_ROUTER_TIER1
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : 6ad31edf-449d-5871-b330-d15908ac64b0
    Ifuid         : 297
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false

    Interface     : 810bd7e8-d6ce-5f8d-8f0d-9ff1986ae2ee
    Ifuid         : 298
    Mode          : blackhole
    Port-type     : blackhole

    Interface     : a62f9b69-c532-44a8-89a0-3e42c6292d94 >>>
    Ifuid         : 299
    Name          : infra-overlay-seg-3101-dlrp
    Fwd-mode      : IPV4_ONLY
    Mode          : lif
    Port-type     : downlink
    IP/Mask       : 172.31.1.1/24 >>>
    MAC           : 02:50:56:56:44:52
    VNI           : 71680
    Access-VLAN   : untagged
    LS port       : 44c4d076-1e45-4632-bbf6-c458e1b3c6dc
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : True
    MTU           : 1500
    arp_proxy     :

edge-node-01(vrf[5])> exit
edge-node-01> start capture interface a62f9b69-c532-44a8-89a0-3e42c6292d94 expr arp
10:41:51.557398 02:50:56:56:44:52 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 172.31.1.20 (ff:ff:ff:ff:ff:ff) tell 172.31.1.1, length 28
<base64>////////AlBWVkRSCAYAAQgABgQAAQJQVlZEUqwfAQH///////+sHwEU</base64>
10:41:52.556741 02:50:56:56:44:52 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 172.31.1.20 (ff:ff:ff:ff:ff:ff) tell 172.31.1.1, length 28
<base64>////////AlBWVkRSCAYAAQgABgQAAQJQVlZEUqwfAQH///////+sHwEU</base64>
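The `<base64>` line under each captured packet is the raw frame, so the decoded summary can be cross-checked offline. The following is a small Python sketch that decodes the payload from the capture above as an Ethernet frame carrying an ARP packet (field layout per RFC 826); the function name `decode_arp_frame` is hypothetical and only IPv4-over-Ethernet ARP is handled.

```python
import base64
import ipaddress
import struct

def decode_arp_frame(b64_payload):
    """Decode an Ethernet + ARP (IPv4 over Ethernet) frame from a base64 string."""
    frame = base64.b64decode(b64_payload)
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError("not an ARP frame")
    # htype, ptype, hlen, plen, oper, sender MAC/IP, target MAC/IP (28 bytes)
    _htype, _ptype, _hlen, _plen, oper, _sha, spa, _tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    operation = {1: "request", 2: "reply"}.get(oper, str(oper))
    return {
        "dst_mac": dst.hex(":"),
        "src_mac": src.hex(":"),
        "operation": operation,
        "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_ip": str(ipaddress.IPv4Address(tpa)),
    }

# Payload copied from the capture in this post.
payload = "////////AlBWVkRSCAYAAQgABgQAAQJQVlZEUqwfAQH///////+sHwEU"
arp = decode_arp_frame(payload)
print(arp)
```

The decoded fields match the capture summary: the DR downlink (172.31.1.1, 02:50:56:56:44:52) broadcasts an ARP request for 172.31.1.20 and never gets a reply.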
4-4. Capture packets on the uplink of the Tier-0 SR connected to that Tier-1 DR
※ Capturing on the Tier-0 SR uplink shows ICMP packets arriving from the client IP address 192.168.1.2
edge-node-01> get logical-routers
Tue Feb 27 2024 UTC 10:43:13.261
Logical Router
UUID                                   VRF    LR-ID  Name           Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                     TUNNEL                      4       6/5000
4f22c0b3-4a2f-4840-a8d2-cf8c797c087b   1      1      DR-Tier0-01    DISTRIBUTED_ROUTER_TIER0    5       2/50000
1eefa746-7662-4e21-8431-39dfc1f57394   2      2      SR-Tier0-01    SERVICE_ROUTER_TIER0        6       2/50000
d533b216-a47a-4200-9eb3-007e68c3a024   4      9      SR-tier1-01    SERVICE_ROUTER_TIER1        5       2/50000
5bc895a7-8cc3-4332-9b1a-abe3bd64fa22   5      8      DR-tier1-01    DISTRIBUTED_ROUTER_TIER1    4       5/50000
96486497-be55-4cc3-8ae1-bbc7fe391d4b   6      11     SR-one-arm     SERVICE_ROUTER_TIER1        5       2/50000

edge-node-01> vrf 2
edge-node-01(tier0_sr[2])> get interfaces
Tue Feb 27 2024 UTC 10:43:18.535
Logical Router
UUID                                   VRF    LR-ID  Name           Type
1eefa746-7662-4e21-8431-39dfc1f57394   2      2      SR-Tier0-01    SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : c9755577-209f-4850-8510-65c63d8d388c >>>
    Ifuid         : 286
    Name          : edge01-uplink02
    Fwd-mode      : IPV4_ONLY
    Internal name : uplink-286
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 192.168.13.11/24
    MAC           : 00:50:56:a6:ab:70
    VLAN          : 1613
    Access-VLAN   : untagged
    LS port       : 4e65dc04-96bd-4925-97a3-b3d84deaae02
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : False
    MTU           : 1500
    arp_proxy     :

edge-node-01> exit
edge-node-01> start capture interface c9755577-209f-4850-8510-65c63d8d388c
10:45:12.159045 00:50:56:a6:5c:43 > 00:50:56:a6:ab:70, ethertype IPv4 (0x0800), length 74: 192.168.1.2 > 172.31.1.20: ICMP echo request, id 1, seq 361, length 40
<base64>AFBWpqtwAFBWplxDCABFAAA8OqYAAH8Bkj3AqAECrB8BFAgAS/IAAQFpYWJjZGVmZ2hpamtsbW5vcHFyc3R1dndhYmNkZWZnaGk=</base64>
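The uplink capture is what identifies who is sending traffic to the unresolved address. As a hedged illustration, the Python sketch below decodes the base64 frame from the capture above and pulls out the IPv4 source, destination, and ICMP echo fields; the function name `decode_icmp_echo` is hypothetical, and the code assumes an untagged Ethernet frame carrying IPv4 and ICMP.

```python
import base64
import ipaddress
import struct

def decode_icmp_echo(b64_payload):
    """Extract IPv4 and ICMP header fields from a base64-encoded Ethernet frame."""
    frame = base64.b64decode(b64_payload)
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip_header_len = (frame[14] & 0x0F) * 4   # IHL is counted in 32-bit words
    ttl, protocol = frame[22], frame[23]
    if protocol != 1:
        raise ValueError("not an ICMP packet")
    src_ip = ipaddress.IPv4Address(frame[26:30])
    dst_ip = ipaddress.IPv4Address(frame[30:34])
    icmp = frame[14 + ip_header_len:]
    icmp_type, _code, _checksum, ident, seq = struct.unpack("!BBHHH", icmp[:8])
    return {
        "src_ip": str(src_ip),
        "dst_ip": str(dst_ip),
        "ttl": ttl,
        "icmp_type": icmp_type,   # 8 = echo request
        "id": ident,
        "seq": seq,
    }

# Payload copied from the uplink capture in this post.
payload = (
    "AFBWpqtwAFBWplxDCABFAAA8OqYAAH8Bkj3AqAECrB8BFAgAS/IAAQFp"
    "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dndhYmNkZWZnaGk="
)
echo = decode_icmp_echo(payload)
print(f'{echo["src_ip"]} -> {echo["dst_ip"]} ICMP type {echo["icmp_type"]} seq {echo["seq"]}')
```

Here the sender is the lab client 192.168.1.2; in a production case the same decoding points at the host that still targets the removed VM's address.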
4-5. Checking /var/log/syslog on the Edge shows the same log entries as in the customer environment
2024-02-27T10:47:24.909Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) state incomp -> failed
2024-02-27T10:47:24.914Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) is created
2024-02-27T10:47:27.915Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) state incomp -> failed
2024-02-27T10:47:27.921Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) is created
[Conclusion]
1. An entry shown in the "incomp" state in the get neighbor output is created because ARP resolution for that IP address failed, so the next step is to identify which host is attempting to connect to the entry's IP address
[References]
Interpreting NSX Edge Interface stats (96507)
https://kb.vmware.com/s/article/96507?lang=en_US