

"incomp" state entries are detected due to ARP resolution failure

 

Following the earlier post on "No-neighbor" (https://haewon83.tistory.com/205), this post looks at what other symptoms are observed when the "No-neighbor" count increases.

 

[Symptom]

After entering a specific logical router on the Edge, running the "get neighbor" command shows its ARP entries

On the customer's logical router, "get neighbor" shows entries in the "incomp" state
The customer confirmed that the IP addresses of the "incomp" entries, xxx.xxx.xxx.31 and xxx.xxx.xxx.101, were used by VMs that were removed around December of last year

 

[Troubleshooting Notes]
1. Verify the customer's findings in the Edge Support Bundle
In the arp section of the edge/logical-routers file, the same two IP addresses are in the "incomp" state

./edge/logical-routers
 
        {
            "uuid": "480aceea-64fe-4b53-8b12-e6aad505e3a1",
            "mp_router_id": "480aceea-64fe-4b53-8b12-e6aad505e3a1",
            "name": "DR-xxxxxx-xxxxxx-gw", >>>
            "vrf": 3,
            "peer_vrf": 2,
            "vdr": 18,
            "type": "DISTRIBUTED_ROUTER_TIER1",
...
                {
                    "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997", >>>
                    "ifuid": 298,
                    "type": "lif",
                    "ptype": "downlink",
                    "name": "infra-332509b1-0b26-4415-aac8-4",
                    "lrouter": "480aceea-64fe-4b53-8b12-e6aad505e3a1",
                    "mac": "",
                    "admin": "up",
                    "internal_operation": "up",
     
                    "overlay_vni": xxxxx,
                    "ipns": [
                        "xxx.xxx.xxx.1/24" >>>
                ],
...
            "arp": [
 
                {
                    "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997",
                    "ip": "xxx.xxx.xxx.101", >>>
                    "vlan": xxxx,
                    "mac": "00:00:00:00:00:00",
                    "state": "incomp", >>>
                    "mheld_cnt": 2,
                    "timeout": 1,
                    "last event": "2024-02-13 05:23:34.859",
                    "stats": {
                        "pkt_out": 0,
                        "icmp_out": 0,
                        "pkt_out_fail": 0,
                        "solicit_out": 3,
                        "solicit_out_fail": 0,
                        "solicit_in": 0,
                        "unsolicit_in": 0,
                        "ip_solicit_out": 0,
                        "ip_solicit_out_fail": 0,
                        "announce_out": 0,
                        "announce_out_fail": 0
                    }
                },
     
 
                {
                    "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997", >>>
                    "ip": "xxx.xxx.xxx.31", >>>
                    "vlan": xxxx,
                    "mac": "00:00:00:00:00:00",
                    "state": "incomp", >>>
                    "mheld_cnt": 2,
                    "timeout": 1,
                    "last event": "2024-02-13 05:23:34.858",
                    "stats": {
                        "pkt_out": 0,
                        "icmp_out": 0,
                        "pkt_out_fail": 0,
                        "solicit_out": 3,
                        "solicit_out_fail": 0,
                        "solicit_in": 0,
                        "unsolicit_in": 0,
                        "ip_solicit_out": 0,
                        "ip_solicit_out_fail": 0,
                        "announce_out": 0,
                        "announce_out_fail": 0
                    }
               },
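The check in step 1 can be scripted once the arp section has been loaded into Python. The following is a minimal sketch, assuming the router object has already been parsed into a dict with the same keys as the excerpt above ("arp", "ifuuid", "ip", "state"). The sample data and its 192.0.2.x addresses are hypothetical, and how the real file is loaded is left out because its overall layout is not shown here.

```python
from typing import Any


def incomplete_arp_entries(router: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (ifuuid, ip) pairs for ARP entries whose state is "incomp"."""
    return [
        (entry["ifuuid"], entry["ip"])
        for entry in router.get("arp", [])
        if entry.get("state") == "incomp"
    ]


# Hypothetical sample shaped like the logical-routers excerpt (not real data).
sample_router = {
    "name": "DR-example",
    "arp": [
        {
            "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997",
            "ip": "192.0.2.101",
            "mac": "00:00:00:00:00:00",
            "state": "incomp",
        },
        {
            "ifuuid": "afd1d0c1-7962-43ef-87bb-7e6bae217997",
            "ip": "192.0.2.40",
            "mac": "02:50:56:00:30:00",
            "state": "reach",
        },
    ],
}

for ifuuid, ip in incomplete_arp_entries(sample_router):
    print(f"{ip} on interface {ifuuid} is unresolved")
```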

 

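2. The following logs are found in the Edge syslog

Because the interface associated with the two "incomp" entries is a downlink interface, search by that interface's UUID, afd1d0c1-7962-43ef-87bb-7e6bae217997

Creation of the ARP entry is attempted repeatedly, and each time the entry's state transitions from incomp to failed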

./var/log/syslog
 
$ grep "afd1d0c1-7962-43ef-87bb-7e6bae217997" syslog
 
2024-02-12T01:27:05.249Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.31) state incomp -> failed
2024-02-12T01:27:06.616Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.101) state incomp -> failed
 
2024-02-12T01:27:07.206Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.31) is created
2024-02-12T01:27:08.652Z xxx NSX 4740 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, xxx.xxx.xxx.101) is created

...
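To see how often each address fails resolution, the "incomp -> failed" lines can be counted per interface and IP. This is a rough sketch that assumes the message text matches the excerpt above. The regular expression, the function name, and the shortened sample lines (with 192.0.2.x addresses) are illustrative and not part of NSX.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "state incomp -> failed" message format seen in the excerpt.
FAILED_RE = re.compile(
    r"entry\((?P<ifuuid>[0-9a-f-]+), (?P<ip>[^)]+)\) state incomp -> failed"
)


def count_failed_resolutions(lines: Iterable[str]) -> Counter:
    """Count incomp -> failed transitions per (interface UUID, IP)."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(match["ifuuid"], match["ip"])] += 1
    return counts


# Hypothetical, shortened log lines in the same format as the syslog excerpt.
sample_lines = [
    "... entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, 192.0.2.31) state incomp -> failed",
    "... dynamic arp entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, 192.0.2.31) is created",
    "... entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, 192.0.2.31) state incomp -> failed",
    "... entry(afd1d0c1-7962-43ef-87bb-7e6bae217997, 192.0.2.101) state incomp -> failed",
]

for (ifuuid, ip), n in count_failed_resolutions(sample_lines).items():
    print(f"{ip} ({ifuuid}): {n} failed resolution(s)")
```

For a real file, the lines could come from `open("syslog", encoding="utf-8", errors="replace")` instead of the sample list.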

 

3. Check the definition of the "incomp" state
Edge DP LRP stats description
https://kb.vmware.com/s/article/96507
"Packet dropped due to ARP failure"

 

4. Based on this definition, running the following test in an internal lab environment produces the same "incomp" entry
4-1. Ping 172.31.1.20, an unused IP address in the 172.31.1.0/24 subnet

C:\>ping 172.31.1.20 -t
 
Pinging 172.31.1.20 with 32 bytes of data:
Reply from 172.31.1.1: Destination host unreachable.
Reply from 172.31.1.1: Destination host unreachable.

 

4-2. While the ping is running, check the neighbor entries on the Tier-1 DR (Distributed Router) to which the 172.31.1.0/24 overlay segment is attached

edge-node-01(vrf[5])> get neighbor
Tue Feb 27 2024 UTC 10:35:57.371
Logical Router
UUID        : 5bc895a7-8cc3-4332-9b1a-abe3bd64fa22
VRF         : 5
LR-ID       : 8
Name        : DR-tier1-01
Type        : DISTRIBUTED_ROUTER_TIER1
Neighbor
    Interface   : a62f9b69-c532-44a8-89a0-3e42c6292d94
    IP          : 172.31.1.20 >>>
    MAC         : 00:00:00:00:00:00
    State       : incomp >>>
    Timeout     : 1
 
    Interface   : a62f9b69-c532-44a8-89a0-3e42c6292d94
    IP          : 172.31.1.40
    MAC         : 02:50:56:00:30:00
    State       : reach
    Timeout     : 1092
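When the "get neighbor" output is saved to a text file, the unresolved entries can be picked out with a small parser. This is a sketch only. It assumes the "Key : Value" layout shown above, with each record starting at an "Interface" line after the "Neighbor" heading, and it expects the ">>>" highlight markers to be absent because they are annotations, not CLI output. The function name is hypothetical.

```python
def parse_neighbors(text: str) -> list[dict[str, str]]:
    """Parse the Neighbor section of saved "get neighbor" output into dicts."""
    neighbors: list[dict[str, str]] = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Neighbor":
            in_section = True
            continue
        if not in_section or ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Interface":
            neighbors.append({})
        if neighbors:
            neighbors[-1][key] = value
    return neighbors


sample_output = """\
Logical Router
UUID        : 5bc895a7-8cc3-4332-9b1a-abe3bd64fa22
Neighbor
    Interface   : a62f9b69-c532-44a8-89a0-3e42c6292d94
    IP          : 172.31.1.20
    MAC         : 00:00:00:00:00:00
    State       : incomp
    Timeout     : 1

    Interface   : a62f9b69-c532-44a8-89a0-3e42c6292d94
    IP          : 172.31.1.40
    MAC         : 02:50:56:00:30:00
    State       : reach
    Timeout     : 1092
"""

unresolved = [n["IP"] for n in parse_neighbors(sample_output) if n.get("State") == "incomp"]
print(unresolved)
```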

 

4-3. Capture packets on the Tier-1 DR (Distributed Router) interface attached to the 172.31.1.0/24 overlay segment

edge-node-01(vrf[5])> get interfaces
Tue Feb 27 2024 UTC 10:41:02.472
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
5bc895a7-8cc3-4332-9b1a-abe3bd64fa22   5      8      DR-tier1-01                       DISTRIBUTED_ROUTER_TIER1
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : 6ad31edf-449d-5871-b330-d15908ac64b0
    Ifuid         : 297
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false
 
    Interface     : 810bd7e8-d6ce-5f8d-8f0d-9ff1986ae2ee
    Ifuid         : 298
    Mode          : blackhole
    Port-type     : blackhole
 
    Interface     : a62f9b69-c532-44a8-89a0-3e42c6292d94 >>>
    Ifuid         : 299
    Name          : infra-overlay-seg-3101-dlrp
    Fwd-mode      : IPV4_ONLY
    Mode          : lif
    Port-type     : downlink
    IP/Mask       : 172.31.1.1/24 >>>
    MAC           : 02:50:56:56:44:52
    VNI           : 71680
    Access-VLAN   : untagged
    LS port       : 44c4d076-1e45-4632-bbf6-c458e1b3c6dc
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : True
    MTU           : 1500
    arp_proxy     :
 
edge-node-01(vrf[5])> exit
edge-node-01> start capture interface a62f9b69-c532-44a8-89a0-3e42c6292d94 expr arp
10:41:51.557398 02:50:56:56:44:52 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 172.31.1.20 (ff:ff:ff:ff:ff:ff) tell 172.31.1.1, length 28
<base64>////////AlBWVkRSCAYAAQgABgQAAQJQVlZEUqwfAQH///////+sHwEU</base64>
 
10:41:52.556741 02:50:56:56:44:52 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 172.31.1.20 (ff:ff:ff:ff:ff:ff) tell 172.31.1.1, length 28
<base64>////////AlBWVkRSCAYAAQgABgQAAQJQVlZEUqwfAQH///////+sHwEU</base64>
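The payload printed under each capture line can be decoded to confirm what the DR is asking for. The sketch below assumes the text between the base64 tags is the raw Ethernet frame, which is consistent with the 42-byte length on the summary line. It unpacks the Ethernet header and the 28-byte ARP body for IPv4 over Ethernet. The function names are illustrative.

```python
import base64
import ipaddress
import struct


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_arp_frame(payload_b64: str) -> dict[str, str]:
    """Decode a base64-encoded Ethernet frame carrying an IPv4 ARP packet."""
    frame = base64.b64decode(payload_b64)
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError("not an ARP frame")
    # ARP body: htype, ptype, hlen, plen, oper, then sender/target MAC and IP.
    _htype, _ptype, _hlen, _plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    return {
        "eth_dst": format_mac(dst),
        "eth_src": format_mac(src),
        "operation": {1: "request", 2: "reply"}.get(oper, str(oper)),
        "sender_mac": format_mac(sha),
        "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_mac": format_mac(tha),
        "target_ip": str(ipaddress.IPv4Address(tpa)),
    }


# Payload copied from the capture above.
PAYLOAD = "////////AlBWVkRSCAYAAQgABgQAAQJQVlZEUqwfAQH///////+sHwEU"
arp = decode_arp_frame(PAYLOAD)
print(f"{arp['operation']}: who-has {arp['target_ip']} tell {arp['sender_ip']}")
```

The decoded fields match the summary line: a broadcast request from 02:50:56:56:44:52 (172.31.1.1) for 172.31.1.20, which never receives a reply because no host owns that address.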

 

4-4. Capture packets on the uplink of the Tier-0 SR connected to that Tier-1 DR

※ Capturing on the Tier-0 SR uplink shows that ICMP packets are arriving from the client IP address 192.168.1.2

edge-node-01> get logical-routers
Tue Feb 27 2024 UTC 10:43:13.261
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       6/5000
4f22c0b3-4a2f-4840-a8d2-cf8c797c087b   1      1      DR-Tier0-01                       DISTRIBUTED_ROUTER_TIER0    5       2/50000
1eefa746-7662-4e21-8431-39dfc1f57394   2      2      SR-Tier0-01                       SERVICE_ROUTER_TIER0        6       2/50000
d533b216-a47a-4200-9eb3-007e68c3a024   4      9      SR-tier1-01                       SERVICE_ROUTER_TIER1        5       2/50000
5bc895a7-8cc3-4332-9b1a-abe3bd64fa22   5      8      DR-tier1-01                       DISTRIBUTED_ROUTER_TIER1    4       5/50000
96486497-be55-4cc3-8ae1-bbc7fe391d4b   6      11     SR-one-arm                        SERVICE_ROUTER_TIER1        5       2/50000
 
edge-node-01> vrf 2
edge-node-01(tier0_sr[2])> get interfaces
Tue Feb 27 2024 UTC 10:43:18.535
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
1eefa746-7662-4e21-8431-39dfc1f57394   2      2      SR-Tier0-01                       SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
 
    Interface     : c9755577-209f-4850-8510-65c63d8d388c >>> 
    Ifuid         : 286
    Name          : edge01-uplink02
    Fwd-mode      : IPV4_ONLY
    Internal name : uplink-286
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 192.168.13.11/24
    MAC           : 00:50:56:a6:ab:70
    VLAN          : 1613
    Access-VLAN   : untagged
    LS port       : 4e65dc04-96bd-4925-97a3-b3d84deaae02
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : False
    MTU           : 1500
    arp_proxy     :
 
edge-node-01> exit
edge-node-01> start capture interface c9755577-209f-4850-8510-65c63d8d388c
10:45:12.159045 00:50:56:a6:5c:43 > 00:50:56:a6:ab:70, ethertype IPv4 (0x0800), length 74: 192.168.1.2 > 172.31.1.20: ICMP echo request, id 1, seq 361, length 40
<base64>AFBWpqtwAFBWplxDCABFAAA8OqYAAH8Bkj3AqAECrB8BFAgAS/IAAQFpYWJjZGVmZ2hpamtsbW5vcHFyc3R1dndhYmNkZWZnaGk=</base64>

 

4-5. /var/log/syslog on the Edge shows the same log entries as in the customer environment

2024-02-27T10:47:24.909Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) state incomp -> failed
2024-02-27T10:47:24.914Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) is created
2024-02-27T10:47:27.915Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) state incomp -> failed
2024-02-27T10:47:27.921Z edge-node-01.contoso.com NSX 5320 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="INFO"] dynamic arp entry(a62f9b69-c532-44a8-89a0-3e42c6292d94, 172.31.1.20) is created

 

[Conclusion]
1. An entry shown in the "incomp" state in the get neighbor output is created because ARP resolution failed, so it is necessary to identify who is trying to connect to that entry's IP address
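One way to find the sender is to tally source addresses from saved capture output such as the uplink capture in step 4-4. The sketch below assumes tcpdump-style summary lines of the form "src > dst:" without port suffixes, as in the ICMP example. TCP or UDP lines print a port after each address and would need the pattern extended. The function name and the second sample line are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches "A.B.C.D > E.F.G.H:" in a capture summary line (no port suffix).
FLOW_RE = re.compile(
    r"(?P<src>\d{1,3}(?:\.\d{1,3}){3}) > (?P<dst>\d{1,3}(?:\.\d{1,3}){3}):"
)


def senders_to(lines: Iterable[str], target_ip: str) -> Counter:
    """Count packets per source IP that are addressed to target_ip."""
    counts: Counter = Counter()
    for line in lines:
        match = FLOW_RE.search(line)
        if match and match["dst"] == target_ip:
            counts[match["src"]] += 1
    return counts


capture_lines = [
    # First line copied from the uplink capture; the second is a made-up follow-up.
    "10:45:12.159045 00:50:56:a6:5c:43 > 00:50:56:a6:ab:70, ethertype IPv4 (0x0800), "
    "length 74: 192.168.1.2 > 172.31.1.20: ICMP echo request, id 1, seq 361, length 40",
    "10:45:13.160112 00:50:56:a6:5c:43 > 00:50:56:a6:ab:70, ethertype IPv4 (0x0800), "
    "length 74: 192.168.1.2 > 172.31.1.20: ICMP echo request, id 1, seq 362, length 40",
]

for src, n in senders_to(capture_lines, "172.31.1.20").most_common():
    print(f"{src} sent {n} packet(s) to 172.31.1.20")
```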

[References]
Interpreting NSX Edge Interface stats (96507)
https://kb.vmware.com/s/article/96507?lang=en_US