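In an NSX environment that uses the load balancer (LB), if the IP address (VIP) or port of a Virtual Server is changed while connections are still open, the Current Session value left on the Virtual Server is not reset.
This issue has been confirmed as a bug, and the Load Balancer development team is working on a code fix. As a workaround, we advise putting the Edge into Maintenance Mode and then taking it back out, which restarts the Docker container that hosts the LB.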
[Symptom]
On the NSX Load Balancer, the Virtual Server and the Pool show the same Current Session count, but the Current Session counts of the Pool Members under the Pool do not add up to that value.
Current Session count seen on the Pool: 148370
    Pool
      UUID : 42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2
      Display-Name : XXX
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (148370, 148404, 1027448, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (1144156441, 1729952455)
      Packets : (In, Out) : (4003525, 5550714)
Current Session count per Pool Member
    Pool Member
      Display-Name : XXXap03
      IP : XXX
      Port : XXX
      Sessions : (Cur, Max, Total, Rate) : (0, 7, 175836, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (228991968, 345598564)
      Packets : (In, Out) : (805708, 1109895)
    Pool Member
      Display-Name : XXXap02
      IP : XXX
      Port : XXX
      Sessions : (Cur, Max, Total, Rate) : (0, 7, 175840, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (227630363, 346534980)
      Packets : (In, Out) : (776665, 1110719)
    Pool Member
      Display-Name : XXXap05
      IP : XXX
      Port : XXX
      Sessions : (Cur, Max, Total, Rate) : (0, 7, 175816, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (229180877, 346081266)
      Packets : (In, Out) : (806194, 1110180)
    Pool Member
      Display-Name : XXXap01
      IP : XXX
      Port : XXX
      Sessions : (Cur, Max, Total, Rate) : (0, 7, 175790, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (229328278, 345542175)
      Packets : (In, Out) : (808904, 1109685)
    Pool Member
      Display-Name : XXXap04
      IP : XXX
      Port : XXX
      Sessions : (Cur, Max, Total, Rate) : (0, 7, 175796, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (229024955, 346195470)
      Packets : (In, Out) : (806054, 1110235)
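The comparison described in this section (Pool Current Session versus the sum over its Pool Members) can be checked mechanically from pasted CLI output. The following is only a minimal sketch: the function names are hypothetical, and it assumes the first `Sessions` line in the text belongs to the Pool and every later one to a Pool Member, as in the capture above.

```python
import re

# Matches lines such as:
#   Sessions : (Cur, Max, Total, Rate) : (148370, 148404, 1027448, 0)
SESSIONS_RE = re.compile(
    r"Sessions\s*:\s*\(Cur, Max, Total, Rate\)\s*:\s*"
    r"\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)"
)


def current_sessions(stats_text):
    """Return the Cur value of every Sessions line, in order of appearance."""
    return [int(m.group(1)) for m in SESSIONS_RE.finditer(stats_text)]


def pool_member_gap(stats_text):
    """Return Pool Cur minus the sum of the Pool Member Cur values.

    Assumption: the first Sessions line is the Pool, the rest are members.
    """
    values = current_sessions(stats_text)
    if not values:
        raise ValueError("no Sessions line found")
    pool_cur, member_curs = values[0], values[1:]
    return pool_cur - sum(member_curs)


# Sessions lines taken from the customer capture in this post.
SAMPLE = """\
Pool
  Sessions : (Cur, Max, Total, Rate) : (148370, 148404, 1027448, 0)
Pool Member
  Sessions : (Cur, Max, Total, Rate) : (0, 7, 175836, 0)
Pool Member
  Sessions : (Cur, Max, Total, Rate) : (0, 7, 175840, 0)
Pool Member
  Sessions : (Cur, Max, Total, Rate) : (0, 7, 175816, 0)
Pool Member
  Sessions : (Cur, Max, Total, Rate) : (0, 7, 175790, 0)
Pool Member
  Sessions : (Cur, Max, Total, Rate) : (0, 7, 175796, 0)
"""

if __name__ == "__main__":
    print(pool_member_gap(SAMPLE))  # 148370
```

A gap of 0 is what a healthy pool should report at a single point in time; a large, persistent gap like the one here matches the symptom.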
[Troubleshooting Notes]
1. Reproduction test in the internal lab
1-1. Test environment
    Client : 172.31.1.30
    Virtual Server : 172.31.1.50
    Pool Member#1 : 172.31.1.51
    Pool Member#2 : 172.31.1.52
1-2. Test environment configuration
1-3. Access test to the web server (Nginx) through the VIP
1-4. Check the statistics of the Virtual Server, the Pool, and each Pool Member
※ The Virtual Server and the Pool show the same Current Session count, and it equals the sum of the Pool Members' Current Session counts.
    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
    Mon Mar 25 2024 UTC 05:50:16.280
    Session-Tables
    TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT
    l4lb-0  0000000000000009  tcp    192.168.1.2  54054  172.31.1.50  80     100.64.120.1  4100   172.31.1.51  80
    l4lb-0  000000000000000a  tcp    192.168.1.2  54055  172.31.1.50  80     100.64.120.1  4101   172.31.1.52  80
    l4lb-0  000000000000000b  tcp    192.168.1.2  54056  172.31.1.50  80     100.64.120.1  4101   172.31.1.51  80

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
    Mon Mar 25 2024 UTC 05:02:11.391
    Virtual Server
      UUID : e29760db-2371-432a-8056-09b40232091e
      Display-Name : inline-virtual-server-2
      VIP : TCP 172.31.1.50:80
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      (Drop_By_ACL) : (0)
      Bytes : (In, Out) : (5686, 197536)
      Packets : (In, Out) : (53, 164)

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
    Mon Mar 25 2024 UTC 05:02:14.030
    Pool
      UUID : 138614e9-2a81-446c-9329-db96c8358545
      Display-Name : inline-server-pool-2
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (197536, 5998)
      Packets : (In, Out) : (164, 59)
    Pool Member
      Display-Name : 172.31.1.52
      IP : 172.31.1.52
      Port : 80
      Sessions : (Cur, Max, Total, Rate) : (1, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (96860, 2712)
      Packets : (In, Out) : (82, 27)
    Pool Member
      Display-Name : 172.31.1.51
      IP : 172.31.1.51
      Port : 80
      Sessions : (Cur, Max, Total, Rate) : (2, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (100676, 3286)
      Packets : (In, Out) : (82, 32)
2. Verify the statistics for the customer's symptom in the Edge Support Bundle
The Edge Support Bundle shows the same symptom as the UI.
    $ cat ./etc/nsx_issue
    version: 3.2.3.1.0.22104642
    node-type: nsx-edge
    build-type: release
    export-type: unrestricted

    ./edge/lb-virtual-server
    {
      "display_name": "XXX",
      "enabled": true,
      "l4_curr_sess": 42710340,
      "l4_max_sess": 42713880,
      "l4_sess_rate": 134,
      "l4_total_sess": 279413485,
      "l7_curr_sess": 198,
      "l7_max_sess": 377,
      "l7_sess_rate": 17,
      "l7_total_sess": 20410527,
      "pools": [
        ...
        {
          "bytes_in": 1144162501,
          "bytes_in_rate": -1,
          "bytes_out": 1729959240,
          "bytes_out_rate": -1,
          "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          "display_name": "XXX",
          "drop_pkt_by_acl": -1,
          "drop_sess_by_rule": -1,
          "max_sess": 148404,
          "members": [
            {
              "bytes_in": 228991968,
              "bytes_in_rate": -1,
              "bytes_out": 345598564,
              "bytes_out_rate": -1,
              "curr_sess": 0, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
              "display_name": "XXX",
              "drop_pkt_by_acl": -1,
              "drop_sess_by_rule": -1,
              "id": "XXX",
              "max_sess": 7,
              "packets_in": 805708,
              "packets_out": 1109895,
              "req_rate": -1,
              "sess_rate": 0,
              "total_req": -1,
              "total_sess": 175836,
              "type": "primary"
            },
            {
              "bytes_in": 227631878,
              "bytes_in_rate": -1,
              "bytes_out": 346536676,
              "bytes_out_rate": -1,
              "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
              "display_name": "XXX",
              "drop_pkt_by_acl": -1,
              "drop_sess_by_rule": -1,
              "id": "XXX",
              "max_sess": 7,
              "packets_in": 776670,
              "packets_out": 1110726,
              "req_rate": -1,
              "sess_rate": 0,
              "total_req": -1,
              "total_sess": 175841,
              "type": "primary"
            },
            {
              "bytes_in": 229182392,
              "bytes_in_rate": -1,
              "bytes_out": 346082963,
              "bytes_out_rate": -1,
              "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
              "display_name": "XXX",
              "drop_pkt_by_acl": -1,
              "drop_sess_by_rule": -1,
              "id": "XXX",
              "max_sess": 7,
              "packets_in": 806199,
              "packets_out": 1110187,
              "req_rate": -1,
              "sess_rate": 0,
              "total_req": -1,
              "total_sess": 175817,
              "type": "primary"
            },
            {
              "bytes_in": 229329793,
              "bytes_in_rate": -1,
              "bytes_out": 345543871,
              "bytes_out_rate": -1,
              "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
              "display_name": "XXX",
              "drop_pkt_by_acl": -1,
              "drop_sess_by_rule": -1,
              "id": "XXX",
              "max_sess": 7,
              "packets_in": 808909,
              "packets_out": 1109692,
              "req_rate": -1,
              "sess_rate": 0,
              "total_req": -1,
              "total_sess": 175791,
              "type": "primary"
            },
            {
              "bytes_in": 229026470,
              "bytes_in_rate": -1,
              "bytes_out": 346197166,
              "bytes_out_rate": -1,
              "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
              "display_name": "XXX",
              "drop_pkt_by_acl": -1,
              "drop_sess_by_rule": -1,
              "id": "XXX",
              "max_sess": 7,
              "packets_in": 806059,
              "packets_out": 1110242,
              "req_rate": -1,
              "sess_rate": 0,
              "total_req": -1,
              "total_sess": 175797,
              "type": "primary"
            }
          ],
          "packets_in": 4003545,
          "packets_out": 5550742,
          "req_rate": -1,
          "sess_rate": 0,
          "total_req": -1,
          "total_sess": 1027452,
          "type": "l4",
          "uuid": "42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2",
          "vss": {
            "50c759f7-a641-4f30-926d-6c2bc4fc9536": "XXX"
          }
        },
        ...
      "size": "LARGE",
      "sr_ha_state": "active",
      "uuid": "0324f7ba-f22a-41ff-b78a-493089e729c6",
      "virtual_servers": [
        ...
        {
          "bytes_in": 1686116560,
          "bytes_in_rate": -1,
          "bytes_out": 1144162501,
          "bytes_out_rate": -1,
          "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          "display_name": "XXX",
          "drop_pkt_by_acl": 0,
          "drop_sess_by_rule": -1,
          "ip_address": "XXX",
          "ip_protocol": "TCP",
          "max_sess": 148404,
          "packets_in": 4820030,
          "packets_out": 4003545,
          "port": "XXX",
          "req_rate": -1,
          "sess_rate": 0,
          "total_req": -1,
          "total_sess": 1027452,
          "type": "l4",
          "uuid": "50c759f7-a641-4f30-926d-6c2bc4fc9536"
        },
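When a bundle contains many pools, the same check can be run over the parsed JSON instead of reading it by eye. This is a rough sketch only: `find_session_mismatches` and its `tolerance` parameter are hypothetical, and it assumes one load balancer object with a `pools` list whose entries carry `curr_sess`, `uuid`, and `members`, as in the excerpt above. How those objects are nested at the top level of `./edge/lb-virtual-server` is not shown in the excerpt, so loading the file is left to the reader.

```python
import json


def find_session_mismatches(lb, tolerance=0):
    """Return (pool uuid, pool curr_sess, member curr_sess sum) for each pool
    whose own counter differs from its members' total by more than tolerance.

    `lb` is one load balancer object shaped like the excerpt in this post.
    """
    mismatches = []
    for pool in lb.get("pools", []):
        member_sum = sum(m.get("curr_sess", 0) for m in pool.get("members", []))
        pool_cur = pool.get("curr_sess", 0)
        if abs(pool_cur - member_sum) > tolerance:
            mismatches.append((pool.get("uuid"), pool_cur, member_sum))
    return mismatches


# Reduced sample: only the fields the check needs, values from the excerpt.
sample_lb = json.loads("""
{
  "pools": [
    {
      "curr_sess": 148374,
      "uuid": "42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2",
      "members": [
        {"curr_sess": 0},
        {"curr_sess": 1},
        {"curr_sess": 1},
        {"curr_sess": 1},
        {"curr_sess": 1}
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for uuid, pool_cur, member_sum in find_session_mismatches(sample_lb):
        print(uuid, pool_cur, member_sum)
```

A small non-zero `tolerance` may be useful on a busy Edge, since the pool and member counters are not necessarily sampled at exactly the same instant.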
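3. Confirmed the reproduction steps with the development team and reproduced the issue in the internal lab
3-1. Add a JavaScript busy loop to the page on both web servers to act as a timeout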
    # cat index.html
    <!doctype html>
    <html>
      <head>
        <title>JS Hello World</title>
      </head>
      <body>
        <script>
          for(var i=0; i < 5000000000; i++);
        </script>
        <p>test</p>
      </body>
    </html>
3-2. Access the web server from a web browser through the VIP
3-3. Check the statistics before the test
    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
    Tue Apr 09 2024 UTC 13:22:45.773
    Session-Tables
    TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT
    l4lb-0  000000000000001b  tcp    192.168.1.2  64322  172.31.1.50  80     100.64.120.1  4109   172.31.1.51  80
    l4lb-0  000000000000001c  tcp    192.168.1.2  64321  172.31.1.50  80     100.64.120.1  4110   172.31.1.51  80

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
    Tue Apr 09 2024 UTC 13:22:48.423
    Virtual Server
      UUID : e29760db-2371-432a-8056-09b40232091e
      Display-Name : inline-virtual-server-2
      VIP : TCP 172.31.1.50:80
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (2, 3, 23, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      (Drop_By_ACL) : (0)
      Bytes : (In, Out) : (6221, 20503)
      Packets : (In, Out) : (76, 80)

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
    Tue Apr 09 2024 UTC 13:22:51.314
    Pool
      UUID : 138614e9-2a81-446c-9329-db96c8358545
      Display-Name : inline-server-pool-2
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (2, 3, 29, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (413048, 20317)
      Packets : (In, Out) : (433, 237)
    Pool Member
      Display-Name : 172.31.1.52
      IP : 172.31.1.52
      Port : 80
      Sessions : (Cur, Max, Total, Rate) : (0, 2, 14, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (111102, 8395)
      Packets : (In, Out) : (148, 102)
    Pool Member
      Display-Name : 172.31.1.51
      IP : 172.31.1.51
      Port : 80
      Sessions : (Cur, Max, Total, Rate) : (2, 2, 15, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (301946, 11922)
      Packets : (In, Out) : (285, 135)
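3-4. Change the IP address of the Virtual Server before the connections are removed
3-5. Remove the Server Pool Members
3-6. Stop the web servers, then check the statistics
Pool Member information cannot be checked because the web servers are currently stopped.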
    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
    Tue Apr 09 2024 UTC 14:05:21.984
    Session-Tables
    TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
    Tue Apr 09 2024 UTC 14:05:24.676
    Virtual Server
      UUID : e29760db-2371-432a-8056-09b40232091e
      Display-Name : inline-virtual-server-2
      VIP : TCP 172.31.1.53:80
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      (Drop_By_ACL) : (0)
      Bytes : (In, Out) : (0, 0)
      Packets : (In, Out) : (0, 0)

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
    Tue Apr 09 2024 UTC 14:05:29.027
    24304: Internal Error: Query LB Datapath Failed.
    pool 138614e9-2a81-446c-9329-db96c8358545 is not valid
    edge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error
3-7. Start the web servers and add the Server Pool Members back
3-8. Check the statistics again
※ The Pool Members' Current Session counts are 0, but the Current Session counts of the Virtual Server and the Pool do not decrease.
    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
    Tue Apr 09 2024 UTC 14:10:04.392
    Session-Tables
    TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
    Tue Apr 09 2024 UTC 14:10:07.704
    Virtual Server
      UUID : e29760db-2371-432a-8056-09b40232091e
      Display-Name : inline-virtual-server-2
      VIP : TCP 172.31.1.53:80
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      (Drop_By_ACL) : (0)
      Bytes : (In, Out) : (0, 0)
      Packets : (In, Out) : (0, 0)

    edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
    Tue Apr 09 2024 UTC 14:10:10.946
    Pool
      UUID : 138614e9-2a81-446c-9329-db96c8358545
      Display-Name : inline-server-pool-2
      Type : L4
      Sessions : (Cur, Max, Total, Rate) : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (0, 0)
      Packets : (In, Out) : (0, 0)
    Pool Member
      Display-Name : 172.31.1.52
      IP : 172.31.1.52
      Port : 80
      Sessions : (Cur, Max, Total, Rate) : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (0, 0)
      Packets : (In, Out) : (0, 0)
    Pool Member
      Display-Name : 172.31.1.51
      IP : 172.31.1.51
      Port : 80
      Sessions : (Cur, Max, Total, Rate) : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      Bytes : (In, Out) : (0, 0)
      Packets : (In, Out) : (0, 0)
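The reproduced state has a recognizable signature: the session table is empty, yet the Virtual Server still reports a non-zero Current Session. Below is a small sketch for flagging that state from pasted CLI output. The function names are hypothetical, and the row pattern assumes session-table rows look like the ones captured in this post (a table name such as `l4lb-0`, a 16-hex-digit ID, the protocol, then the client address and port).

```python
import re

# One session-table row, e.g.:
#   l4lb-0 000000000000001b tcp 192.168.1.2 64322 172.31.1.50 80 ...
SESSION_ROW_RE = re.compile(
    r"\bl4lb-\d+\s+[0-9a-f]{16}\s+\w+\s+\d+\.\d+\.\d+\.\d+\s+\d+"
)

# Sessions : (Cur, Max, Total, Rate) : (2, 2, 4, 0)
SESSIONS_RE = re.compile(
    r"Sessions\s*:\s*\(Cur, Max, Total, Rate\)\s*:\s*\((\d+),"
)


def count_session_rows(session_table_text):
    """Count the rows present in 'session-tables' output."""
    return len(SESSION_ROW_RE.findall(session_table_text))


def vs_current_sessions(vs_stats_text):
    """Return the Cur value from 'virtual-server ... stats' output."""
    match = SESSIONS_RE.search(vs_stats_text)
    if match is None:
        raise ValueError("no Sessions line found")
    return int(match.group(1))


def looks_stale(session_table_text, vs_stats_text):
    """True when no session exists but the Virtual Server counter is non-zero."""
    return (count_session_rows(session_table_text) == 0
            and vs_current_sessions(vs_stats_text) > 0)


# Output after the VIP change (step 3-8), reduced to the relevant lines.
TABLE_AFTER = "Session-Tables\nTABLE ID PROTO CADDR CPORT VADDR VPORT SADDR SPORT DADDR DPORT\n"
VS_AFTER = "VIP : TCP 172.31.1.53:80\nSessions : (Cur, Max, Total, Rate) : (2, 2, 4, 0)\n"

if __name__ == "__main__":
    print(looks_stale(TABLE_AFTER, VS_AFTER))  # True
```

The two outputs should be collected close together in time; on a live system a session can open between the two commands, so a single positive result is a hint rather than proof.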
[Conclusion]
1. A code fix is currently in progress.
2. As a workaround, put the Edge into Maintenance Mode (and take it back out) to restart the LB container.