[NSX] The number of current sessions on a virtual server does not match the sum of current sessions across its pool members.
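In an environment that uses the NSX Load Balancer (LB), if the IP address (VIP) or port of a Virtual Server is changed while connections are still open, the Current Session value remaining on the Virtual Server is not reset.
This issue has been confirmed as a bug, and the Load Balancer development team is working on a code fix. As a workaround, we advise putting the Edge into Maintenance Mode and then taking it out again, which restarts the Docker container that hosts the LB.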
[Symptom]
On the NSX Load Balancer, the Current Session counts shown for the Virtual Server and for the Pool are identical, but they do not match the Current Session counts held by the Pool Members under that Pool.
Current Session count seen on the Pool: 148370
Pool
UUID         : 42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2
Display-Name : XXX
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (148370, 148404, 1027448, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (1144156441, 1729952455)
Packets      :
    (In, Out) : (4003525, 5550714)
Current Session count per Pool Member
Pool Member
Display-Name : XXXap03
IP           : XXX
Port         : XXX
Sessions     :
    (Cur, Max, Total, Rate) : (0, 7, 175836, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (228991968, 345598564)
Packets      :
    (In, Out) : (805708, 1109895)

Pool Member
Display-Name : XXXap02
IP           : XXX
Port         : XXX
Sessions     :
    (Cur, Max, Total, Rate) : (0, 7, 175840, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (227630363, 346534980)
Packets      :
    (In, Out) : (776665, 1110719)

Pool Member
Display-Name : XXXap05
IP           : XXX
Port         : XXX
Sessions     :
    (Cur, Max, Total, Rate) : (0, 7, 175816, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (229180877, 346081266)
Packets      :
    (In, Out) : (806194, 1110180)

Pool Member
Display-Name : XXXap01
IP           : XXX
Port         : XXX
Sessions     :
    (Cur, Max, Total, Rate) : (0, 7, 175790, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (229328278, 345542175)
Packets      :
    (In, Out) : (808904, 1109685)

Pool Member
Display-Name : XXXap04
IP           : XXX
Port         : XXX
Sessions     :
    (Cur, Max, Total, Rate) : (0, 7, 175796, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (229024955, 346195470)
Packets      :
    (In, Out) : (806054, 1110235)
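The comparison described in this symptom can be automated. The following is a minimal Python sketch, not an NSX tool: it assumes the pool stats text has been saved from the Edge CLI, that the first "Sessions" entry belongs to the Pool, and that every later one belongs to a Pool Member. The names `current_sessions` and `pool_member_gap` are hypothetical helpers introduced only for this illustration.

```python
import re

# Matches the "(Cur, Max, Total, Rate) : (a, b, c, d)" entry that follows a
# "Sessions :" label; \s* also tolerates a line break between the two.
SESSIONS_RE = re.compile(
    r"Sessions\s*:\s*\(Cur,\s*Max,\s*Total,\s*Rate\)\s*:\s*"
    r"\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)"
)


def current_sessions(stats_text):
    """Return the Cur value of every Sessions entry, in order of appearance."""
    return [int(m.group(1)) for m in SESSIONS_RE.finditer(stats_text)]


def pool_member_gap(pool_stats_text):
    """Return (pool Cur) minus (sum of member Cur).

    Assumption: the first Sessions entry is the pool itself and the
    remaining entries are its members, as in the output shown above.
    """
    counts = current_sessions(pool_stats_text)
    if not counts:
        raise ValueError("no Sessions entries found in the given text")
    pool_cur, member_curs = counts[0], counts[1:]
    return pool_cur - sum(member_curs)


# Values taken from the customer output above (pool plus five members).
SAMPLE = (
    "Pool Sessions : (Cur, Max, Total, Rate) : (148370, 148404, 1027448, 0) "
    "Pool Member Sessions : (Cur, Max, Total, Rate) : (0, 7, 175836, 0) "
    "Pool Member Sessions : (Cur, Max, Total, Rate) : (0, 7, 175840, 0) "
    "Pool Member Sessions : (Cur, Max, Total, Rate) : (0, 7, 175816, 0) "
    "Pool Member Sessions : (Cur, Max, Total, Rate) : (0, 7, 175790, 0) "
    "Pool Member Sessions : (Cur, Max, Total, Rate) : (0, 7, 175796, 0)"
)
print(pool_member_gap(SAMPLE))  # -> 148370
```

A gap of 0 is what a healthy pool should report; a large positive gap that never shrinks matches the symptom in this note.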
[Troubleshooting Notes]
1. Reproduction test in the internal lab environment
1-1. Test environment
Client         : 172.31.1.30
Virtual Server : 172.31.1.50
Pool Member#1  : 172.31.1.51
Pool Member#2  : 172.31.1.52
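1-2. Test environment configuration
1-3. Access test to the web server (Nginx) through the VIP
1-4. Check the statistics of the Virtual Server, the Pool, and each Pool Member
※ The Current Session counts of the Virtual Server and the Pool are identical, and they equal the sum of the Pool Members' Current Session counts.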
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Mon Mar 25 2024 UTC 05:50:16.280
Session-Tables
TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT
l4lb-0  0000000000000009  tcp    192.168.1.2  54054  172.31.1.50  80     100.64.120.1  4100   172.31.1.51  80
l4lb-0  000000000000000a  tcp    192.168.1.2  54055  172.31.1.50  80     100.64.120.1  4101   172.31.1.52  80
l4lb-0  000000000000000b  tcp    192.168.1.2  54056  172.31.1.50  80     100.64.120.1  4101   172.31.1.51  80

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Mon Mar 25 2024 UTC 05:02:11.391
Virtual Server
UUID         : e29760db-2371-432a-8056-09b40232091e
Display-Name : inline-virtual-server-2
VIP          : TCP 172.31.1.50:80
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)           : (0)
Bytes        :
    (In, Out) : (5686, 197536)
Packets      :
    (In, Out) : (53, 164)

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Mon Mar 25 2024 UTC 05:02:14.030
Pool
UUID         : 138614e9-2a81-446c-9329-db96c8358545
Display-Name : inline-server-pool-2
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (197536, 5998)
Packets      :
    (In, Out) : (164, 59)

Pool Member
Display-Name : 172.31.1.52
IP           : 172.31.1.52
Port         : 80
Sessions     :
    (Cur, Max, Total, Rate) : (1, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (96860, 2712)
Packets      :
    (In, Out) : (82, 27)

Pool Member
Display-Name : 172.31.1.51
IP           : 172.31.1.51
Port         : 80
Sessions     :
    (Cur, Max, Total, Rate) : (2, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (100676, 3286)
Packets      :
    (In, Out) : (82, 32)
2. Verify the statistics for the customer's symptom in the Edge Support Bundle
The Edge Support Bundle shows the same symptom as the UI.
$ cat ./etc/nsx_issue
version: 3.2.3.1.0.22104642
node-type: nsx-edge
build-type: release
export-type: unrestricted

./edge/lb-virtual-server
{
  "display_name": "XXX",
  "enabled": true,
  "l4_curr_sess": 42710340,
  "l4_max_sess": 42713880,
  "l4_sess_rate": 134,
  "l4_total_sess": 279413485,
  "l7_curr_sess": 198,
  "l7_max_sess": 377,
  "l7_sess_rate": 17,
  "l7_total_sess": 20410527,
  "pools": [
    ...
    {
      "bytes_in": 1144162501,
      "bytes_in_rate": -1,
      "bytes_out": 1729959240,
      "bytes_out_rate": -1,
      "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      "display_name": "XXX",
      "drop_pkt_by_acl": -1,
      "drop_sess_by_rule": -1,
      "max_sess": 148404,
      "members": [
        {
          "bytes_in": 228991968,
          "bytes_in_rate": -1,
          "bytes_out": 345598564,
          "bytes_out_rate": -1,
          "curr_sess": 0, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          "display_name": "XXX",
          "drop_pkt_by_acl": -1,
          "drop_sess_by_rule": -1,
          "id": "XXX",
          "max_sess": 7,
          "packets_in": 805708,
          "packets_out": 1109895,
          "req_rate": -1,
          "sess_rate": 0,
          "total_req": -1,
          "total_sess": 175836,
          "type": "primary"
        },
        {
          "bytes_in": 227631878,
          "bytes_in_rate": -1,
          "bytes_out": 346536676,
          "bytes_out_rate": -1,
          "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          "display_name": "XXX",
          "drop_pkt_by_acl": -1,
          "drop_sess_by_rule": -1,
          "id": "XXX",
          "max_sess": 7,
          "packets_in": 776670,
          "packets_out": 1110726,
          "req_rate": -1,
          "sess_rate": 0,
          "total_req": -1,
          "total_sess": 175841,
          "type": "primary"
        },
        {
          "bytes_in": 229182392,
          "bytes_in_rate": -1,
          "bytes_out": 346082963,
          "bytes_out_rate": -1,
          "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          "display_name": "XXX",
          "drop_pkt_by_acl": -1,
          "drop_sess_by_rule": -1,
          "id": "XXX",
          "max_sess": 7,
          "packets_in": 806199,
          "packets_out": 1110187,
          "req_rate": -1,
          "sess_rate": 0,
          "total_req": -1,
          "total_sess": 175817,
          "type": "primary"
        },
        {
          "bytes_in": 229329793,
          "bytes_in_rate": -1,
          "bytes_out": 345543871,
          "bytes_out_rate": -1,
          "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          "display_name": "XXX",
          "drop_pkt_by_acl": -1,
          "drop_sess_by_rule": -1,
          "id": "XXX",
          "max_sess": 7,
          "packets_in": 808909,
          "packets_out": 1109692,
          "req_rate": -1,
          "sess_rate": 0,
          "total_req": -1,
          "total_sess": 175791,
          "type": "primary"
        },
        {
          "bytes_in": 229026470,
          "bytes_in_rate": -1,
          "bytes_out": 346197166,
          "bytes_out_rate": -1,
          "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
          "display_name": "XXX",
          "drop_pkt_by_acl": -1,
          "drop_sess_by_rule": -1,
          "id": "XXX",
          "max_sess": 7,
          "packets_in": 806059,
          "packets_out": 1110242,
          "req_rate": -1,
          "sess_rate": 0,
          "total_req": -1,
          "total_sess": 175797,
          "type": "primary"
        }
      ],
      "packets_in": 4003545,
      "packets_out": 5550742,
      "req_rate": -1,
      "sess_rate": 0,
      "total_req": -1,
      "total_sess": 1027452,
      "type": "l4",
      "uuid": "42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2",
      "vss": {
        "50c759f7-a641-4f30-926d-6c2bc4fc9536": "XXX"
      }
    },
    ...
  "size": "LARGE",
  "sr_ha_state": "active",
  "uuid": "0324f7ba-f22a-41ff-b78a-493089e729c6",
  "virtual_servers": [
    ...
    {
      "bytes_in": 1686116560,
      "bytes_in_rate": -1,
      "bytes_out": 1144162501,
      "bytes_out_rate": -1,
      "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      "display_name": "XXX",
      "drop_pkt_by_acl": 0,
      "drop_sess_by_rule": -1,
      "ip_address": "XXX",
      "ip_protocol": "TCP",
      "max_sess": 148404,
      "packets_in": 4820030,
      "packets_out": 4003545,
      "port": "XXX",
      "req_rate": -1,
      "sess_rate": 0,
      "total_req": -1,
      "total_sess": 1027452,
      "type": "l4",
      "uuid": "50c759f7-a641-4f30-926d-6c2bc4fc9536"
    },
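The same check can be run against the support bundle data instead of reading the JSON by eye. The sketch below is illustrative only and rests on assumptions: it takes one already-parsed LB service object (a Python dict) shaped like the excerpt above, with a "pools" list whose entries carry "curr_sess", "uuid", and a "members" list. How the complete ./edge/lb-virtual-server file is laid out around that object is not shown in this note, so loading the file is left to the reader. `find_session_mismatches` and its `tolerance` parameter are hypothetical names.

```python
def find_session_mismatches(lb_service, tolerance=0):
    """List pools whose curr_sess differs from the sum of member curr_sess.

    lb_service: dict shaped like the excerpt above (assumed structure).
    tolerance:  allowed absolute difference, because counters may be
                sampled at slightly different moments (an assumption).
    Returns a list of (pool_uuid, pool_curr_sess, member_curr_sess_sum).
    """
    mismatches = []
    for pool in lb_service.get("pools", []):
        pool_cur = pool.get("curr_sess", 0)
        member_sum = sum(
            member.get("curr_sess", 0) for member in pool.get("members", [])
        )
        if abs(pool_cur - member_sum) > tolerance:
            mismatches.append((pool.get("uuid"), pool_cur, member_sum))
    return mismatches


# Reduced example using the curr_sess values from the excerpt above.
example_service = {
    "pools": [
        {
            "uuid": "42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2",
            "curr_sess": 148374,
            "members": [
                {"curr_sess": 0},
                {"curr_sess": 1},
                {"curr_sess": 1},
                {"curr_sess": 1},
                {"curr_sess": 1},
            ],
        }
    ]
}

for uuid, pool_cur, member_sum in find_session_mismatches(example_service):
    print(uuid, pool_cur, member_sum)
# -> 42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2 148374 4
```

In the excerpt the pool reports 148374 current sessions while its five members together report 4, so the pool is flagged.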
3. Confirm the reproduction steps with the development team and reproduce the issue in the internal lab
3-1. On both web servers, use JavaScript to simulate a timeout
# cat index.html
<!doctype html>
<html>
  <head>
    <title>JS Hello World</title>
  </head>
  <body>
    <script>
      for(var i=0; i < 5000000000; i++);
    </script>
    <p>test</p>
  </body>
</html>
3-2. Access the web server through the VIP from a web browser
3-3. Check the statistics before the test
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 13:22:45.773
Session-Tables
TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT
l4lb-0  000000000000001b  tcp    192.168.1.2  64322  172.31.1.50  80     100.64.120.1  4109   172.31.1.51  80
l4lb-0  000000000000001c  tcp    192.168.1.2  64321  172.31.1.50  80     100.64.120.1  4110   172.31.1.51  80

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 13:22:48.423
Virtual Server
UUID         : e29760db-2371-432a-8056-09b40232091e
Display-Name : inline-virtual-server-2
VIP          : TCP 172.31.1.50:80
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (2, 3, 23, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)           : (0)
Bytes        :
    (In, Out) : (6221, 20503)
Packets      :
    (In, Out) : (76, 80)

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Tue Apr 09 2024 UTC 13:22:51.314
Pool
UUID         : 138614e9-2a81-446c-9329-db96c8358545
Display-Name : inline-server-pool-2
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (2, 3, 29, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (413048, 20317)
Packets      :
    (In, Out) : (433, 237)

Pool Member
Display-Name : 172.31.1.52
IP           : 172.31.1.52
Port         : 80
Sessions     :
    (Cur, Max, Total, Rate) : (0, 2, 14, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (111102, 8395)
Packets      :
    (In, Out) : (148, 102)

Pool Member
Display-Name : 172.31.1.51
IP           : 172.31.1.51
Port         : 80
Sessions     :
    (Cur, Max, Total, Rate) : (2, 2, 15, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (301946, 11922)
Packets      :
    (In, Out) : (285, 135)
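3-4. Change the IP address of the Virtual Server before the connections are removed
3-5. Remove the Server Pool Members
3-6. Stop the web servers and check the statistics
Pool Member information cannot be checked at this point because the web servers are stopped.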
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 14:05:21.984
Session-Tables
TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 14:05:24.676
Virtual Server
UUID         : e29760db-2371-432a-8056-09b40232091e
Display-Name : inline-virtual-server-2
VIP          : TCP 172.31.1.53:80
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)           : (0)
Bytes        :
    (In, Out) : (0, 0)
Packets      :
    (In, Out) : (0, 0)

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Tue Apr 09 2024 UTC 14:05:29.027
24304: Internal Error: Query LB Datapath Failed.
pool 138614e9-2a81-446c-9329-db96c8358545 is not valid
edge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error
3-7. Start the web servers and add the Server Pool Members again
3-8. Check the statistics again
※ The Pool Members' Current Session counts are 0, but the Current Session counts of the Virtual Server and the Pool do not decrease.
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 14:10:04.392
Session-Tables
TABLE   ID                PROTO  CADDR        CPORT  VADDR        VPORT  SADDR         SPORT  DADDR        DPORT

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 14:10:07.704
Virtual Server
UUID         : e29760db-2371-432a-8056-09b40232091e
Display-Name : inline-virtual-server-2
VIP          : TCP 172.31.1.53:80
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)           : (0)
Bytes        :
    (In, Out) : (0, 0)
Packets      :
    (In, Out) : (0, 0)

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Tue Apr 09 2024 UTC 14:10:10.946
Pool
UUID         : 138614e9-2a81-446c-9329-db96c8358545
Display-Name : inline-server-pool-2
Type         : L4
Sessions     :
    (Cur, Max, Total, Rate) : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (0, 0)
Packets      :
    (In, Out) : (0, 0)

Pool Member
Display-Name : 172.31.1.52
IP           : 172.31.1.52
Port         : 80
Sessions     :
    (Cur, Max, Total, Rate) : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (0, 0)
Packets      :
    (In, Out) : (0, 0)

Pool Member
Display-Name : 172.31.1.51
IP           : 172.31.1.51
Port         : 80
Sessions     :
    (Cur, Max, Total, Rate) : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes        :
    (In, Out) : (0, 0)
Packets      :
    (In, Out) : (0, 0)
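In this lab result the session table is empty while the Virtual Server still reports two current sessions. A rough way to flag that condition is sketched below in Python. It assumes a lab like this one with a single Virtual Server on the LB, because the session table is queried for the whole load balancer, and it assumes table rows look like the ones in this note (a 16-digit hexadecimal ID followed by tcp or udp). `stale_session_count` is a hypothetical helper name.

```python
import re

# One session-table row: 16 hex digits (ID column) followed by the protocol.
ROW_RE = re.compile(r"\b[0-9a-f]{16}\s+(?:tcp|udp)\b")
# The Cur value inside "(Cur, Max, Total, Rate) : (cur, max, total, rate)".
CUR_RE = re.compile(r"\(Cur,\s*Max,\s*Total,\s*Rate\)\s*:\s*\((\d+),")


def stale_session_count(session_table_text, virtual_server_stats_text):
    """Return how many current sessions have no row in the session table.

    Assumption: the LB hosts a single virtual server, so every table row
    belongs to the virtual server whose stats are passed in.
    """
    rows = len(ROW_RE.findall(session_table_text))
    match = CUR_RE.search(virtual_server_stats_text)
    if match is None:
        raise ValueError("no Sessions entry found in the stats text")
    return max(int(match.group(1)) - rows, 0)


# Before the VIP change (step 3-3): two rows, Cur = 2.
before_table = (
    "l4lb-0 000000000000001b tcp 192.168.1.2 64322 172.31.1.50 80\n"
    "l4lb-0 000000000000001c tcp 192.168.1.2 64321 172.31.1.50 80\n"
)
before_stats = "Sessions : (Cur, Max, Total, Rate) : (2, 3, 23, 0)"

# After the VIP change (step 3-8): empty table, Cur still 2.
after_table = "TABLE ID PROTO CADDR CPORT VADDR VPORT SADDR SPORT DADDR DPORT\n"
after_stats = "Sessions : (Cur, Max, Total, Rate) : (2, 2, 4, 0)"

print(stale_session_count(before_table, before_stats))  # -> 0
print(stale_session_count(after_table, after_stats))    # -> 2
```

A non-zero result that persists after all clients have disconnected is consistent with the stale counter described in this note.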
[Conclusion]
1. A code fix is currently in progress.
2. As a workaround, put the Edge into Maintenance Mode and then exit it so that the LB container is restarted.