[NSX] The current session count of a virtual server doesn't match the sum of current sessions across its pool members

 

In an NSX environment that uses the load balancer (LB), if the IP address (VIP) or port of a Virtual Server is changed while connections are still open, the Current Session value remaining on the Virtual Server is not reset.

 

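This issue has been confirmed as a bug, and the Load Balancer development team is working on a code fix. As a workaround, we advise putting the Edge into Maintenance Mode and then taking it out again, which restarts the Docker container that hosts the LB.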

 

[Symptom]

On the NSX Load Balancer, the Virtual Server and the Pool report the same Current Session count, but the Current Session counts of the Pool Members under that Pool do not add up to it.

Current Session count reported by the Pool: 148370

Pool
UUID                            : 42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2
Display-Name                    : XXX
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (148370, 148404, 1027448, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (1144156441, 1729952455)
Packets                         :
    (In, Out)                   : (4003525, 5550714)

 

Current Session count per Pool Member

Pool Member
Display-Name                    : XXXap03
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175836, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (228991968, 345598564)
Packets                         :
    (In, Out)                   : (805708, 1109895)
 
 
Pool Member
Display-Name                    : XXXap02
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175840, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (227630363, 346534980)
Packets                         :
    (In, Out)                   : (776665, 1110719)
 
 
Pool Member
Display-Name                    : XXXap05
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175816, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (229180877, 346081266)
Packets                         :
    (In, Out)                   : (806194, 1110180)
 
 
Pool Member
Display-Name                    : XXXap01
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175790, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (229328278, 345542175)
Packets                         :
    (In, Out)                   : (808904, 1109685)
 
 
Pool Member
Display-Name                    : XXXap04
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175796, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (229024955, 346195470)
Packets                         :
    (In, Out)                   : (806054, 1110235)
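
The comparison above can also be scripted. The following Python sketch is not part of NSX. It assumes the pool statistics output has been saved as plain text, and the names `parse_current_sessions` and `stale_sessions` are hypothetical helpers introduced here for illustration. It reads the `Cur` value from each `(Cur, Max, Total, Rate)` line and subtracts the Pool Members' total from the Pool's value.

```python
import re

# Captures the Cur value of a "(Cur, Max, Total, Rate) : (a, b, c, d)" line.
SESSIONS_RE = re.compile(r"\(Cur, Max, Total, Rate\)\s*:\s*\((\d+),")

# Block header lines as they appear in the CLI statistics output.
HEADERS = ("Virtual Server", "Pool", "Pool Member")


def parse_current_sessions(stats_text):
    """Return (pool_cur, [member_cur, ...]) parsed from pool stats text."""
    pool_cur = None
    member_curs = []
    section = None  # header of the block currently being read
    for line in stats_text.splitlines():
        stripped = line.strip()
        if stripped in HEADERS:
            section = stripped
            continue
        match = SESSIONS_RE.search(stripped)
        if not match:
            continue
        cur = int(match.group(1))
        if section == "Pool":
            pool_cur = cur
        elif section == "Pool Member":
            member_curs.append(cur)
    return pool_cur, member_curs


def stale_sessions(stats_text):
    """Sessions counted on the Pool that no Pool Member accounts for."""
    pool_cur, member_curs = parse_current_sessions(stats_text)
    if pool_cur is None:
        raise ValueError("no Pool block found in stats text")
    return pool_cur - sum(member_curs)
```

With the customer output above, `stale_sessions` yields 148370 (148370 on the Pool, 0 across the five Pool Members). A small non-zero difference can occur when the counters are read while traffic is flowing, so only a large or persistent gap points to this issue.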

 

[Troubleshooting Notes]

1. Reproduction test in an internal lab environment

1-1. Test environment

Client : 172.31.1.30
Virtual Server : 172.31.1.50
Pool Member#1 : 172.31.1.51
Pool Member#2 : 172.31.1.52

 

1-2. Test environment configuration

 

 

 

 

1-3. Access test to the web server (Nginx) through the VIP

 

1-4. Check the statistics of the Virtual Server, Pool, and Pool Members

※ The Virtual Server and the Pool show the same Current Session count, and it equals the sum of the Pool Members' Current Session counts.

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Mon Mar 25 2024 UTC 05:50:16.280
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
l4lb-0  0000000000000009 tcp   192.168.1.2     54054  172.31.1.50     80      100.64.120.1    4100   172.31.1.51     80
l4lb-0  000000000000000a tcp   192.168.1.2     54055  172.31.1.50     80      100.64.120.1    4101   172.31.1.52     80
l4lb-0  000000000000000b tcp   192.168.1.2     54056  172.31.1.50     80      100.64.120.1    4101   172.31.1.51     80
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Mon Mar 25 2024 UTC 05:02:11.391
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.50:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (5686, 197536)
Packets                               :
    (In, Out)                         : (53, 164)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Mon Mar 25 2024 UTC 05:02:14.030
Pool
UUID                            : 138614e9-2a81-446c-9329-db96c8358545
Display-Name                    : inline-server-pool-2
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (197536, 5998)
Packets                         :
    (In, Out)                   : (164, 59)
 
Pool Member
Display-Name                    : 172.31.1.52
IP                              : 172.31.1.52
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (1, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (96860, 2712)
Packets                         :
    (In, Out)                   : (82, 27)
 
 
Pool Member
Display-Name                    : 172.31.1.51
IP                              : 172.31.1.51
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (100676, 3286)
Packets                         :
    (In, Out)                   : (82, 32)
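
The session table can be cross-checked against the per-member counters in the same way. The Python sketch below is an illustration only. It assumes the `session-tables` output has been saved as text, and `sessions_per_backend` is a hypothetical helper name. It counts rows per backend address using the `DADDR` column of the header shown above.

```python
from collections import Counter


def sessions_per_backend(session_table_text):
    """Count session-table rows per backend address (the DADDR column)."""
    counts = Counter()
    header = None
    for line in session_table_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "TABLE":
            # Column names: TABLE ID PROTO CADDR CPORT VADDR VPORT SADDR SPORT DADDR DPORT
            header = fields
            continue
        # Skip prompts, timestamps, and any line that is not a full table row.
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        counts[row["DADDR"]] += 1
    return counts
```

For the three rows captured in this step the result is 2 sessions for 172.31.1.51 and 1 for 172.31.1.52, which agrees with the Pool Member `Cur` values (2 and 1).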

 

2. Verify the statistics for the customer's symptom in the Edge Support Bundle

The Edge Support Bundle shows the same symptom as the UI.

$ cat ./etc/nsx_issue
version: 3.2.3.1.0.22104642
node-type: nsx-edge
build-type: release
export-type: unrestricted

./edge/lb-virtual-server
 
        {
            "display_name": "XXX",
            "enabled": true,
            "l4_curr_sess": 42710340,
            "l4_max_sess": 42713880,
            "l4_sess_rate": 134,
            "l4_total_sess": 279413485,
            "l7_curr_sess": 198,
            "l7_max_sess": 377,
            "l7_sess_rate": 17,
            "l7_total_sess": 20410527,
            "pools": [
                ...
                {
                    "bytes_in": 1144162501,
                    "bytes_in_rate": -1,
                    "bytes_out": 1729959240,
                    "bytes_out_rate": -1,
                    "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                    "display_name": "XXX",
                    "drop_pkt_by_acl": -1,
                    "drop_sess_by_rule": -1,
                    "max_sess": 148404,
                    "members": [
                        {
                            "bytes_in": 228991968,
                            "bytes_in_rate": -1,
                            "bytes_out": 345598564,
                            "bytes_out_rate": -1,
                            "curr_sess": 0, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 805708,
                            "packets_out": 1109895,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175836,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 227631878,
                            "bytes_in_rate": -1,
                            "bytes_out": 346536676,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 776670,
                            "packets_out": 1110726,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175841,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 229182392,
                            "bytes_in_rate": -1,
                            "bytes_out": 346082963,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 806199,
                            "packets_out": 1110187,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175817,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 229329793,
                            "bytes_in_rate": -1,
                            "bytes_out": 345543871,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 808909,
                            "packets_out": 1109692,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175791,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 229026470,
                            "bytes_in_rate": -1,
                            "bytes_out": 346197166,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 806059,
                            "packets_out": 1110242,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175797,
                            "type": "primary"
                        }
                    ],
                    "packets_in": 4003545,
                    "packets_out": 5550742,
                    "req_rate": -1,
                    "sess_rate": 0,
                    "total_req": -1,
                    "total_sess": 1027452,
                    "type": "l4",
                    "uuid": "42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2",
                    "vss": {
                        "50c759f7-a641-4f30-926d-6c2bc4fc9536": "XXX"
                    }
                },
                ...
            "size": "LARGE",
            "sr_ha_state": "active",
            "uuid": "0324f7ba-f22a-41ff-b78a-493089e729c6",
            "virtual_servers": [
                ...
                {
                    "bytes_in": 1686116560,
                    "bytes_in_rate": -1,
                    "bytes_out": 1144162501,
                    "bytes_out_rate": -1,
                    "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                    "display_name": "XXX",
                    "drop_pkt_by_acl": 0,
                    "drop_sess_by_rule": -1,
                    "ip_address": "XXX",
                    "ip_protocol": "TCP",
                    "max_sess": 148404,
                    "packets_in": 4820030,
                    "packets_out": 4003545,
                    "port": "XXX",
                    "req_rate": -1,
                    "sess_rate": 0,
                    "total_req": -1,
                    "total_sess": 1027452,
                    "type": "l4",
                    "uuid": "50c759f7-a641-4f30-926d-6c2bc4fc9536"
                },
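
When the bundle contains many pools, the same check can be run over the whole file. The Python sketch below is a rough illustration and not an NSX tool. It assumes `./edge/lb-virtual-server` parses as JSON once the highlight markers added in this post are left out, and it relies only on the field names visible in the excerpt (`curr_sess`, `members`, `uuid`). The function names are hypothetical. Because the top-level layout of the file is not shown here, the code walks the document recursively instead of assuming a fixed path.

```python
import json


def find_pools(node):
    """Yield every dict that has a 'curr_sess' value and a 'members' list."""
    if isinstance(node, dict):
        if "curr_sess" in node and isinstance(node.get("members"), list):
            yield node
        for value in node.values():
            yield from find_pools(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_pools(item)


def mismatched_pools(document):
    """Return (uuid, pool_curr, members_sum) for pools whose counters disagree."""
    result = []
    for pool in find_pools(document):
        members_sum = sum(
            member.get("curr_sess", 0)
            for member in pool["members"]
            if isinstance(member, dict)
        )
        if pool["curr_sess"] != members_sum:
            result.append((pool.get("uuid"), pool["curr_sess"], members_sum))
    return result


def check_bundle_file(path):
    """Load a JSON file from the support bundle and list mismatched pools."""
    with open(path, encoding="utf-8") as handle:
        return mismatched_pools(json.load(handle))
```

For the pool in the excerpt above, the check reports 148374 on the pool against a member sum of 4 (0 + 1 + 1 + 1 + 1). A difference of a few sessions on a busy pool may be a timing effect, so the size of the gap matters more than the mismatch itself.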

 

3. Confirm the reproduction steps with the development team and reproduce the issue in the internal lab

3-1. On both web servers, use JavaScript to create a timeout (delay)

# cat index.html
<!doctype html>
<html>
 
  <head>
    <title>JS Hello World</title>
  </head>
 
  <body>
 
    <script>
       for(var i=0; i < 5000000000; i++);
    </script>
 
    <p>test</p>
 
  </body>
 
</html>

 

3-2. Access the web server through the VIP from a web browser

 

3-3. Check the statistics before the test

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 13:22:45.773
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
l4lb-0  000000000000001b tcp   192.168.1.2     64322  172.31.1.50     80      100.64.120.1    4109   172.31.1.51     80
l4lb-0  000000000000001c tcp   192.168.1.2     64321  172.31.1.50     80      100.64.120.1    4110   172.31.1.51     80
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 13:22:48.423
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.50:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (2, 3, 23, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (6221, 20503)
Packets                               :
    (In, Out)                         : (76, 80)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Tue Apr 09 2024 UTC 13:22:51.314
Pool
UUID                            : 138614e9-2a81-446c-9329-db96c8358545
Display-Name                    : inline-server-pool-2
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 3, 29, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (413048, 20317)
Packets                         :
    (In, Out)                   : (433, 237)
 
 
Pool Member
Display-Name                    : 172.31.1.52
IP                              : 172.31.1.52
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 2, 14, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (111102, 8395)
Packets                         :
    (In, Out)                   : (148, 102)
 
 
Pool Member
Display-Name                    : 172.31.1.51
IP                              : 172.31.1.51
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 2, 15, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (301946, 11922)
Packets                         :
    (In, Out)                   : (285, 135)

 

3-4. Change the Virtual Server's IP address before the connections are removed

 

3-5. Remove the Server Pool Members

 

3-6. Stop the web servers, then check the statistics

Pool Member information cannot be retrieved because the web servers are currently stopped.

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 14:05:21.984
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 14:05:24.676
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.53:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (0, 0)
Packets                               :
    (In, Out)                         : (0, 0)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats  
Tue Apr 09 2024 UTC 14:05:29.027
24304: Internal Error: Query LB Datapath Failed. pool 138614e9-2a81-446c-9329-db96c8358545 is not valid
edge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error

 

3-7. Start the web servers and add the Server Pool Members back

 

 

3-8. Check the statistics again

※ The Pool Members' Current Session count is 0, but the Current Session count of the Virtual Server and the Pool does not decrease.

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 14:10:04.392
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 14:10:07.704
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.53:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (0, 0)
Packets                               :
    (In, Out)                         : (0, 0)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Tue Apr 09 2024 UTC 14:10:10.946
Pool
UUID                            : 138614e9-2a81-446c-9329-db96c8358545
Display-Name                    : inline-server-pool-2
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (0, 0)
Packets                         :
    (In, Out)                   : (0, 0)
 
 
Pool Member
Display-Name                    : 172.31.1.52
IP                              : 172.31.1.52
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (0, 0)
Packets                         :
    (In, Out)                   : (0, 0)
 
 
Pool Member
Display-Name                    : 172.31.1.51
IP                              : 172.31.1.51
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (0, 0)
Packets                         :
    (In, Out)                   : (0, 0)

 

[Conclusion]

1. A code fix is currently in progress.

2. As a workaround, put the Edge into Maintenance Mode to restart the LB container.