
[NSX] The number of current sessions on a virtual server does not match the sum of current sessions across its pool members

haewon83 2024. 4. 14. 16:02

 

In an environment that uses the NSX LB, if the IP address (VIP) or port of a Virtual Server is changed while connections are still open, the Current Session value left on the Virtual Server is not reset.

 

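This issue has been confirmed as a bug, and the Load Balancer development team is working on a code fix. As a workaround, we advise putting the Edge into Maintenance Mode and then taking it out again, which restarts the Docker container that hosts the LB.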

 

[Symptom]

In the NSX Load Balancer, the Virtual Server and the Pool show the same Current Session count, but the Current Session counts of the Pool Members under that Pool do not add up to it.

Current Session count seen on the Pool: 148370

Pool
UUID                            : 42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2
Display-Name                    : XXX
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (148370, 148404, 1027448, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (1144156441, 1729952455)
Packets                         :
    (In, Out)                   : (4003525, 5550714)

 

Current Session count per Pool Member

Pool Member
Display-Name                    : XXXap03
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175836, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (228991968, 345598564)
Packets                         :
    (In, Out)                   : (805708, 1109895)
 
 
Pool Member
Display-Name                    : XXXap02
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175840, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (227630363, 346534980)
Packets                         :
    (In, Out)                   : (776665, 1110719)
 
 
Pool Member
Display-Name                    : XXXap05
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175816, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (229180877, 346081266)
Packets                         :
    (In, Out)                   : (806194, 1110180)
 
 
Pool Member
Display-Name                    : XXXap01
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175790, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (229328278, 345542175)
Packets                         :
    (In, Out)                   : (808904, 1109685)
 
 
Pool Member
Display-Name                    : XXXap04
IP                              : XXX
Port                            : XXX
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 7, 175796, 0)  >>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (229024955, 346195470)
Packets                         :
    (In, Out)                   : (806054, 1110235)
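
The mismatch in the output above can also be checked mechanically. The following is a minimal Python sketch, assuming the stats output has been saved as text. The helper name `current_sessions` is hypothetical, and the sample is a trimmed copy of the Sessions lines shown above.

```python
import re

# Matches the "(Cur, Max, Total, Rate) : (a, b, c, d)" line of the stats output.
SESSION_LINE = re.compile(
    r"\(Cur, Max, Total, Rate\)\s*:\s*\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)"
)


def current_sessions(stats_text):
    """Return the Cur value of every Sessions line, in order of appearance."""
    return [int(m.group(1)) for m in SESSION_LINE.finditer(stats_text)]


# Trimmed sample: the Pool line followed by its five Pool Member lines.
sample = """
Pool
    (Cur, Max, Total, Rate)     : (148370, 148404, 1027448, 0)
Pool Member
    (Cur, Max, Total, Rate)     : (0, 7, 175836, 0)
Pool Member
    (Cur, Max, Total, Rate)     : (0, 7, 175840, 0)
Pool Member
    (Cur, Max, Total, Rate)     : (0, 7, 175816, 0)
Pool Member
    (Cur, Max, Total, Rate)     : (0, 7, 175790, 0)
Pool Member
    (Cur, Max, Total, Rate)     : (0, 7, 175796, 0)
"""

# The first value belongs to the Pool, the rest to its members.
pool_cur, *member_cur = current_sessions(sample)
gap = pool_cur - sum(member_cur)
print(f"pool={pool_cur} members={sum(member_cur)} gap={gap}")
# prints: pool=148370 members=0 gap=148370
```

A non-zero gap that persists while no traffic is flowing is the symptom described in this post.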

 

[Troubleshooting Notes]

1. Reproduction test in the internal lab environment

1-1. Test environment

Client : 172.31.1.30
Virtual Server : 172.31.1.50
Pool Member#1 : 172.31.1.51
Pool Member#2 : 172.31.1.52

 

1-2. Test environment configuration

 

 

 

 

1-3. Access test to the Web Server (Nginx) through the VIP

 

1-4. Check the statistics of the Virtual Server, the Pool, and each Pool Member

※ The Virtual Server and the Pool show the same Current Session count, and it equals the sum of the Pool Members' Current Session counts

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Mon Mar 25 2024 UTC 05:50:16.280
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
l4lb-0  0000000000000009 tcp   192.168.1.2     54054  172.31.1.50     80      100.64.120.1    4100   172.31.1.51     80
l4lb-0  000000000000000a tcp   192.168.1.2     54055  172.31.1.50     80      100.64.120.1    4101   172.31.1.52     80
l4lb-0  000000000000000b tcp   192.168.1.2     54056  172.31.1.50     80      100.64.120.1    4101   172.31.1.51     80
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Mon Mar 25 2024 UTC 05:02:11.391
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.50:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (5686, 197536)
Packets                               :
    (In, Out)                         : (53, 164)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Mon Mar 25 2024 UTC 05:02:14.030
Pool
UUID                            : 138614e9-2a81-446c-9329-db96c8358545
Display-Name                    : inline-server-pool-2
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (3, 3, 6, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (197536, 5998)
Packets                         :
    (In, Out)                   : (164, 59)
 
Pool Member
Display-Name                    : 172.31.1.52
IP                              : 172.31.1.52
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (1, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (96860, 2712)
Packets                         :
    (In, Out)                   : (82, 27)
 
 
Pool Member
Display-Name                    : 172.31.1.51
IP                              : 172.31.1.51
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 2, 3, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (100676, 3286)
Packets                         :
    (In, Out)                   : (82, 32)
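
In this healthy state, the rows of the session table can be tallied per backend (the DADDR column) and compared with the per-member Cur values. The following is a minimal sketch, assuming the table text starts at the column header line; the function name `sessions_per_member` is hypothetical.

```python
from collections import Counter

# Session table copied from the output above, starting at the header line.
session_table = """\
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
l4lb-0  0000000000000009 tcp   192.168.1.2     54054  172.31.1.50     80      100.64.120.1    4100   172.31.1.51     80
l4lb-0  000000000000000a tcp   192.168.1.2     54055  172.31.1.50     80      100.64.120.1    4101   172.31.1.52     80
l4lb-0  000000000000000b tcp   192.168.1.2     54056  172.31.1.50     80      100.64.120.1    4101   172.31.1.51     80
"""


def sessions_per_member(text):
    """Count session-table rows per destination (pool member) address."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    daddr = header.index("DADDR")  # column holding the pool member IP
    return Counter(row[daddr] for row in rows)


per_member = sessions_per_member(session_table)
print(dict(per_member))
# prints: {'172.31.1.51': 2, '172.31.1.52': 1}
```

The tally (2 and 1) agrees with the Cur values of the two Pool Members, and the total of 3 agrees with the Virtual Server and the Pool.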

 

2. Verify the statistics for the customer's symptom in the Edge Support Bundle

The Edge Support Bundle shows the same symptom as the UI

$ cat ./etc/nsx_issue
version: 3.2.3.1.0.22104642
node-type: nsx-edge
build-type: release
export-type: unrestricted

./edge/lb-virtual-server
 
        {
            "display_name": "XXX",
            "enabled": true,
            "l4_curr_sess": 42710340,
            "l4_max_sess": 42713880,
            "l4_sess_rate": 134,
            "l4_total_sess": 279413485,
            "l7_curr_sess": 198,
            "l7_max_sess": 377,
            "l7_sess_rate": 17,
            "l7_total_sess": 20410527,
            "pools": [
                ...
                {
                    "bytes_in": 1144162501,
                    "bytes_in_rate": -1,
                    "bytes_out": 1729959240,
                    "bytes_out_rate": -1,
                    "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                    "display_name": "XXX",
                    "drop_pkt_by_acl": -1,
                    "drop_sess_by_rule": -1,
                    "max_sess": 148404,
                    "members": [
                        {
                            "bytes_in": 228991968,
                            "bytes_in_rate": -1,
                            "bytes_out": 345598564,
                            "bytes_out_rate": -1,
                            "curr_sess": 0, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 805708,
                            "packets_out": 1109895,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175836,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 227631878,
                            "bytes_in_rate": -1,
                            "bytes_out": 346536676,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 776670,
                            "packets_out": 1110726,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175841,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 229182392,
                            "bytes_in_rate": -1,
                            "bytes_out": 346082963,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 806199,
                            "packets_out": 1110187,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175817,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 229329793,
                            "bytes_in_rate": -1,
                            "bytes_out": 345543871,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 808909,
                            "packets_out": 1109692,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175791,
                            "type": "primary"
                        },
                        {
                            "bytes_in": 229026470,
                            "bytes_in_rate": -1,
                            "bytes_out": 346197166,
                            "bytes_out_rate": -1,
                            "curr_sess": 1, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                            "display_name": "XXX",
                            "drop_pkt_by_acl": -1,
                            "drop_sess_by_rule": -1,
                            "id": "XXX",
                            "max_sess": 7,
                            "packets_in": 806059,
                            "packets_out": 1110242,
                            "req_rate": -1,
                            "sess_rate": 0,
                            "total_req": -1,
                            "total_sess": 175797,
                            "type": "primary"
                        }
                    ],
                    "packets_in": 4003545,
                    "packets_out": 5550742,
                    "req_rate": -1,
                    "sess_rate": 0,
                    "total_req": -1,
                    "total_sess": 1027452,
                    "type": "l4",
                    "uuid": "42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2",
                    "vss": {
                        "50c759f7-a641-4f30-926d-6c2bc4fc9536": "XXX"
                    }
                },
                ...
            "size": "LARGE",
            "sr_ha_state": "active",
            "uuid": "0324f7ba-f22a-41ff-b78a-493089e729c6",
            "virtual_servers": [
                ...
                {
                    "bytes_in": 1686116560,
                    "bytes_in_rate": -1,
                    "bytes_out": 1144162501,
                    "bytes_out_rate": -1,
                    "curr_sess": 148374, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
                    "display_name": "XXX",
                    "drop_pkt_by_acl": 0,
                    "drop_sess_by_rule": -1,
                    "ip_address": "XXX",
                    "ip_protocol": "TCP",
                    "max_sess": 148404,
                    "packets_in": 4820030,
                    "packets_out": 4003545,
                    "port": "XXX",
                    "req_rate": -1,
                    "sess_rate": 0,
                    "total_req": -1,
                    "total_sess": 1027452,
                    "type": "l4",
                    "uuid": "50c759f7-a641-4f30-926d-6c2bc4fc9536"
                },
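
The same comparison can be run against the support bundle data. The following is a minimal sketch, assuming a pool object shaped like the excerpt above (`curr_sess` on the pool and on each entry of `members`). The inline JSON is a hypothetical, heavily trimmed stand-in for the real file, with the highlight markers removed so that it parses.

```python
import json

# Hypothetical trimmed stand-in for one pool object from ./edge/lb-virtual-server.
# Only the fields needed for the check are kept; values come from the excerpt.
pool_json = """
{
  "curr_sess": 148374,
  "uuid": "42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2",
  "members": [
    {"curr_sess": 0},
    {"curr_sess": 1},
    {"curr_sess": 1},
    {"curr_sess": 1},
    {"curr_sess": 1}
  ]
}
"""


def session_gap(pool):
    """Pool-level current sessions minus the sum over its members."""
    member_total = sum(member["curr_sess"] for member in pool["members"])
    return pool["curr_sess"] - member_total


pool = json.loads(pool_json)
print(pool["uuid"], session_gap(pool))
# prints: 42b01d28-61c7-495c-b7d7-fcdcb9f8c4b2 148370
```

Only 4 of the 148374 sessions counted on the pool are accounted for by its members.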

 

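3. Confirm the reproduction steps with the development team and reproduce the issue in the internal lab

3-1. Add a JavaScript busy loop to the page served by both web servers to introduce a delay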

# cat index.html
<!doctype html>
<html>
 
  <head>
    <title>JS Hello World</title>
  </head>
 
  <body>
 
    <script>
       for(var i=0; i < 5000000000; i++);
    </script>
 
    <p>test</p>
 
  </body>
 
</html>

 

3-2. Access the Web Server through the VIP from a web browser

 

3-3. Check the statistics before the test

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 13:22:45.773
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
l4lb-0  000000000000001b tcp   192.168.1.2     64322  172.31.1.50     80      100.64.120.1    4109   172.31.1.51     80
l4lb-0  000000000000001c tcp   192.168.1.2     64321  172.31.1.50     80      100.64.120.1    4110   172.31.1.51     80
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 13:22:48.423
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.50:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (2, 3, 23, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (6221, 20503)
Packets                               :
    (In, Out)                         : (76, 80)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Tue Apr 09 2024 UTC 13:22:51.314
Pool
UUID                            : 138614e9-2a81-446c-9329-db96c8358545
Display-Name                    : inline-server-pool-2
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 3, 29, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (413048, 20317)
Packets                         :
    (In, Out)                   : (433, 237)
 
 
Pool Member
Display-Name                    : 172.31.1.52
IP                              : 172.31.1.52
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 2, 14, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (111102, 8395)
Packets                         :
    (In, Out)                   : (148, 102)
 
 
Pool Member
Display-Name                    : 172.31.1.51
IP                              : 172.31.1.51
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 2, 15, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (301946, 11922)
Packets                         :
    (In, Out)                   : (285, 135)

 

3-4. Change the IP address of the Virtual Server before the connections are removed

 

3-5. Remove the Server Pool Members

 

3-6. Stop the Web Server and check the statistics

Pool Member information cannot be retrieved because the Web Server is currently stopped

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 14:05:21.984
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 14:05:24.676
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.53:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (0, 0)
Packets                               :
    (In, Out)                         : (0, 0)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats  
Tue Apr 09 2024 UTC 14:05:29.027
24304: Internal Error: Query LB Datapath Failed. pool 138614e9-2a81-446c-9329-db96c8358545 is not valid
edge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error

 

3-7. Start the Web Server and add the Server Pool Members again

 

 

3-8. Check the statistics again

※ The Current Session count of the Pool Members is 0, but the Current Session count of the Virtual Server and the Pool does not decrease

edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 session-tables
Tue Apr 09 2024 UTC 14:10:04.392
Session-Tables
TABLE   ID               PROTO CADDR           CPORT  VADDR           VPORT   SADDR           SPORT  DADDR           DPORT
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 virtual-server e29760db-2371-432a-8056-09b40232091e stats
Tue Apr 09 2024 UTC 14:10:07.704
Virtual Server
UUID                                  : e29760db-2371-432a-8056-09b40232091e
Display-Name                          : inline-virtual-server-2
VIP                                   : TCP 172.31.1.53:80
Type                                  : L4
Sessions                              :
    (Cur, Max, Total, Rate)           : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    (Drop_By_ACL)                     : (0)
Bytes                                 :
    (In, Out)                         : (0, 0)
Packets                               :
    (In, Out)                         : (0, 0)
 
edge-node-01> get load-balancer ec41fd15-9ac8-4bb2-b780-63ee6ed684f5 pool 138614e9-2a81-446c-9329-db96c8358545 stats
Tue Apr 09 2024 UTC 14:10:10.946
Pool
UUID                            : 138614e9-2a81-446c-9329-db96c8358545
Display-Name                    : inline-server-pool-2
Type                            : L4
Sessions                        :
    (Cur, Max, Total, Rate)     : (2, 2, 4, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (0, 0)
Packets                         :
    (In, Out)                   : (0, 0)
 
 
Pool Member
Display-Name                    : 172.31.1.52
IP                              : 172.31.1.52
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (0, 0)
Packets                         :
    (In, Out)                   : (0, 0)
 
 
Pool Member
Display-Name                    : 172.31.1.51
IP                              : 172.31.1.51
Port                            : 80
Sessions                        :
    (Cur, Max, Total, Rate)     : (0, 0, 0, 0) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Bytes                           :
    (In, Out)                   : (0, 0)
Packets                         :
    (In, Out)                   : (0, 0)
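
The reproduced state can be summarized as a simple consistency check. The following is a minimal sketch that uses the values from step 3-3 and from the output above. The `LbSnapshot` class is hypothetical, and treating the session table as the reference for live sessions is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LbSnapshot:
    """Hypothetical container for one round of statistics from the Edge CLI."""
    vs_cur: int            # Cur on the Virtual Server
    pool_cur: int          # Cur on the Pool
    member_cur: List[int]  # Cur on each Pool Member
    table_rows: int        # rows in the session table

    def is_consistent(self):
        # All four views should report the same number of live sessions.
        return self.vs_cur == self.pool_cur == sum(self.member_cur) == self.table_rows

    def stale_sessions(self):
        # Sessions still counted on the Virtual Server but absent from the table.
        return self.vs_cur - self.table_rows


before = LbSnapshot(vs_cur=2, pool_cur=2, member_cur=[0, 2], table_rows=2)  # step 3-3
after = LbSnapshot(vs_cur=2, pool_cur=2, member_cur=[0, 0], table_rows=0)   # after re-adding members

print(before.is_consistent(), before.stale_sessions())  # prints: True 0
print(after.is_consistent(), after.stale_sessions())    # prints: False 2
```

After the VIP change, the two sessions counted on the Virtual Server and the Pool no longer correspond to any row in the session table.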

 

[Conclusion]

1. A code fix is currently in progress

2. As a workaround, put the Edge into Maintenance Mode to restart the LB container
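
If the workaround needs to be scripted, the Maintenance Mode toggle can be driven through the NSX Manager REST API. The following is only a sketch that builds the request URLs and sends nothing. It assumes the transport-node maintenance mode action (`POST /api/v1/transport-nodes/<node-id>?action=...`), which should be verified against the API guide for the NSX version in use. The manager hostname and node ID are placeholders, not values from this case.

```python
from urllib.parse import quote

# Assumed action names; confirm them in the NSX API reference before use.
ALLOWED_ACTIONS = ("enter_maintenance_mode", "exit_maintenance_mode")


def maintenance_mode_url(manager, node_id, action):
    """Build the URL for a maintenance-mode POST request (nothing is sent)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return (
        f"https://{manager}/api/v1/transport-nodes/"
        f"{quote(node_id, safe='')}?action={action}"
    )


# Placeholder values for illustration only.
manager = "nsx-manager.example.com"
edge_node_id = "EDGE-TRANSPORT-NODE-ID"

for action in ALLOWED_ACTIONS:
    print("POST", maintenance_mode_url(manager, edge_node_id, action))
```

Entering Maintenance Mode on an Edge takes it out of service, so the toggle should be done one Edge at a time and only after confirming that the standby Edge is healthy.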