[NSX] vim.tiny process crashed unexpectedly and core dump file generated.
NSX Edge Node에서 vim.tiny 프로세스가 예기치 않게 종료되면서 Core Dump가 발생한 케이스가 있습니다.
NSX는 기본적으로 root 계정으로 사용자가 접근하여 Operation을 수행하는 것을 권고하고 있지 않으며,
또한, 해당 증상을 재현해 볼 수 있는 방법이 있어, 흥미로운 증상이라 이와 관련된 내용을 알아보겠습니다.
[Symptom]
NSX Manager에서 Application이 Crash 되었다는 Alarm 발생
[Troubleshooting Notes]
1. Application Crash가 발생한 Edge Node에서 실제로 문제 증상이 발생했는지 Syslog를 통해 검증
Syslog에서 확인되는 바와 같이 vim.tiny 프로세스가 예기치 않게 종료
$ grep -i "eventState" syslog* | grep -i application <DATE_TIME> <HOSTNAME> NSX 3692 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="CRITICAL" eventFeatureName="infrastructure_service" eventType="application_crashed" eventSev="critical" eventState="Off" entId="a681797a-a988-11ec-91c1-024233b15cd7"] All core dump files are withdrawn from system. <DATE_TIME> <HOSTNAME> NSX 3692 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="CRITICAL" eventFeatureName="infrastructure_service" eventType="application_crashed" eventSev="critical" eventState="On" entId="a681797a-a988-11ec-91c1-024233b15cd7"] Application on NSX node be06-vebe901-sam has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team. $ grep -i "core file" syslog* … <DATE_TIME> <HOSTNAME> NSX 691637 - [nsx@6876 comp="nsx-edge" subcomp="n ode-mgmt" username="root" level="WARNING"] Failed to parse core file name: core.vim.tiny.1721816611.57850.0.11 <DATE_TIME> <HOSTNAME> NSX 691637 - [nsx@6876 comp="nsx-edge" subcomp="n ode-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.vim.tiny.1721816611.57850.0.11.gz <DATE_TIME> <HOSTNAME> NSX 3692 - [nsx@6876 comp="nsx-edge" subcomp="nsx -sha" username="nsx-sha" level="CRITICAL" eventFeatureName="infrastructure_service" eventType="application_crashed" eventS ev="critical" eventState="On" entId="a681797a-a988-11ec-91c1-024233b15cd7"] Application on NSX node be06-vebe901-sam has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Suppo rt team. |
2. Dump 분석을 위해 vim.tiny는 Ubuntu에 Symbol이 있기 때문에 이를 Download 후 설치
Symbol Download 시, vim.tiny 버전과 Ubuntu 버전 사전 확인
https://launchpad.net/ubuntu/focal/amd64/vim-tiny-dbgsym/2:8.1.2269-1ubuntu5.21
root@edge-node-02:/var/log# dpkg -l | grep vim ii vim-common 2:8.1.2269-1ubuntu5.21 all Vi IMproved - Common files ii vim-tiny 2:8.1.2269-1ubuntu5.21 amd64 Vi IMproved - enhanced vi editor - compact version root@edge-node-02:/var/log# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.6 LTS Release: 20.04 Codename: focal root@edge-node-02:/mnt/coredump# dpkg -i vim-tiny-dbgsym_8.1.2269-1ubuntu5.21_amd64.ddeb dpkg: warning: downgrading vim-tiny-dbgsym from 2:8.1.2269-1ubuntu5.23 to 2:8.1.2269-1ubuntu5.21 (Reading database ... 45806 files and directories currently installed.) Preparing to unpack vim-tiny-dbgsym_8.1.2269-1ubuntu5.21_amd64.ddeb ... Unpacking vim-tiny-dbgsym (2:8.1.2269-1ubuntu5.21) over (2:8.1.2269-1ubuntu5.23) ... Setting up vim-tiny-dbgsym (2:8.1.2269-1ubuntu5.21) ... |
3. GDB를 이용하여, 수집된 Core Dump의 Backtrace 확인
Backtrace를 확인해보면, vim.tiny를 통해 /etc/default/kdump-tools 파일을 Open 한 상황에서 ":st" command를 이용하여 Suspend 상태로 전환한 것으로 확인
root@edge-node-02:/mnt/coredump# /opt/vmware/nsx-edge/sbin/gdb /usr/bin/vim.tiny core.vim.tiny.1721816611.57850.0.11 GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/bin/vim.tiny... Reading symbols from /usr/lib/debug/.build-id/d2/a842f097ee187cee7da69f798844c3a5e153e9.debug... [New LWP 57850] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `vim.tiny /etc/default/kdump-tools'. Program terminated with signal SIGSEGV, Segmentation fault. #0 deathtrap (sigarg=15) at os_unix.c:946 946 os_unix.c: No such file or directory. (gdb) bt #0 deathtrap (sigarg=15) at os_unix.c:946 #1 <signal handler called> #2 deathtrap (sigarg=1) at os_unix.c:946 #3 <signal handler called> #4 0x00007206b18fc3db in kill () at ../sysdeps/unix/syscall-template.S:78 #5 0x00001a16e162cff6 in mch_suspend () at os_unix.c:1256 #6 0x00001a16e1667d49 in ui_suspend () at ui.c:558 #7 0x00001a16e15d4186 in ex_stop (eap=0x7d3f0520a350) at ex_docmd.c:5401 #8 ex_stop (eap=0x7d3f0520a350) at ex_docmd.c:5384 #9 0x00001a16e15d96c2 in do_one_cmd (cmdlinep=cmdlinep@entry=0x7d3f0520a550, sourcing=sourcing@entry=1, fgetline=fgetline@entry=0x0, cookie=cookie@entry=0x0) at ex_docmd.c:2482 #10 0x00001a16e15da03c in do_cmdline (cmdline=<optimized out>, fgetline=0x0, cookie=0x0, flags=11) at ex_docmd.c:975 #11 0x00001a16e1619868 in normal_cmd (oap=0x7d3f0520a650, toplevel=1) at normal.c:1082 #12 0x00001a16e1682a72 in main_loop (cmdwin=0, noexmode=0) at main.c:1510 #13 0x00001a16e16835c6 in vim_main2 () at main.c:901 #14 0x00001a16e15a987d in main (argc=<optimized out>, argv=<optimized out>) at main.c:3322 (gdb) bt full #0 deathtrap (sigarg=15) at os_unix.c:946 entered = 0 i = <optimized out> #1 <signal handler called> No locals. #2 deathtrap (sigarg=1) at os_unix.c:946 entered = 0 i = <optimized out> #3 <signal handler called> No locals. #4 0x00007206b18fc3db in kill () at ../sysdeps/unix/syscall-template.S:78 No locals. #5 0x00001a16e162cff6 in mch_suspend () at os_unix.c:1256 No locals. #6 0x00001a16e1667d49 in ui_suspend () at ui.c:558 No locals. #7 0x00001a16e15d4186 in ex_stop (eap=0x7d3f0520a350) at ex_docmd.c:5401 No locals. #8 ex_stop (eap=0x7d3f0520a350) at ex_docmd.c:5384 No locals. #9 0x00001a16e15d96c2 in do_one_cmd (cmdlinep=cmdlinep@entry=0x7d3f0520a550, sourcing=sourcing@entry=1, fgetline=fgetline@entry=0x0, cookie=cookie@entry=0x0) at ex_docmd.c:2482 p = <optimized out> lnum = <optimized out> n = <optimized out> errormsg = 0x0 after_modifier = <optimized out> ea = {arg = 0x1a16e4548542 "", nextcmd = 0x0, cmd = 0x1a16e4548540 "st", cmdlinep = 0x7d3f0520a550, cmdidx = CMD_stop, argt = 524546, skip = 0, forceit = 0, addr_count = 0, line1 = 1, line2 = 1, addr_type = ADDR_NONE, flags = 0, do_ecmd_cmd = 0x0, do_ecmd_lnum = 0, append = 0, usefilter = 0, amount = 0, regname = 0, force_bin = 0, read_edit = 0, force_ff = 0, force_enc = 0, bad_char = 0, useridx = 0, errmsg = 0x0, getline = 0x0, cookie = 0x0, verbose_save = -1, save_msg_silent = -1, did_esilent = 0} save_msg_scroll = 0 save_cmdmod = {hide = 0, split = 0, tab = 0, keepalt = 0, keepmarks = 0, keepjumps = 0, lockmarks = 0, keeppatterns = 0, noswapfile = 0, save_ei = 0x0, filter_regmatch = {regprog = 0x0, startp = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, endp = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, rm_ic = 0}, filter_force = 0} save_reg_executing = 0 ni = 0 cmd = <optimized out> #10 0x00001a16e15da03c in do_cmdline (cmdline=<optimized out>, fgetline=0x0, cookie=0x0, flags=11) at ex_docmd.c:975 next_cmdline = <optimized out> cmdline_copy = 0x1a16e4548540 "st" used_getline = <optimized out> recursive = 1 msg_didout_before_start = 0 count = <optimized out> did_inc = 0 retval = 1 call_depth = 1 #11 0x00001a16e1619868 in normal_cmd (oap=0x7d3f0520a650, toplevel=1) at normal.c:1082 ca = {oap = 0x7d3f0520a650, prechar = 0, cmdchar = 26, nchar = 0, ncharC1 = 0, ncharC2 = 0, extra_char = 0, opcount = 0, count0 = 0, count1 = 1, arg = 0, retval = 0, searchbuf = 0x0} c = <optimized out> ctrl_w = <optimized out> old_col = 0 old_pos = {lnum = 77, col = 0, coladd = <optimized out>} mapped_len = <optimized out> old_mapped_len = 0 idx = 28 #12 0x00001a16e1682a72 in main_loop (cmdwin=0, noexmode=0) at main.c:1510 oa = {op_type = 0, regname = 0, motion_type = 1, motion_force = 0, use_reg_one = 0, inclusive = 0, end_adjusted = 0, start = {lnum = 0, col = 0, coladd = 0}, end = {lnum = 0, col = 0, coladd = 0}, cursor_start = {lnum = 0, col = 0, coladd = 0}, line_count = 0, empty = 0, is_VIsual = 0, block_mode = 0, start_vcol = 0, end_vcol = 0, prev_opcount = 0, prev_count0 = 0} prev_oap = 0x0 previous_got_int = 1 #13 0x00001a16e16835c6 in vim_main2 () at main.c:901 No locals. #14 0x00001a16e15a987d in main (argc=<optimized out>, argv=<optimized out>) at main.c:3322 i = <optimized out> |
4. Session 관련 Log 확인
Log를 확인해보면, 특정 시점에 Client로부터 Edge Node로 SSH Session(125번)이 생성되었고,
그 후, SSH Session이 예기치 않게 종료된 것으로 확인
<DATE_TIME> <HOSTNAME> kernel - - - [78725.005151] get_sigframe: 13 callbacks suppressed <DATE_TIME> <HOSTNAME> kernel - - - [78725.005153] signal: vim.tiny[57850] overflowed sigaltstack <DATE_TIME> <HOSTNAME> kernel - - - [78725.005156] signal: vim.tiny[57850] overflowed sigaltstack <DATE_TIME> <HOSTNAME> kernel - - - [78725.005164] grsec: From <CLIENT_IP_ADDR>: Segmentation fault occurred at 0000000000000000 in /usr/bin/vim.tiny[vim.tiny:57850] uid/euid:0/0 gid/egid:0/0, parent /[bash:57561] uid/euid:0/0 gid/egid:0/0 <DATE_TIME> <HOSTNAME> NSX 691637 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.vim.tiny.1721816611.57850.0.11.gz <DATE_TIME> <HOSTNAME> sshd 57540 - - Accepted password for root from <CLIENT_IP_ADDR> port 56516 ssh2 <DATE_TIME> <HOSTNAME> sshd 57540 - - pam_unix(sshd:session): session opened for user root by (uid=0) <DATE_TIME> <HOSTNAME> systemd-logind 3039 - - New session 125 of user root. <DATE_TIME> <HOSTNAME> sshd 57540 - - pam_unix(sshd:session): session closed for user root <DATE_TIME> <HOSTNAME> sshd 57540 - - pam_systemd(sshd:session): Failed to release session: Interrupted system call <DATE_TIME> <HOSTNAME> systemd-logind 3039 - - Removed session 125. |
5. 문제 재현
우선, Client에서 SSH로 Edge Node에 root 계정을 이용하여 접속
root 계정으로 접속한 SSH Session에서 "vim.tiny /etc/default/kdump-tools" 명령어로 vim을 실행시킨 상태에서 ":st" 명령어로 vim을 suspend
이 후, Client를 Power Off
Edge Node에는 바로 SSH Session이 종료되지 않음
root@edge-node-01:/var/log# ps -ef | grep vim root 848258 848045 0 05:31 pts/1 00:00:00 vim.tiny /etc/default/kdump-tools root 849066 694893 0 05:32 pts/0 00:00:00 grep --color=auto vim |
시간이 지나고 난 후 SSH Session이 종료되고, Core Dump 파일이 생성되어 있음
root@edge-node-01:/var/log# ps -ef | grep vim root 953038 694893 0 08:55 pts/0 00:00:00 grep --color=auto vim root@edge-node-01:/var/log# cd core root@edge-node-01:/var/log/core# ls core.vim.tiny.1723182012.848258.0.11.gz lost+found |
실제 Core Dump를 확인한 과정과 동일하게 vim.tiny symbol 파일을 Edge Node로 Upload하고 설치
생성되었던 Core Dump를 압축 해제하고, GDB로 Backtrace 확인
※ Backtrace를 보면 실제로 문제 생겼을 때 수집된 Backtrace와 동일
root@edge-node-01:/var/log/core# ls -al total 2100 drwxrwx--- 3 root nsx 4096 Aug 10 16:41 . drwxr-xr-x 14 root root 4096 May 2 06:48 .. -rw-r--r-- 1 root root 89629 Aug 9 05:40 core.vim.tiny.1723182012.848258.0.11.gz drwx------ 2 root root 16384 Jun 28 2022 lost+found -rw-r--r-- 1 root root 2032404 Aug 10 16:40 vim-tiny-dbgsym_8.1.2269-1ubuntu5.21_amd64.ddeb root@edge-node-01:/var/log/core# gzip -d core.vim.tiny.1723182012.848258.0.11.gz root@edge-node-01:/var/log/core# ls -al total 2840 drwxrwx--- 3 root nsx 4096 Aug 10 16:42 . drwxr-xr-x 14 root root 4096 May 2 06:48 .. -rw-r--r-- 1 root root 847872 Aug 9 05:40 core.vim.tiny.1723182012.848258.0.11 drwx------ 2 root root 16384 Jun 28 2022 lost+found -rw-r--r-- 1 root root 2032404 Aug 10 16:40 vim-tiny-dbgsym_8.1.2269-1ubuntu5.21_amd64.ddeb root@edge-node-01:/var/log/core# /opt/vmware/nsx-edge/sbin/gdb /usr/bin/vim.tiny core.vim.tiny.1723182012.848258.0.11 GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/bin/vim.tiny... Reading symbols from /usr/lib/debug/.build-id/d2/a842f097ee187cee7da69f798844c3a5e153e9.debug... [New LWP 848258] warning: Section `.reg-xstate/848258' in core file too small. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `vim.tiny /etc/default/kdump-tools'. Program terminated with signal SIGSEGV, Segmentation fault. warning: Section `.reg-xstate/848258' in core file too small. #0 deathtrap (sigarg=15) at os_unix.c:946 946 os_unix.c: No such file or directory. (gdb) bt #0 deathtrap (sigarg=15) at os_unix.c:946 #1 <signal handler called> #2 deathtrap (sigarg=1) at os_unix.c:946 #3 <signal handler called> #4 0x000069b74bdbf3db in kill () at ../sysdeps/unix/syscall-template.S:78 #5 0x0000111b6a4c5ff6 in mch_suspend () at os_unix.c:1256 #6 0x0000111b6a500d49 in ui_suspend () at ui.c:558 #7 0x0000111b6a46d186 in ex_stop (eap=0x781e0ab50530) at ex_docmd.c:5401 #8 ex_stop (eap=0x781e0ab50530) at ex_docmd.c:5384 #9 0x0000111b6a4726c2 in do_one_cmd (cmdlinep=cmdlinep@entry=0x781e0ab50730, sourcing=sourcing@entry=0, fgetline=fgetline@entry=0x111b6a4785d0 <getexline>, cookie=cookie@entry=0x0) at ex_docmd.c:2482 #10 0x0000111b6a47303c in do_cmdline (cmdline=cmdline@entry=0x0, fgetline=0x111b6a4785d0 <getexline>, cookie=cookie@entry=0x0, flags=0) at ex_docmd.c:975 #11 0x0000111b6a4aeb0b in nv_colon (cap=0x781e0ab507b0) at normal.c:3332 #12 nv_colon (cap=0x781e0ab507b0) at normal.c:3299 #13 0x0000111b6a4b2868 in normal_cmd (oap=0x781e0ab50850, toplevel=1) at normal.c:1082 #14 0x0000111b6a51ba72 in main_loop (cmdwin=0, noexmode=0) at main.c:1510 #15 0x0000111b6a51c5c6 in vim_main2 () at main.c:901 #16 0x0000111b6a44287d in main (argc=<optimized out>, argv=<optimized out>) at main.c:3322 |
NSX Edge Node의 경우에는 root 계정을 이용하여 접속 시, 다음과 같이 경고 문구를 통해 사용자의 변경에 대한 주의를 주고 있기 때문에, 이러한 Operation은 VMware by Broadcom을 통해 Guide 받지 않은 경우 삼가
*************************************************************************** NOTICE TO USERS WARNING! Changes made to NSX Data Center while logged in as the root user can cause system failure and potentially impact your network. Please be advised that changes made to the system as the root user must only be made under the guidance of VMware. *************************************************************************** |