[NSX] High CPU 분석을 위한 perf tool 사용 방법
NSX의 중요 Component인 Edge Appliance의 경우 Ubuntu OS에 DPDK를 구현한 방식으로 구성되어 있습니다.
이 Edge Appliance의 가장 중요한 프로세스는 datapathd 인데, 해당 프로세스가 Scheduling 되는 Core에서 High CPU Usage 현상이 발생하는 경우 Packet Drop등의 장애로 이어질 수 있습니다.
이런 경우, OS Troubleshooting 할 때와 동일하게 CPU Sampling이 필요합니다.
제가 Microsoft 재직 시절, 동일한 Troubleshooting을 할 때는 Windows OS에서 Process Monitor나 Performance Toolkit(xperf)를 이용하여 CPU Sampling을 했었는데, Linux의 경우 perf 도구를 이용하여 동일한 내용을 확인할 수 있습니다.
오늘은 이 perf 도구를 활용한 CPU Sampling 수집 방법을 알아보겠습니다.
사전 확인 정보
우선, 어떤 Core가 High CPU 현상을 보여주는지 부터 확인이 필요합니다.
Edge Node에 admin session으로 접속하신 후, "get dataplane cpu stats" 명령어 결과를 통해 몇 번 Core가 CPU를 얼마나 사용하고 있는지 알 수 있습니다.
아래 예제는 Lab 환경이기 때문에 CPU Usage가 높게 나오는 Core는 없는 상황입니다.
edge-node-01> get dataplane cpu stats Tue Jan 09 2024 UTC 00:57:12.144 CPU Usage Core : 0 Crypto : 0 pps Intercore : 0 pps Kni : 0 pps Rx : 0 pps Slowpath : 0 pps Tx : 0 pps Usage : 0% ### <-- !! Core : 1 Crypto : 0 pps Intercore : 0 pps Kni : 0 pps Rx : 60 pps Slowpath : 0 pps Tx : 0 pps Usage : 0% Core : 2 Crypto : 0 pps Intercore : 0 pps Kni : 0 pps Rx : 0 pps Slowpath : 0 pps Tx : 0 pps Usage : 0% Core : 3 Crypto : 0 pps Intercore : 0 pps Kni : 0 pps Rx : 0 pps Slowpath : 0 pps Tx : 0 pps Usage : 0% Core : 4 Crypto : 0 pps Intercore : 0 pps Kni : 0 pps Rx : 0 pps Slowpath : 0 pps Tx : 0 pps Usage : n/a% Core : 5 Crypto : 0 pps Intercore : 0 pps Kni : 0 pps Rx : 0 pps Slowpath : 0 pps Tx : 0 pps Usage : n/a% |
CPU Usage가 높게 나오는 Core를 확인한 후, perf 도구를 통해 각 Core별 많이 사용되고 있는 Function을 확인할 수 있습니다.
그 전에 아래 Kernel 옵션을 설정해야 합니다.
Symbol을 이용하기 위해 perf top permission 부여
※ kptr_restrict is a linux mechanism to hide kernel symbols. If not disabled, kernel symbols won't be present.
root@edge-node-01:/tmp# cat /proc/sys/kernel/kptr_restrict 2 root@edge-node-01:/tmp# echo 1 > /proc/sys/kernel/kptr_restrict root@edge-node-01:/tmp# cat /proc/sys/kernel/kptr_restrict 1 |
참고로 3.2.2 이전 버전에서는 datapathd의 Symbol을 보기 위해서 내부에서 dbg package가 필요했으나 3.2.2. 버전부터는 대부분의 내용은 확인이 가능한 것으로 확인됩니다.
1. perf top
개별 Core Number를 이용하여 perf top -C <core#> 를 통해
root@edge-node-01:/tmp# perf top -C 0 Samples: 3K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 615874560 lost: 0/0 drop: 0/0 Overhead Shared Object Symbol 26.62% [kernel] [k] finish_task_switch 18.41% librte_net_vmxnet3.so.22.0 [.] 0x0000000000002854 12.23% librte_net_vmxnet3.so.22.0 [.] 0x0000000000002890 9.15% datapathd [.] fpn_main_loop 1.72% [kernel] [k] audit_filter_syscall 1.25% [kernel] [k] _raw_spin_unlock_irq 1.08% librte_ethdev.so.22.0 [.] rte_eth_dev_rx_intr_disable 0.95% [kernel] [k] audit_filter_rules.constprop.0 0.90% datapathd [.] ipsec_dequeue root@edge-node-01:/tmp# perf top -C 1 Samples: 5K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 799148009 lost: 0/0 drop: 0/0 Overhead Shared Object Symbol 31.51% [kernel] [k] finish_task_switch 10.60% librte_net_vmxnet3.so.22.0 [.] 0x0000000000002854 10.21% librte_net_vmxnet3.so.22.0 [.] 0x0000000000002890 9.44% datapathd [.] fpn_main_loop 2.42% [kernel] [k] audit_filter_syscall 2.38% [kernel] [k] _raw_spin_unlock_irq 1.32% datapathd [.] ipsec_dequeue 1.16% [kernel] [k] audit_filter_rules.constprop.0 1.06% librte_kni.so.22.0 [.] rte_kni_tx_burst root@edge-node-01:/tmp# perf top -C 2 Samples: 504 of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 126000000 lost: 0/0 drop: 0/0 Overhead Shared Object Symbol 29.37% [kernel] [k] finish_task_switch 11.90% librte_net_vmxnet3.so.22.0 [.] 0x0000000000002854 11.11% librte_net_vmxnet3.so.22.0 [.] 0x0000000000002890 8.93% datapathd [.] fpn_main_loop 2.58% [kernel] [k] _raw_spin_unlock_irq 1.59% datapathd [.] crypto_poll 1.59% librte_kni.so.22.0 [.] rte_kni_tx_burst 1.39% [kernel] [k] audit_filter_rules.constprop.0 |
사용 후에는 Kernel 옵션을 원복합니다.
root@edge-node-01:/tmp# echo 2 > /proc/sys/kernel/kptr_restrict |
2. perf record
perf top으로 확인한 output을 파일로 저장하고 싶은 경우 perf top 보다, perf record 옵션을 이용하는 것이 적합합니다.
root@edge-node-01:~# perf record -C 0 -o /var/log/vmware/perf-record-c0.out > /dev/null 2>&1 root@edge-node-01:/var/log/vmware# perf report -i perf-record-c0.out > perf-report-c0.txt # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 54K of event 'cpu-clock:pppH' # Event count (approx.): 13719750000 # # Overhead Command Shared Object Symbol # ........ ............... .............................. ....................................... # 86.94% swapper [kernel.kallsyms] [k] acpi_idle_do_entry 3.45% swapper [kernel.kallsyms] [k] finish_task_switch 3.43% dp-fp:0 [kernel.kallsyms] [k] finish_task_switch 2.07% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x000000000000c2f9 1.97% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x000000000000c329 0.28% dp-fp:0 datapathd [.] fpn_main_loop 0.09% dp-fp:0 [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.07% dp-fp:0 [kernel.kallsyms] [k] audit_filter_syscall 0.05% rcu_preempt [kernel.kallsyms] [k] finish_task_switch 0.05% dp-fp:0 datapathd [.] rte_eth_rx_burst 0.04% dp-fp:0 datapathd [.] ipsec_dequeue 0.04% dp-fp:0 libc-2.31.so [.] epoll_wait 0.04% dp-fp:0 [kernel.kallsyms] [k] __fget_files 0.03% dp-fp:0 librte_kni.so.20.0 [.] 0x0000000000002f88 0.03% dp-fp:0 datapathd [.] IsLbEnabled 0.03% dp-fp:0 datapathd [.] slowpath_ring_drain 0.03% dp-fp:0 libc-2.31.so [.] timerfd_settime 0.03% dp-fp:0 librte_kni.so.20.0 [.] rte_kni_tx_burst 0.03% swapper [kernel.kallsyms] [k] cpuidle_enter_state 0.03% dp-fp:0 [kernel.kallsyms] [k] syscall_enter_from_user_mode 0.03% dp-fp:0 [kernel.kallsyms] [k] __audit_syscall_exit 0.03% dp-fp:0 [kernel.kallsyms] [k] _raw_write_unlock_irq 0.03% dp-fp:0 [kernel.kallsyms] [k] common_file_perm 0.03% dp-fp:0 datapathd [.] hash_iterate.constprop.18 0.03% dp-fp:0 librte_ethdev.so.20.0 [.] rte_eth_dev_is_valid_port 0.03% kauditd [kernel.kallsyms] [k] finish_task_switch 0.02% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x0000000000007690 0.02% swapper [kernel.kallsyms] [k] tick_nohz_idle_enter 0.02% dp-fp:0 [kernel.kallsyms] [k] do_syscall_64 0.02% dp-fp:0 ld-2.31.so [.] __tls_get_addr 0.02% dp-fp:0 libpthread-2.31.so [.] __libc_read ... |
perf record를 background로 실행하여 일정 시간 동안 수집되도록 할 수도 있습니다.
아래 예제에서는 nohup을 이용하여 background로 실행되도록 하고, timeout을 이용하여 정해진 시간 동안만 수집하도록 하였습니다.
root@edge-node-01:/var/log/vmware# nohup timeout 30 perf record -C 0 -o /var/log/vmware/perf-record-c0-$(date +%Y-%m-%d).out > /dev/null 2>&1 & [1] 963158 root@edge-node-01:/var/log/vmware# ps -ef | grep record root 963158 120832 0 06:21 pts/0 00:00:00 timeout 30 perf record -C 0 -o /var/log/vmware/perf-record-c0-2024-04-30.out root 963160 963158 4 06:21 pts/0 00:00:00 /usr/lib/linux-tools/5.15.92-nn12-server/perf record -C 0 -o /var/log/vmware/perf-record-c0-2024-04-30.out root 963264 120832 0 06:22 pts/0 00:00:00 grep --color=auto record root@edge-node-01:/var/log/vmware# ps -ef | grep record root 963415 120832 0 06:22 pts/0 00:00:00 grep --color=auto record [1]+ Exit 124 nohup timeout 30 perf record -C 0 -o /var/log/vmware/perf-record-c0-$(date +%Y-%m-%d).out > /dev/null 2>&1 root@edge-node-01:/var/log/vmware# perf report -i /var/log/vmware/perf-record-c0-2024-04-30.out > perf-record-c0-2024-04-30.txt root@edge-node-01:/var/log/vmware# cat perf-record-c0-2024-04-30.txt | more # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 115K of event 'cpu-clock:pppH' # Event count (approx.): 28945500000 # # Overhead Command Shared Object Symbol # ........ ............... .............................. .............................................................................................................................. # 86.92% swapper [kernel.kallsyms] [k] acpi_idle_do_entry 3.49% dp-fp:0 [kernel.kallsyms] [k] finish_task_switch 3.49% swapper [kernel.kallsyms] [k] finish_task_switch 2.14% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x000000000000c2f9 1.89% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x000000000000c329 0.27% dp-fp:0 datapathd [.] fpn_main_loop 0.10% dp-fp:0 [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.08% dp-fp:0 [kernel.kallsyms] [k] audit_filter_syscall 0.04% dp-fp:0 libc-2.31.so [.] epoll_wait 0.04% dp-fp:0 datapathd [.] rte_eth_rx_burst 0.03% dp-fp:0 datapathd [.] ipsec_dequeue 0.03% dp-fp:0 datapathd [.] hash_iterate.constprop.18 0.03% dp-fp:0 [kernel.kallsyms] [k] __put_user_nocheck_4 0.03% dp-fp:0 datapathd [.] l4lb_dpmsg_pending 0.03% dp-fp:0 librte_ethdev.so.20.0 [.] rte_eth_dev_is_valid_port 0.03% dp-fp:0 [kernel.kallsyms] [k] __fget_files 0.03% dp-fp:0 librte_kni.so.20.0 [.] rte_kni_tx_burst 0.03% dp-fp:0 ld-2.31.so [.] __tls_get_addr 0.03% dp-fp:0 libpthread-2.31.so [.] __libc_read 0.03% swapper [kernel.kallsyms] [k] cpuidle_enter_state 0.03% swapper [kernel.kallsyms] [k] tick_nohz_idle_enter 0.02% dp-fp:0 librte_eal.so.20.0 [.] rte_epoll_wait 0.02% dp-fp:0 [kernel.kallsyms] [k] _raw_write_unlock_irq 0.02% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x000000000000c31d 0.02% dp-fp:0 libc-2.31.so [.] __libc_disable_asynccancel 0.02% dp-fp:0 [vdso] [.] __vdso_time 0.02% rcu_preempt [kernel.kallsyms] [k] finish_task_switch 0.02% dp-fp:0 [kernel.kallsyms] [k] syscall_enter_from_user_mode 0.02% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x0000000000007690 0.02% dp-fp:0 librte_pmd_vmxnet3_uio.so.20.0 [.] 0x0000000000009194 0.02% kauditd [kernel.kallsyms] [k] finish_task_switch 0.02% dp-fp:0 librte_kni.so.20.0 [.] 0x0000000000002f88 0.02% swapper [kernel.kallsyms] [k] do_idle --More-- |