[NUMA] Constructs
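These notes summarize what I confirmed based on the ESXi NUMA Deep Dive document.
ESXi is optimized for the NUMA architecture and relies on the NUMA Scheduler and the CPU Scheduler to achieve this.
When ESXi runs on a NUMA architecture, the NUMA Scheduler is enabled.
The NUMA Scheduler optimizes the CPU and memory assigned to each VM.
It handles VM initial placement and balances VM workloads across NUMA nodes while the VMs run.
A PCPU is an abstraction layer used inside the VMkernel; as the figure above shows, it can use an entire core or a hyper-thread (HT).
An HT is represented as a logical processor.
A VSOCKET can map to a single PCPU or, as in the figure above, span multiple PCPUs.
This depends on the number of vCPUs and the Cores Per Socket (CPS) setting.
For initial placement and load balancing, the NUMA Scheduler uses two constructs: the NUMA Home Node (NHN) and the NUMA Client.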
NUMA Home Node
A NUMA Home Node is a logical representation of a single CPU package and its local memory.
Through the NUMA Home Node, a NUMA Client can count the physical cores in a CPU package.
The NUMA Scheduler counts the cores in a CPU package, compares the count with the VM's vCPU count, and uses the result for initial placement and load balancing.
If the vCPU count exceeds the core count of a single CPU package, the vCPUs are distributed across multiple NUMA nodes.
To avoid spreading vCPUs across multiple NUMA nodes, either reduce the vCPU count
or set numa.vcpu.preferHT to TRUE so that the NUMA Scheduler counts logical processors (HTs) instead of physical cores.
## By default, the NUMA Scheduler does not count logical processors when optimizing placement
## However, for workloads that benefit more from sharing cache and memory,
## having the NUMA Scheduler count logical processors instead of physical cores can be effective
As with CPU, if a VM is assigned more virtual memory than the local memory attached to a single CPU package,
the Memory Scheduler allocates the memory from another NUMA node.
NUMA Client
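A NUMA Client is the set of a VM's vCPUs and memory that fits within a single NUMA Home Node.
The NUMA Client is the smallest unit the NUMA Scheduler works with.
At power-on, the vCPU count is compared with the number of physical cores in a CPU package;
if it does not exceed the physical core count of the NUMA node, a single NUMA Client is created.
With a single NUMA Client, the VM is presented a UMA topology rather than a NUMA topology.
Conversely, if the VM has more vCPUs than a single CPU package has physical cores, multiple NUMA Clients are created for that one VM.
The figure below shows two NUMA Clients being created when a VM is assigned 12 vCPUs on a host with 10 cores per CPU package.
## Such a VM is called a Wide-VM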
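The sizing rule described above can be sketched as a small calculation. This is only an illustration of the counting logic, not ESXi's actual implementation; the function names `numa_client_count` and `is_wide_vm` are hypothetical, and the sketch assumes each NUMA Client holds at most one CPU package's worth of physical cores.

```python
def numa_client_count(vcpus: int, cores_per_package: int) -> int:
    """Estimate how many NUMA Clients a VM needs (hypothetical helper).

    Assumption: one NUMA Client can hold at most `cores_per_package` vCPUs.
    """
    if vcpus < 1 or cores_per_package < 1:
        raise ValueError("vcpus and cores_per_package must be positive")
    # Ceiling division without floating point.
    return -(-vcpus // cores_per_package)


def is_wide_vm(vcpus: int, cores_per_package: int) -> bool:
    """A VM that needs more than one NUMA Client is a Wide-VM."""
    return numa_client_count(vcpus, cores_per_package) > 1


# The example from the text: 12 vCPUs on a 10-core CPU package.
print(numa_client_count(12, 10))  # 2
print(is_wide_vm(12, 10))         # True
print(is_wide_vm(8, 10))          # False
```

How the vCPUs are divided between the resulting NUMA Clients is decided by the NUMA Scheduler and is not modeled here.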
vNUMA Node
The NUMA topology presented to a VM is called vNUMA.
For example, a VM with 12 vCPUs as in the example above gets 2 NUMA Clients, so the guest OS sees 2 vNUMA nodes.
The guest OS can use this for its own NUMA optimizations.
When multiple vNUMA Clients are created this way, the NUMA Scheduler auto-sizes them.
By default, it minimizes the number of vNUMA Clients across which the vCPUs are spread.
Determining the vNUMA Layout
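vmware.log contains the NUMA configuration.
In the example below, cpuid.coresPerSocket is 2, and the log shows the 32 vCPUs grouped into 2 Physical Proximity Domains (PPDs) of 16 vCPUs each, matching numa.autosize.vcpu.maxPerVirtualNode = 16.
To present the vNUMA topology to the VM, the Virtual Proximity Domains (VPDs) are likewise set to 2, matching the PPDs.
The example VM below has 32 vCPUs and 192 GB of memory (reserved).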
vmdumper
# vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done
<snip>
DICT numvcpus = "32"
DICT memSize = "196608"
DICT sched.cpu.affinity = "all"
DICT displayName = "debuggee"
DICT cpuid.coresPerSocket = "2"
DICT numa.autosize.cookie = "320022"
DICT numa.autosize.vcpu.maxPerVirtualNode = "16"
numaHost: NUMA config: consolidation= 1 preferHT= 0 partitionByMemory = 0
numaHost: 32 VCPUs 2 VPDs 2 PPDs ### <--
numaHost: VCPU 0 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 1 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 2 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 3 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 4 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 5 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 6 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 7 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 8 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 9 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 10 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 11 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 12 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 13 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 14 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 15 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 16 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 17 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 18 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 19 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 20 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 21 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 22 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 23 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 24 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 25 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 26 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 27 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 28 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 29 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 30 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 31 VPD 1 PPD 1 NodeMask ffffffffffffffff
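When the log is long, the per-vCPU `numaHost:` lines are easier to read once grouped by VPD. The following is a minimal sketch that assumes the line format shown in the output above; the function name `group_vcpus_by_vpd` and the shortened sample text are illustrative, not part of any VMware tooling.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches lines such as:
#   numaHost: VCPU 16 VPD 1 PPD 1 NodeMask ffffffffffffffff
NUMA_HOST_VCPU = re.compile(r"numaHost:\s+VCPU\s+(\d+)\s+VPD\s+(\d+)\s+PPD\s+(\d+)")


def group_vcpus_by_vpd(log_text: str) -> Dict[int, List[int]]:
    """Return {VPD id: [vCPU ids]} parsed from vmware.log text."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for match in NUMA_HOST_VCPU.finditer(log_text):
        vcpu, vpd, _ppd = (int(value) for value in match.groups())
        groups[vpd].append(vcpu)
    return dict(groups)


# Shortened sample in the same format as the log excerpt above.
sample = """\
numaHost: 32 VCPUs 2 VPDs 2 PPDs
numaHost: VCPU 0 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 15 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 16 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 31 VPD 1 PPD 1 NodeMask ffffffffffffffff
"""

for vpd, vcpus in sorted(group_vcpus_by_vpd(sample).items()):
    print(f"VPD {vpd}: vCPUs {vcpus}")
```

The summary line (`32 VCPUs 2 VPDs 2 PPDs`) is skipped on purpose because it does not describe an individual vCPU.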
To see how this appears inside the guest OS, use CoreInfo (https://learn.microsoft.com/en-us/sysinternals/downloads/coreinfo) on Windows or the numactl tool on Linux.
On Windows
C:\Users\Administrator\Downloads\SysinternalsSuite>Coreinfo64.exe
Coreinfo v3.6 - Dump information on system CPU and memory topology
Copyright (C) 2008-2022 Mark Russinovich
Sysinternals - www.sysinternals.com
Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
Microcode signature: 05003302
HTT * Hyperthreading enabled
CET - Supports Control Flow Enforcement Technology
Kernel CET - Kernel-mode CET Enabled
User CET - User-mode CET Allowed
HYPERVISOR * Hypervisor is present
VMX - Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
X64 * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT
SGX - Supports Intel SGX
NX * Supports no-execute page protection
SMEP * Supports Supervisor Mode Execution Prevention
SMAP * Supports Supervisor Mode Access Prevention
PAGE1GB * Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE * Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4a - Supports Streaming SIMDR Extensions 4a
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES * Supports AES extensions
AVX * Supports AVX instruction extensions
AVX2 * Supports AVX2 instruction extensions
AVX-512-F * Supports AVX-512 Foundation instructions
AVX-512-DQ * Supports AVX-512 double and quadword instructions
AVX-512-IFAMA - Supports AVX-512 integer Fused multiply-add instructions
AVX-512-PF - Supports AVX-512 prefetch instructions
AVX-512-ER - Supports AVX-512 exponential and reciprocal instructions
AVX-512-CD * Supports AVX-512 conflict detection instructions
AVX-512-BW * Supports AVX-512 byte and word instructions
AVX-512-VL * Supports AVX-512 vector length instructions
FMA * Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE * Supports XSAVE/XRSTOR instructions
OSXSAVE * Supports XSETBV/XGETBV instructions
RDRAND * Supports RDRAND instruction
RDSEED * Supports RDSEED instruction
CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 * Supports bit manipulation extensions 1
BMI2 * Supports bit manipulation extensions 2
ADX * Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C * Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR - Supports MONITOR and MWAIT instructions
MOVBE * Supports MOVBE instruction
ERMSB * Supports Enhanced REP MOVSB/STOSB
PCLMULDQ * Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
LZCNT * Supports LZCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instructions
DE * Supports I/O breakpoints including CR4.DE
DTES64 - Can write history of 64-bit branch addresses
DS - Implements memory-resident debug buffer
DS-CPL - Supports Debug Store feature with CPL
PCID * Supports PCIDs and settable CR4.PCIDE
INVPCID * Supports INVPCID instruction
PDCM - Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE * Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR - Supports disabling task priority messages
EIST - Supports Enhanced Intel Speedstep
ACPI - Implements MSR for power management
TM - Implements thermal monitor circuitry
TM2 - Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC * Supports x2APIC
CNXT-ID - L1 data cache mode adaptive or BIOS
MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE - Supports use of FERR#/PBE# pin
PSN - Implements 96-bit processor serial number
PREFETCHW * Supports PREFETCHW instruction
Maximum implemented CPUID leaves: 00000016 (Basic), 80000008 (Extended).
Maximum implemented address width: 48 bits (virtual), 45 bits (physical).
Processor signature: 00050657
Logical to Physical Processor Map:
*------------------------------- Physical Processor 0
-*------------------------------ Physical Processor 1
--*----------------------------- Physical Processor 2
---*---------------------------- Physical Processor 3
----*--------------------------- Physical Processor 4
-----*-------------------------- Physical Processor 5
------*------------------------- Physical Processor 6
-------*------------------------ Physical Processor 7
--------*----------------------- Physical Processor 8
---------*---------------------- Physical Processor 9
----------*--------------------- Physical Processor 10
-----------*-------------------- Physical Processor 11
------------*------------------- Physical Processor 12
-------------*------------------ Physical Processor 13
--------------*----------------- Physical Processor 14
---------------*---------------- Physical Processor 15
----------------*--------------- Physical Processor 16
-----------------*-------------- Physical Processor 17
------------------*------------- Physical Processor 18
-------------------*------------ Physical Processor 19
--------------------*----------- Physical Processor 20
---------------------*---------- Physical Processor 21
----------------------*--------- Physical Processor 22
-----------------------*-------- Physical Processor 23
------------------------*------- Physical Processor 24
-------------------------*------ Physical Processor 25
--------------------------*----- Physical Processor 26
---------------------------*---- Physical Processor 27
----------------------------*--- Physical Processor 28
-----------------------------*-- Physical Processor 29
------------------------------*- Physical Processor 30
-------------------------------* Physical Processor 31
Logical Processor to Socket Map:
**------------------------------ Socket 0
--**---------------------------- Socket 1
----**-------------------------- Socket 2
------**------------------------ Socket 3
--------**---------------------- Socket 4
----------**-------------------- Socket 5
------------**------------------ Socket 6
--------------**---------------- Socket 7
----------------**-------------- Socket 8
------------------**------------ Socket 9
--------------------**---------- Socket 10
----------------------**-------- Socket 11
------------------------**------ Socket 12
--------------------------**---- Socket 13
----------------------------**-- Socket 14
------------------------------** Socket 15
Logical Processor to NUMA Node Map: ### <--
****************---------------- NUMA Node 0
----------------**************** NUMA Node 1
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.0 1.0
01: 1.5 2.1
Logical Processor to Cache Map:
*------------------------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
**------------------------------ Unified Cache 1, Level 3, 22 MB, Assoc 11, LineSize 64
-*------------------------------ Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Unified Cache 2, Level 2, 1 MB, Assoc 16, LineSize 64
--*----------------------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Unified Cache 3, Level 2, 1 MB, Assoc 16, LineSize 64
--**---------------------------- Unified Cache 4, Level 3, 22 MB, Assoc 11, LineSize 64
---*---------------------------- Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Unified Cache 5, Level 2, 1 MB, Assoc 16, LineSize 64
----*--------------------------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*--------------------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*--------------------------- Unified Cache 6, Level 2, 1 MB, Assoc 16, LineSize 64
----**-------------------------- Unified Cache 7, Level 3, 22 MB, Assoc 11, LineSize 64
-----*-------------------------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
-----*-------------------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
-----*-------------------------- Unified Cache 8, Level 2, 1 MB, Assoc 16, LineSize 64
------*------------------------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------*------------------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------*------------------------- Unified Cache 9, Level 2, 1 MB, Assoc 16, LineSize 64
------**------------------------ Unified Cache 10, Level 3, 22 MB, Assoc 11, LineSize 64
-------*------------------------ Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
-------*------------------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
-------*------------------------ Unified Cache 11, Level 2, 1 MB, Assoc 16, LineSize 64
--------*----------------------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
--------*----------------------- Instruction Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
--------*----------------------- Unified Cache 12, Level 2, 1 MB, Assoc 16, LineSize 64
--------**---------------------- Unified Cache 13, Level 3, 22 MB, Assoc 11, LineSize 64
---------*---------------------- Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
---------*---------------------- Instruction Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
---------*---------------------- Unified Cache 14, Level 2, 1 MB, Assoc 16, LineSize 64
----------*--------------------- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
----------*--------------------- Instruction Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
----------*--------------------- Unified Cache 15, Level 2, 1 MB, Assoc 16, LineSize 64
----------**-------------------- Unified Cache 16, Level 3, 22 MB, Assoc 11, LineSize 64
-----------*-------------------- Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
-----------*-------------------- Instruction Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
-----------*-------------------- Unified Cache 17, Level 2, 1 MB, Assoc 16, LineSize 64
------------*------------------- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------*------------------- Instruction Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------*------------------- Unified Cache 18, Level 2, 1 MB, Assoc 16, LineSize 64
------------**------------------ Unified Cache 19, Level 3, 22 MB, Assoc 11, LineSize 64
-------------*------------------ Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
-------------*------------------ Instruction Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
-------------*------------------ Unified Cache 20, Level 2, 1 MB, Assoc 16, LineSize 64
--------------*----------------- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*----------------- Instruction Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*----------------- Unified Cache 21, Level 2, 1 MB, Assoc 16, LineSize 64
--------------**---------------- Unified Cache 22, Level 3, 22 MB, Assoc 11, LineSize 64
---------------*---------------- Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*---------------- Instruction Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*---------------- Unified Cache 23, Level 2, 1 MB, Assoc 16, LineSize 64
----------------*--------------- Data Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64
----------------*--------------- Instruction Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64
----------------*--------------- Unified Cache 24, Level 2, 1 MB, Assoc 16, LineSize 64
----------------**-------------- Unified Cache 25, Level 3, 22 MB, Assoc 11, LineSize 64
-----------------*-------------- Data Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------*-------------- Instruction Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------*-------------- Unified Cache 26, Level 2, 1 MB, Assoc 16, LineSize 64
------------------*------------- Data Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64
------------------*------------- Instruction Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64
------------------*------------- Unified Cache 27, Level 2, 1 MB, Assoc 16, LineSize 64
------------------**------------ Unified Cache 28, Level 3, 22 MB, Assoc 11, LineSize 64
-------------------*------------ Data Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------*------------ Instruction Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------*------------ Unified Cache 29, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------*----------- Data Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*----------- Instruction Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*----------- Unified Cache 30, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------**---------- Unified Cache 31, Level 3, 22 MB, Assoc 11, LineSize 64
---------------------*---------- Data Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*---------- Instruction Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*---------- Unified Cache 32, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------*--------- Data Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------*--------- Instruction Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------*--------- Unified Cache 33, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------**-------- Unified Cache 34, Level 3, 22 MB, Assoc 11, LineSize 64
-----------------------*-------- Data Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------*-------- Instruction Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------*-------- Unified Cache 35, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------*------- Data Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------*------- Instruction Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------*------- Unified Cache 36, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------**------ Unified Cache 37, Level 3, 22 MB, Assoc 11, LineSize 64
-------------------------*------ Data Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------*------ Instruction Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------*------ Unified Cache 38, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------------*----- Data Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------------*----- Instruction Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------------*----- Unified Cache 39, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------------**---- Unified Cache 40, Level 3, 22 MB, Assoc 11, LineSize 64
---------------------------*---- Data Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------------*---- Instruction Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------------*---- Unified Cache 41, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------------*--- Data Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------------*--- Instruction Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------------*--- Unified Cache 42, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------------**-- Unified Cache 43, Level 3, 22 MB, Assoc 11, LineSize 64
-----------------------------*-- Data Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------------*-- Instruction Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------------*-- Unified Cache 44, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------------*- Data Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------------*- Instruction Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------------*- Unified Cache 45, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------------** Unified Cache 46, Level 3, 22 MB, Assoc 11, LineSize 64
-------------------------------* Data Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------------* Instruction Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------------* Unified Cache 47, Level 2, 1 MB, Assoc 16, LineSize 64
Logical Processor to Group Map:
******************************** Group 0
On Linux
[root@localhost ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 96528 MB
node 0 free: 95401 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 96749 MB
node 1 free: 95545 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
[root@localhost ~]# numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
cpubind: 0 1
nodebind: 0 1
membind: 0 1
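To compare the guest view with the VPD layout in vmware.log, the `node N cpus:` lines of `numactl --hardware` can be turned into a node-to-CPU mapping. This is a minimal sketch that assumes the output format shown above; `parse_numactl_nodes` is a hypothetical helper name, and the sample text is copied from the output in this section.

```python
import re
from typing import Dict, List

NODE_CPUS = re.compile(r"^node (\d+) cpus:\s*(.*)$")


def parse_numactl_nodes(output: str) -> Dict[int, List[int]]:
    """Return {NUMA node id: [CPU ids]} from `numactl --hardware` text."""
    nodes: Dict[int, List[int]] = {}
    for line in output.splitlines():
        match = NODE_CPUS.match(line.strip())
        if match:
            nodes[int(match.group(1))] = [int(cpu) for cpu in match.group(2).split()]
    return nodes


numactl_output = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 96528 MB
node 0 free: 95401 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 96749 MB
node 1 free: 95545 MB
"""

for node, cpus in parse_numactl_nodes(numactl_output).items():
    print(f"node {node}: {len(cpus)} CPUs ({cpus[0]}-{cpus[-1]})")
```

For this VM the mapping gives 16 CPUs per node, which lines up with the two 16-vCPU VPDs reported by the host.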
Adjusting Virtual NUMA Topology
When an application has specific memory bandwidth requirements, or the NUMA Client size needs adjusting for a smaller system,
several advanced parameters are available.
numa.vcpu.min
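A vNUMA topology is presented to the VM when both of the following conditions are met:
1) The VM has 9 or more vCPUs (the default value of numa.vcpu.min)
2) The vCPU count exceeds the core count of the physical NUMA node
## With numa.vcpu.preferHT=TRUE, logical processors (HTs) are counted instead of cores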
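The two conditions and the preferHT note can be written as a single predicate. This is a sketch of the rule as stated in these notes, not of ESXi internals; the function `exposes_vnuma` and its parameter names are hypothetical, and 2 threads per core is an assumption that holds for typical hyper-threaded hosts.

```python
def exposes_vnuma(
    vcpus: int,
    cores_per_node: int,
    threads_per_core: int = 2,   # assumption: hyper-threading with 2 threads per core
    prefer_ht: bool = False,     # models numa.vcpu.preferHT
    numa_vcpu_min: int = 9,      # models numa.vcpu.min
) -> bool:
    """Return True when both vNUMA conditions from the notes are met."""
    # With preferHT the node capacity is counted in logical processors.
    capacity = cores_per_node * threads_per_core if prefer_ht else cores_per_node
    return vcpus >= numa_vcpu_min and vcpus > capacity


print(exposes_vnuma(12, 10))                  # True: 12 >= 9 and 12 > 10
print(exposes_vnuma(12, 10, prefer_ht=True))  # False: 12 fits within 20 threads
print(exposes_vnuma(8, 4))                    # False: fewer than 9 vCPUs
```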
numa.vcpu.maxPerMachineNode
An option for workloads that need memory bandwidth more than low memory latency.
This setting reduces the number of vCPUs contained in a single NUMA Client, steering the NUMA Scheduler to create multiple NUMA Clients for a VM that would otherwise use just one.
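The effect of the cap can be illustrated with a rough calculation. This sketch assumes the per-client limit is simply the smaller of the node's core count and the configured cap, which is a simplification of whatever the NUMA Scheduler really does; `numa_clients_with_cap` is a hypothetical name.

```python
from typing import Optional


def numa_clients_with_cap(
    vcpus: int,
    cores_per_node: int,
    max_per_machine_node: Optional[int] = None,  # models numa.vcpu.maxPerMachineNode
) -> int:
    """Estimate the NUMA Client count, optionally with a per-client vCPU cap."""
    limit = cores_per_node
    if max_per_machine_node is not None:
        limit = min(limit, max_per_machine_node)
    if vcpus < 1 or limit < 1:
        raise ValueError("vcpus and the per-client limit must be positive")
    return -(-vcpus // limit)  # ceiling division


# An 8-vCPU VM on a 10-core node normally fits in one NUMA Client.
print(numa_clients_with_cap(8, 10))     # 1
# Capping each client at 4 vCPUs splits the same VM into two clients.
print(numa_clients_with_cap(8, 10, 4))  # 2
```

Splitting the VM this way lets it draw on the memory controllers of more than one NUMA node, at the cost of some remote memory access.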
Count Threads Not Cores(numa.vcpu.preferHT=TRUE)
By default, the NUMA Scheduler aims to use as few NUMA nodes as possible while giving vCPUs the best chance of being scheduled on physical cores.
A NUMA Client is therefore limited to at most the number of physical cores in a single CPU package.
However, some applications share a lot of memory between threads (cache intensive).
Such applications perform better when they use as much local memory as possible and a single local cache.
In that case, using hyper-threads to make better use of local memory is more reasonable than spreading vCPUs across multiple NUMA Home Nodes.
With this option, threads are counted instead of the cores in a single CPU package, so a single NUMA Client larger than the physical core count can be created.
For example, when a VM is assigned 12 vCPUs on a host with 10 cores per CPU package,
the NUMA Scheduler would normally use two NUMA Clients,
but with numa.vcpu.preferHT=TRUE it counts threads (20) instead of cores within the CPU package,
so a single NUMA Client suffices and all 12 vCPUs can be scheduled within one CPU package.
This option does not apply to the CPU Scheduler,
so it does not mean that vCPUs are always scheduled on threads instead of cores; the CPU Scheduler may still schedule a vCPU on a physical core.
Also, when using this option, the VM's memory size must be limited to the memory size of a single NUMA Home Node.
## When using numa.vcpu.preferHT, aligning coresPerSocket with the physical CPU package is recommended --> for LLC optimization
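A quick pre-check of both constraints (thread count and local memory) might look like the sketch below. It only restates the rules from these notes; `fits_single_node_with_prefer_ht` is a hypothetical helper, and the memory sizes in the example calls are made-up illustration values, not measurements from the test host.

```python
def fits_single_node_with_prefer_ht(
    vcpus: int,
    cores_per_package: int,
    threads_per_core: int,
    vm_memory_gb: float,
    node_memory_gb: float,
) -> bool:
    """Check whether a VM could stay in one NUMA Client under preferHT.

    Two conditions from the notes:
      1. vCPUs must not exceed the logical processors (threads) in one package.
      2. VM memory must not exceed the memory of one NUMA Home Node.
    """
    threads = cores_per_package * threads_per_core
    return vcpus <= threads and vm_memory_gb <= node_memory_gb


# 12 vCPUs on a 10-core / 20-thread package, memory sizes are hypothetical.
print(fits_single_node_with_prefer_ht(12, 10, 2, vm_memory_gb=64, node_memory_gb=128))   # True
# Same CPU layout, but the VM asks for more memory than one node holds.
print(fits_single_node_with_prefer_ht(12, 10, 2, vm_memory_gb=192, node_memory_gb=128))  # False
```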
## Tested against a VM with 32 vCPUs, coresPerSocket=2, and 192 GB of memory
## The physical server's CPU configuration is as follows
Before setting numa.vcpu.preferHT
# vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.* |vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done DICT numvcpus = "32" ### <-- DICT memSize = "196608" DICT sched.cpu.affinity = "all" DICT displayName = "debuggee" DICT cpuid.coresPerSocket = "2" ### <-- DICT numa.autosize.cookie = "320022" DICT numa.autosize.vcpu.maxPerVirtualNode = "16" ### <-- numaHost: NUMA config: consolidation= 1 preferHT= 0 partitionByMemory = 0 ### <-- numaHost: 32 VCPUs 2 VPDs 2 PPDs ### <-- numaHost: VCPU 0 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 1 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 2 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 3 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 4 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 5 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 6 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 7 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 8 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 9 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 10 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 11 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 12 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 13 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 14 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 15 VPD 0 PPD 0 NodeMask ffffffffffffffff numaHost: VCPU 16 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 17 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 18 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 19 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 20 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 21 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 22 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 23 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 24 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 25 VPD 1 PPD 1 NodeMask 
ffffffffffffffff numaHost: VCPU 26 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 27 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 28 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 29 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 30 VPD 1 PPD 1 NodeMask ffffffffffffffff numaHost: VCPU 31 VPD 1 PPD 1 NodeMask ffffffffffffffff C:\Users\Administrator\Downloads\SysinternalsSuite>Coreinfo64.exe Coreinfo v3.6 - Dump information on system CPU and memory topology Copyright (C) 2008-2022 Mark Russinovich Sysinternals - www.sysinternals.com Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz Intel64 Family 6 Model 85 Stepping 7, GenuineIntel Microcode signature: 05003302 HTT * Hyperthreading enabled CET - Supports Control Flow Enforcement Technology Kernel CET - Kernel-mode CET Enabled User CET - User-mode CET Allowed HYPERVISOR * Hypervisor is present VMX - Supports Intel hardware-assisted virtualization SVM - Supports AMD hardware-assisted virtualization X64 * Supports 64-bit mode SMX - Supports Intel trusted execution SKINIT - Supports AMD SKINIT SGX - Supports Intel SGX NX * Supports no-execute page protection SMEP * Supports Supervisor Mode Execution Prevention SMAP * Supports Supervisor Mode Access Prevention PAGE1GB * Supports 1 GB large pages PAE * Supports > 32-bit physical addresses PAT * Supports Page Attribute Table PSE * Supports 4 MB pages PSE36 * Supports > 32-bit address 4 MB pages PGE * Supports global bit in page tables SS * Supports bus snooping for cache operations VME * Supports Virtual-8086 mode RDWRFSGSBASE * Supports direct GS/FS base access FPU * Implements i387 floating point instructions MMX * Supports MMX instruction set MMXEXT - Implements AMD MMX extensions 3DNOW - Supports 3DNow! instructions 3DNOWEXT - Supports 3DNow! 
extension instructions SSE * Supports Streaming SIMD Extensions SSE2 * Supports Streaming SIMD Extensions 2 SSE3 * Supports Streaming SIMD Extensions 3 SSSE3 * Supports Supplemental SIMD Extensions 3 SSE4a - Supports Streaming SIMDR Extensions 4a SSE4.1 * Supports Streaming SIMD Extensions 4.1 SSE4.2 * Supports Streaming SIMD Extensions 4.2 AES * Supports AES extensions AVX * Supports AVX instruction extensions AVX2 * Supports AVX2 instruction extensions AVX-512-F * Supports AVX-512 Foundation instructions AVX-512-DQ * Supports AVX-512 double and quadword instructions AVX-512-IFAMA - Supports AVX-512 integer Fused multiply-add instructions AVX-512-PF - Supports AVX-512 prefetch instructions AVX-512-ER - Supports AVX-512 exponential and reciprocal instructions AVX-512-CD * Supports AVX-512 conflict detection instructions AVX-512-BW * Supports AVX-512 byte and word instructions AVX-512-VL * Supports AVX-512 vector length instructions FMA * Supports FMA extensions using YMM state MSR * Implements RDMSR/WRMSR instructions MTRR * Supports Memory Type Range Registers XSAVE * Supports XSAVE/XRSTOR instructions OSXSAVE * Supports XSETBV/XGETBV instructions RDRAND * Supports RDRAND instruction RDSEED * Supports RDSEED instruction CMOV * Supports CMOVcc instruction CLFSH * Supports CLFLUSH instruction CX8 * Supports compare and exchange 8-byte instructions CX16 * Supports CMPXCHG16B instruction BMI1 * Supports bit manipulation extensions 1 BMI2 * Supports bit manipulation extensions 2 ADX * Supports ADCX/ADOX instructions DCA - Supports prefetch from memory-mapped device F16C * Supports half-precision instruction FXSR * Supports FXSAVE/FXSTOR instructions FFXSR - Supports optimized FXSAVE/FSRSTOR instruction MONITOR - Supports MONITOR and MWAIT instructions MOVBE * Supports MOVBE instruction ERMSB * Supports Enhanced REP MOVSB/STOSB PCLMULDQ * Supports PCLMULDQ instruction POPCNT * Supports POPCNT instruction LZCNT * Supports LZCNT instruction SEP * Supports fast system call 
instructions LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode HLE - Supports Hardware Lock Elision instructions RTM - Supports Restricted Transactional Memory instructions DE * Supports I/O breakpoints including CR4.DE DTES64 - Can write history of 64-bit branch addresses DS - Implements memory-resident debug buffer DS-CPL - Supports Debug Store feature with CPL PCID * Supports PCIDs and settable CR4.PCIDE INVPCID * Supports INVPCID instruction PDCM - Supports Performance Capabilities MSR RDTSCP * Supports RDTSCP instruction TSC * Supports RDTSC instruction TSC-DEADLINE * Local APIC supports one-shot deadline timer TSC-INVARIANT * TSC runs at constant rate xTPR - Supports disabling task priority messages EIST - Supports Enhanced Intel Speedstep ACPI - Implements MSR for power management TM - Implements thermal monitor circuitry TM2 - Implements Thermal Monitor 2 control APIC * Implements software-accessible local APIC x2APIC * Supports x2APIC CNXT-ID - L1 data cache mode adaptive or BIOS MCE * Supports Machine Check, INT18 and CR4.MCE MCA * Implements Machine Check Architecture PBE - Supports use of FERR#/PBE# pin PSN - Implements 96-bit processor serial number PREFETCHW * Supports PREFETCHW instruction Maximum implemented CPUID leaves: 00000016 (Basic), 80000008 (Extended). Maximum implemented address width: 48 bits (virtual), 45 bits (physical). 
Processor signature: 00050657 Logical to Physical Processor Map: *------------------------------- Physical Processor 0 -*------------------------------ Physical Processor 1 --*----------------------------- Physical Processor 2 ---*---------------------------- Physical Processor 3 ----*--------------------------- Physical Processor 4 -----*-------------------------- Physical Processor 5 ------*------------------------- Physical Processor 6 -------*------------------------ Physical Processor 7 --------*----------------------- Physical Processor 8 ---------*---------------------- Physical Processor 9 ----------*--------------------- Physical Processor 10 -----------*-------------------- Physical Processor 11 ------------*------------------- Physical Processor 12 -------------*------------------ Physical Processor 13 --------------*----------------- Physical Processor 14 ---------------*---------------- Physical Processor 15 ----------------*--------------- Physical Processor 16 -----------------*-------------- Physical Processor 17 ------------------*------------- Physical Processor 18 -------------------*------------ Physical Processor 19 --------------------*----------- Physical Processor 20 ---------------------*---------- Physical Processor 21 ----------------------*--------- Physical Processor 22 -----------------------*-------- Physical Processor 23 ------------------------*------- Physical Processor 24 -------------------------*------ Physical Processor 25 --------------------------*----- Physical Processor 26 ---------------------------*---- Physical Processor 27 ----------------------------*--- Physical Processor 28 -----------------------------*-- Physical Processor 29 ------------------------------*- Physical Processor 30 -------------------------------* Physical Processor 31 Logical Processor to Socket Map: **------------------------------ Socket 0 --**---------------------------- Socket 1 ----**-------------------------- Socket 2 
------**------------------------ Socket 3
--------**---------------------- Socket 4
----------**-------------------- Socket 5
------------**------------------ Socket 6
--------------**---------------- Socket 7
----------------**-------------- Socket 8
------------------**------------ Socket 9
--------------------**---------- Socket 10
----------------------**-------- Socket 11
------------------------**------ Socket 12
--------------------------**---- Socket 13
----------------------------**-- Socket 14
------------------------------** Socket 15
Logical Processor to NUMA Node Map:
****************---------------- NUMA Node 0
----------------**************** NUMA Node 1
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.1 1.0
01: 1.6 2.2
Logical Processor to Cache Map:
*------------------------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
**------------------------------ Unified Cache 1, Level 3, 22 MB, Assoc 11, LineSize 64
-*------------------------------ Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Unified Cache 2, Level 2, 1 MB, Assoc 16, LineSize 64
--*----------------------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Unified Cache 3, Level 2, 1 MB, Assoc 16, LineSize 64
--**---------------------------- Unified Cache 4, Level 3, 22 MB, Assoc 11, LineSize 64
---*---------------------------- Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Unified Cache 5, Level 2,
1 MB, Assoc 16, LineSize 64 ----*--------------------------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64 ----*--------------------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64 ----*--------------------------- Unified Cache 6, Level 2, 1 MB, Assoc 16, LineSize 64 ----**-------------------------- Unified Cache 7, Level 3, 22 MB, Assoc 11, LineSize 64 -----*-------------------------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64 -----*-------------------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64 -----*-------------------------- Unified Cache 8, Level 2, 1 MB, Assoc 16, LineSize 64 ------*------------------------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64 ------*------------------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64 ------*------------------------- Unified Cache 9, Level 2, 1 MB, Assoc 16, LineSize 64 ------**------------------------ Unified Cache 10, Level 3, 22 MB, Assoc 11, LineSize 64 -------*------------------------ Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64 -------*------------------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64 -------*------------------------ Unified Cache 11, Level 2, 1 MB, Assoc 16, LineSize 64 --------*----------------------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64 --------*----------------------- Instruction Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64 --------*----------------------- Unified Cache 12, Level 2, 1 MB, Assoc 16, LineSize 64 --------**---------------------- Unified Cache 13, Level 3, 22 MB, Assoc 11, LineSize 64 ---------*---------------------- Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64 ---------*---------------------- Instruction Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64 ---------*---------------------- Unified Cache 14, Level 2, 1 MB, Assoc 16, LineSize 64 ----------*--------------------- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64 ----------*--------------------- Instruction 
Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64 ----------*--------------------- Unified Cache 15, Level 2, 1 MB, Assoc 16, LineSize 64 ----------**-------------------- Unified Cache 16, Level 3, 22 MB, Assoc 11, LineSize 64 -----------*-------------------- Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64 -----------*-------------------- Instruction Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64 -----------*-------------------- Unified Cache 17, Level 2, 1 MB, Assoc 16, LineSize 64 ------------*------------------- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64 ------------*------------------- Instruction Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64 ------------*------------------- Unified Cache 18, Level 2, 1 MB, Assoc 16, LineSize 64 ------------**------------------ Unified Cache 19, Level 3, 22 MB, Assoc 11, LineSize 64 -------------*------------------ Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64 -------------*------------------ Instruction Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64 -------------*------------------ Unified Cache 20, Level 2, 1 MB, Assoc 16, LineSize 64 --------------*----------------- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64 --------------*----------------- Instruction Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64 --------------*----------------- Unified Cache 21, Level 2, 1 MB, Assoc 16, LineSize 64 --------------**---------------- Unified Cache 22, Level 3, 22 MB, Assoc 11, LineSize 64 ---------------*---------------- Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------*---------------- Instruction Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------*---------------- Unified Cache 23, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------*--------------- Data Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------*--------------- Instruction Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------*--------------- Unified Cache 24, Level 2, 1 MB, Assoc 16, LineSize 64 
----------------**-------------- Unified Cache 25, Level 3, 22 MB, Assoc 11, LineSize 64 -----------------*-------------- Data Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------*-------------- Instruction Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------*-------------- Unified Cache 26, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------*------------- Data Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------*------------- Instruction Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------*------------- Unified Cache 27, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------**------------ Unified Cache 28, Level 3, 22 MB, Assoc 11, LineSize 64 -------------------*------------ Data Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------*------------ Instruction Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------*------------ Unified Cache 29, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------*----------- Data Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------*----------- Instruction Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------*----------- Unified Cache 30, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------**---------- Unified Cache 31, Level 3, 22 MB, Assoc 11, LineSize 64 ---------------------*---------- Data Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------*---------- Instruction Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------*---------- Unified Cache 32, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------*--------- Data Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------*--------- Instruction Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------*--------- Unified Cache 33, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------**-------- Unified Cache 34, Level 3, 22 MB, Assoc 11, LineSize 64 -----------------------*-------- Data Cache 23, Level 
1, 32 KB, Assoc 8, LineSize 64 -----------------------*-------- Instruction Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------------*-------- Unified Cache 35, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------------*------- Data Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------*------- Instruction Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------*------- Unified Cache 36, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------------**------ Unified Cache 37, Level 3, 22 MB, Assoc 11, LineSize 64 -------------------------*------ Data Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------*------ Instruction Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------*------ Unified Cache 38, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------------*----- Data Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------------*----- Instruction Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------------*----- Unified Cache 39, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------------**---- Unified Cache 40, Level 3, 22 MB, Assoc 11, LineSize 64 ---------------------------*---- Data Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------------*---- Instruction Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------------*---- Unified Cache 41, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------------*--- Data Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------------*--- Instruction Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------------*--- Unified Cache 42, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------------**-- Unified Cache 43, Level 3, 22 MB, Assoc 11, LineSize 64 -----------------------------*-- Data Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------------------*-- Instruction Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64 
-----------------------------*-- Unified Cache 44, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------------------*- Data Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------------*- Instruction Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------------*- Unified Cache 45, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------------------** Unified Cache 46, Level 3, 22 MB, Assoc 11, LineSize 64 -------------------------------* Data Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------------* Instruction Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------------* Unified Cache 47, Level 2, 1 MB, Assoc 16, LineSize 64 Logical Processor to Group Map: ******************************** Group 0 |
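The "Logical Processor to NUMA Node Map" rows in the Coreinfo output above can be tallied mechanically. The following is a minimal sketch, assuming the output was saved as text with one map row per line; the function name `count_lps_per_numa_node` and the sample input are illustrative and not part of Coreinfo.

```python
import re

# Matches Coreinfo map rows such as "****----  NUMA Node 0".
NUMA_ROW = re.compile(r"^([*-]+)\s+NUMA Node (\d+)\s*$")


def count_lps_per_numa_node(coreinfo_text):
    """Return {numa_node_id: logical_processor_count} from Coreinfo text."""
    counts = {}
    for line in coreinfo_text.splitlines():
        match = NUMA_ROW.match(line.strip())
        if match:
            mask, node = match.groups()
            # Each '*' marks one logical processor that belongs to the node.
            counts[int(node)] = mask.count("*")
    return counts


# Sample rows shaped like the 2-node result above (16 logical processors each).
sample = "\n".join([
    "Logical Processor to NUMA Node Map:",
    "*" * 16 + "-" * 16 + " NUMA Node 0",
    "-" * 16 + "*" * 16 + " NUMA Node 1",
])
print(count_lps_per_numa_node(sample))
```

A guest that sees a single node, as in the preferHT test, would yield a single entry covering all 32 logical processors.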
After setting numa.vcpu.preferHT
# vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done
DICT numvcpus = "32" ### <--
DICT memSize = "196608"
DICT sched.cpu.affinity = "all"
DICT displayName = "debuggee"
DICT cpuid.coresPerSocket = "2" ### <--
DICT numa.autosize.cookie = "320162"
DICT numa.autosize.vcpu.maxPerVirtualNode = "32"
DICT numa.vcpu.preferHT = "TRUE" ### <--
numaHost: NUMA config: consolidation= 1 preferHT= 1 partitionByMemory = 0 ### <--
numa: coresPerSocket= 2 maxVcpusPerVPD= 32 ### <--
numaHost: 32 VCPUs 1 VPDs 1 PPDs ### <--
numaHost: VCPU 0 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 1 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 2 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 3 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 4 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 5 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 6 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 7 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 8 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 9 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 10 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 11 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 12 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 13 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 14 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 15 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 16 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 17 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 18 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 19 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 20 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 21 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 22 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 23 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 24 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 25 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 26 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 27 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 28 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 29 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 30 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 31 VPD 0 PPD 0 NodeMask ffffffffffffffff
C:\Users\Administrator\Downloads\SysinternalsSuite>Coreinfo64.exe
Coreinfo v3.6 - Dump information on system CPU and memory topology
Copyright (C) 2008-2022 Mark Russinovich
Sysinternals - www.sysinternals.com
Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
Microcode signature: 05003302
HTT * Hyperthreading enabled
CET - Supports Control Flow Enforcement Technology
Kernel CET - Kernel-mode CET Enabled
User CET - User-mode CET Allowed
HYPERVISOR * Hypervisor is present
VMX - Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
X64 * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT
SGX - Supports Intel SGX
NX * Supports no-execute page protection
SMEP * Supports Supervisor Mode Execution Prevention
SMAP * Supports Supervisor Mode Access Prevention
PAGE1GB * Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE * Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow!
extension instructions SSE * Supports Streaming SIMD Extensions SSE2 * Supports Streaming SIMD Extensions 2 SSE3 * Supports Streaming SIMD Extensions 3 SSSE3 * Supports Supplemental SIMD Extensions 3 SSE4a - Supports Streaming SIMDR Extensions 4a SSE4.1 * Supports Streaming SIMD Extensions 4.1 SSE4.2 * Supports Streaming SIMD Extensions 4.2 AES * Supports AES extensions AVX * Supports AVX instruction extensions AVX2 * Supports AVX2 instruction extensions AVX-512-F * Supports AVX-512 Foundation instructions AVX-512-DQ * Supports AVX-512 double and quadword instructions AVX-512-IFAMA - Supports AVX-512 integer Fused multiply-add instructions AVX-512-PF - Supports AVX-512 prefetch instructions AVX-512-ER - Supports AVX-512 exponential and reciprocal instructions AVX-512-CD * Supports AVX-512 conflict detection instructions AVX-512-BW * Supports AVX-512 byte and word instructions AVX-512-VL * Supports AVX-512 vector length instructions FMA * Supports FMA extensions using YMM state MSR * Implements RDMSR/WRMSR instructions MTRR * Supports Memory Type Range Registers XSAVE * Supports XSAVE/XRSTOR instructions OSXSAVE * Supports XSETBV/XGETBV instructions RDRAND * Supports RDRAND instruction RDSEED * Supports RDSEED instruction CMOV * Supports CMOVcc instruction CLFSH * Supports CLFLUSH instruction CX8 * Supports compare and exchange 8-byte instructions CX16 * Supports CMPXCHG16B instruction BMI1 * Supports bit manipulation extensions 1 BMI2 * Supports bit manipulation extensions 2 ADX * Supports ADCX/ADOX instructions DCA - Supports prefetch from memory-mapped device F16C * Supports half-precision instruction FXSR * Supports FXSAVE/FXSTOR instructions FFXSR - Supports optimized FXSAVE/FSRSTOR instruction MONITOR - Supports MONITOR and MWAIT instructions MOVBE * Supports MOVBE instruction ERMSB * Supports Enhanced REP MOVSB/STOSB PCLMULDQ * Supports PCLMULDQ instruction POPCNT * Supports POPCNT instruction LZCNT * Supports LZCNT instruction SEP * Supports fast system call 
instructions LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode HLE - Supports Hardware Lock Elision instructions RTM - Supports Restricted Transactional Memory instructions DE * Supports I/O breakpoints including CR4.DE DTES64 - Can write history of 64-bit branch addresses DS - Implements memory-resident debug buffer DS-CPL - Supports Debug Store feature with CPL PCID * Supports PCIDs and settable CR4.PCIDE INVPCID * Supports INVPCID instruction PDCM - Supports Performance Capabilities MSR RDTSCP * Supports RDTSCP instruction TSC * Supports RDTSC instruction TSC-DEADLINE * Local APIC supports one-shot deadline timer TSC-INVARIANT * TSC runs at constant rate xTPR - Supports disabling task priority messages EIST - Supports Enhanced Intel Speedstep ACPI - Implements MSR for power management TM - Implements thermal monitor circuitry TM2 - Implements Thermal Monitor 2 control APIC * Implements software-accessible local APIC x2APIC * Supports x2APIC CNXT-ID - L1 data cache mode adaptive or BIOS MCE * Supports Machine Check, INT18 and CR4.MCE MCA * Implements Machine Check Architecture PBE - Supports use of FERR#/PBE# pin PSN - Implements 96-bit processor serial number PREFETCHW * Supports PREFETCHW instruction Maximum implemented CPUID leaves: 00000016 (Basic), 80000008 (Extended). Maximum implemented address width: 48 bits (virtual), 45 bits (physical). 
Processor signature: 00050657 Logical to Physical Processor Map: *------------------------------- Physical Processor 0 -*------------------------------ Physical Processor 1 --*----------------------------- Physical Processor 2 ---*---------------------------- Physical Processor 3 ----*--------------------------- Physical Processor 4 -----*-------------------------- Physical Processor 5 ------*------------------------- Physical Processor 6 -------*------------------------ Physical Processor 7 --------*----------------------- Physical Processor 8 ---------*---------------------- Physical Processor 9 ----------*--------------------- Physical Processor 10 -----------*-------------------- Physical Processor 11 ------------*------------------- Physical Processor 12 -------------*------------------ Physical Processor 13 --------------*----------------- Physical Processor 14 ---------------*---------------- Physical Processor 15 ----------------*--------------- Physical Processor 16 -----------------*-------------- Physical Processor 17 ------------------*------------- Physical Processor 18 -------------------*------------ Physical Processor 19 --------------------*----------- Physical Processor 20 ---------------------*---------- Physical Processor 21 ----------------------*--------- Physical Processor 22 -----------------------*-------- Physical Processor 23 ------------------------*------- Physical Processor 24 -------------------------*------ Physical Processor 25 --------------------------*----- Physical Processor 26 ---------------------------*---- Physical Processor 27 ----------------------------*--- Physical Processor 28 -----------------------------*-- Physical Processor 29 ------------------------------*- Physical Processor 30 -------------------------------* Physical Processor 31 Logical Processor to Socket Map: **------------------------------ Socket 0 --**---------------------------- Socket 1 ----**-------------------------- Socket 2 
------**------------------------ Socket 3
--------**---------------------- Socket 4
----------**-------------------- Socket 5
------------**------------------ Socket 6
--------------**---------------- Socket 7
----------------**-------------- Socket 8
------------------**------------ Socket 9
--------------------**---------- Socket 10
----------------------**-------- Socket 11
------------------------**------ Socket 12
--------------------------**---- Socket 13
----------------------------**-- Socket 14
------------------------------** Socket 15
Logical Processor to NUMA Node Map:
******************************** NUMA Node 0
No NUMA nodes.
Logical Processor to Cache Map:
*------------------------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
**------------------------------ Unified Cache 1, Level 3, 22 MB, Assoc 11, LineSize 64
-*------------------------------ Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Unified Cache 2, Level 2, 1 MB, Assoc 16, LineSize 64
--*----------------------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Unified Cache 3, Level 2, 1 MB, Assoc 16, LineSize 64
--**---------------------------- Unified Cache 4, Level 3, 22 MB, Assoc 11, LineSize 64
---*---------------------------- Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Unified Cache 5, Level 2, 1 MB, Assoc 16, LineSize 64
----*--------------------------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*--------------------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64 ----*--------------------------- Unified Cache 6, Level 2, 1 MB, Assoc 16, LineSize 64 ----**-------------------------- Unified Cache 7, Level 3, 22 MB, Assoc 11, LineSize 64 -----*-------------------------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64 -----*-------------------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64 -----*-------------------------- Unified Cache 8, Level 2, 1 MB, Assoc 16, LineSize 64 ------*------------------------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64 ------*------------------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64 ------*------------------------- Unified Cache 9, Level 2, 1 MB, Assoc 16, LineSize 64 ------**------------------------ Unified Cache 10, Level 3, 22 MB, Assoc 11, LineSize 64 -------*------------------------ Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64 -------*------------------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64 -------*------------------------ Unified Cache 11, Level 2, 1 MB, Assoc 16, LineSize 64 --------*----------------------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64 --------*----------------------- Instruction Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64 --------*----------------------- Unified Cache 12, Level 2, 1 MB, Assoc 16, LineSize 64 --------**---------------------- Unified Cache 13, Level 3, 22 MB, Assoc 11, LineSize 64 ---------*---------------------- Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64 ---------*---------------------- Instruction Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64 ---------*---------------------- Unified Cache 14, Level 2, 1 MB, Assoc 16, LineSize 64 ----------*--------------------- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64 ----------*--------------------- Instruction Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64 ----------*--------------------- Unified Cache 15, Level 2, 1 MB, 
Assoc 16, LineSize 64 ----------**-------------------- Unified Cache 16, Level 3, 22 MB, Assoc 11, LineSize 64 -----------*-------------------- Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64 -----------*-------------------- Instruction Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64 -----------*-------------------- Unified Cache 17, Level 2, 1 MB, Assoc 16, LineSize 64 ------------*------------------- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64 ------------*------------------- Instruction Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64 ------------*------------------- Unified Cache 18, Level 2, 1 MB, Assoc 16, LineSize 64 ------------**------------------ Unified Cache 19, Level 3, 22 MB, Assoc 11, LineSize 64 -------------*------------------ Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64 -------------*------------------ Instruction Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64 -------------*------------------ Unified Cache 20, Level 2, 1 MB, Assoc 16, LineSize 64 --------------*----------------- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64 --------------*----------------- Instruction Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64 --------------*----------------- Unified Cache 21, Level 2, 1 MB, Assoc 16, LineSize 64 --------------**---------------- Unified Cache 22, Level 3, 22 MB, Assoc 11, LineSize 64 ---------------*---------------- Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------*---------------- Instruction Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------*---------------- Unified Cache 23, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------*--------------- Data Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------*--------------- Instruction Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------*--------------- Unified Cache 24, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------**-------------- Unified Cache 25, Level 3, 22 MB, Assoc 11, LineSize 64 
-----------------*-------------- Data Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------*-------------- Instruction Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------*-------------- Unified Cache 26, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------*------------- Data Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------*------------- Instruction Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------*------------- Unified Cache 27, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------**------------ Unified Cache 28, Level 3, 22 MB, Assoc 11, LineSize 64 -------------------*------------ Data Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------*------------ Instruction Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------*------------ Unified Cache 29, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------*----------- Data Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------*----------- Instruction Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------*----------- Unified Cache 30, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------**---------- Unified Cache 31, Level 3, 22 MB, Assoc 11, LineSize 64 ---------------------*---------- Data Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------*---------- Instruction Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------*---------- Unified Cache 32, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------*--------- Data Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------*--------- Instruction Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------*--------- Unified Cache 33, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------**-------- Unified Cache 34, Level 3, 22 MB, Assoc 11, LineSize 64 -----------------------*-------- Data Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------------*-------- Instruction Cache 23, 
Level 1, 32 KB, Assoc 8, LineSize 64 -----------------------*-------- Unified Cache 35, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------------*------- Data Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------*------- Instruction Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------*------- Unified Cache 36, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------------**------ Unified Cache 37, Level 3, 22 MB, Assoc 11, LineSize 64 -------------------------*------ Data Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------*------ Instruction Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------*------ Unified Cache 38, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------------*----- Data Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------------*----- Instruction Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64 --------------------------*----- Unified Cache 39, Level 2, 1 MB, Assoc 16, LineSize 64 --------------------------**---- Unified Cache 40, Level 3, 22 MB, Assoc 11, LineSize 64 ---------------------------*---- Data Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------------*---- Instruction Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64 ---------------------------*---- Unified Cache 41, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------------*--- Data Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------------*--- Instruction Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64 ----------------------------*--- Unified Cache 42, Level 2, 1 MB, Assoc 16, LineSize 64 ----------------------------**-- Unified Cache 43, Level 3, 22 MB, Assoc 11, LineSize 64 -----------------------------*-- Data Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------------------*-- Instruction Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64 -----------------------------*-- Unified Cache 44, Level 2, 1 MB, Assoc 16, LineSize 64 
------------------------------*- Data Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------------*- Instruction Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64 ------------------------------*- Unified Cache 45, Level 2, 1 MB, Assoc 16, LineSize 64 ------------------------------** Unified Cache 46, Level 3, 22 MB, Assoc 11, LineSize 64 -------------------------------* Data Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------------* Instruction Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64 -------------------------------* Unified Cache 47, Level 2, 1 MB, Assoc 16, LineSize 64 Logical Processor to Group Map: ******************************** Group 0 |
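The `numaHost: VCPU n VPD v PPD p` entries that the vmdumper command pulls from vmware.log show which virtual and physical proximity domain each vCPU landed in. The following is a minimal sketch for summarizing them, assuming the command output was captured as text; the function name `vcpus_per_domain` is hypothetical, and the sample is built to mirror the preferHT result above (32 vCPUs, 1 VPD, 1 PPD).

```python
import re
from collections import Counter

# One entry looks like: "numaHost: VCPU 5 VPD 0 PPD 0 NodeMask ffffffffffffffff"
VCPU_ROW = re.compile(r"numaHost:\s+VCPU\s+(\d+)\s+VPD\s+(\d+)\s+PPD\s+(\d+)")


def vcpus_per_domain(log_text):
    """Count vCPUs per (VPD, PPD) pair found in numaHost entries."""
    counts = Counter()
    # finditer scans the whole text, so entries may share a line or not.
    for match in VCPU_ROW.finditer(log_text):
        _vcpu, vpd, ppd = (int(group) for group in match.groups())
        counts[(vpd, ppd)] += 1
    return dict(counts)


# Sample shaped like the preferHT = TRUE log: every vCPU in VPD 0 / PPD 0.
sample_log = "\n".join(
    f"numaHost: VCPU {i} VPD 0 PPD 0 NodeMask ffffffffffffffff" for i in range(32)
)
print(vcpus_per_domain(sample_log))
```

A single `(0, 0)` key holding all 32 vCPUs corresponds to the `32 VCPUs 1 VPDs 1 PPDs` summary line in the log.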
Cores per Socket
When this is set to a value greater than 1, vNUMA nodes are created
Tests combining the numa.vcpu.preferHT setting with the coresPerSocket value
Before setting numa.vcpu.preferHT, change coresPerSocket to 16 ## Intended to match the physical layout
## In the Coreinfo64.exe output run inside the guest OS, only 2 sockets are recognized (32 total vCPUs / coresPerSocket (16) = 2)
coresPerSocket changed to 16 before setting numa.vcpu.preferHT
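The socket count in the note above is plain integer division. A minimal sketch of that arithmetic, assuming the vCPU count is an exact multiple of coresPerSocket; `guest_socket_count` is a hypothetical helper name used only for illustration, not a VMware API.

```python
def guest_socket_count(num_vcpus: int, cores_per_socket: int) -> int:
    """Return the number of virtual sockets the guest OS should see.

    Assumption: num_vcpus is an exact multiple of cores_per_socket.
    """
    if cores_per_socket < 1:
        raise ValueError("cores_per_socket must be at least 1")
    if num_vcpus % cores_per_socket != 0:
        raise ValueError("num_vcpus must be a multiple of cores_per_socket")
    return num_vcpus // cores_per_socket


# Values taken from the tests in this section: 32 vCPUs in both cases.
print(guest_socket_count(32, 16))  # coresPerSocket = 16
print(guest_socket_count(32, 32))  # coresPerSocket = 32
```

With the values used in this section, the first call corresponds to the two sockets reported by Coreinfo64.exe and the second to the single-socket case.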
# vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done
DICT numvcpus = "32" ### <--
DICT memSize = "196608"
DICT sched.cpu.affinity = "all"
DICT displayName = "debuggee"
DICT cpuid.coresPerSocket = "16" ### <--
DICT numa.autosize.cookie = "320022"
DICT numa.autosize.vcpu.maxPerVirtualNode = "32"
DICT numa.vcpu.preferHT = "FALSE" ### <--
numaHost: NUMA config: consolidation= 1 preferHT= 0 partitionByMemory = 0 ### <--
numa: coresPerSocket= 16 maxVcpusPerVPD= 16 ### <--
numaHost: 32 VCPUs 2 VPDs 2 PPDs ### <--
numaHost: VCPU 0 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 1 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 2 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 3 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 4 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 5 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 6 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 7 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 8 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 9 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 10 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 11 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 12 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 13 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 14 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 15 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 16 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 17 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 18 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 19 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 20 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 21 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 22 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 23 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 24 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 25 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 26 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 27 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 28 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 29 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 30 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 31 VPD 1 PPD 1 NodeMask ffffffffffffffff

C:\Users\Administrator\Downloads\SysinternalsSuite>Coreinfo64.exe
Coreinfo v3.6 - Dump information on system CPU and memory topology
Copyright (C) 2008-2022 Mark Russinovich
Sysinternals - www.sysinternals.com
Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
Microcode signature: 05003302
HTT * Hyperthreading enabled
CET - Supports Control Flow Enforcement Technology
Kernel CET - Kernel-mode CET Enabled
User CET - User-mode CET Allowed
HYPERVISOR * Hypervisor is present
VMX - Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
X64 * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT
SGX - Supports Intel SGX
NX * Supports no-execute page protection
SMEP * Supports Supervisor Mode Execution Prevention
SMAP * Supports Supervisor Mode Access Prevention
PAGE1GB * Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE * Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4a - Supports Streaming SIMDR Extensions 4a
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES * Supports AES extensions
AVX * Supports AVX instruction extensions
AVX2 * Supports AVX2 instruction extensions
AVX-512-F * Supports AVX-512 Foundation instructions
AVX-512-DQ * Supports AVX-512 double and quadword instructions
AVX-512-IFAMA - Supports AVX-512 integer Fused multiply-add instructions
AVX-512-PF - Supports AVX-512 prefetch instructions
AVX-512-ER - Supports AVX-512 exponential and reciprocal instructions
AVX-512-CD * Supports AVX-512 conflict detection instructions
AVX-512-BW * Supports AVX-512 byte and word instructions
AVX-512-VL * Supports AVX-512 vector length instructions
FMA * Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE * Supports XSAVE/XRSTOR instructions
OSXSAVE * Supports XSETBV/XGETBV instructions
RDRAND * Supports RDRAND instruction
RDSEED * Supports RDSEED instruction
CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 * Supports bit manipulation extensions 1
BMI2 * Supports bit manipulation extensions 2
ADX * Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C * Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR - Supports MONITOR and MWAIT instructions
MOVBE * Supports MOVBE instruction
ERMSB * Supports Enhanced REP MOVSB/STOSB
PCLMULDQ * Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
LZCNT * Supports LZCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instructions
DE * Supports I/O breakpoints including CR4.DE
DTES64 - Can write history of 64-bit branch addresses
DS - Implements memory-resident debug buffer
DS-CPL - Supports Debug Store feature with CPL
PCID * Supports PCIDs and settable CR4.PCIDE
INVPCID * Supports INVPCID instruction
PDCM - Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE * Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR - Supports disabling task priority messages
EIST - Supports Enhanced Intel Speedstep
ACPI - Implements MSR for power management
TM - Implements thermal monitor circuitry
TM2 - Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC * Supports x2APIC
CNXT-ID - L1 data cache mode adaptive or BIOS
MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE - Supports use of FERR#/PBE# pin
PSN - Implements 96-bit processor serial number
PREFETCHW * Supports PREFETCHW instruction
Maximum implemented CPUID leaves: 00000016 (Basic), 80000008 (Extended).
Maximum implemented address width: 48 bits (virtual), 45 bits (physical).
Processor signature: 00050657

Logical to Physical Processor Map:
*------------------------------- Physical Processor 0
-*------------------------------ Physical Processor 1
--*----------------------------- Physical Processor 2
---*---------------------------- Physical Processor 3
----*--------------------------- Physical Processor 4
-----*-------------------------- Physical Processor 5
------*------------------------- Physical Processor 6
-------*------------------------ Physical Processor 7
--------*----------------------- Physical Processor 8
---------*---------------------- Physical Processor 9
----------*--------------------- Physical Processor 10
-----------*-------------------- Physical Processor 11
------------*------------------- Physical Processor 12
-------------*------------------ Physical Processor 13
--------------*----------------- Physical Processor 14
---------------*---------------- Physical Processor 15
----------------*--------------- Physical Processor 16
-----------------*-------------- Physical Processor 17
------------------*------------- Physical Processor 18
-------------------*------------ Physical Processor 19
--------------------*----------- Physical Processor 20
---------------------*---------- Physical Processor 21
----------------------*--------- Physical Processor 22
-----------------------*-------- Physical Processor 23
------------------------*------- Physical Processor 24
-------------------------*------ Physical Processor 25
--------------------------*----- Physical Processor 26
---------------------------*---- Physical Processor 27
----------------------------*--- Physical Processor 28
-----------------------------*-- Physical Processor 29
------------------------------*- Physical Processor 30
-------------------------------* Physical Processor 31

Logical Processor to Socket Map:
****************---------------- Socket 0
----------------**************** Socket 1

Logical Processor to NUMA Node Map:
****************---------------- NUMA Node 0
----------------**************** NUMA Node 1

Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.1 1.0
01: 1.7 2.2

Logical Processor to Cache Map:
*------------------------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
****************---------------- Unified Cache 1, Level 3, 22 MB, Assoc 11, LineSize 64
-*------------------------------ Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Unified Cache 2, Level 2, 1 MB, Assoc 16, LineSize 64
--*----------------------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Unified Cache 3, Level 2, 1 MB, Assoc 16, LineSize 64
---*---------------------------- Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Unified Cache 4, Level 2, 1 MB, Assoc 16, LineSize 64
----*--------------------------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*--------------------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*--------------------------- Unified Cache 5, Level 2, 1 MB, Assoc 16, LineSize 64
-----*-------------------------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
-----*-------------------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
-----*-------------------------- Unified Cache 6, Level 2, 1 MB, Assoc 16, LineSize 64
------*------------------------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------*------------------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------*------------------------- Unified Cache 7, Level 2, 1 MB, Assoc 16, LineSize 64
-------*------------------------ Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
-------*------------------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
-------*------------------------ Unified Cache 8, Level 2, 1 MB, Assoc 16, LineSize 64
--------*----------------------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
--------*----------------------- Instruction Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
--------*----------------------- Unified Cache 9, Level 2, 1 MB, Assoc 16, LineSize 64
---------*---------------------- Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
---------*---------------------- Instruction Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
---------*---------------------- Unified Cache 10, Level 2, 1 MB, Assoc 16, LineSize 64
----------*--------------------- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
----------*--------------------- Instruction Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
----------*--------------------- Unified Cache 11, Level 2, 1 MB, Assoc 16, LineSize 64
-----------*-------------------- Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
-----------*-------------------- Instruction Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
-----------*-------------------- Unified Cache 12, Level 2, 1 MB, Assoc 16, LineSize 64
------------*------------------- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------*------------------- Instruction Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------*------------------- Unified Cache 13, Level 2, 1 MB, Assoc 16, LineSize 64
-------------*------------------ Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
-------------*------------------ Instruction Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
-------------*------------------ Unified Cache 14, Level 2, 1 MB, Assoc 16, LineSize 64
--------------*----------------- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*----------------- Instruction Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*----------------- Unified Cache 15, Level 2, 1 MB, Assoc 16, LineSize 64
---------------*---------------- Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*---------------- Instruction Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*---------------- Unified Cache 16, Level 2, 1 MB, Assoc 16, LineSize 64
----------------*--------------- Data Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64
----------------*--------------- Instruction Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64
----------------*--------------- Unified Cache 17, Level 2, 1 MB, Assoc 16, LineSize 64
----------------**************** Unified Cache 18, Level 3, 22 MB, Assoc 11, LineSize 64
-----------------*-------------- Data Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------*-------------- Instruction Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------*-------------- Unified Cache 19, Level 2, 1 MB, Assoc 16, LineSize 64
------------------*------------- Data Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64
------------------*------------- Instruction Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64
------------------*------------- Unified Cache 20, Level 2, 1 MB, Assoc 16, LineSize 64
-------------------*------------ Data Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------*------------ Instruction Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------*------------ Unified Cache 21, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------*----------- Data Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*----------- Instruction Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*----------- Unified Cache 22, Level 2, 1 MB, Assoc 16, LineSize 64
---------------------*---------- Data Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*---------- Instruction Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*---------- Unified Cache 23, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------*--------- Data Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------*--------- Instruction Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------*--------- Unified Cache 24, Level 2, 1 MB, Assoc 16, LineSize 64
-----------------------*-------- Data Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------*-------- Instruction Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------*-------- Unified Cache 25, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------*------- Data Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------*------- Instruction Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------*------- Unified Cache 26, Level 2, 1 MB, Assoc 16, LineSize 64
-------------------------*------ Data Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------*------ Instruction Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------*------ Unified Cache 27, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------------*----- Data Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------------*----- Instruction Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------------*----- Unified Cache 28, Level 2, 1 MB, Assoc 16, LineSize 64
---------------------------*---- Data Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------------*---- Instruction Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------------*---- Unified Cache 29, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------------*--- Data Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------------*--- Instruction Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------------*--- Unified Cache 30, Level 2, 1 MB, Assoc 16, LineSize 64
-----------------------------*-- Data Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------------*-- Instruction Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------------*-- Unified Cache 31, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------------*- Data Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------------*- Instruction Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------------*- Unified Cache 32, Level 2, 1 MB, Assoc 16, LineSize 64
-------------------------------* Data Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------------* Instruction Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------------* Unified Cache 33, Level 2, 1 MB, Assoc 16, LineSize 64

Logical Processor to Group Map:
******************************** Group 0 |
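The per-vCPU `numaHost:` lines in the vmware.log excerpt above can be summarized per VPD instead of being read one by one. A minimal sketch, assuming the line format `numaHost: VCPU <n> VPD <n> PPD <n> ...` shown in this section; `summarize_vpd_layout` is a hypothetical helper, and the sample text is synthesized to mirror the 16-coresPerSocket log rather than read from a real file.

```python
import re
from collections import defaultdict

# Matches lines such as: numaHost: VCPU 16 VPD 1 PPD 1 NodeMask ffffffffffffffff
NUMA_HOST_LINE = re.compile(r"numaHost:\s+VCPU\s+(\d+)\s+VPD\s+(\d+)\s+PPD\s+(\d+)")


def summarize_vpd_layout(log_text: str) -> dict[int, list[int]]:
    """Group vCPU ids by the VPD they are assigned to."""
    layout: dict[int, list[int]] = defaultdict(list)
    for match in NUMA_HOST_LINE.finditer(log_text):
        vcpu, vpd, _ppd = (int(group) for group in match.groups())
        layout[vpd].append(vcpu)
    return dict(layout)


# Synthetic sample mirroring the log above: vCPUs 0-15 on VPD 0, 16-31 on VPD 1.
sample = "\n".join(
    f"numaHost: VCPU {v} VPD {v // 16} PPD {v // 16} NodeMask ffffffffffffffff"
    for v in range(32)
)

layout = summarize_vpd_layout(sample)
for vpd, vcpus in sorted(layout.items()):
    print(f"VPD {vpd}: {len(vcpus)} vCPUs ({vcpus[0]}-{vcpus[-1]})")
```

Because the pattern is applied with `finditer`, it also works on log text where the line breaks have been lost.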
With numa.vcpu.preferHT set, change coresPerSocket to 32 ## Intended to match the physical layout
## In the Coreinfo64.exe output run inside the guest OS, only 1 socket is recognized (32 total vCPUs / coresPerSocket (32) = 1)
coresPerSocket changed to 32 after setting numa.vcpu.preferHT
# vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done
DICT numvcpus = "32" ### <--
DICT memSize = "196608"
DICT sched.cpu.affinity = "all"
DICT displayName = "debuggee"
DICT cpuid.coresPerSocket = "32" ### <--
DICT numa.autosize.cookie = "320162"
DICT numa.autosize.vcpu.maxPerVirtualNode = "16"
DICT numa.vcpu.preferHT = "TRUE" ### <--
numaHost: NUMA config: consolidation= 1 preferHT= 1 partitionByMemory = 0 ### <--
numa: coresPerSocket= 32 maxVcpusPerVPD= 32 ### <--
numaHost: 32 VCPUs 1 VPDs 1 PPDs ### <--
numaHost: VCPU 0 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 1 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 2 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 3 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 4 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 5 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 6 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 7 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 8 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 9 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 10 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 11 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 12 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 13 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 14 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 15 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 16 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 17 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 18 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 19 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 20 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 21 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 22 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 23 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 24 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 25 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 26 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 27 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 28 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 29 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 30 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 31 VPD 0 PPD 0 NodeMask ffffffffffffffff

C:\Users\Administrator\Downloads\SysinternalsSuite>Coreinfo64.exe
Coreinfo v3.6 - Dump information on system CPU and memory topology
Copyright (C) 2008-2022 Mark Russinovich
Sysinternals - www.sysinternals.com
Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
Microcode signature: 05003302
HTT * Hyperthreading enabled
CET - Supports Control Flow Enforcement Technology
Kernel CET - Kernel-mode CET Enabled
User CET - User-mode CET Allowed
HYPERVISOR * Hypervisor is present
VMX - Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
X64 * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT
SGX - Supports Intel SGX
NX * Supports no-execute page protection
SMEP * Supports Supervisor Mode Execution Prevention
SMAP * Supports Supervisor Mode Access Prevention
PAGE1GB * Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE * Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4a - Supports Streaming SIMDR Extensions 4a
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES * Supports AES extensions
AVX * Supports AVX instruction extensions
AVX2 * Supports AVX2 instruction extensions
AVX-512-F * Supports AVX-512 Foundation instructions
AVX-512-DQ * Supports AVX-512 double and quadword instructions
AVX-512-IFAMA - Supports AVX-512 integer Fused multiply-add instructions
AVX-512-PF - Supports AVX-512 prefetch instructions
AVX-512-ER - Supports AVX-512 exponential and reciprocal instructions
AVX-512-CD * Supports AVX-512 conflict detection instructions
AVX-512-BW * Supports AVX-512 byte and word instructions
AVX-512-VL * Supports AVX-512 vector length instructions
FMA * Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE * Supports XSAVE/XRSTOR instructions
OSXSAVE * Supports XSETBV/XGETBV instructions
RDRAND * Supports RDRAND instruction
RDSEED * Supports RDSEED instruction
CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 * Supports bit manipulation extensions 1
BMI2 * Supports bit manipulation extensions 2
ADX * Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C * Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR - Supports MONITOR and MWAIT instructions
MOVBE * Supports MOVBE instruction
ERMSB * Supports Enhanced REP MOVSB/STOSB
PCLMULDQ * Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
LZCNT * Supports LZCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instructions
DE * Supports I/O breakpoints including CR4.DE
DTES64 - Can write history of 64-bit branch addresses
DS - Implements memory-resident debug buffer
DS-CPL - Supports Debug Store feature with CPL
PCID * Supports PCIDs and settable CR4.PCIDE
INVPCID * Supports INVPCID instruction
PDCM - Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE * Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR - Supports disabling task priority messages
EIST - Supports Enhanced Intel Speedstep
ACPI - Implements MSR for power management
TM - Implements thermal monitor circuitry
TM2 - Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC * Supports x2APIC
CNXT-ID - L1 data cache mode adaptive or BIOS
MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE - Supports use of FERR#/PBE# pin
PSN - Implements 96-bit processor serial number
PREFETCHW * Supports PREFETCHW instruction
Maximum implemented CPUID leaves: 00000016 (Basic), 80000008 (Extended).
Maximum implemented address width: 48 bits (virtual), 45 bits (physical).
Processor signature: 00050657

Logical to Physical Processor Map:
*------------------------------- Physical Processor 0
-*------------------------------ Physical Processor 1
--*----------------------------- Physical Processor 2
---*---------------------------- Physical Processor 3
----*--------------------------- Physical Processor 4
-----*-------------------------- Physical Processor 5
------*------------------------- Physical Processor 6
-------*------------------------ Physical Processor 7
--------*----------------------- Physical Processor 8
---------*---------------------- Physical Processor 9
----------*--------------------- Physical Processor 10
-----------*-------------------- Physical Processor 11
------------*------------------- Physical Processor 12
-------------*------------------ Physical Processor 13
--------------*----------------- Physical Processor 14
---------------*---------------- Physical Processor 15
----------------*--------------- Physical Processor 16
-----------------*-------------- Physical Processor 17
------------------*------------- Physical Processor 18
-------------------*------------ Physical Processor 19
--------------------*----------- Physical Processor 20
---------------------*---------- Physical Processor 21
----------------------*--------- Physical Processor 22
-----------------------*-------- Physical Processor 23
------------------------*------- Physical Processor 24
-------------------------*------ Physical Processor 25
--------------------------*----- Physical Processor 26
---------------------------*---- Physical Processor 27
----------------------------*--- Physical Processor 28
-----------------------------*-- Physical Processor 29
------------------------------*- Physical Processor 30
-------------------------------* Physical Processor 31

Logical Processor to Socket Map:
******************************** Socket 0

Logical Processor to NUMA Node Map:
******************************** NUMA Node 0
No NUMA nodes.

Logical Processor to Cache Map:
*------------------------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*------------------------------- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
******************************** Unified Cache 1, Level 3, 22 MB, Assoc 11, LineSize 64
-*------------------------------ Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*------------------------------ Unified Cache 2, Level 2, 1 MB, Assoc 16, LineSize 64
--*----------------------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*----------------------------- Unified Cache 3, Level 2, 1 MB, Assoc 16, LineSize 64
---*---------------------------- Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*---------------------------- Unified Cache 4, Level 2, 1 MB, Assoc 16, LineSize 64
----*--------------------------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*--------------------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*--------------------------- Unified Cache 5, Level 2, 1 MB, Assoc 16, LineSize 64
-----*-------------------------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
-----*-------------------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
-----*-------------------------- Unified Cache 6, Level 2, 1 MB, Assoc 16, LineSize 64
------*------------------------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------*------------------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------*------------------------- Unified Cache 7, Level 2, 1 MB, Assoc 16, LineSize 64
-------*------------------------ Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
-------*------------------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
-------*------------------------ Unified Cache 8, Level 2, 1 MB, Assoc 16, LineSize 64
--------*----------------------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
--------*----------------------- Instruction Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
--------*----------------------- Unified Cache 9, Level 2, 1 MB, Assoc 16, LineSize 64
---------*---------------------- Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
---------*---------------------- Instruction Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
---------*---------------------- Unified Cache 10, Level 2, 1 MB, Assoc 16, LineSize 64
----------*--------------------- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
----------*--------------------- Instruction Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
----------*--------------------- Unified Cache 11, Level 2, 1 MB, Assoc 16, LineSize 64
-----------*-------------------- Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
-----------*-------------------- Instruction Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
-----------*-------------------- Unified Cache 12, Level 2, 1 MB, Assoc 16, LineSize 64
------------*------------------- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------*------------------- Instruction Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------*------------------- Unified Cache 13, Level 2, 1 MB, Assoc 16, LineSize 64
-------------*------------------ Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
-------------*------------------ Instruction Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
-------------*------------------ Unified Cache 14, Level 2, 1 MB, Assoc 16, LineSize 64
--------------*----------------- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*----------------- Instruction Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*----------------- Unified Cache 15, Level 2, 1 MB, Assoc 16, LineSize 64
---------------*---------------- Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*---------------- Instruction Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*---------------- Unified Cache 16, Level 2, 1 MB, Assoc 16, LineSize 64
----------------*--------------- Data Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64
----------------*--------------- Instruction Cache 16, Level 1, 32 KB, Assoc 8, LineSize 64
----------------*--------------- Unified Cache 17, Level 2, 1 MB, Assoc 16, LineSize 64
-----------------*-------------- Data Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------*-------------- Instruction Cache 17, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------*-------------- Unified Cache 18, Level 2, 1 MB, Assoc 16, LineSize 64
------------------*------------- Data Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64
------------------*------------- Instruction Cache 18, Level 1, 32 KB, Assoc 8, LineSize 64
------------------*------------- Unified Cache 19, Level 2, 1 MB, Assoc 16, LineSize 64
-------------------*------------ Data Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------*------------ Instruction Cache 19, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------*------------ Unified Cache 20, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------*----------- Data Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*----------- Instruction Cache 20, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*----------- Unified Cache 21, Level 2, 1 MB, Assoc 16, LineSize 64
---------------------*---------- Data Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*---------- Instruction Cache 21, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*---------- Unified Cache 22, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------*--------- Data Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------*--------- Instruction Cache 22, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------*--------- Unified Cache 23, Level 2, 1 MB, Assoc 16, LineSize 64
-----------------------*-------- Data Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------*-------- Instruction Cache 23, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------*-------- Unified Cache 24, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------*------- Data Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------*------- Instruction Cache 24, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------*------- Unified Cache 25, Level 2, 1 MB, Assoc 16, LineSize 64
-------------------------*------ Data Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------*------ Instruction Cache 25, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------*------ Unified Cache 26, Level 2, 1 MB, Assoc 16, LineSize 64
--------------------------*----- Data Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------------*----- Instruction Cache 26, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------------*----- Unified Cache 27, Level 2, 1 MB, Assoc 16, LineSize 64
---------------------------*---- Data Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------------*---- Instruction Cache 27, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------------*---- Unified Cache 28, Level 2, 1 MB, Assoc 16, LineSize 64
----------------------------*--- Data Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------------*--- Instruction Cache 28, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------------*--- Unified Cache 29, Level 2, 1 MB, Assoc 16, LineSize 64
-----------------------------*-- Data Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------------*-- Instruction Cache 29, Level 1, 32 KB, Assoc 8, LineSize 64
-----------------------------*-- Unified Cache 30, Level 2, 1 MB, Assoc 16, LineSize 64
------------------------------*- Data Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------------*- Instruction Cache 30, Level 1, 32 KB, Assoc 8, LineSize 64
------------------------------*- Unified Cache 31, Level 2, 1 MB, Assoc 16, LineSize 64
-------------------------------* Data Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------------* Instruction Cache 31, Level 1, 32 KB, Assoc 8, LineSize 64
-------------------------------* Unified Cache 32, Level 2, 1 MB, Assoc 16, LineSize 64

Logical Processor to Group Map:
******************************** Group 0 |
Virtual Proximity Domains and Physical Proximity Domains
NUMA Clients determine the Virtual NUMA Topology
A VPD is the construct exposed to the VM, while a PPD is the construct the NUMA Scheduler uses for placement (initial placement and load balancing)
Physical Proximity Domains
A PPD is sized to the number of vCPUs placed on a single CPU package, so a PPD can never be larger than a single CPU package
A PPD acts as an affinity that maps its vCPUs onto the cores of a single CPU package
The NUMA Scheduler uses the PPD to verify that vCPUs are grouped correctly
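The PPD sizing rule can be sketched in a few lines of Python. This is a simplified illustration rather than the VMkernel's actual algorithm: the helper name `ppd_sizes` is hypothetical, and spreading vCPUs evenly across PPDs is an assumption.

```python
import math
from typing import List


def ppd_sizes(vcpus: int, cores_per_package: int) -> List[int]:
    """Split vCPUs into PPDs so that no PPD exceeds one CPU package.

    Hypothetical helper: assumes vCPUs are spread as evenly as possible.
    """
    if vcpus <= 0 or cores_per_package <= 0:
        raise ValueError("vcpus and cores_per_package must be positive")
    # Minimum number of PPDs needed so each one fits in a single package.
    count = math.ceil(vcpus / cores_per_package)
    base, extra = divmod(vcpus, count)
    # The first `extra` PPDs receive one additional vCPU.
    return [base + 1 if i < extra else base for i in range(count)]


print(ppd_sizes(12, 10))  # 12 vCPUs on 10-core packages
print(ppd_sizes(8, 10))   # 8 vCPUs fit in a single package
```

With 10-core packages, a 12-vCPU VM ends up with two PPDs and an 8-vCPU VM with one, which matches the Wide-VM example earlier in these notes.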
Virtual Proximity Domains
The VPD is the Virtual NUMA Topology exposed to the VM
By default a VPD aligns with a PPD, but a VPD can also span multiple PPDs
For example, assigning 40 vCPUs and setting coresPerSocket to 20 produces the following Virtual NUMA Topology
That is, two VPDs are created, with a single VPD spanning two PPDs
## This situation should be avoided because it hurts CPU and cache optimization
## Therefore, aligning coresPerSocket with the physical boundary of a single CPU package is recommended
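A quick way to reason about this recommendation is to count how many PPDs a single VPD would span. The sketch below assumes the VPD size follows coresPerSocket, as in the 40-vCPU example, and a 10-core CPU package (taken from the earlier Wide-VM example, not stated for this case). The function names are hypothetical.

```python
import math


def ppds_per_vpd(cores_per_socket: int, cores_per_package: int) -> int:
    """Number of PPDs one VPD spans when the VPD size equals coresPerSocket."""
    if cores_per_socket <= 0 or cores_per_package <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(cores_per_socket / cores_per_package)


def is_aligned(cores_per_socket: int, cores_per_package: int) -> bool:
    """True when a VPD stays inside one CPU package (the recommended case)."""
    return ppds_per_vpd(cores_per_socket, cores_per_package) == 1


# Assumed 10-core package: coresPerSocket=20 makes each VPD span two PPDs.
print(ppds_per_vpd(20, 10), is_aligned(20, 10))
# coresPerSocket=10 keeps each VPD on a single package.
print(ppds_per_vpd(10, 10), is_aligned(10, 10))
```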
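Starting with ESXi 6.5, the NUMA Scheduler decouples the coresPerSocket setting from VPD creation ## to avoid fragmentation of local memory
In ESXi 6.0, for example, a VM with 16 vCPUs and coresPerSocket=2 gets 8 PPDs and 8 VPDs
Starting with ESXi 6.5, the VPD size instead follows the number of cores in a single CPU package
As a result, the Virtual NUMA Topology matches the Physical NUMA Topology as closely as possible
With the same configuration as the ESXi 6.0 example (16 vCPUs, coresPerSocket=2), 2 PPDs and 2 VPDs are created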
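The difference between the two behaviors can be compared with a small model. This is a sketch under stated assumptions, not the scheduler's real logic: the function `numa_topology` and its `decoupled` flag are hypothetical, the package is assumed to have 10 cores, and the decoupled branch only covers coresPerSocket values that do not exceed the package core count.

```python
import math
from typing import Tuple


def numa_topology(vcpus: int, cores_per_socket: int,
                  cores_per_package: int, decoupled: bool) -> Tuple[int, int]:
    """Return (vpd_count, ppd_count) for a VM under a simplified model."""
    if decoupled:
        # ESXi 6.5 behavior as described in these notes: VPD size follows the
        # physical core count per package, not coresPerSocket.
        count = math.ceil(vcpus / cores_per_package)
        return count, count
    # ESXi 6.0 behavior as described in these notes: one VPD per virtual socket.
    vpd_count = math.ceil(vcpus / cores_per_socket)
    # A VPD larger than a package must be backed by several PPDs.
    ppd_count = vpd_count * math.ceil(cores_per_socket / cores_per_package)
    return vpd_count, ppd_count


# 16 vCPUs, coresPerSocket=2, assumed 10-core package.
print(numa_topology(16, 2, 10, decoupled=False))  # coupled (6.0-style)
print(numa_topology(16, 2, 10, decoupled=True))   # decoupled (6.5-style)
```

Under these assumptions the coupled model yields 8 VPDs and 8 PPDs, and the decoupled model yields 2 of each, in line with the two examples in the notes.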
New in ESXi 6.5 is the "NUMA config: consolidation=1" entry
This option consolidates vCPUs into as few Proximity Domains as possible
In the figure above, multiple VSOCKETs are created according to coresPerSocket, but only 2 VPDs are created ## the key point is that a single VPD represents a single memory address space
Guest OS NUMA Optimization
Modern applications and operating systems manage memory access based on NUMA nodes and caches
However, most applications do not balance workloads perfectly across NUMA nodes
Many applications use a single thread to create data but multiple threads to access it
As a result, the threads accessing the data are spread across multiple sockets
One important point is that the cache address space is defined by the virtual socket
If coresPerSocket is set to 8, as in the following figure, the memory and cache address spaces can be aligned with the CPU package
The vCPUs in a single virtual socket can then share the LLC (Last Level Cache)
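The relationship between coresPerSocket and LLC sharing can be illustrated by grouping vCPU indices into virtual sockets. This is only a sketch: the 16-vCPU VM, the sequential assignment of vCPUs to sockets, and the helper names `virtual_sockets` and `share_llc` are all assumptions made for illustration.

```python
from typing import List


def virtual_sockets(vcpus: int, cores_per_socket: int) -> List[List[int]]:
    """Group vCPU indices into virtual sockets of cores_per_socket vCPUs each."""
    if vcpus <= 0 or cores_per_socket <= 0:
        raise ValueError("vcpus and cores_per_socket must be positive")
    return [list(range(start, min(start + cores_per_socket, vcpus)))
            for start in range(0, vcpus, cores_per_socket)]


def share_llc(vcpu_a: int, vcpu_b: int, cores_per_socket: int) -> bool:
    """True when both vCPUs sit in the same virtual socket (same cache domain)."""
    return vcpu_a // cores_per_socket == vcpu_b // cores_per_socket


# Assumed 16-vCPU VM with coresPerSocket=8: two virtual sockets of 8 vCPUs.
for index, socket in enumerate(virtual_sockets(16, 8)):
    print(index, socket)
print(share_llc(0, 7, 8), share_llc(7, 8, 8))
```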
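Advanced Setting numa.consolidate
Consider a VM whose memory configuration exceeds the size of a single NUMA node (for example, the system has 128GB of memory in total and the VM is assigned 96GB)
Because the VM has 8 vCPUs, it uses the CPUs of a single NUMA node only, while 32GB of its memory is accessed through the other CPU's memory controller
That is, from the vCPU perspective this is a single NUMA node environment
In this case, setting numa.consolidate=FALSE spreads the vCPUs across as many NUMA nodes as possible
As a result, 2 PPDs are created as follows, with 4 vCPUs assigned to each PPD
Even so, if the application runs on a single thread, remote memory usage can still be high
Also, because the VM has only 8 vCPUs, no Virtual NUMA Topology is exposed to the VM (by default vNUMA is exposed only to VMs with more than 8 vCPUs) ## only 1 VPD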
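The numbers in this example can be worked through with a short sketch. It assumes a two-node host with memory split evenly (64GB per node), ignores hypervisor overhead, and uses hypothetical helpers `memory_split` and `vcpus_per_ppd`. It is not how ESXi itself computes placement.

```python
from typing import List, Tuple


def memory_split(vm_mem_gb: int, node_mem_gb: int) -> Tuple[int, int]:
    """Return (local_gb, remote_gb) when all vCPUs run on one NUMA node."""
    local = min(vm_mem_gb, node_mem_gb)
    return local, vm_mem_gb - local


def vcpus_per_ppd(vcpus: int, numa_nodes: int, consolidate: bool) -> List[int]:
    """Distribute vCPUs over PPDs for the two numa.consolidate settings.

    Assumes the VM's vCPUs fit in one node when consolidate is True.
    """
    if consolidate:
        return [vcpus]
    # consolidate=False: spread over as many nodes as possible, evenly.
    count = min(vcpus, numa_nodes)
    base, extra = divmod(vcpus, count)
    return [base + 1 if i < extra else base for i in range(count)]


# 128GB host with two assumed 64GB nodes, VM with 96GB and 8 vCPUs.
print(memory_split(96, 128 // 2))
print(vcpus_per_ppd(8, 2, consolidate=True))
print(vcpus_per_ppd(8, 2, consolidate=False))
```

With these assumptions, 32GB of the VM's memory is remote in the consolidated case, and disabling consolidation yields two PPDs of 4 vCPUs each, as described above.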