This post looks at what Thin Provisioning does and walks through examples of Space Reclamation, one of the features it provides.
Thick Provisioning vs. Thin Provisioning
Thick Provisioning
Thick Provisioning, aka 'Fixed Provisioning', is a method of provisioning in which the storage layers above the provisioning layer receive all of the resources that are reported: the full reported capacity is allocated up front.
The above diagram shows an example of Thick Provisioning.
There are three LUNs exposed to the operating system.
The sectors of each LUN are backed with actual sectors on the SAN level.
Thin Provisioning
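Thin Provisioning, by contrast, is a method of providing the storage layers above the provisioning layer with fewer physical resources than are reported.
The white areas represent sectors that have not been written to yet and are still blank.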
Since we do not use those sectors for anything, they are NOT backed by physical sectors on the SAN.
Only sectors that are in use are backed by physical sectors.
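As a rough illustration of the idea, the following Python sketch models a thin LUN that receives backing storage only when a sector is first written. The `ThinLun` class and the 512-byte sector size are assumptions made for this example and do not correspond to any real SAN API.

```python
SECTOR_SIZE = 512  # assumed sector size for this illustration


class ThinLun:
    """Toy model of a thin LUN: a sector gets backing storage on first write."""

    def __init__(self, reported_sectors):
        self.reported_sectors = reported_sectors  # capacity the host sees
        self.backing = {}  # LBA -> data, only for sectors that were written

    def _check(self, lba):
        if not 0 <= lba < self.reported_sectors:
            raise ValueError(f"LBA {lba} is outside the reported capacity")

    def write(self, lba, data):
        self._check(lba)
        self.backing[lba] = data

    def read(self, lba):
        self._check(lba)
        # Unwritten sectors have no backing storage and read back as zeros here.
        return self.backing.get(lba, bytes(SECTOR_SIZE))

    @property
    def allocated_sectors(self):
        return len(self.backing)


lun = ThinLun(reported_sectors=1_000_000)
lun.write(0, b"\x01" * SECTOR_SIZE)
lun.write(42, b"\x02" * SECTOR_SIZE)
print(lun.reported_sectors, lun.allocated_sectors)
```

The host sees one million sectors, while only the two written sectors consume backing storage. A thick LUN would correspond to allocating all one million sectors in the constructor.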
Purpose of Thin Provisioning
Provide a provisioning model that keeps storage service uninterrupted while storage space is expanded or reduced.
Provide a robust communication and interoperability model between hosts and thinly provisioned SANs, and deliver threshold and resource exhaustion notifications to server and storage administrators.
Improve the usage efficiency of storage space by supporting space reclamation and resource re-allocation.
Thin Provisioning States
When Thin Provisioning is in use, there are three states for each sector (LBA).
Mapped: the logical block provisioning state of an LBA in which physical capacity has been assigned to the referenced logical block.
Anchored: the logical block provisioning state of an LBA in which physical capacity has been reserved for the referenced logical block.
Deallocated: the logical block provisioning state of an LBA in which physical capacity has not been reserved for the referenced logical block.
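The three states can be pictured as a small state machine. The sketch below is a simplified model, assuming that a write always maps an LBA and that an unmap either deallocates it or, when anchoring is requested, leaves it anchored. Real devices are allowed more freedom than this, and the function names are hypothetical.

```python
from enum import Enum


class LbaState(Enum):
    MAPPED = "mapped"
    ANCHORED = "anchored"
    DEALLOCATED = "deallocated"


def after_write(state):
    """Writing to an LBA leaves it mapped, whatever state it was in before."""
    return LbaState.MAPPED


def after_unmap(state, anchor=False):
    """Unmapping drops the data; with anchor=True the capacity stays reserved."""
    return LbaState.ANCHORED if anchor else LbaState.DEALLOCATED


def consumes_physical_capacity(state):
    """Mapped and anchored LBAs hold physical capacity; deallocated ones do not."""
    return state in (LbaState.MAPPED, LbaState.ANCHORED)


state = LbaState.DEALLOCATED
state = after_write(state)   # first write: deallocated -> mapped
state = after_unmap(state)   # space reclamation: mapped -> deallocated
print(state.value, consumes_physical_capacity(state))
```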
Thin Provisioning LUN Identification
The OS identifies the provisioning type and the UNMAP/TRIM capability of each LUN.
The storage device shall report its provisioning type and UNMAP/TRIM capability according to the SBC-3 (SCSI Block Commands - 3) spec.
Space Reclamation Operation in the Storage Stack
Space reclamation can be triggered by file deletion, a file system level trim, or a storage optimization operation.
File system level trim is enabled for a storage device designed to perform “read return zero” after a trim or an unmap operation.
When a large file is deleted from the file system or a file system level trim is triggered, the OS converts the file delete or trim notification into a corresponding UNMAP request.
The storage stack translates the UNMAP request into a SCSI UNMAP command.
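To make the last step more concrete, here is a minimal Python sketch of how freed block ranges could be batched into UNMAP commands and packed into the UNMAP parameter list (an 8-byte header followed by 16-byte block descriptors). The function names are hypothetical, and the limits passed in are assumed to come from the Block Limits VPD page, such as the 4194304 LBAs and 64 descriptors reported in the lab below. Alignment to the optimal unmap granularity is ignored.

```python
import struct

MAX_BLOCKS_PER_DESCRIPTOR = 0xFFFFFFFF  # the descriptor block count is 32 bits


def split_ranges(ranges, max_lba_count, max_descriptors):
    """Split (start_lba, num_blocks) ranges into batches, one per UNMAP command."""
    batches, current, blocks_in_current = [], [], 0
    for start, count in ranges:
        while count > 0:
            # Start a new command once either per-command limit is reached.
            if len(current) == max_descriptors or blocks_in_current == max_lba_count:
                batches.append(current)
                current, blocks_in_current = [], 0
            chunk = min(count, max_lba_count - blocks_in_current,
                        MAX_BLOCKS_PER_DESCRIPTOR)
            current.append((start, chunk))
            blocks_in_current += chunk
            start += chunk
            count -= chunk
    if current:
        batches.append(current)
    return batches


def pack_unmap_parameter_list(descriptors):
    """Pack one batch: big-endian 64-bit LBA, 32-bit block count, 4 reserved bytes."""
    body = b"".join(struct.pack(">QI4x", lba, count) for lba, count in descriptors)
    # UNMAP DATA LENGTH counts every byte after its own 2-byte field.
    header = struct.pack(">HH4x", len(body) + 6, len(body))
    return header + body


freed = [(0, 10)]  # one freed extent of 10 blocks starting at LBA 0
for batch in split_ranges(freed, max_lba_count=4, max_descriptors=2):
    print(batch, pack_unmap_parameter_list(batch).hex())
```

With the deliberately tiny limit of 4 LBAs per command, the 10-block extent is sent as three UNMAP commands covering 4, 4 and 2 blocks.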
[LAB]
## Guest OS(Linux)
1. Configure SCSI logging
    [root@localhost ~]# cat /etc/redhat-release
    CentOS Linux release 7.9.2009 (Core)
    [root@localhost ~]# sysctl -a |grep dev.scsi.logging_level
    dev.scsi.logging_level = 0
    sysctl: reading key "net.ipv6.conf.all.stable_secret"
    sysctl: reading key "net.ipv6.conf.default.stable_secret"
    sysctl: reading key "net.ipv6.conf.ens192.stable_secret"
    sysctl: reading key "net.ipv6.conf.ens224.stable_secret"
    sysctl: reading key "net.ipv6.conf.lo.stable_secret"

    ## Enable scsi events to syslog
    ## https://www.cyberciti.biz/faq/linux-log-all-scsi-events-to-syslog/
    [root@localhost ~]# echo -1 > /proc/sys/dev/scsi/logging_level
    [root@localhost ~]# sysctl -a |grep dev.scsi.logging_level
    dev.scsi.logging_level = -1
    sysctl: reading key "net.ipv6.conf.all.stable_secret"
    sysctl: reading key "net.ipv6.conf.default.stable_secret"
    sysctl: reading key "net.ipv6.conf.ens192.stable_secret"
    sysctl: reading key "net.ipv6.conf.ens224.stable_secret"
    sysctl: reading key "net.ipv6.conf.lo.stable_secret"
2. Add the discard option to the relevant filesystem line in /etc/fstab so that the SCSI layer of the Linux OS can generate UNMAP commands
Chapter 31. Discarding unused blocks
    [root@localhost ~]# df -h
    Filesystem               Size  Used Avail Use% Mounted on
    devtmpfs                  16G     0   16G   0% /dev
    tmpfs                     16G     0   16G   0% /dev/shm
    tmpfs                     16G  8.9M   16G   1% /run
    tmpfs                     16G     0   16G   0% /sys/fs/cgroup
    /dev/mapper/centos-root   98G   23G   76G  23% /
    /dev/sdb1                 50G   53M   47G   1% /unmap
    /dev/sda1               1014M  153M  862M  15% /boot
    tmpfs                    3.2G     0  3.2G   0% /run/user/0
    [root@localhost ~]# blkid | grep sdb1
    /dev/sdb1: UUID="0d4ff069-7dd2-4fe1-8ca6-7cd5d7e77fb7" TYPE="ext4" PARTLABEL="primary" PARTUUID="565d87a5-e745-4300-8246-9e0f0ffbe388"
    [root@localhost ~]# cat /etc/fstab
    #
    # /etc/fstab
    # Created by anaconda on Mon May 8 20:17:33 2023
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    /dev/mapper/centos-root / xfs defaults 0 0
    UUID=c9fe8371-2d95-4517-a8be-8441a1a336b3 /boot xfs defaults 0 0
    /dev/mapper/centos-swap swap swap defaults 0 0
    UUID=0d4ff069-7dd2-4fe1-8ca6-7cd5d7e77fb7 /unmap ext4 defaults,discard 0 1 ### <-- !!
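A quick way to confirm which filesystems are mounted with online discard is to scan the fstab entries for the `discard` option. The helper below is a small sketch with a hypothetical name, `mounts_with_discard`. It works on text, so it can be tried on the entries shown above without touching a real system.

```python
def mounts_with_discard(fstab_text):
    """Return the mount points whose fstab options include 'discard'."""
    result = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, options = fields[1], fields[3]
        if "discard" in options.split(","):
            result.append(mount_point)
    return result


sample_fstab = """\
# /etc/fstab
/dev/mapper/centos-root / xfs defaults 0 0
UUID=c9fe8371-2d95-4517-a8be-8441a1a336b3 /boot xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
UUID=0d4ff069-7dd2-4fe1-8ca6-7cd5d7e77fb7 /unmap ext4 defaults,discard 0 1
"""
print(mounts_with_discard(sample_fstab))
```

On a live system the same function could be fed the contents of `/etc/fstab`. Here only `/unmap` is reported.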
3. Query the disk device's Thin Provisioning information
Page 0xb2 is the LBP (Logical Block Provisioning) VPD page.
Page 0xb0 is the BL (Block Limits) VPD page.
## Vital Product Data (VPD) is a collection of configuration and informational data associated with a particular set of hardware or software.
## The vital product data may include vendor identification, product identification, unit serial numbers, device operating definitions, manufacturing data, field replaceable unit information, and other vendor specific information.
    [root@localhost ~]# sg_vpd --page=0xb2 /dev/sdb
    Logical block provisioning VPD page (SBC):
      Unmap command supported (LBPU): 1
      Write same (16) with unmap bit supported (LBWS): 1
      Write same (10) with unmap bit supported (LBWS10): 1
      Logical block provisioning read zeros (LBPRZ): 1
      Anchored LBAs supported (ANC_SUP): 0
      Threshold exponent: 1
      Descriptor present (DP): 0
      Provisioning type: 2
    [root@localhost ~]# sudo sg_vpd --page=lbpv /dev/sdb
    Logical block provisioning VPD page (SBC):
      Unmap command supported (LBPU): 1
      Write same (16) with unmap bit supported (LBWS): 1
      Write same (10) with unmap bit supported (LBWS10): 1
      Logical block provisioning read zeros (LBPRZ): 1
      Anchored LBAs supported (ANC_SUP): 0
      Threshold exponent: 1
      Descriptor present (DP): 0
      Provisioning type: 2
    [root@localhost ~]# sg_vpd --page=0xb0 /dev/sdb
    Block limits VPD page (SBC):
      Write same no zero (WSNZ): 1
      Maximum compare and write length: 0 blocks
      Optimal transfer length granularity: 0 blocks
      Maximum transfer length: 0 blocks
      Optimal transfer length: 0 blocks
      Maximum prefetch length: 0 blocks
      Maximum unmap LBA count: 4194304
      Maximum unmap block descriptor count: 64
      Optimal unmap granularity: 8
      Unmap granularity alignment valid: 1
      Unmap granularity alignment: 0
      Maximum write same length: 0x2000 blocks
    [root@localhost ~]# sudo sg_vpd --page=bl /dev/sdb
    Block limits VPD page (SBC):
      Write same no zero (WSNZ): 1
      Maximum compare and write length: 0 blocks
      Optimal transfer length granularity: 0 blocks
      Maximum transfer length: 0 blocks
      Optimal transfer length: 0 blocks
      Maximum prefetch length: 0 blocks
      Maximum unmap LBA count: 4194304
      Maximum unmap block descriptor count: 64
      Optimal unmap granularity: 8
      Unmap granularity alignment valid: 1
      Unmap granularity alignment: 0
      Maximum write same length: 0x2000 blocks
    [root@localhost ~]# for x in /sys/class/scsi_disk/* ; do echo $x; cat $x/provisioning_mode; done
    /sys/class/scsi_disk/0:0:0:0
    writesame_16
    /sys/class/scsi_disk/0:0:1:0
    writesame_16
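If these values need to be checked across many devices, the text output can be turned into a dictionary. The following sketch assumes the `sg_vpd` output layout shown above (one `name: value` pair per line, with an optional abbreviation in parentheses). The `parse_sg_vpd` name is made up for this example, and other sg3_utils versions may format their output differently.

```python
import re


def parse_sg_vpd(text):
    """Map each 'name (ABBR): value' line to {ABBR or name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        if not sep or not value:
            continue  # page title or a line without a value
        abbreviation = re.search(r"\(([A-Z0-9_]+)\)$", name)
        key = abbreviation.group(1) if abbreviation else name
        try:
            # The first token is the number; base 0 accepts both 8 and 0x2000.
            fields[key] = int(value.split()[0], 0)
        except ValueError:
            continue  # not a numeric field, ignore it
    return fields


sample = """\
Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 1
  Write same (16) with unmap bit supported (LBWS): 1
  Anchored LBAs supported (ANC_SUP): 0
  Provisioning type: 2
  Maximum unmap LBA count: 4194304
  Maximum write same length: 0x2000 blocks
"""
info = parse_sg_vpd(sample)
print(info["LBPU"], info["Provisioning type"], info["Maximum write same length"])
```

In SBC-3 a provisioning type of 2 denotes a thin-provisioned logical unit, which matches the LUN used in this lab.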
Linux selects the mode reported in provisioning_mode roughly as follows:
- If the LBP VPD page is not supported:
  - Use the UNMAP command if the BL VPD page reports a MAXIMUM UNMAP BLOCK DESCRIPTOR COUNT greater than 1.
  - Use the WRITE SAME (16) command otherwise.
- If the LBP VPD page is supported:
  - Use the UNMAP (0x42) command if the LBP VPD page reports LBPU = 1.
  - Use the WRITE SAME (16) (0x93) command if the LBP VPD page reports LBPWS = 1.
  - Use the WRITE SAME (10) (0x41) command if the LBP VPD page reports LBPWS10 = 1.
  - Disable unmap support otherwise.
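The decision list above can be written down as a short function. This is a sketch of the list only, not of any particular kernel version: the function name is hypothetical, the returned strings are chosen to resemble the `provisioning_mode` values, and a real kernel may apply extra conditions, which may be why the lab reports `writesame_16` even though LBPU is 1.

```python
def choose_provisioning_mode(lbp_vpd, max_unmap_descriptor_count):
    """Pick a discard method from the LBP VPD bits and the BL VPD limit.

    lbp_vpd is a dict with the LBPU, LBPWS and LBPWS10 bits,
    or None when the device does not support the LBP VPD page.
    """
    if lbp_vpd is None:
        # No LBP VPD page: fall back to what the Block Limits page implies.
        return "unmap" if max_unmap_descriptor_count > 1 else "writesame_16"
    if lbp_vpd.get("LBPU"):
        return "unmap"
    if lbp_vpd.get("LBPWS"):
        return "writesame_16"
    if lbp_vpd.get("LBPWS10"):
        return "writesame_10"
    return "disabled"


print(choose_provisioning_mode(None, 64))
print(choose_provisioning_mode({"LBPU": 0, "LBPWS": 1, "LBPWS10": 1}, 64))
print(choose_provisioning_mode({"LBPU": 0, "LBPWS": 0, "LBPWS10": 0}, 64))
```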
## Guest OS(Windows)
1. Check the DisableDeleteNotify value (0 means delete notifications, and therefore TRIM/UNMAP, are enabled)
With Windows Server 2012 or later, UNMAP commands are issued under the following conditions:
When you delete files from a file system, Windows automatically issues reclaim commands for the area of the file system you freed.
When you format a volume residing on a thin-provisioned drive with the quick option, Windows reclaims the entire volume.
When a volume is optimized with the Optimize option, Windows issues an UNMAP command. This happens either through a regularly scheduled operation or manually, from the Optimize Drives console or with the Optimize-Volume PowerShell cmdlet and its ReTrim option.
    C:\>fsutil behavior query DisableDeleteNotify
    NTFS DisableDeleteNotify = 0 (Disabled)
    ReFS DisableDeleteNotify = 0 (Disabled)
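Because the setting is phrased as a double negative, it is easy to misread. The sketch below interprets the `fsutil` output shown above. The function name is hypothetical, and the only rule it encodes is that `DisableDeleteNotify = 0` means delete notifications are enabled.

```python
import re


def delete_notify_enabled(fsutil_output):
    """Return {filesystem: True if delete notifications (TRIM/UNMAP) are enabled}."""
    status = {}
    for line in fsutil_output.splitlines():
        match = re.match(r"\s*(\w+)\s+DisableDeleteNotify\s*=\s*(\d+)", line)
        if match:
            filesystem, value = match.groups()
            # 0 = notifications are NOT disabled, so reclaim requests are sent.
            status[filesystem] = (value == "0")
    return status


sample_output = """\
NTFS DisableDeleteNotify = 0 (Disabled)
ReFS DisableDeleteNotify = 0 (Disabled)
"""
print(delete_notify_enabled(sample_output))
```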
2. Query the disk device's Thin Provisioning information (same as on Linux)
    C:\Users\Administrator\Downloads\sg3_utils-1.37exe>sg_vpd.exe --page=0xb2 d:
    Logical block provisioning VPD page (SBC):
      Unmap command supported (LBPU): 1
      Write same (16) with unmap bit supported (LBWS): 1
      Write same (10) with unmap bit supported (LBWS10): 1
      Logical block provisioning read zeros (LBPRZ): 1
      Anchored LBAs supported (ANC_SUP): 0
      Threshold exponent: 1
      Descriptor present (DP): 0
      Provisioning type: 2
    C:\Users\Administrator\Downloads\sg3_utils-1.37exe>sg_vpd.exe --page=0xb0 d:
    Block limits VPD page (SBC):
      Write same no zero (WSNZ): 1
      Maximum compare and write length: 0 blocks
      Optimal transfer length granularity: 0 blocks
      Maximum transfer length: 0 blocks
      Optimal transfer length: 0 blocks
      Maximum prefetch length: 0 blocks
      Maximum unmap LBA count: 65536
      Maximum unmap block descriptor count: 64
      Optimal unmap granularity: 2048
      Unmap granularity alignment valid: 1
      Unmap granularity alignment: 0
      Maximum write same length: 0x2000 blocks
## Hypervisor(ESXi)
1. Check whether Thin Provisioning is supported
    # esxcli storage core device list -d naa.60060160c0304e00165e6264c9b82c18 | grep Thin
    Thin Provisioning Status: yes
## When files are deleted inside the Guest OS and an UNMAP operation is attempted, the Hypervisor handles it according to the configured Provisioning Mode
1. Invoke a filesystem TRIM operation from the Guest OS (Windows)
    PS C:\Users\Administrator> Optimize-Volume -DriveLetter D -ReTrim -Verbose
    VERBOSE: Invoking retrim on New Volume (D:)...
    VERBOSE: Performing pass 1:
    VERBOSE: Retrim: 0% complete...
    VERBOSE: Retrim: 6% complete...
    VERBOSE: Retrim: 43% complete...
    VERBOSE: Retrim: 90% complete...
    VERBOSE: Retrim: 100% complete.
    VERBOSE: Post Defragmentation Report:
    VERBOSE: Volume Information:
    VERBOSE: Volume size = 29.98 GB
    VERBOSE: Cluster size = 4 KB
    VERBOSE: Used space = 70.61 MB
    VERBOSE: Free space = 29.91 GB
    VERBOSE: Retrim:
    VERBOSE: Backed allocations = 29
    VERBOSE: Allocations trimmed = 27
    VERBOSE: Total space trimmed = 26.95 GB
2. Capture the Hypervisor's iSCSI packets at that point in time
[Test Example]
## From Filesystem to underlying storage
1. From Filesystem to Disk Driver
Filter the trace on Vcb FFFFE001F7658180 together with NtfsDeallocateClusters and NtfsMarkUnusedContextPreTrimWorkItemProcessing:
[11] 0004.1100::07/30/18-09:42:54.8300433 [ntfs] bitmpsup_c2379 NtfsDeallocateClusters() - NtfsDeallocateClusters: Vcb FFFFE001F7658180 - deleting FR 10000002a7e9f from clusters 53 to 7fffffffffffffff
[11] 0004.1100::07/30/18-09:42:54.8300446 [ntfs] bitmpsup_c2453 NtfsDeallocateClusters() - NtfsDeallocateClusters: Vcb FFFFE001F7658180 - deleting FR 10000002a7e9f starting at 631138c for 3d clusters
[11] 0004.1100::07/30/18-09:42:54.8300459 [ntfs] bitmpsup_c2505 NtfsDeallocateClusters() - NtfsDeallocateClusters: Vcb FFFFE001F7658180 - updating DeallocatedClustersCount from 0 to 3d
[0] 0004.0574::07/30/18-09:43:03.5848104 [ntfs] delnotify_c1414 NtfsMarkUnusedContextPreTrimProcessing() - NtfsMarkUnusedContextPreTrimProcessing: Vcb FFFFE001F7658180 - Kicked off DelayedWorkQueue
[4] 0004.0EDC::07/30/18-09:43:03.5848220 [ntfs] delnotify_c1654 NtfsMarkUnusedContextPreTrimWorkItemProcessing() - NtfsMarkUnusedContextPreTrimWorkItemProcessing: Vcb FFFFE001F7658180 - Sending storage ioctl down. MUC FFFFE001FD68EF20
[6] 0004.0EDC::07/30/18-09:43:03.6638743 [ntfs] delnotify_c1807 NtfsMarkUnusedContextPreTrimWorkItemProcessing() - NtfsMarkUnusedContextPreTrimWorkItemProcessing: Vcb FFFFE001F7658180 - Add MUC FFFFE001FD68EF20 to post trim list
[6] 0004.0EDC::07/30/18-09:43:03.6638632 [classpnp] utils_c3460 DeviceProcessDsmTrimRequest() - DeviceProcessDsmTrimRequest (FFFFE001F5FA8060): UNMAP command issued. Returned NTSTATUS: STATUS_SUCCESS.
[6] 0004.0EDC::07/30/18-09:43:03.6638845 [ntfs] delnotify_c1160 NtfsMarkUnusedContextPostTrimProcessing() - NtfsMarkUnusedContextPostTrimProcessing: Vcb FFFFE001F7658180 - Releasing bitmap
Flow: NtfsDeallocateClusters -> DeallocatedClustersCount increases -> trim is triggered -> UNMAP is sent -> DeallocatedClustersCount drops back to 0.
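To see how long the file system waited between freeing the clusters and sending the UNMAP, the trace lines can be parsed and ordered by timestamp. The sketch below assumes the line layout shown above and a month/day/year date format. The helper names are hypothetical, and fractional seconds are truncated to microseconds because Python's `datetime` does not keep more digits.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"::(\d\d/\d\d/\d\d-\d\d:\d\d:\d\d)\.(\d+) \[(\w+)\] \S+ (\w+)\(\)"
)


def parse_trace(lines):
    """Return (timestamp, component, function) tuples sorted by time."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        whole, fraction, component, function = match.groups()
        # Assumed date format: month/day/two-digit year.
        stamp = datetime.strptime(f"{whole}.{fraction[:6]}", "%m/%d/%y-%H:%M:%S.%f")
        events.append((stamp, component, function))
    return sorted(events)


def seconds_between(events, first_function, second_function):
    """Seconds from the first event of one function to the first of another."""
    first = next(t for t, _, f in events if f == first_function)
    second = next(t for t, _, f in events if f == second_function)
    return (second - first).total_seconds()


trace = [
    "[11] 0004.1100::07/30/18-09:42:54.8300433 [ntfs] bitmpsup_c2379 NtfsDeallocateClusters() - NtfsDeallocateClusters: Vcb FFFFE001F7658180 - deleting FR 10000002a7e9f from clusters 53 to 7fffffffffffffff",
    "[6] 0004.0EDC::07/30/18-09:43:03.6638632 [classpnp] utils_c3460 DeviceProcessDsmTrimRequest() - DeviceProcessDsmTrimRequest (FFFFE001F5FA8060): UNMAP command issued. Returned NTSTATUS: STATUS_SUCCESS.",
]
events = parse_trace(trace)
delay = seconds_between(events, "NtfsDeallocateClusters", "DeviceProcessDsmTrimRequest")
print(f"UNMAP issued {delay:.3f} s after the clusters were deallocated")
```

For the two lines used here the delay is a little under nine seconds, which fits the delayed work queue seen in the trace. Sorting by timestamp also helps because the trace lines are not always listed in time order.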
2. From iSCSI Initiator to iSCSI Target
[References]
Thin-provisioned disks with QEMU and KVM
https://wiki.qemu.org/images/4/45/Devconf14-bonzini-thin-provisioning.pdf
Information technology - SCSI Block Commands – 4 (SBC-4)
https://standards.incits.org/apps/group_public/download.php/124286/livelink
Logical block provisioning - SBC
https://gist.github.com/cathay4t/e80e02a737242a5f3824606543631bfe
The road for thin-provisioning
https://www.linux-kvm.org/images/7/77/2012-forum-thin-provisioning.pdf
SCSI Commands Reference Manual
https://www.seagate.com/files/staticfiles/support/docs/manual/Interface%20manuals/100293068j.pdf
File Systems and Thin Provisioning
https://manpages.ubuntu.com/manpages/focal/en/man8/sg_write_same.8.html
https://linux.die.net/man/8/sg_write_same