
How Thin Provisioning Works - Space Reclamation

 

This post looks at what Thin Provisioning does and walks through an example of Space Reclamation, one of the features it provides.

 

Thick Provisioning vs. Thin Provisioning

Thick Provisioning

Thick Provisioning, also known as 'Fixed Provisioning', is a method in which the provisioning layer backs all of the capacity it reports to the storage layers above it with physical resources up front.

 

 

The above diagram shows an example of Thick Provisioning.

There are three LUNs exposed to the operating system.  

The sectors of each LUN are backed with actual sectors on the SAN level.

 

Thin Provisioning

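Thin Provisioning is a method of providing the storage layers above the provisioning layer with fewer physical resources than are reported. The white areas represent sectors that have not been written to yet and are still blank.  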

Since we do not use those sectors for anything, they are NOT backed by physical sectors on the SAN.  

Only sectors that are in use are backed by physical sectors.
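
The idea that only written sectors consume physical space can be illustrated with a toy model. This is a minimal sketch, not how any particular SAN implements it: the class name ThinLun, the 512-byte block size, and the zero-fill behaviour for unbacked sectors are all assumptions made for illustration.

```python
BLOCK_SIZE = 512  # assumed logical block size for this toy model


class ThinLun:
    """Toy thin-provisioned LUN: reports a fixed size, backs blocks lazily."""

    def __init__(self, reported_blocks):
        self.reported_blocks = reported_blocks  # capacity shown to the OS
        self.backing = {}                       # LBA -> data, written LBAs only

    def _check(self, lba):
        if not 0 <= lba < self.reported_blocks:
            raise ValueError("LBA out of range")

    def write(self, lba, data):
        self._check(lba)
        self.backing[lba] = data                # first write allocates space

    def read(self, lba):
        self._check(lba)
        # Unbacked LBAs read back as zeros in this model.
        return self.backing.get(lba, b"\x00" * BLOCK_SIZE)

    def allocated_blocks(self):
        return len(self.backing)


lun = ThinLun(reported_blocks=1_000_000)
lun.write(10, b"\x01" * BLOCK_SIZE)
lun.write(11, b"\x02" * BLOCK_SIZE)
print(lun.reported_blocks, lun.allocated_blocks())
```

The LUN reports one million blocks to the layer above, yet only the two written blocks are backed.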

 

 

Purpose of Thin Provisioning

Provide a provisioning model that keeps storage service uninterrupted while storage space is expanded or reduced.

Provide a robust communication and interoperability model between hosts and thinly provisioned SANs, and deliver threshold and resource exhaustion notifications to server and storage administrators.

Improve the usage efficiency of storage space by supporting space reclamation and resource re-allocation.

 

Thin Provisioning States

When Thin Provisioning is in use, there are three states for each sector (LBA).

Mapped: the logical block provisioning state of an LBA in which physical capacity has been assigned to the referenced logical block.

Anchored: the logical block provisioning state of an LBA in which physical capacity has been reserved for the referenced logical block.

Deallocated: the logical block provisioning state of an LBA in which physical capacity has not been reserved for the referenced logical block.
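
The three states can be sketched as a small state model. This is a simplified illustration under stated assumptions: the names LbpState, after_write, and after_unmap are hypothetical, and the model assumes the device always honours an unmap request, whereas a real device may leave an LBA mapped.

```python
from enum import Enum


class LbpState(Enum):
    MAPPED = "mapped"            # physical capacity assigned
    ANCHORED = "anchored"        # physical capacity reserved
    DEALLOCATED = "deallocated"  # no physical capacity reserved


def after_write(state):
    # Storing data requires assigned capacity, so the LBA ends up mapped.
    return LbpState.MAPPED


def after_unmap(state, anchor=False, anc_sup=False):
    # An anchored unmap is only valid when the device reports ANC_SUP = 1.
    if anchor and not anc_sup:
        raise ValueError("ANCHOR requested but anchored LBAs are not supported")
    return LbpState.ANCHORED if anchor else LbpState.DEALLOCATED


state = LbpState.DEALLOCATED
state = after_write(state)   # first write maps the LBA
state = after_unmap(state)   # file deleted, space reclaimed
print(state.value)
```

The device in the lab below reports ANC_SUP as 0, so only the mapped and deallocated states would be reachable on it.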

 

Thin Provisioning LUN Identification 

The OS identifies the provisioning type and the UNMAP/TRIM capability of each LUN.  

The storage device shall report its provisioning type and UNMAP/TRIM capability according to the SBC-3 (SCSI Block Commands - 3) spec.

 

Space Reclamation Operation in the Storage Stack

Space reclamation can be triggered by file deletion, a file system level trim, or a storage optimization operation.  

File system level trim is enabled for a storage device designed to perform “read return zero” after a trim or an unmap operation.  

When a large file is deleted from the file system or a file system level trim is triggered, the OS converts the file delete or trim notifications into a corresponding UNMAP request.

The storage stack translates the UNMAP request into a SCSI UNMAP command.
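
To make the last step more concrete, here is a rough sketch of what a SCSI UNMAP command looks like on the wire, following the 10-byte CDB and parameter list layout described in SBC-3. The function name build_unmap and the example range are hypothetical; the sketch only builds the bytes and does not send anything to a device.

```python
import struct

UNMAP_OPCODE = 0x42


def build_unmap(ranges, anchor=False):
    """Return (cdb, parameter_list) for a list of (start_lba, block_count)."""
    # Each block descriptor: 8-byte LBA, 4-byte block count, 4 reserved bytes.
    descriptors = b"".join(
        struct.pack(">QI4x", lba, count) for lba, count in ranges
    )
    # Header: UNMAP DATA LENGTH (bytes that follow this 2-byte field),
    # UNMAP BLOCK DESCRIPTOR DATA LENGTH, then 4 reserved bytes.
    header = struct.pack(">HH4x", len(descriptors) + 6, len(descriptors))
    param_list = header + descriptors
    # CDB: opcode, ANCHOR bit, 4 reserved bytes, group number,
    # parameter list length, control.
    cdb = struct.pack(
        ">BB4xBHB", UNMAP_OPCODE, int(anchor), 0, len(param_list), 0
    )
    return cdb, param_list


# Hypothetical range: unmap 8 blocks starting at LBA 2048.
cdb, param_list = build_unmap([(2048, 8)])
print(cdb.hex(), len(param_list))
```

One command can carry several descriptors, which is why the Block Limits VPD page queried later in the lab reports both a maximum LBA count and a maximum descriptor count.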

 

[LAB]

## Guest OS(Linux)

1. Configure SCSI logging

[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
 
[root@localhost ~]# sysctl -a |grep dev.scsi.logging_level
dev.scsi.logging_level = 0
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.ens192.stable_secret"
sysctl: reading key "net.ipv6.conf.ens224.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
 
## Enable scsi events to syslog
https://www.cyberciti.biz/faq/linux-log-all-scsi-events-to-syslog/
[root@localhost ~]# echo -1 > /proc/sys/dev/scsi/logging_level
 
[root@localhost ~]# sysctl -a |grep dev.scsi.logging_level
dev.scsi.logging_level = -1
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.ens192.stable_secret"
sysctl: reading key "net.ipv6.conf.ens224.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"

 

2. Add the discard option to the relevant filesystem line in /etc/fstab so that the Linux SCSI layer can generate UNMAP commands

Chapter 31. Discarding unused blocks

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/discarding-unused-blocks_managing-file-systems

[root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                  16G     0   16G   0% /dev
tmpfs                     16G     0   16G   0% /dev/shm
tmpfs                     16G  8.9M   16G   1% /run
tmpfs                     16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/centos-root   98G   23G   76G  23% /
/dev/sdb1                 50G   53M   47G   1% /unmap
/dev/sda1               1014M  153M  862M  15% /boot
tmpfs                    3.2G     0  3.2G   0% /run/user/0
 
[root@localhost ~]# blkid | grep sdb1
/dev/sdb1: UUID="0d4ff069-7dd2-4fe1-8ca6-7cd5d7e77fb7" TYPE="ext4" PARTLABEL="primary" PARTUUID="565d87a5-e745-4300-8246-9e0f0ffbe388"
 
[root@localhost ~]# cat /etc/fstab
 
#
# /etc/fstab
# Created by anaconda on Mon May  8 20:17:33 2023
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=c9fe8371-2d95-4517-a8be-8441a1a336b3 /boot                   xfs     defaults        0 0
/dev/mapper/centos-swap swap                    swap    defaults        0 0
UUID=0d4ff069-7dd2-4fe1-8ca6-7cd5d7e77fb7       /unmap  ext4    defaults,discard        0 1 ### <-- !!
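
A quick way to double-check which mount points carry the discard option is to parse the file. The sketch below is only an illustration; the function name mounts_with_discard is hypothetical, and the sample text reuses lines from the fstab shown above.

```python
def mounts_with_discard(fstab_text):
    """Return the mount points whose fstab options include 'discard'."""
    result = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and annotations
        fields = line.split()
        if len(fields) < 4:
            continue                          # blank or incomplete line
        options = fields[3].split(",")
        if "discard" in options:
            result.append(fields[1])
    return result


sample = """
# /etc/fstab
/dev/mapper/centos-root /                       xfs     defaults        0 0
/dev/mapper/centos-swap swap                    swap    defaults        0 0
UUID=0d4ff069-7dd2-4fe1-8ca6-7cd5d7e77fb7       /unmap  ext4    defaults,discard        0 1 ### <-- !!
"""
print(mounts_with_discard(sample))
```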

 

3. Query the disk device's Thin Provisioning information

Page 0xb2 is the LBP (Logical Block Provisioning) VPD page.

Page 0xb0 is the BL (Block Limits) VPD page.

## Vital Product Data (VPD) is a collection of configuration and informational data associated with a particular set of hardware or software.

## The vital product data may include vendor identification, product identification, unit serial numbers, device operating definitions, manufacturing data, field replaceable unit information, and other vendor specific information.

[root@localhost ~]# sg_vpd --page=0xb2 /dev/sdb
Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 1
  Write same (16) with unmap bit supported (LBWS): 1
  Write same (10) with unmap bit supported (LBWS10): 1
  Logical block provisioning read zeros (LBPRZ): 1
  Anchored LBAs supported (ANC_SUP): 0
  Threshold exponent: 1
  Descriptor present (DP): 0
  Provisioning type: 2
 
[root@localhost ~]# sudo sg_vpd --page=lbpv /dev/sdb
Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 1
  Write same (16) with unmap bit supported (LBWS): 1
  Write same (10) with unmap bit supported (LBWS10): 1
  Logical block provisioning read zeros (LBPRZ): 1
  Anchored LBAs supported (ANC_SUP): 0
  Threshold exponent: 1
  Descriptor present (DP): 0
  Provisioning type: 2
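
The output above can be turned into a dictionary for scripting. This is a minimal sketch that assumes the "name: value" text layout shown above; parse_vpd is a hypothetical helper, not part of sg3_utils. The provisioning type names follow the SBC-3 encoding, where 2 means thin provisioned.

```python
PROVISIONING_TYPES = {
    0: "not reported",
    1: "resource provisioned",
    2: "thin provisioned",
}


def parse_vpd(text):
    """Parse 'name: value' lines of sg_vpd output into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep or not value.strip():
            continue                      # page title or non-field line
        # base 0 accepts both decimal and 0x-prefixed values
        fields[name.strip()] = int(value.split()[0], 0)
    return fields


sample = """Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 1
  Write same (16) with unmap bit supported (LBWS): 1
  Write same (10) with unmap bit supported (LBWS10): 1
  Logical block provisioning read zeros (LBPRZ): 1
  Anchored LBAs supported (ANC_SUP): 0
  Threshold exponent: 1
  Descriptor present (DP): 0
  Provisioning type: 2
"""
lbp = parse_vpd(sample)
print(PROVISIONING_TYPES[lbp["Provisioning type"]])
```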
 
[root@localhost ~]# sg_vpd --page=0xb0 /dev/sdb
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 4194304
  Maximum unmap block descriptor count: 64
  Optimal unmap granularity: 8
  Unmap granularity alignment valid: 1
  Unmap granularity alignment: 0
  Maximum write same length: 0x2000 blocks
 
[root@localhost ~]# sudo sg_vpd --page=bl /dev/sdb
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 4194304
  Maximum unmap block descriptor count: 64
  Optimal unmap granularity: 8
  Unmap granularity alignment valid: 1
  Unmap granularity alignment: 0
  Maximum write same length: 0x2000 blocks
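
The Block Limits values constrain how a host can split a reclaim request. The sketch below shows one way a host might do it, under these assumptions: plan_unmap is a hypothetical helper, the range is shrunk to the reported granularity (the device treats it as "optimal", not mandatory), and each chunk is sent as a single descriptor. With 512-byte logical blocks, which this output does not state, a maximum unmap LBA count of 4194304 would correspond to 2 GiB per command.

```python
def plan_unmap(start_lba, num_blocks, max_lba_count, granularity=1, alignment=0):
    """Split [start_lba, start_lba + num_blocks) into (lba, count) chunks."""
    end = start_lba + num_blocks
    if granularity > 1:
        # Round the start up and the end down to granularity boundaries,
        # shifted by the reported alignment.
        first = -(-(start_lba - alignment) // granularity) * granularity + alignment
        last = (end - alignment) // granularity * granularity + alignment
    else:
        first, last = start_lba, end
    chunks = []
    while first < last:
        count = min(max_lba_count, last - first)  # stay within the per-command limit
        chunks.append((first, count))
        first += count
    return chunks


# Values from the Linux lab output: max LBA count 4194304, granularity 8.
print(plan_unmap(3, 30, max_lba_count=4194304, granularity=8))
print(len(plan_unmap(0, 10_000_000, max_lba_count=4194304, granularity=8)))
```

Blocks outside the aligned interior (here LBAs 3 to 7 and 32) are simply left mapped in this sketch.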
 
[root@localhost ~]# for x in /sys/class/scsi_disk/* ; do  echo $x; cat $x/provisioning_mode; done
/sys/class/scsi_disk/0:0:0:0
writesame_16
/sys/class/scsi_disk/0:0:1:0
writesame_16

 

  • If LBP VPD is not supported:
    • Use the UNMAP command if BL VPD reports a MAXIMUM UNMAP BLOCK DESCRIPTOR COUNT greater than 1.
    • Use the WRITE SAME (16) command otherwise.
  • If LBP VPD is supported:
    • Use the UNMAP 0x42 command if LBP VPD says LBPU is 1.
    • Use the WRITE SAME (16) 0x93 command if LBP VPD says LBPWS is 1.
    • Use the WRITE SAME (10) 0x41 command if LBP VPD says LBPWS10 is 1.
    • Disable unmap support otherwise.
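
The rules above can be written down as a small function. This is a sketch of the list exactly as stated here, not a copy of the kernel's code: choose_provisioning_mode is a hypothetical name, and the return strings are the values that appear in the sysfs provisioning_mode file. Note that the lab output above reports writesame_16 even though LBPU is 1, so the running kernel evidently applies additional conditions that this sketch does not model.

```python
def choose_provisioning_mode(lbp_vpd, max_unmap_desc_count=0):
    """Pick a discard method from VPD data.

    lbp_vpd: None if the LBP VPD page is not supported, otherwise a dict
    with the flags 'lbpu', 'lbpws' and 'lbpws10'.
    """
    if lbp_vpd is None:
        # Fall back to the Block Limits page only.
        return "unmap" if max_unmap_desc_count > 1 else "writesame_16"
    if lbp_vpd.get("lbpu"):
        return "unmap"
    if lbp_vpd.get("lbpws"):
        return "writesame_16"
    if lbp_vpd.get("lbpws10"):
        return "writesame_10"
    return "disabled"


print(choose_provisioning_mode({"lbpu": 0, "lbpws": 1, "lbpws10": 1}))
print(choose_provisioning_mode(None, max_unmap_desc_count=64))
```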

 

## Guest OS(Windows)

1. Check the DisableDeleteNotify value

With Windows Server 2012 or later, UNMAP commands are issued under the following conditions:

When you delete files from a file system, Windows automatically issues reclaim commands for the area of the file system you freed.

When you format a volume residing on a thin-provisioned drive with the quick option, Windows reclaims the entire volume.

When a regularly scheduled operation selects the Optimize option for a volume or when you manually select this option, either from the Optimize Drives console or when you use the optimize-volume PowerShell command with the Retrim option, Windows issues an UNMAP command.

C:\>fsutil behavior query DisableDeleteNotify
NTFS DisableDeleteNotify = 0  (Disabled)
ReFS DisableDeleteNotify = 0  (Disabled)

A value of 0 means delete notifications are not disabled, so Windows sends TRIM/UNMAP requests when files are deleted.

 

2. Query the disk device's Thin Provisioning information (same as on Linux)

C:\Users\Administrator\Downloads\sg3_utils-1.37exe>sg_vpd.exe --page=0xb2 d:
Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 1
  Write same (16) with unmap bit supported (LBWS): 1
  Write same (10) with unmap bit supported (LBWS10): 1
  Logical block provisioning read zeros (LBPRZ): 1
  Anchored LBAs supported (ANC_SUP): 0
  Threshold exponent: 1
  Descriptor present (DP): 0
  Provisioning type: 2
 
C:\Users\Administrator\Downloads\sg3_utils-1.37exe>sg_vpd.exe --page=0xb0 d:
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 65536
  Maximum unmap block descriptor count: 64
  Optimal unmap granularity: 2048
  Unmap granularity alignment valid: 1
  Unmap granularity alignment: 0
  Maximum write same length: 0x2000 blocks

 

## Hypervisor(ESXi)

1. Check whether Thin Provisioning is supported

# esxcli storage core device list -d naa.60060160c0304e00165e6264c9b82c18 | grep Thin
   Thin Provisioning Status: yes

 

## When files are deleted inside the Guest OS and an UNMAP operation is attempted, the Hypervisor handles it according to the configured Provisioning Mode

1. Invoke a filesystem TRIM operation from the Guest OS (Windows)

PS C:\Users\Administrator> Optimize-Volume -DriveLetter D -ReTrim -Verbose
VERBOSE: Invoking retrim on New Volume (D:)...
VERBOSE: Performing pass 1:
VERBOSE: Retrim:  0% complete...
VERBOSE: Retrim:  6% complete...
VERBOSE: Retrim:  43% complete...
VERBOSE: Retrim:  90% complete...
VERBOSE: Retrim:  100% complete.
VERBOSE:
Post Defragmentation Report:
VERBOSE:
 Volume Information:
VERBOSE:   Volume size                 = 29.98 GB
VERBOSE:   Cluster size                = 4 KB
VERBOSE:   Used space                  = 70.61 MB
VERBOSE:   Free space                  = 29.91 GB
VERBOSE:
 Retrim:
VERBOSE:   Backed allocations          = 29
VERBOSE:   Allocations trimmed         = 27
VERBOSE:   Total space trimmed         = 26.95 GB

 

2. Capture the Hypervisor's iSCSI packets at that point in time

 

[Test Example]

## From Filesystem to underlying storage

1. From Filesystem to Disk Driver

Filter the trace on Vcb FFFFE001F7658180 together with NtfsDeallocateClusters and NtfsMarkUnusedContextPreTrimWorkItemProcessing.

[11] 0004.1100::07/30/18-09:42:54.8300433 [ntfs] bitmpsup_c2379 NtfsDeallocateClusters() - NtfsDeallocateClusters: Vcb FFFFE001F7658180 - deleting FR 10000002a7e9f from clusters 53 to 7fffffffffffffff

[11] 0004.1100::07/30/18-09:42:54.8300446 [ntfs] bitmpsup_c2453 NtfsDeallocateClusters() - NtfsDeallocateClusters: Vcb FFFFE001F7658180 - deleting FR 10000002a7e9f starting at 631138c for 3d clusters

[11] 0004.1100::07/30/18-09:42:54.8300459 [ntfs] bitmpsup_c2505 NtfsDeallocateClusters() - NtfsDeallocateClusters: Vcb FFFFE001F7658180 - updating DeallocatedClustersCount from 0 to 3d

[0] 0004.0574::07/30/18-09:43:03.5848104 [ntfs] delnotify_c1414 NtfsMarkUnusedContextPreTrimProcessing() - NtfsMarkUnusedContextPreTrimProcessing: Vcb FFFFE001F7658180 - Kicked off DelayedWorkQueue

[4] 0004.0EDC::07/30/18-09:43:03.5848220 [ntfs] delnotify_c1654 NtfsMarkUnusedContextPreTrimWorkItemProcessing() - NtfsMarkUnusedContextPreTrimWorkItemProcessing: Vcb FFFFE001F7658180 - Sending storage ioctl down.  MUC FFFFE001FD68EF20

[6] 0004.0EDC::07/30/18-09:43:03.6638743 [ntfs] delnotify_c1807 NtfsMarkUnusedContextPreTrimWorkItemProcessing() - NtfsMarkUnusedContextPreTrimWorkItemProcessing: Vcb FFFFE001F7658180 - Add MUC FFFFE001FD68EF20 to post trim list

[6] 0004.0EDC::07/30/18-09:43:03.6638632 [classpnp] utils_c3460 DeviceProcessDsmTrimRequest() - DeviceProcessDsmTrimRequest (FFFFE001F5FA8060): UNMAP command issued. Returned NTSTATUS: STATUS_SUCCESS.

[6] 0004.0EDC::07/30/18-09:43:03.6638845 [ntfs] delnotify_c1160 NtfsMarkUnusedContextPostTrimProcessing() - NtfsMarkUnusedContextPostTrimProcessing: Vcb FFFFE001F7658180 - Releasing bitmap


NtfsDeallocateClusters -> DeallocatedClustersCount increases -> trim is triggered -> UNMAP is sent -> DeallocatedClustersCount decreases to 0.
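
The trace timestamps show that the trim is not issued at the moment of deletion. The sketch below measures the gap between the DeallocatedClustersCount update and the DelayedWorkQueue kick-off, using the two timestamps copied from the trace above. The helper name parse_trace_time is hypothetical, and the seventh fractional digit is dropped because Python's %f directive accepts at most six.

```python
from datetime import datetime


def parse_trace_time(stamp):
    """Parse a trace timestamp such as '07/30/18-09:42:54.8300459'."""
    head, frac = stamp.split(".")
    # Keep microsecond precision only (6 fractional digits).
    return datetime.strptime(head + "." + frac[:6], "%m/%d/%y-%H:%M:%S.%f")


deallocated = parse_trace_time("07/30/18-09:42:54.8300459")
trim_kicked = parse_trace_time("07/30/18-09:43:03.5848104")
delay = (trim_kicked - deallocated).total_seconds()
print(f"{delay:.3f} s between deallocation and trim kick-off")
```

In this capture the gap is a little under nine seconds; a single trace says nothing about how that interval is chosen.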

 

2. From iSCSI Initiator to iSCSI Target

 

[References]

Thin-provisioned disks with QEMU and KVM

https://wiki.qemu.org/images/4/45/Devconf14-bonzini-thin-provisioning.pdf

 

Information technology - SCSI Block Commands – 4 (SBC-4)

https://standards.incits.org/apps/group_public/download.php/124286/livelink

 

Logical block provisioning - SBC

https://gist.github.com/cathay4t/e80e02a737242a5f3824606543631bfe

 

The road for thin-provisioning

https://www.linux-kvm.org/images/7/77/2012-forum-thin-provisioning.pdf

 

SCSI Commands Reference Manual

https://www.seagate.com/files/staticfiles/support/docs/manual/Interface%20manuals/100293068j.pdf

 

File Systems and Thin Provisioning

https://www.snia.org/sites/default/files/files2/files2/SDC2011/presentations/monday/FrederickKnight_File_Systems_Thin_Provisioning.pdf

 

https://manpages.ubuntu.com/manpages/focal/en/man8/sg_write_same.8.html

https://linux.die.net/man/8/sg_write_same

 

 
