

SCSI Protocol(+SCSI Sense Code, Errors)

 

SAN Layout : https://www.pearsonitcertification.com/articles/article.aspx?p=1944878&seqNum=7

 

SCSI Protocol : https://www.cs.uml.edu/~bill/cs520/slides_07_scsisnia.pdf

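  • SCSI is a client-server protocol

  • The client is called the Initiator and generates the requests that are sent to the server
    • A single Initiator can generate requests on behalf of multiple Application Clients
  • The server is called the Target; it receives and executes the Initiator's requests and returns the results to the Initiator
    • A Target has one Task Manager and numbered LUs (Logical Units); the number is the LUN (Logical Unit Number)
    • Besides the Task Manager, the Target has a Device Server that processes the requests received from the Initiator and hands them to the addressed LUN
    • On the Target side there is a queue that holds the commands (Tasks) waiting to be executed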

  • Task types
    • Simple : can be executed in any order
    • Ordered : must be executed strictly in order
    • Head of queue : tells the Target to place the Task at the front of the queue
    • Auto Contingent Allegiance (ACA) : used when a previously executed Command has entered an error condition
  • SCSI Protocol Service types
    • Execute and Confirmation Service
    • Data Transfer Service
      • Command Phase : the Command and its parameters are delivered in a Command Descriptor Block (CDB)
      • Data Phase : the data belonging to the Command is transferred
      • Status Phase : the status information for the Command's result is transferred
  • Types of SCSI I/O operations
    • I/O without data transfer
      • After the Initiator sends a SCSI Command to the Target, the Target returns only a Status
      • Example SCSI Commands
        • Test Unit Ready
        • Start/Stop Unit
        • Rewind
    • I/O with data transfer
      • Information is exchanged in the Data Phase
        • Data In / Data Out transfers
        • The data may be sent in a single Data Phase or spread across several Data Phases
      • Example SCSI Commands
        • Read/Write
        • Inquiry
  • Command Descriptor Block (CDB)
    • The Initiator delivers a SCSI Command to the Target inside a CDB
    • The first byte is the Operation Code
    • In a fixed-length CDB, the last byte is the Control byte

    • A CDB is 6, 10, 12, or 16 bytes long, or variable length
  • Standard SCSI commands

  • SCSI Status
    • Reports whether the SCSI Command that the Initiator sent to the Target succeeded
    • Indicates Busy or Not Ready
    • Indicates an error condition
    • Also indicates that the Target's task set is full
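
The CDB layout described above (Operation Code in the first byte, Control byte last) can be sketched in Python. This is only an illustration under stated assumptions: the helper names are hypothetical and not part of any library, and the opcodes used are the standard values for READ(10) (0x28) and TEST UNIT READY (0x00).

```python
import struct

def build_read10_cdb(lba, blocks, control=0x00):
    """Pack a 10-byte READ(10) CDB (hypothetical helper)."""
    # Big-endian fields: opcode, flags, 32-bit LBA, group number,
    # 16-bit transfer length, control byte.
    return struct.pack(">BBIBHB", 0x28, 0x00, lba, 0x00, blocks, control)

def build_test_unit_ready_cdb(control=0x00):
    """Pack a 6-byte TEST UNIT READY CDB; this command has no Data Phase."""
    return bytes([0x00, 0x00, 0x00, 0x00, 0x00, control])

def opcode_and_control(cdb):
    """Return (first byte, last byte) of a fixed-length CDB."""
    return cdb[0], cdb[-1]

cdb = build_read10_cdb(lba=0x1000, blocks=8)
print(len(cdb), cdb.hex())
print(opcode_and_control(cdb))
```

The first byte alone is enough to tell which command a CDB carries, which is why the vmkernel.log messages later in this post identify a command only by its opcode (for example 0x8a or 0x28).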

SCSI Sense Code : https://www.t10.org/lists/2sensekey.htm

  • When a SCSI Command completes with CHECK CONDITION status, check the SCSI Sense Key in the Sense Data

Frame 374: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)
Ethernet II, Src: Netgear_5b:9b:a2 (00:0f:b5:5b:9b:a2), Dst: VMware_f9:ef:be (00:0c:29:f9:ef:be)
Internet Protocol Version 4, Src: 192.139.81.227, Dst: 192.168.1.208
Transmission Control Protocol, Src Port: 3260, Dst Port: 36247, Seq: 2165, Ack: 1361, Len: 32
[2 Reassembled TCP Segments (80 bytes): #373(48), #374(32)]
iSCSI (SCSI Response)
    Opcode: SCSI Response (0x21)
    Response: Command completed at target (0x00)
    Status: Check Condition (0x02)
    TotalAHSLength: 0 (0x00)
    DataSegmentLength: 31 (0x0000001f)
    InitiatorTaskTag: 0x0000000f
    StatSN: 16 (0x00000010)
    ExpCmdSN: 16 (0x00000010)
    MaxCmdSN: 48 (0x00000030)
    ExpDataSN: 0x00000000
    BidiReadResidualCount: 0 (0x00000000)
    ResidualCount: 0 (0x00000000)
    Request in: 372
    Time from request: 0.075611000 seconds
    SenseLength: 29 (0x001d)
Flags: 0x80
SCSI: SNS Info
    [LUN: 0x0001]
    .111 0000 = SNS Error Type: Current Error (0x70)
    Valid: 112
    0... .... = Filemark: False
    .0.. .... = EOM: False
    ..0. .... = ILI: False
    .... 0101 = Sense Key: Illegal Request (0x5)
    Sense Info: 0x00000000
    Additional Sense Length: 21
    Command-Specific Information: 00000000
    Additional Sense Code+Qualifier: Invalid Field In Cdb (0x2400)
    Field Replaceable Unit Code: 0x00
    0... .... = SKSV: False
    .000 0000 0000 0000 0000 0000 = Sense Key Specific: 0x000000
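
The fields Wireshark shows above come from fixed-format sense data. As a rough sketch (the function name is hypothetical, and only the fixed format with response codes 0x70/0x71 is handled), the Sense Key, ASC, and ASCQ can be pulled out of the raw bytes like this. The sample buffer is rebuilt from the values in the capture, not copied from the wire.

```python
def decode_fixed_sense(sense):
    """Decode the main fields of fixed-format SCSI sense data."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes to reach ASC/ASCQ")
    response_code = sense[0] & 0x7F          # bit 7 is the Valid bit
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "current_error": response_code == 0x70,  # 0x71 means deferred error
        "sense_key": sense[2] & 0x0F,            # low nibble of byte 2
        "additional_length": sense[7],
        "asc": sense[12],
        "ascq": sense[13],
    }

# 8 header bytes + 21 additional bytes = 29 bytes, matching SenseLength above.
sample = bytes([0x70, 0x00, 0x05, 0, 0, 0, 0, 21,
                0, 0, 0, 0, 0x24, 0x00, 0x00, 0, 0, 0]) + bytes(11)
print(decode_fixed_sense(sample))
```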

 

SCSI Errors

  • In vmkernel.log, SCSI errors are logged by the following components
    • nmp_ThrottleLogForDevice
    • ScsiDeviceIO
    • HppThrottleLogForDevice

 

SCSI Errors - vmkernel.log

2023-03-07T22:26:20.681Z cpu34:2098047)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0x8a (0x45d92ca46a40, 2138740) to dev "naa.624a9370e793c401a188430c00019385" on path "vmhba2:C0:T2:L245" Failed:
 
2023-03-07T22:26:20.681Z cpu34:2098047)ScsiDeviceIO: 4277: Cmd(0x45d92ca46a40) 0x8a, CmdSN 0x8000006c from world 2138740 to dev "naa.624a9370e793c401a188430c00019385" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x29 0x3
 
2022-09-15T00:29:50.351Z cpu76:2098395)WARNING: HPP: HppThrottleLogForDevice:1136: Cmd 0x28 (0x45dd2d622788, 0) to dev "naa.50000398d8235271" on path "vmhba3:C0:T12:L0" Failed:

 

  • Examples
    • Message reported by ScsiDeviceIO

Reported by ScsiDeviceIO

2023-03-07T22:26:20.681Z cpu34:2098047)ScsiDeviceIO: 4277: Cmd(0x45d92ca46a40) 0x8a, CmdSN 0x8000006c from world 2138740 to dev "naa.624a9370e793c401a188430c00019385" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x29 0x3
 
Cmd 0x8a       ## SCSI Command (16bytes Write)
H:0x0          ## Host-Side / HBA
D:0x2          ## Device-Side / SP(Storage Processor)
P:0x0          ## Plug-in Side / NMP
0xb 0x29 0x3   ## Sense Key + Additional Sense Code(ASC) + Additional Sense Code Qualifier(ASCQ)
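
The field breakdown above can be automated with a regular expression. The following is a minimal sketch that assumes the ScsiDeviceIO line format shown in this post; the pattern and the function name are my own and may need adjusting for other ESXi builds.

```python
import re

# Pattern for the ScsiDeviceIO failure line format shown in this post.
SCSI_IO_RE = re.compile(
    r'Cmd\((?P<handle>0x[0-9a-f]+)\) (?P<opcode>0x[0-9a-f]+), '
    r'CmdSN (?P<cmdsn>0x[0-9a-f]+) from world (?P<world>\d+) '
    r'to dev ?"(?P<dev>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<device>0x[0-9a-f]+) '
    r'P:(?P<plugin>0x[0-9a-f]+) '
    r'(?P<label>Valid|Possible) sense data: '
    r'(?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)',
    re.IGNORECASE,
)

def parse_scsi_io(line):
    """Return the decoded fields of one ScsiDeviceIO line, or None."""
    m = SCSI_IO_RE.search(line)
    if m is None:
        return None
    hex_fields = ("opcode", "host", "device", "plugin", "key", "asc", "ascq")
    out = {name: int(m.group(name), 16) for name in hex_fields}
    out["dev"] = m.group("dev")
    out["sense_label"] = m.group("label")  # "Valid" or "Possible"
    return out

line = ('2023-03-07T22:26:20.681Z cpu34:2098047)ScsiDeviceIO: 4277: '
        'Cmd(0x45d92ca46a40) 0x8a, CmdSN 0x8000006c from world 2138740 '
        'to dev "naa.624a9370e793c401a188430c00019385" failed '
        'H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x29 0x3')
print(parse_scsi_io(line))
```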

 

    • Message reported by NMP

Reported by NMP

2023-03-07T22:26:20.681Z cpu34:2098047)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0x8a (0x45d92ca46a40, 2138740) to dev "naa.624a9370e793c401a188430c00019385" on path "vmhba2:C0:T2:L245" Failed:
 
vmhba2         ## HBA through which the SCSI Command was sent
:C0            ## Channel
:T2            ## Target
:L245          ## LUN ID
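
The path name splits cleanly into those four parts. A small sketch (the function name is hypothetical) that turns a runtime path name into its adapter, channel, target, and LUN:

```python
import re

PATH_RE = re.compile(
    r'^(?P<adapter>vmhba\d+):C(?P<channel>\d+):T(?P<target>\d+):L(?P<lun>\d+)$'
)

def parse_path(path):
    """Split a path such as vmhba2:C0:T2:L245 into its components."""
    m = PATH_RE.match(path)
    if m is None:
        return None
    return {
        "adapter": m.group("adapter"),
        "channel": int(m.group("channel")),
        "target": int(m.group("target")),
        "lun": int(m.group("lun")),
    }

print(parse_path("vmhba2:C0:T2:L245"))
```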

 

  • Host-side problem

 

Reported by Host-side

Cmd(0x412e449523c0) 0x2a, CmdSN 0x46f43 from world 32797 to dev "naa.600143801259bd790000500002e10000" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1
 
H:0x5   ## Abort

 

  • Device-side problem

Reported by Device-side

Cmd(0x412e40439fc0) 0x2a, CmdSN 0x800000ea from world 38844 to dev"naa.600143801259bd790000500001090000" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x2 0x3a 0x1
 
D:0x28   ## Task Set Full

 

  • Plugin-side problem

Reported by Plugin-side

Cmd(0x4124448131c0) 0x2a, CmdSN 0x17 from world 19189 to dev "naa.60000970000292603381533030394636" failed H:0x0 D:0x2 P:0x8 Possible sense data: 0x2 0x4 0x3
 
P:0x8   ## Backing pool for thin provisioned LUN is out of space
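
The three examples above suggest a simple triage: any non-zero H, D, or P value points at the side that reported the failure. The sketch below is an assumption-laden illustration. The device table holds the standard SCSI status byte values, while the host and plugin tables contain only the codes that appear in this post; every other code is reported as unknown rather than guessed.

```python
# SCSI status byte values (the D: field).
DEVICE_STATUS = {
    0x00: "GOOD", 0x02: "CHECK CONDITION", 0x04: "CONDITION MET",
    0x08: "BUSY", 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL",
    0x30: "ACA ACTIVE", 0x40: "TASK ABORTED",
}
# Only the host and plugin codes mentioned in this post.
HOST_STATUS = {0x0: "OK", 0x5: "ABORT"}
PLUGIN_STATUS = {0x0: "OK",
                 0x8: "THIN-PROVISIONED LUN BACKING POOL OUT OF SPACE"}

def failing_sides(host, device, plugin):
    """List the sides (host/device/plugin) whose status is non-zero."""
    sides = []
    if host != 0:
        sides.append("host")
    if device != 0:
        sides.append("device")
    if plugin != 0:
        sides.append("plugin")
    return sides

def describe(host, device, plugin):
    """Map each status code to a name, or mark it as unknown."""
    def name(table, code):
        return table.get(code, "unknown (0x%x)" % code)
    return {"host": name(HOST_STATUS, host),
            "device": name(DEVICE_STATUS, device),
            "plugin": name(PLUGIN_STATUS, plugin)}

print(failing_sides(0x0, 0x2, 0x8), describe(0x0, 0x2, 0x8))
```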

 

  • Valid sense data

Valid sense data

Cmd(0x4125411e8f00) 0x9e, CmdSN 0x158614 from world 8272 to dev "naa.60000970000292601192533030354546" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0
 
D:0x2
Valid sense data:
0x5        ## Illegal Request
0x25 0x0   ## Logical Unit Not Supported

 

  • Possible sense data

Possible sense data

Cmd(0x412e40439fc0) 0x2a, CmdSN 0x800000ea from world 38844 to dev"naa.600143801259bd790000500001090000" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x2 0x3a 0x1
 
D:0x28
Possible sense data:
0x2        ## Not Ready
0x3a 0x1   ##  Medium Not Present - Tray Closed
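
Looking up the three sense values by hand gets tedious, so a small table helps. In this sketch the sense key names follow the T10 list linked earlier; the ASC/ASCQ table is deliberately partial and covers only the pairs that appear in this post. The function name is hypothetical.

```python
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE", 0xF: "COMPLETED",
}
# Partial table: only the ASC/ASCQ pairs seen in this post.
ASC_ASCQ = {
    (0x04, 0x03): "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
    (0x29, 0x03): "BUS DEVICE RESET FUNCTION OCCURRED",
    (0x3A, 0x01): "MEDIUM NOT PRESENT - TRAY CLOSED",
}

def describe_sense(key, asc, ascq):
    """Return 'SENSE KEY: ASC/ASCQ text' for a sense triple."""
    key_text = SENSE_KEYS.get(key, "unknown key 0x%x" % key)
    code_text = ASC_ASCQ.get((asc, ascq),
                             "unlisted ASC/ASCQ 0x%02x/0x%02x" % (asc, ascq))
    return key_text + ": " + code_text

print(describe_sense(0x5, 0x25, 0x0))
print(describe_sense(0x2, 0x3A, 0x1))
```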

 

SCSI Disk Performance

performance has deteriorated - vmkernel.log

2023-02-09T11:27:25.558Z cpu11:66711)WARNING: ScsiDeviceIO: 1203: Device naa.600300570213ff3026a0f4ea0a156e16 performance has deteriorated. I/O latency increased from average value of 97 microseconds to 12551 microseconds.
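
To judge how severe such a warning is, it helps to extract both latency values and compare them. A minimal sketch, assuming the message wording shown above (the function name is hypothetical):

```python
import re

LATENCY_RE = re.compile(
    r'Device (?P<dev>\S+) performance has deteriorated\. '
    r'I/O latency increased from average value of (?P<avg>\d+) microseconds '
    r'to (?P<cur>\d+) microseconds'
)

def parse_latency_warning(line):
    """Extract device, average and current latency, and their ratio."""
    m = LATENCY_RE.search(line)
    if m is None:
        return None
    avg, cur = int(m.group("avg")), int(m.group("cur"))
    return {"dev": m.group("dev"), "average_us": avg, "current_us": cur,
            "ratio": cur / avg if avg else None}

line = ('2023-02-09T11:27:25.558Z cpu11:66711)WARNING: ScsiDeviceIO: 1203: '
        'Device naa.600300570213ff3026a0f4ea0a156e16 performance has '
        'deteriorated. I/O latency increased from average value of '
        '97 microseconds to 12551 microseconds.')
print(parse_latency_warning(line))
```

For the line above the latency went from 97 to 12551 microseconds, roughly a 129-fold increase.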

 

[References]

Understanding the storage path failover sequence in VMware ESXi native multipathing (1027963)

https://ikb.vmware.com/s/article/1027963?lang=en_US

 

SCSI events that can trigger ESX server to fail a LUN over to another path (1003433)

https://ikb.vmware.com/s/article/1003433

 

Tips for analysis

  • List the HBAs
    • # esxcli storage core adapter list
HBA Name  Driver      Link State  UID                                   Capabilities         Description
--------  ----------  ----------  ------------------------------------  -------------------  -----------
vmhba0    lsi_mr3     link-n/a    sas.52cea7f077cd4c00                                       (0000:18:00.0) Broadcom PERC H730P Mini
vmhba1    vmw_ahci    link-n/a    sata.vmhba1                                                (0000:00:11.5) Intel Corporation Lewisburg SATA AHCI Controller
vmhba2    vmw_ahci    link-n/a    sata.vmhba2                                                (0000:00:17.0) Intel Corporation Lewisburg SATA AHCI Controller
vmhba3    lpfc        link-down   fc.200000109bb46c09:100000109bb46c09  Second Level Lun ID  (0000:3b:00.0) Emulex Corporation Emulex LightPulse LPe32000 PCIe Fibre Channel Adapter
vmhba4    lpfc        link-down   fc.200000109bb46c0a:100000109bb46c0a  Second Level Lun ID  (0000:3b:00.1) Emulex Corporation Emulex LightPulse LPe32000 PCIe Fibre Channel Adapter
vmhba64   brcmnvmefc  link-down   fc.200000109bb46c09:100000109bb46c09                       (0000:3b:00.0) Emulex Corporation Emulex LightPulse LPe32000 PCIe Fibre Channel Adapter
vmhba65   brcmnvmefc  link-down   fc.200000109bb46c0a:100000109bb46c0a                       (0000:3b:00.1) Emulex Corporation Emulex LightPulse LPe32000 PCIe Fibre Channel Adapter
vmhba66   iscsi_vmk   online      iqn.1998-01.com.vmware:w2-tse-d12     Second Level Lun ID  iSCSI Software Adapter

 

  • Check FC events
    • # esxcli storage san fc events get

    • # grep "Frame Dropped" vmkernel.log
2014-08-18T20:17:51.421Z cpu25:16409)WARNING: vmklinux: vmklnx_iodm_event:988:vmhba3: Frame Dropped 36 times in 60s, SAN connection check required.
2014-08-18T20:18:02.152Z cpu19:20266)WARNING: vmklinux: vmklnx_iodm_event:988:vmhba3: Frame Dropped 53 times in 60s, SAN connection check required.
2014-08-18T20:18:12.275Z cpu30:16414)WARNING: vmklinux: vmklnx_iodm_event:988:vmhba3: Frame Dropped 66 times in 60s, SAN connection check required.
2014-08-18T20:18:22.297Z cpu27:16411)WARNING: vmklinux: vmklnx_iodm_event:988:vmhba3: Frame Dropped 70 times in 60s, SAN connection check required.
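
When there are many of these lines, a per-HBA summary is easier to read. The sketch below records the peak drop count per adapter; it takes the peak rather than the sum because the messages above arrive about every 10 seconds while each one covers a 60-second window, so the windows overlap. The function name is hypothetical and the pattern assumes the message wording shown above.

```python
import re

FRAME_RE = re.compile(
    r'(?P<adapter>vmhba\d+): Frame Dropped (?P<count>\d+) times '
    r'in (?P<window>\d+)s'
)

def peak_frame_drops(lines):
    """Return {adapter: highest 'Frame Dropped' count seen}."""
    peaks = {}
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            adapter, count = m.group("adapter"), int(m.group("count"))
            peaks[adapter] = max(count, peaks.get(adapter, 0))
    return peaks

sample = [
    'WARNING: vmklinux: vmklnx_iodm_event:988:vmhba3: Frame Dropped 36 '
    'times in 60s, SAN connection check required.',
    'WARNING: vmklinux: vmklnx_iodm_event:988:vmhba3: Frame Dropped 70 '
    'times in 60s, SAN connection check required.',
]
print(peak_frame_drops(sample))
```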

 

  • Check SCSI errors
    • In the output, check H, D, and P first to tell whether the problem is Host-side, Device-side, or Plugin-side
    • # sed -n "s@.*naa\.[0-z]\{32\}.*\(vmhba[0134]:C[0-9]:T[0-9]\):L[0-9]\{1,3\}.*\(H:0x[0-z]\{1,2\} D:0x[0-z]\{1,2\} P:0x[0-z]\{1,3\}.*0x[0-z]\{1,2\} 0x[0-z]\{1,2\} 0x[0-z]\{1,2\}\).*@\1 \2@p" vmkernel.log | sort | uniq -c
    • # grep ScsiDeviceIO vmkernel.* | grep "Valid sense data" | awk '{print $13,$15,$16,$17,$18,$19,$20,$21,$22,$23}' | sort | uniq -c
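
For cases where the log has been copied off the host, the same kind of tally can be done in Python. This is a sketch rather than an exact port of the commands above: it counts both "Valid" and "Possible" sense data lines and groups only by the H/D/P and sense values, not by device. The function name is hypothetical.

```python
import re
from collections import Counter

STATUS_RE = re.compile(
    r'(H:0x[0-9a-f]+ D:0x[0-9a-f]+ P:0x[0-9a-f]+) '
    r'(?:Valid|Possible) sense data: '
    r'(0x[0-9a-f]+ 0x[0-9a-f]+ 0x[0-9a-f]+)'
)

def tally_scsi_errors(lines):
    """Count how often each H/D/P + sense combination occurs."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            counts[m.group(1) + " " + m.group(2)] += 1
    return counts

sample = [
    'failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1',
    'failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0',
    'failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0',
]
for combo, n in tally_scsi_errors(sample).most_common():
    print(n, combo)

# Against a real file (path is an assumption):
# with open("vmkernel.log") as f:
#     print(tally_scsi_errors(f).most_common(10))
```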

 

  • Once the HBA is identified, check the NMP log entries for it
    • # sed -n "s@.*NMP.*\(vmhba[35]:C[0-9]:T[0-9]\).*@\1@p" vmkernel.log | sort | uniq -c

 

  • Once the HBA is identified, check the Targets discovered through that HBA
    • # sed -n "s@.*\(vmhba5:C[0-9]:T[0-9]\).*\(Target:\).*\(WWPN:.*\)@\1 \2 \3@p" esxcfg-mpath_-b.txt | sort -u

 

 
