I am having issues with an LSI 9201-16e SAS HBA card (PCIe 2.0). I've tried the following:

  • Upgrading firmware and BIOS on the card
  • Changing PCI slot
  • Disabling MSI and MSI-X interrupts in my OS (FreeBSD 12.1)

The issue is that the card appears to become unresponsive under moderate disk load. It typically runs for some time without load, but once moderate load is applied I have seen it become unresponsive in as little as an hour. This results in a panic and reboot, because the OS is unable to reset the controller. The card is well cooled and never hot to the touch.

I wanted to check whether there might be some kind of compatibility issue between the card and this HP system board.





mps0: <Avago Technologies (LSI) SAS2116> port 0x2000-0x20ff mem 0xf8040000-0xf8043fff,0xf8000000-0xf803ffff irq 42 at device 0.0 numa-domain 1 on pci11
mps0: Firmware:, Driver:
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>


(da8:mps0:0:8:0): WRITE(10). CDB: 2a 00 24 42 79 b0 00 00 08 00 
(da8:mps0:0:8:0): CAM status: CAM subsystem is busy
(da8:mps0:0:8:0): Error 5, Retries exhausted
mps0: mpssas_action_scsiio: Freezing devq for target ID 10
mps0: (da10:mps0:0:10:0): WRITE(10). CDB: 2a 00 24 42 79 b0 00 00 08 00 
mpssas_action_scsiio: Freezing devq for target ID 8
(da10:mps0:0:10:0): CAM status: CAM subsystem is busy
(da8:mps0:0:8:0): WRITE(10). CDB: 2a 00 24 42 82 58 00 00 08 00 
(da10:mps0:0:10:0): Error 5, Retries exhausted
(da8:mps0:0:8:0): CAM status: CAM subsystem is busy
mps0: mpssas_action_scsiio: Freezing devq for target ID 9
(da8:mps0:0:8:0): Retrying command, 3 more tries remain
(da9:mps0:0:9:0): WRITE(10). CDB: 2a 00 24 42 81 88 00 00 08 00 
mps0: (da9:mps0:0:9:0): CAM status: CAM subsystem is busy
mpssas_action_scsiio: Freezing devq for target ID 10
(da9:mps0:0:9:0): Error 5, Retries exhausted
(da10:mps0:0:10:0): WRITE(10). CDB: 2a 00 24 42 82 58 00 00 08 00 
        (xpt0:mps0:0:9:0): SMID 1 task mgmt 0xfffffe000451e158 timed out
(da10:mps0:0:10:0): CAM status: CAM subsystem is busy
mps0: (da10:mps0:0:10:0): Retrying command, 3 more tries remain
Reinitializing controller
panic: mps_reinit hard reset failed with error 60
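
For reference, the CDB bytes in the trace above can be decoded to see what the failing commands actually were. This is only a minimal sketch in Python, assuming the standard WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8) and a 512-byte logical block size, which is an assumption about these drives. `decode_write10` is a hypothetical helper name, not part of any tool.

```python
def decode_write10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    block_size is an assumption (512-byte logical blocks); adjust it
    if the drive reports a different logical sector size.
    """
    cdb = bytes.fromhex(cdb_hex.strip())
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# First failing command from the trace (da8).
print(decode_write10("2a 00 24 42 79 b0 00 00 08 00"))
```

Under those assumptions every CDB in the trace is an 8-block (4 KiB) write, so the failing commands are small writes rather than large transfers.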
The SAS card that HP officially supported for this workstation was the 8888ELP.


These cards are quite cheap on eBay, so you might want to simply swap out the card and see what happens.


With that said, the 9201 is a "generic" LSI single-chip "ROC" card that came in many different variants, the 9211/9212 being the most common. All of the low-end 92xx cards use the same LSI chip, lack a dedicated XOR chip for RAID 5 calculations, and have no onboard cache; as such, they are unsuitable for anything but light-duty RAID 0/1.


If you need RAID 5 and/or a card with sustained I/O and a large queue depth, the 9201 is not a card to consider.


Are you using the card for RAID or just as additional SAS/SATA ports? I.e., which firmware is on the card, IR or IT?


Update: forgot to mention, check/replace the cabling from the card to the drive(s), and also try a different port on the card.


I've seen bad cables and borked (failing/defective) ports on LSI cards.

I don't actually know much about storage, but I would be surprised if the load itself is a problem for the card. I called it "moderate", but the 3 disks that carry almost all of that load are just 2.5" 5400rpm 500GB consumer-grade laptop drives. I haven't observed them closely, so I don't know whether those drives have had trouble handling it, for example whether their queues fill up (I guess that could be the origin of the CAM messages in my trace output, though not of the reinitialisation failure). On the other hand, I think this crash also happens far below peak load. In any case, I assumed that those drives would become a bottleneck before anything became a problem for the card.
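
A rough back-of-the-envelope check of that assumption, as a sketch only. The numbers here are assumptions, not measurements: an x8 PCIe 2.0 link for the card (5 GT/s per lane with 8b/10b encoding, so roughly 500 MB/s per lane per direction) and a generous 100 MB/s sequential throughput for each of the three busy laptop drives. `pcie_gen2_bandwidth_mb_s` is a hypothetical helper name.

```python
def pcie_gen2_bandwidth_mb_s(lanes: int) -> int:
    """Approximate usable PCIe 2.0 bandwidth per direction, in MB/s."""
    raw_bits_per_s = 5_000_000_000                  # 5 GT/s per lane
    payload_bits_per_s = raw_bits_per_s * 8 // 10   # 8b/10b line encoding
    return lanes * payload_bits_per_s // 8 // 1_000_000


LANES = 8                # assumption: the card sits in an x8 link
BUSY_DRIVES = 3          # the three loaded laptop drives
DRIVE_MB_S = 100         # assumption: generous per-drive sequential rate

link_mb_s = pcie_gen2_bandwidth_mb_s(LANES)
load_mb_s = BUSY_DRIVES * DRIVE_MB_S
fraction = load_mb_s / link_mb_s

print(f"link: {link_mb_s} MB/s, drives: {load_mb_s} MB/s, "
      f"share of link: {fraction:.1%}")
```

With these figures the three drives could use well under a tenth of the link. This only bounds raw bandwidth; it says nothing about command rate, queue handling, or firmware behaviour.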


I will note that the card does have 16 disks connected in total. I think the card is in IT mode, since it is operating as a JBOD, presenting each drive directly to the OS as a 'da' device. The only option I am familiar with in the BIOS is the 'Boot mode', which can be PC only, BIOS only, PC/BIOS, or Disabled; I have tried a few of those with the same crash.


I will look for a suitable 8888ELP card and see about trying to put drives on that instead, starting with the loaded ones.


I will also try shuffling the cables around.


There is also the question of why the card is not responding to the reinitialisation command. I guess the card itself could be crashing if, for example, I am wrong about the load and it is too much. This failure is what is causing the system crash. Could it be caused by bad ports or a bad cable?



Also, I was curious whether I might run into any issues plugging these three consumer SATA drives (mentioned above) into the SAS connectors on the motherboard, since the SATA connectors are occupied. I had removed all the drives from the SAS connectors, because I appeared to be having some issues with my partition labels being corrupted or not visible, though many of these drives are >2TB. If the 500GB drives are OK to plug into SAS, that might provide more time between crashes (so a partial workaround) but also might give some clues since we would be taking the drives with the presumed problematic load off of the HBA card.

If you don't know which firmware mode is installed on your LSI card, STOP and do nothing else until you do know.


Simply entering the card's BIOS will give you this information; if it offers to create a RAID in its BIOS, you have "IR" firmware installed.


Your statement that you have 16 disks in JBOD suggests you should be using "IT" firmware.


The LSI 9201 card can do JBOD while running RAID firmware, but it's not recommended, as the RAID firmware adds a layer onto the drives' I/O and can cause major issues with software RAID products like ZFS.


The LSI 8888ELP card is a RAID-only card (no JBOD) using external ports (ELP = ext ports).


I recommend you look into buying an LSI 9201/9211/9212 card off eBay that has the "IT" firmware already installed, or an Adaptec ASR-71605. Make sure the Adaptec comes with the optional battery backup/cache module. You will also need the correct SFF-8087 FORWARD BREAKOUT CABLE(S), card to drive(s).

Do not use the reverse type unless you are connecting the card to a backplane.







OK, I confirmed that it is IT firmware. For reference here is output from the sas2flash utility.


        Adapter Selected is a LSI SAS: SAS2116_1(B1) 

        Controller Number              : 0
        Controller                     : SAS2116_1(B1) 
        PCI Address                    : 00:58:00:00
        SAS Address                    : 500062b-2-00de-1940
        NVDATA Version (Default)       :
        NVDATA Version (Persistent)    :
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               :
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9201-16e
        BIOS Version                   :
        UEFI BSD Version               :
        FCODE Version                  : N/A
        Board Name                     : SAS9201-16e
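
In case it helps anyone else checking this, the IT/IR mode can be read off the "Firmware Product ID" line programmatically. This is a minimal sketch in Python that assumes the listing has the "Key : Value" layout shown above and has already been saved to a string; how the listing is produced is not covered here. `parse_sas2flash_listing` and `firmware_mode` are hypothetical helper names, and the sample is a shortened copy of the output above.

```python
import re

# Shortened copy of the listing above, used as sample input.
SAMPLE = """\
        Controller Number              : 0
        Controller                     : SAS2116_1(B1)
        PCI Address                    : 00:58:00:00
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               :
        Board Name                     : SAS9201-16e
"""


def parse_sas2flash_listing(text: str) -> dict:
    """Split 'Key : Value' lines into a dict; empty values are kept as ''."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")   # split at the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def firmware_mode(fields: dict):
    """Return 'IT' or 'IR' from the Firmware Product ID field, else None."""
    match = re.search(r"\((IT|IR)\)", fields.get("Firmware Product ID", ""))
    return match.group(1) if match else None


fields = parse_sas2flash_listing(SAMPLE)
print(fields["Board Name"], firmware_mode(fields))
```

Splitting at the first colon keeps values such as the PCI address intact, since none of the field names in this listing contain a colon themselves.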




Again, the LSI 9201/9211/9210/9240 cards are all basically the same hardware, and although three different ROC chips were used, all have the same basic feature set (LSI 2008: "IT" or "IR" firmware / 2116: "IT" firmware only / 2308: PCIe 3.0).


I don't use Linux or Unix, so I can't comment on the OS's ability to do sustained I/O to numerous disks at the same time.


Remember, these cards are LOW END "basic" cards used to do RAID 0/1 and/or expand a system's SAS/SATA ports.


Depending on what you are using the card for, you may be exceeding the card's I/O ability or the OS's ability to provide an uninterrupted data flow, or the application itself may be at fault.


Consider adding another 9201 to split the load between two cards or, as I mentioned, replacing the card with one that has better specs, like the Adaptec card mentioned above.


As for a hardware issue with the xw9400: I don't think so, as this system came out quite a few years ago, and if there were any hardware issues they would have been found and corrected/documented a long time ago.

