P812 “forgot” RAID6 – can’t see disks at boot time in ORCA

We have an MSA60 with twelve 4TB non-HP-issue Seagate Constellation ES.3 drives connected to a P812 (with FBWC). I created a RAID6 across all of those disks with hpacucli and started copying data onto it.

I also pulled one of the drives early in the copy and replaced it, to see how badly a RAID6 rebuild would affect write (and later read) performance in our production scenario. It wasn't too bad, and the rebuild would have taken about 5 days. The rebuild of this drive had reached 75%.
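The "~5 days" figure corresponds to a fairly low average rebuild rate. A minimal sketch of that arithmetic, assuming the full 4TB (decimal bytes) has to be rewritten and the rate stays constant, which it would not exactly do while the copy was still running:

```python
# Back-of-the-envelope rebuild throughput.
# Assumption: the whole 4 TB drive (10**12 bytes per TB) is rewritten in ~5 days.
DRIVE_BYTES = 4 * 10**12          # one 4TB Constellation ES.3
REBUILD_SECONDS = 5 * 24 * 3600   # "~5 days" from the post


def rebuild_rate_mb_s(drive_bytes: int, seconds: int) -> float:
    """Average rebuild throughput in MB/s (10**6 bytes per MB)."""
    return drive_bytes / seconds / 10**6


if __name__ == "__main__":
    rate = rebuild_rate_mb_s(DRIVE_BYTES, REBUILD_SECONDS)
    print(f"~{rate:.1f} MB/s")  # prints "~9.3 MB/s"
```

So the controller was rebuilding at roughly 9 MB/s on average while also serving the copy workload.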

I then rebooted the DL385G7 running Debian Squeeze, to which the P812 is attached, and after the reboot the array on the P812 was gone. The array on the internal P410i was intact. hpacucli does see the drives, but lists them as unassigned. I googled a bit and found the suggestion that re-creating the array with identical settings would bring it back. I did that, but vgscan did not find the LVM volume.

I rebooted and went into ORCA. ORCA says that there are no volumes and no drives.

Now I'm a bit taken aback. What could be the problem? ORCA doesn't see the drives, but hpacucli does? Could this be why the logical drive (LD) that I created with hpacucli, and had already put data on, doesn't show up?

I have a replacement Mini-SAS cable and a replacement MSA60 I can experiment with. A replacement P812 will take a while.

How do I debug this? What chance do I have getting the data back without the use of an external forensics company?

Edit: OK, now hpacucli doesn't see the drives anymore either. I think I'll start by replacing the MSA60 enclosure.

Edit 2: OK, ignoring all the “you're only a professional if you have the money for the HP disk tax” snobbery, here is what happened:

  • I did not check whether the MSA60 was actually visible to the controller. This command would have told me all I needed:

    => ctrl slot=1 enclosure all show
    Error: The specified device does not have any storage enclosures.

  • After swapping the cable and the port on the P812, I cold-swapped the MSA60, and lo and behold, there was my array.
  • The disk whose rebuild had been at 70-something% is now marked “OK”, which prompted me to run filesystem checks. I suspect the controller will resume the rebuild after the initial rescan.
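The enclosure check above is easy to script so it runs before anyone starts re-creating arrays. A minimal sketch, assuming Python 3.7+, hpacucli on the PATH (run as root) and accepting the same command as command-line arguments; `enclosure_missing` and `check_enclosure` are hypothetical helper names, and the error string is the one quoted in the post:

```python
import subprocess

# Error text quoted in the post; other hpacucli versions may word it differently.
NO_ENCLOSURE_MSG = "does not have any storage enclosures"


def enclosure_missing(hpacucli_output: str) -> bool:
    """Return True if hpacucli says the controller sees no storage enclosure."""
    return NO_ENCLOSURE_MSG in hpacucli_output


def check_enclosure(slot: int = 1) -> bool:
    """Run `hpacucli ctrl slot=<slot> enclosure all show`.

    Returns True if the no-enclosure error is absent from the output.
    """
    result = subprocess.run(
        ["hpacucli", "ctrl", f"slot={slot}", "enclosure", "all", "show"],
        capture_output=True,
        text=True,
        check=False,  # hpacucli may exit non-zero on errors; we inspect the text
    )
    # Look at both streams, since it is not certain where the error is printed.
    return not enclosure_missing(result.stdout + result.stderr)
```

In the situation described here, `check_enclosure(1)` would have returned False, pointing at the cable, port or MSA60 rather than at the array metadata.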

Please note that I did not pull a disk “just for fun”. I pulled it to judge whether RAID6 would be sufficient for our production needs. I would encourage everybody to do this with any new configuration, whether it's storage, software or network equipment.

Answer

Your array is probably gone. I suspect you ran into a firmware issue; chances are your P812 controller wasn't running a current firmware revision. Also, the MSA60 went end-of-life back in 2008-2009.

  • Did you run any updates prior to configuring this array?
  • What version firmware is the Smart Array P812 controller running?
  • Is the MSA60 firmware at a current level?
  • Were these SAS or SATA drives?
  • What link speed did the drives negotiate? 1.5Gbps? 3Gbps?
  • Can you boot the Array Configuration Utility and run an HP ADU diagnostic report?
  • Finally, pull the power on everything. Let the drives and enclosure spin down. Try again.
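For the firmware question, the controller's firmware appears as a `Firmware Version:` line in the output of `hpacucli ctrl all show detail`. A minimal parsing sketch, assuming the usual layout of a `Smart Array <model> in Slot <n>` header followed by indented `Key: value` lines; the sample text is illustrative, its version strings are placeholders rather than real releases, and `firmware_by_slot` is a hypothetical helper:

```python
import re
from typing import Dict, Optional, Tuple

# Illustrative excerpt in the layout of `hpacucli ctrl all show detail`.
# The version strings are placeholders, not real firmware releases.
SAMPLE = """\
Smart Array P410i in Slot 0 (Embedded)
   Bus Interface: PCI
   Firmware Version: 0.00

Smart Array P812 in Slot 1
   Bus Interface: PCI
   Firmware Version: 9.99
"""

# A controller header line, e.g. "Smart Array P812 in Slot 1".
CTRL_RE = re.compile(r"^(Smart Array \S+) in Slot (\d+)")
# An indented "Firmware Version: x.yy" attribute line.
FW_RE = re.compile(r"^\s+Firmware Version:\s*(\S+)")


def firmware_by_slot(detail_output: str) -> Dict[int, Tuple[str, str]]:
    """Map controller slot -> (model, firmware version)."""
    found: Dict[int, Tuple[str, str]] = {}
    model: Optional[str] = None
    slot: Optional[int] = None
    for line in detail_output.splitlines():
        header = CTRL_RE.match(line)
        if header:
            model, slot = header.group(1), int(header.group(2))
            continue
        fw = FW_RE.match(line)
        # Keep only the first firmware line after each controller header.
        if fw and model is not None and slot is not None and slot not in found:
            found[slot] = (model, fw.group(1))
    return found


if __name__ == "__main__":
    for slot, (model, fw) in sorted(firmware_by_slot(SAMPLE).items()):
        print(f"slot {slot}: {model} firmware {fw}")
```

The extracted version can then be compared by hand against HP's release notes for the P812.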

Failures of the MSA60 and Smart Array controllers are very rare, so I think you ran into a bug. Using RAID6 (which is sub-optimal in most situations) with unsupported disks could be an issue, especially if they are SATA. If anything, I'd run them as RAID 1+0 to reduce the chance of controller issues.
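To put the RAID 1+0 suggestion in perspective for a 12 x 4TB shelf, the capacity cost can be sketched with the textbook formulas. This is a minimal sketch that ignores controller metadata and TB/TiB rounding; `usable_tb` is a hypothetical helper:

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Textbook usable capacity; ignores controller metadata and TB/TiB rounding."""
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        # Two disks' worth of space hold the P and Q parity.
        return (disks - 2) * disk_tb
    if level == "raid10":
        if disks < 2 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks")
        # Every block is mirrored, so half of the raw space is usable.
        return (disks // 2) * disk_tb
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for level in ("raid6", "raid10"):
        print(level, usable_tb(level, 12, 4), "TB")  # raid6 40 TB, raid10 24 TB
```

The trade-off: RAID 6 survives any two failed drives, while RAID 1+0 is only guaranteed to survive one (more if they land in different mirror pairs), but a RAID 1+0 rebuild copies from a single mirror partner instead of recomputing parity across the whole set.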

Potential problems fixed by firmware released over the last year:

  • Protection has been added to prevent potential Smart Array controller hangs under rare conditions when hot-adding hard drives.
  • On rare occasions, the Smart Array controller would reset the same SATA drive several times when a PHY was stuck for longer than four seconds.
  • The Smart Array controller would not connect to hard drives within 20ms under heavy stress.
  • Fixed an issue with the HP P812 controller in which a rare lockup (code 0xD4) could occur upon reboot.
  • Fixed an issue where after hot-adding a SATA disk to an MSA-60, MSA-70, or HP DL180-G6 12-drive backplane, the storage controller could become unresponsive. Reference Customer Advisory c03011608.
  • Fixed an issue where simultaneous handling of many Unrecoverable Read Errors on SATA disks supporting Native Command Queuing could result in a lockup (code 0x15).
  • RAID 6/60 surface analysis could result in background parity scans that stop responding while doing excessive fault tolerance calculations.
  • A Smart Array P812 controller attached to multiple MSA 60 storage systems may encounter a lockup condition (lockup code 0xAB) during heavy I/O workload.
  • After hot-replacement of an HP Smart Array HDD, all drives attached to the expander where the HDD was replaced report as being in Bay 0. The issue occurs on the HP StorageWorks MSA60, HP StorageWorks MSA70, and HP ProLiant DL180 G6 with 12-bay and 25-bay backplanes.

Attribution
Source: Link, Question Author: elpollodiablo, Answer Author: ewwhite
