
RamSan 500 Pictures / Failure


We had two DRAM cache modules fail in our RamSan 500 recently.  The actual error was:

Uncorrected ECC event detected on boards 4,6

The DRAM cache fronts the Flash storage drives in the unit, so losing these two boards took the unit down completely. To get it back up, boards 4 and 6 needed to be removed. RamSan actually had me remove four of the boards instead of just the two that were bad. I'm guessing it has something to do with not being able to leave empty slots between active modules.
Doing so took the DRAM cache from 64 GB down to 32 GB. All data on the Flash RAID array was lost, so we had to restore the Oracle database from backup. After the cards were removed and a low-level format of the RAID array was done, the unit was operational again. Below are pictures from inside the RamSan 500 for all the storage people out there.

RamSan Percentage Error. What?
Internal Batteries and Power Module
DRAM Cache (left) and Fiber Cards (right)
Internal Zoomed Out
Cache With Cards Removed
Cache DRAM module

The best part is that we had already racked two RamSan 630s that were being configured as a mirrored pair. The plan was to implement them in two weeks. When the 500 failed, they were implemented that night.


Filed under: Datacenter, Hardware, RamSan, SAN (Storage Area Network)
