Today one of my servers started sending me alerts about non-optimal RAID state. These were triggered by a very simple script run by cron -- if it detects that not all of the disks report 'Optimal' state, it sends an alert.
Now, the issue is that the RAID seems to be fine, but the megacli -LDInfo -Lall -aALL command invoked by the script fails repeatedly, leaving a cryptic error message in syslog:
megacli: Failed to alloc kernel SGL buffer for IOCTL.
The curious thing is that the command does work sometimes and does return output, but most of the time it just returns two blank lines and the exit code:
# megacli -LDInfo -Lall -aALL


Exit Code: 0x00
The same goes for megacli with other parameters, such as megacli -AdpAllInfo -aAll. Every time the command fails, the same error appears in syslog.
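For reference, the cron check boils down to something like the sketch below. This is an illustration rather than the exact script: the function name classify_ld_state and the three result labels are made up here, and it assumes megacli prints one "State : ..." line per logical drive. Treating empty output as its own case would keep a failed megacli call from being reported as a degraded array.

```shell
#!/bin/sh
# Classify `megacli -LDInfo -Lall -aALL` output read from stdin.
# Prints one of: NO_OUTPUT, DEGRADED, OPTIMAL.
# Assumption: each logical drive is reported on a line like "State : Optimal".
classify_ld_state() {
    # Keep only the per-logical-drive "State : ..." lines.
    states=$(grep -E '^State[[:space:]]*:' || true)
    if [ -z "$states" ]; then
        # No state lines at all, e.g. megacli returned only blank lines.
        echo NO_OUTPUT
    elif printf '%s\n' "$states" | grep -qvE ':[[:space:]]*Optimal[[:space:]]*$'; then
        # At least one logical drive reports something other than Optimal.
        echo DEGRADED
    else
        echo OPTIMAL
    fi
}

# Possible use from cron (hypothetical):
#   megacli -LDInfo -Lall -aALL | classify_ld_state
```

With this split, NO_OUTPUT could be retried or reported separately, while only DEGRADED would trigger the RAID alert.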
This has never happened before, as far as I can remember. No changes were made to the server recently. The adapter is a PERC 6/i Integrated and the server runs Debian Wheezy.
What could possibly be the issue and where do I start resolving this?
EDIT:
# megacli -v
MegaCLI SAS RAID Management Tool Ver 5.00.12 May 08, 2009
(c)Copyright 2009, LSI Corporation, All Rights Reserved.
Exit Code: 0x00
At least this command works every time without triggering the error ;) I've just realised this is an old release of megacli. Still, it shouldn't matter, since the very same setup has been working for a couple of dozen months with no problems and has only now suddenly decided to go wild.