[Grml] How to use GRML to check whether a hard disk is failing
Michael Whapples
mwhapples at aim.com
Tue Dec 22 21:24:44 CET 2009
Thanks, I had noticed smartctl but the amount of output was a bit much
at first.
Michael Whapples
On 22/12/09 19:06, David Maus wrote:
> At Tue, 22 Dec 2009 15:33:30 +0000,
> Michael Whapples wrote:
>
>> Hello,
>> I am wondering whether GRML can help me here. I have agreed to check a
>> computer (tomorrow) for someone because it isn't booting properly (it's a
>> Windows XP computer). By the sound of it, I suspect the hard disk is
>> failing or has failed completely, or that Windows has become so corrupted
>> that it won't boot.
>>
> IMHO the first question should be how important the data on the hard
> disk is. If the HDD sounds unhealthy, chances are good that it has a
> mechanical defect that gets worse and destroys data simply by spinning
> the discs. I personally refuse to check computers whose hard disks
> make unhealthy noises.
>
> If you decide to check it, my second step would be booting grml and
> making a backup of the drive using ddrescue.
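One common way to run that backup with GNU ddrescue is two passes: first copy everything that reads easily, then go back for the bad areas. The sketch below assumes GNU ddrescue is available on the live system; `rescue_disk` and `run_cmd` are hypothetical helper names, and `/dev/sdX` and the image paths are placeholders. The image must be written to a different, healthy disk.

```shell
#!/bin/sh
# Minimal sketch, not a tested recovery procedure. rescue_disk and
# run_cmd are hypothetical helper names; device and paths are placeholders.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Image SRC into IMAGE with GNU ddrescue, recording progress in MAPFILE
# so that an interrupted copy can be resumed by rerunning the function.
rescue_disk() {
    src=$1 image=$2 mapfile=$3
    # Pass 1: grab everything that reads easily; -n tells ddrescue not
    # to work through the bad areas yet.
    run_cmd ddrescue -n "$src" "$image" "$mapfile" || return
    # Pass 2: return to the bad areas using direct disc access (-d),
    # with up to three retry passes (-r3).
    run_cmd ddrescue -d -r3 "$src" "$image" "$mapfile"
}
```

For example, as root: `rescue_disk /dev/sdX /mnt/backup/disk.img /mnt/backup/disk.map`. Prefixing the call with `DRY_RUN=1` only prints the two ddrescue commands, which is a cheap way to double-check the device name before touching the disk.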
>
> To check the HDD I would use smartmontools, which queries the internal
> log of the hard disk.
>
> smartctl -a /dev/<disk>
>
> This displays an overview of the hard disk's state. I normally check the line
>
> SMART overall-health self-assessment test result:
>
> and, among the SMART attributes,
>
> - 196: Reallocated_Event_Count
>
> Physically damaged sectors are reallocated; it's okay if this
> happens occasionally, but an increasing number of reallocated sectors
> means trouble ahead.
>
> - 197: Current_Pending_Sector
>
> Pending sectors are sectors that are marked for reallocation but
> can't be reallocated for some reason.
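The lines mentioned above (the overall-health verdict and the raw values of attributes 196 and 197) can be filtered out of the long `smartctl -a` output. This is a minimal sketch assuming the usual ATA attribute table layout, where RAW_VALUE is the 10th column; other drive types print different tables. `smart_summary` is a hypothetical helper name and `/dev/sdX` a placeholder.

```shell
#!/bin/sh
# Minimal sketch: filter `smartctl -a` output (read from stdin) down to
# the overall health verdict and the raw values of attributes 196 and 197.
# Assumes the ATA attribute table layout; smart_summary is a hypothetical
# helper name.
smart_summary() {
    awk '
        # "SMART overall-health self-assessment test result: PASSED"
        /SMART overall-health self-assessment test result:/ {
            sub(/.*result: */, "")
            print "health: " $0
        }
        # Attribute rows: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
        # TYPE UPDATED WHEN_FAILED RAW_VALUE, so the raw value is field 10.
        $1 == "196" || $1 == "197" { print $2 ": " $10 }
    '
}
```

Usage, as root: `smartctl -a /dev/sdX | smart_summary`. As noted below, what the raw values mean still depends on the manufacturer, so treat the output as a starting point rather than a verdict.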
>
> Please be aware that the attribute table is hard to interpret because
> what most of the values actually /mean/ depends on the hard disk
> manufacturer. For instance, on a "Seagate Barracuda 7200.10 family"
> drive it is normal for the raw value of attribute 1 (Raw_Read_Error_Rate)
> to be about 124438548.
>
> In my practical experience as a sysadmin, attributes 196 and
> 197 are good indicators of failing HDDs.
>
> You may also start an internal self-test of the HDD (smartctl -t); the
> available test routines depend on the HDD model, but I would try a
> long self-test (smartctl -t long).
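Note that `smartctl -t long /dev/sdX` only starts the test and prints an estimate of how long to wait; the result appears afterwards in the self-test log (`smartctl -l selftest /dev/sdX`). The filter below is a sketch that pulls out the status of the newest entry. It assumes the usual log layout, where the newest entry is labelled `# 1` and the test description is two words (for example "Extended offline"); `latest_selftest_status` is a hypothetical helper name and `/dev/sdX` a placeholder.

```shell
#!/bin/sh
# Minimal sketch: print the status of the newest entry in the output of
# `smartctl -l selftest` (read from stdin). Assumes the newest entry is
# labelled "# 1" and that its description is two words, so the status
# starts at field 5 and ends just before the "NN%" remaining column.
latest_selftest_status() {
    awk '$1 == "#" && $2 == "1" {
        status = ""
        for (i = 5; i <= NF && $i !~ /^[0-9]+%$/; i++)
            status = status (status == "" ? "" : " ") $i
        print status
        exit
    }'
}
```

Usage, as root, once the estimated time has passed: `smartctl -l selftest /dev/sdX | latest_selftest_status`. Anything other than "Completed without error" deserves a closer look at the full log.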
>
> Having had to debug a failing HDD recently, I can only stress that
> whatever you do, you should check the SMART values occasionally. In my
> case I noticed an increasing rate of reallocated sectors while trying to
> fix the filesystem.
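Spotting such an increase requires keeping the values from earlier checks. A minimal sketch of the comparison, assuming you note down the raw value of each attribute (in a text file, say) every time you check; `check_trend` is a hypothetical helper, and the numbers in the example are made up.

```shell
#!/bin/sh
# Hypothetical helper: compare a SMART attribute's raw value from an
# earlier check with the current one. Prints a warning and returns 1 if
# the value went up, otherwise prints a short note and returns 0.
# Arguments: attribute name, previous raw value, current raw value
# (plain integers assumed).
check_trend() {
    name=$1 old=$2 new=$3
    if [ "$new" -gt "$old" ]; then
        echo "WARNING: $name rose from $old to $new"
        return 1
    fi
    echo "$name: $old -> $new (no increase)"
}
```

For example, `check_trend Current_Pending_Sector 0 8` prints `WARNING: Current_Pending_Sector rose from 0 to 8` and returns 1, so it can also be used in an `if` or from a cron job.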
>
> As for how to check and/or fix a broken NTFS filesystem, I am
> lost.
>
> HTH
>
> -- David
>
>