Kernel errors in /var/log/messages

Sun Dec 19 18:06:35 PST 2004

On Sunday 19 December 2004 9:07 am, someone claiming to be Matthew Carpenter 
wrote:
> Net Llama! wrote:
> | On 12/18/2004 07:33 AM, Tim Wunder wrote:
> |> Using FC3 (kernel 2.6.9-1.681_FC3) with XFS filesystem on an old
> |> K6-500 system with a fairly new Seagate 80 GB HDD. I'm trying to copy
> |> a couple websites over from one server to another using tar, and when
> |> untar'ing the file, I keep getting errors like this:
> |>
> |> tar: pn/html/parameters/Sports/modules/NS-Banners/lang/eng: Cannot mkdir: Input/output error
> |>
> |> /var/log/messages has the following:
> |> Dec 18 10:19:04 jeeves kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> |> Dec 18 10:19:04 jeeves kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=96175142, high=5, low=12289062, sector=96175142
> |> Dec 18 10:19:04 jeeves kernel: ide: failed opcode was: unknown
> |> Dec 18 10:19:04 jeeves kernel: end_request: I/O error, dev hda, sector 96175142
> |> Dec 18 10:19:04 jeeves kernel: I/O error in filesystem ("hda8") meta-data dev hda8 block 0x49b7c10       ("xfs_trans_read_buf") error 5 buf count 4096
> |> Dec 18 10:19:08 jeeves kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> |> Dec 18 10:19:08 jeeves kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=70408902, high=4, low=3300038, sector=70408902
> |> Dec 18 10:19:08 jeeves kernel: ide: failed opcode was: unknown
> |> Dec 18 10:19:08 jeeves kernel: end_request: I/O error, dev hda, sector 70408902
> |> Dec 18 10:19:08 jeeves kernel: I/O error in filesystem ("hda8") meta-data dev hda8 block 0x31252b2       ("xfs_trans_read_buf") error 5 buf count 512
> |> Dec 18 10:19:12 jeeves kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> |> Dec 18 10:19:12 jeeves kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=96175142, high=5, low=12289062, sector=96175142
> |> Dec 18 10:19:12 jeeves kernel: ide: failed opcode was: unknown
> |> Dec 18 10:19:12 jeeves kernel: end_request: I/O error, dev hda, sector 96175142
> |> Dec 18 10:19:12 jeeves kernel: I/O error in filesystem ("hda8") meta-data dev hda8 block 0x49b7c10       ("xfs_trans_read_buf") error 5 buf count 4096
> |> Dec 18 10:19:16 jeeves kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> |>
> |> FWIW:
> |> # hdparm /dev/hda
> |>
> |> /dev/hda:
> |>  multcount    = 16 (on)
> |>  IO_support   =  1 (32-bit)
> |>  unmaskirq    =  1 (on)
> |>  using_dma    =  1 (on)
> |>  keepsettings =  0 (off)
> |>  readonly     =  0 (off)
> |>  readahead    = 256 (on)
> |>  geometry     = 16383/255/63, sectors = 80026361856, start = 0
> |>
> |> # hdparm -i /dev/hda
> |>
> |> /dev/hda:
> |>
> |>  Model=ST380013A, FwRev=3.06, SerialNo=5JV8TL2L
> |>  Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
> |>  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
> |>  BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
> |>  CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156301488
> |>  IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
> |>  PIO modes:  pio0 pio1 pio2 pio3 pio4
> |>  DMA modes:  mdma0 mdma1 mdma2
> |>  UDMA modes: udma0 udma1 udma2 udma3 *udma4 udma5
> |>  AdvancedPM=no WriteCache=enabled
> |>  Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:
> |>
> |>  * signifies the current active mode
> |>
> |> Any idea what's going on?
> |
> | Your drive is dying?  Try running xfs_repair on the unmounted partition
> | and see if that helps any, but at this point, hope you have backups of
> | whatever is important on that drive.
>
> I agree. The only times (thankfully) I've seen the above messages have
> been on a dying CDROM device (and sometimes I believe just a bad
> CD-disc).  Replacing the drive made the messages go away.  So sorry to
> hear about your loss :(
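In the quoted `dma_intr` lines, the `LBAsect`, `high` and `low` fields describe a single address split into two parts. The numbers show that LBAsect = high × 2^24 + low (5 × 16777216 + 12289062 = 96175142), so `low` carries the bottom 24 bits. The same addresses also appear in the `sector=` field and in the `end_request` lines.

The Python sketch below checks the split. It assumes each log entry has been rejoined onto a single line. `decode` and `check_split` are hypothetical helper names and are not part of any tool.

```python
import re

# Two of the quoted dma_intr entries, each rejoined onto a single line.
LOG = """\
Dec 18 10:19:04 jeeves kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=96175142, high=5, low=12289062, sector=96175142
Dec 18 10:19:08 jeeves kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=70408902, high=4, low=3300038, sector=70408902
"""

# Captures the four numeric fields that follow the error flags.
FIELDS = re.compile(
    r"LBAsect=(?P<lba>\d+), high=(?P<high>\d+), low=(?P<low>\d+), sector=(?P<sector>\d+)"
)


def decode(line):
    """Return (lba, high, low, sector) from a dma_intr line, or None if absent."""
    m = FIELDS.search(line)
    if m is None:
        return None
    return tuple(int(m.group(k)) for k in ("lba", "high", "low", "sector"))


def check_split(lba, high, low):
    """True if lba equals high shifted left by 24 bits plus a 24-bit low part."""
    return low < (1 << 24) and (high << 24) + low == lba


for line in LOG.splitlines():
    fields = decode(line)
    if fields is not None:
        lba, high, low, sector = fields
        print(lba, check_split(lba, high, low), lba == sector)
```

Each line should print the sector followed by `True True`. That means the 24-bit split holds and `LBAsect` matches `sector`.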

Well, for kicks, I unmounted /dev/hda8 and rebuilt the XFS filesystem on it. I now get:
Dec 19 14:12:22 jeeves smartd[2083]: Device: /dev/hda, 18 Currently unreadable (pending) sectors
Dec 19 14:42:21 jeeves smartd[2083]: Device: /dev/hda, 18 Currently unreadable (pending) sectors
Dec 19 15:12:22 jeeves smartd[2083]: Device: /dev/hda, 18 Currently unreadable (pending) sectors

repeatedly, but otherwise the disk seems to be operating fine.
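smartd repeats the same message roughly every 30 minutes (14:12, 14:42, 15:12 above). During a trial run, the useful question is whether the pending count changes.

Below is a minimal Python sketch. It assumes smartd keeps writing one line per check in exactly the format shown above. `pending_counts` and `trend` are hypothetical helpers, and `trend` only compares the first and last readings.

```python
import re

# The three smartd entries above, each rejoined onto a single line.
LOG = """\
Dec 19 14:12:22 jeeves smartd[2083]: Device: /dev/hda, 18 Currently unreadable (pending) sectors
Dec 19 14:42:21 jeeves smartd[2083]: Device: /dev/hda, 18 Currently unreadable (pending) sectors
Dec 19 15:12:22 jeeves smartd[2083]: Device: /dev/hda, 18 Currently unreadable (pending) sectors
"""

# Captures the syslog timestamp, the device name and the pending-sector count.
PENDING = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) \S+ smartd\[\d+\]: "
    r"Device: (?P<dev>[^,\s]+), (?P<count>\d+) Currently unreadable \(pending\) sectors"
)


def pending_counts(text, device="/dev/hda"):
    """Return [(timestamp, count), ...] for one device, in log order."""
    out = []
    for line in text.splitlines():
        m = PENDING.match(line)
        if m and m.group("dev") == device:
            out.append((m.group("stamp"), int(m.group("count"))))
    return out


def trend(counts):
    """Compare the first and last readings: 'growing', 'shrinking', 'stable' or 'no data'."""
    if not counts:
        return "no data"
    first, last = counts[0][1], counts[-1][1]
    if last > first:
        return "growing"
    if last < first:
        return "shrinking"
    return "stable"


counts = pending_counts(LOG)
print(counts)
print(trend(counts))
```

To use it for the trial run, point it at a week of /var/log/messages instead of the three sample lines. A `growing` result would mean new unreadable sectors are appearing. A `stable` result only says the count has not changed, not that those sectors have become readable.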

# xfs_check /dev/hda8
results in no output.

# xfs_repair /dev/hda8
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - deleting existing "lost+found" entry
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

So... is the disk bad or not?
No errors were ever reported on any of the other partitions:
# fdisk -l /dev/hda

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1         182     1461883+  83  Linux
/dev/hda2             183        9729    76686277+   5  Extended
/dev/hda5             183         275      746991   82  Linux swap
/dev/hda6             276        1029     6056473+  83  Linux
/dev/hda7            1030        1175     1172713+  83  Linux
/dev/hda8            1176        9729    68709973+  83  Linux
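The fdisk table can also show which partition the bad sectors sit in. With 255 heads and 63 sectors/track, one cylinder is 16065 sectors (16065 × 512 = 8225280 bytes, as the `Units` line says). So cylinder c starts at absolute sector (c − 1) × 16065.

The Python sketch below treats each partition as whole cylinders, which is an approximation. In the classic DOS layout, the first track of a partition's first cylinder typically holds the MBR or an extended boot record.

The sketch also cross-checks the first XFS message. It assumes that XFS reports `block` in 512-byte units counted from the start of hda8. `cyl_range` and `partition_of` are hypothetical names.

```python
# From fdisk -l /dev/hda: 255 heads x 63 sectors/track = 16065 sectors per cylinder.
SECTORS_PER_CYL = 255 * 63
TRACK = 63  # sectors per track

# (name, first cylinder, last cylinder) copied from the fdisk table.
# hda2 is the extended container (Id 5), so it is left out; its logical
# partitions hda5..hda8 are listed instead.
PARTITIONS = [
    ("hda1", 1, 182),
    ("hda5", 183, 275),
    ("hda6", 276, 1029),
    ("hda7", 1030, 1175),
    ("hda8", 1176, 9729),
]


def cyl_range(first, last):
    """Absolute sector range [start, end) covered by cylinders first..last (1-based)."""
    return (first - 1) * SECTORS_PER_CYL, last * SECTORS_PER_CYL


def partition_of(sector):
    """Name of the partition whose cylinders contain the absolute sector, or None."""
    for name, first, last in PARTITIONS:
        start, end = cyl_range(first, last)
        if start <= sector < end:
            return name
    return None


for sector in (96175142, 70408902):  # sectors from the end_request lines
    print(sector, partition_of(sector))

# Cross-check with the first XFS message: if "block" counts 512-byte units
# from the start of hda8, the difference is where hda8's data begins on disk.
xfs_block = 0x49B7C10
offset = 96175142 - xfs_block
hda8_cyl_start, _ = cyl_range(1176, 9729)
print(offset, offset - hda8_cyl_start)  # sectors past the start of cylinder 1176
```

On this data, both failing sectors fall inside hda8's cylinders. The offset comes out 63 sectors (one track) past the start of cylinder 1176, which is where a logical partition's data would be expected to begin.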

I'll run the machine for another week or so before I say the disk is bad.

Thanks, 
Tim

-- 
Fedora Core 2, Kernel 2.6.9-1.6_FC2,  KDE 3.3.1, Xorg 6.7.0
 18:00:00 up 11 days, 23 min,  3 users,  load average: 1.60, 14.13, 20.44
It's what you learn after you know it all that counts