Pending SCSI hardware failure?
Tim Wunder
tim
Mon May 17 11:47:51 PDT 2004
I came in this morning to a hung server. It wouldn't respond to pings,
and I couldn't access the console, so I had to pull the plug :-(
(Stupid me didn't try to get ssh access, but I doubt it would have worked.)
After looking at /var/log/messages, I find the following:
At around 1:00 AM, I get a bunch of these:
May 27 01:05:56 localserver kernel: (scsi0:0:5:-1) Unexpected busfree, LASTPHASE = 0xe0, SEQADDR = 0xcc
May 27 01:05:56 localserver kernel: (scsi0:0:5:0) Invalid SCB during SEQINT 0x71, SCB_TAG 255.
May 27 01:05:56 localserver kernel: (scsi0:0:3:0) No active SCB for reconnecting target - Issuing BUS DEVICE RESET.
May 27 01:05:56 localserver kernel: (scsi0:0:3:0) SAVED_TCL=0x30, ARG_1=0x1, SEQADDR=0x102
Followed by
May 27 01:06:26 localserver kernel: scsi : aborting command due to timeout : pid 22992662, scsi0, channel 0, id 5, lun 0 Request Sense 00 00 00 10 00
May 27 01:06:26 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.
May 27 01:06:56 localserver kernel: SCSI host 0 abort (pid 22992662) timed out - resetting
May 27 01:06:56 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 01:06:58 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 01:06:58 localserver kernel: SCSI bus is being reset for host 0 channel 0.
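To see which devices the errors cluster on, one could tally the bracketed tags in the excerpt above. A minimal Python sketch, assuming the tag reads (scsiHOST:CHANNEL:ID:LUN) with -1 where the driver didn't know a field; the function name `target_ids` is hypothetical, and the sample lines are the tagged entries from the excerpt with the mail wrapping undone.

```python
import re
from collections import Counter

# Tagged kernel lines from the excerpt above (mail wrapping undone).
LOG = """\
May 27 01:05:56 localserver kernel: (scsi0:0:5:-1) Unexpected busfree, LASTPHASE = 0xe0, SEQADDR = 0xcc
May 27 01:05:56 localserver kernel: (scsi0:0:5:0) Invalid SCB during SEQINT 0x71, SCB_TAG 255.
May 27 01:05:56 localserver kernel: (scsi0:0:3:0) No active SCB for reconnecting target - Issuing BUS DEVICE RESET.
May 27 01:05:56 localserver kernel: (scsi0:0:3:0) SAVED_TCL=0x30, ARG_1=0x1, SEQADDR=0x102
May 27 01:06:26 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.
"""

# Assumed layout: (scsiHOST:CHANNEL:ID:LUN); any field may be -1.
TAG = re.compile(r"\(scsi(-?\d+):(-?\d+):(-?\d+):(-?\d+)\)")

def target_ids(log: str) -> Counter:
    """Count tagged messages per SCSI target ID (host, channel, LUN ignored)."""
    counts = Counter()
    for line in log.splitlines():
        m = TAG.search(line)
        if m:
            counts[int(m.group(3))] += 1
    return counts

if __name__ == "__main__":
    for target, n in sorted(target_ids(LOG).items()):
        print(f"id {target}: {n} message(s)")
```

On this excerpt only ids 3 and 5 on scsi0 show up, which matches the tape drive and CD-ROM in the /proc/scsi/scsi listing further down.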
Cron seems to still be working though:
May 27 01:10:00 localserver CROND[31532]: (root) CMD ( /sbin/rmmod -as)
May 27 01:10:00 localserver CROND[31533]: (root) CMD (/usr/local/squid/bin/squid -k rotate)
May 27 01:10:00 localserver CROND[31535]: (root) CMD (/usr/local/bin/webcal_remind.pl >> /var/webcal/message.log 2>&1)
Then I have a bunch of these:
May 27 01:17:38 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 01:17:38 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 01:17:38 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.
May 27 01:17:38 localserver kernel: (scsi0:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.
until around 2:30, when I get a bunch of these:
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 1, SCB flags 0x0, cmd 0x0
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 0, SCB flags 0x0, cmd 0x0
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 1, SCB flags 0x0, cmd 0x0
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 0, SCB flags 0x0, cmd 0x0
followed by
May 27 02:36:56 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 0, SCB flags 0x0, cmd 0x0
May 27 02:36:56 localserver kernel: st0: Error with sense data: [valid=0] Info fld=0x0, Current st09:00: sense key Unit Attention
May 27 02:36:56 localserver kernel: Additional sense indicates Power on, reset, or bus device reset occurred
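The st0 (tape) message reports sense key "Unit Attention" with additional sense "Power on, reset, or bus device reset occurred", i.e. the drive saying it noticed a reset. A minimal Python sketch of how one might triage a logged sense key by name. The key values follow the SCSI Primary Commands (SPC) sense key table; the split into "possible fault" versus "state change" is a rough assumption for illustration, and `triage` is a hypothetical helper.

```python
# Sense key values as listed in the SCSI Primary Commands (SPC) standard.
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0x9: "Vendor Specific",
    0xA: "Copy Aborted",
    0xB: "Aborted Command",
    0xD: "Volume Overflow",
    0xE: "Miscompare",
}

# Assumption for illustration: keys that usually implicate the device or medium.
DEVICE_FAULT_KEYS = {0x3, 0x4}

def triage(sense_key_name: str) -> str:
    """Map a sense key name as printed in the log to a rough verdict."""
    by_name = {name.lower(): code for code, name in SENSE_KEYS.items()}
    code = by_name.get(sense_key_name.strip().lower())
    if code is None:
        return "unknown sense key"
    if code in DEVICE_FAULT_KEYS:
        return f"0x{code:X}: possible device/medium fault"
    if code == 0x6:
        return f"0x{code:X}: state change (e.g. after a reset), not a fault by itself"
    return f"0x{code:X}: see the SPC sense key table"

if __name__ == "__main__":
    print(triage("Unit Attention"))
```

Read this way, the tape drive's Unit Attention looks like a side effect of the bus resets logged earlier rather than evidence on its own that the drive is failing.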
then entries like this...
May 27 02:37:20 localserver kernel: SCSI host 0 abort (pid 22992662) timed out - resetting
May 27 02:37:22 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 02:37:22 localserver kernel: (scsi0:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.
May 27 02:37:54 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 02:37:54 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 02:37:54 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.
until May 27 04:06:58
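Every abort and reset message above carries the same "pid 22992662". That number looks like the SCSI layer's command serial number rather than a Unix process ID (it is far larger than the CROND PIDs logged at 01:10, around 31500), so the repeats suggest one command, the Request Sense to id 5 from the first timeout message, never completed and the error handler kept retrying it. A minimal Python sketch to check that, assuming the log format shown here; `stuck_pids` is a hypothetical helper.

```python
import re

# Abort/reset lines from the excerpts above (mail wrapping undone).
LOG = """\
May 27 01:06:26 localserver kernel: scsi : aborting command due to timeout : pid 22992662, scsi0, channel 0, id 5, lun 0 Request Sense 00 00 00 10 00
May 27 01:06:56 localserver kernel: SCSI host 0 abort (pid 22992662) timed out - resetting
May 27 01:06:58 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 01:17:38 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 02:37:20 localserver kernel: SCSI host 0 abort (pid 22992662) timed out - resetting
May 27 02:37:54 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
"""

PID = re.compile(r"pid (\d+)")

def stuck_pids(log: str) -> dict:
    """Map each command pid to the syslog timestamps (first 15 chars) it appears at."""
    seen = {}
    for line in log.splitlines():
        m = PID.search(line)
        if m:
            seen.setdefault(int(m.group(1)), []).append(line[:15])
    return seen

if __name__ == "__main__":
    for pid, stamps in stuck_pids(LOG).items():
        print(f"pid {pid}: {len(stamps)} messages, {stamps[0]} to {stamps[-1]}")
```

A single key spanning more than an hour and a half of resets is what one would expect if the same command stayed stuck the whole time.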
FWIW:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: ARCHIVE Model: Python 06408-XXX Rev: 8130
Type: Sequential-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 05 Lun: 00
Vendor: NEC Model: CD-ROM DRIVE:466 Rev: 1.06
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: DELL Model: PERCRAID RAID5 Rev: 0001
Type: Direct-Access ANSI SCSI revision: 02
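To line the log tags up with real devices, one could parse /proc/scsi/scsi into a table keyed by (host, channel, id, lun). A minimal Python sketch, assuming the field layout shown above (it tolerates both single and padded spacing); `parse_proc_scsi` and the regexes are hypothetical.

```python
import re

# The /proc/scsi/scsi output quoted above.
PROC_SCSI = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: ARCHIVE Model: Python 06408-XXX Rev: 8130
Type: Sequential-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 05 Lun: 00
Vendor: NEC Model: CD-ROM DRIVE:466 Rev: 1.06
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: DELL Model: PERCRAID RAID5 Rev: 0001
Type: Direct-Access ANSI SCSI revision: 02
"""

HOST = re.compile(r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
VENDOR = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S+)")
TYPE = re.compile(r"Type:\s*(.*?)\s+ANSI")

def parse_proc_scsi(text: str) -> dict:
    """Return {(host, channel, id, lun): {vendor, model, rev, type}}."""
    devices = {}
    key = None
    for line in text.splitlines():
        if m := HOST.search(line):
            key = tuple(int(g) for g in m.groups())
            devices[key] = {}
        elif key is not None and (m := VENDOR.search(line)):
            devices[key].update(vendor=m.group(1), model=m.group(2), rev=m.group(3))
        elif key is not None and (m := TYPE.search(line)):
            devices[key]["type"] = m.group(1)
    return devices

if __name__ == "__main__":
    devs = parse_proc_scsi(PROC_SCSI)
    for key in [(0, 0, 3, 0), (0, 0, 5, 0)]:
        d = devs[key]
        print("scsi%d:%d:%d:%d" % key, "->", d["vendor"], d["model"], f"({d['type']})")
```

With this mapping, the scsi0:0:3 and scsi0:0:5 tags in the log resolve to the ARCHIVE tape drive and the NEC CD-ROM. The PERC RAID on scsi1 doesn't appear in any of the quoted errors.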
Does this mean my SCSI controller for my tape drive and CD-ROM is going
(or gone)?
Clueful insight welcome (since I seem to be lacking in cluefulness
lately...).
Thanks,
Tim
More information about the Linux-users
mailing list