Pending SCSI hardware failure?

Tim Wunder
Mon May 17 11:47:51 PDT 2004


I came in this morning to a hung server. It would not respond to pings and
I couldn't access the console, so I had to pull the plug :-(
(Stupid me didn't try to get ssh access, but I doubt it would have worked.)

Looking through /var/log/messages, I find the following.
At around 1:00 AM, I get a bunch of these:
May 27 01:05:56 localserver kernel: (scsi0:0:5:-1) Unexpected busfree, LASTPHASE = 0xe0, SEQADDR = 0xcc
May 27 01:05:56 localserver kernel: (scsi0:0:5:0) Invalid SCB during SEQINT 0x71, SCB_TAG 255.
May 27 01:05:56 localserver kernel: (scsi0:0:3:0) No active SCB for reconnecting target - Issuing BUS DEVICE RESET.
May 27 01:05:56 localserver kernel: (scsi0:0:3:0)       SAVED_TCL=0x30, ARG_1=0x1, SEQADDR=0x102

Followed by:
May 27 01:06:26 localserver kernel: scsi : aborting command due to timeout : pid 22992662, scsi0, channel 0, id 5, lun 0 Request Sense 00 00 00 10 00
May 27 01:06:26 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.
May 27 01:06:56 localserver kernel: SCSI host 0 abort (pid 22992662) timed out - resetting
May 27 01:06:56 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 01:06:58 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 01:06:58 localserver kernel: SCSI bus is being reset for host 0 channel 0.

Cron seems to still be working, though:
May 27 01:10:00 localserver CROND[31532]: (root) CMD (   /sbin/rmmod -as)
May 27 01:10:00 localserver CROND[31533]: (root) CMD (/usr/local/squid/bin/squid -k rotate)
May 27 01:10:00 localserver CROND[31535]: (root) CMD (/usr/local/bin/webcal_remind.pl >> /var/webcal/message.log 2>&1)

Then I have a bunch of these:
May 27 01:17:38 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 01:17:38 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 01:17:38 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.
May 27 01:17:38 localserver kernel: (scsi0:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.

until around 2:30, when I get a bunch of these:
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 1, SCB flags 0x0, cmd 0x0
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 0, SCB flags 0x0, cmd 0x0
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 1, SCB flags 0x0, cmd 0x0
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 0, SCB flags 0x0, cmd 0x0

followed by:
May 27 02:36:56 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 0, SCB flags 0x0, cmd 0x0
May 27 02:36:56 localserver kernel: st0: Error with sense data: [valid=0] Info fld=0x0, Current st09:00: sense key Unit Attention
May 27 02:36:56 localserver kernel: Additional sense indicates Power on, reset, or bus device reset occurred

then entries like these:
May 27 02:37:20 localserver kernel: SCSI host 0 abort (pid 22992662) timed out - resetting
May 27 02:37:22 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 02:37:22 localserver kernel: (scsi0:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.
May 27 02:37:54 localserver kernel: SCSI host 0 channel 0 reset (pid 22992662) timed out - trying harder
May 27 02:37:54 localserver kernel: SCSI bus is being reset for host 0 channel 0.
May 27 02:37:54 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.

until May 27 04:06:58

FWIW:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 03 Lun: 00
   Vendor: ARCHIVE  Model: Python 06408-XXX Rev: 8130
   Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 05 Lun: 00
   Vendor: NEC      Model: CD-ROM DRIVE:466 Rev: 1.06
   Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
   Vendor: DELL     Model: PERCRAID RAID5   Rev: 0001
   Type:   Direct-Access                    ANSI SCSI revision: 02
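The tag on each kernel message, such as (scsi0:0:5:0), appears to be host:channel:id:lun, which would make id 3 the ARCHIVE tape drive and id 5 the NEC CD-ROM on scsi0 (the "aborting command" line above also names "scsi0, channel 0, id 5"). Below is a minimal Python sketch, assuming that tag layout, that maps each tagged message to a device from /proc/scsi/scsi. The helper names parse_proc_scsi and tally_targets are hypothetical, and only a few of the quoted log lines are pasted in.

```python
import re
from collections import Counter

# Trimmed copy of the `cat /proc/scsi/scsi` output above (scsi0 devices only).
PROC_SCSI = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 03 Lun: 00
   Vendor: ARCHIVE  Model: Python 06408-XXX Rev: 8130
   Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 05 Lun: 00
   Vendor: NEC      Model: CD-ROM DRIVE:466 Rev: 1.06
   Type:   CD-ROM                           ANSI SCSI revision: 02
"""

# A few of the kernel messages quoted above.
LOG = """\
May 27 01:05:56 localserver kernel: (scsi0:0:5:-1) Unexpected busfree, LASTPHASE = 0xe0, SEQADDR = 0xcc
May 27 01:05:56 localserver kernel: (scsi0:0:5:0) Invalid SCB during SEQINT 0x71, SCB_TAG 255.
May 27 01:05:56 localserver kernel: (scsi0:0:3:0) No active SCB for reconnecting target - Issuing BUS DEVICE RESET.
May 27 01:17:38 localserver kernel: (scsi0:0:3:0) Synchronous at 20.0 Mbyte/sec, offset 15.
May 27 02:36:50 localserver kernel: (scsi0:-1:-1:-1) CMDCMPLT without command for SCB 1, SCB flags 0x0, cmd 0x0
"""

HOST_RE = re.compile(r"Host: (scsi\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)")
TYPE_RE = re.compile(r"Type:\s+(\S+)")
# Assumed tag layout: (hostname:channel:id:lun); -1 may appear in any field.
TAG_RE = re.compile(r"\((scsi\d+):(-?\d+):(-?\d+):(-?\d+)\)")


def parse_proc_scsi(text):
    """Map (host, channel, id) -> device type from /proc/scsi/scsi text."""
    devices = {}
    current = None
    for line in text.splitlines():
        m = HOST_RE.search(line)
        if m:
            host, chan, tid, _lun = m.groups()
            current = (host, int(chan), int(tid))
            continue
        m = TYPE_RE.search(line)
        if m and current is not None:
            devices[current] = m.group(1)
            current = None
    return devices


def tally_targets(log_text, devices):
    """Count tagged kernel messages per device type."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TAG_RE.search(line)
        if not m:
            continue  # untagged lines (e.g. "SCSI bus is being reset") are skipped
        host, chan, tid, _lun = m.groups()
        key = (host, int(chan), int(tid))
        # Tags like (scsi0:-1:-1:-1) match no listed device, so they are lumped together.
        counts[devices.get(key, "controller/unknown")] += 1
    return counts


if __name__ == "__main__":
    devs = parse_proc_scsi(PROC_SCSI)
    for device, n in tally_targets(LOG, devs).most_common():
        print(f"{device}: {n}")
```

Fed the full /var/log/messages instead of the pasted sample, a tally like this would show whether the errors cluster on one device or hit both targets on the bus.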



Does this mean the SCSI controller for my tape drive and CD-ROM is failing
(or has already failed)?

Clueful insight welcome (since I seem to be lacking in cluefulness 
lately...).

Thanks,
Tim


