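Hello, this is Digimoon, the admin.
This is to let you know that the server went down on both November 28 and November 29, with the same symptoms and the same cause each time.
First outage (11/28)
Service downtime: 15:00 ~ 16:37
When the outage was first noticed, the server still answered network pings, but web, FTP, SSH and other services could not be reached
Second outage (11/29)
Service downtime: 20:00 ~ 22:22
Same symptoms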
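The symptom described above (ping answers, but web, FTP and SSH do not) can be told apart from a full network outage by attempting a TCP connection to each service port. The following is only a minimal sketch of that check, not the procedure actually used on this server; the function name check_services and the default timeout are assumptions made for illustration.

```python
import socket


def check_services(host, ports, timeout=3.0):
    """Try a TCP connect to each port and return {port: reachable?}.

    A host that answers ping but returns False for every port here
    matches the symptom in this notice: the network path is alive,
    but the services themselves are not accepting connections.
    """
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError (refused, timed out, ...)
            # when the port cannot be reached within the timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

For example, check_services("127.0.0.1", [21, 22, 80]) would probe FTP, SSH and web on the local machine; the host and port list are placeholders to be replaced with the real ones.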
▼ Before and after the first outage, log entries with the pattern below appeared intermittently in the system log
(the SATA transfer mode dropped sharply, to UDMA/33)
Nov 29 12:09:52 digimoon kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted: P )
Nov 29 12:09:52 digimoon kernel: [<f8924391>] ata_sff_hsm_move+0x69a/0x6e9 [libata]
Nov 29 12:09:52 digimoon kernel: [<f896e2d3>] direct_read_completion+0x63/0x70 [hptmv]
Nov 29 12:09:52 digimoon kernel: [<f896fcde>] CheckPendingCall+0x3e/0x60 [hptmv]
Nov 29 12:09:52 digimoon kernel: [<f8925083>] ata_sff_interrupt+0x12b/0x1bc [libata]
Nov 29 12:09:52 digimoon kernel: [<c044f1b5>] handle_IRQ_event+0x45/0x8c
Nov 29 12:09:52 digimoon kernel: [<c044f280>] __do_IRQ+0x84/0xd6
Nov 29 12:09:52 digimoon kernel: [<c044f1fc>] __do_IRQ+0x0/0xd6
Nov 29 12:09:52 digimoon kernel: [<c04074b2>] do_IRQ+0x99/0xc3
Nov 29 12:09:52 digimoon kernel: [<c0405946>] common_interrupt+0x1a/0x20
Nov 29 12:09:52 digimoon kernel: [<c0403ce7>] mwait_idle+0x25/0x38
Nov 29 12:09:52 digimoon kernel: [<c0403ca8>] cpu_idle+0x9f/0xb9
Nov 29 12:09:52 digimoon kernel: =======================
Nov 29 12:09:52 digimoon kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 29 12:09:52 digimoon kernel: ata2.00: BMDMA stat 0x26
Nov 29 12:09:52 digimoon kernel: ata2.00: cmd 35/00:00:97:73:8d/00:04:02:00:00/e0 tag 0 dma 524288 out
Nov 29 12:09:52 digimoon kernel: res 51/84:78:1f:75:8d/84:02:02:00:00/e0 Emask 0x30 (host bus error)
Nov 29 12:09:52 digimoon kernel: ata2.00: status: { DRDY ERR }
Nov 29 12:09:52 digimoon kernel: ata2.00: error: { ICRC ABRT }
Nov 29 12:09:52 digimoon kernel: ata2: soft resetting link
Nov 29 12:09:53 digimoon kernel: ata2.00: configured for UDMA/33
Nov 29 12:09:53 digimoon kernel: ata2: EH complete
Nov 29 12:09:53 digimoon kernel: SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
Nov 29 12:09:53 digimoon kernel: sdb: Write Protect is off
Nov 29 12:09:53 digimoon kernel: SCSI device sdb: drive cache: write back
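Because this pattern only shows up intermittently, it can be easier to have a script flag it than to read the log by eye. The sketch below scans syslog lines for "configured for UDMA/N" entries where N is lower than expected. It is an illustration only: the function name find_downgrades is hypothetical, and the default of 133 is an assumption taken from the "max UDMA/133" that these drives report at boot.

```python
import re

# Matches kernel lines such as:
#   "... kernel: ata2.00: configured for UDMA/33"
CONFIGURED = re.compile(r"kernel: (ata\d+\.\d+): configured for UDMA/(\d+)")


def find_downgrades(lines, expected=133):
    """Return (device, mode, line) for every line where a device was
    configured for a UDMA mode lower than `expected`.

    `expected` defaults to 133 (assumption: the drives' advertised max).
    """
    hits = []
    for line in lines:
        match = CONFIGURED.search(line)
        if match and int(match.group(2)) < expected:
            hits.append((match.group(1), int(match.group(2)), line.rstrip("\n")))
    return hits
```

It accepts any iterable of lines, so an open log file can be passed directly; where the system log lives depends on the distribution, so no path is assumed here.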
Analysis: the kernel version at the time of the outages was 2.6.18-194.8.1.el5, so this appears to match the cases reported at the links below (a kernel bug)
https://bugzilla.redhat.com/show_bug.cgi?id=524243
https://bugzilla.kernel.org/show_bug.cgi?id=11065
After upgrading the kernel, the transfer mode was confirmed to be back at UDMA/133
Nov 29 22:22:29 digimoon kernel: SCSI subsystem initialized
Nov 29 22:22:29 digimoon kernel: ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 209
Nov 29 22:22:29 digimoon kernel: ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
Nov 29 22:22:29 digimoon kernel: scsi0 : ata_piix
Nov 29 22:22:29 digimoon kernel: scsi1 : ata_piix
Nov 29 22:22:29 digimoon kernel: ata1: SATA max UDMA/133 cmd 0xd400 ctl 0xd080 bmdma 0xc880 irq 209
Nov 29 22:22:29 digimoon kernel: ata2: SATA max UDMA/133 cmd 0xd000 ctl 0xcc00 bmdma 0xc888 irq 209
Nov 29 22:22:29 digimoon kernel: ata1.00: HPA detected: current 625140335, native 625142448
Nov 29 22:22:29 digimoon kernel: ata1.00: ATA-8: WDC WD3200AAKS-00V1A0, 05.01D05, max UDMA/133
Nov 29 22:22:29 digimoon kernel: ata1.00: 625140335 sectors, multi 16: LBA48 NCQ (depth 0/32)
Nov 29 22:22:29 digimoon kernel: ata1.00: configured for UDMA/133
Nov 29 22:22:29 digimoon kernel: ata2.00: ATA-8: WDC WD3200AAKS-00V1A0, 05.01D05, max UDMA/133
Nov 29 22:22:29 digimoon kernel: ata2.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 0/32)
Nov 29 22:22:29 digimoon kernel: ata2.00: configured for UDMA/133
Nov 29 22:22:29 digimoon kernel: Vendor: ATA Model: WDC WD3200AAKS-0 Rev: 05.0
Nov 29 22:22:29 digimoon kernel: Type: Direct-Access ANSI SCSI revision: 05
Nov 29 22:22:29 digimoon kernel: SCSI device sda: 625140335 512-byte hdwr sectors (320072 MB)
Nov 29 22:22:29 digimoon kernel: sda: Write Protect is off
Nov 29 22:22:29 digimoon kernel: SCSI device sda: drive cache: write back
Nov 29 22:22:29 digimoon kernel: SCSI device sda: 625140335 512-byte hdwr sectors (320072 MB)
Nov 29 22:22:29 digimoon kernel: sda: Write Protect is off
Nov 29 22:22:29 digimoon kernel: SCSI device sda: drive cache: write back
I will do my best to keep the server running smoothly so that none of our residents are inconvenienced.
Thanks to this, the server seems to be running a lot more smoothly too. Backup times have also dropped quite a bit.
Thank you for always working so hard ^^*