
RAID5 usually broken after a PC restart. What is the cause?



Tiroler
07.03.09, 14:18
Hello,

For about 6 months I have been running a RAID 5 system with Debian Etch on my VDR server in the basement. The machine actually runs very well, except that md2 keeps failing after a reboot. md0 (RAID1, boot partition) and md1 (RAID5, system) work flawlessly; only md2 (data) always reports that just one disk is still OK... I can bring the array back to life with "mdadm --assemble /dev/md2 /dev/sda3 /dev/sdc3", but when I then try to add /dev/sdb3, I get a "Failed" and md2 stops working altogether. I have been through this twice now, and each time I deleted md2, recreated it and copied the data back from the backup... but that can't be the point. After all, I want RAID5 to take my worries away, not create new ones :p
I actually rule out a disk fault, since the manufacturer's tools find no errors and the system otherwise works perfectly in operation. Power failures are handled by a UPS, and in the last 6 months I have only rebooted the system for maintenance...
I hardly dare to reboot anymore, but now and then it can't be avoided. So what could be the cause?
At the moment I am running the system like this:



srv01:~# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sda3[0] sdc3[2]
1855475200 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

md1 : active raid5 sda2[0] sdc2[2] sdb2[1]
97658880 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sda1[0] sdc1[2](S) sdb1[1]
96256 blocks [2/2] [UU]

unused devices: <none>


So in md2 the partition /dev/sdb3 is no longer active, which means the RAID5 is running in degraded mode...
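The "[3/2] [U_U]" field is what marks md2 as degraded: three devices expected, two working, and the underscore at position 1 is the empty slot (raid disk 1, the slot sdb3 normally occupies). A minimal Python sketch that picks this out of /proc/mdstat, assuming the two-lines-per-array layout shown above; degraded_arrays is a hypothetical helper, not part of mdadm or any library:

```python
import re

# Sample copied from the /proc/mdstat output in the post above.
MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sda3[0] sdc3[2]
1855475200 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

md1 : active raid5 sda2[0] sdc2[2] sdb2[1]
97658880 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sda1[0] sdc1[2](S) sdb1[1]
96256 blocks [2/2] [UU]

unused devices: <none>
"""

# "mdX : <state> <level> <members...>" header line of each array.
HEADER_RE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
# RAID level token somewhere in the header line.
LEVEL_RE = re.compile(r"\b(raid\d+|linear|multipath)\b")
# "[expected/working] [flags]" on the following line; "_" marks a missing slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(text):
    """Return {array: (level, expected, working, missing_slots)} for every degraded array."""
    result = {}
    current = None  # (array name, level) of the header we are currently inside
    for line in text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            level = LEVEL_RE.search(header.group(2))
            current = (header.group(1), level.group(1) if level else "unknown")
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, working = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            if working < expected:
                missing = [slot for slot, flag in enumerate(flags) if flag == "_"]
                result[current[0]] = (current[1], expected, working, missing)
            current = None
    return result


if __name__ == "__main__":
    print(degraded_arrays(MDSTAT))  # {'md2': ('raid5', 3, 2, [1])}
```

On the server itself the same function can be fed the live file, e.g. degraded_arrays(open("/proc/mdstat").read()).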

I hope someone has an idea. Here is the syslog from an attempted repair:



Mar 7 00:46:45 srv01 kernel: [ 1003.205583] md: md2 stopped.
Mar 7 00:46:45 srv01 kernel: [ 1003.205634] md: unbind<sda3>
Mar 7 00:46:45 srv01 kernel: [ 1003.205674] md: export_rdev(sda3)
Mar 7 00:46:45 srv01 kernel: [ 1003.205716] md: unbind<sdc3>
Mar 7 00:46:45 srv01 kernel: [ 1003.205753] md: export_rdev(sdc3)
Mar 7 00:46:45 srv01 kernel: [ 1003.205793] md: unbind<sdb3>
Mar 7 00:46:45 srv01 kernel: [ 1003.205830] md: export_rdev(sdb3)
Mar 7 00:47:19 srv01 kernel: [ 1037.810629] md: md2 stopped.
Mar 7 00:47:19 srv01 kernel: [ 1037.880537] md: bind<sdc3>
Mar 7 00:47:19 srv01 kernel: [ 1037.880876] md: bind<sda3>
Mar 7 00:47:19 srv01 kernel: [ 1037.913797] raid5: device sda3 operational as raid disk 0
Mar 7 00:47:19 srv01 kernel: [ 1037.913844] raid5: device sdc3 operational as raid disk 2
Mar 7 00:47:19 srv01 kernel: [ 1037.914189] raid5: allocated 3169kB for md2
Mar 7 00:47:19 srv01 kernel: [ 1037.914279] RAID5 conf printout:
Mar 7 00:47:19 srv01 kernel: [ 1037.914314] --- rd:3 wd:2
Mar 7 00:47:19 srv01 kernel: [ 1037.914350] disk 0, o:1, dev:sda3
Mar 7 00:47:19 srv01 kernel: [ 1037.914386] disk 2, o:1, dev:sdc3
Mar 7 00:47:59 srv01 kernel: [ 1078.044864] md: bind<sdb3>
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] RAID5 conf printout:
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] --- rd:3 wd:2
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] disk 0, o:1, dev:sda3
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] disk 1, o:1, dev:sdb3
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] disk 2, o:1, dev:sdc3
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] md: recovery of RAID array md2
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Mar 7 00:47:59 srv01 kernel: [ 1078.069310] md: using 128k window, over a total of 927737600 blocks.
Mar 7 00:48:01 srv01 kernel: [ 1079.114223] dhfis 0x40 dmafis 0x40 sdbfis 0xBE
Mar 7 00:48:01 srv01 kernel: [ 1079.114223] res 41/40:3c:0b:21:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:01 srv01 last message repeated 5 times
Mar 7 00:48:01 srv01 kernel: [ 1079.114223] ata3: hard resetting link
Mar 7 00:48:01 srv01 kernel: [ 1079.432035] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 7 00:48:01 srv01 kernel: [ 1079.448271] ata3.00: configured for UDMA/133
Mar 7 00:48:01 srv01 kernel: [ 1079.448321] ata3: EH complete
Mar 7 00:48:01 srv01 kernel: [ 1079.448545] sd 3:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
Mar 7 00:48:01 srv01 kernel: [ 1079.448607] sd 3:0:0:0: [sdc] Write Protect is off
Mar 7 00:48:01 srv01 kernel: [ 1079.448668] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 7 00:48:03 srv01 kernel: [ 1081.252174] dhfis 0x3F dmafis 0x1 sdbfis 0x0
Mar 7 00:48:03 srv01 kernel: [ 1081.252649] res 41/40:2c:0b:22:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:03 srv01 kernel: [ 1081.252827] res 41/40:2c:0b:22:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:03 srv01 kernel: [ 1081.253004] res 41/40:2c:0b:22:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:03 srv01 kernel: [ 1081.253183] res 41/40:2c:0b:22:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:03 srv01 kernel: [ 1081.253361] res 41/40:2c:0b:22:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:03 srv01 kernel: [ 1081.253538] res 41/40:2c:0b:22:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:03 srv01 kernel: [ 1081.253718] ata3: hard resetting link
Mar 7 00:48:03 srv01 kernel: [ 1081.572037] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 7 00:48:03 srv01 kernel: [ 1081.588270] ata3.00: configured for UDMA/133
Mar 7 00:48:03 srv01 kernel: [ 1081.588321] ata3: EH complete
Mar 7 00:48:03 srv01 kernel: [ 1081.588492] sd 3:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
Mar 7 00:48:03 srv01 kernel: [ 1081.588554] sd 3:0:0:0: [sdc] Write Protect is off
Mar 7 00:48:03 srv01 kernel: [ 1081.588614] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 7 00:48:05 srv01 kernel: [ 1083.300645] dhfis 0x20 dmafis 0x20 sdbfis 0x18
Mar 7 00:48:05 srv01 kernel: [ 1083.300933] res 41/40:2c:0b:20:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:05 srv01 kernel: [ 1083.301115] ata3: hard resetting link
Mar 7 00:48:05 srv01 kernel: [ 1083.624628] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 7 00:48:05 srv01 kernel: [ 1083.640860] ata3.00: configured for UDMA/133
Mar 7 00:48:05 srv01 kernel: [ 1083.640905] ata3: EH complete
Mar 7 00:48:05 srv01 kernel: [ 1083.641019] sd 3:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
Mar 7 00:48:05 srv01 kernel: [ 1083.641084] sd 3:0:0:0: [sdc] Write Protect is off
Mar 7 00:48:05 srv01 kernel: [ 1083.641143] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 7 00:48:07 srv01 kernel: [ 1085.456915] dhfis 0x1 dmafis 0x1 sdbfis 0x0
Mar 7 00:48:07 srv01 kernel: [ 1085.456915] res 41/40:04:0b:20:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:07 srv01 kernel: [ 1085.456915] ata3: hard resetting link
Mar 7 00:48:07 srv01 kernel: [ 1085.776037] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 7 00:48:07 srv01 kernel: [ 1085.792269] ata3.00: configured for UDMA/133
Mar 7 00:48:07 srv01 kernel: [ 1085.792314] ata3: EH complete
Mar 7 00:48:07 srv01 kernel: [ 1085.792429] sd 3:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
Mar 7 00:48:07 srv01 kernel: [ 1085.792491] sd 3:0:0:0: [sdc] Write Protect is off
Mar 7 00:48:07 srv01 kernel: [ 1085.792550] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 7 00:48:09 srv01 kernel: [ 1087.604953] dhfis 0x1 dmafis 0x1 sdbfis 0x0
Mar 7 00:48:09 srv01 kernel: [ 1087.604953] res 41/40:04:0b:20:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:09 srv01 kernel: [ 1087.604953] ata3: hard resetting link
Mar 7 00:48:09 srv01 kernel: [ 1087.924038] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 7 00:48:09 srv01 kernel: [ 1087.940269] ata3.00: configured for UDMA/133
Mar 7 00:48:09 srv01 kernel: [ 1087.940314] ata3: EH complete
Mar 7 00:48:09 srv01 kernel: [ 1087.940428] sd 3:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
Mar 7 00:48:09 srv01 kernel: [ 1087.940489] sd 3:0:0:0: [sdc] Write Protect is off
Mar 7 00:48:09 srv01 kernel: [ 1087.940549] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 7 00:48:11 srv01 kernel: [ 1089.877391] dhfis 0x1 dmafis 0x1 sdbfis 0x0
Mar 7 00:48:11 srv01 kernel: [ 1089.877391] res 41/40:04:0b:20:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:11 srv01 kernel: [ 1089.877391] ata3: hard resetting link
Mar 7 00:48:12 srv01 kernel: [ 1090.196035] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 7 00:48:12 srv01 kernel: [ 1090.212268] ata3.00: configured for UDMA/133
Mar 7 00:48:12 srv01 kernel: [ 1090.212321] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Mar 7 00:48:12 srv01 kernel: [ 1090.212397] sd 3:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor]
Mar 7 00:48:12 srv01 kernel: [ 1090.212531] Descriptor sense data with sense descriptors (in hex):
Mar 7 00:48:12 srv01 kernel: [ 1090.212592] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Mar 7 00:48:12 srv01 kernel: [ 1090.213011] 05 d5 20 0b
Mar 7 00:48:12 srv01 kernel: [ 1090.213157] sd 3:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed
Mar 7 00:48:12 srv01 kernel: [ 1090.213295] __ratelimit: 7 messages suppressed
Mar 7 00:48:12 srv01 kernel: [ 1090.213333] raid5:md2: read error not correctable (sector 1536 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213377] raid5: Operation continuing on 1 devices.
Mar 7 00:48:12 srv01 kernel: [ 1090.213454] raid5:md2: read error not correctable (sector 1544 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213495] raid5:md2: read error not correctable (sector 1552 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213541] raid5:md2: read error not correctable (sector 1560 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213582] raid5:md2: read error not correctable (sector 1568 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213623] raid5:md2: read error not correctable (sector 1576 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213664] raid5:md2: read error not correctable (sector 1584 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213706] raid5:md2: read error not correctable (sector 1592 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213747] raid5:md2: read error not correctable (sector 1600 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213788] raid5:md2: read error not correctable (sector 1608 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213849] ata3: EH complete
Mar 7 00:48:12 srv01 kernel: [ 1090.214015] sd 3:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
Mar 7 00:48:12 srv01 kernel: [ 1090.214113] sd 3:0:0:0: [sdc] Write Protect is off
Mar 7 00:48:12 srv01 kernel: [ 1090.214174] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 7 00:48:12 srv01 kernel: [ 1090.245603] md: md2: recovery done.
Mar 7 00:48:12 srv01 kernel: [ 1090.384188] RAID5 conf printout:
Mar 7 00:48:12 srv01 kernel: [ 1090.384230] --- rd:3 wd:1
Mar 7 00:48:12 srv01 kernel: [ 1090.384275] disk 0, o:1, dev:sda3
Mar 7 00:48:12 srv01 kernel: [ 1090.384312] disk 1, o:1, dev:sdb3
Mar 7 00:48:12 srv01 kernel: [ 1090.384358] disk 2, o:0, dev:sdc3
Mar 7 00:48:12 srv01 kernel: [ 1090.384837] RAID5 conf printout:
Mar 7 00:48:12 srv01 kernel: [ 1090.384877] --- rd:3 wd:1
Mar 7 00:48:12 srv01 kernel: [ 1090.384913] disk 0, o:1, dev:sda3
Mar 7 00:48:12 srv01 kernel: [ 1090.384949] disk 2, o:0, dev:sdc3
Mar 7 00:48:12 srv01 kernel: [ 1090.384991] RAID5 conf printout:
Mar 7 00:48:12 srv01 kernel: [ 1090.385027] --- rd:3 wd:1
Mar 7 00:48:12 srv01 kernel: [ 1090.385062] disk 0, o:1, dev:sda3
Mar 7 00:48:12 srv01 kernel: [ 1090.385098] disk 2, o:0, dev:sdc3
Mar 7 00:48:12 srv01 kernel: [ 1090.385386] RAID5 conf printout:
Mar 7 00:48:12 srv01 kernel: [ 1090.385423] --- rd:3 wd:1
Mar 7 00:48:12 srv01 kernel: [ 1090.385459] disk 0, o:1, dev:sda3
Mar 7 00:48:35 srv01 kernel: [ 1113.975949] md: md2 stopped.
Mar 7 00:48:35 srv01 kernel: [ 1113.975999] md: unbind<sdb3>
Mar 7 00:48:35 srv01 kernel: [ 1113.976050] md: export_rdev(sdb3)
Mar 7 00:48:35 srv01 kernel: [ 1113.976095] md: unbind<sda3>
Mar 7 00:48:35 srv01 kernel: [ 1113.976132] md: export_rdev(sda3)
Mar 7 00:48:35 srv01 kernel: [ 1113.976471] md: unbind<sdc3>
Mar 7 00:48:35 srv01 kernel: [ 1113.976510] md: export_rdev(sdc3)
Mar 7 00:48:46 srv01 kernel: [ 1124.268122] md: md2 stopped.
Mar 7 00:48:46 srv01 kernel: [ 1124.314191] md: bind<sdb3>
Mar 7 00:48:46 srv01 kernel: [ 1124.314191] md: bind<sda3>
Mar 7 00:49:06 srv01 kernel: [ 1144.841977] md: md2 stopped.
Mar 7 00:49:06 srv01 kernel: [ 1144.842033] md: unbind<sda3>
Mar 7 00:49:06 srv01 kernel: [ 1144.842077] md: export_rdev(sda3)
Mar 7 00:49:06 srv01 kernel: [ 1144.842120] md: unbind<sdb3>
Mar 7 00:49:06 srv01 kernel: [ 1144.842157] md: export_rdev(sdb3)
Mar 7 00:49:16 srv01 kernel: [ 1154.933888] md: md2 stopped.
Mar 7 00:49:16 srv01 kernel: [ 1154.959540] md: bind<sdc3>
Mar 7 00:49:16 srv01 kernel: [ 1154.959862] md: bind<sda3>
Mar 7 00:49:16 srv01 kernel: [ 1154.992756] raid5: device sda3 operational as raid disk 0
Mar 7 00:49:16 srv01 kernel: [ 1154.992803] raid5: device sdc3 operational as raid disk 2
Mar 7 00:49:16 srv01 kernel: [ 1154.993152] raid5: allocated 3169kB for md2
Mar 7 00:49:16 srv01 kernel: [ 1154.993242] RAID5 conf printout:
Mar 7 00:49:16 srv01 kernel: [ 1154.993278] --- rd:3 wd:2
Mar 7 00:49:16 srv01 kernel: [ 1154.993314] disk 0, o:1, dev:sda3
Mar 7 00:49:16 srv01 kernel: [ 1154.993351] disk 2, o:1, dev:sdc3
Mar 7 00:49:39 srv01 kernel: [ 1177.320664] kjournald starting. Commit interval 5 seconds
Mar 7 00:49:39 srv01 kernel: [ 1177.358236] EXT3 FS on md2, internal journal
Mar 7 00:49:39 srv01 kernel: [ 1177.358236] EXT3-fs: mounted filesystem with ordered data mode.
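In a log this long it is easy to lose track of which device each error belongs to. A minimal Python sketch that tallies md read errors, SCSI medium errors and failed md members per device, and decodes the libata "res" taskfile dump, assuming libata's field order status/error:nsect:lbal:lbam:lbah / hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah / device. summarize and decode_res are hypothetical helper names, and only a few common ATA register bits are named. In the excerpt above, every "Medium Error" and "read error not correctable" line names sdc or sdc3, and the decoded LBA 0x05d5200b matches the Information bytes "05 d5 20 0b" in the sense data the kernel prints a few lines later.

```python
import re
from collections import Counter

# A few lines copied verbatim from the syslog excerpt in the post.
LOG = """\
Mar 7 00:48:07 srv01 kernel: [ 1085.456915] res 41/40:04:0b:20:d5/40:00:05:00:00/40 Emask 0x9 (media error)
Mar 7 00:48:12 srv01 kernel: [ 1090.212397] sd 3:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor]
Mar 7 00:48:12 srv01 kernel: [ 1090.213333] raid5:md2: read error not correctable (sector 1536 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.213454] raid5:md2: read error not correctable (sector 1544 on sdc3).
Mar 7 00:48:12 srv01 kernel: [ 1090.384358] disk 2, o:0, dev:sdc3
"""

MD_READ_ERR_RE = re.compile(r"raid5:(md\d+): read error not correctable \(sector (\d+) on (\w+)\)")
MEDIUM_ERR_RE = re.compile(r"\[(sd[a-z]+)\] Sense Key : Medium Error")
FAILED_MEMBER_RE = re.compile(r"disk (\d+), o:0, dev:(\w+)")  # o:0 = member not operational
RES_RE = re.compile(r"\bres ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})", re.IGNORECASE)

# ATA status / error register bits (only the common ones are named here).
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def decode_res(field):
    """Decode a libata 'res' dump into named status/error bits and the 48-bit LBA."""
    regs = [int(x, 16) for x in re.split(r"[/:]", field)]
    (status, error, _nsect, lbal, lbam, lbah,
     _hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = regs
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "status": [name for bit, name in STATUS_BITS if status & bit],
        "error": [name for bit, name in ERROR_BITS if error & bit],
        "lba": lba,
    }


def summarize(log):
    """Tally md read errors, medium errors and failed md members per device."""
    md_read_errors = Counter()
    medium_errors = Counter()
    failed_members = set()
    ata_results = []
    for line in log.splitlines():
        m = MD_READ_ERR_RE.search(line)
        if m:
            md_read_errors[m.group(3)] += 1
        m = MEDIUM_ERR_RE.search(line)
        if m:
            medium_errors[m.group(1)] += 1
        m = FAILED_MEMBER_RE.search(line)
        if m:
            failed_members.add(m.group(2))
        m = RES_RE.search(line)
        if m:
            ata_results.append(decode_res(m.group(1)))
    return md_read_errors, medium_errors, failed_members, ata_results


if __name__ == "__main__":
    md_errs, medium, failed, ata = summarize(LOG)
    print(dict(md_errs))   # {'sdc3': 2}
    print(dict(medium))    # {'sdc': 1}
    print(sorted(failed))  # ['sdc3']
    print(ata)             # [{'status': ['DRDY', 'ERR'], 'error': ['UNC'], 'lba': 97853451}]
```

Feeding the full syslog excerpt through summarize works the same way; the regular expressions only match the message formats shown in this post.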


Hardware:
Gigabyte M55Plus-S3G
3x Samsung 1TB HDD 7200rpm sATA 32MB (HD103UJ)
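The size figures in the post are consistent with each other. A short worked calculation, assuming md counts in 1 KiB blocks both in /proc/mdstat and in the "over a total of 927737600 blocks" recovery message, and using the RAID5 rule that usable capacity is (n - 1) times the member size for n members:

```latex
\begin{align*}
\text{one HD103UJ (sdc)} &= 1\,953\,525\,168 \times 512\ \text{B} = 1\,000\,204\,886\,016\ \text{B} \approx 1\,000\,205\ \text{MB} \\
C_{\text{md2}} &= (n-1)\, S_{\text{member}} = (3-1) \times 927\,737\,600\ \text{KiB} = 1\,855\,475\,200\ \text{KiB} \\
&= 1\,855\,475\,200 \times 1024\ \text{B} \approx 1.90\ \text{TB}
\end{align*}
```

The first line reproduces the kernel's "(1000205 MB)" for sdc; the second reproduces the "1855475200 blocks" that /proc/mdstat reports for md2.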

Many thanks for your help,
Martin