
Thread: RAID 6 with cache crash

  1. #1
    /linux/user torsten_boese
    Registered since
    Dec 2003
    Posts
    681

    RAID 6 with cache crash

    Hello everyone,

    I have set up the following system:
    Debian 10, RAID 6 (6 devices + 1 device (SSD) as write cache).
    Debian itself runs on a separate hard disk.

    Code:
    fdisk -l |grep 'Disk /dev/sd'
    Disk /dev/sda: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk /dev/sdb: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk /dev/sdc: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk /dev/sdd: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk /dev/sde: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk /dev/sdf: 111,8 GiB, 120034123776 bytes, 234441648 sectors
    Disk /dev/sdg: 149,1 GiB, 160041885696 bytes, 312581808 sectors
    Disk /dev/sdh: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
    Code:
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
    md0 : active raid6 sdh1[1] sda1[0] sdf3[6](J) sde1[2] sdd1[3] sdc1[4] sdb1[5]
          15627540480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]

    I added the cache following page 83:
    Code:
    mdadm --manage /dev/md0 --add-journal /dev/sdf3
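To check that the journal is really attached, look for the `(J)` marker that /proc/mdstat prints after a journal member, as in `sdf3[6](J)` above. A minimal sketch, assuming bash plus awk and the mdstat line format shown in this thread; `md_journal_members` is a hypothetical helper name, not part of mdadm:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the journal member(s) of an md array,
# reading /proc/mdstat-formatted text from stdin.
# Assumes mdstat marks journal devices with "(J)", e.g. "sdf3[6](J)".
md_journal_members() {
    local array="$1"
    # Only inspect the "<array> : ..." line; print each member ending in (J)
    awk -v a="$array" '$1 == a {
        for (i = 2; i <= NF; i++)
            if ($i ~ /\(J\)$/) { sub(/\[.*$/, "", $i); print $i }
    }'
}

# Usage on a live system:
#   md_journal_members md0 < /proc/mdstat
```

An empty result means mdstat lists no journal member for that array.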
    On top of the RAID I also set up an LVM volume with a read cache. On the file system on that volume, a LUKS container stored as a file is attached.

    Code:
    root@fileserver:~# df -h
    Filesystem                         Size  Used Avail Use% Mounted on
    udev                               974M       0  974M    0% /dev
    tmpfs                              198M    9,4M  189M    5% /run
    /dev/sdg4                           28G     11G   16G   40% /
    tmpfs                              990M       0  990M    0% /dev/shm
    tmpfs                              5,0M       0  5,0M    0% /run/lock
    tmpfs                              990M       0  990M    0% /sys/fs/cgroup
    /dev/sdg1                          453M     88M  338M   21% /boot
    /dev/sdg6                           20G    7,4G   12G   40% /var
    /dev/mapper/raid6--4T-r6_4T_files   15T     12T  1,8T   88% /files
    tmpfs                              198M       0  198M    0% /run/user/0
    tmpfs                              198M       0  198M    0% /run/user/1000
    /dev/mapper/tbna-home              2,9T    2,5T  312G   89% /files/sicherung/tbna
    So far so good. After transferring about 1 TB, the following messages appear in kern.log:

    Code:
    Mar 22 21:13:48 fileserver kernel: [36055.261296] nfsd: peername failed (err 107)!
    Mar 22 21:54:12 fileserver kernel: [38479.523603] usb 10-5.4: USB disconnect, device number 22
    Mar 22 21:54:12 fileserver kernel: [38479.523707] usb 10-5.4.1: USB disconnect, device number 23
    Mar 23 00:26:23 fileserver kernel: [47610.442518] INFO: task khugepaged:38 blocked for more than 120 seconds.
    Mar 23 00:26:23 fileserver kernel: [47610.442665]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:23 fileserver kernel: [47610.442784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:23 fileserver kernel: [47610.442933] khugepaged      D    0    38      2 0x80000000
    Mar 23 00:26:23 fileserver kernel: [47610.443046] Call Trace:
    Mar 23 00:26:23 fileserver kernel: [47610.443116]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:23 fileserver kernel: [47610.443199]  schedule+0x28/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.443269]  io_schedule+0x12/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.443346]  wbt_wait+0x205/0x300
    Mar 23 00:26:23 fileserver kernel: [47610.443420]  ? wbt_wait+0x300/0x300
    Mar 23 00:26:23 fileserver kernel: [47610.443499]  rq_qos_throttle+0x31/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.443582]  blk_mq_make_request+0x111/0x530
    Mar 23 00:26:23 fileserver kernel: [47610.443675]  generic_make_request+0x1a4/0x400
    Mar 23 00:26:23 fileserver kernel: [47610.443770]  ? end_swap_bio_read+0xc0/0xc0
    Mar 23 00:26:23 fileserver kernel: [47610.443857]  submit_bio+0x45/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.443932]  ? get_swap_bio+0xbb/0xf0
    Mar 23 00:26:23 fileserver kernel: [47610.444011]  __swap_writepage+0xf2/0x3c0
    Mar 23 00:26:23 fileserver kernel: [47610.444095]  ? __frontswap_store+0x6e/0xf2
    Mar 23 00:26:23 fileserver kernel: [47610.444185]  pageout.isra.49+0x117/0x340
    Mar 23 00:26:23 fileserver kernel: [47610.444272]  shrink_page_list+0xa47/0xc70
    Mar 23 00:26:23 fileserver kernel: [47610.444361]  shrink_inactive_list+0x207/0x590
    Mar 23 00:26:23 fileserver kernel: [47610.444456]  shrink_node_memcg+0x20c/0x780
    Mar 23 00:26:23 fileserver kernel: [47610.444546]  shrink_node+0xcf/0x450
    Mar 23 00:26:23 fileserver kernel: [47610.444625]  do_try_to_free_pages+0xc6/0x370
    Mar 23 00:26:23 fileserver kernel: [47610.444716]  try_to_free_pages+0xf0/0x1b0
    Mar 23 00:26:23 fileserver kernel: [47610.444805]  __alloc_pages_slowpath+0x35a/0xcb0
    Mar 23 00:26:23 fileserver kernel: [47610.444900]  ? __switch_to+0x8c/0x440
    Mar 23 00:26:23 fileserver kernel: [47610.444980]  ? put_prev_entity+0x20/0x100
    Mar 23 00:26:23 fileserver kernel: [47610.445068]  __alloc_pages_nodemask+0x28b/0x2b0
    Mar 23 00:26:23 fileserver kernel: [47610.445165]  khugepaged_alloc_page+0x17/0x50
    Mar 23 00:26:23 fileserver kernel: [47610.445256]  khugepaged+0xb6e/0x2110
    Mar 23 00:26:23 fileserver kernel: [47610.445340]  ? finish_wait+0x80/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.445419]  ? collapse_shmem+0xc00/0xc00
    Mar 23 00:26:23 fileserver kernel: [47610.445503]  kthread+0x112/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.445575]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:23 fileserver kernel: [47610.445656]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.445761] INFO: task md0_reclaim:188 blocked for more than 120 seconds.
    Mar 23 00:26:23 fileserver kernel: [47610.445895]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:23 fileserver kernel: [47610.446012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:23 fileserver kernel: [47610.446162] md0_reclaim     D    0   188      2 0x80000000
    Mar 23 00:26:23 fileserver kernel: [47610.446274] Call Trace:
    Mar 23 00:26:23 fileserver kernel: [47610.446340]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:23 fileserver kernel: [47610.446524]  schedule+0x28/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.446601]  io_schedule+0x12/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.446677]  wbt_wait+0x205/0x300
    Mar 23 00:26:23 fileserver kernel: [47610.446751]  ? wbt_wait+0x300/0x300
    Mar 23 00:26:23 fileserver kernel: [47610.446829]  rq_qos_throttle+0x31/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.446911]  blk_mq_make_request+0x111/0x530
    Mar 23 00:26:23 fileserver kernel: [47610.447004]  generic_make_request+0x1a4/0x400
    Mar 23 00:26:23 fileserver kernel: [47610.447097]  ? sched_clock+0x5/0x10
    Mar 23 00:26:23 fileserver kernel: [47610.447174]  submit_bio+0x45/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.447264]  ? md_super_write.part.63+0x90/0x120 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.447389]  md_update_sb.part.65+0x3a3/0x8d0 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.447510]  r5l_do_reclaim+0x32d/0x3b0 [raid456]
    Mar 23 00:26:23 fileserver kernel: [47610.447624]  ? md_rdev_init+0xb0/0xb0 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.447726]  ? r5l_reclaim_thread+0xe2/0x1f0 [raid456]
    Mar 23 00:26:23 fileserver kernel: [47610.447844]  ? md_rdev_init+0xb0/0xb0 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.447949]  md_thread+0x94/0x150 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.448039]  ? finish_wait+0x80/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.451995]  kthread+0x112/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.455936]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:23 fileserver kernel: [47610.459862]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.463730] INFO: task jbd2/dm-4-8:530 blocked for more than 120 seconds.
    Mar 23 00:26:23 fileserver kernel: [47610.467610]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:23 fileserver kernel: [47610.471478] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:23 fileserver kernel: [47610.475411] jbd2/dm-4-8     D    0   530      2 0x80000000
    Mar 23 00:26:23 fileserver kernel: [47610.479364] Call Trace:
    Mar 23 00:26:23 fileserver kernel: [47610.483404]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:23 fileserver kernel: [47610.487224]  ? bio_alloc_bioset+0xdc/0x220
    Mar 23 00:26:23 fileserver kernel: [47610.491041]  schedule+0x28/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.494873]  md_write_start+0x14b/0x220 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.498733]  ? finish_wait+0x80/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.502601]  ? finish_wait+0x80/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.506452]  raid5_make_request+0x83/0xb70 [raid456]
    Mar 23 00:26:23 fileserver kernel: [47610.510289]  ? part_round_stats+0xbb/0x170
    Mar 23 00:26:23 fileserver kernel: [47610.514154]  ? finish_wait+0x80/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.518034]  ? __split_and_process_non_flush+0x159/0x1f0 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.521955]  ? finish_wait+0x80/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.525881]  md_handle_request+0x119/0x190 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.529851]  md_make_request+0x78/0x160 [md_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.533831]  generic_make_request+0x1a4/0x400
    Mar 23 00:26:23 fileserver kernel: [47610.537802]  submit_bio+0x45/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.541763]  ? guard_bio_eod+0x32/0x100
    Mar 23 00:26:23 fileserver kernel: [47610.545727]  submit_bh_wbc+0x163/0x190
    Mar 23 00:26:23 fileserver kernel: [47610.549716]  jbd2_journal_commit_transaction+0x5d8/0x1820 [jbd2]
    Mar 23 00:26:23 fileserver kernel: [47610.553780]  kjournald2+0xbd/0x270 [jbd2]
    Mar 23 00:26:23 fileserver kernel: [47610.557844]  ? finish_wait+0x80/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.561908]  ? commit_timeout+0x10/0x10 [jbd2]
    Mar 23 00:26:23 fileserver kernel: [47610.565967]  kthread+0x112/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.570007]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:23 fileserver kernel: [47610.574149]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.582528] INFO: task loop0:1465 blocked for more than 120 seconds.
    Mar 23 00:26:23 fileserver kernel: [47610.587014]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:23 fileserver kernel: [47610.591492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:23 fileserver kernel: [47610.596079] loop0           D    0  1465      2 0x80000000
    Mar 23 00:26:23 fileserver kernel: [47610.600713] Call Trace:
    Mar 23 00:26:23 fileserver kernel: [47610.605312]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:23 fileserver kernel: [47610.609929]  ? bit_wait_timeout+0x90/0x90
    Mar 23 00:26:23 fileserver kernel: [47610.614567]  schedule+0x28/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.619194]  io_schedule+0x12/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.623761]  bit_wait_io+0xd/0x50
    Mar 23 00:26:23 fileserver kernel: [47610.628247]  __wait_on_bit+0x73/0x90
    Mar 23 00:26:23 fileserver kernel: [47610.632669]  out_of_line_wait_on_bit+0x91/0xb0
    Mar 23 00:26:23 fileserver kernel: [47610.637051]  ? init_wait_var_entry+0x40/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.641462]  do_get_write_access+0x2d5/0x430 [jbd2]
    Mar 23 00:26:23 fileserver kernel: [47610.645917]  ? ext4_dirty_inode+0x46/0x60 [ext4]
    Mar 23 00:26:23 fileserver kernel: [47610.650335]  jbd2_journal_get_write_access+0x37/0x50 [jbd2]
    Mar 23 00:26:23 fileserver kernel: [47610.654873]  __ext4_journal_get_write_access+0x36/0x70 [ext4]
    Mar 23 00:26:23 fileserver kernel: [47610.659446]  ext4_reserve_inode_write+0x96/0xc0 [ext4]
    Mar 23 00:26:23 fileserver kernel: [47610.664054]  ext4_mark_inode_dirty+0x51/0x1d0 [ext4]
    Mar 23 00:26:23 fileserver kernel: [47610.668637]  ? jbd2__journal_start+0xd9/0x1e0 [jbd2]
    Mar 23 00:26:23 fileserver kernel: [47610.673261]  ext4_dirty_inode+0x46/0x60 [ext4]
    Mar 23 00:26:23 fileserver kernel: [47610.677834]  __mark_inode_dirty+0x1ba/0x380
    Mar 23 00:26:23 fileserver kernel: [47610.682432]  generic_update_time+0xb6/0xd0
    Mar 23 00:26:23 fileserver kernel: [47610.687021]  file_update_time+0xe1/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.691581]  __generic_file_write_iter+0x98/0x1c0
    Mar 23 00:26:23 fileserver kernel: [47610.696201]  ext4_file_write_iter+0xc6/0x3b0 [ext4]
    Mar 23 00:26:23 fileserver kernel: [47610.700755]  do_iter_readv_writev+0x13a/0x1b0
    Mar 23 00:26:23 fileserver kernel: [47610.705504]  do_iter_write+0x80/0x190
    Mar 23 00:26:23 fileserver kernel: [47610.710033]  lo_write_bvec+0x62/0x100 [loop]
    Mar 23 00:26:23 fileserver kernel: [47610.714553]  loop_queue_work+0x1c2/0x9b0 [loop]
    Mar 23 00:26:23 fileserver kernel: [47610.719090]  ? loop_info64_to_compat+0x220/0x220 [loop]
    Mar 23 00:26:23 fileserver kernel: [47610.723643]  kthread_worker_fn+0x7c/0x1c0
    Mar 23 00:26:23 fileserver kernel: [47610.728207]  kthread+0x112/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.732748]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:23 fileserver kernel: [47610.737304]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.742311] INFO: task kworker/u9:4:6226 blocked for more than 120 seconds.
    Mar 23 00:26:23 fileserver kernel: [47610.746895]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:23 fileserver kernel: [47610.751442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:23 fileserver kernel: [47610.756000] kworker/u9:4    D    0  6226      2 0x80000000
    Mar 23 00:26:23 fileserver kernel: [47610.760602] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
    Mar 23 00:26:23 fileserver kernel: [47610.765236] Call Trace:
    Mar 23 00:26:23 fileserver kernel: [47610.769832]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:23 fileserver kernel: [47610.774461]  ? __percpu_counter_sum+0x56/0x60
    Mar 23 00:26:23 fileserver kernel: [47610.779072]  schedule+0x28/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.783658]  schedule_preempt_disabled+0xa/0x10
    Mar 23 00:26:23 fileserver kernel: [47610.788309]  __mutex_lock.isra.8+0x2b5/0x4a0
    Mar 23 00:26:23 fileserver kernel: [47610.792962]  kcryptd_crypt+0x26e/0x3b0 [dm_crypt]
    Mar 23 00:26:23 fileserver kernel: [47610.797629]  process_one_work+0x1a7/0x3a0
    Mar 23 00:26:23 fileserver kernel: [47610.802294]  worker_thread+0x30/0x390
    Mar 23 00:26:23 fileserver kernel: [47610.806928]  ? create_worker+0x1a0/0x1a0
    Mar 23 00:26:23 fileserver kernel: [47610.811514]  kthread+0x112/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.816169]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:23 fileserver kernel: [47610.820711]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.825247] INFO: task kworker/0:2:6229 blocked for more than 120 seconds.
    Mar 23 00:26:23 fileserver kernel: [47610.829879]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:23 fileserver kernel: [47610.834557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:23 fileserver kernel: [47610.839340] kworker/0:2     D    0  6229      2 0x80000000
    Mar 23 00:26:23 fileserver kernel: [47610.844148] Workqueue: kcopyd do_work [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.848897] Call Trace:
    Mar 23 00:26:23 fileserver kernel: [47610.853618]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:23 fileserver kernel: [47610.858354]  schedule+0x28/0x80
    Mar 23 00:26:23 fileserver kernel: [47610.863065]  io_schedule+0x12/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.867784]  wbt_wait+0x205/0x300
    Mar 23 00:26:23 fileserver kernel: [47610.872497]  ? wbt_wait+0x300/0x300
    Mar 23 00:26:23 fileserver kernel: [47610.877158]  rq_qos_throttle+0x31/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.881776]  blk_mq_make_request+0x111/0x530
    Mar 23 00:26:23 fileserver kernel: [47610.886335]  generic_make_request+0x1a4/0x400
    Mar 23 00:26:23 fileserver kernel: [47610.890838]  ? bvec_alloc+0x51/0xe0
    Mar 23 00:26:23 fileserver kernel: [47610.895328]  submit_bio+0x45/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.899793]  ? bio_add_page+0x48/0x60
    Mar 23 00:26:23 fileserver kernel: [47610.904280]  dispatch_io+0x1ae/0x3f0 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.908777]  ? dm_copy_name_and_uuid+0xa0/0xa0 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.913337]  ? list_get_page+0x30/0x30 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.917886]  ? blk_mq_run_hw_queue+0x88/0x110
    Mar 23 00:26:23 fileserver kernel: [47610.922485]  ? dm_kcopyd_do_callback+0x40/0x40 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.927102]  dm_io+0x111/0x220 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.931706]  ? dm_copy_name_and_uuid+0xa0/0xa0 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.936351]  ? list_get_page+0x30/0x30 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.940992]  ? blk_mq_run_hw_queue+0x88/0x110
    Mar 23 00:26:23 fileserver kernel: [47610.945602]  run_io_job+0xe0/0x1d0 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.950175]  ? dm_kcopyd_do_callback+0x40/0x40 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.954787]  process_jobs+0x89/0x230 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.959368]  ? dm_kcopyd_client_destroy+0x140/0x140 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.964032]  do_work+0xb9/0xf0 [dm_mod]
    Mar 23 00:26:23 fileserver kernel: [47610.968644]  process_one_work+0x1a7/0x3a0
    Mar 23 00:26:23 fileserver kernel: [47610.973249]  worker_thread+0x30/0x390
    Mar 23 00:26:23 fileserver kernel: [47610.977847]  ? create_worker+0x1a0/0x1a0
    Mar 23 00:26:23 fileserver kernel: [47610.982462]  kthread+0x112/0x130
    Mar 23 00:26:23 fileserver kernel: [47610.987041]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:23 fileserver kernel: [47610.991635]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:23 fileserver kernel: [47610.996325] INFO: task kworker/u8:4:6232 blocked for more than 120 seconds.
    Mar 23 00:26:23 fileserver kernel: [47611.000721]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:23 fileserver kernel: [47611.005102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:23 fileserver kernel: [47611.009559] kworker/u8:4    D    0  6232      2 0x80000000
    Mar 23 00:26:23 fileserver kernel: [47611.014022] Workqueue: writeback wb_workfn (flush-253:4)
    Mar 23 00:26:23 fileserver kernel: [47611.018485] Call Trace:
    Mar 23 00:26:23 fileserver kernel: [47611.022951]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:23 fileserver kernel: [47611.027429]  schedule+0x28/0x80
    Mar 23 00:26:24 fileserver kernel: [47611.031910]  md_write_start+0x14b/0x220 [md_mod]
    Mar 23 00:26:24 fileserver kernel: [47611.036375]  ? finish_wait+0x80/0x80
    Mar 23 00:26:24 fileserver kernel: [47611.040782]  ? finish_wait+0x80/0x80
    Mar 23 00:26:24 fileserver kernel: [47611.045073]  raid5_make_request+0x83/0xb70 [raid456]
    Mar 23 00:26:24 fileserver kernel: [47611.049349]  ? part_round_stats+0xbb/0x170
    Mar 23 00:26:24 fileserver kernel: [47611.053574]  ? finish_wait+0x80/0x80
    Mar 23 00:26:24 fileserver kernel: [47611.057783]  ? __split_and_process_non_flush+0x159/0x1f0 [dm_mod]
    Mar 23 00:26:24 fileserver kernel: [47611.062047]  ? finish_wait+0x80/0x80
    Mar 23 00:26:24 fileserver kernel: [47611.066324]  md_handle_request+0x119/0x190 [md_mod]
    Mar 23 00:26:24 fileserver kernel: [47611.070620]  md_make_request+0x78/0x160 [md_mod]
    Mar 23 00:26:24 fileserver kernel: [47611.074919]  generic_make_request+0x1a4/0x400
    Mar 23 00:26:24 fileserver kernel: [47611.079202]  ? set_next_entity+0x96/0x1b0
    Mar 23 00:26:24 fileserver kernel: [47611.083489]  submit_bio+0x45/0x130
    Mar 23 00:26:24 fileserver kernel: [47611.087804]  ext4_io_submit+0x49/0x60 [ext4]
    Mar 23 00:26:24 fileserver kernel: [47611.092140]  ext4_bio_write_page+0x24a/0x4d0 [ext4]
    Mar 23 00:26:24 fileserver kernel: [47611.096480]  mpage_submit_page+0x53/0x70 [ext4]
    Mar 23 00:26:24 fileserver kernel: [47611.100851]  mpage_process_page_bufs+0xe7/0xf0 [ext4]
    Mar 23 00:26:24 fileserver kernel: [47611.105232]  mpage_prepare_extent_to_map+0x1db/0x2b0 [ext4]
    Mar 23 00:26:24 fileserver kernel: [47611.109658]  ext4_writepages+0x3da/0xf00 [ext4]
    Mar 23 00:26:24 fileserver kernel: [47611.113984]  ? __ip_queue_xmit+0x15d/0x410
    Mar 23 00:26:24 fileserver kernel: [47611.118293]  ? do_writepages+0x41/0xd0
    Mar 23 00:26:24 fileserver kernel: [47611.122517]  do_writepages+0x41/0xd0
    Mar 23 00:26:24 fileserver kernel: [47611.126643]  ? __tcp_push_pending_frames+0x31/0xd0
    Mar 23 00:26:24 fileserver kernel: [47611.130769]  ? tcp_sendmsg_locked+0x491/0xd50
    Mar 23 00:26:24 fileserver kernel: [47611.134905]  __writeback_single_inode+0x3d/0x350
    Mar 23 00:26:24 fileserver kernel: [47611.139057]  writeback_sb_inodes+0x1e3/0x450
    Mar 23 00:26:24 fileserver kernel: [47611.143228]  __writeback_inodes_wb+0x5d/0xb0
    Mar 23 00:26:24 fileserver kernel: [47611.147404]  wb_writeback+0x25f/0x2f0
    Mar 23 00:26:24 fileserver kernel: [47611.151587]  ? get_nr_inodes+0x35/0x50
    Mar 23 00:26:24 fileserver kernel: [47611.155770]  ? cpumask_next+0x16/0x20
    Mar 23 00:26:24 fileserver kernel: [47611.159959]  wb_workfn+0x186/0x400
    Mar 23 00:26:24 fileserver kernel: [47611.164175]  ? call_transmit+0x1b6/0x210 [sunrpc]
    Mar 23 00:26:24 fileserver kernel: [47611.168378]  process_one_work+0x1a7/0x3a0
    Mar 23 00:26:24 fileserver kernel: [47611.172571]  worker_thread+0x30/0x390
    Mar 23 00:26:24 fileserver kernel: [47611.176773]  ? create_worker+0x1a0/0x1a0
    Mar 23 00:26:24 fileserver kernel: [47611.180964]  kthread+0x112/0x130
    Mar 23 00:26:24 fileserver kernel: [47611.185146]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:24 fileserver kernel: [47611.189294]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:24 fileserver kernel: [47611.193490] INFO: task kworker/u9:2:6459 blocked for more than 120 seconds.
    Mar 23 00:26:24 fileserver kernel: [47611.198110]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:24 fileserver kernel: [47611.202969] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:24 fileserver kernel: [47611.207642] kworker/u9:2    D    0  6459      2 0x80000000
    Mar 23 00:26:24 fileserver kernel: [47611.212325] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
    Mar 23 00:26:24 fileserver kernel: [47611.216990] Call Trace:
    Mar 23 00:26:24 fileserver kernel: [47611.221641]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:24 fileserver kernel: [47611.226318]  ? __percpu_counter_sum+0x56/0x60
    Mar 23 00:26:24 fileserver kernel: [47611.230967]  schedule+0x28/0x80
    Mar 23 00:26:24 fileserver kernel: [47611.235574]  schedule_preempt_disabled+0xa/0x10
    Mar 23 00:26:24 fileserver kernel: [47611.240176]  __mutex_lock.isra.8+0x2b5/0x4a0
    Mar 23 00:26:24 fileserver kernel: [47611.244757]  kcryptd_crypt+0x26e/0x3b0 [dm_crypt]
    Mar 23 00:26:24 fileserver kernel: [47611.249357]  process_one_work+0x1a7/0x3a0
    Mar 23 00:26:24 fileserver kernel: [47611.253949]  worker_thread+0x30/0x390
    Mar 23 00:26:24 fileserver kernel: [47611.258536]  ? create_worker+0x1a0/0x1a0
    Mar 23 00:26:24 fileserver kernel: [47611.263120]  kthread+0x112/0x130
    Mar 23 00:26:24 fileserver kernel: [47611.267700]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:24 fileserver kernel: [47611.272276]  ret_from_fork+0x1f/0x40
    Mar 23 00:26:24 fileserver kernel: [47611.276833] INFO: task kworker/u9:0:6527 blocked for more than 120 seconds.
    Mar 23 00:26:24 fileserver kernel: [47611.281412]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:26:24 fileserver kernel: [47611.285950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:26:24 fileserver kernel: [47611.290541] kworker/u9:0    D    0  6527      2 0x80000000
    Mar 23 00:26:24 fileserver kernel: [47611.295135] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
    Mar 23 00:26:24 fileserver kernel: [47611.299713] Call Trace:
    Mar 23 00:26:24 fileserver kernel: [47611.304273]  ? __schedule+0x2a2/0x870
    Mar 23 00:26:24 fileserver kernel: [47611.308855]  ? __percpu_counter_sum+0x56/0x60
    Mar 23 00:26:24 fileserver kernel: [47611.313427]  schedule+0x28/0x80
    Mar 23 00:26:24 fileserver kernel: [47611.317973]  schedule_preempt_disabled+0xa/0x10
    Mar 23 00:26:24 fileserver kernel: [47611.322595]  __mutex_lock.isra.8+0x2b5/0x4a0
    Mar 23 00:26:24 fileserver kernel: [47611.327218]  kcryptd_crypt+0x26e/0x3b0 [dm_crypt]
    Mar 23 00:26:24 fileserver kernel: [47611.331842]  process_one_work+0x1a7/0x3a0
    Mar 23 00:26:24 fileserver kernel: [47611.336471]  worker_thread+0x30/0x390
    Mar 23 00:26:24 fileserver kernel: [47611.341072]  ? create_worker+0x1a0/0x1a0
    Mar 23 00:26:24 fileserver kernel: [47611.345625]  kthread+0x112/0x130
    Mar 23 00:26:24 fileserver kernel: [47611.350145]  ? kthread_bind+0x30/0x30
    Mar 23 00:26:24 fileserver kernel: [47611.354652]  ret_from_fork+0x1f/0x40
    Mar 23 00:28:26 fileserver kernel: [47733.329407] INFO: task khugepaged:38 blocked for more than 120 seconds.
    Mar 23 00:28:26 fileserver kernel: [47733.334326]       Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
    Mar 23 00:28:26 fileserver kernel: [47733.339055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar 23 00:28:26 fileserver kernel: [47733.343862] khugepaged      D    0    38      2 0x80000000
    Mar 23 00:28:26 fileserver kernel: [47733.348614] Call Trace:
    Mar 23 00:28:26 fileserver kernel: [47733.353322]  ? __schedule+0x2a2/0x870
    Mar 23 00:28:26 fileserver kernel: [47733.358052]  schedule+0x28/0x80
    Mar 23 00:28:26 fileserver kernel: [47733.362711]  io_schedule+0x12/0x40
    Mar 23 00:28:26 fileserver kernel: [47733.367343]  wbt_wait+0x205/0x300
    Mar 23 00:28:26 fileserver kernel: [47733.371989]  ? wbt_wait+0x300/0x300
    Mar 23 00:28:26 fileserver kernel: [47733.376592]  rq_qos_throttle+0x31/0x40
    Mar 23 00:28:26 fileserver kernel: [47733.381204]  blk_mq_make_request+0x111/0x530
    Mar 23 00:28:26 fileserver kernel: [47733.385851]  generic_make_request+0x1a4/0x400
    Mar 23 00:28:26 fileserver kernel: [47733.390457]  ? end_swap_bio_read+0xc0/0xc0
    Mar 23 00:28:26 fileserver kernel: [47733.395055]  submit_bio+0x45/0x130
    Mar 23 00:28:26 fileserver kernel: [47733.399658]  ? get_swap_bio+0xbb/0xf0
    Mar 23 00:28:26 fileserver kernel: [47733.404225]  __swap_writepage+0xf2/0x3c0
    Mar 23 00:28:26 fileserver kernel: [47733.408796]  ? __frontswap_store+0x6e/0xf2
    Mar 23 00:28:26 fileserver kernel: [47733.413406]  pageout.isra.49+0x117/0x340
    Mar 23 00:28:26 fileserver kernel: [47733.417993]  shrink_page_list+0xa47/0xc70
    Mar 23 00:28:26 fileserver kernel: [47733.422615]  shrink_inactive_list+0x207/0x590
    Mar 23 00:28:26 fileserver kernel: [47733.427195]  shrink_node_memcg+0x20c/0x780
    Mar 23 00:28:26 fileserver kernel: [47733.431823]  shrink_node+0xcf/0x450
    Mar 23 00:28:26 fileserver kernel: [47733.436350]  do_try_to_free_pages+0xc6/0x370
    Mar 23 00:28:26 fileserver kernel: [47733.440901]  try_to_free_pages+0xf0/0x1b0
    Mar 23 00:28:26 fileserver kernel: [47733.445485]  __alloc_pages_slowpath+0x35a/0xcb0
    Mar 23 00:28:26 fileserver kernel: [47733.450069]  ? __switch_to+0x8c/0x440
    Mar 23 00:28:26 fileserver kernel: [47733.454655]  ? put_prev_entity+0x20/0x100
    Mar 23 00:28:26 fileserver kernel: [47733.459184]  __alloc_pages_nodemask+0x28b/0x2b0
    Mar 23 00:28:26 fileserver kernel: [47733.463682]  khugepaged_alloc_page+0x17/0x50
    Mar 23 00:28:26 fileserver kernel: [47733.468108]  khugepaged+0xb6e/0x2110
    Mar 23 00:28:26 fileserver kernel: [47733.472463]  ? finish_wait+0x80/0x80
    Mar 23 00:28:26 fileserver kernel: [47733.476750]  ? collapse_shmem+0xc00/0xc00
    Mar 23 00:28:26 fileserver kernel: [47733.481019]  kthread+0x112/0x130
    Mar 23 00:28:26 fileserver kernel: [47733.485283]  ? kthread_bind+0x30/0x30
    Mar 23 00:28:26 fileserver kernel: [47733.489558]  ret_from_fork+0x1f/0x40
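Because the excerpt contains several interleaved reports, it can help to reduce it to which tasks were blocked and how often. A sketch assuming bash and the `INFO: task <name>:<pid> blocked for more than N seconds.` line format shown above; `hung_task_summary` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Count hung-task reports per task name in kernel log text read from stdin
# (e.g. /var/log/kern.log). The trailing ":<pid>" is stripped from the name.
hung_task_summary() {
    sed -n 's/.*INFO: task \([^ ]*\):[0-9][0-9]* blocked for more than [0-9]* seconds\..*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage:
#   hung_task_summary < /var/log/kern.log
```

Run against the excerpt above, it should list khugepaged twice and md0_reclaim, jbd2/dm-4-8, loop0 and the kcryptd, kcopyd and writeback kworkers once each; the counts alone do not identify the root cause.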
    After that, the processes dmcrypt_write and md0_raid6 each keep one thread busy at 100% load.

    At 186, the load average is far from anything "normal".

    After a reboot, the RAID looks like this:

    Code:
    root@fileserver:~# cat /proc/mdstat 
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sdh1[1](S) sda1[0](S) sde1[2](S) sdd1[3](S) sdc1[4](S) sdb1[5](S)
          23441312685 blocks super 1.2
           
    unused devices: <none>
    
    root@fileserver:~# mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Thu Aug  8 15:30:29 2019
            Raid Level : raid6
         Used Dev Size : 18446744073709551615
          Raid Devices : 6
         Total Devices : 7
           Persistence : Superblock is persistent
    
           Update Time : Sat Mar 21 19:34:04 2020
                 State : clean, FAILED, Not Started 
        Active Devices : 6
       Working Devices : 7
        Failed Devices : 0
         Spare Devices : 0
    
                Layout : left-symmetric
            Chunk Size : 512K
    
    Consistency Policy : journal
    
                  Name : fileserver:0  (local to host fileserver)
                  UUID : 09f3e3e1:5f3a19d2:b3dc9e5d:8ad6180a
                Events : 51404
    
        Number   Major   Minor   RaidDevice State
           -       0        0        0      removed
           -       0        0        1      removed
           -       0        0        2      removed
           -       0        0        3      removed
           -       0        0        4      removed
           -       0        0        5      removed
    
           -       8        1        0      sync   /dev/sda1
           -       8       83        -      spare   /dev/sdf3
           -       8      113        1      sync   /dev/sdh1
           -       8       65        2      sync   /dev/sde1
           -       8       49        3      sync   /dev/sdd1
           -       8       33        4      sync   /dev/sdc1
           -       8       17        5      sync   /dev/sdb1
    All attempts to get the RAID running with --assemble and --run failed.
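In the mdstat output after the reboot, md0 is `inactive`, every member carries the spare flag `(S)`, and the journal partition sdf3 is not listed. If a script should detect this state (for example before trying to mount), a hypothetical check could look like this, assuming bash, awk and the mdstat format shown above; both function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Print the state word ("active" or "inactive") of an md array from
# /proc/mdstat-formatted text on stdin; print nothing if the array is absent.
md_array_state() {
    awk -v a="$1" '$1 == a && $2 == ":" { print $3; exit }'
}

# Print how many members of the array are flagged as spare "(S)".
md_spare_count() {
    awk -v a="$1" '$1 == a && $2 == ":" {
        n = 0
        for (i = 3; i <= NF; i++) if ($i ~ /\(S\)$/) n++
        print n; exit
    }'
}

# Usage on a live system:
#   [ "$(md_array_state md0 < /proc/mdstat)" = "inactive" ] && echo "md0 not started"
```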

    Only with the following cumbersome procedure do I get a working RAID again:

    • Comment out the file system in /etc/fstab.
    • Uninstall lvm2, otherwise mdadm reports that the RAID is busy.
    • Delete the cache partition with fdisk.
    • Reboot the machine.
    • Afterwards the RAID looks like this:
      Code:
      root@fileserver:~# mdadm --detail /dev/md0
      /dev/md0:
                 Version : 1.2
              Raid Level : raid0
           Total Devices : 6
             Persistence : Superblock is persistent
      
                   State : inactive
         Working Devices : 6
      
                    Name : fileserver:0  (local to host fileserver)
                    UUID : 09f3e3e1:5f3a19d2:b3dc9e5d:8ad6180a
                  Events : 51404
      
          Number   Major   Minor   RaidDevice
      
             -       8        1        -        /dev/sda1
             -       8      113        -        /dev/sdh1
             -       8       65        -        /dev/sde1
             -       8       49        -        /dev/sdd1
             -       8       33        -        /dev/sdc1
             -       8       17        -        /dev/sdb1
    • Start the RAID with
      Code:
      mdadm --run /dev/md0
    • Recreate the cache partition on the SSD with fdisk.
    • Add the cache again with
      Code:
      mdadm --manage /dev/md0 --add-journal /dev/sdf3
    • Reinstall the lvm2 tools.
    • Re-enable the file system in fstab.
    • Reboot the machine.


    Now my two questions:
    1. Can anyone tell from the messages what exactly the problem is?
    2. I suspect the RAID's cache is the cause. Does anyone know how to remove it again? There are plenty of sources showing how to create it, but I haven't found any that show how to remove it.


    Is any information missing?

    Thanks for your help!
    Last edited by torsten_boese (24.03.20 at 10:08). Reason: corrected page 82 to 83, corrected an output of cat /proc/mdstat

  2. #2
    Registered user
    Registered since
    Jun 2004
    Posts
    1,423
    Hello,
    could it be that this is not related to the RAID itself?
    Just a shot in the dark: the first thing that stands out are the khugepaged messages.
    Maybe the article https://www.cmo.de/wissensdatenbank/...n-120-seconds/ is helpful.

    Regards, temir.
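To follow up on the khugepaged hint, the current transparent hugepage settings can be read from sysfs, where the kernel marks the active choice in brackets (e.g. `always [madvise] never`). A minimal sketch, assuming bash and sed; `selected_choice` is a hypothetical helper, and the thread does not establish whether changing these settings helps here:

```shell
#!/usr/bin/env bash
# Print the bracketed (currently selected) value from a sysfs choice file,
# e.g. /sys/kernel/mm/transparent_hugepage/enabled.
selected_choice() {
    # Keep only the word between "[" and "]"
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a live system:
#   selected_choice < /sys/kernel/mm/transparent_hugepage/enabled
#   selected_choice < /sys/kernel/mm/transparent_hugepage/defrag
```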

  3. #3
    /linux/user torsten_boese
    Registered since
    Dec 2003
    Posts
    681
    Well, I switched the journal mode in
    Code:
    /sys/block/md0/md/journal_mode
    from write-back to write-through. So far the error has not occurred again. I suspect the SSD is of inferior quality, but I can't back that up with log files.
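The switch described above goes through the md sysfs attribute `journal_mode`, which accepts `write-through` or `write-back` and lists the active mode in brackets when read. A hedged sketch for setting and verifying it, assuming bash, root privileges and an array that has a journal device; `set_journal_mode` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Switch the RAID4/5/6 journal of an md array to the given mode and verify it.
# Returns 2 for an unsupported mode, 1 if writing or verifying fails.
set_journal_mode() {
    local attr="$1" mode="$2"      # attr: e.g. /sys/block/md0/md/journal_mode
    case "$mode" in
        write-through|write-back) ;;
        *) echo "unsupported mode: $mode" >&2; return 2 ;;
    esac
    echo "$mode" > "$attr" || return 1
    # When read back, the kernel brackets the active mode,
    # e.g. "[write-through] write-back"
    grep -qF "[$mode]" "$attr"
}

# Usage (as root):
#   set_journal_mode /sys/block/md0/md/journal_mode write-through
```

In write-through mode a write is only completed once it has reached both the journal and the member disks, so the SSD no longer acts as a write-back cache for the array.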
