[
https://ovirt-jira.atlassian.net/browse/OVIRT-1015?page=com.atlassian.jir...
]
Evgheni Dereveanchin commented on OVIRT-1015:
---------------------------------------------
The host has nothing specific in dmesg and has a pretty recent kernel.
The node is installed the following way:
15:58:16 virt-install \
15:58:16 --name node-2017-01-10-1558 \
15:58:16 --boot menu=off \
15:58:16 --network none \
15:58:16 --memory 4096 \
15:58:16 --vcpus 4 \
15:58:16 --os-variant rhel7 \
15:58:16 --rng random \
15:58:16 --noreboot \
15:58:16 --location boot.iso \
15:58:16 --extra-args "inst.ks=file:///ci-image-install.ks console=ttyS0" \
15:58:16 --initrd-inject data/ci-image-install.ks \
15:58:16 --check disk_size=off,path_in_use=off \
15:58:16 --graphics none \
15:58:16 --wait 60 \
15:58:16 --disk path=ovirt-node-ng-image.installed.qcow2,bus=virtio,cache=unsafe,discard=unmap,format=qcow2 \
15:58:16 --disk path=ovirt-node-ng-image.squashfs.img,readonly=on,device=disk,bus=virtio,serial=livesrc
and here's where it crashes:
15:59:54 Running pre-installation scripts
15:59:54 .
15:59:54 Installing software 100%
16:04:04 [ 330.219115] BUG: unable to handle kernel paging request at 0000000000172001
16:04:04 [ 330.230847] IP: [<ffffffff813250c7>] clear_page_c+0x7/0x10
16:04:04 [ 330.230847] PGD 0
16:04:04 [ 330.230847] Oops: 0000 [#1] SMP
16:04:04 [ 330.230847] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison
xfs fcoe libfcoe libfc scsi_transport_fc scsi_tgt ebtable_nat ebtable_broute bridge stp
llc ebtable_filter ebtables parport_pc sg pcspkr virtio_console i2c_piix4 parport
virtio_rng virtio_balloon i2c_core ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom
ata_generic virtio_blk pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul
crc32c_intel ghash_clmulni_intel virtio_pci aesni_intel glue_helper ata_piix ablk_helper
virtio_ring serio_raw libata cryptd virtio sunrpc xts lrw gf128mul dm_crypt dm_round_robin
dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear
raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor
async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi squashfs cramfs edd
16:04:04 [ 330.230847] CPU: 1 PID: 1625 Comm: rsync Not tainted 3.10.0-514.el7.x86_64 #1
16:04:04 [ 330.230847] Hardware name: Red Hat KVM, BIOS 1.9.1-5.el7 04/01/2014
16:04:04 [ 330.230847] task: ffff88007e04af10 ti: ffff8800a3454000 task.ti: ffff8800a3454000
16:04:04 [ 330.230847] RIP: 0010:[<ffffffff813250c7>] [<ffffffff813250c7>] clear_page_c+0x7/0x10
16:04:04 [ 330.230847] RSP: 0000:ffff8800a3457bd8 EFLAGS: 00010246
16:04:04 [ 330.230847] RAX: 0000000000000000 RBX: 00000000043b4140 RCX: 0000000000000200
16:04:04 [ 330.230847] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88010ed05000
16:04:04 [ 330.230847] RBP: ffff8800a3457ce0 R08: ffffffff818df727 R09: ffffea00043b4180
16:04:04 [ 330.230847] R10: 0000000000001403 R11: 0000000000000000 R12: ffff8800a3457fd8
16:04:04 [ 330.230847] R13: 00000000043b4180 R14: ffffea00043b4140 R15: ffff8800a3454000
16:04:04 [ 330.230847] FS: 00007fb3d2247740(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
16:04:04 [ 330.230847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
16:04:04 [ 330.230847] CR2: 0000000000172001 CR3: 000000013034c000 CR4: 00000000000006e0
16:04:04 [ 330.230847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
16:04:04 [ 330.230847] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
16:04:04 [ 330.230847] Stack:
16:04:04 [ 330.230847] ffffffff8118a67a 0000000000000001 ffff88013ffd8008 000000007fffffff
16:04:04 [ 330.230847] 0000000000000002 00000000d34672a7 ffff88013fc9a098 ffff88013fc9a0c8
16:04:04 [ 330.230847] ffff88013ffd7068 0000000000000000 0000000300000001 ffff88013ffd8000
16:04:04 [ 330.230847] Call Trace:
16:04:04 [ 330.230847] [<ffffffff8118a67a>] ? get_page_from_freelist+0x51a/0x9f0
16:04:04 [ 330.230847] [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
16:04:05 [ 330.230847] [<ffffffff811d20ba>] alloc_pages_vma+0x9a/0x150
16:04:05 [ 330.230847] [<ffffffff811b137f>] handle_mm_fault+0xc6f/0xfe0
16:04:05 [ 330.230847] [<ffffffff811b76d5>] ? do_mmap_pgoff+0x305/0x3c0
16:04:05 [ 330.230847] [<ffffffff81691a94>] __do_page_fault+0x154/0x450
16:04:05 [ 330.230847] [<ffffffff81691e76>] trace_do_page_fault+0x56/0x150
16:04:05 [ 330.230847] [<ffffffff8169151b>] do_async_page_fault+0x1b/0xd0
16:04:05 [ 330.230847] [<ffffffff8168e0b8>] async_page_fault+0x28/0x30
16:04:05 [ 330.230847] Code: 4c 29 ea 39 da 89 d1 7f c4 85 d2 7f 9d 89 d0 eb bc 0f 1f 00
e8 0b 05 d6 ff 90 90 90 90 90 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3
0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
16:04:05 [ 330.230847] RIP [<ffffffff813250c7>] clear_page_c+0x7/0x10
16:04:05 [ 330.230847] RSP <ffff8800a3457bd8>
16:04:05 [ 330.230847] CR2: 0000000000172001
16:04:05 [ 330.230847] ---[ end trace 67ec205c6ac0a24f ]---
16:04:05 [ 330.230847] Kernel panic - not syncing: Fatal exception
16:04:05 [ 330.959703] ------------[ cut here ]------------
16:04:05 [ 330.960692] WARNING: at arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x5f/0x70()
16:04:05 [ 330.960692] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison
xfs fcoe libfcoe libfc scsi_transport_fc scsi_tgt ebtable_nat ebtable_broute bridge stp
llc ebtable_filter ebtables parport_pc sg pcspkr virtio_console i2c_piix4 parport
virtio_rng virtio_balloon i2c_core ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom
ata_generic virtio_blk pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul
crc32c_intel ghash_clmulni_intel virtio_pci aesni_intel glue_helper ata_piix ablk_helper
virtio_ring serio_raw libata cryptd virtio sunrpc xts lrw gf128mul dm_crypt dm_round_robin
dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear
raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor
async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi squashfs cramfs edd
16:04:05 [ 330.960692] CPU: 1 PID: 1625 Comm: rsync Tainted: G D ------------ 3.10.0-514.el7.x86_64 #1
16:04:05 [ 330.960692] Hardware name: Red Hat KVM, BIOS 1.9.1-5.el7 04/01/2014
16:04:05 [ 330.960692] 0000000000000000 00000000d34672a7 ffff88013fc83d98 ffffffff81685fac
16:04:05 [ 330.960692] ffff88013fc83dd0 ffffffff81085820 0000000000000000 ffff88013fc96c40
16:04:05 [ 330.960692] 00000001000078ef ffff88013fc16c40 0000000000000001 ffff88013fc83de0
16:04:05 [ 330.960692] Call Trace:
16:04:05 [ 330.960692] <IRQ> [<ffffffff81685fac>] dump_stack+0x19/0x1b
16:04:05 [ 330.960692] [<ffffffff81085820>] warn_slowpath_common+0x70/0xb0
16:04:05 [ 330.960692] [<ffffffff8108596a>] warn_slowpath_null+0x1a/0x20
16:04:05 [ 330.960692] [<ffffffff8104e18f>] native_smp_send_reschedule+0x5f/0x70
16:04:05 [ 330.960692] [<ffffffff810d339d>] trigger_load_balance+0x16d/0x200
16:04:05 [ 330.960692] [<ffffffff810c3503>] scheduler_tick+0x103/0x150
16:04:05 [ 330.960692] [<ffffffff810f2f80>] ? tick_sched_handle.isra.13+0x60/0x60
16:04:05 [ 330.960692] [<ffffffff81099196>] update_process_times+0x66/0x80
16:04:05 [ 330.960692] [<ffffffff810f2f45>] tick_sched_handle.isra.13+0x25/0x60
16:04:05 [ 330.960692] [<ffffffff810f2fc1>] tick_sched_timer+0x41/0x70
16:04:05 [ 330.960692] [<ffffffff810b4862>] __hrtimer_run_queues+0xd2/0x260
16:04:05 [ 330.960692] [<ffffffff810b4e00>] hrtimer_interrupt+0xb0/0x1e0
16:04:05 [ 330.960692] [<ffffffff810510d7>] local_apic_timer_interrupt+0x37/0x60
16:04:05 [ 330.960692] [<ffffffff81698ccf>] smp_apic_timer_interrupt+0x3f/0x60
16:04:05 [ 330.960692] [<ffffffff8169721d>] apic_timer_interrupt+0x6d/0x80
16:04:05 [ 330.960692] <EOI> [<ffffffff8167f47e>] ? panic+0x1ae/0x1f2
16:04:05 [ 330.960692] [<ffffffff8168ee9b>] oops_end+0x12b/0x150
16:04:05 [ 330.960692] [<ffffffff8167ea93>] no_context+0x280/0x2a3
16:04:05 [ 330.960692] [<ffffffff8167eb29>] __bad_area_nosemaphore+0x73/0x1ca
16:04:05 [ 330.960692] [<ffffffff8167ec93>] bad_area_nosemaphore+0x13/0x15
16:04:05 [ 330.960692] [<ffffffff81691c1e>] __do_page_fault+0x2de/0x450
16:04:05 [ 330.960692] [<ffffffff81691e76>] trace_do_page_fault+0x56/0x150
16:04:05 [ 330.960692] [<ffffffff8169151b>] do_async_page_fault+0x1b/0xd0
16:04:05 [ 330.960692] [<ffffffff8168e0b8>] async_page_fault+0x28/0x30
16:04:05 [ 330.960692] [<ffffffff813250c7>] ? clear_page_c+0x7/0x10
16:04:05 [ 330.960692] [<ffffffff8118a67a>] ? get_page_from_freelist+0x51a/0x9f0
16:04:05 [ 330.960692] [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
16:04:05 [ 330.960692] [<ffffffff811d20ba>] alloc_pages_vma+0x9a/0x150
16:04:05 [ 330.960692] [<ffffffff811b137f>] handle_mm_fault+0xc6f/0xfe0
16:04:05 [ 330.960692] [<ffffffff811b76d5>] ? do_mmap_pgoff+0x305/0x3c0
16:04:05 [ 330.960692] [<ffffffff81691a94>] __do_page_fault+0x154/0x450
16:04:05 [ 330.960692] [<ffffffff81691e76>] trace_do_page_fault+0x56/0x150
16:04:05 [ 330.960692] [<ffffffff8169151b>] do_async_page_fault+0x1b/0xd0
16:04:05 [ 330.960692] [<ffffffff8168e0b8>] async_page_fault+0x28/0x30
16:04:05 [ 330.960692] ---[ end trace 67ec205c6ac0a250 ]---
So the crash happens in an rsync process inside the nested VM, and the kernel running there is pretty old:
16:04:04 [ 330.230847] CPU: 1 PID: 1625 Comm: rsync Not tainted 3.10.0-514.el7.x86_64 #1
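When triaging many console logs like this one, it can help to pull the process name and kernel release out of the oops header automatically. The following is only a minimal sketch: `parse_oops_cpu_line` and its regular expression are hypothetical helpers written for this example, not part of any oVirt or Jenkins tooling, and they assume the header sits on a single, unwrapped line in the format quoted above.

```python
import re

# Hypothetical pattern for the "CPU: ... Comm: ..." oops header line.
# Assumes the single-line format seen in this log (el7 3.10 kernel).
OOPS_CPU_RE = re.compile(
    r"CPU:\s*(?P<cpu>\d+)\s+PID:\s*(?P<pid>\d+)\s+Comm:\s*(?P<comm>\S+)\s+"
    r"(?P<taint>Not tainted|Tainted:.*?)\s+(?P<release>\d+\.\d+\.\d+\S*)"
)


def parse_oops_cpu_line(line):
    """Return a dict describing the oops header, or None if it does not match."""
    m = OOPS_CPU_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "tainted": m.group("taint") != "Not tainted",
        "release": m.group("release"),
    }


# The header line quoted in this comment.
line = ("16:04:04 [ 330.230847] CPU: 1 PID: 1625 Comm: rsync "
        "Not tainted 3.10.0-514.el7.x86_64 #1")
info = parse_oops_cpu_line(line)
print(info["comm"], info["release"])
```

The returned `release` string can then be compared by hand against the kernel of the VM the job ran on.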
[~sbonazzo(a)redhat.com] thanks for reporting this. [~fdeutsch] is it possible to use a
newer kernel for the node ISO? That would be one step toward ruling out a kernel bug
that has already been fixed upstream.
kernel panic in nested VM
-------------------------
Key: OVIRT-1015
URL:
https://ovirt-jira.atlassian.net/browse/OVIRT-1015
Project: oVirt - virtualization made easy
Issue Type: Outage
Reporter: Evgheni Dereveanchin
Assignee: infra
The following job failed due to a kernel panic inside a nested VM:
http://jenkins.ovirt.org/job/ovirt-node-ng_ovirt-4.0_build-artifacts-el7-...
The VM that the job was running on is:
vm0085.workers-phx.ovirt.org (kernel-3.10.0-514.2.2.el7.x86_64)