[JIRA] (OVIRT-1015) kernel panic in nested VM

Evgheni Dereveanchin (oVirt JIRA) jira at ovirt-jira.atlassian.net
Tue Jan 10 16:50:25 UTC 2017


    [ https://ovirt-jira.atlassian.net/browse/OVIRT-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=25412#comment-25412 ] 

Evgheni Dereveanchin commented on OVIRT-1015:
---------------------------------------------

The host shows nothing unusual in dmesg and runs a fairly recent kernel.

The node image is installed as follows:

virt-install \
	--name node-2017-01-10-1558 \
	--boot menu=off \
	--network none \
	--memory 4096 \
	--vcpus 4 \
	--os-variant rhel7 \
	--rng random \
	--noreboot \
	--location boot.iso \
	--extra-args "inst.ks=file:///ci-image-install.ks console=ttyS0" \
	--initrd-inject data/ci-image-install.ks \
	--check disk_size=off,path_in_use=off \
	--graphics none \
	--wait 60 \
	--disk path=ovirt-node-ng-image.installed.qcow2,bus=virtio,cache=unsafe,discard=unmap,format=qcow2 \
	--disk path=ovirt-node-ng-image.squashfs.img,readonly=on,device=disk,bus=virtio,serial=livesrc
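Since the panic happens inside a nested VM, one pre-flight check worth running on the L1 worker (a sketch of my own, not something the job currently does) is whether the loaded kvm module actually has nested support enabled:

```shell
#!/bin/sh
# Report the nested-virt setting of whichever kvm vendor module is loaded.
# The /sys paths are the standard module-parameter locations; on a machine
# without KVM loaded the files simply do not exist and nothing is printed.
for mod in kvm_intel kvm_amd; do
    f=/sys/module/$mod/parameters/nested
    [ -r "$f" ] && echo "$mod nested: $(cat "$f")"
done
echo "pre-flight check done"
```

On the workers this should report `Y` (or `1`) for nested guests to run with hardware assistance at all.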

and here's where it crashes:

15:59:54 Running pre-installation scripts
15:59:54 .
15:59:54 Installing software 100%
16:04:04 [  330.219115] BUG: unable to handle kernel paging request at 0000000000172001
16:04:04 [  330.230847] IP: [<ffffffff813250c7>] clear_page_c+0x7/0x10
16:04:04 [  330.230847] PGD 0 
16:04:04 [  330.230847] Oops: 0000 [#1] SMP 
16:04:04 [  330.230847] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison xfs fcoe libfcoe libfc scsi_transport_fc scsi_tgt ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables parport_pc sg pcspkr virtio_console i2c_piix4 parport virtio_rng virtio_balloon i2c_core ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom ata_generic virtio_blk pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_pci aesni_intel glue_helper ata_piix ablk_helper virtio_ring serio_raw libata cryptd virtio sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs edd
16:04:04 [  330.230847] CPU: 1 PID: 1625 Comm: rsync Not tainted 3.10.0-514.el7.x86_64 #1
16:04:04 [  330.230847] Hardware name: Red Hat KVM, BIOS 1.9.1-5.el7 04/01/2014
16:04:04 [  330.230847] task: ffff88007e04af10 ti: ffff8800a3454000 task.ti: ffff8800a3454000
16:04:04 [  330.230847] RIP: 0010:[<ffffffff813250c7>]  [<ffffffff813250c7>] clear_page_c+0x7/0x10
16:04:04 [  330.230847] RSP: 0000:ffff8800a3457bd8  EFLAGS: 00010246
16:04:04 [  330.230847] RAX: 0000000000000000 RBX: 00000000043b4140 RCX: 0000000000000200
16:04:04 [  330.230847] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88010ed05000
16:04:04 [  330.230847] RBP: ffff8800a3457ce0 R08: ffffffff818df727 R09: ffffea00043b4180
16:04:04 [  330.230847] R10: 0000000000001403 R11: 0000000000000000 R12: ffff8800a3457fd8
16:04:04 [  330.230847] R13: 00000000043b4180 R14: ffffea00043b4140 R15: ffff8800a3454000
16:04:04 [  330.230847] FS:  00007fb3d2247740(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
16:04:04 [  330.230847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
16:04:04 [  330.230847] CR2: 0000000000172001 CR3: 000000013034c000 CR4: 00000000000006e0
16:04:04 [  330.230847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
16:04:04 [  330.230847] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
16:04:04 [  330.230847] Stack:
16:04:04 [  330.230847]  ffffffff8118a67a 0000000000000001 ffff88013ffd8008 000000007fffffff
16:04:04 [  330.230847]  0000000000000002 00000000d34672a7 ffff88013fc9a098 ffff88013fc9a0c8
16:04:04 [  330.230847]  ffff88013ffd7068 0000000000000000 0000000300000001 ffff88013ffd8000
16:04:04 [  330.230847] Call Trace:
16:04:04 [  330.230847]  [<ffffffff8118a67a>] ? get_page_from_freelist+0x51a/0x9f0
16:04:04 [  330.230847]  [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
16:04:05 [  330.230847]  [<ffffffff811d20ba>] alloc_pages_vma+0x9a/0x150
16:04:05 [  330.230847]  [<ffffffff811b137f>] handle_mm_fault+0xc6f/0xfe0
16:04:05 [  330.230847]  [<ffffffff811b76d5>] ? do_mmap_pgoff+0x305/0x3c0
16:04:05 [  330.230847]  [<ffffffff81691a94>] __do_page_fault+0x154/0x450
16:04:05 [  330.230847]  [<ffffffff81691e76>] trace_do_page_fault+0x56/0x150
16:04:05 [  330.230847]  [<ffffffff8169151b>] do_async_page_fault+0x1b/0xd0
16:04:05 [  330.230847]  [<ffffffff8168e0b8>] async_page_fault+0x28/0x30
16:04:05 [  330.230847] Code: 4c 29 ea 39 da 89 d1 7f c4 85 d2 7f 9d 89 d0 eb bc 0f 1f 00 e8 0b 05 d6 ff 90 90 90 90 90 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f 
16:04:05 [  330.230847] RIP  [<ffffffff813250c7>] clear_page_c+0x7/0x10
16:04:05 [  330.230847]  RSP <ffff8800a3457bd8>
16:04:05 [  330.230847] CR2: 0000000000172001
16:04:05 [  330.230847] ---[ end trace 67ec205c6ac0a24f ]---
16:04:05 [  330.230847] Kernel panic - not syncing: Fatal exception
16:04:05 [  330.959703] ------------[ cut here ]------------
16:04:05 [  330.960692] WARNING: at arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x5f/0x70()
16:04:05 [  330.960692] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison xfs fcoe libfcoe libfc scsi_transport_fc scsi_tgt ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables parport_pc sg pcspkr virtio_console i2c_piix4 parport virtio_rng virtio_balloon i2c_core ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom ata_generic virtio_blk pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_pci aesni_intel glue_helper ata_piix ablk_helper virtio_ring serio_raw libata cryptd virtio sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs edd
16:04:05 [  330.960692] CPU: 1 PID: 1625 Comm: rsync Tainted: G      D        ------------   3.10.0-514.el7.x86_64 #1
16:04:05 [  330.960692] Hardware name: Red Hat KVM, BIOS 1.9.1-5.el7 04/01/2014
16:04:05 [  330.960692]  0000000000000000 00000000d34672a7 ffff88013fc83d98 ffffffff81685fac
16:04:05 [  330.960692]  ffff88013fc83dd0 ffffffff81085820 0000000000000000 ffff88013fc96c40
16:04:05 [  330.960692]  00000001000078ef ffff88013fc16c40 0000000000000001 ffff88013fc83de0
16:04:05 [  330.960692] Call Trace:
16:04:05 [  330.960692]  <IRQ>  [<ffffffff81685fac>] dump_stack+0x19/0x1b
16:04:05 [  330.960692]  [<ffffffff81085820>] warn_slowpath_common+0x70/0xb0
16:04:05 [  330.960692]  [<ffffffff8108596a>] warn_slowpath_null+0x1a/0x20
16:04:05 [  330.960692]  [<ffffffff8104e18f>] native_smp_send_reschedule+0x5f/0x70
16:04:05 [  330.960692]  [<ffffffff810d339d>] trigger_load_balance+0x16d/0x200
16:04:05 [  330.960692]  [<ffffffff810c3503>] scheduler_tick+0x103/0x150
16:04:05 [  330.960692]  [<ffffffff810f2f80>] ? tick_sched_handle.isra.13+0x60/0x60
16:04:05 [  330.960692]  [<ffffffff81099196>] update_process_times+0x66/0x80
16:04:05 [  330.960692]  [<ffffffff810f2f45>] tick_sched_handle.isra.13+0x25/0x60
16:04:05 [  330.960692]  [<ffffffff810f2fc1>] tick_sched_timer+0x41/0x70
16:04:05 [  330.960692]  [<ffffffff810b4862>] __hrtimer_run_queues+0xd2/0x260
16:04:05 [  330.960692]  [<ffffffff810b4e00>] hrtimer_interrupt+0xb0/0x1e0
16:04:05 [  330.960692]  [<ffffffff810510d7>] local_apic_timer_interrupt+0x37/0x60
16:04:05 [  330.960692]  [<ffffffff81698ccf>] smp_apic_timer_interrupt+0x3f/0x60
16:04:05 [  330.960692]  [<ffffffff8169721d>] apic_timer_interrupt+0x6d/0x80
16:04:05 [  330.960692]  <EOI>  [<ffffffff8167f47e>] ? panic+0x1ae/0x1f2
16:04:05 [  330.960692]  [<ffffffff8168ee9b>] oops_end+0x12b/0x150
16:04:05 [  330.960692]  [<ffffffff8167ea93>] no_context+0x280/0x2a3
16:04:05 [  330.960692]  [<ffffffff8167eb29>] __bad_area_nosemaphore+0x73/0x1ca
16:04:05 [  330.960692]  [<ffffffff8167ec93>] bad_area_nosemaphore+0x13/0x15
16:04:05 [  330.960692]  [<ffffffff81691c1e>] __do_page_fault+0x2de/0x450
16:04:05 [  330.960692]  [<ffffffff81691e76>] trace_do_page_fault+0x56/0x150
16:04:05 [  330.960692]  [<ffffffff8169151b>] do_async_page_fault+0x1b/0xd0
16:04:05 [  330.960692]  [<ffffffff8168e0b8>] async_page_fault+0x28/0x30
16:04:05 [  330.960692]  [<ffffffff813250c7>] ? clear_page_c+0x7/0x10
16:04:05 [  330.960692]  [<ffffffff8118a67a>] ? get_page_from_freelist+0x51a/0x9f0
16:04:05 [  330.960692]  [<ffffffff8118acc6>] __alloc_pages_nodemask+0x176/0x420
16:04:05 [  330.960692]  [<ffffffff811d20ba>] alloc_pages_vma+0x9a/0x150
16:04:05 [  330.960692]  [<ffffffff811b137f>] handle_mm_fault+0xc6f/0xfe0
16:04:05 [  330.960692]  [<ffffffff811b76d5>] ? do_mmap_pgoff+0x305/0x3c0
16:04:05 [  330.960692]  [<ffffffff81691a94>] __do_page_fault+0x154/0x450
16:04:05 [  330.960692]  [<ffffffff81691e76>] trace_do_page_fault+0x56/0x150
16:04:05 [  330.960692]  [<ffffffff8169151b>] do_async_page_fault+0x1b/0xd0
16:04:05 [  330.960692]  [<ffffffff8168e0b8>] async_page_fault+0x28/0x30
16:04:05 [  330.960692] ---[ end trace 67ec205c6ac0a250 ]---


So the crash happens during an rsync process inside the nested VM, and the guest kernel is fairly old:
16:04:04 [  330.230847] CPU: 1 PID: 1625 Comm: rsync Not tainted 3.10.0-514.el7.x86_64 #1

[~sbonazzo at redhat.com] thanks for reporting this. [~fdeutsch] is it possible to use a newer kernel for the node ISO? I think that would be a good first step to rule out a kernel bug that has already been fixed upstream.

> kernel panic in nested VM
> -------------------------
>
>                 Key: OVIRT-1015
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1015
>             Project: oVirt - virtualization made easy
>          Issue Type: Outage
>            Reporter: Evgheni Dereveanchin
>            Assignee: infra
>
> The following job failed due to a kernel panic inside a nested VM:
> http://jenkins.ovirt.org/job/ovirt-node-ng_ovirt-4.0_build-artifacts-el7-x86_64/210/console
> The VM that the job was running on is:
> vm0085.workers-phx.ovirt.org (kernel-3.10.0-514.2.2.el7.x86_64)



--
This message was sent by Atlassian JIRA
(v1000.670.2#100024)

