[JIRA] (OVIRT-736) soft lockup on el7-vm25

Evgheni Dereveanchin (oVirt JIRA) jira at ovirt-jira.atlassian.net
Wed Sep 21 08:57:00 UTC 2016


Evgheni Dereveanchin created OVIRT-736:
------------------------------------------

             Summary: soft lockup on el7-vm25
                 Key: OVIRT-736
                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-736
             Project: oVirt - virtualization made easy
          Issue Type: Bug
            Reporter: Evgheni Dereveanchin
            Assignee: infra


I've noticed some slaves going offline in Jenkins with 100% CPU reported on the Engine. They eventually return to a normal state. I checked the logs on el7-vm25.phx.ovirt.org, which had these symptoms, and there seems to be a soft lockup due to the qemu-kvm process:

Sep 21 04:57:18 el7-vm25 kernel: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-kvm:13768]
Sep 21 04:57:18 el7-vm25 kernel: Modules linked in: nls_utf8 isofs loop dm_mod xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter aesni_intel lrw gf128mul glue_helper ppdev ablk_helper cryptd sg pcspkr parport_pc parport i2c_piix4 kvm_intel nfsd kvm auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_console virtio_scsi virtio_net qxl syscopyarea sysfillrect sysimgblt drm_kms_helper
Sep 21 04:57:18 el7-vm25 kernel: ttm ata_piix crc32c_intel libata serio_raw virtio_pci virtio_ring virtio drm i2c_core floppy
Sep 21 04:57:18 el7-vm25 kernel: CPU: 0 PID: 13768 Comm: qemu-kvm Not tainted 3.10.0-327.28.3.el7.x86_64 #1
Sep 21 04:57:18 el7-vm25 kernel: Hardware name: oVirt oVirt Node, BIOS 0.5.1 01/01/2011
Sep 21 04:57:18 el7-vm25 kernel: task: ffff880210017300 ti: ffff8800363f8000 task.ti: ffff8800363f8000
Sep 21 04:57:18 el7-vm25 kernel: RIP: 0010:[<ffffffff810e69da>]  [<ffffffff810e69da>] generic_exec_single+0xfa/0x1a0
Sep 21 04:57:18 el7-vm25 kernel: RSP: 0018:ffff8800363fbc40  EFLAGS: 00000202
Sep 21 04:57:18 el7-vm25 kernel: RAX: 0000000000000020 RBX: ffff8800363fbc10 RCX: 0000000000000020
Sep 21 04:57:18 el7-vm25 kernel: RDX: 00000000ffffffff RSI: 0000000000000020 RDI: 0000000000000282
Sep 21 04:57:18 el7-vm25 kernel: RBP: ffff8800363fbc88 R08: ffffffff8165fbe0 R09: ffffea000357c4c0
Sep 21 04:57:18 el7-vm25 kernel: R10: 0000000000003496 R11: 0000000000000206 R12: ffff880210017300
Sep 21 04:57:18 el7-vm25 kernel: R13: ffff880210017300 R14: 0000000000000001 R15: ffff880210017300
Sep 21 04:57:18 el7-vm25 kernel: FS:  00007fe7b288e700(0000) GS:ffff880216e00000(0000) knlGS:0000000000000000
Sep 21 04:57:18 el7-vm25 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 21 04:57:18 el7-vm25 kernel: CR2: 00000000ffffffff CR3: 0000000211bcb000 CR4: 00000000000026f0
Sep 21 04:57:18 el7-vm25 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 21 04:57:18 el7-vm25 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 21 04:57:18 el7-vm25 kernel: Stack:
Sep 21 04:57:18 el7-vm25 kernel: 0000000000000000 0000000000000000 ffffffff81065c90 ffff8800363fbd10
Sep 21 04:57:18 el7-vm25 kernel: 0000000000000003 000000009347ffdf 0000000000000001 ffffffff81065c90
Sep 21 04:57:18 el7-vm25 kernel: ffffffff81065c90 ffff8800363fbcb8 ffffffff810e6adf ffff8800363fbcb8
Sep 21 04:57:18 el7-vm25 kernel: Call Trace:
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065c90>] ? leave_mm+0x70/0x70
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065c90>] ? leave_mm+0x70/0x70
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065c90>] ? leave_mm+0x70/0x70
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810e6adf>] smp_call_function_single+0x5f/0xa0
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff812f3015>] ? cpumask_next_and+0x35/0x50
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810e7083>] smp_call_function_many+0x223/0x260
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065e58>] native_flush_tlb_others+0xb8/0xc0
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065f26>] flush_tlb_mm_range+0x66/0x140
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff811929d3>] tlb_flush_mmu.part.54+0x33/0xc0
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81193565>] tlb_finish_mmu+0x55/0x60
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81195afa>] zap_page_range+0x12a/0x170
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81192224>] SyS_madvise+0x394/0x820
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810aa86d>] ? hrtimer_nanosleep+0xad/0x170
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810e5820>] ? SyS_futex+0x80/0x180
Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
Sep 21 04:57:18 el7-vm25 kernel: Code: 80 72 01 00 48 89 de 48 03 14 c5 20 c9 a5 81 48 89 df e8 7a 03 22 00 84 c0 75 46 45 85 ed 74 11 f6 43 20 01 74 0b 0f 1f 00 f3 90 <f6> 43 20 01 75 f8 31 c0 48 8b 7c 24 28 65 48 33 3c 25 28 00 00 


We need to find the root cause and fix the issue, as it is negatively affecting the jobs being run.
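
To check whether other slaves show the same symptom, the soft lockup lines could be pulled out of their syslog files with something like the sketch below. It assumes the messages have the same format as the excerpt above; the function name find_soft_lockups and the returned field names are hypothetical, chosen only for illustration.

```python
import re

# Matches the kernel's soft lockup report line in the syslog format shown
# in the excerpt above (assumption: other hosts log in the same format).
SOFT_LOCKUP_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft lockup report found in an iterable of log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP_RE.match(line)
        if match is None:
            continue  # not a soft lockup header line (e.g. register dump, call trace)
        events.append({
            "timestamp": match.group("timestamp"),
            "host": match.group("host"),
            "cpu": int(match.group("cpu")),
            "seconds": int(match.group("seconds")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events
```

It can be fed any iterable of lines, for example an open log file (the path depends on how syslog is configured on the slave), and a count of events per host and per process name should show whether qemu-kvm is involved every time.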



--
This message was sent by Atlassian JIRA
(v1000.350.2#100014)


