VM unresponsive, CPU soft lockup on heavy operation

Basic setup notes: 3-node HCI running oVirt 4.3.3, using oVirt Node NG for the hosts. Storage is SSD-backed, with a 10Gb network dedicated to Gluster (jumbo frames enabled). The oVirt management network, which also acts as the VM network, is a 1Gb network. Hosts are Dell R720s with 256 GB RAM and E5-2690 processors.

I have a VM configured with 16 GB RAM and 6 virtual CPUs. When this VM does a heavy operation (in this case, dumping a large DB from a remote server), the load spikes quickly. When this happens the VM becomes unresponsive, and in some cases I get CPU soft lockup messages.

I'm trying to determine where the bottleneck is and how I can prevent the VM from becoming unresponsive during heavy tasks.

Here is the info from syslog showing the soft lockup:

Jun 26 16:13:10 roble kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [pg_dump:4025]
Jun 26 16:13:10 roble kernel: Modules linked in: binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg joydev parport_pc parport virtio_rng i2c_piix4 pcspkr ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic virtio_console virtio_net virtio_scsi pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel qxl drm_kms_helper syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops ttm drm floppy ata_piix libata virtio_pci virtio_ring virtio drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
Jun 26 16:13:10 roble kernel: CPU: 5 PID: 4025 Comm: pg_dump Kdump: loaded Not tainted 3.10.0-957.21.2.el7.x86_64 #1
Jun 26 16:13:10 roble kernel: Hardware name: oVirt oVirt Node, BIOS 1.11.0-2.el7 04/01/2014
Jun 26 16:13:10 roble kernel: task: ffffa01443499040 ti: ffffa014b24a4000 task.ti: ffffa014b24a4000
Jun 26 16:13:10 roble kernel: RIP: 0010:[<ffffffffaeb113ea>] [<ffffffffaeb113ea>] generic_exec_single+0xfa/0x1b0
Jun 26 16:13:10 roble kernel: RSP: 0018:ffffa014b24a7c30 EFLAGS: 00000202
Jun 26 16:13:10 roble kernel: RAX: 0000000000000010 RBX: ffffa014b24a7c00 RCX: 0000000000000030
Jun 26 16:13:10 roble kernel: RDX: 000000000000ffff RSI: 0000000000000010 RDI: 0000000000000286
Jun 26 16:13:10 roble kernel: RBP: ffffa014b24a7c78 R08: ffffffffaf213640 R09: 000000018040003f
Jun 26 16:13:10 roble kernel: R10: 0000000000000001 R11: fffff83b8e91b540 R12: ffffa014b24a7bc0
Jun 26 16:13:10 roble kernel: R13: 0000000000000c9b R14: ffffa01477bf94e8 R15: ffffa01545074270
Jun 26 16:13:10 roble kernel: FS: 00007f9fefe89840(0000) GS:ffffa0172f340000(0000) knlGS:0000000000000000
Jun 26 16:13:10 roble kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 16:13:10 roble kernel: CR2: 00007f9fef391e90 CR3: 000000014aa00000 CR4: 00000000000606e0
Jun 26 16:13:10 roble kernel: Call Trace:
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaeb114ff>] smp_call_function_single+0x5f/0xa0
Jun 26 16:13:10 roble kernel: [<ffffffffaed75cd5>] ? cpumask_next_and+0x35/0x50
Jun 26 16:13:10 roble kernel: [<ffffffffaeb11aab>] smp_call_function_many+0x22b/0x270
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a6a8>] native_flush_tlb_others+0xb8/0xc0
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a718>] flush_tlb_mm_range+0x68/0x140
Jun 26 16:13:10 roble kernel: [<ffffffffaebe4687>] tlb_flush_mmu.part.76+0x37/0xe0
Jun 26 16:13:10 roble kernel: [<ffffffffaebe5f85>] tlb_finish_mmu+0x55/0x60
Jun 26 16:13:10 roble kernel: [<ffffffffaebef624>] unmap_region+0xf4/0x140
Jun 26 16:13:10 roble kernel: [<ffffffffaecfc8d3>] ? selinux_file_free_security+0x23/0x30
Jun 26 16:13:10 roble kernel: [<ffffffffaebefbe1>] ? __vma_rb_erase+0x121/0x220
Jun 26 16:13:10 roble kernel: [<ffffffffaebf1c15>] do_munmap+0x2a5/0x480
Jun 26 16:13:10 roble kernel: [<ffffffffaebf1e55>] vm_munmap+0x65/0xb0
Jun 26 16:13:10 roble kernel: [<ffffffffaebf30e2>] SyS_munmap+0x22/0x30
Jun 26 16:13:10 roble kernel: [<ffffffffaf175ddb>] system_call_fastpath+0x22/0x27
Jun 26 16:13:10 roble kernel: [<ffffffffaf175d21>] ? system_call_after_swapgs+0xae/0x146
Jun 26 16:13:10 roble kernel: Code: 00 b7 01 00 48 89 de 48 03 14 c5 60 bc 74 af 48 89 df e8 4a b7 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 20 01 74 0b 0f 1f 00 f3 90 <f6> 43 20 01 75 f8 31 c0 48 8b 7c 24 28 65 48 33 3c 25 28 00 00
Jun 26 16:13:14 roble kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 27s! [kworker/4:3:16530]
Jun 26 16:13:14 roble kernel: Modules linked in: binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg joydev parport_pc parport virtio_rng i2c_piix4 pcspkr ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic virtio_console virtio_net virtio_scsi pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel qxl drm_kms_helper syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops ttm drm floppy ata_piix libata virtio_pci virtio_ring virtio drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
Jun 26 16:13:14 roble kernel: CPU: 4 PID: 16530 Comm: kworker/4:3 Kdump: loaded Tainted: G L ------------ 3.10.0-957.21.2.el7.x86_64 #1
Jun 26 16:13:14 roble kernel: Hardware name: oVirt oVirt Node, BIOS 1.11.0-2.el7 04/01/2014
Jun 26 16:13:14 roble kernel: Workqueue: events tsc_refine_calibration_work
Jun 26 16:13:14 roble kernel: task: ffffa016acf74100 ti: ffffa014f76a8000 task.ti: ffffa014f76a8000
Jun 26 16:13:14 roble kernel: RIP: 0010:[<ffffffffaefb97e0>] [<ffffffffaefb97e0>] acpi_pm_read_verified+0x10/0x60
Jun 26 16:13:14 roble kernel: RSP: 0018:ffffa014f76abdb8 EFLAGS: 00000202
Jun 26 16:13:14 roble kernel: RAX: 00000000000addf6 RBX: 0000000000000086 RCX: 0000000000f16644
Jun 26 16:13:14 roble kernel: RDX: 0000000000000608 RSI: 00000000002aac00 RDI: ffffffffaf971901
Jun 26 16:13:14 roble kernel: RBP: ffffa014f76abdb8 R08: 00000000aff71901 R09: 0001427407e2e9e0
Jun 26 16:13:14 roble kernel: R10: 0001427407e2e9e0 R11: 0000000000000000 R12: 0000000000000004
Jun 26 16:13:14 roble kernel: R13: ffffa014f76abd38 R14: ffffffffaf62ea00 R15: ffffa0172f313900
Jun 26 16:13:14 roble kernel: FS: 0000000000000000(0000) GS:ffffa0172f300000(0000) knlGS:0000000000000000
Jun 26 16:13:14 roble kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 16:13:14 roble kernel: CR2: 0000000000448d70 CR3: 000000017789a000 CR4: 00000000000606e0
Jun 26 16:13:14 roble kernel: Call Trace:
Jun 26 16:13:14 roble kernel: [<ffffffffaea3426d>] tsc_read_refs+0x8d/0xb0
Jun 26 16:13:14 roble kernel: [<ffffffffaea344b2>] tsc_refine_calibration_work+0x1c2/0x220
Jun 26 16:13:14 roble kernel: [<ffffffffaeab9ebf>] process_one_work+0x17f/0x440
Jun 26 16:13:14 roble kernel: [<ffffffffaeabaf56>] worker_thread+0x126/0x3c0
Jun 26 16:13:14 roble kernel: [<ffffffffaeabae30>] ? manage_workers.isra.25+0x2a0/0x2a0
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1da1>] kthread+0xd1/0xe0
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1cd0>] ? insert_kthread_work+0x40/0x40
Jun 26 16:13:14 roble kernel: [<ffffffffaf175c37>] ret_from_fork_nospec_begin+0x21/0x21
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1cd0>] ? insert_kthread_work+0x40/0x40
Jun 26 16:13:14 roble kernel: Code: 43 ea 74 00 30 98 fb ae c7 05 81 ea 74 00 78 00 00 00 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 8b 15 5d 74 7a 00 55 48 89 e5 ed <89> c6 81 e6 ff ff ff 00 ed 89 c1 81 e1 ff ff ff 00 ed 25 ff ff
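When a syslog contains many reports like the ones above, it can help to pull out just the CPU, stuck time, and process of each one. The Python sketch below is an illustration, not something from the original report: the regex is modeled only on the two "soft lockup" lines quoted here, and `SoftLockup` / `parse_soft_lockups` are hypothetical names. It also assumes `kernel.watchdog_thresh` is at its default of 10 seconds, in which case the kernel reports a soft lockup once a CPU has been stuck for more than 2 × 10 = 20 seconds.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Modeled on the report lines quoted above, e.g.
# "NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [pg_dump:4025]"
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)

# Assumption: kernel.watchdog_thresh is at its default of 10 seconds;
# the soft lockup detector fires after 2 * watchdog_thresh seconds.
WATCHDOG_THRESH = 10


@dataclass
class SoftLockup:
    cpu: int
    stuck_secs: int
    comm: str
    pid: int


def parse_soft_lockups(lines: Iterable[str]) -> List[SoftLockup]:
    """Return one SoftLockup per watchdog report found in the given lines."""
    events = []
    for line in lines:
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            events.append(SoftLockup(
                cpu=int(m["cpu"]),
                stuck_secs=int(m["secs"]),
                comm=m["comm"],  # may itself contain ':' (e.g. kworker/4:3)
                pid=int(m["pid"]),
            ))
    return events


if __name__ == "__main__":
    sample = [
        "Jun 26 16:13:10 roble kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [pg_dump:4025]",
        "Jun 26 16:13:14 roble kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 27s! [kworker/4:3:16530]",
    ]
    threshold = 2 * WATCHDOG_THRESH
    for ev in parse_soft_lockups(sample):
        print(f"CPU{ev.cpu}: {ev.comm} (pid {ev.pid}) stuck {ev.stuck_secs}s, "
              f"{ev.stuck_secs - threshold}s past the {threshold}s threshold")
    # prints:
    # CPU5: pg_dump (pid 4025) stuck 23s, 3s past the 20s threshold
    # CPU4: kworker/4:3 (pid 16530) stuck 27s, 7s past the 20s threshold
```

To scan a real log, pass an open file instead of the sample list, e.g. `parse_soft_lockups(open("/var/log/messages"))` (the path is an assumption for an EL7 guest). The PID is taken after the last colon inside the brackets because kernel worker names such as `kworker/4:3` contain colons themselves.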
Jayme