The oVirt management network (which also acts as the VM network) is a 1 Gb network.
The hosts are Dell R720s with 256 GB of RAM and E5-2690 processors.
I have a VM configured with 16 GB of RAM and 6 virtual CPUs. When this VM does a heavy operation (in this case, dumping a large DB from a remote server), the load spikes quickly. When this happens the VM becomes unresponsive, and in some cases I get CPU soft lockup messages.
I'm trying to determine where the bottleneck is and how I can prevent the VM from becoming unresponsive during heavy tasks.
Here is the syslog output from inside the VM showing the soft lockups:
Jun 26 16:13:10 roble kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [pg_dump:4025]
Jun 26 16:13:10 roble kernel: Modules linked in: binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg joydev parport_pc parport virtio_rng i2c_piix4 pcspkr ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic virtio_console virtio_net virtio_scsi pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel qxl drm_kms_helper syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops ttm drm floppy ata_piix libata virtio_pci virtio_ring virtio drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
Jun 26 16:13:10 roble kernel: CPU: 5 PID: 4025 Comm: pg_dump Kdump: loaded Not tainted 3.10.0-957.21.2.el7.x86_64 #1
Jun 26 16:13:10 roble kernel: Hardware name: oVirt oVirt Node, BIOS 1.11.0-2.el7 04/01/2014
Jun 26 16:13:10 roble kernel: task: ffffa01443499040 ti: ffffa014b24a4000 task.ti: ffffa014b24a4000
Jun 26 16:13:10 roble kernel: RIP: 0010:[<ffffffffaeb113ea>] [<ffffffffaeb113ea>] generic_exec_single+0xfa/0x1b0
Jun 26 16:13:10 roble kernel: RSP: 0018:ffffa014b24a7c30 EFLAGS: 00000202
Jun 26 16:13:10 roble kernel: RAX: 0000000000000010 RBX: ffffa014b24a7c00 RCX: 0000000000000030
Jun 26 16:13:10 roble kernel: RDX: 000000000000ffff RSI: 0000000000000010 RDI: 0000000000000286
Jun 26 16:13:10 roble kernel: RBP: ffffa014b24a7c78 R08: ffffffffaf213640 R09: 000000018040003f
Jun 26 16:13:10 roble kernel: R10: 0000000000000001 R11: fffff83b8e91b540 R12: ffffa014b24a7bc0
Jun 26 16:13:10 roble kernel: R13: 0000000000000c9b R14: ffffa01477bf94e8 R15: ffffa01545074270
Jun 26 16:13:10 roble kernel: FS: 00007f9fefe89840(0000) GS:ffffa0172f340000(0000) knlGS:0000000000000000
Jun 26 16:13:10 roble kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 16:13:10 roble kernel: CR2: 00007f9fef391e90 CR3: 000000014aa00000 CR4: 00000000000606e0
Jun 26 16:13:10 roble kernel: Call Trace:
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaeb114ff>] smp_call_function_single+0x5f/0xa0
Jun 26 16:13:10 roble kernel: [<ffffffffaed75cd5>] ? cpumask_next_and+0x35/0x50
Jun 26 16:13:10 roble kernel: [<ffffffffaeb11aab>] smp_call_function_many+0x22b/0x270
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a6a8>] native_flush_tlb_others+0xb8/0xc0
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a718>] flush_tlb_mm_range+0x68/0x140
Jun 26 16:13:10 roble kernel: [<ffffffffaebe4687>] tlb_flush_mmu.part.76+0x37/0xe0
Jun 26 16:13:10 roble kernel: [<ffffffffaebe5f85>] tlb_finish_mmu+0x55/0x60
Jun 26 16:13:10 roble kernel: [<ffffffffaebef624>] unmap_region+0xf4/0x140
Jun 26 16:13:10 roble kernel: [<ffffffffaecfc8d3>] ? selinux_file_free_security+0x23/0x30
Jun 26 16:13:10 roble kernel: [<ffffffffaebefbe1>] ? __vma_rb_erase+0x121/0x220
Jun 26 16:13:10 roble kernel: [<ffffffffaebf1c15>] do_munmap+0x2a5/0x480
Jun 26 16:13:10 roble kernel: [<ffffffffaebf1e55>] vm_munmap+0x65/0xb0
Jun 26 16:13:10 roble kernel: [<ffffffffaebf30e2>] SyS_munmap+0x22/0x30
Jun 26 16:13:10 roble kernel: [<ffffffffaf175ddb>] system_call_fastpath+0x22/0x27
Jun 26 16:13:10 roble kernel: [<ffffffffaf175d21>] ? system_call_after_swapgs+0xae/0x146
Jun 26 16:13:10 roble kernel: Code: 00 b7 01 00 48 89 de 48 03 14 c5 60 bc 74 af 48 89 df e8 4a b7 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 20 01 74 0b 0f 1f 00 f3 90 <f6> 43 20 01 75 f8 31 c0 48 8b 7c 24 28 65 48 33 3c 25 28 00 00
Jun 26 16:13:14 roble kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 27s! [kworker/4:3:16530]
Jun 26 16:13:14 roble kernel: Modules linked in: binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg joydev parport_pc parport virtio_rng i2c_piix4 pcspkr ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic virtio_console virtio_net virtio_scsi pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel qxl drm_kms_helper syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops ttm drm floppy ata_piix libata virtio_pci virtio_ring virtio drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
Jun 26 16:13:14 roble kernel: CPU: 4 PID: 16530 Comm: kworker/4:3 Kdump: loaded Tainted: G L ------------ 3.10.0-957.21.2.el7.x86_64 #1
Jun 26 16:13:14 roble kernel: Hardware name: oVirt oVirt Node, BIOS 1.11.0-2.el7 04/01/2014
Jun 26 16:13:14 roble kernel: Workqueue: events tsc_refine_calibration_work
Jun 26 16:13:14 roble kernel: task: ffffa016acf74100 ti: ffffa014f76a8000 task.ti: ffffa014f76a8000
Jun 26 16:13:14 roble kernel: RIP: 0010:[<ffffffffaefb97e0>] [<ffffffffaefb97e0>] acpi_pm_read_verified+0x10/0x60
Jun 26 16:13:14 roble kernel: RSP: 0018:ffffa014f76abdb8 EFLAGS: 00000202
Jun 26 16:13:14 roble kernel: RAX: 00000000000addf6 RBX: 0000000000000086 RCX: 0000000000f16644
Jun 26 16:13:14 roble kernel: RDX: 0000000000000608 RSI: 00000000002aac00 RDI: ffffffffaf971901
Jun 26 16:13:14 roble kernel: RBP: ffffa014f76abdb8 R08: 00000000aff71901 R09: 0001427407e2e9e0
Jun 26 16:13:14 roble kernel: R10: 0001427407e2e9e0 R11: 0000000000000000 R12: 0000000000000004
Jun 26 16:13:14 roble kernel: R13: ffffa014f76abd38 R14: ffffffffaf62ea00 R15: ffffa0172f313900
Jun 26 16:13:14 roble kernel: FS: 0000000000000000(0000) GS:ffffa0172f300000(0000) knlGS:0000000000000000
Jun 26 16:13:14 roble kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 16:13:14 roble kernel: CR2: 0000000000448d70 CR3: 000000017789a000 CR4: 00000000000606e0
Jun 26 16:13:14 roble kernel: Call Trace:
Jun 26 16:13:14 roble kernel: [<ffffffffaea3426d>] tsc_read_refs+0x8d/0xb0
Jun 26 16:13:14 roble kernel: [<ffffffffaea344b2>] tsc_refine_calibration_work+0x1c2/0x220
Jun 26 16:13:14 roble kernel: [<ffffffffaeab9ebf>] process_one_work+0x17f/0x440
Jun 26 16:13:14 roble kernel: [<ffffffffaeabaf56>] worker_thread+0x126/0x3c0
Jun 26 16:13:14 roble kernel: [<ffffffffaeabae30>] ? manage_workers.isra.25+0x2a0/0x2a0
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1da1>] kthread+0xd1/0xe0
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1cd0>] ? insert_kthread_work+0x40/0x40
Jun 26 16:13:14 roble kernel: [<ffffffffaf175c37>] ret_from_fork_nospec_begin+0x21/0x21
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1cd0>] ? insert_kthread_work+0x40/0x40
Jun 26 16:13:14 roble kernel: Code: 43 ea 74 00 30 98 fb ae c7 05 81 ea 74 00 78 00 00 00 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 8b 15 5d 74 7a 00 55 48 89 e5 ed <89> c6 81 e6 ff ff ff 00 ed 89 c1 81 e1 ff ff ff 00 ed 25 ff ff
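To pull just the lockup events out of a longer log, a minimal Python sketch along these lines could be used. It only relies on the "soft lockup - CPU#N stuck for Ns! [comm:pid]" line format shown above. The function names and the log path in the usage note are my own assumptions, not something oVirt or the kernel provides.

```python
import re
from collections import Counter

# Matches the watchdog header line, for example:
#   ... kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [pg_dump:4025]
# The task name may itself contain colons (e.g. "kworker/4:3"), so the
# greedy ".+" leaves only the final ":<digits>]" for the PID.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_lockups(lines):
    """Return one dict per soft lockup header line found in `lines`."""
    events = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "secs": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


def count_by_task(events):
    """Count lockup events per task name."""
    return Counter(event["comm"] for event in events)


def report(path):
    """Print every lockup event in the log file at `path` (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        events = parse_lockups(handle)
    for event in events:
        print("CPU#{cpu} stuck {secs}s in {comm} (pid {pid})".format(**event))
    for comm, count in count_by_task(events).most_common():
        print("{}: {} event(s)".format(comm, count))
```

Calling `report("/var/log/messages")` would list the events, assuming syslog writes kernel messages to that file on the guest (adjust the path if yours differs). This only summarizes which CPUs and tasks were reported; it does not by itself say where the bottleneck is.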