Basic setup notes: 3-node HCI cluster running oVirt 4.3.3, using oVirt Node
NG for the hosts. Storage is SSD-backed, with a 10Gb network dedicated to
Gluster and jumbo frames enabled.
The oVirt management network (which also acts as the VM network) is a 1Gb
network.
The hosts are Dell R720s with 256GB RAM and E5-2690 processors.
I have a VM configured with 16GB RAM and 6 virtual CPUs. When this VM does
a heavy operation (in this case, dumping a large DB from a remote server),
the load spikes quickly. When this happens the VM becomes unresponsive, and
in some cases I get CPU soft lockup messages.
I'm trying to determine where the bottleneck is and how I can prevent the
VM from becoming unresponsive when doing heavy tasks.
Here is the output from the VM's syslog showing two soft lockups:
Jun 26 16:13:10 roble kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck
for 23s! [pg_dump:4025]
Jun 26 16:13:10 roble kernel: Modules linked in: binfmt_misc
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw
gf128mul glue_helper ablk_helper cryptd sg joydev parport_pc parport
virtio_rng i2c_piix4 pcspkr ip_tables xfs libcrc32c sd_mod crc_t10dif
crct10dif_generic sr_mod cdrom ata_generic virtio_console virtio_net
virtio_scsi pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel qxl
drm_kms_helper syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops ttm
drm floppy ata_piix libata virtio_pci virtio_ring virtio
drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
Jun 26 16:13:10 roble kernel: CPU: 5 PID: 4025 Comm: pg_dump Kdump: loaded
Not tainted 3.10.0-957.21.2.el7.x86_64 #1
Jun 26 16:13:10 roble kernel: Hardware name: oVirt oVirt Node, BIOS
1.11.0-2.el7 04/01/2014
Jun 26 16:13:10 roble kernel: task: ffffa01443499040 ti: ffffa014b24a4000
task.ti: ffffa014b24a4000
Jun 26 16:13:10 roble kernel: RIP: 0010:[<ffffffffaeb113ea>]
[<ffffffffaeb113ea>] generic_exec_single+0xfa/0x1b0
Jun 26 16:13:10 roble kernel: RSP: 0018:ffffa014b24a7c30 EFLAGS: 00000202
Jun 26 16:13:10 roble kernel: RAX: 0000000000000010 RBX: ffffa014b24a7c00
RCX: 0000000000000030
Jun 26 16:13:10 roble kernel: RDX: 000000000000ffff RSI: 0000000000000010
RDI: 0000000000000286
Jun 26 16:13:10 roble kernel: RBP: ffffa014b24a7c78 R08: ffffffffaf213640
R09: 000000018040003f
Jun 26 16:13:10 roble kernel: R10: 0000000000000001 R11: fffff83b8e91b540
R12: ffffa014b24a7bc0
Jun 26 16:13:10 roble kernel: R13: 0000000000000c9b R14: ffffa01477bf94e8
R15: ffffa01545074270
Jun 26 16:13:10 roble kernel: FS: 00007f9fefe89840(0000)
GS:ffffa0172f340000(0000) knlGS:0000000000000000
Jun 26 16:13:10 roble kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 26 16:13:10 roble kernel: CR2: 00007f9fef391e90 CR3: 000000014aa00000
CR4: 00000000000606e0
Jun 26 16:13:10 roble kernel: Call Trace:
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a4e0>] ? leave_mm+0x110/0x110
Jun 26 16:13:10 roble kernel: [<ffffffffaeb114ff>]
smp_call_function_single+0x5f/0xa0
Jun 26 16:13:10 roble kernel: [<ffffffffaed75cd5>] ?
cpumask_next_and+0x35/0x50
Jun 26 16:13:10 roble kernel: [<ffffffffaeb11aab>]
smp_call_function_many+0x22b/0x270
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a6a8>]
native_flush_tlb_others+0xb8/0xc0
Jun 26 16:13:10 roble kernel: [<ffffffffaea7a718>]
flush_tlb_mm_range+0x68/0x140
Jun 26 16:13:10 roble kernel: [<ffffffffaebe4687>]
tlb_flush_mmu.part.76+0x37/0xe0
Jun 26 16:13:10 roble kernel: [<ffffffffaebe5f85>] tlb_finish_mmu+0x55/0x60
Jun 26 16:13:10 roble kernel: [<ffffffffaebef624>] unmap_region+0xf4/0x140
Jun 26 16:13:10 roble kernel: [<ffffffffaecfc8d3>] ?
selinux_file_free_security+0x23/0x30
Jun 26 16:13:10 roble kernel: [<ffffffffaebefbe1>] ?
__vma_rb_erase+0x121/0x220
Jun 26 16:13:10 roble kernel: [<ffffffffaebf1c15>] do_munmap+0x2a5/0x480
Jun 26 16:13:10 roble kernel: [<ffffffffaebf1e55>] vm_munmap+0x65/0xb0
Jun 26 16:13:10 roble kernel: [<ffffffffaebf30e2>] SyS_munmap+0x22/0x30
Jun 26 16:13:10 roble kernel: [<ffffffffaf175ddb>]
system_call_fastpath+0x22/0x27
Jun 26 16:13:10 roble kernel: [<ffffffffaf175d21>] ?
system_call_after_swapgs+0xae/0x146
Jun 26 16:13:10 roble kernel: Code: 00 b7 01 00 48 89 de 48 03 14 c5 60 bc
74 af 48 89 df e8 4a b7 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 20 01 74 0b
0f 1f 00 f3 90 <f6> 43 20 01 75 f8 31 c0 48 8b 7c 24 28 65 48 33 3c 25 28
00 00
Jun 26 16:13:14 roble kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck
for 27s! [kworker/4:3:16530]
Jun 26 16:13:14 roble kernel: Modules linked in: binfmt_misc
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw
gf128mul glue_helper ablk_helper cryptd sg joydev parport_pc parport
virtio_rng i2c_piix4 pcspkr ip_tables xfs libcrc32c sd_mod crc_t10dif
crct10dif_generic sr_mod cdrom ata_generic virtio_console virtio_net
virtio_scsi pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel qxl
drm_kms_helper syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops ttm
drm floppy ata_piix libata virtio_pci virtio_ring virtio
drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
Jun 26 16:13:14 roble kernel: CPU: 4 PID: 16530 Comm: kworker/4:3 Kdump:
loaded Tainted: G L ------------ 3.10.0-957.21.2.el7.x86_64 #1
Jun 26 16:13:14 roble kernel: Hardware name: oVirt oVirt Node, BIOS
1.11.0-2.el7 04/01/2014
Jun 26 16:13:14 roble kernel: Workqueue: events tsc_refine_calibration_work
Jun 26 16:13:14 roble kernel: task: ffffa016acf74100 ti: ffffa014f76a8000
task.ti: ffffa014f76a8000
Jun 26 16:13:14 roble kernel: RIP: 0010:[<ffffffffaefb97e0>]
[<ffffffffaefb97e0>] acpi_pm_read_verified+0x10/0x60
Jun 26 16:13:14 roble kernel: RSP: 0018:ffffa014f76abdb8 EFLAGS: 00000202
Jun 26 16:13:14 roble kernel: RAX: 00000000000addf6 RBX: 0000000000000086
RCX: 0000000000f16644
Jun 26 16:13:14 roble kernel: RDX: 0000000000000608 RSI: 00000000002aac00
RDI: ffffffffaf971901
Jun 26 16:13:14 roble kernel: RBP: ffffa014f76abdb8 R08: 00000000aff71901
R09: 0001427407e2e9e0
Jun 26 16:13:14 roble kernel: R10: 0001427407e2e9e0 R11: 0000000000000000
R12: 0000000000000004
Jun 26 16:13:14 roble kernel: R13: ffffa014f76abd38 R14: ffffffffaf62ea00
R15: ffffa0172f313900
Jun 26 16:13:14 roble kernel: FS: 0000000000000000(0000)
GS:ffffa0172f300000(0000) knlGS:0000000000000000
Jun 26 16:13:14 roble kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 26 16:13:14 roble kernel: CR2: 0000000000448d70 CR3: 000000017789a000
CR4: 00000000000606e0
Jun 26 16:13:14 roble kernel: Call Trace:
Jun 26 16:13:14 roble kernel: [<ffffffffaea3426d>] tsc_read_refs+0x8d/0xb0
Jun 26 16:13:14 roble kernel: [<ffffffffaea344b2>]
tsc_refine_calibration_work+0x1c2/0x220
Jun 26 16:13:14 roble kernel: [<ffffffffaeab9ebf>]
process_one_work+0x17f/0x440
Jun 26 16:13:14 roble kernel: [<ffffffffaeabaf56>] worker_thread+0x126/0x3c0
Jun 26 16:13:14 roble kernel: [<ffffffffaeabae30>] ?
manage_workers.isra.25+0x2a0/0x2a0
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1da1>] kthread+0xd1/0xe0
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1cd0>] ?
insert_kthread_work+0x40/0x40
Jun 26 16:13:14 roble kernel: [<ffffffffaf175c37>]
ret_from_fork_nospec_begin+0x21/0x21
Jun 26 16:13:14 roble kernel: [<ffffffffaeac1cd0>] ?
insert_kthread_work+0x40/0x40
Jun 26 16:13:14 roble kernel: Code: 43 ea 74 00 30 98 fb ae c7 05 81 ea 74
00 78 00 00 00 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 8b 15 5d 74 7a 00
55 48 89 e5 ed <89> c6 81 e6 ff ff ff 00 ed 89 c1 81 e1 ff ff ff 00 ed 25
ff ff
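To see how often this happens and which tasks are hit, the soft lockup
header lines can be pulled out of a longer log. The following is only a
rough sketch in Python, assuming the "NMI watchdog: BUG: soft lockup"
message format shown above; the function name find_soft_lockups is made up
for illustration, and the pattern tolerates the line wrapping introduced by
the mail client.

```python
import re

# Matches the header line of each soft lockup report, e.g.
#   "Jun 26 16:13:10 roble kernel: NMI watchdog: BUG: soft lockup - CPU#5
#    stuck for 23s! [pg_dump:4025]"
# \s+ between words lets the match survive a wrapped line.
LOCKUP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"NMI watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+)\s+stuck\s+"
    r"for\s+(?P<secs>\d+)s!\s+\[(?P<comm>.+):(?P<pid>\d+)\]",
    re.MULTILINE,
)


def find_soft_lockups(text):
    """Return one dict per soft lockup header found in the log text."""
    return [
        {
            "time": m["ts"],
            "host": m["host"],
            "cpu": int(m["cpu"]),
            "seconds": int(m["secs"]),
            # The task name may itself contain ':' (e.g. kworker/4:3),
            # so the greedy match leaves only the final number as the PID.
            "comm": m["comm"],
            "pid": int(m["pid"]),
        }
        for m in LOCKUP_RE.finditer(text)
    ]
```

Feeding it the contents of the guest's log file (for example the text read
from /var/log/messages) gives a list that can be sorted or counted by CPU
or task name.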