[ovirt-users] VM pauses/hangs after migration

Michal Skrivanek michal.skrivanek at redhat.com
Fri Sep 30 13:35:04 UTC 2016


> On 29 Sep 2016, at 16:23, Davide Ferrari <davide at billymob.com> wrote:
> 
> Ok, what I said is not true :( I didn't try to migrate again to the same host that gave the initial problem, and the problem is still there. The destination host has no HW problem (at least nothing that the system reports; maybe I should try an extensive memtest86), and the source host now has no memory issues either. So my question now is: how can I debug this problem?

that is a very low-level error, really pointing at HW issues. It may or may not be detected by memtest… but I would give it a try
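If rebooting into memtest86 is not practical right away, the kernel's EDAC counters can be checked online for ECC errors first. A minimal sketch, assuming the standard EDAC sysfs layout under /sys/devices/system/edac/mc (the helper name is mine; per-DIMM counters may sit one level deeper, e.g. in csrow*/ directories, depending on the driver):

```python
import glob
import os

def collect_edac_counts(base="/sys/devices/system/edac/mc"):
    """Collect ECC error counters from the kernel's EDAC sysfs tree.

    Returns {counter file path: integer count}. An empty dict means the
    EDAC driver is not loaded or the tree is absent on this host.
    """
    counts = {}
    for path in glob.glob(os.path.join(base, "mc*", "*_count")):
        try:
            with open(path) as fh:
                counts[path] = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric attribute; skip it
    return counts

if __name__ == "__main__":
    nonzero = {p: c for p, c in collect_edac_counts().items() if c}
    for path, count in sorted(nonzero.items()):
        print(f"{path}: {count}")
    if not nonzero:
        print("no non-zero EDAC counters (or EDAC not available)")
```

A non-zero ue_count (uncorrectable) on the destination host would point the same way as the migration failure; memtest is still the more thorough check.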

> The only difference with this host (vmhost01) is that it was the first host installed in my self-hosted engine setup. But I have already reinstalled it from the GUI, and meanwhile I've upgraded from 4.0.3 to 4.0.4.

does it happen only for the big 96GB VM? Are the others, which you said are working, all smaller?
Might be worth trying other system stability tests, playing with safer/slower settings in the BIOS, using a lower cluster CPU type, etc.
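As for reading dumps like the ones quoted below: a "KVM: entry failed" block is just the guest's saved register state, and the NAME=hexvalue pairs can be pulled out mechanically, e.g. to check whether RIP/CR3 are identical across repeated faults. A rough sketch, assuming the dump format shown in this thread (parse_kvm_dump is a hypothetical helper, not part of any oVirt/qemu tool):

```python
import re

# NAME=hexvalue pairs as they appear in the dump, e.g. "RAX=00000000ffffffed"
# or "R8 =0000000000000000"; short flag fields such as CPL=0 or A20=1 are
# deliberately skipped by requiring at least 4 hex digits in the value.
_REG = re.compile(r"([A-Z][A-Z0-9]{1,3})\s*=\s*([0-9a-f]{4,16})")

def parse_kvm_dump(text):
    """Extract register values from a 'KVM: entry failed' dump as {name: int}.

    Only the first occurrence of each register is kept, so a log containing
    two consecutive dumps yields the first dump's values.
    """
    regs = {}
    for name, value in _REG.findall(text):
        regs.setdefault(name, int(value, 16))
    return regs
```

Run over the first two dumps in this thread, it would show the same RIP (0xffffffff81058e96) in both, i.e. both vCPUs faulted at the same guest address.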

> 
> Any idea?
> 
> 2016-09-29 13:59 GMT+02:00 Davide Ferrari <davide at billymob.com>:
> Hello
> 
> Today I had the faulty DIMMs replaced, started the same VM again and did the same migration, and this time it worked, so it was 100% due to that.
> 
> The thing that makes me wonder a bit is: if it's the source host with the memory problem that blocks a correct migration, a faulty DIMM will force you to stop the VMs running on that host, because you cannot simply migrate them away to do the maintenance tasks...
> 
> 
> 2016-09-29 13:53 GMT+02:00 Tomas Jelinek <tjelinek at redhat.com>:
> 
> 
> ----- Original Message -----
> > From: "Davide Ferrari" <davide at billymob.com>
> > To: "users" <users at ovirt.org>
> > Sent: Wednesday, September 28, 2016 2:59:59 PM
> > Subject: [ovirt-users] VM pauses/hangs after migration
> >
> > Hello
> >
> > trying to migrate a VM from one host to another, a big VM with 96GB of RAM, I
> > found that when the migration completes, the VM goes to a paused state and
> > cannot be resumed. The libvirt/qemu log shows this:
> >
> > 2016-09-28T12:18:15.679176Z qemu-kvm: error while loading state section id
> > 2(ram)
> > 2016-09-28T12:18:15.680010Z qemu-kvm: load of migration failed: Input/output
> > error
> > 2016-09-28 12:18:15.872+0000: shutting down
> > 2016-09-28 12:22:21.467+0000: starting up libvirt version: 1.2.17, package:
> > 13.el7_2.5 (CentOS BuildSystem <http://bugs.centos.org>,
> > 2016-06-23-14:23:27, worker1.bsys.centos.org), qemu version: 2.3.0
> > (qemu-kvm-ev-2.3.0-31.el7.16.1)
> > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> > QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name front04.billydomain.com -S
> > -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Haswell-noTSX -m
> > size=100663296k,slots=16,maxmem=4294967296k -realtime mlock=off -smp
> > 32,sockets=16,cores=1,threads=2 -numa node,nodeid=0,cpus=0-31,mem=98304
> > -uuid 4511d1c0-6607-418f-ae75-34f605b2ad68 -smbios
> > type=1,manufacturer=oVirt,product=oVirt
> > Node,version=7-2.1511.el7.centos.2.10,serial=4C4C4544-004A-3310-8054-B2C04F474432,uuid=4511d1c0-6607-418f-ae75-34f605b2ad68
> > -no-user-config -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/
> > domain-front04.billydomain.com/monitor.sock,server,nowait -mon
> > chardev=charmonitor,id=monitor,mode=control -rtc
> > base=2016-09-28T14:22:21,driftfix=slew -global
> > kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on
> > -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7 -device
> > virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive
> > if=none,id=drive-ide0-1-0,readonly=on,format=raw -device
> > ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive
> > file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/b5b49d5c-2378-4639-9469-362e37ae7473/24fd0d3c-309b-458d-9818-4321023afacf,if=none,id=drive-virtio-disk0,format=qcow2,serial=b5b49d5c-2378-4639-9469-362e37ae7473,cache=none,werror=stop,rerror=stop,aio=threads
> > -device
> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> > -drive
> > file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527,if=none,id=drive-virtio-disk1,format=raw,serial=f02ac1ce-52cd-4b81-8b29-f8006d0469e0,cache=none,werror=stop,rerror=stop,aio=threads
> > -device
> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
> > -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device
> > virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:56,bus=pci.0,addr=0x3
> > -chardev
> > socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.com.redhat.rhevm.vdsm,server,nowait
> > -device
> > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
> > -chardev
> > socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.org.qemu.guest_agent.0,server,nowait
> > -device
> > virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
> > -chardev spicevmc,id=charchannel2,name=vdagent -device
> > virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
> > -vnc 192.168.10.225:1,password -k es -spice
> > tls-port=5902,addr=192.168.10.225,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on
> > -k es -device
> > qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vgamem_mb=16,bus=pci.0,addr=0x2
> > -incoming tcp:0.0.0.0:49156 -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
> > Domain id=5 is tainted: hook-script
> > red_dispatcher_loadvm_commands:
> > KVM: entry failed, hardware error 0x8
> > RAX=00000000ffffffed RBX=ffff8817ba00c000 RCX=0100000000000000
> > RDX=0000000000000000
> > RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00fe98
> > RSP=ffff8817ba00fe98
> > R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000
> > R11=0000000000000000
> > R12=0000000000000006 R13=ffff8817ba00c000 R14=ffff8817ba00c000
> > R15=0000000000000000
> > RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 0000000000000000 ffffffff 00000000
> > CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> > SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> > DS =0000 0000000000000000 ffffffff 00000000
> > FS =0000 0000000000000000 ffffffff 00000000
> > GS =0000 ffff8817def80000 ffffffff 00000000
> > LDT=0000 0000000000000000 ffffffff 00000000
> > TR =0040 ffff8817def93b80 00002087 00008b00 DPL=0 TSS64-busy
> > GDT= ffff8817def89000 0000007f
> > IDT= ffffffffff529000 00000fff
> > CR0=80050033 CR2=00000000ffffffff CR3=00000017b725b000 CR4=001406e0
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> > DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000d01
> > Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f
> > 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49
> > 89 ca
> > KVM: entry failed, hardware error 0x8
> > RAX=00000000ffffffed RBX=ffff8817ba008000 RCX=0100000000000000
> > RDX=0000000000000000
> > RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00be98
> > RSP=ffff8817ba00be98
> > R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000
> > R11=0000000000000000
> > R12=0000000000000005 R13=ffff8817ba008000 R14=ffff8817ba008000
> > R15=0000000000000000
> > RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 0000000000000000 ffffffff 00000000
> > CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> > SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> > DS =0000 0000000000000000 ffffffff 00000000
> > FS =0000 0000000000000000 ffffffff 00000000
> > GS =0000 ffff8817def40000 ffffffff 00000000
> > LDT=0000 0000000000000000 ffffffff 00000000
> > TR =0040 ffff8817def53b80 00002087 00008b00 DPL=0 TSS64-busy
> > GDT= ffff8817def49000 0000007f
> > IDT= ffffffffff529000 00000fff
> > CR0=80050033 CR2=00000000ffffffff CR3=00000017b3c9a000 CR4=001406e0
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> > DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000d01
> > Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f
> > 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49
> > 89 ca
> > KVM: entry failed, hardware error 0x80000021
> >
> > If you're running a guest on an Intel machine without unrestricted mode
> > support, the failure can be most likely due to the guest entering an invalid
> > state for Intel VT. For example, the guest maybe running in big real mode
> > which is not supported on less recent Intel processors.
> >
> > EAX=ffffffed EBX=ba020000 ECX=00000000 EDX=00000000
> > ESI=00000000 EDI=00000046 EBP=ba023e98 ESP=ba023e98
> > EIP=81058e96 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> > CS =f000 ffff0000 0000ffff 00009b00 DPL=0 CS16 [-RA]
> > SS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> > DS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> > FS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> > GS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> > LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
> > TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS64-busy
> > GDT= 0000000000000000 0000ffff
> > IDT= 0000000000000000 0000ffff
> > CR0=80050033 CR2=00007fd826ac20a0 CR3=000000003516c000 CR4=00140060
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> > DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000d01
> > Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ??
> > ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> > ?? ??
> >
> >
> > Searching for errors like this I found some bug reports about kernel
> > issues, but I don't think that's the case here: other VMs spawned from
> > the same image migrate without any issue. I have to say that the original
> > host running the VM has a RAM problem (ECC multibit fault in one DIMM).
> > Maybe that's the problem?
> 
> that seems quite likely. If you run the same VM on a different host and try to migrate
> it, does it work?
> 
> > How can I properly read this error log?
> >
> > Thanks
> >
> > --
> > Davide Ferrari
> > Senior Systems Engineer
> >
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
> 
> 
> 
> -- 
> Davide Ferrari
> Senior Systems Engineer
> 
> 
> 
> -- 
> Davide Ferrari
> Senior Systems Engineer
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
