[ovirt-users] VM pauses/hangs after migration

Davide Ferrari davide at billymob.com
Thu Sep 29 14:23:03 UTC 2016


Ok, what I said is not true :( I had not tried migrating again to the same
host that gave the initial problem, and the problem is still there. The
destination host has no HW problem (at least nothing that the system
reports; maybe I should run an extensive memtest86) and the source
host no longer has any memory issues either. So my question now is: how can I
debug this problem? The only difference with this host (vmhost01) is
that it was the first host installed in my self-hosted engine installation.
But I have already reinstalled it from the GUI, and meanwhile I've upgraded
from 4.0.3 to 4.0.4.

Any idea?
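
As a first step for reading the QEMU log quoted in this thread, here is a rough
Python sketch that counts the "KVM: entry failed, hardware error" lines and
splits each code into its parts. The function names are made up for this
example. The small reason table is an assumption based on the basic exit
reasons Intel documents for VM-entry failures (bit 31 set); codes with bit 31
clear, such as 0x8, are only counted and not interpreted.

```python
import re
from collections import Counter

# Matches the line QEMU prints when KVM could not enter the guest.
ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")

# Assumption: basic exit reasons that Intel documents as VM-entry failures.
VMX_ENTRY_FAILURES = {
    33: "invalid guest state",
    34: "MSR loading",
    41: "machine-check event",
}


def decode(code):
    """Describe one hardware error code (hypothetical helper)."""
    if code & 0x80000000:
        # Bit 31 set: VM-entry failure, low 16 bits hold the basic exit reason.
        basic = code & 0xFFFF
        reason = VMX_ENTRY_FAILURES.get(basic, f"basic exit reason {basic}")
        return "VM-entry failure: " + reason
    return f"raw code {code:#x} (bit 31 clear; not decoded here)"


def summarize(log_text):
    """Return {code: (count, description)} for every entry-failed line."""
    codes = Counter(int(m.group(1), 16) for m in ENTRY_FAILED.finditer(log_text))
    return {code: (count, decode(code)) for code, count in codes.items()}


if __name__ == "__main__":
    # The three error lines from the quoted log, in order of appearance.
    sample = (
        "KVM: entry failed, hardware error 0x8\n"
        "KVM: entry failed, hardware error 0x8\n"
        "KVM: entry failed, hardware error 0x80000021\n"
    )
    for code, (count, text) in summarize(sample).items():
        print(f"{code:#x} x{count}: {text}")
```

The regex uses a search rather than a line-anchored match, so it also works on
text where each line carries a quote prefix, as in this archive.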

2016-09-29 13:59 GMT+02:00 Davide Ferrari <davide at billymob.com>:

> Hello
>
> Today I had the faulty DIMMs replaced, started the same VM again and did
> the same migration, and this time it worked, so it was 100% due to that.
>
> What makes me wonder a bit is this: if it is the source host with the
> memory problem that blocks a correct migration, then a faulty DIMM
> forces you to stop the VMs running on that host, because you cannot
> simply migrate them away to do the maintenance tasks...
>
>
> 2016-09-29 13:53 GMT+02:00 Tomas Jelinek <tjelinek at redhat.com>:
>
>>
>>
>> ----- Original Message -----
>> > From: "Davide Ferrari" <davide at billymob.com>
>> > To: "users" <users at ovirt.org>
>> > Sent: Wednesday, September 28, 2016 2:59:59 PM
>> > Subject: [ovirt-users] VM pauses/hangs after migration
>> >
>> > Hello
>> >
>> > trying to migrate a VM from one host to another, a big VM with 96GB of
>> > RAM, I found that when the migration completes, the VM goes to a paused
>> > state and cannot be resumed. The libvirt/qemu log it gives is this:
>> >
>> > 2016-09-28T12:18:15.679176Z qemu-kvm: error while loading state section id 2(ram)
>> > 2016-09-28T12:18:15.680010Z qemu-kvm: load of migration failed: Input/output error
>> > 2016-09-28 12:18:15.872+0000: shutting down
>> > 2016-09-28 12:22:21.467+0000: starting up libvirt version: 1.2.17, package: 13.el7_2.5 (CentOS BuildSystem <http://bugs.centos.org>, 2016-06-23-14:23:27, worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7.16.1)
>> > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
>> > QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm
>> > -name front04.billydomain.com -S
>> > -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off
>> > -cpu Haswell-noTSX
>> > -m size=100663296k,slots=16,maxmem=4294967296k
>> > -realtime mlock=off
>> > -smp 32,sockets=16,cores=1,threads=2
>> > -numa node,nodeid=0,cpus=0-31,mem=98304
>> > -uuid 4511d1c0-6607-418f-ae75-34f605b2ad68
>> > -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=4C4C4544-004A-3310-8054-B2C04F474432,uuid=4511d1c0-6607-418f-ae75-34f605b2ad68
>> > -no-user-config -nodefaults
>> > -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-front04.billydomain.com/monitor.sock,server,nowait
>> > -mon chardev=charmonitor,id=monitor,mode=control
>> > -rtc base=2016-09-28T14:22:21,driftfix=slew
>> > -global kvm-pit.lost_tick_policy=discard
>> > -no-hpet -no-shutdown -boot strict=on
>> > -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>> > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7
>> > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4
>> > -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw
>> > -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
>> > -drive file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/b5b49d5c-2378-4639-9469-362e37ae7473/24fd0d3c-309b-458d-9818-4321023afacf,if=none,id=drive-virtio-disk0,format=qcow2,serial=b5b49d5c-2378-4639-9469-362e37ae7473,cache=none,werror=stop,rerror=stop,aio=threads
>> > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> > -drive file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527,if=none,id=drive-virtio-disk1,format=raw,serial=f02ac1ce-52cd-4b81-8b29-f8006d0469e0,cache=none,werror=stop,rerror=stop,aio=threads
>> > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
>> > -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31
>> > -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:56,bus=pci.0,addr=0x3
>> > -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.com.redhat.rhevm.vdsm,server,nowait
>> > -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
>> > -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.org.qemu.guest_agent.0,server,nowait
>> > -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
>> > -chardev spicevmc,id=charchannel2,name=vdagent
>> > -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
>> > -vnc 192.168.10.225:1,password -k es
>> > -spice tls-port=5902,addr=192.168.10.225,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on
>> > -k es
>> > -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vgamem_mb=16,bus=pci.0,addr=0x2
>> > -incoming tcp:0.0.0.0:49156
>> > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
>> > -msg timestamp=on
>> > Domain id=5 is tainted: hook-script
>> > red_dispatcher_loadvm_commands:
>> > KVM: entry failed, hardware error 0x8
>> > RAX=00000000ffffffed RBX=ffff8817ba00c000 RCX=0100000000000000 RDX=0000000000000000
>> > RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00fe98 RSP=ffff8817ba00fe98
>> > R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> > R12=0000000000000006 R13=ffff8817ba00c000 R14=ffff8817ba00c000 R15=0000000000000000
>> > RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> > ES =0000 0000000000000000 ffffffff 00000000
>> > CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> > SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> > DS =0000 0000000000000000 ffffffff 00000000
>> > FS =0000 0000000000000000 ffffffff 00000000
>> > GS =0000 ffff8817def80000 ffffffff 00000000
>> > LDT=0000 0000000000000000 ffffffff 00000000
>> > TR =0040 ffff8817def93b80 00002087 00008b00 DPL=0 TSS64-busy
>> > GDT= ffff8817def89000 0000007f
>> > IDT= ffffffffff529000 00000fff
>> > CR0=80050033 CR2=00000000ffffffff CR3=00000017b725b000 CR4=001406e0
>> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> > DR6=00000000ffff0ff0 DR7=0000000000000400
>> > EFER=0000000000000d01
>> > Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49 89 ca
>> > KVM: entry failed, hardware error 0x8
>> > RAX=00000000ffffffed RBX=ffff8817ba008000 RCX=0100000000000000 RDX=0000000000000000
>> > RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00be98 RSP=ffff8817ba00be98
>> > R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> > R12=0000000000000005 R13=ffff8817ba008000 R14=ffff8817ba008000 R15=0000000000000000
>> > RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> > ES =0000 0000000000000000 ffffffff 00000000
>> > CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> > SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> > DS =0000 0000000000000000 ffffffff 00000000
>> > FS =0000 0000000000000000 ffffffff 00000000
>> > GS =0000 ffff8817def40000 ffffffff 00000000
>> > LDT=0000 0000000000000000 ffffffff 00000000
>> > TR =0040 ffff8817def53b80 00002087 00008b00 DPL=0 TSS64-busy
>> > GDT= ffff8817def49000 0000007f
>> > IDT= ffffffffff529000 00000fff
>> > CR0=80050033 CR2=00000000ffffffff CR3=00000017b3c9a000 CR4=001406e0
>> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> > DR6=00000000ffff0ff0 DR7=0000000000000400
>> > EFER=0000000000000d01
>> > Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49 89 ca
>> > KVM: entry failed, hardware error 0x80000021
>> >
>> > If you're running a guest on an Intel machine without unrestricted mode
>> > support, the failure can be most likely due to the guest entering an
>> > invalid state for Intel VT. For example, the guest maybe running in big
>> > real mode which is not supported on less recent Intel processors.
>> >
>> > EAX=ffffffed EBX=ba020000 ECX=00000000 EDX=00000000
>> > ESI=00000000 EDI=00000046 EBP=ba023e98 ESP=ba023e98
>> > EIP=81058e96 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> > ES =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
>> > CS =f000 ffff0000 0000ffff 00009b00 DPL=0 CS16 [-RA]
>> > SS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
>> > DS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
>> > FS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
>> > GS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
>> > LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
>> > TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> > GDT= 0000000000000000 0000ffff
>> > IDT= 0000000000000000 0000ffff
>> > CR0=80050033 CR2=00007fd826ac20a0 CR3=000000003516c000 CR4=00140060
>> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> > DR6=00000000ffff0ff0 DR7=0000000000000400
>> > EFER=0000000000000d01
>> > Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??>
>> ?? ??
>> > ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> ??
>> > ?? ??
>> >
>> >
>> > Searching for errors like this I found some bug reports about kernel
>> > issues, but I don't think that's the case here: other VMs spawned from the
>> > same image migrate without any issue. I have to say that the original host
>> > running the VM has a RAM problem (ECC multibit fault in one DIMM). Maybe
>> > that's the problem?
>>
>> That seems quite likely. If you run the same VM on a different host and
>> try to migrate it, does it work?
>>
>> > How can I properly read this error log?
>> >
>> > Thanks
>> >
>> > --
>> > Davide Ferrari
>> > Senior Systems Engineer
>> >
>> > _______________________________________________
>> > Users mailing list
>> > Users at ovirt.org
>> > http://lists.ovirt.org/mailman/listinfo/users
>> >
>>
>
>
>
> --
> Davide Ferrari
> Senior Systems Engineer
>



-- 
Davide Ferrari
Senior Systems Engineer