VM pauses/hangs after migration

Hello,

While trying to migrate a big VM (96 GB of RAM) from one host to another, I found that when the migration completes, the VM goes into a paused state and cannot be resumed. This is the libvirt/qemu log it produces:

2016-09-28T12:18:15.679176Z qemu-kvm: error while loading state section id 2(ram)
2016-09-28T12:18:15.680010Z qemu-kvm: load of migration failed: Input/output error
2016-09-28 12:18:15.872+0000: shutting down
2016-09-28 12:22:21.467+0000: starting up libvirt version: 1.2.17, package: 13.el7_2.5 (CentOS BuildSystem <http://bugs.centos.org>, 2016-06-23-14:23:27, worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7.16.1)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name front04.billydomain.com -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Haswell-noTSX -m size=100663296k,slots=16,maxmem=4294967296k -realtime mlock=off -smp 32,sockets=16,cores=1,threads=2 -numa node,nodeid=0,cpus=0-31,mem=98304 -uuid 4511d1c0-6607-418f-ae75-34f605b2ad68 -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=4C4C4544-004A-3310-8054-B2C04F474432,uuid=4511d1c0-6607-418f-ae75-34f605b2ad68 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-front04.billydomain.com/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-09-28T14:22:21,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/b5b49d5c-2378-4639-9469-362e37ae7473/24fd0d3c-309b-458d-9818-4321023afacf,if=none,id=drive-virtio-disk0,format=qcow2,serial=b5b49d5c-2378-4639-9469-362e37ae7473,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527,if=none,id=drive-virtio-disk1,format=raw,serial=f02ac1ce-52cd-4b81-8b29-f8006d0469e0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:56,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -vnc 192.168.10.225:1,password -k es -spice tls-port=5902,addr=192.168.10.225,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k es -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vgamem_mb=16,bus=pci.0,addr=0x2 -incoming tcp:0.0.0.0:49156 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
Domain id=5 is tainted: hook-script
red_dispatcher_loadvm_commands:
KVM: entry failed, hardware error 0x8
RAX=00000000ffffffed RBX=ffff8817ba00c000 RCX=0100000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00fe98 RSP=ffff8817ba00fe98
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000006 R13=ffff8817ba00c000 R14=ffff8817ba00c000 R15=0000000000000000
RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0000 0000000000000000 ffffffff 00000000
FS =0000 0000000000000000 ffffffff 00000000
GS =0000 ffff8817def80000 ffffffff 00000000
LDT=0000 0000000000000000 ffffffff 00000000
TR =0040 ffff8817def93b80 00002087 00008b00 DPL=0 TSS64-busy
GDT= ffff8817def89000 0000007f
IDT= ffffffffff529000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=00000017b725b000 CR4=001406e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49 89 ca
KVM: entry failed, hardware error 0x8
RAX=00000000ffffffed RBX=ffff8817ba008000 RCX=0100000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00be98 RSP=ffff8817ba00be98
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000005 R13=ffff8817ba008000 R14=ffff8817ba008000 R15=0000000000000000
RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0000 0000000000000000 ffffffff 00000000
FS =0000 0000000000000000 ffffffff 00000000
GS =0000 ffff8817def40000 ffffffff 00000000
LDT=0000 0000000000000000 ffffffff 00000000
TR =0040 ffff8817def53b80 00002087 00008b00 DPL=0 TSS64-busy
GDT= ffff8817def49000 0000007f
IDT= ffffffffff529000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=00000017b3c9a000 CR4=001406e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49 89 ca
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.

EAX=ffffffed EBX=ba020000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000046 EBP=ba023e98 ESP=ba023e98
EIP=81058e96 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
CS =f000 ffff0000 0000ffff 00009b00 DPL=0 CS16 [-RA]
SS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
DS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
FS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
GS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 0000000000000000 0000ffff
IDT= 0000000000000000 0000ffff
CR0=80050033 CR2=00007fd826ac20a0 CR3=000000003516c000 CR4=00140060
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

Searching for errors like this, I found some bug reports about kernel issues, but I don't think that's the case here: other VMs spawned from the same image migrate without any issue. I have to say that the original host running the VM has a RAM problem (a multi-bit ECC fault in one DIMM). Maybe that's the problem?

How can I properly read this error log?

Thanks
-- 
Davide Ferrari
Senior Systems Engineer
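One part of this log that can be read mechanically is the value after "KVM: entry failed, hardware error". Below is a minimal Python sketch, assuming an Intel VMX host and the exit-reason encoding from the Intel SDM: bit 31 set means the VM entry itself failed, and bits 15:0 carry the basic exit reason. Under that assumption, 0x80000021 decodes as a failed entry with basic reason 33 (invalid guest state), which agrees with QEMU's own hint above. Codes with bit 31 clear, such as the 0x8 lines, may be a different kind of value (for example a VMX instruction error number), so the sketch deliberately leaves them undecoded. The names extract_hw_errors and decode_hw_error are hypothetical, and the reason table is only a small subset.

```python
import re

# Intel SDM encoding of a VMX exit reason: bit 31 is set when the VM entry
# itself failed; bits 15:0 hold the basic exit reason.
VMENTRY_FAILURE_BIT = 1 << 31
BASIC_REASON_MASK = 0xFFFF

# A small subset of basic exit reasons that only occur on failed VM entries.
ENTRY_FAILURE_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}

HW_ERROR_RE = re.compile(r"hardware error (0x[0-9a-fA-F]+)")


def extract_hw_errors(log_text: str) -> list[int]:
    """Return every 'hardware error 0x...' value found in the log text, in order."""
    return [int(value, 16) for value in HW_ERROR_RE.findall(log_text)]


def decode_hw_error(code: int) -> str:
    """Describe a KVM 'hardware error' value, assuming an Intel VMX host."""
    if code & VMENTRY_FAILURE_BIT:
        basic = code & BASIC_REASON_MASK
        reason = ENTRY_FAILURE_REASONS.get(basic, "basic exit reason not in this table")
        return f"failed VM entry, basic exit reason {basic}: {reason}"
    # With bit 31 clear the value is not a failed-entry exit reason;
    # this sketch does not try to interpret it further.
    return f"{code:#x}: bit 31 clear, not decoded here"


if __name__ == "__main__":
    sample = (
        "KVM: entry failed, hardware error 0x8 RAX=00000000ffffffed "
        "KVM: entry failed, hardware error 0x80000021"
    )
    for code in extract_hw_errors(sample):
        print(decode_hw_error(code))
```

Feeding it the whole qemu log for the domain lists each failed entry in order; only the 0x80000021 line gets a concrete interpretation.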

----- Original Message -----
From: "Davide Ferrari" <davide@billymob.com>
To: "users" <users@ovirt.org>
Sent: Wednesday, September 28, 2016 2:59:59 PM
Subject: [ovirt-users] VM pauses/hangs after migration
Searching for errors like this, I found some bug reports about kernel issues, but I don't think that's the case here: other VMs spawned from the same image migrate without any issue. I have to say that the original host running the VM has a RAM problem (a multi-bit ECC fault in one DIMM). Maybe that's the problem?
That seems quite likely. If you run the same VM on a different host and try to migrate it, does it work?
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Hello,

Today I had the faulty DIMMs replaced, started the same VM again and ran the same migration, and this time it worked, so it was 100% due to that.

What makes me wonder a bit is this: if a memory problem on the source host is what blocks a correct migration, then a faulty DIMM will force you to stop the VMs running on that host, because you cannot simply migrate them away to do the maintenance tasks...

2016-09-29 13:53 GMT+02:00 Tomas Jelinek <tjelinek@redhat.com>:
That seems quite likely. If you run the same VM on a different host and try to migrate it, does it work?
-- 
Davide Ferrari
Senior Systems Engineer

On 29 Sep 2016, at 13:59, Davide Ferrari <davide@billymob.com> wrote:

What makes me wonder a bit is this: if a memory problem on the source host is what blocks a correct migration, then a faulty DIMM will force you to stop the VMs running on that host, because you cannot simply migrate them away to do the maintenance tasks…
If you have faulty hardware, you should do that ASAP, as you never know where it is going to affect you. It's like with disk errors… you may think it's OK when you rarely write to certain places, but once you try to copy it off the problematic storage and you read every single byte/location, you're screwed…

Thanks,
michal
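Acting on this advice assumes you know a host's memory is failing before you try to move VMs off it. Below is a minimal Python sketch, assuming a Linux host where an EDAC driver for the memory controller is loaded and exposes the ce_count (corrected) and ue_count (uncorrected) counters under /sys/devices/system/edac/mc/mcN/. It sums those counters so that a non-zero uncorrected count, the kind a multi-bit ECC fault usually produces, can be noticed early. The function names and the "any uncorrected error" policy are hypothetical, and zero counts prove nothing if no EDAC driver is present.

```python
from pathlib import Path

# Default location of the Linux EDAC memory-controller directories
# (one mcN subdirectory per memory controller).
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def edac_error_counts(root: str = EDAC_MC_ROOT) -> dict[str, int]:
    """Sum corrected (ce_count) and uncorrected (ue_count) errors over all mcN dirs.

    Returns zeros when the directory is missing, e.g. when no EDAC driver is
    loaded, so zeros do not prove that the memory is healthy.
    """
    totals = {"ce_count": 0, "ue_count": 0}
    base = Path(root)
    if not base.is_dir():
        return totals
    for mc in sorted(base.glob("mc[0-9]*")):
        for name in totals:
            counter = mc / name
            if counter.is_file():
                totals[name] += int(counter.read_text().strip())
    return totals


def has_uncorrected_errors(counts: dict[str, int]) -> bool:
    """Hypothetical policy: any uncorrected error is a reason to plan maintenance."""
    return counts["ue_count"] > 0


if __name__ == "__main__":
    counts = edac_error_counts()
    print(counts)
    if has_uncorrected_errors(counts):
        print("uncorrected memory errors reported; live migration off this host may fail")
```

Running it periodically on each host (or feeding the counts into existing monitoring) is one way to schedule the DIMM replacement before a migration is the thing that discovers the fault.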
If you're running a guest on an Intel machine without unrestricted = mode support, the failure can be most likely due to the guest entering an = invalid state for Intel VT. For example, the guest maybe running in big real = mode which is not supported on less recent Intel processors.
EAX=3Dffffffed EBX=3Dba020000 ECX=3D00000000 EDX=3D00000000 ESI=3D00000000 EDI=3D00000046 EBP=3Dba023e98 ESP=3Dba023e98 EIP=3D81058e96 EFL=3D00000002 [-------] CPL=3D0 II=3D0 A20=3D1 SMM=3D0= HLT=3D0 ES =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA] CS =3Df000 ffff0000 0000ffff 00009b00 DPL=3D0 CS16 [-RA] SS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA] DS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA] FS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA] GS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA] LDT=3D0000 00000000 0000ffff 00008200 DPL=3D0 LDT TR =3D0000 00000000 0000ffff 00008b00 DPL=3D0 TSS64-busy GDT=3D 0000000000000000 0000ffff IDT=3D 0000000000000000 0000ffff CR0=3D80050033 CR2=3D00007fd826ac20a0 CR3=3D000000003516c000 = CR4=3D00140060 DR0=3D0000000000000000 DR1=3D0000000000000000 DR2=3D0000000000000000 DR3=3D0000000000000000 DR6=3D00000000ffff0ff0 DR7=3D0000000000000400 EFER=3D0000000000000d01 Code=3D?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? = <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? = ?? ?? ?? ??
Searching for errors like this I found some bug report about kernel = issues but I don't think it's the case, other VMs spawned from the same = image migrate without any issue. I have toi say that the original host = running the VM has some RAM problem (ECC multibit fault in one DIMM). Maybe = that's the problem? =20 that seems quite likely. If you run the same VM on a different host = and try to migrate it, does it work? =20 How can I properly read this error log?
Thanks
-- Davide Ferrari Senior Systems Engineer
_______________________________________________ Users mailing list Users@ovirt.org <mailto:Users@ovirt.org> http://lists.ovirt.org/mailman/listinfo/users = <http://lists.ovirt.org/mailman/listinfo/users>
=20 =20 =20 --=20 Davide Ferrari Senior Systems Engineer _______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
On 29 Sep 2016, at 13:59, Davide Ferrari <davide@billymob.com> wrote:

> The problem that makes me wonder a bit is: if it's the source host with the memory problem that blocks the migration, a faulty DIMM will force you to stop the VMs running on that host, because you cannot simply migrate them away to do the maintenance tasks…

If you have faulty HW you should do that ASAP, as you never know where it is going to affect you. It's like with disk errors… you may think it's OK when you rarely write to certain places, but once you try to copy it off the problematic storage and read every single byte/location, you're screwed…

Thanks,
michal

Ok, what I said is not true :( When I tested after the DIMM replacement I didn't migrate to the same host that gave the initial problem, and with that host the problem is still there. The destination host has no HW problem (at least nothing that the system reports; maybe I should try an extensive memtest86), and the source host now has no memory issues either. So my question now is: how can I debug this problem?

The only difference this host (vmhost01) has is that it was the first host installed in my self-hosted engine installation. But I have already reinstalled it from the GUI, and meanwhile I've upgraded from 4.0.3 to 4.0.4.

Any idea?

2016-09-29 13:59 GMT+02:00 Davide Ferrari <davide@billymob.com>:
Hello
Today I had the faulty DIMMs replaced, started the same VM again and did the same migration, and this time it worked, so it was 100% due to that.
What makes me wonder a bit is: if it's the source host with the memory problem that blocks the migration, a faulty DIMM will force you to stop the VMs running on that host, because you cannot simply migrate them away to do the maintenance tasks...
2016-09-29 13:53 GMT+02:00 Tomas Jelinek <tjelinek@redhat.com>:
From: "Davide Ferrari" <davide@billymob.com>
To: "users" <users@ovirt.org>
Sent: Wednesday, September 28, 2016 2:59:59 PM
Subject: [ovirt-users] VM pauses/hangs after migration
Hello
Trying to migrate a VM from one host to another (a big VM with 96GB of RAM), I found that when the migration completes, the VM goes into a paused state and cannot be resumed. This is the libvirt/qemu log it gives:
2016-09-28T12:18:15.679176Z qemu-kvm: error while loading state section id 2(ram)
2016-09-28T12:18:15.680010Z qemu-kvm: load of migration failed: Input/output error
2016-09-28 12:18:15.872+0000: shutting down
2016-09-28 12:22:21.467+0000: starting up libvirt version: 1.2.17, package: 13.el7_2.5 (CentOS BuildSystem <http://bugs.centos.org>, 2016-06-23-14:23:27, worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7.16.1)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name front04.billydomain.com -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Haswell-noTSX -m size=100663296k,slots=16,maxmem=4294967296k -realtime mlock=off -smp 32,sockets=16,cores=1,threads=2 -numa node,nodeid=0,cpus=0-31,mem=98304 -uuid 4511d1c0-6607-418f-ae75-34f605b2ad68 -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=4C4C4544-004A-3310-8054-B2C04F474432,uuid=4511d1c0-6607-418f-ae75-34f605b2ad68 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-front04.billydomain.com/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-09-28T14:22:21,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/b5b49d5c-2378-4639-9469-362e37ae7473/24fd0d3c-309b-458d-9818-4321023afacf,if=none,id=drive-virtio-disk0,format=qcow2,serial=b5b49d5c-2378-4639-9469-362e37ae7473,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527,if=none,id=drive-virtio-disk1,format=raw,serial=f02ac1ce-52cd-4b81-8b29-f8006d0469e0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:56,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -vnc 192.168.10.225:1,password -k es -spice tls-port=5902,addr=192.168.10.225,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k es -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vgamem_mb=16,bus=pci.0,addr=0x2 -incoming tcp:0.0.0.0:49156 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
Domain id=5 is tainted: hook-script
red_dispatcher_loadvm_commands:
KVM: entry failed, hardware error 0x8
RAX=00000000ffffffed RBX=ffff8817ba00c000 RCX=0100000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00fe98 RSP=ffff8817ba00fe98
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000006 R13=ffff8817ba00c000 R14=ffff8817ba00c000 R15=0000000000000000
RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0000 0000000000000000 ffffffff 00000000
FS =0000 0000000000000000 ffffffff 00000000
GS =0000 ffff8817def80000 ffffffff 00000000
LDT=0000 0000000000000000 ffffffff 00000000
TR =0040 ffff8817def93b80 00002087 00008b00 DPL=0 TSS64-busy
GDT= ffff8817def89000 0000007f
IDT= ffffffffff529000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=00000017b725b000 CR4=001406e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49 89 ca
KVM: entry failed, hardware error 0x8
RAX=00000000ffffffed RBX=ffff8817ba008000 RCX=0100000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00be98 RSP=ffff8817ba00be98
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000005 R13=ffff8817ba008000 R14=ffff8817ba008000 R15=0000000000000000
RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0000 0000000000000000 ffffffff 00000000
FS =0000 0000000000000000 ffffffff 00000000
GS =0000 ffff8817def40000 ffffffff 00000000
LDT=0000 0000000000000000 ffffffff 00000000
TR =0040 ffff8817def53b80 00002087 00008b00 DPL=0 TSS64-busy
GDT= ffff8817def49000 0000007f
IDT= ffffffffff529000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=00000017b3c9a000 CR4=001406e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49 89 ca
KVM: entry failed, hardware error 0x80000021
If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.
EAX=ffffffed EBX=ba020000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000046 EBP=ba023e98 ESP=ba023e98
EIP=81058e96 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
CS =f000 ffff0000 0000ffff 00009b00 DPL=0 CS16 [-RA]
SS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
DS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
FS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
GS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 0000000000000000 0000ffff
IDT= 0000000000000000 0000ffff
CR0=80050033 CR2=00007fd826ac20a0 CR3=000000003516c000 CR4=00140060
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
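One way to read the last entry, as a hedged sketch: assuming the Intel SDM layout of the VMX exit reason (bits 15:0 hold the basic reason, bit 31 flags a failed VM entry), 0x80000021 decodes to basic reason 33, "VM-entry failure due to invalid guest state", which is the case QEMU's hint above describes. The helper names below are hypothetical, and the sketch makes no claim about the earlier "hardware error 0x8" lines. It also decodes CR0 and EFER from the last dump: they report protected mode, paging and long mode active (PE, PG, LMA), while the segment registers, GDT/IDT and EFL=00000002 look like the x86 power-on reset values (for example CS=f000 with base ffff0000 and limit ffff). That mismatch is one plausible example of the inconsistent CPU state the hint refers to; it is an observation, not a diagnosis.

```python
"""Decode values from the QEMU/KVM failure log (hypothetical helpers).

Assumes the Intel SDM layouts for the VMX exit reason, CR0 and EFER.
"""
from __future__ import annotations

# Bit positions of the CR0 and EFER flags of interest (Intel SDM).
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}

# Basic exit reasons that signal a failed VM entry (Intel SDM, Appendix C).
ENTRY_FAILURE_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(value: int) -> tuple[int, bool, str]:
    """Split an exit-reason value into (basic reason, entry-failure bit, text)."""
    basic = value & 0xFFFF                 # bits 15:0
    entry_failed = bool(value >> 31 & 1)   # bit 31
    text = ENTRY_FAILURE_REASONS.get(basic, "not in this table")
    return basic, entry_failed, text


def set_flags(value: int, table: dict[int, str]) -> list[str]:
    """Names of the flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value >> bit & 1]


if __name__ == "__main__":
    print(decode_exit_reason(0x80000021))
    # CR0 and EFER copied from the last register dump in the log.
    print("CR0 :", set_flags(0x80050033, CR0_BITS))
    print("EFER:", set_flags(0x0D01, EFER_BITS))
```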
Searching for errors like this, I found some bug reports about kernel issues, but I don't think that's the case here: other VMs spawned from the same image migrate without any issue. I have to say that the original host running the VM has a RAM problem (an ECC multibit fault in one DIMM). Maybe that's the problem?
that seems quite likely. If you run the same VM on a different host and try to migrate it, does it work?
How can I properly read this error log?
Thanks
-- Davide Ferrari Senior Systems Engineer
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
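On the question of how to read this log: each `KVM: entry failed, hardware error 0x...` line means KVM could not enter the guest, and QEMU then dumped the vCPU registers. The Python sketch below decodes those two codes, assuming an Intel VMX host (the VM runs with `-cpu Haswell-noTSX`) and the Linux KVM convention of reporting either the full exit reason of a failed VM entry (bit 31 set; 0x80000021 is basic reason 33, "invalid guest state", the case QEMU's "unrestricted mode" hint is printed for) or, when VMLAUNCH/VMRESUME itself failed, the VM-instruction error number (0x8 would be "invalid host-state field(s)"). The lookup tables are a small hand-picked subset of the Intel SDM and the helper names are hypothetical; treat it as a reading aid, not a diagnosis.

```python
"""Decode the code in "KVM: entry failed, hardware error 0x...".

Sketch assuming an Intel VMX host: KVM reports either the full VM-exit
reason of a failed VM entry (bit 31 set) or, if VMLAUNCH/VMRESUME itself
failed, the VM-instruction error number. Tables are a partial subset.
"""
import re

FAILED_VMENTRY_BIT = 1 << 31

# Basic exit reasons that signal a failed VM entry (Intel SDM, exit reasons).
FAILED_ENTRY_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}

# Subset of VM-instruction error numbers (Intel SDM, "VM Instruction Error Numbers").
VM_INSTRUCTION_ERRORS = {
    7: "VM entry with invalid control field(s)",
    8: "VM entry with invalid host-state field(s)",
}


def decode_entry_failure(code: int) -> str:
    """Turn one hardware error code into a human-readable description."""
    if code & FAILED_VMENTRY_BIT:
        basic = code & 0xFFFF  # bits 15:0 carry the basic exit reason
        text = FAILED_ENTRY_REASONS.get(basic, "unknown basic exit reason")
        return f"exit reason {basic} (failed VM entry): {text}"
    text = VM_INSTRUCTION_ERRORS.get(code, "unknown VM-instruction error")
    return f"VM-instruction error {code}: {text}"


def decode_log(log: str) -> list[str]:
    """Return one decoded line per 'KVM: entry failed' message in the log."""
    pattern = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")
    return [f"{m.group(1)} -> {decode_entry_failure(int(m.group(1), 16))}"
            for m in pattern.finditer(log)]


if __name__ == "__main__":
    sample = ("KVM: entry failed, hardware error 0x8\n"
              "KVM: entry failed, hardware error 0x80000021\n")
    for line in decode_log(sample):
        print(line)
```

Feeding it the whole qemu log for the VM (from the destination host) lists every entry failure in order; the codes only say which VMX check failed, not why the migrated state was bad.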

On 29 Sep 2016, at 16:23, Davide Ferrari <davide@billymob.com> wrote:
Ok, what I said is not true :( I hadn't tried migrating again to the same host that gave the initial problem; when I do, the problem is still there. The destination host has no HW problem (at least nothing that the system reports; maybe I should try an extensive memtest86 run) and the source host now has no memory issues either. So, my question is now: how can I debug this problem?
that is a very low level error really pointing at HW issues. It may or may not be detected by memtest…but I would give it a try
The only difference that this host (vmhost01) has is that it was the first host installed in my self-hosted engine installation. But I have already reinstalled it from the GUI, and meanwhile I've upgraded from 4.0.3 to 4.0.4.
does it happen only for the big 96GB VM? The others which you said are working, are they all small? Might be worth trying other system stability tests, playing with safer/slower settings in BIOS, use lower CPU cluster, etc
Any idea?
2016-09-29 13:59 GMT+02:00 Davide Ferrari <davide@billymob.com>:
Hello
Today I had the faulty DIMMs replaced, started the same VM again and did the same migration, and this time it worked, so it was 100% due to that.
What makes me wonder a bit is this: if a source host with a memory problem is what blocks a correct migration, a faulty DIMM will force you to stop the VMs running on that host, because you cannot simply migrate them away to do the maintenance tasks...
--Apple-Mail=_8A13E691-C43E-4D9A-9A61-E680954C2884 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 <html><head><meta http-equiv=3D"Content-Type" content=3D"text/html = charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; = -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" = class=3D""><br class=3D""><div><blockquote type=3D"cite" class=3D""><div = class=3D"">On 29 Sep 2016, at 16:23, Davide Ferrari <<a = href=3D"mailto:davide@billymob.com" class=3D"">davide@billymob.com</a>>= wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><div = dir=3D"ltr" class=3D""><div class=3D"">Ok, what I said is not true :( I = didn't try to migrate again to the same host that gave the initial = problem, and the problem is still there. The destination host has no HW = problem (at least nothing that the system reports, maybe I should try = with an extensive memtest86) and the source problem now has no memory = issues neither. So, my question is now: how can I debug this problem? = </div></div></div></blockquote><div><br class=3D""></div>that is a very = low level error really pointing at HW issues. It may or may not be = detected by memtest=E2=80=A6but I would give it a try</div><div><br = class=3D""><blockquote type=3D"cite" class=3D""><div class=3D""><div = dir=3D"ltr" class=3D""><div class=3D"">The only difference that this = host (vmhost01) has is that it was the first host installed in my = self-hosted engine installation. But I have already reinstalled it from = GUI and menawhile I've upgraded to 4.0.4 from 4.0.3.<br = class=3D""></div></div></div></blockquote><div><br = class=3D""></div><div>does it happen only for the big 96GB VM? 
The = others which you said are working, are they all small?</div>Might be = worth trying other system stability tests, playing with safer/slower = settings in BIOS, use lower CPU cluster, etc</div><div><br = class=3D""></div><div><blockquote type=3D"cite" class=3D""><div = class=3D""><div dir=3D"ltr" class=3D""><div class=3D""><br = class=3D""></div>Any idea?<br class=3D""></div><div = class=3D"gmail_extra"><br class=3D""><div class=3D"gmail_quote">2016-09-29= 13:59 GMT+02:00 Davide Ferrari <span dir=3D"ltr" class=3D""><<a = href=3D"mailto:davide@billymob.com" target=3D"_blank" = class=3D"">davide@billymob.com</a>></span>:<br class=3D""><blockquote = class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc = solid;padding-left:1ex"><div dir=3D"ltr" class=3D""><div class=3D""><div = class=3D"">Hello<br class=3D""><br class=3D""></div>Today I've the = faulty DIMMs replaced, started the same VM again and did the same = migration and this time worked, so it was 100% due to that.<br = class=3D""><br class=3D""></div>The problem that make me wonder a bit = is: if it's the source host with memory problem the one which blocks the = correct migration, a faulty DIMM will force you to stop the VMs running = on that host, because you cannot simply migrate them away to do the = maintenence tasks...<br class=3D""><br class=3D""></div><div = class=3D"HOEnZb"><div class=3D"h5"><div class=3D"gmail_extra"><br = class=3D""><div class=3D"gmail_quote">2016-09-29 13:53 GMT+02:00 Tomas = Jelinek <span dir=3D"ltr" class=3D""><<a = href=3D"mailto:tjelinek@redhat.com" target=3D"_blank" = class=3D"">tjelinek@redhat.com</a>></span>:<br class=3D""><blockquote = class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc = solid;padding-left:1ex"><span class=3D""><br class=3D""> <br class=3D""> ----- Original Message -----<br class=3D""> > From: "Davide Ferrari" <<a href=3D"mailto:davide@billymob.com" = target=3D"_blank" class=3D"">davide@billymob.com</a>><br class=3D""> 
> To: "users" <<a href=3D"mailto:users@ovirt.org" target=3D"_blank" = class=3D"">users@ovirt.org</a>><br class=3D""> > Sent: Wednesday, September 28, 2016 2:59:59 PM<br class=3D""> > Subject: [ovirt-users] VM pauses/hangs after migration<br class=3D"">= ><br class=3D""> > Hello<br class=3D""> ><br class=3D""> > trying to migrate a VM from one host to another, a big VM with 96GB = of RAM, I<br class=3D""> > found that when the migration completes, the VM goes to a paused = satte and<br class=3D""> > cannot be resumed. The libvirt/qemu log it gives is this:<br = class=3D""> ><br class=3D""> > 2016-09-28T12:18:15.679176Z qemu-kvm: error while loading state = section id<br class=3D""> > 2(ram)<br class=3D""> > 2016-09-28T12:18:15.680010Z qemu-kvm: load of migration failed: = Input/output<br class=3D""> > error<br class=3D""> > 2016-09-28 12:18:15.872+0000: shutting down<br class=3D""> > 2016-09-28 12:22:21.467+0000: starting up libvirt version: 1.2.17, = package:<br class=3D""> > 13.el7_2.5 (CentOS BuildSystem < <a = href=3D"http://bugs.centos.org/" rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://bugs.centos.org</a> >,<br class=3D""> </span>> <a href=3D"tel:2016-06-23-14" value=3D"+12016062314" = target=3D"_blank" class=3D"">2016-06-23-14</a>:23:27, <a = href=3D"http://worker1.bsys.centos.org/" rel=3D"noreferrer" = target=3D"_blank" class=3D"">worker1.bsys.centos.org</a> ), qemu = version: 2.3.0<br class=3D""> <div class=3D""><div class=3D"">> (qemu-kvm-ev-2.3.0-31.el7.16.1<wbr = class=3D"">)<br class=3D""> > LC_ALL=3DC PATH=3D/usr/local/sbin:/usr/loca<wbr = class=3D"">l/bin:/usr/sbin:/usr/bin<br class=3D""> > QEMU_AUDIO_DRV=3Dspice /usr/libexec/qemu-kvm -name <a = href=3D"http://front04.billydomain.com/" rel=3D"noreferrer" = target=3D"_blank" class=3D"">front04.billydomain.com</a> -S<br class=3D"">= > -machine pc-i440fx-rhel7.2.0,accel=3Dkvm,<wbr class=3D"">usb=3Doff = -cpu Haswell-noTSX -m<br class=3D""> > size=3D100663296k,slots=3D16,maxme<wbr 
class=3D"">m=3D4294967296k = -realtime mlock=3Doff -smp<br class=3D""> > 32,sockets=3D16,cores=3D1,threads=3D<wbr class=3D"">2 -numa = node,nodeid=3D0,cpus=3D0-31,mem=3D98<wbr class=3D"">304<br class=3D""> > -uuid 4511d1c0-6607-418f-ae75-34f605<wbr class=3D"">b2ad68 = -smbios<br class=3D""> > type=3D1,manufacturer=3DoVirt,prod<wbr class=3D"">uct=3DoVirt<br = class=3D""> > Node,version=3D7-2.1511.el7.cent<wbr = class=3D"">os.2.10,serial=3D4C4C4544-004A-<wbr = class=3D"">3310-8054-B2C04F474432,uuid=3D<wbr = class=3D"">4511d1c0-6607-418f-ae75-34f605<wbr class=3D"">b2ad68<br = class=3D""> > -no-user-config -nodefaults -chardev<br class=3D""> > socket,id=3Dcharmonitor,path=3D/va<wbr = class=3D"">r/lib/libvirt/qemu/<br class=3D""> > <a = href=3D"http://domain-front04.billydomain.com/monitor.sock,server,nowait" = rel=3D"noreferrer" target=3D"_blank" = class=3D"">domain-front04.billydomain.com<wbr = class=3D"">/monitor.sock,server,nowait</a> -mon<br class=3D""> > chardev=3Dcharmonitor,id=3Dmonitor<wbr class=3D"">,mode=3Dcontrol = -rtc<br class=3D""> > base=3D2016-09-28T14:22:21,drift<wbr class=3D"">fix=3Dslew = -global<br class=3D""> > kvm-pit.lost_tick_policy=3Ddisca<wbr class=3D"">rd -no-hpet = -no-shutdown -boot strict=3Don<br class=3D""> > -device piix3-usb-uhci,id=3Dusb,bus=3Dpci.<wbr = class=3D"">0,addr=3D0x1.0x2 -device<br class=3D""> > virtio-scsi-pci,id=3Dscsi0,bus=3Dp<wbr class=3D"">ci.0,addr=3D0x7 = -device<br class=3D""> > virtio-serial-pci,id=3Dvirtio-se<wbr = class=3D"">rial0,max_ports=3D16,bus=3Dpci.0,<wbr class=3D"">addr=3D0x4 = -drive<br class=3D""> > if=3Dnone,id=3Ddrive-ide0-1-0,read<wbr class=3D"">only=3Don,format=3D= raw -device<br class=3D""> > ide-cd,bus=3Dide.1,unit=3D0,drive=3D<wbr = class=3D"">drive-ide0-1-0,id=3Dide0-1-0 -drive<br class=3D""> > file=3D/rhev/data-center/0000000<wbr = class=3D"">1-0001-0001-0001-0000000003e3/<wbr = class=3D"">ba2bd397-9222-424d-aecc-<wbr = class=3D"">eb652c0169d9/images/b5b49d5c-<wbr = 
class=3D"">2378-4639-9469-362e37ae7473/<wbr = class=3D"">24fd0d3c-309b-458d-9818-<wbr = class=3D"">4321023afacf,if=3Dnone,id=3Ddrive-<wbr = class=3D"">virtio-disk0,format=3Dqcow2,<wbr = class=3D"">serial=3Db5b49d5c-2378-4639-<wbr = class=3D"">9469-362e37ae7473,cache=3Dnone,<wbr = class=3D"">werror=3Dstop,rerror=3Dstop,aio=3D<wbr class=3D"">threads<br = class=3D""> > -device<br class=3D""> > virtio-blk-pci,scsi=3Doff,bus=3Dpc<wbr = class=3D"">i.0,addr=3D0x5,drive=3Ddrive-virti<wbr = class=3D"">o-disk0,id=3Dvirtio-disk0,bootin<wbr class=3D"">dex=3D1<br = class=3D""> > -drive<br class=3D""> > file=3D/rhev/data-center/0000000<wbr = class=3D"">1-0001-0001-0001-0000000003e3/<wbr = class=3D"">ba2bd397-9222-424d-aecc-<wbr = class=3D"">eb652c0169d9/images/f02ac1ce-<wbr = class=3D"">52cd-4b81-8b29-f8006d0469e0/<wbr = class=3D"">ff4e49c6-3084-4234-80a1-<wbr = class=3D"">18a67615c527,if=3Dnone,id=3Ddrive-<wbr = class=3D"">virtio-disk1,format=3Draw,<wbr = class=3D"">serial=3Df02ac1ce-52cd-4b81-<wbr = class=3D"">8b29-f8006d0469e0,cache=3Dnone,<wbr = class=3D"">werror=3Dstop,rerror=3Dstop,aio=3D<wbr class=3D"">threads<br = class=3D""> > -device<br class=3D""> > virtio-blk-pci,scsi=3Doff,bus=3Dpc<wbr = class=3D"">i.0,addr=3D0x8,drive=3Ddrive-virti<wbr = class=3D"">o-disk1,id=3Dvirtio-disk1<br class=3D""> > -netdev tap,fd=3D30,id=3Dhostnet0,vhost=3Don<wbr = class=3D"">,vhostfd=3D31 -device<br class=3D""> > virtio-net-pci,netdev=3Dhostnet0<wbr = class=3D"">,id=3Dnet0,mac=3D00:1a:4a:16:01:<wbr = class=3D"">56,bus=3Dpci.0,addr=3D0x3<br class=3D""> > -chardev<br class=3D""> > socket,id=3Dcharchannel0,path=3D/v<wbr = class=3D"">ar/lib/libvirt/qemu/channels/4<wbr = class=3D"">511d1c0-6607-418f-ae75-34f605b<wbr = class=3D"">2ad68.com.redhat.rhevm.vdsm,<wbr class=3D"">server,nowait<br = class=3D""> > -device<br class=3D""> > virtserialport,bus=3Dvirtio-seri<wbr = class=3D"">al0.0,nr=3D1,chardev=3Dcharchannel<wbr = class=3D"">0,id=3Dchannel0,name=3Dcom.redhat.<wbr class=3D"">rhevm.vdsm<br= 
class=3D""> > -chardev<br class=3D""> > socket,id=3Dcharchannel1,path=3D/v<wbr = class=3D"">ar/lib/libvirt/qemu/channels/4<wbr = class=3D"">511d1c0-6607-418f-ae75-34f605b<wbr = class=3D"">2ad68.org.qemu.guest_agent.0,<wbr class=3D"">server,nowait<br = class=3D""> > -device<br class=3D""> > virtserialport,bus=3Dvirtio-seri<wbr = class=3D"">al0.0,nr=3D2,chardev=3Dcharchannel<wbr = class=3D"">1,id=3Dchannel1,name=3Dorg.qemu.<wbr = class=3D"">guest_agent.0<br class=3D""> > -chardev spicevmc,id=3Dcharchannel2,name=3D<wbr class=3D"">vdagent = -device<br class=3D""> > virtserialport,bus=3Dvirtio-seri<wbr = class=3D"">al0.0,nr=3D3,chardev=3Dcharchannel<wbr = class=3D"">2,id=3Dchannel2,name=3Dcom.redhat.<wbr class=3D"">spice.0<br = class=3D""> </div></div>> -vnc <a href=3D"http://192.168.10.225:1/" = rel=3D"noreferrer" target=3D"_blank" class=3D"">192.168.10.225:1</a> = ,password -k es -spice<br class=3D""> <div class=3D""><div class=3D"">> tls-port=3D5902,addr=3D192.168.10.<wb= r class=3D"">225,x509-dir=3D/etc/pki/vdsm/lib<wbr = class=3D"">virt-spice,tls-channel=3Ddefault<wbr = class=3D"">,tls-channel=3Dmain,tls-channel=3D<wbr = class=3D"">display,tls-channel=3Dinputs,<wbr = class=3D"">tls-channel=3Dcursor,tls-channel<wbr = class=3D"">=3Dplayback,tls-channel=3Drecord,<wbr = class=3D"">tls-channel=3Dsmartcard,tls-<wbr = class=3D"">channel=3Dusbredir,seamless-migr<wbr class=3D"">ation=3Don<br = class=3D""> > -k es -device<br class=3D""> > qxl-vga,id=3Dvideo0,ram_size=3D671<wbr = class=3D"">08864,vram_size=3D8388608,vgamem<wbr = class=3D"">_mb=3D16,bus=3Dpci.0,addr=3D0x2<br class=3D""> > -incoming tcp: <a href=3D"http://0.0.0.0:49156/" rel=3D"noreferrer" = target=3D"_blank" class=3D"">0.0.0.0:49156</a> -device<br class=3D""> > virtio-balloon-pci,id=3Dballoon0<wbr class=3D"">,bus=3Dpci.0,addr=3D0= x6 -msg timestamp=3Don<br class=3D""> > Domain id=3D5 is tainted: hook-script<br class=3D""> > red_dispatcher_loadvm_commands<wbr class=3D"">:<br class=3D""> > KVM: entry failed, 
hardware error 0x8<br class=3D""> > RAX=3D00000000ffffffed RBX=3Dffff8817ba00c000 = RCX=3D0100000000000000<br class=3D""> > RDX=3D0000000000000000<br class=3D""> > RSI=3D0000000000000000 RDI=3D0000000000000046 = RBP=3Dffff8817ba00fe98<br class=3D""> > RSP=3Dffff8817ba00fe98<br class=3D""> > R8 =3D0000000000000000 R9 =3D0000000000000000 = R10=3D0000000000000000<br class=3D""> > R11=3D0000000000000000<br class=3D""> > R12=3D0000000000000006 R13=3Dffff8817ba00c000 = R14=3Dffff8817ba00c000<br class=3D""> > R15=3D0000000000000000<br class=3D""> > RIP=3Dffffffff81058e96 RFL=3D00010286 [--S--P-] CPL=3D0 II=3D0 = A20=3D1 SMM=3D0 HLT=3D0<br class=3D""> > ES =3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > CS =3D0010 0000000000000000 ffffffff 00a09b00 DPL=3D0 CS64 [-RA]<br = class=3D""> > SS =3D0018 0000000000000000 ffffffff 00c09300 DPL=3D0 DS [-WA]<br = class=3D""> > DS =3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > FS =3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > GS =3D0000 ffff8817def80000 ffffffff 00000000<br class=3D""> > LDT=3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > TR =3D0040 ffff8817def93b80 00002087 00008b00 DPL=3D0 TSS64-busy<br = class=3D""> > GDT=3D ffff8817def89000 0000007f<br class=3D""> > IDT=3D ffffffffff529000 00000fff<br class=3D""> > CR0=3D80050033 CR2=3D00000000ffffffff CR3=3D00000017b725b000 = CR4=3D001406e0<br class=3D""> > DR0=3D0000000000000000 DR1=3D0000000000000000 = DR2=3D0000000000000000<br class=3D""> > DR3=3D0000000000000000<br class=3D""> > DR6=3D00000000ffff0ff0 DR7=3D0000000000000400<br class=3D""> > EFER=3D0000000000000d01<br class=3D""> > Code=3D89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 = <5d> c3 0f<br class=3D""> > 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 = 00 55 49<br class=3D""> > 89 ca<br class=3D""> > KVM: entry failed, hardware error 0x8<br class=3D""> > RAX=3D00000000ffffffed RBX=3Dffff8817ba008000 = RCX=3D0100000000000000<br 
class=3D""> > RDX=3D0000000000000000<br class=3D""> > RSI=3D0000000000000000 RDI=3D0000000000000046 = RBP=3Dffff8817ba00be98<br class=3D""> > RSP=3Dffff8817ba00be98<br class=3D""> > R8 =3D0000000000000000 R9 =3D0000000000000000 = R10=3D0000000000000000<br class=3D""> > R11=3D0000000000000000<br class=3D""> > R12=3D0000000000000005 R13=3Dffff8817ba008000 = R14=3Dffff8817ba008000<br class=3D""> > R15=3D0000000000000000<br class=3D""> > RIP=3Dffffffff81058e96 RFL=3D00010286 [--S--P-] CPL=3D0 II=3D0 = A20=3D1 SMM=3D0 HLT=3D0<br class=3D""> > ES =3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > CS =3D0010 0000000000000000 ffffffff 00a09b00 DPL=3D0 CS64 [-RA]<br = class=3D""> > SS =3D0018 0000000000000000 ffffffff 00c09300 DPL=3D0 DS [-WA]<br = class=3D""> > DS =3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > FS =3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > GS =3D0000 ffff8817def40000 ffffffff 00000000<br class=3D""> > LDT=3D0000 0000000000000000 ffffffff 00000000<br class=3D""> > TR =3D0040 ffff8817def53b80 00002087 00008b00 DPL=3D0 TSS64-busy<br = class=3D""> > GDT=3D ffff8817def49000 0000007f<br class=3D""> > IDT=3D ffffffffff529000 00000fff<br class=3D""> > CR0=3D80050033 CR2=3D00000000ffffffff CR3=3D00000017b3c9a000 = CR4=3D001406e0<br class=3D""> > DR0=3D0000000000000000 DR1=3D0000000000000000 = DR2=3D0000000000000000<br class=3D""> > DR3=3D0000000000000000<br class=3D""> > DR6=3D00000000ffff0ff0 DR7=3D0000000000000400<br class=3D""> > EFER=3D0000000000000d01<br class=3D""> > Code=3D89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 = <5d> c3 0f<br class=3D""> > 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 = 00 55 49<br class=3D""> > 89 ca<br class=3D""> > KVM: entry failed, hardware error 0x80000021<br class=3D""> ><br class=3D""> > If you're running a guest on an Intel machine without unrestricted = mode<br class=3D""> > support, the failure can be most likely due to the guest entering = an 
invalid<br class=3D""> > state for Intel VT. For example, the guest maybe running in big = real mode<br class=3D""> > which is not supported on less recent Intel processors.<br = class=3D""> ><br class=3D""> > EAX=3Dffffffed EBX=3Dba020000 ECX=3D00000000 EDX=3D00000000<br = class=3D""> > ESI=3D00000000 EDI=3D00000046 EBP=3Dba023e98 ESP=3Dba023e98<br = class=3D""> > EIP=3D81058e96 EFL=3D00000002 [-------] CPL=3D0 II=3D0 A20=3D1 = SMM=3D0 HLT=3D0<br class=3D""> > ES =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA]<br class=3D"">= > CS =3Df000 ffff0000 0000ffff 00009b00 DPL=3D0 CS16 [-RA]<br = class=3D""> > SS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA]<br class=3D"">= > DS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA]<br class=3D"">= > FS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA]<br class=3D"">= > GS =3D0000 00000000 0000ffff 00009300 DPL=3D0 DS [-WA]<br class=3D"">= > LDT=3D0000 00000000 0000ffff 00008200 DPL=3D0 LDT<br class=3D""> > TR =3D0000 00000000 0000ffff 00008b00 DPL=3D0 TSS64-busy<br = class=3D""> > GDT=3D 0000000000000000 0000ffff<br class=3D""> > IDT=3D 0000000000000000 0000ffff<br class=3D""> > CR0=3D80050033 CR2=3D00007fd826ac20a0 CR3=3D000000003516c000 = CR4=3D00140060<br class=3D""> > DR0=3D0000000000000000 DR1=3D0000000000000000 = DR2=3D0000000000000000<br class=3D""> > DR3=3D0000000000000000<br class=3D""> > DR6=3D00000000ffff0ff0 DR7=3D0000000000000400<br class=3D""> > EFER=3D0000000000000d01<br class=3D""> > Code=3D?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? = <??> ?? ??<br class=3D""> > ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? = ?? ?? ??<br class=3D""> > ?? ??<br class=3D""> ><br class=3D""> ><br class=3D""> > Searching for errors like this I found some bug report about kernel = issues<br class=3D""> > but I don't think it's the case, other VMs spawned from the same = image<br class=3D""> > migrate without any issue. 
I have toi say that the original host = running the<br class=3D""> > VM has some RAM problem (ECC multibit fault in one DIMM). Maybe = that's the<br class=3D""> > problem?<br class=3D""> <br class=3D""> </div></div>that seems quite likely. If you run the same VM on a = different host and try to migrate<br class=3D""> it, does it work?<br class=3D""> <span class=3D""><br class=3D""> > How can I properly read this error log?<br class=3D""> ><br class=3D""> > Thanks<br class=3D""> ><br class=3D""> > --<br class=3D""> > Davide Ferrari<br class=3D""> > Senior Systems Engineer<br class=3D""> ><br class=3D""> </span>> ______________________________<wbr = class=3D"">_________________<br class=3D""> > Users mailing list<br class=3D""> > <a href=3D"mailto:Users@ovirt.org" target=3D"_blank" = class=3D"">Users@ovirt.org</a><br class=3D""> > <a href=3D"http://lists.ovirt.org/mailman/listinfo/users" = rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman<wbr = class=3D"">/listinfo/users</a><br class=3D""> ><br class=3D""> </blockquote></div><br class=3D""><br clear=3D"all" class=3D""><br = class=3D"">-- <br class=3D""><div data-smartmail=3D"gmail_signature" = class=3D""><div dir=3D"ltr" class=3D""><div class=3D"">Davide Ferrari<br = class=3D""></div>Senior Systems Engineer<br class=3D""></div></div> </div> </div></div></blockquote></div><br class=3D""><br clear=3D"all" = class=3D""><br class=3D"">-- <br class=3D""><div class=3D"gmail_signature"= data-smartmail=3D"gmail_signature"><div dir=3D"ltr" class=3D""><div = class=3D"">Davide Ferrari<br class=3D""></div>Senior Systems Engineer<br = class=3D""></div></div> </div> _______________________________________________<br class=3D"">Users = mailing list<br class=3D""><a href=3D"mailto:Users@ovirt.org" = class=3D"">Users@ovirt.org</a><br = class=3D"">http://lists.ovirt.org/mailman/listinfo/users<br = class=3D""></div></blockquote></div><br class=3D""></body></html>= 

2016-09-30 15:35 GMT+02:00 Michal Skrivanek <michal.skrivanek@redhat.com>:
that is a very low-level error really pointing at HW issues. It may or may not be detected by memtest… but I would give it a try
I left memtest86 running for 2 days and no errors were detected :(
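A clean memtest86 run does not by itself clear a DIMM that had already logged an ECC multibit fault; as noted above, such faults may or may not be caught by memtest. On a Linux host, the kernel's EDAC subsystem also keeps per-memory-controller counters of corrected and uncorrected ECC errors in sysfs. The sketch below is a minimal illustration, assuming Python 3 and the standard EDAC layout (`/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count`); whether an EDAC driver was loaded on this particular server was never established in the thread. The `read_edac_counts` function and its `root` parameter are hypothetical names, and `root` exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Dict, Tuple

# Assumes the standard Linux EDAC sysfs layout; overridable only for testing.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root: str = EDAC_MC_ROOT) -> Dict[str, Tuple[int, int]]:
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected)."""
    counts: Dict[str, Tuple[int, int]] = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or no EDAC support on this host
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters are missing or unreadable
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers found")
    for name, (ce, ue) in counts.items():
        note = "  <-- uncorrected errors" if ue else ""
        print(f"{name}: corrected={ce} uncorrected={ue}{note}")
```

A multibit fault, which single-error-correcting ECC cannot repair, would typically be counted under `ue_count`; a steadily rising `ce_count` on one controller can also point at a marginal DIMM even when memtest passes.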
The only difference this host (vmhost01) has is that it was the first host installed in my self-hosted engine installation. But I have already reinstalled it from the GUI, and meanwhile I've upgraded from 4.0.3 to 4.0.4.
Does it happen only for the big 96GB VM? The others that you said are working, are they all small? It might be worth trying other system stability tests, playing with safer/slower settings in the BIOS, using a lower cluster CPU type, etc.
Yep, it happens only for the 96GB VM. Other VMs with less RAM (16GB, for example) can be created on or migrated to that host flawlessly. I'll try to play a little with the BIOS settings, but otherwise I'll have the HW replaced. I was only trying to rule out possible oVirt SW problems due to that host being the first I deployed (from the CLI) when I installed the cluster.

Thanks!

-- 
Davide Ferrari
Senior Systems Engineer


Hello, just for the record: after having that server replaced (only motherboard + RAM + controller, same disks), everything now works OK, so it was definitely a hardware issue. Thanks everyone for the troubleshooting help!

2016-10-04 18:06 GMT+02:00 Michal Skrivanek <michal.skrivanek@redhat.com>:
On 3 Oct 2016, at 10:39, Davide Ferrari <davide@billymob.com> wrote:
2016-09-30 15:35 GMT+02:00 Michal Skrivanek <michal.skrivanek@redhat.com>:
that is a very low-level error really pointing at HW issues. It may or may not be detected by memtest… but I would give it a try
I left memtest86 running for 2 days and no errors were detected :(
The only difference this host (vmhost01) has is that it was the first host installed in my self-hosted engine installation. But I have already reinstalled it from the GUI, and meanwhile I've upgraded from 4.0.3 to 4.0.4.
Does it happen only for the big 96GB VM? The others that you said are working, are they all small? It might be worth trying other system stability tests, playing with safer/slower settings in the BIOS, using a lower cluster CPU type, etc.
Yep, it happens only for the 96GB VM. Other VMs with less RAM (16GB, for example) can be created on or migrated to that host flawlessly. I'll try to play a little with the BIOS settings, but otherwise I'll have the HW replaced. I was only trying to rule out possible oVirt SW problems due to that host being the first I deployed (from the CLI) when I installed the cluster.
I understand. Unfortunately it really does look like some sort of incompatibility rather than a SW issue :/
Thanks!
-- 
Davide Ferrari
Senior Systems Engineer
-- 
Davide Ferrari
Senior Systems Engineer
participants (3)
- Davide Ferrari
- Michal Skrivanek
- Tomas Jelinek