<div dir="ltr"><div><div>Hello<br><br></div>Today I've the faulty DIMMs replaced, started the same VM again and did the same migration and this time worked, so it was 100% due to that.<br><br></div>The problem that make me wonder a bit is: if it's the source host with memory problem the one which blocks the correct migration, a faulty DIMM will force you to stop the VMs running on that host, because you cannot simply migrate them away to do the maintenence tasks...<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-09-29 13:53 GMT+02:00 Tomas Jelinek <span dir="ltr"><<a href="mailto:tjelinek@redhat.com" target="_blank">tjelinek@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
2016-09-29 13:53 GMT+02:00 Tomas Jelinek <tjelinek@redhat.com>:

----- Original Message -----
> From: "Davide Ferrari" <davide@billymob.com>
> To: "users" <users@ovirt.org>
> Sent: Wednesday, September 28, 2016 2:59:59 PM
> Subject: [ovirt-users] VM pauses/hangs after migration
>
> Hello
>
> While trying to migrate a VM from one host to another (a big VM with 96GB of
> RAM), I found that when the migration completes, the VM goes into a paused
> state and cannot be resumed. The libvirt/qemu log it gives is this:
>
> 2016-09-28T12:18:15.679176Z qemu-kvm: error while loading state section id
> 2(ram)
> 2016-09-28T12:18:15.680010Z qemu-kvm: load of migration failed: Input/output
> error
> 2016-09-28 12:18:15.872+0000: shutting down
> 2016-09-28 12:22:21.467+0000: starting up libvirt version: 1.2.17, package:
> 13.el7_2.5 (CentOS BuildSystem <http://bugs.centos.org>,
> 2016-06-23-14:23:27, worker1.bsys.centos.org), qemu version: 2.3.0
> (qemu-kvm-ev-2.3.0-31.el7.16.1)
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name front04.billydomain.com -S
> -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Haswell-noTSX -m
> size=100663296k,slots=16,maxmem=4294967296k -realtime mlock=off -smp
> 32,sockets=16,cores=1,threads=2 -numa node,nodeid=0,cpus=0-31,mem=98304
> -uuid 4511d1c0-6607-418f-ae75-34f605b2ad68 -smbios
> type=1,manufacturer=oVirt,product=oVirt
> Node,version=7-2.1511.el7.centos.2.10,serial=4C4C4544-004A-3310-8054-B2C04F474432,uuid=4511d1c0-6607-418f-ae75-34f605b2ad68
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/
> domain-front04.billydomain.com/monitor.sock,server,nowait -mon
> chardev=charmonitor,id=monitor,mode=control -rtc
> base=2016-09-28T14:22:21,driftfix=slew -global
> kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7 -device
> virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive
> if=none,id=drive-ide0-1-0,readonly=on,format=raw -device
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive
> file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/b5b49d5c-2378-4639-9469-362e37ae7473/24fd0d3c-309b-458d-9818-4321023afacf,if=none,id=drive-virtio-disk0,format=qcow2,serial=b5b49d5c-2378-4639-9469-362e37ae7473,cache=none,werror=stop,rerror=stop,aio=threads
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive
> file=/rhev/data-center/00000001-0001-0001-0001-0000000003e3/ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527,if=none,id=drive-virtio-disk1,format=raw,serial=f02ac1ce-52cd-4b81-8b29-f8006d0469e0,cache=none,werror=stop,rerror=stop,aio=threads
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
> -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:56,bus=pci.0,addr=0x3
> -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.com.redhat.rhevm.vdsm,server,nowait
> -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
> -chardev
> socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4511d1c0-6607-418f-ae75-34f605b2ad68.org.qemu.guest_agent.0,server,nowait
> -device
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
> -chardev spicevmc,id=charchannel2,name=vdagent -device
> virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
> -vnc 192.168.10.225:1,password -k es -spice
> tls-port=5902,addr=192.168.10.225,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on
> -k es -device
> qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vgamem_mb=16,bus=pci.0,addr=0x2
> -incoming tcp:0.0.0.0:49156 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
> Domain id=5 is tainted: hook-script
> red_dispatcher_loadvm_commands:
> KVM: entry failed, hardware error 0x8
> RAX=00000000ffffffed RBX=ffff8817ba00c000 RCX=0100000000000000
> RDX=0000000000000000
> RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00fe98
> RSP=ffff8817ba00fe98
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000
> R11=0000000000000000
> R12=0000000000000006 R13=ffff8817ba00c000 R14=ffff8817ba00c000
> R15=0000000000000000
> RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 0000000000000000 ffffffff 00000000
> CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0000 0000000000000000 ffffffff 00000000
> FS =0000 0000000000000000 ffffffff 00000000
> GS =0000 ffff8817def80000 ffffffff 00000000
> LDT=0000 0000000000000000 ffffffff 00000000
> TR =0040 ffff8817def93b80 00002087 00008b00 DPL=0 TSS64-busy
> GDT= ffff8817def89000 0000007f
> IDT= ffffffffff529000 00000fff
> CR0=80050033 CR2=00000000ffffffff CR3=00000017b725b000 CR4=001406e0
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000d01
> Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f
> 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49
> 89 ca
> KVM: entry failed, hardware error 0x8
> RAX=00000000ffffffed RBX=ffff8817ba008000 RCX=0100000000000000
> RDX=0000000000000000
> RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8817ba00be98
> RSP=ffff8817ba00be98
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000
> R11=0000000000000000
> R12=0000000000000005 R13=ffff8817ba008000 R14=ffff8817ba008000
> R15=0000000000000000
> RIP=ffffffff81058e96 RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 0000000000000000 ffffffff 00000000
> CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0000 0000000000000000 ffffffff 00000000
> FS =0000 0000000000000000 ffffffff 00000000
> GS =0000 ffff8817def40000 ffffffff 00000000
> LDT=0000 0000000000000000 ffffffff 00000000
> TR =0040 ffff8817def53b80 00002087 00008b00 DPL=0 TSS64-busy
> GDT= ffff8817def49000 0000007f
> IDT= ffffffffff529000 00000fff
> CR0=80050033 CR2=00000000ffffffff CR3=00000017b3c9a000 CR4=001406e0
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000d01
> Code=89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f
> 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 00 00 00 00 00 55 49
> 89 ca
> KVM: entry failed, hardware error 0x80000021
>
> If you're running a guest on an Intel machine without unrestricted mode
> support, the failure can be most likely due to the guest entering an invalid
> state for Intel VT. For example, the guest maybe running in big real mode
> which is not supported on less recent Intel processors.
>
> EAX=ffffffed EBX=ba020000 ECX=00000000 EDX=00000000
> ESI=00000000 EDI=00000046 EBP=ba023e98 ESP=ba023e98
> EIP=81058e96 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> CS =f000 ffff0000 0000ffff 00009b00 DPL=0 CS16 [-RA]
> SS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> DS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> FS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> GS =0000 00000000 0000ffff 00009300 DPL=0 DS [-WA]
> LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT= 0000000000000000 0000ffff
> IDT= 0000000000000000 0000ffff
> CR0=80050033 CR2=00007fd826ac20a0 CR3=000000003516c000 CR4=00140060
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000d01
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ??
> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> ?? ??
>
>
> Searching for errors like this I found some bug reports about kernel issues,
> but I don't think that's the case here: other VMs spawned from the same image
> migrate without any issue. I have to say that the original host running the
> VM has a RAM problem (an ECC multi-bit fault in one DIMM). Maybe that's the
> problem?

That seems quite likely. If you run the same VM on a different host and try
to migrate it, does it work?
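If it helps to repeat that test quickly, the migration can also be requested
from a script instead of the web UI. A rough sketch using the oVirt Python SDK
(ovirtsdk4); the engine URL, credentials and VM search below are placeholders,
not values from this thread:

#!/usr/bin/env python
# Rough sketch: ask the engine to live-migrate a running VM (placeholders throughout).
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,  # use ca_file=... instead in a real setup
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=front04')[0]  # the VM under test
vms_service.vm_service(vm.id).migrate()          # let the scheduler pick the target host
connection.close()

Going through the engine keeps vdsm in the loop; driving the migration behind
its back with raw libvirt on oVirt hosts is generally not a good idea.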
<span class=""><br>
> How can I properly read this error log?<br>
><br>
> Thanks<br>
><br>
> --<br>
> Davide Ferrari<br>
> Senior Systems Engineer<br>
><br>
</span>> ______________________________<wbr>_________________<br>
> Users mailing list<br>
> <a href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/users</a><br>
><br>

--
Davide Ferrari
Senior Systems Engineer