How to discover why a VM is getting suspended without recovery possibility?

Hello,

I have an Exchange Server VM that keeps going into a suspended state with no possibility of recovery. I need to click on shutdown and then power it on again. I can’t find anything useful in the logs, except in the “dmesg” of the host:

[47807.747606] *** Guest State ***
[47807.747633] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[47807.747671] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[47807.747721] CR3 = 0x00000000001ad002
[47807.747739] RSP = 0xffffc20904fa3770 RIP = 0x0000000000008000
[47807.747766] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[47807.747792] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[47807.747821] CS: sel=0x9100, attr=0x08093, limit=0xffffffff, base=0x000000007ff91000
[47807.747855] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747889] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747923] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747957] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747991] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.748025] GDTR: limit=0x00000057, base=0xffff80817e7d5fb0
[47807.748059] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[47807.748093] IDTR: limit=0x00000000, base=0x0000000000000000
[47807.748128] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffff80817e7d4000
[47807.748162] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[47807.748189] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[47807.748221] Interruptibility = 00000009 ActivityState = 00000000
[47807.748248] *** Host State ***
[47807.748263] RIP = 0xffffffffc0c65024 RSP = 0xffff9252bda5fc90
[47807.748290] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[47807.748318] FSBase=00007f46d462a700 GSBase=ffff9252ffac0000 TRBase=ffff9252ffac4000
[47807.748351] GDTBase=ffff9252ffacc000 IDTBase=ffffffffff528000
[47807.748377] CR0=0000000080050033 CR3=000000105ac8c000 CR4=00000000001627e0
[47807.748407] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[47807.748435] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[47807.748461] *** Control State ***
[47807.748478] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[47807.748507] EntryControls=0000d1ff ExitControls=002fefff
[47807.748531] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[47807.748561] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[47807.748589] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001
[47807.748618] reason=80000021 qualification=0000000000000000
[47807.748645] IDTVectoring: info=00000000 errcode=00000000
[47807.748669] TSC Offset = 0xfffff9b8c8d943b6
[47807.748699] TPR Threshold = 0x00
[47807.748715] EPT pointer = 0x000000105cd5601e
[47807.748735] PLE Gap=00000080 Window=00001000
[47807.748755] Virtual processor ID = 0x0003

So something really went crazy. The VM has been going down at least twice a day for the last 5 days.

At first I thought it was a hardware issue, so I restarted the VM on another host, and the same thing happened.

About the VM: it’s configured with 10 CPUs and 48GB of RAM, running on oVirt 4.3.10 with iSCSI storage on a FreeNAS box, where the VM disks live; there is a 300GB disk for C:\ and a 2TB disk for D:\.

Any idea on how to start troubleshooting it?

Thanks,

Hi, sorry to bump the thread.

But I still have this issue with the VM. These crashes are still happening, and I really don’t know what to do. Since there’s nothing in the logs, except for that message in the `dmesg` of the host machine, I started changing settings to see if anything changes or if I can at least spot a pattern.

What I’ve tried:
1. Disabled I/O threading on the VM.
2. Increased I/O threads to 2 from 1.
3. Disabled memory ballooning.
4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system for resource starvation, but everything seems to be fine.
8. Checked both iSCSI switches to see if there’s something wrong with the fabrics: 0 errors.

I’m really running out of ideas. The VM was working normally and suddenly this started.

Thanks,

PS: While I was typing this message it crashed again:

[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[427483.128505] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[427483.129342] CR3 = 0x00000001849ff002
[427483.130177] RSP = 0xffffb10186ffffb0 RIP = 0x0000000000008000
[427483.131014] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[427483.131859] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[427483.132708] CS: sel=0x9b00, attr=0x08093, limit=0xffffffff, base=0x000000007ff9b000
[427483.133559] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.134413] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.135237] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136040] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136842] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.137629] GDTR: limit=0x00000057, base=0xffffb10186eb4fb0
[427483.138409] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[427483.139202] IDTR: limit=0x00000000, base=0x0000000000000000
[427483.139998] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffb10186eb3000
[427483.140816] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[427483.141650] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[427483.142503] Interruptibility = 00000009 ActivityState = 00000000
[427483.143353] *** Host State ***
[427483.144194] RIP = 0xffffffffc0c65024 RSP = 0xffff9253c0b9bc90
[427483.145043] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[427483.145903] FSBase=00007fcc13816700 GSBase=ffff925adf240000 TRBase=ffff925adf244000
[427483.146766] GDTBase=ffff925adf24c000 IDTBase=ffffffffff528000
[427483.147630] CR0=0000000080050033 CR3=00000010597b6000 CR4=00000000001627e0
[427483.148498] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[427483.149365] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[427483.150231] *** Control State ***
[427483.151077] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[427483.151942] EntryControls=0000d1ff ExitControls=002fefff
[427483.152800] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[427483.153661] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[427483.154521] VMExit: intr_info=00000000 errcode=00000000 ilen=00000004
[427483.155376] reason=80000021 qualification=0000000000000000
[427483.156230] IDTVectoring: info=00000000 errcode=00000000
[427483.157068] TSC Offset = 0xfffccfc261506dd9
[427483.157905] TPR Threshold = 0x0d
[427483.158728] EPT pointer = 0x00000009b437701e
[427483.159550] PLE Gap=00000080 Window=00080000
[427483.160370] Virtual processor ID = 0x0004
On 16 Sep 2020, at 17:11, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote: [original message quoted in full above]
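A side note on reading these dumps: the `reason=80000021` field in the control-state section is the raw VMX exit reason. The snippet below is a minimal sketch for splitting it apart, assuming the layout described in the Intel SDM (bit 31 flags a failed VM entry, the low 16 bits hold the basic exit reason). The function name and the small lookup table are illustrative only and are not part of any KVM tooling.

```python
# Decode the raw VMX exit reason that KVM prints in its dmesg dump.
# Assumption: Intel SDM layout, where bit 31 marks a VM-entry failure
# and bits 15:0 carry the basic exit reason.

ENTRY_FAILURE_BIT = 1 << 31

# Deliberately partial table: only the VM-entry failure reasons are listed.
BASIC_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(raw_hex: str) -> dict:
    """Split a hex exit reason such as '80000021' into its parts."""
    value = int(raw_hex, 16)
    basic = value & 0xFFFF
    return {
        "entry_failure": bool(value & ENTRY_FAILURE_BIT),
        "basic_reason": basic,
        "description": BASIC_REASONS.get(basic, "not in this partial table"),
    }


if __name__ == "__main__":
    print(decode_exit_reason("80000021"))
```

For the value that appears in both dumps, this reports a failed VM entry with basic reason 33 (invalid guest state).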

What type of disks are you using? Any chance you are using thin disks?

Best Regards, Strahil Nikolov

On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via Users <users@ovirt.org> wrote: [message quoted in full above]
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIAIVV6I2MUVPV...

Hi Strahil,

Both disks are VirtIO-SCSI and are preallocated.

Thanks,

On 21 Sep 2020, at 17:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

What type of disks are you using? Any chance you are using thin disks?

Usually libvirt's log provides hints (though not always clear clues) about any issues. For example: /var/log/libvirt/qemu/<VM_NAME>.log

Has anything changed recently (maybe the oVirt version was upgraded)?

Best Regards, Strahil Nikolov

On Monday, 21 September 2020, 23:28:13 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:

Hi Strahil, Both disks are VirtIO-SCSI and are preallocated. Thanks,
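As a rough illustration of the log check suggested here, the sketch below scans a libvirt QEMU log for KVM entry failures. The path pattern comes from the message; the VM name `exchange` and the helper name are hypothetical, and the regular expression assumes qemu-kvm reports the failure with the wording `KVM: entry failed, hardware error 0x...`.

```python
import re
from pathlib import Path
from typing import List

# Matches lines such as: "KVM: entry failed, hardware error 0x80000021"
ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")


def find_entry_failures(log_text: str) -> List[str]:
    """Return the hardware error code of every KVM entry failure in the log text."""
    return ENTRY_FAILED.findall(log_text)


if __name__ == "__main__":
    # Hypothetical VM name; substitute the name shown in the oVirt UI.
    log_path = Path("/var/log/libvirt/qemu/exchange.log")
    if log_path.exists():
        codes = find_entry_failures(log_path.read_text(errors="replace"))
        print(f"{len(codes)} entry failure(s): {codes}")
    else:
        print(f"{log_path} not found")
```

Run it as root on the host where the VM was running at the time of the crash, since the log is local to that host.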
On 21 Sep 2020, at 17:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
What type of disks are you using ? Any change you use thin disks ?
Best Regards, Strahil Nikolov
В понеделник, 21 септември 2020 г., 07:20:23 Гринуич+3, Vinícius Ferrão via Users <users@ovirt.org> написа:
Hi, sorry to bump the thread.
But I still with this issue on the VM. This crashes are still happening, and I really don’t know what to do. Since there’s nothing on logs, except from that message on `dmesg` of the host machine I started changing setting to see if anything changes or if I at least I get a pattern.
What I’ve tried: 1. Disabled I/O Threading on VM. 2. Increased I/O Threading to 2 form 1. 3. Disabled Memory Balooning. 4. Reduced VM resources form 10 CPU’s and 48GB of RAM to 6 CPU’s and 24GB of RAM. 5. Moved the VM to another host. 6. Dedicated a host specific to this VM. 7. Check on the storage system to see if there’s any resource starvation, but everything seems to be fine. 8. Checked both iSCSI switches to see if there’s something wrong with the fabrics: 0 errors.
I’m really running out of ideas. The VM was working normally and suddenly this started.
Thanks,
PS: When I was typing this message it crashed again:
[427483.126725] *** Guest State *** [427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7 [427483.128505] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871 [427483.129342] CR3 = 0x00000001849ff002 [427483.130177] RSP = 0xffffb10186ffffb0 RIP = 0x0000000000008000 [427483.131014] RFLAGS=0x00000002 DR7 = 0x0000000000000400 [427483.131859] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000 [427483.132708] CS: sel=0x9b00, attr=0x08093, limit=0xffffffff, base=0x000000007ff9b000 [427483.133559] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.134413] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.135237] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.136040] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.136842] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.137629] GDTR: limit=0x00000057, base=0xffffb10186eb4fb0 [427483.138409] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000 [427483.139202] IDTR: limit=0x00000000, base=0x0000000000000000 [427483.139998] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffb10186eb3000 [427483.140816] EFER = 0x0000000000000000 PAT = 0x0007010600070106 [427483.141650] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000 [427483.142503] Interruptibility = 00000009 ActivityState = 00000000 [427483.143353] *** Host State *** [427483.144194] RIP = 0xffffffffc0c65024 RSP = 0xffff9253c0b9bc90 [427483.145043] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040 [427483.145903] FSBase=00007fcc13816700 GSBase=ffff925adf240000 TRBase=ffff925adf244000 [427483.146766] GDTBase=ffff925adf24c000 IDTBase=ffffffffff528000 [427483.147630] CR0=0000000080050033 CR3=00000010597b6000 CR4=00000000001627e0 [427483.148498] Sysenter RSP=0000000000000000 
CS:RIP=0010:ffffffff8f196cc0 [427483.149365] EFER = 0x0000000000000d01 PAT = 0x0007050600070106 [427483.150231] *** Control State *** [427483.151077] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb [427483.151942] EntryControls=0000d1ff ExitControls=002fefff [427483.152800] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000 [427483.153661] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000 [427483.154521] VMExit: intr_info=00000000 errcode=00000000 ilen=00000004 [427483.155376] reason=80000021 qualification=0000000000000000 [427483.156230] IDTVectoring: info=00000000 errcode=00000000 [427483.157068] TSC Offset = 0xfffccfc261506dd9 [427483.157905] TPR Threshold = 0x0d [427483.158728] EPT pointer = 0x00000009b437701e [427483.159550] PLE Gap=00000080 Window=00080000 [427483.160370] Virtual processor ID = 0x0004
On 16 Sep 2020, at 17:11, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Hello,
I’m an Exchange Server VM that’s going down to suspend without possibility of recovery. I need to click on shutdown and them power on. I can’t find anything useful on the logs, except on “dmesg” of the host:
[47807.747606] *** Guest State ***
[47807.747633] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[47807.747671] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[47807.747721] CR3 = 0x00000000001ad002
[47807.747739] RSP = 0xffffc20904fa3770 RIP = 0x0000000000008000
[47807.747766] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[47807.747792] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[47807.747821] CS: sel=0x9100, attr=0x08093, limit=0xffffffff, base=0x000000007ff91000
[47807.747855] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747889] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747923] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747957] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747991] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.748025] GDTR: limit=0x00000057, base=0xffff80817e7d5fb0
[47807.748059] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[47807.748093] IDTR: limit=0x00000000, base=0x0000000000000000
[47807.748128] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffff80817e7d4000
[47807.748162] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[47807.748189] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[47807.748221] Interruptibility = 00000009 ActivityState = 00000000
[47807.748248] *** Host State ***
[47807.748263] RIP = 0xffffffffc0c65024 RSP = 0xffff9252bda5fc90
[47807.748290] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[47807.748318] FSBase=00007f46d462a700 GSBase=ffff9252ffac0000 TRBase=ffff9252ffac4000
[47807.748351] GDTBase=ffff9252ffacc000 IDTBase=ffffffffff528000
[47807.748377] CR0=0000000080050033 CR3=000000105ac8c000 CR4=00000000001627e0
[47807.748407] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[47807.748435] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[47807.748461] *** Control State ***
[47807.748478] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[47807.748507] EntryControls=0000d1ff ExitControls=002fefff
[47807.748531] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[47807.748561] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[47807.748589] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001
[47807.748618] reason=80000021 qualification=0000000000000000
[47807.748645] IDTVectoring: info=00000000 errcode=00000000
[47807.748669] TSC Offset = 0xfffff9b8c8d943b6
[47807.748699] TPR Threshold = 0x00
[47807.748715] EPT pointer = 0x000000105cd5601e
[47807.748735] PLE Gap=00000080 Window=00001000
[47807.748755] Virtual processor ID = 0x0003
So something really went crazy. The VM has been going down at least twice a day for the last 5 days.
At first I thought it was a hardware issue, so I restarted the VM on another host, and the same thing happened.
About the VM: it’s configured with 10 CPUs and 48GB of RAM, running on oVirt 4.3.10 with iSCSI storage on a FreeNAS box, where the VM disks live; there is a 300GB disk for C:\ and a 2TB disk for D:\.
Any idea on how to start troubleshooting it?
Thanks,
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIAIVV6I2MUVPV...

Strahil, thank you man. We finally got some output:
2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
KVM: entry failed, hardware error 0x80000021
If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.
EAX=00000000 EBX=01746180 ECX=4be7c002 EDX=000400b6
ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770
EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =0000 00000000 ffffffff 00809300
CS =8d00 7ff8d000 ffffffff 00809300
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =0000 00000000 ffffffff 00809300
LDT=0000 00000000 000fffff 00000000
TR =0040 04c59000 00000067 00008b00
GDT= 04c5afb0 00000057
IDT= 00000000 00000000
CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 (<unknown process>)
2020-09-16 04:12:02.212+0000: shutting down, reason=shutdown
That’s the issue: I got this in the logs of both physical machines. It’s not very likely that both machines are damaged, right? So even with the log saying it’s a hardware error, it may be software related? And again, this only happens with this VM.
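On the "hardware error" wording: as far as I understand the Intel SDM, the number in that line is the raw VMX exit reason, where bit 31 marks a failed VM entry and the low 16 bits hold the basic reason, and basic reason 33 means the entry failed because of invalid guest state. That matches the hint QEMU prints right after it. A minimal Python sketch of the decoding follows; the function name and the lookup table are mine and deliberately partial, not taken from any KVM tool.

```python
# Sketch: split a VMX exit-reason value such as 0x80000021 into its parts.
# Assumption: field layout as described in the Intel SDM (bit 31 = VM-entry
# failure, bits 15:0 = basic exit reason). The table is intentionally partial.
ENTRY_FAILURE_BIT = 1 << 31
BASIC_REASON_MASK = 0xFFFF

BASIC_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(value: int) -> dict:
    """Return the entry-failure flag, the basic reason and a short label."""
    basic = value & BASIC_REASON_MASK
    return {
        "entry_failure": bool(value & ENTRY_FAILURE_BIT),
        "basic_reason": basic,
        "meaning": BASIC_REASONS.get(basic, "not in this partial table"),
    }


if __name__ == "__main__":
    # Value copied from the "KVM: entry failed, hardware error" line above.
    print(decode_exit_reason(0x80000021))
```

So the message does not by itself point at broken hardware; it says the CPU refused to enter the guest in the state it was handed.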
On 21 Sep 2020, at 17:36, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Usually libvirt's log can provide hints (though not always clear clues) about any issues.
For example: /var/log/libvirt/qemu/<VM_NAME>.log
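A minimal Python sketch for scanning that log, assuming the default directory shown above and looking only for the `KVM: entry failed, hardware error ...` line reported elsewhere in this thread. The helper names and the VM name in the usage note are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the line QEMU writes when KVM refuses to enter the guest.
ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")


def find_entry_failures(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, error_code) for each 'KVM: entry failed' line."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = ENTRY_FAILED.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


def scan_vm_log(vm_name: str, log_dir: str = "/var/log/libvirt/qemu") -> list[tuple[int, str]]:
    """Read <log_dir>/<vm_name>.log and report entry failures found in it."""
    path = Path(log_dir) / f"{vm_name}.log"
    return find_entry_failures(path.read_text(errors="replace"))
```

On the host this would be called as `scan_vm_log("exchange")`, where `exchange` stands in for the real VM name; reading the log normally needs root.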
Has anything changed recently (maybe the oVirt version was upgraded)?
Best Regards, Strahil Nikolov
On Monday, 21 September 2020, 23:28:13 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Hi Strahil,
Both disks are VirtIO-SCSI and are Preallocated:
Thanks,
On 21 Sep 2020, at 17:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
What type of disks are you using? Any chance you are using thin disks?
Best Regards, Strahil Nikolov
On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via Users <users@ovirt.org> wrote:
Hi, sorry to bump the thread.
But I still have this issue with the VM. These crashes are still happening, and I really don’t know what to do. Since there’s nothing in the logs except that message in the host’s `dmesg`, I started changing settings to see if anything changes or if I can at least find a pattern.
What I’ve tried:
1. Disabled I/O Threading on the VM.
2. Increased I/O Threading from 1 to 2.
3. Disabled Memory Ballooning.
4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system to see if there’s any resource starvation, but everything seems to be fine.
8. Checked both iSCSI switches to see if there’s something wrong with the fabrics: 0 errors.
I’m really running out of ideas. The VM was working normally and suddenly this started.
Thanks,
PS: When I was typing this message it crashed again:
[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[427483.128505] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[427483.129342] CR3 = 0x00000001849ff002
[427483.130177] RSP = 0xffffb10186ffffb0 RIP = 0x0000000000008000
[427483.131014] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[427483.131859] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[427483.132708] CS: sel=0x9b00, attr=0x08093, limit=0xffffffff, base=0x000000007ff9b000
[427483.133559] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.134413] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.135237] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136040] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136842] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.137629] GDTR: limit=0x00000057, base=0xffffb10186eb4fb0
[427483.138409] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[427483.139202] IDTR: limit=0x00000000, base=0x0000000000000000
[427483.139998] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffb10186eb3000
[427483.140816] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[427483.141650] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[427483.142503] Interruptibility = 00000009 ActivityState = 00000000
[427483.143353] *** Host State ***
[427483.144194] RIP = 0xffffffffc0c65024 RSP = 0xffff9253c0b9bc90
[427483.145043] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[427483.145903] FSBase=00007fcc13816700 GSBase=ffff925adf240000 TRBase=ffff925adf244000
[427483.146766] GDTBase=ffff925adf24c000 IDTBase=ffffffffff528000
[427483.147630] CR0=0000000080050033 CR3=00000010597b6000 CR4=00000000001627e0
[427483.148498] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[427483.149365] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[427483.150231] *** Control State ***
[427483.151077] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[427483.151942] EntryControls=0000d1ff ExitControls=002fefff
[427483.152800] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[427483.153661] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[427483.154521] VMExit: intr_info=00000000 errcode=00000000 ilen=00000004
[427483.155376] reason=80000021 qualification=0000000000000000
[427483.156230] IDTVectoring: info=00000000 errcode=00000000
[427483.157068] TSC Offset = 0xfffccfc261506dd9
[427483.157905] TPR Threshold = 0x0d
[427483.158728] EPT pointer = 0x00000009b437701e
[427483.159550] PLE Gap=00000080 Window=00080000
[427483.160370] Virtual processor ID = 0x0004
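Since the goal is to spot a pattern across crashes, here is a rough Python sketch that pulls a few fields out of each dump in the `dmesg` output and lists which of them change from one crash to the next. The choice of fields and the regular expressions are my own assumptions, based only on the dump layout pasted in this thread; other kernels may format the dump differently.

```python
import re

# Assumed patterns, derived from the dumps pasted in this thread.
FIELD_PATTERNS = {
    "exit_reason": re.compile(r"reason=([0-9a-f]+)"),
    # In the guest section RSP comes before RIP; the host section has RIP first.
    "guest_rip": re.compile(r"RSP = 0x[0-9a-f]+\s+RIP = (0x[0-9a-f]+)"),
    "guest_cs_base": re.compile(
        r"CS:\s+sel=0x[0-9a-f]+, attr=0x[0-9a-f]+, limit=0x[0-9a-f]+, base=(0x[0-9a-f]+)"
    ),
    "vpid": re.compile(r"Virtual processor ID = (0x[0-9a-f]+)"),
}

GUEST_MARKER = "*** Guest State ***"


def split_dumps(dmesg_text):
    """Split dmesg text into one chunk per dump, using the guest-state marker."""
    return [GUEST_MARKER + part for part in dmesg_text.split(GUEST_MARKER)[1:]]


def summarize_dump(dump):
    """Extract the selected fields from one dump; missing fields become None."""
    summary = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(dump)
        summary[name] = match.group(1) if match else None
    return summary


def varying_fields(summaries):
    """Names of fields whose value differs between at least two dumps."""
    return sorted(
        name for name in FIELD_PATTERNS
        if len({summary[name] for summary in summaries}) > 1
    )
```

Feeding it the saved output of `dmesg` and comparing `summarize_dump` results shows, for the two dumps above, the same exit reason and guest RIP but a different guest CS base and virtual processor ID.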
On 16 Sep 2020, at 17:11, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
[...]

Interestingly, I can't find anything recent except this one: https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653
Can you check if anything in the OS was updated or changed recently?
Also check whether nested virtualization is enabled for the VM.
Best Regards, Strahil Nikolov
On Monday, 21 September 2020, 23:56:26 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
[...]
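For the nested virtualization question, one host-side check is to read the `kvm_intel` module parameters under `/sys/module/kvm_intel/parameters`. Below is a minimal Python sketch, assuming an Intel host with `kvm_intel` loaded; the helper names are mine. It also reads `unrestricted_guest`, since the QEMU message quoted in this thread mentions unrestricted mode. Note that this only shows the host-wide module setting, not what oVirt exposes to this particular VM.

```python
from pathlib import Path

# Standard sysfs location for kvm_intel module parameters on Linux hosts.
PARAM_DIR = Path("/sys/module/kvm_intel/parameters")


def read_kvm_intel_param(name, param_dir=PARAM_DIR):
    """Return the raw value of a kvm_intel parameter, or None if unreadable."""
    try:
        return (Path(param_dir) / name).read_text().strip()
    except OSError:
        return None


def is_enabled(raw):
    """Interpret the 'Y'/'N' or '1'/'0' strings the kernel exposes."""
    if raw in ("Y", "y", "1"):
        return True
    if raw in ("N", "n", "0"):
        return False
    return None  # parameter missing or in an unexpected format


if __name__ == "__main__":
    for param in ("nested", "unrestricted_guest"):
        raw = read_kvm_intel_param(param)
        print(f"{param}: raw={raw!r} enabled={is_enabled(raw)}")
```

Run it on each hypervisor that has hosted the VM, so the settings can be compared across hosts.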

Hi Strahil, yes, I can’t find anything recent either. You dug way further than me. I found some regressions in the kernel, but I don’t know if they’re related or not:
https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
Regarding the OS, nothing new was installed, just regular Windows Updates. And finally, about nested virtualisation: it’s disabled on the hypervisor.
One thing that caught my attention in the link you sent is a post about a rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443
But come on, it’s from 2006…
Well, I’m open to other ideas. The VM just crashed once again:
EAX=00000000 EBX=075c5180 ECX=75432002 EDX=000400b6
ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770
EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =0000 00000000 ffffffff 00809300
CS =9900 7ff99000 ffffffff 00809300
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =0000 00000000 ffffffff 00809300
LDT=0000 00000000 000fffff 00000000
TR =0040 075da000 00000067 00008b00
GDT= 075dbfb0 00000057
IDT= 00000000 00000000
CR0=00050032 CR2=242cb25a CR3=001ad002 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff4ff0 DR7=0000000000000400
EFER=0000000000000000
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[519192.536247] *** Guest State ***
[519192.536275] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[519192.536324] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[519192.537322] CR3 = 0x00000000001ad002
[519192.538166] RSP = 0xfffffb047db5d770 RIP = 0x0000000000008000
[519192.539017] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[519192.539861] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[519192.540690] CS: sel=0x9900, attr=0x08093, limit=0xffffffff, base=0x000000007ff99000
[519192.541523] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[519192.542356] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[519192.543167] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[519192.543961] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[519192.544747] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[519192.545511] GDTR: limit=0x00000057, base=0xffffad01075dbfb0
[519192.546275] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[519192.547052] IDTR: limit=0x00000000, base=0x0000000000000000
[519192.547841] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffad01075da000
[519192.548639] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[519192.549460] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[519192.550302] Interruptibility = 00000009 ActivityState = 00000000
[519192.551137] *** Host State ***
[519192.551963] RIP = 0xffffffffc150a034 RSP = 0xffff88cd9cafbc90
[519192.552805] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[519192.553646] FSBase=00007f7da762a700 GSBase=ffff88d45f2c0000 TRBase=ffff88d45f2c4000
[519192.554496] GDTBase=ffff88d45f2cc000 IDTBase=ffffffffff528000
[519192.555347] CR0=0000000080050033 CR3=000000033dc82000 CR4=00000000001627e0
[519192.556202] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff91596cc0
[519192.557058] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[519192.557913] *** Control State ***
[519192.558757] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[519192.559605] EntryControls=0000d1ff ExitControls=002fefff
[519192.560453] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[519192.561306] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[519192.562158] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001
[519192.563006] reason=80000021 qualification=0000000000000000
[519192.563860] IDTVectoring: info=00000000 errcode=00000000
[519192.564695] TSC Offset = 0xfffcc6c7d53f16d7
[519192.565526] TPR Threshold = 0x00
[519192.566345] EPT pointer = 0x0000000b9397901e
[519192.567162] PLE Gap=00000080 Window=00001000
[519192.567984] Virtual processor ID = 0x0005
Thank you!
On 22 Sep 2020, at 02:30, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
[...]
I can’t find anything useful on the logs, except on “dmesg” of the host: [47807.747606] *** Guest State *** [47807.747633] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7 [47807.747671] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871 [47807.747721] CR3 = 0x00000000001ad002 [47807.747739] RSP = 0xffffc20904fa3770 RIP = 0x0000000000008000 [47807.747766] RFLAGS=0x00000002 DR7 = 0x0000000000000400 [47807.747792] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000 [47807.747821] CS: sel=0x9100, attr=0x08093, limit=0xffffffff, base=0x000000007ff91000 [47807.747855] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747889] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747923] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747957] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747991] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.748025] GDTR: limit=0x00000057, base=0xffff80817e7d5fb0 [47807.748059] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000 [47807.748093] IDTR: limit=0x00000000, base=0x0000000000000000 [47807.748128] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffff80817e7d4000 [47807.748162] EFER = 0x0000000000000000 PAT = 0x0007010600070106 [47807.748189] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000 [47807.748221] Interruptibility = 00000009 ActivityState = 00000000 [47807.748248] *** Host State *** [47807.748263] RIP = 0xffffffffc0c65024 RSP = 0xffff9252bda5fc90 [47807.748290] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040 [47807.748318] FSBase=00007f46d462a700 GSBase=ffff9252ffac0000 TRBase=ffff9252ffac4000 [47807.748351] GDTBase=ffff9252ffacc000 IDTBase=ffffffffff528000 [47807.748377] CR0=0000000080050033 CR3=000000105ac8c000 CR4=00000000001627e0 [47807.748407] 
Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0 [47807.748435] EFER = 0x0000000000000d01 PAT = 0x0007050600070106 [47807.748461] *** Control State *** [47807.748478] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb [47807.748507] EntryControls=0000d1ff ExitControls=002fefff [47807.748531] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000 [47807.748561] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000 [47807.748589] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001 [47807.748618] reason=80000021 qualification=0000000000000000 [47807.748645] IDTVectoring: info=00000000 errcode=00000000 [47807.748669] TSC Offset = 0xfffff9b8c8d943b6 [47807.748699] TPR Threshold = 0x00 [47807.748715] EPT pointer = 0x000000105cd5601e [47807.748735] PLE Gap=00000080 Window=00001000 [47807.748755] Virtual processor ID = 0x0003 So something really went crazy. The VM is going down at least two times a day for the last 5 days. At first I thought it would be an hardware issue, so I restarted the VM on other host, and the same thing happened. About the VM it’s configured with 10 CPUs, 48GB of RAM running oVirt 4.3.10 with iSCSI storage to a FreeNAS box, where the VM disks are running; there are a 300GB disc for C:\ and 2TB disk for D:\. Any ideia on how to start troubleshooting it? Thanks, _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIAIVV6I2MUVPV...

On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users <users@ovirt.org> wrote:
Hi Strahil, yes, I can’t find anything recent either. You dug much further than I did. I found some kernel regressions, but I don’t know whether they’re related:
https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
Regarding the OS, nothing new was installed, just regular Windows Updates. And finally, nested virtualisation is disabled on the hypervisor.
In your original post you wrote about the VM going suspended. So I think there could be something useful in engine.log on the engine and/or vdsm.log on the hypervisor. Could you check those? Also, do you see anything in the Event Viewer of the Windows VM and/or in the FreeNAS logs? Gianluca

Hi Gianluca.
On 22 Sep 2020, at 04:24, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users <users@ovirt.org> wrote:
Hi Strahil, yes, I can’t find anything recent either. You dug much further than I did. I found some kernel regressions, but I don’t know whether they’re related:
https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
Regarding the OS, nothing new was installed, just regular Windows Updates. And finally, nested virtualisation is disabled on the hypervisor.
In your original post you wrote about the VM going suspended. So I think there could be something useful in engine.log on the engine and/or vdsm.log on the hypervisor. Could you check those?
Yes, it goes to suspend. I think the engine simply doesn’t know what really happened and is guessing that the VM was suspended. In engine.log I only have these two lines:
# grep "2020-09-22 01:51" /var/log/ovirt-engine/engine.log
2020-09-22 01:51:52,604-03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] VM '351db98a-5f74-439f-99a4-31f611b2d250'(cerulean) moved from 'Up' --> 'Paused'
2020-09-22 01:51:52,699-03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] EVENT_ID: VM_PAUSED(1,025), VM cerulean has been paused.
Note that I grepped by time. There are only these two lines from when it crashed, about 2h30m ago. In vdsm.log, around the same time and searching for the VM name, I only found a huge JSON blob with the characteristics of the VM. Is there something I should check specifically? I tried some combinations of grep but found nothing really useful.
Also, do you see anything in event viewer of the Windows VM and/or in FreeNAS logs?
FreeNAS is fine, nothing wrong there. No errors in dmesg, and no resource starvation on ZFS. No overload on the disks, nothing… the storage is barely loaded. The Windows Event Viewer is my Achilles’ heel; nothing relevant there either as far as I can tell. There are of course some mentions of an improper shutdown due to the crash, but nothing else. I’m looking further here and will report back if I find something useful.
Thanks,
Gianluca
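The time-based grep above can be complemented by filtering on the VM name, so that every state change is listed regardless of when it happened. The following is only a sketch: the regular expression is modeled on the two engine.log lines quoted above, and the helper name `find_transitions` is made up for illustration. Other oVirt versions may word the message differently.

```python
import re

# Pattern modeled on the VmAnalyzer line quoted in this thread (assumption:
# the wording is stable for the oVirt version in use).
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"VM '(?P<vm_id>[0-9a-f-]+)'\((?P<name>[^)]+)\) "
    r"moved from '(?P<old>\w+)' --> '(?P<new>\w+)'"
)


def find_transitions(lines, vm_name):
    """Return (timestamp, old_state, new_state) tuples for one VM (hypothetical helper)."""
    found = []
    for line in lines:
        m = TRANSITION.search(line)
        if m and m.group("name") == vm_name:
            found.append((m.group("ts"), m.group("old"), m.group("new")))
    return found


# The two lines from the message above, used as sample input.
SAMPLE = [
    "2020-09-22 01:51:52,604-03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
    "(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] "
    "VM '351db98a-5f74-439f-99a4-31f611b2d250'(cerulean) moved from 'Up' --> 'Paused'",
    "2020-09-22 01:51:52,699-03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] "
    "EVENT_ID: VM_PAUSED(1,025), VM cerulean has been paused.",
]

print(find_transitions(SAMPLE, "cerulean"))
```

To run it against the real file, pass `open("/var/log/ovirt-engine/engine.log")` as `lines`; the timestamps it returns can then be matched against the `dmesg` dumps on the host.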

This looks much like my OpenBSD 6.6 under the latest AMD CPUs. KVM did not accept a perfectly valid instruction, and it was a bug in KVM.
Maybe you can try to:
- power off the VM
- pick an older CPU type for that VM only
- power on and monitor over the next few days
Do you have a cluster with a different CPU vendor (if currently on AMD -> Intel, and if currently on Intel -> AMD)? Maybe you can move the VM to another cluster and see whether the issue happens there too.
Another option is to try rolling back the Windows updates, to identify whether any of them caused the problem. Yet that's a workaround and not a fix.
Are you using oVirt 4.3 or 4.4?
Best Regards, Strahil Nikolov
On Tuesday, 22 September 2020, 10:08:44 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Hi Strahil, yes, I can’t find anything recent either. You dug much further than I did. I found some kernel regressions, but I don’t know whether they’re related:
https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
Regarding the OS, nothing new was installed, just regular Windows Updates.
And finally, nested virtualisation is disabled on the hypervisor.
One thing that caught my attention in the link you sent is about a rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443
But come on, it’s from 2006…
Well, I’m open to other ideas; the VM just crashed once again:
EAX=00000000 EBX=075c5180 ECX=75432002 EDX=000400b6 ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770 EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0 ES =0000 00000000 ffffffff 00809300 CS =9900 7ff99000 ffffffff 00809300 SS =0000 00000000 ffffffff 00809300 DS =0000 00000000 ffffffff 00809300 FS =0000 00000000 ffffffff 00809300 GS =0000 00000000 ffffffff 00809300 LDT=0000 00000000 000fffff 00000000 TR =0040 075da000 00000067 00008b00 GDT= 075dbfb0 00000057 IDT= 00000000 00000000 CR0=00050032 CR2=242cb25a CR3=001ad002 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff4ff0 DR7=0000000000000400 EFER=0000000000000000 Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[519192.536247] *** Guest State *** [519192.536275] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7 [519192.536324] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871 [519192.537322] CR3 = 0x00000000001ad002 [519192.538166] RSP = 0xfffffb047db5d770 RIP = 0x0000000000008000 [519192.539017] RFLAGS=0x00000002 DR7 = 0x0000000000000400 [519192.539861] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000 [519192.540690] CS: sel=0x9900, attr=0x08093, limit=0xffffffff, base=0x000000007ff99000 [519192.541523] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.542356] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.543167] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.543961] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.544747] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.545511] GDTR: limit=0x00000057, base=0xffffad01075dbfb0 [519192.546275] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000 [519192.547052] IDTR: limit=0x00000000, base=0x0000000000000000 [519192.547841] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffad01075da000 [519192.548639] EFER = 0x0000000000000000 PAT = 0x0007010600070106 [519192.549460] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000 [519192.550302] Interruptibility = 00000009 ActivityState = 00000000 [519192.551137] *** Host State *** [519192.551963] RIP = 0xffffffffc150a034 RSP = 0xffff88cd9cafbc90 [519192.552805] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040 [519192.553646] FSBase=00007f7da762a700 GSBase=ffff88d45f2c0000 TRBase=ffff88d45f2c4000 [519192.554496] GDTBase=ffff88d45f2cc000 IDTBase=ffffffffff528000 [519192.555347] CR0=0000000080050033 CR3=000000033dc82000 CR4=00000000001627e0 [519192.556202] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff91596cc0 [519192.557058] EFER = 0x0000000000000d01 PAT = 0x0007050600070106 [519192.557913] *** Control State *** [519192.558757] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb [519192.559605] EntryControls=0000d1ff ExitControls=002fefff [519192.560453] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000 [519192.561306] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000 [519192.562158] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001 [519192.563006] reason=80000021 qualification=0000000000000000 [519192.563860] IDTVectoring: info=00000000 errcode=00000000 [519192.564695] TSC Offset = 0xfffcc6c7d53f16d7 [519192.565526] TPR Threshold = 0x00 [519192.566345] EPT pointer = 0x0000000b9397901e [519192.567162] PLE Gap=00000080 Window=00001000 [519192.567984] Virtual processor ID = 0x0005
Thank you!
On 22 Sep 2020, at 02:30, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Interestingly, I can't find anything recent except this one: https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653
Can you check whether anything in the OS was updated or changed recently?
Also check whether the VM has nested virtualization enabled.
Best Regards, Strahil Nikolov
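On an Intel host, the nested-virtualization check suggested above can be done by reading the `kvm_intel` module parameters under `/sys/module/kvm_intel/parameters`. The sketch below is illustrative only: the helper name `read_kvm_param` is made up, and it assumes an Intel host with the `kvm_intel` module loaded (on AMD the module is `kvm_amd` and the parameter set differs).

```python
from pathlib import Path

# sysfs directory exposed by the kvm_intel kernel module (Intel hosts only).
PARAM_DIR = Path("/sys/module/kvm_intel/parameters")


def read_kvm_param(name, param_dir=PARAM_DIR):
    """Return True/False for a boolean module parameter, or None if unreadable.

    Hypothetical helper; boolean parameters are usually shown as Y/N,
    some kernels show 1/0 instead.
    """
    try:
        raw = (Path(param_dir) / name).read_text().strip()
    except OSError:
        return None
    return raw in ("Y", "y", "1")


if __name__ == "__main__":
    # "nested" answers the question above; "unrestricted_guest" and "ept"
    # are the features the QEMU error text in this thread refers to.
    for name in ("nested", "unrestricted_guest", "ept"):
        print(name, read_kvm_param(name))
```

A result of `None` simply means the file could not be read, for example because the module is not loaded on that machine.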
On Monday, 21 September 2020, 23:56:26 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Strahil, thank you man. We finally got some output:
2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
KVM: entry failed, hardware error 0x80000021
If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.
EAX=00000000 EBX=01746180 ECX=4be7c002 EDX=000400b6 ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770 EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0 ES =0000 00000000 ffffffff 00809300 CS =8d00 7ff8d000 ffffffff 00809300 SS =0000 00000000 ffffffff 00809300 DS =0000 00000000 ffffffff 00809300 FS =0000 00000000 ffffffff 00809300 GS =0000 00000000 ffffffff 00809300 LDT=0000 00000000 000fffff 00000000 TR =0040 04c59000 00000067 00008b00 GDT= 04c5afb0 00000057 IDT= 00000000 00000000 CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 (<unknown process>)
2020-09-16 04:12:02.212+0000: shutting down, reason=shutdown
That’s the issue: I got this in the logs of both physical machines. It’s not very likely that both machines are damaged, right? So even with the log saying it’s a hardware error, could it be software related? And again, this only happens with this VM.
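On the "hardware error" wording: the value 0x80000021 can be decoded using the VMX exit-reason layout from the Intel SDM, where bit 31 flags a VM-entry failure and the low 16 bits hold the basic exit reason. The sketch below covers only the three VM-entry failure reasons; the function name is made up for illustration. Read this way, the code reports that the CPU refused the guest state it was asked to load, which does not by itself indicate faulty hardware.

```python
# Subset of the VMX basic exit reasons (Intel SDM, "VMX Basic Exit Reasons").
# Only the VM-entry failure reasons are listed here.
BASIC_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(value):
    """Split a 32-bit VMX exit reason into (entry_failure, basic_reason, text)."""
    entry_failure = bool(value & (1 << 31))  # bit 31: VM-entry failure
    basic = value & 0xFFFF                   # bits 15:0: basic exit reason
    return entry_failure, basic, BASIC_REASONS.get(basic, "not in this table")


print(decode_exit_reason(0x80000021))
# (True, 33, 'VM-entry failure due to invalid guest state')
```

The same value appears as `reason=80000021` in the `dmesg` dumps in this thread, so both logs describe the same kind of event.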
On 21 Sep 2020, at 17:36, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Usually libvirt's log provides hints (though not necessarily definite clues) about any issues.
For example: /var/log/libvirt/qemu/<VM_NAME>.log
Did anything change recently (maybe the oVirt version was upgraded)?
Best Regards, Strahil Nikolov
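As a sketch of how the per-VM libvirt log mentioned above could be searched, the snippet below pulls every "KVM: entry failed" code out of a log's text. The message format is taken from the log excerpt quoted in this thread; the function name and the command-line handling are assumptions for illustration.

```python
import re
import sys

# Message format as it appears in the qemu log quoted in this thread.
ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")


def find_entry_failures(log_text):
    """Return the error codes of all KVM entry failures as integers (hypothetical helper)."""
    return [int(m.group(1), 16) for m in ENTRY_FAILED.finditer(log_text)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 script.py /var/log/libvirt/qemu/<VM_NAME>.log
    with open(sys.argv[1], errors="replace") as fh:
        for code in find_entry_failures(fh.read()):
            print(hex(code))
```

Running it on each host's log for the same VM shows quickly whether all hosts report the same code.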
On Monday, 21 September 2020, 23:28:13 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Hi Strahil,
Both disks are VirtIO-SCSI and are Preallocated:
Thanks,
On 21 Sep 2020, at 17:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
What type of disks are you using? Any chance you use thin disks?
Best Regards, Strahil Nikolov
On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via Users <users@ovirt.org> wrote:
Hi, sorry to bump the thread.
But I still have this issue with the VM. These crashes are still happening, and I really don’t know what to do. Since there’s nothing in the logs except that message in `dmesg` on the host machine, I started changing settings to see if anything changes or if I can at least find a pattern.
What I’ve tried:
1. Disabled I/O Threading on the VM.
2. Increased I/O Threading to 2 from 1.
3. Disabled Memory Ballooning.
4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system for resource starvation, but everything seems to be fine.
8. Checked both iSCSI switches to see if there’s something wrong with the fabrics: 0 errors.
I’m really running out of ideas. The VM was working normally and suddenly this started.
Thanks,
PS: When I was typing this message it crashed again:
[427483.126725] *** Guest State *** [427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7 [427483.128505] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871 [427483.129342] CR3 = 0x00000001849ff002 [427483.130177] RSP = 0xffffb10186ffffb0 RIP = 0x0000000000008000 [427483.131014] RFLAGS=0x00000002 DR7 = 0x0000000000000400 [427483.131859] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000 [427483.132708] CS: sel=0x9b00, attr=0x08093, limit=0xffffffff, base=0x000000007ff9b000 [427483.133559] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.134413] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.135237] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.136040] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.136842] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [427483.137629] GDTR: limit=0x00000057, base=0xffffb10186eb4fb0 [427483.138409] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000 [427483.139202] IDTR: limit=0x00000000, base=0x0000000000000000 [427483.139998] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffb10186eb3000 [427483.140816] EFER = 0x0000000000000000 PAT = 0x0007010600070106 [427483.141650] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000 [427483.142503] Interruptibility = 00000009 ActivityState = 00000000 [427483.143353] *** Host State *** [427483.144194] RIP = 0xffffffffc0c65024 RSP = 0xffff9253c0b9bc90 [427483.145043] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040 [427483.145903] FSBase=00007fcc13816700 GSBase=ffff925adf240000 TRBase=ffff925adf244000 [427483.146766] GDTBase=ffff925adf24c000 IDTBase=ffffffffff528000 [427483.147630] CR0=0000000080050033 CR3=00000010597b6000 CR4=00000000001627e0 [427483.148498] Sysenter RSP=0000000000000000 
CS:RIP=0010:ffffffff8f196cc0 [427483.149365] EFER = 0x0000000000000d01 PAT = 0x0007050600070106 [427483.150231] *** Control State *** [427483.151077] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb [427483.151942] EntryControls=0000d1ff ExitControls=002fefff [427483.152800] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000 [427483.153661] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000 [427483.154521] VMExit: intr_info=00000000 errcode=00000000 ilen=00000004 [427483.155376] reason=80000021 qualification=0000000000000000 [427483.156230] IDTVectoring: info=00000000 errcode=00000000 [427483.157068] TSC Offset = 0xfffccfc261506dd9 [427483.157905] TPR Threshold = 0x0d [427483.158728] EPT pointer = 0x00000009b437701e [427483.159550] PLE Gap=00000080 Window=00080000 [427483.160370] Virtual processor ID = 0x0004
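The guest CR0 value in the dump above can be read bit by bit. The sketch below extracts the `CR0: actual=` field from a dump line and lists the architectural flags that are set; the bit positions are the standard x86 ones, and the helper names are made up for illustration. For 0x50032, neither PE (protected mode) nor PG (paging) is set, which matches the `SMM=1` flag in the QEMU register dumps quoted elsewhere in this thread. That describes the state the vCPU was in when the entry failed; it does not identify the root cause.

```python
import re

# Architectural CR0 flag bits (x86).
CR0_BITS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG",
}


def guest_cr0(dump_text):
    """Extract the guest CR0 from a KVM VMCS dump, or None if absent.

    The guest section uses "CR0: actual=0x...", while the host section
    uses "CR0=...", so this pattern only matches the guest value.
    """
    m = re.search(r"CR0: actual=0x([0-9a-fA-F]+)", dump_text)
    return int(m.group(1), 16) if m else None


def set_flags(cr0):
    """Return the names of the CR0 flags set in the given value."""
    return [name for bit, name in sorted(CR0_BITS.items()) if cr0 & (1 << bit)]


sample = "[427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032"
print(set_flags(guest_cr0(sample)))
# ['MP', 'ET', 'NE', 'WP', 'AM']
```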
On 16 Sep 2020, at 17:11, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Hello,
I have an Exchange Server VM that keeps going into suspend with no possibility of recovery. I need to click shutdown and then power it on again. I can’t find anything useful in the logs, except in “dmesg” on the host:
[47807.747606] *** Guest State *** [47807.747633] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7 [47807.747671] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871 [47807.747721] CR3 = 0x00000000001ad002 [47807.747739] RSP = 0xffffc20904fa3770 RIP = 0x0000000000008000 [47807.747766] RFLAGS=0x00000002 DR7 = 0x0000000000000400 [47807.747792] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000 [47807.747821] CS: sel=0x9100, attr=0x08093, limit=0xffffffff, base=0x000000007ff91000 [47807.747855] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747889] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747923] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747957] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.747991] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [47807.748025] GDTR: limit=0x00000057, base=0xffff80817e7d5fb0 [47807.748059] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000 [47807.748093] IDTR: limit=0x00000000, base=0x0000000000000000 [47807.748128] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffff80817e7d4000 [47807.748162] EFER = 0x0000000000000000 PAT = 0x0007010600070106 [47807.748189] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000 [47807.748221] Interruptibility = 00000009 ActivityState = 00000000 [47807.748248] *** Host State *** [47807.748263] RIP = 0xffffffffc0c65024 RSP = 0xffff9252bda5fc90 [47807.748290] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040 [47807.748318] FSBase=00007f46d462a700 GSBase=ffff9252ffac0000 TRBase=ffff9252ffac4000 [47807.748351] GDTBase=ffff9252ffacc000 IDTBase=ffffffffff528000 [47807.748377] CR0=0000000080050033 CR3=000000105ac8c000 CR4=00000000001627e0 [47807.748407] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0 [47807.748435] 
EFER = 0x0000000000000d01 PAT = 0x0007050600070106 [47807.748461] *** Control State *** [47807.748478] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb [47807.748507] EntryControls=0000d1ff ExitControls=002fefff [47807.748531] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000 [47807.748561] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000 [47807.748589] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001 [47807.748618] reason=80000021 qualification=0000000000000000 [47807.748645] IDTVectoring: info=00000000 errcode=00000000 [47807.748669] TSC Offset = 0xfffff9b8c8d943b6 [47807.748699] TPR Threshold = 0x00 [47807.748715] EPT pointer = 0x000000105cd5601e [47807.748735] PLE Gap=00000080 Window=00001000 [47807.748755] Virtual processor ID = 0x0003
So something really went crazy. The VM has been going down at least twice a day for the last 5 days.
At first I thought it was a hardware issue, so I restarted the VM on another host, and the same thing happened.
The VM is configured with 10 CPUs and 48GB of RAM, running on oVirt 4.3.10 with iSCSI storage on a FreeNAS box, where the VM disks live; there is a 300GB disk for C:\ and a 2TB disk for D:\.
Any idea on how to start troubleshooting it?
Thanks,
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIAIVV6I2MUVPV...

Hi again Strahil,
It’s oVirt 4.3.10. The CPU is the same across the entire cluster: three machines with Xeon E5-2620v2 (Ivy Bridge), all identical in model and specs.
I’ve changed the VM CPU model to: Nehalem,+spec-ctrl,+ssbd
Let’s see how it behaves. If it crashes again I’ll definitely look at rolling back the OS updates.
Thank you all.
PS: I can try upgrading to 4.4.
On 22 Sep 2020, at 04:28, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
This looks much like my openBSD 6.6 under Latest AMD CPUs. KVM did not accept a pretty valid instruction and it was a bug in KVM.
Maybe you can try to : - power off the VM - pick an older CPU type for that VM only - power on and monitor in the next days
Do you have a cluster with different cpu vendor (if currently on AMD -> Intel and if currently Intel -> AMD)? Maybe you can move it to another cluster and identify if the issue happens there too.
Another option is to try to rollback the windows updates , to identify if any of them has caused the problem. Yet, that's aworkaround and not a fix .
Are you using oVirt 4.3 or 4.4 ?
Best Regards, Strahil Nikolov
В вторник, 22 септември 2020 г., 10:08:44 Гринуич+3, Vinícius Ferrão <ferrao@versatushpc.com.br> написа:
Hi Strahil, yes I can’t find anything recently either. You digged way further then me, I found some regressions on the kernel but I don’t know if it’s related or not:
https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
Regarding the OS, nothing new was installed, just regular Windows Updates.
And finally about nested virtualisation, it’s disabled on hypervisor.
One thing that caught my attention on the link you’ve sent is regarding a rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443
But come on, it’s from 2006…
Well, I’m up to other ideas, VM just crashed once again:
EAX=00000000 EBX=075c5180 ECX=75432002 EDX=000400b6 ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770 EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0 ES =0000 00000000 ffffffff 00809300 CS =9900 7ff99000 ffffffff 00809300 SS =0000 00000000 ffffffff 00809300 DS =0000 00000000 ffffffff 00809300 FS =0000 00000000 ffffffff 00809300 GS =0000 00000000 ffffffff 00809300 LDT=0000 00000000 000fffff 00000000 TR =0040 075da000 00000067 00008b00 GDT= 075dbfb0 00000057 IDT= 00000000 00000000 CR0=00050032 CR2=242cb25a CR3=001ad002 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff4ff0 DR7=0000000000000400 EFER=0000000000000000 Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[519192.536247] *** Guest State *** [519192.536275] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7 [519192.536324] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871 [519192.537322] CR3 = 0x00000000001ad002 [519192.538166] RSP = 0xfffffb047db5d770 RIP = 0x0000000000008000 [519192.539017] RFLAGS=0x00000002 DR7 = 0x0000000000000400 [519192.539861] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000 [519192.540690] CS: sel=0x9900, attr=0x08093, limit=0xffffffff, base=0x000000007ff99000 [519192.541523] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.542356] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.543167] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.543961] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.544747] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000 [519192.545511] GDTR: limit=0x00000057, base=0xffffad01075dbfb0 [519192.546275] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000 [519192.547052] IDTR: limit=0x00000000, base=0x0000000000000000 [519192.547841] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffad01075da000 [519192.548639] EFER = 0x0000000000000000 PAT = 0x0007010600070106 [519192.549460] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000 [519192.550302] Interruptibility = 00000009 ActivityState = 00000000 [519192.551137] *** Host State *** [519192.551963] RIP = 0xffffffffc150a034 RSP = 0xffff88cd9cafbc90 [519192.552805] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040 [519192.553646] FSBase=00007f7da762a700 GSBase=ffff88d45f2c0000 TRBase=ffff88d45f2c4000 [519192.554496] GDTBase=ffff88d45f2cc000 IDTBase=ffffffffff528000 [519192.555347] CR0=0000000080050033 CR3=000000033dc82000 CR4=00000000001627e0 [519192.556202] Sysenter RSP=0000000000000000 
CS:RIP=0010:ffffffff91596cc0 [519192.557058] EFER = 0x0000000000000d01 PAT = 0x0007050600070106 [519192.557913] *** Control State *** [519192.558757] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb [519192.559605] EntryControls=0000d1ff ExitControls=002fefff [519192.560453] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000 [519192.561306] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000 [519192.562158] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001 [519192.563006] reason=80000021 qualification=0000000000000000 [519192.563860] IDTVectoring: info=00000000 errcode=00000000 [519192.564695] TSC Offset = 0xfffcc6c7d53f16d7 [519192.565526] TPR Threshold = 0x00 [519192.566345] EPT pointer = 0x0000000b9397901e [519192.567162] PLE Gap=00000080 Window=00001000 [519192.567984] Virtual processor ID = 0x0005
Thank you!
On 22 Sep 2020, at 02:30, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Interestingly, I can't find anything recent except this one: https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653
Can you check whether anything in the OS was updated or changed recently?
Also check whether the VM has nested virtualization enabled.
Best Regards, Strahil Nikolov
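A quick way to see the host-side settings related to the nested virtualization question is to read the `kvm_intel` module parameters from sysfs. This is a minimal sketch that assumes an Intel host with the `kvm_intel` module loaded; `kvm_intel_params` is a hypothetical helper name, and the values reported are host-level only, so they do not show what is enabled for one particular VM.

```python
from pathlib import Path


def kvm_intel_params(base="/sys/module/kvm_intel/parameters",
                     names=("nested", "unrestricted_guest")):
    """Return the requested kvm_intel parameters, or None where unreadable."""
    result = {}
    for name in names:
        try:
            result[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Module not loaded, not an Intel host, or parameter missing.
            result[name] = None
    return result


if __name__ == "__main__":
    for key, value in kvm_intel_params().items():
        print(f"{key} = {value}")
```

Depending on the kernel version, the values appear as `Y`/`N` or `1`/`0`.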
On Monday, 21 September 2020, 23:56:26 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Strahil, thank you man. We finally got some output:
2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
KVM: entry failed, hardware error 0x80000021
If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.
EAX=00000000 EBX=01746180 ECX=4be7c002 EDX=000400b6
ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770
EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =0000 00000000 ffffffff 00809300
CS =8d00 7ff8d000 ffffffff 00809300
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =0000 00000000 ffffffff 00809300
LDT=0000 00000000 000fffff 00000000
TR =0040 04c59000 00000067 00008b00
GDT= 04c5afb0 00000057
IDT= 00000000 00000000
CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 (<unknown process>)
2020-09-16 04:12:02.212+0000: shutting down, reason=shutdown
That’s the issue: I got this in the logs of both physical machines. It’s unlikely that both machines are damaged, right? So even though the log says it’s a hardware error, could it be software related? And again, this only happens with this VM.
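The QEMU register dump in this message carries a few fields worth pulling out: `SMM=1`, `EIP=00008000` and the CR0 value. The sketch below parses them from an excerpt of that dump; `summarize` and `SAMPLE` are illustrative names, and the bit meanings assumed are the architectural ones (CR0 bit 0 is PE, bit 31 is PG). An instruction pointer of 0x8000 with the SMM flag set matches how an x86 CPU starts its System Management Mode handler, so the vCPU may have been entering SMM when the VM entry was rejected. Treat that as an observation to investigate, not a diagnosis.

```python
import re

# Excerpt of the register dump quoted in this message.
SAMPLE = (
    "EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0 "
    "CS =8d00 7ff8d000 ffffffff 00809300 "
    "CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=00000000"
)


def summarize(dump: str) -> dict:
    """Extract the SMM flag, EIP, CS base and two CR0 bits from a QEMU dump."""
    def field(name):
        match = re.search(rf"\b{name}\s*=\s*([0-9a-fA-F]+)", dump)
        return int(match.group(1), 16) if match else None

    cs = re.search(r"\bCS\s*=\s*[0-9a-fA-F]+\s+([0-9a-fA-F]+)", dump)
    cr0 = field("CR0")
    return {
        "smm": field("SMM") == 1,
        "eip": field("EIP"),
        "cs_base": int(cs.group(1), 16) if cs else None,
        # CR0.PE (bit 0) and CR0.PG (bit 31); None if CR0 was not found.
        "protected_mode": None if cr0 is None else bool(cr0 & 1),
        "paging": None if cr0 is None else bool(cr0 & (1 << 31)),
    }


summary = summarize(SAMPLE)
print(summary)
```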
On 21 Sep 2020, at 17:36, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Usually libvirt's log provides hints (though not necessarily definite clues) about any issues.
For example: /var/log/libvirt/qemu/<VM_NAME>.log
Has anything changed recently (maybe the oVirt version was upgraded)?
Best Regards, Strahil Nikolov
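To make the log check above repeatable, a small script can scan the per-VM libvirt log for KVM entry failures. This is a sketch only: `find_entry_failures` is a hypothetical helper, the message text is matched as it appears in this thread, and the sample lines are shortened from the log quoted here.

```python
import re

# Message format as it appears in the qemu log quoted in this thread.
ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")


def find_entry_failures(lines):
    """Return (line_number, error_code) for every KVM entry failure found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ENTRY_FAILED.search(line)
        if match:
            hits.append((number, int(match.group(1), 16)))
    return hits


sample = [
    "2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus ...",
    "KVM: entry failed, hardware error 0x80000021",
    "2020-09-16 04:12:02.212+0000: shutting down, reason=shutdown",
]
hits = find_entry_failures(sample)
print(hits)
```

On a host, the same function can be fed an open file, for example `find_entry_failures(open(path, errors="replace"))` with `path` pointing at the VM's file under /var/log/libvirt/qemu/.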
On Monday, 21 September 2020, 23:28:13 GMT+3, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Hi Strahil,
Both disks are VirtIO-SCSI and are Preallocated:
Thanks,
On 21 Sep 2020, at 17:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
What type of disks are you using? Any chance you use thin disks?
Best Regards, Strahil Nikolov
On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via Users <users@ovirt.org> wrote:
Hi, sorry to bump the thread.
But I still have this issue with the VM. The crashes are still happening, and I really don’t know what to do. Since there’s nothing in the logs except that message in `dmesg` on the host machine, I started changing settings to see if anything changes or if I at least get a pattern.
What I’ve tried:
1. Disabled I/O threading on the VM.
2. Increased I/O threads from 1 to 2.
3. Disabled memory ballooning.
4. Reduced VM resources from 10 CPUs and 48 GB of RAM to 6 CPUs and 24 GB of RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system for resource starvation; everything seems to be fine.
8. Checked both iSCSI switches for problems with the fabrics: 0 errors.
I’m really running out of ideas. The VM was working normally and suddenly this started.
Thanks,
PS: When I was typing this message it crashed again:
[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[427483.128505] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[427483.129342] CR3 = 0x00000001849ff002
[427483.130177] RSP = 0xffffb10186ffffb0 RIP = 0x0000000000008000
[427483.131014] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[427483.131859] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[427483.132708] CS: sel=0x9b00, attr=0x08093, limit=0xffffffff, base=0x000000007ff9b000
[427483.133559] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.134413] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.135237] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136040] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136842] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.137629] GDTR: limit=0x00000057, base=0xffffb10186eb4fb0
[427483.138409] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[427483.139202] IDTR: limit=0x00000000, base=0x0000000000000000
[427483.139998] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffb10186eb3000
[427483.140816] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[427483.141650] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[427483.142503] Interruptibility = 00000009 ActivityState = 00000000
[427483.143353] *** Host State ***
[427483.144194] RIP = 0xffffffffc0c65024 RSP = 0xffff9253c0b9bc90
[427483.145043] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[427483.145903] FSBase=00007fcc13816700 GSBase=ffff925adf240000 TRBase=ffff925adf244000
[427483.146766] GDTBase=ffff925adf24c000 IDTBase=ffffffffff528000
[427483.147630] CR0=0000000080050033 CR3=00000010597b6000 CR4=00000000001627e0
[427483.148498] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[427483.149365] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[427483.150231] *** Control State ***
[427483.151077] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[427483.151942] EntryControls=0000d1ff ExitControls=002fefff
[427483.152800] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[427483.153661] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[427483.154521] VMExit: intr_info=00000000 errcode=00000000 ilen=00000004
[427483.155376] reason=80000021 qualification=0000000000000000
[427483.156230] IDTVectoring: info=00000000 errcode=00000000
[427483.157068] TSC Offset = 0xfffccfc261506dd9
[427483.157905] TPR Threshold = 0x0d
[427483.158728] EPT pointer = 0x00000009b437701e
[427483.159550] PLE Gap=00000080 Window=00080000
[427483.160370] Virtual processor ID = 0x0004
On 16 Sep 2020, at 17:11, Vinícius Ferrão <ferrao@versatushpc.com.br> wrote:
Hello,
I have an Exchange Server VM that keeps going into a suspended state with no possibility of recovery. I need to click shutdown and then power on. I can’t find anything useful in the logs, except in “dmesg” on the host:
[47807.747606] *** Guest State ***
[47807.747633] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[47807.747671] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[47807.747721] CR3 = 0x00000000001ad002
[47807.747739] RSP = 0xffffc20904fa3770 RIP = 0x0000000000008000
[47807.747766] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[47807.747792] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[47807.747821] CS: sel=0x9100, attr=0x08093, limit=0xffffffff, base=0x000000007ff91000
[47807.747855] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747889] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747923] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747957] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747991] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.748025] GDTR: limit=0x00000057, base=0xffff80817e7d5fb0
[47807.748059] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[47807.748093] IDTR: limit=0x00000000, base=0x0000000000000000
[47807.748128] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffff80817e7d4000
[47807.748162] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[47807.748189] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[47807.748221] Interruptibility = 00000009 ActivityState = 00000000
[47807.748248] *** Host State ***
[47807.748263] RIP = 0xffffffffc0c65024 RSP = 0xffff9252bda5fc90
[47807.748290] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[47807.748318] FSBase=00007f46d462a700 GSBase=ffff9252ffac0000 TRBase=ffff9252ffac4000
[47807.748351] GDTBase=ffff9252ffacc000 IDTBase=ffffffffff528000
[47807.748377] CR0=0000000080050033 CR3=000000105ac8c000 CR4=00000000001627e0
[47807.748407] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[47807.748435] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[47807.748461] *** Control State ***
[47807.748478] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[47807.748507] EntryControls=0000d1ff ExitControls=002fefff
[47807.748531] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[47807.748561] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[47807.748589] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001
[47807.748618] reason=80000021 qualification=0000000000000000
[47807.748645] IDTVectoring: info=00000000 errcode=00000000
[47807.748669] TSC Offset = 0xfffff9b8c8d943b6
[47807.748699] TPR Threshold = 0x00
[47807.748715] EPT pointer = 0x000000105cd5601e
[47807.748735] PLE Gap=00000080 Window=00001000
[47807.748755] Virtual processor ID = 0x0003
So something really went crazy. The VM has been going down at least twice a day for the last 5 days.
At first I thought it was a hardware issue, so I restarted the VM on another host, and the same thing happened.
The VM is configured with 10 CPUs and 48 GB of RAM, running on oVirt 4.3.10 with iSCSI storage on a FreeNAS box, where the VM disks live; there is a 300 GB disk for C:\ and a 2 TB disk for D:\.
Any idea on how to start troubleshooting it?
Thanks,
participants (3)
- Gianluca Cecchi
- Strahil Nikolov
- Vinícius Ferrão