What type of disks are you using? Any chance you are using thin disks?
Best Regards,
Strahil Nikolov
On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via Users
<users(a)ovirt.org> wrote:
Hi, sorry to bump the thread.
But I still have this issue with the VM. These crashes are still happening, and I really
don’t know what to do. Since there’s nothing in the logs, except for that message in `dmesg`
on the host machine, I started changing settings to see if anything changes or if I can at
least find a pattern.
What I’ve tried:
1. Disabled I/O Threading on VM.
2. Increased I/O Threading from 1 to 2.
3. Disabled Memory Ballooning.
4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of RAM.
5. Moved the VM to another host.
6. Dedicated a host specific to this VM.
7. Checked the storage system for resource starvation, but everything
seems to be fine.
8. Checked both iSCSI switches to see if there’s something wrong with the fabrics: 0
errors.
I’m really running out of ideas. The VM was working normally and suddenly this started.
Thanks,
PS: While I was typing this message, it crashed again:
[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[427483.128505] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[427483.129342] CR3 = 0x00000001849ff002
[427483.130177] RSP = 0xffffb10186ffffb0 RIP = 0x0000000000008000
[427483.131014] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[427483.131859] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[427483.132708] CS: sel=0x9b00, attr=0x08093, limit=0xffffffff, base=0x000000007ff9b000
[427483.133559] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.134413] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.135237] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136040] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.136842] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[427483.137629] GDTR: limit=0x00000057, base=0xffffb10186eb4fb0
[427483.138409] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[427483.139202] IDTR: limit=0x00000000, base=0x0000000000000000
[427483.139998] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffffb10186eb3000
[427483.140816] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[427483.141650] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[427483.142503] Interruptibility = 00000009 ActivityState = 00000000
[427483.143353] *** Host State ***
[427483.144194] RIP = 0xffffffffc0c65024 RSP = 0xffff9253c0b9bc90
[427483.145043] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[427483.145903] FSBase=00007fcc13816700 GSBase=ffff925adf240000 TRBase=ffff925adf244000
[427483.146766] GDTBase=ffff925adf24c000 IDTBase=ffffffffff528000
[427483.147630] CR0=0000000080050033 CR3=00000010597b6000 CR4=00000000001627e0
[427483.148498] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[427483.149365] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[427483.150231] *** Control State ***
[427483.151077] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[427483.151942] EntryControls=0000d1ff ExitControls=002fefff
[427483.152800] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[427483.153661] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[427483.154521] VMExit: intr_info=00000000 errcode=00000000 ilen=00000004
[427483.155376] reason=80000021 qualification=0000000000000000
[427483.156230] IDTVectoring: info=00000000 errcode=00000000
[427483.157068] TSC Offset = 0xfffccfc261506dd9
[427483.157905] TPR Threshold = 0x0d
[427483.158728] EPT pointer = 0x00000009b437701e
[427483.159550] PLE Gap=00000080 Window=00080000
[427483.160370] Virtual processor ID = 0x0004
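Since the goal above is to find a pattern in the crashes, one option is to pull the timestamp of every dump out of the host's `dmesg` output and look at the spacing between them. The following is only a minimal sketch: the function names are hypothetical, and it assumes the default `dmesg` format shown in this thread (seconds since boot in square brackets), so the values are only comparable within a single boot of a single host.

```python
import re

# Matches the header line printed at the start of each dump in this thread,
# e.g. "[427483.126725] *** Guest State ***".
# Assumption: default dmesg timestamps (seconds since boot), not `dmesg -T`.
DUMP_HEADER = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\*\*\* Guest State \*\*\*", re.MULTILINE
)


def dump_timestamps(dmesg_text):
    """Return the seconds-since-boot timestamp of every dump header found."""
    return [float(m.group(1)) for m in DUMP_HEADER.finditer(dmesg_text)]


def gaps_in_hours(timestamps):
    """Return the time between consecutive dumps, in hours."""
    return [(b - a) / 3600.0 for a, b in zip(timestamps, timestamps[1:])]
```

Feeding the saved `dmesg` output of one host into `dump_timestamps` and then `gaps_in_hours` shows whether the crashes are roughly periodic or irregular; correlating them with wall-clock events still needs the host's boot time.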
On 16 Sep 2020, at 17:11, Vinícius Ferrão
<ferrao(a)versatushpc.com.br> wrote:
Hello,
I have an Exchange Server VM that keeps going into a suspended state with no possibility of
recovery. I need to click on shutdown and then power on. I can’t find anything useful in the
logs, except in “dmesg” on the host:
[47807.747606] *** Guest State ***
[47807.747633] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, gh_mask=fffffffffffffff7
[47807.747671] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=fffffffffffff871
[47807.747721] CR3 = 0x00000000001ad002
[47807.747739] RSP = 0xffffc20904fa3770 RIP = 0x0000000000008000
[47807.747766] RFLAGS=0x00000002 DR7 = 0x0000000000000400
[47807.747792] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
[47807.747821] CS: sel=0x9100, attr=0x08093, limit=0xffffffff, base=0x000000007ff91000
[47807.747855] DS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747889] SS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747923] ES: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747957] FS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.747991] GS: sel=0x0000, attr=0x08093, limit=0xffffffff, base=0x0000000000000000
[47807.748025] GDTR: limit=0x00000057, base=0xffff80817e7d5fb0
[47807.748059] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, base=0x0000000000000000
[47807.748093] IDTR: limit=0x00000000, base=0x0000000000000000
[47807.748128] TR: sel=0x0040, attr=0x0008b, limit=0x00000067, base=0xffff80817e7d4000
[47807.748162] EFER = 0x0000000000000000 PAT = 0x0007010600070106
[47807.748189] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[47807.748221] Interruptibility = 00000009 ActivityState = 00000000
[47807.748248] *** Host State ***
[47807.748263] RIP = 0xffffffffc0c65024 RSP = 0xffff9252bda5fc90
[47807.748290] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[47807.748318] FSBase=00007f46d462a700 GSBase=ffff9252ffac0000 TRBase=ffff9252ffac4000
[47807.748351] GDTBase=ffff9252ffacc000 IDTBase=ffffffffff528000
[47807.748377] CR0=0000000080050033 CR3=000000105ac8c000 CR4=00000000001627e0
[47807.748407] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
[47807.748435] EFER = 0x0000000000000d01 PAT = 0x0007050600070106
[47807.748461] *** Control State ***
[47807.748478] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
[47807.748507] EntryControls=0000d1ff ExitControls=002fefff
[47807.748531] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[47807.748561] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
[47807.748589] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001
[47807.748618] reason=80000021 qualification=0000000000000000
[47807.748645] IDTVectoring: info=00000000 errcode=00000000
[47807.748669] TSC Offset = 0xfffff9b8c8d943b6
[47807.748699] TPR Threshold = 0x00
[47807.748715] EPT pointer = 0x000000105cd5601e
[47807.748735] PLE Gap=00000080 Window=00001000
[47807.748755] Virtual processor ID = 0x0003
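As a starting point for reading the dump above, the `reason=80000021` field can be decoded by hand. On Intel VT-x, bits 15:0 of the exit reason hold the basic exit reason and bit 31 flags a failed VM entry. The sketch below uses a hypothetical helper name and a deliberately partial lookup table covering only the VM-entry failure codes. It shows what the value means, not why the guest state became invalid.

```python
# Basic exit reasons that indicate a failed VM entry (Intel SDM, Appendix C).
# Deliberately partial: only the entry-failure codes are listed here.
VM_ENTRY_FAILURES = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(reason_hex):
    """Split a `reason=` value from the dump into its documented fields."""
    value = int(reason_hex, 16)
    basic = value & 0xFFFF                  # bits 15:0: basic exit reason
    entry_failed = bool((value >> 31) & 1)  # bit 31: VM-entry failure
    description = VM_ENTRY_FAILURES.get(basic, "not in this partial table")
    return basic, entry_failed, description


print(decode_exit_reason("80000021"))
# → (33, True, 'VM-entry failure due to invalid guest state')
```

Both dumps in this thread carry the same `reason` value, so both are the same class of failure: the CPU rejected the guest register state it was asked to load.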
So something really went crazy. The VM has been going down at least twice a day for the
last 5 days.
At first I thought it was a hardware issue, so I restarted the VM on another host,
and the same thing happened.
As for the VM, it’s configured with 10 CPUs and 48GB of RAM, running on oVirt 4.3.10 with
iSCSI storage on a FreeNAS box, where the VM disks live; there is a 300GB disk for C:\
and a 2TB disk for D:\.
Any idea on how to start troubleshooting it?
Thanks,