Hello everyone,
I have a replica 2 + arbiter installation, and this morning the Hosted Engine gave the
following error in the UI and resumed on a different node (node3) than the one it was
originally running on (node1). (The original node has more memory than the one the engine
ended up on, but the latter had a better memory usage percentage at the time.) Also, the
only way I discovered that the migration had happened and that there was an Error in
Events was that I logged into the oVirt web interface for a routine inspection. Besides
that, everything was working properly and still is.
The error that popped up is the following:
VM HostedEngine is down with error. Exit message: internal error: qemu unexpectedly closed
the monitor:
2020-09-01T06:49:20.749126Z qemu-kvm: warning: All CPU(s) up to maxcpus should be
described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and
will be removed in future
2020-09-01T06:49:20.927274Z qemu-kvm: -device
virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,id=ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,bootindex=1,write-cache=on:
Failed to get "write" lock
Is another process using the image?.
From what I could gather, this concerns the following snippet from HostedEngine.xml,
which is the virtio disk of the Hosted Engine:
<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads' iothread='1'/>
  <source file='/var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <serial>d5de54b6-9f8e-4fba-819b-ebf6780757d2</serial>
  <alias name='ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
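
In case it is relevant, what I was planning to try next on node1 is to check whether some
process still holds that image open, with something along these lines (just my rough idea,
the path is the one from the XML above):

# check whether any process on this host still has the Hosted Engine image open
lsof /var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7

# inspect the image without trying to take the QEMU write lock ourselves (-U / --force-share)
qemu-img info -U /var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7

# see which VMs libvirt thinks are defined/running on this host
virsh -r list --all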
I've tried looking through the logs and the output of the sar command, but I couldn't
find anything to relate to the above errors or to determine why this happened. Is this a
Gluster or a QEMU problem?
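
To try to rule out the Gluster side, I thought I would also check the volume health with
something like this (I'm assuming the Hosted Engine volume is called "engine" here; mine
may be named differently):

# verify that all bricks and the arbiter are online
gluster volume status engine

# look for entries pending heal or in split-brain
gluster volume heal engine info
gluster volume heal engine info split-brain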
The Hosted Engine had been manually migrated to node1 five days earlier.
Is there a standard practice I could follow to determine what happened and secure my
system?
Thank you very much for your time,
Maria Souvalioti