Hi Ian,
It is normal for the VDSMs to compete for the lock; one of them should win, though. If that is not happening, the lockspace might be corrupted, or the sanlock daemons might not be able to reach it.
I would recommend putting the cluster into global maintenance and attempting a manual start using:
# hosted-engine --set-maintenance --mode=global
# hosted-engine --vm-start
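Once the VM starts, you can check the result with the status command and, when the engine is confirmed healthy, leave maintenance again (assuming the standard hosted-engine CLI; the output details may vary a bit between versions):
# hosted-engine --vm-status
# hosted-engine --set-maintenance --mode=none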
If that does not work, you will need to check your storage connectivity and the sanlock status on all hosts:
# sanlock client status
There are a couple of locks I would expect to be there (ha_agent, spm), but no lock for the hosted engine disk should be visible.
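If sanlock itself looks unhealthy, checking the daemon state and its log on each host usually shows whether it can renew its lease on the storage (the paths below are the usual defaults; adjust if yours differ):
# systemctl status sanlock
# tail -n 50 /var/log/sanlock.log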
The next steps depend on whether you have important VMs running on the cluster and on the Gluster status (I can't help you there, unfortunately).
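That said, the usual first checks on the Gluster side are the volume status and any pending heals, for example (I am only guessing the volume name "data" from the mount path in your logs; replace it with the volume backing the engine storage domain):
# gluster volume status data
# gluster volume heal data info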
Best regards
--
Martin Sivak
SLA / oVirt
On Fri, Mar 10, 2017 at 7:37 AM, Ian Neilsen <ian.neilsen@gmail.com> wrote:
I just noticed this in the vdsm logs. The agent looks like it is trying to start the hosted engine on both machines??
<on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain>
Thread-7517::ERROR::2017-03-10 01:26:13,053::vm::773::virt.vm::(_startUnderlyingVm) vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 714, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2026, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 917, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3782, in createXML
    if ret is None: raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: Permission denied
INFO::2017-03-10 01:26:13,054::vm::1330::virt.vm::(setDownStatus) vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::Changed state to Down: Failed to acquire lock: Permission denied (code=1)
INFO::2017-03-10 01:26:13,054::guestagent::430::virt.vm::(stop) vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::Stopping connection
DEBUG::2017-03-10 01:26:13,054::vmchannels::238::vds::(unregister) Delete fileno 56 from listener.
DEBUG::2017-03-10 01:26:13,055::vmchannels::66::vds::(_unregister_fd) Failed to unregister FD from epoll (ENOENT): 56
DEBUG::2017-03-10 01:26:13,055::__init__::209::jsonrpc.Notification::(emit) Sending event {"params": {"2419f9fe-4998-4b7a-9fe9-151571d20379": {"status": "Down", "exitReason": 1, "exitMessage": "Failed to acquire lock: Permission denied", "exitCode": 1}, "notify_time": 4308740560}, "jsonrpc": "2.0", "method": "|virt|VM_status|2419f9fe-4998-4b7a-9fe9-151571d20379"}
VM Channels Listener::DEBUG::2017-03-10 01:26:13,475::vmchannels::142::vds::(_do_del_channels) fileno 56 was removed from listener.
DEBUG::2017-03-10 01:26:14,430::check::296::storage.check::(_start_process) START check u'/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata' cmd=['/usr/bin/taskset', '--cpu-list', '0-39', '/usr/bin/dd', u'if=/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata', 'of=/dev/null', 'bs=4096', 'count=1', 'iflag=direct'] delay=0.00
DEBUG::2017-03-10 01:26:14,481::asyncevent::564::storage.asyncevent::(reap) Process <cpopen.CPopen object at 0x3ba6550> terminated (count=1)
DEBUG::2017-03-10 01:26:14,481::check::327::storage.check::(_check_completed) FINISH check u'/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata' rc=0 err=bytearray(b'0+1 records in\n0+1 records out\n300 bytes (300 B) copied, 8.7603e-05 s, 3.4 MB/s\n') elapsed=0.06
On 10 March 2017 at 10:40, Ian Neilsen <ian.neilsen@gmail.com> wrote:
>
> Hi All
>
> I had a storage issue with my Gluster volumes running under oVirt hosted engine.
> I now cannot start the hosted engine manager VM from "hosted-engine --vm-start".
> I've scoured the net to find a way, but can't seem to find anything concrete.
>
> Running CentOS 7, oVirt 4.0 and Gluster 3.8.9.
>
> How do I recover the engine manager? I'm at a loss!
>
> Engine status: the score was 0 for all nodes; now node 1 is reading 3400, but all the others are still 0.
>
> {"reason": "bad vm status", "health": "bad",
"vm": "down", "detail":
> "down"}
>
>
> Logs from agent.log
> ==================
>
> INFO::2017-03-09 19:32:52,600::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
> INFO::2017-03-09 19:32:52,603::hosted_engine::612::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
> INFO::2017-03-09 19:32:54,820::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
> INFO::2017-03-09 19:32:54,821::storage_server::219::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> INFO::2017-03-09 19:32:59,194::storage_server::226::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> INFO::2017-03-09 19:32:59,211::storage_server::233::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> INFO::2017-03-09 19:32:59,328::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
> INFO::2017-03-09 19:32:59,328::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
> INFO::2017-03-09 19:33:01,748::hosted_engine::669::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
> INFO::2017-03-09 19:33:01,748::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
> WARNING::2017-03-09 19:33:04,056::ovf_store::107::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
> ERROR::2017-03-09 19:33:04,058::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>
> ovirt-ha-agent logs
> ================
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>
> vdsm
> ======
>
> vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
>
> ovirt-ha-broker
> ============
>
> ovirt-ha-broker cpu_load_no_engine.EngineHealth ERROR Failed to getVmStats: 'pid'
>
> --
> Ian Neilsen
>
> Mobile: 0424 379 762
> Linkedin: http://au.linkedin.com/in/ianneilsen
> Twitter : ineilsen
--
Ian Neilsen
Mobile: 0424 379 762
Linkedin: http://au.linkedin.com/in/ianneilsen
Twitter : ineilsen
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users