Hi Ian,
It is normal for the VDSMs to compete for the lock; one of them should win, though. If that is not happening, the lockspace might be corrupted, or the sanlock daemons might not be able to reach it.
I would recommend putting the cluster into global maintenance and attempting a manual start using:
# hosted-engine --set-maintenance --mode=global
# hosted-engine --vm-start
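Once the VM starts, you can check the result with the status command and, when the engine is confirmed healthy, leave maintenance again (assuming the standard hosted-engine CLI; the output details may vary a bit between versions):
# hosted-engine --vm-status
# hosted-engine --set-maintenance --mode=none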
If that does not work, you will need to check your storage connectivity and the sanlock status on all hosts:
# sanlock client status
There are a couple of locks I would expect to be there (ha_agent, spm), but no lock for the hosted engine disk should be visible.
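If sanlock itself looks unhealthy, checking the daemon state and its log on each host usually shows whether it can renew its lease on the storage (the paths below are the usual defaults; adjust if yours differ):
# systemctl status sanlock
# tail -n 50 /var/log/sanlock.log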
The next steps depend on whether you have important VMs running on the cluster and on the Gluster status (I can't help you there, unfortunately).
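That said, the usual first checks on the Gluster side are the volume status and any pending heals, for example (I am only guessing the volume name "data" from the mount path in your logs; replace it with the volume backing the engine storage domain):
# gluster volume status data
# gluster volume heal data info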
Best regards
--
Martin Sivak
SLA / oVirt
On Fri, Mar 10, 2017 at 7:37 AM, Ian Neilsen <ian.neilsen@gmail.com> wrote:
I just noticed this in the vdsm logs. The agent looks like it is trying to start the hosted engine on both machines??
<on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain>
Thread-7517::ERROR::2017-03-10 01:26:13,053::vm::773::virt.vm::(_startUnderlyingVm) vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 714, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2026, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 917, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3782, in createXML
    if ret is None: raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: Permission denied
INFO::2017-03-10 01:26:13,054::vm::1330::virt.vm::(setDownStatus) vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::Changed state to Down: Failed to acquire lock: Permission denied (code=1)
INFO::2017-03-10 01:26:13,054::guestagent::430::virt.vm::(stop) vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::Stopping connection
DEBUG::2017-03-10 01:26:13,054::vmchannels::238::vds::(unregister) Delete fileno 56 from listener.
DEBUG::2017-03-10 01:26:13,055::vmchannels::66::vds::(_unregister_fd) Failed to unregister FD from epoll (ENOENT): 56
DEBUG::2017-03-10 01:26:13,055::__init__::209::jsonrpc.Notification::(emit) Sending event {"params": {"2419f9fe-4998-4b7a-9fe9-151571d20379": {"status": "Down", "exitReason": 1, "exitMessage": "Failed to acquire lock: Permission denied", "exitCode": 1}, "notify_time": 4308740560}, "jsonrpc": "2.0", "method": "|virt|VM_status|2419f9fe-4998-4b7a-9fe9-151571d20379"}
VM Channels Listener::DEBUG::2017-03-10 01:26:13,475::vmchannels::142::vds::(_do_del_channels) fileno 56 was removed from listener.
DEBUG::2017-03-10 01:26:14,430::check::296::storage.check::(_start_process) START check u'/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata' cmd=['/usr/bin/taskset', '--cpu-list', '0-39', '/usr/bin/dd', u'if=/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata', 'of=/dev/null', 'bs=4096', 'count=1', 'iflag=direct'] delay=0.00
DEBUG::2017-03-10 01:26:14,481::asyncevent::564::storage.asyncevent::(reap) Process <cpopen.CPopen object at 0x3ba6550> terminated (count=1)
DEBUG::2017-03-10 01:26:14,481::check::327::storage.check::(_check_completed) FINISH check u'/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata' rc=0 err=bytearray(b'0+1 records in\n0+1 records out\n300 bytes (300 B) copied, 8.7603e-05 s, 3.4 MB/s\n') elapsed=0.06
On 10 March 2017 at 10:40, Ian Neilsen <ian.neilsen@gmail.com> wrote:
>
> Hi All
>
> I had a storage issue with my Gluster volumes running under oVirt hosted engine.
> I now cannot start the hosted engine manager VM from "hosted-engine --vm-start".
> I've scoured the net to find a way, but can't seem to find anything concrete.
>
> Running CentOS 7, oVirt 4.0 and Gluster 3.8.9.
>
> How do I recover the engine manager? I'm at a loss!
>
> Engine status: the score was 0 for all nodes; now node 1 is reading 3400, but all the others are still 0.
>
> {"reason": "bad vm status", "health": "bad",
"vm": "down", "detail":
> "down"}
>
>
> Logs from agent.log
> ==================
>
> INFO::2017-03-09 19:32:52,600::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
> INFO::2017-03-09 19:32:52,603::hosted_engine::612::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
> INFO::2017-03-09 19:32:54,820::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
> INFO::2017-03-09 19:32:54,821::storage_server::219::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> INFO::2017-03-09 19:32:59,194::storage_server::226::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> INFO::2017-03-09 19:32:59,211::storage_server::233::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> INFO::2017-03-09 19:32:59,328::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
> INFO::2017-03-09 19:32:59,328::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
> INFO::2017-03-09 19:33:01,748::hosted_engine::669::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
> INFO::2017-03-09 19:33:01,748::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
> WARNING::2017-03-09 19:33:04,056::ovf_store::107::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
> ERROR::2017-03-09 19:33:04,058::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>
> ovirt-ha-agent logs
> ================
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>
> vdsm
> ======
>
> vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
>
> ovirt-ha-broker
> ============
>
> ovirt-ha-broker cpu_load_no_engine.EngineHealth ERROR Failed to getVmStats: 'pid'
>
> --
> Ian Neilsen
>
> Mobile: 0424 379 762
> Linkedin: http://au.linkedin.com/in/ianneilsen
> Twitter : ineilsen
--
Ian Neilsen
Mobile: 0424 379 762
Linkedin: http://au.linkedin.com/in/ianneilsen
Twitter : ineilsen
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users