<div dir="ltr"><div><div>I&#39;ve checked id&#39;s in  /rhev/data-center/mnt/glusterSD/*...../dom_md/      <br><br># -rw-rw----. 1 vdsm kvm  1048576 Mar 12 05:14 ids<br><br></div><div>seems ok<br></div><div><br>sanlock.log showing;<br>---------------------------<br>r14 acquire_token open error -13<br>r14 cmd_acquire 2,11,89283 acquire_token -13 <br><br></div>Now I&#39;m not quiet sure on which direction to take. <br><br>Lockspace<br>---------------<br>&quot;hosted-engine --reinitialize-lockspace&quot; is throwing an exception;<br><br>Exception(&quot;Lockfile reset cannot be performed with&quot;<br>Exception: Lockfile reset cannot be performed with an active agent.<br><br><br></div>@didi - I am in &quot;Global Maintenance&quot;. <br>I just noticed that host 1 now shows.<br>Engine status: unknown stale-data<br>state= AgentStopped<br><br>I&#39;m pretty sure Ive been able to start the Engine VM while in Global Maintenance. But you raise a good question. I don&#39;t see why you would be restricted in running the engine while in Global or even starting the VM. If so this is a little bakwards.<br><br><br><br><div><br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 12 March 2017 at 16:28, Yedidyah Bar David <span dir="ltr">&lt;<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Fri, Mar 10, 2017 at 12:39 PM, Martin Sivak &lt;<a href="mailto:msivak@redhat.com">msivak@redhat.com</a>&gt; wrote:<br>
> Hi Ian,
>
> it is normal that VDSMs are competing for the lock, one should win
> though. If that is not the case then the lockspace might be corrupted
> or the sanlock daemons can't reach it.
>
> I would recommend putting the cluster to global maintenance and
> attempting a manual start using:
>
> # hosted-engine --set-maintenance --mode=global
> # hosted-engine --vm-start

Is that possible? See also:

http://lists.ovirt.org/pipermail/users/2016-January/036993.html
<div class="HOEnZb"><div class="h5"><br>
&gt;<br>
&gt; You will need to check your storage connectivity and sanlock status on<br>
&gt; all hosts if that does not work.<br>
&gt;<br>
&gt; # sanlock client status<br>
&gt;<br>
&gt; There are couple of locks I would expect to be there (ha_agent, spm),<br>
&gt; but no lock for hosted engine disk should be visible.<br>
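
(Replying inline: this is roughly what I'm comparing across the hosts for the sanlock/permissions side. A sketch only; the dom_md path is the data-domain path taken from my vdsm log below, and the hosted-engine domain may be mounted elsewhere:)

# sanlock client status
# systemctl status sanlock
# id sanlock
# ls -l /rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/

The open error -13 in sanlock.log looks like EACCES, so I mainly want to confirm that the sanlock user/groups and the vdsm:kvm 0660 ownership on the ids file are identical on every host.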
&gt;<br>
&gt; Next steps depend on whether you have important VMs running on the<br>
&gt; cluster and on the Gluster status (I can&#39;t help you there<br>
&gt; unfortunately).<br>
&gt;<br>
&gt; Best regards<br>
&gt;<br>
&gt; --<br>
&gt; Martin Sivak<br>
&gt; SLA / oVirt<br>
&gt;<br>
&gt;<br>
&gt; On Fri, Mar 10, 2017 at 7:37 AM, Ian Neilsen &lt;<a href="mailto:ian.neilsen@gmail.com">ian.neilsen@gmail.com</a>&gt; wrote:<br>
&gt;&gt; I just noticed this in the vdsm.logs.  The agent looks like it is trying to<br>
&gt;&gt; start hosted engine on both machines??<br>
&gt;&gt;<br>
&gt;&gt; &lt;on_poweroff&gt;destroy&lt;/on_<wbr>poweroff&gt;&lt;on_reboot&gt;destroy&lt;/<wbr>on_reboot&gt;&lt;on_crash&gt;destroy&lt;/<wbr>on_crash&gt;&lt;/domain&gt;<br>
&gt;&gt; Thread-7517::ERROR::2017-03-10<br>
&gt;&gt; 01:26:13,053::vm::773::virt.<wbr>vm::(_startUnderlyingVm)<br>
&gt;&gt; vmId=`2419f9fe-4998-4b7a-9fe9-<wbr>151571d20379`::The vm start process failed<br>
&gt;&gt; Traceback (most recent call last):<br>
&gt;&gt;   File &quot;/usr/share/vdsm/virt/vm.py&quot;, line 714, in _startUnderlyingVm<br>
&gt;&gt; self._run()<br>
&gt;&gt;   File &quot;/usr/share/vdsm/virt/vm.py&quot;, line 2026, in _run<br>
&gt;&gt; self._connection.createXML(<wbr>domxml, flags),<br>
&gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/<wbr>libvirtconnection.py&quot;, line<br>
&gt;&gt; 123, in wrapper ret = f(*args, **kwargs)<br>
&gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/utils.py&quot;, line 917, in<br>
&gt;&gt; wrapper return func(inst, *args, **kwargs)<br>
&gt;&gt;   File &quot;/usr/lib64/python2.7/site-<wbr>packages/libvirt.py&quot;, line 3782, in<br>
&gt;&gt; createXML if ret is None:raise libvirtError(&#39;<wbr>virDomainCreateXML() failed&#39;,<br>
&gt;&gt; conn=self)<br>
&gt;&gt;<br>
&gt;&gt; libvirtError: Failed to acquire lock: Permission denied<br>
&gt;&gt;<br>
&gt;&gt; INFO::2017-03-10 01:26:13,054::vm::1330::virt.<wbr>vm::(setDownStatus)<br>
&gt;&gt; vmId=`2419f9fe-4998-4b7a-9fe9-<wbr>151571d20379`::Changed state to Down: Failed<br>
&gt;&gt; to acquire lock: Permission denied (code=1)<br>
&gt;&gt; INFO::2017-03-10 01:26:13,054::guestagent::430:<wbr>:virt.vm::(stop)<br>
&gt;&gt; vmId=`2419f9fe-4998-4b7a-9fe9-<wbr>151571d20379`::Stopping connection<br>
&gt;&gt;<br>
&gt;&gt; DEBUG::2017-03-10 01:26:13,054::vmchannels::238:<wbr>:vds::(unregister) Delete<br>
&gt;&gt; fileno 56 from listener.<br>
&gt;&gt; DEBUG::2017-03-10 01:26:13,055::vmchannels::66::<wbr>vds::(_unregister_fd) Failed<br>
&gt;&gt; to unregister FD from epoll (ENOENT): 56<br>
&gt;&gt; DEBUG::2017-03-10 01:26:13,055::__init__::209::<wbr>jsonrpc.Notification::(emit)<br>
&gt;&gt; Sending event {&quot;params&quot;: {&quot;2419f9fe-4998-4b7a-9fe9-<wbr>151571d20379&quot;: {&quot;status&quot;:<br>
&gt;&gt; &quot;Down&quot;, &quot;exitReason&quot;: 1, &quot;exitMessage&quot;: &quot;Failed to acquire lock: Permission<br>
&gt;&gt; denied&quot;, &quot;exitCode&quot;: 1}, &quot;notify_time&quot;: 4308740560}, &quot;jsonrpc&quot;: &quot;2.0&quot;,<br>
&gt;&gt; &quot;method&quot;: &quot;|virt|VM_status|2419f9fe-<wbr>4998-4b7a-9fe9-151571d20379&quot;}<br>
&gt;&gt; VM Channels Listener::DEBUG::2017-03-10<br>
&gt;&gt; 01:26:13,475::vmchannels::142:<wbr>:vds::(_do_del_channels) fileno 56 was removed<br>
&gt;&gt; from listener.<br>
&gt;&gt; DEBUG::2017-03-10 01:26:14,430::check::296::<wbr>storage.check::(_start_<wbr>process)<br>
&gt;&gt; START check<br>
&gt;&gt; u&#39;/rhev/data-center/mnt/<wbr>glusterSD/192.168.3.10:_data/<wbr>a08822ec-3f5b-4dba-ac2d-<wbr>5510f0b4b6a2/dom_md/metadata&#39;<br>
&gt;&gt; cmd=[&#39;/usr/bin/taskset&#39;, &#39;--cpu-list&#39;, &#39;0-39&#39;, &#39;/usr/bin/dd&#39;,<br>
&gt;&gt; u&#39;if=/rhev/data-center/mnt/<wbr>glusterSD/192.168.3.10:_data/<wbr>a08822ec-3f5b-4dba-ac2d-<wbr>5510f0b4b6a2/dom_md/metadata&#39;,<br>
&gt;&gt; &#39;of=/dev/null&#39;, &#39;bs=4096&#39;, &#39;count=1&#39;, &#39;iflag=direct&#39;] delay=0.00<br>
&gt;&gt; DEBUG::2017-03-10 01:26:14,481::asyncevent::564:<wbr>:storage.asyncevent::(reap)<br>
&gt;&gt; Process &lt;cpopen.CPopen object at 0x3ba6550&gt; terminated (count=1)<br>
&gt;&gt; DEBUG::2017-03-10<br>
&gt;&gt; 01:26:14,481::check::327::<wbr>storage.check::(_check_<wbr>completed) FINISH check<br>
&gt;&gt; u&#39;/rhev/data-center/mnt/<wbr>glusterSD/192.168.3.10:_data/<wbr>a08822ec-3f5b-4dba-ac2d-<wbr>5510f0b4b6a2/dom_md/metadata&#39;<br>
&gt;&gt; rc=0 err=bytearray(b&#39;0+1 records in\n0+1 records out\n300 bytes (300 B)<br>
&gt;&gt; copied, 8.7603e-05 s, 3.4 MB/s\n&#39;) elapsed=0.06<br>
&gt;&gt;<br>
&gt;&gt;<br>
>> On 10 March 2017 at 10:40, Ian Neilsen <ian.neilsen@gmail.com> wrote:
>>>
>>> Hi All
>>>
>>> I had a storage issue with my Gluster volumes running under oVirt hosted engine.
>>> I now cannot start the hosted engine manager VM with "hosted-engine --vm-start".
>>> I've scoured the net to find a way, but can't seem to find anything concrete.
>>>
>>> Running CentOS 7, oVirt 4.0 and Gluster 3.8.9.
>>>
>>> How do I recover the engine manager? I'm at a loss!
>>>
>>> Engine status: the score was 0 on all nodes; now node 1 reads 3400, but
>>> all the others are still 0.
>>>
>>> {"reason": "bad vm status", "health": "bad", "vm": "down", "detail":
>>> "down"}
>>>
>>>
>>> Logs from agent.log
>>> ==================
>>>
>>> INFO::2017-03-09
>>> 19:32:52,600::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>>> Global maintenance detected
>>> INFO::2017-03-09
>>> 19:32:52,603::hosted_engine::612::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>> Initializing VDSM
>>> INFO::2017-03-09
>>> 19:32:54,820::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>> Connecting the storage
>>> INFO::2017-03-09
>>> 19:32:54,821::storage_server::219::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> Connecting storage server
>>> INFO::2017-03-09
>>> 19:32:59,194::storage_server::226::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> Connecting storage server
>>> INFO::2017-03-09
>>> 19:32:59,211::storage_server::233::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> Refreshing the storage domain
>>> INFO::2017-03-09
>>> 19:32:59,328::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>> Preparing images
>>> INFO::2017-03-09
>>> 19:32:59,328::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
>>> Preparing images
>>> INFO::2017-03-09
>>> 19:33:01,748::hosted_engine::669::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>> Reloading vm.conf from the shared storage domain
>>> INFO::2017-03-09
>>> 19:33:01,748::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file)
>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>> WARNING::2017-03-09
>>> 19:33:04,056::ovf_store::107::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>> Unable to find OVF_STORE
>>> ERROR::2017-03-09
>>> 19:33:04,058::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file)
>>> Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>>>
>>> ovirt-ha-agent logs
>>> ================
>>>
>>> ovirt-ha-agent
>>> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable
>>> to get vm.conf from OVF_STORE, falling back to initial vm.conf
>>>
>>> vdsm
>>> ======
>>>
>>> vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
>>>
>>> ovirt-ha-broker
>>> ============
>>>
>>> ovirt-ha-broker cpu_load_no_engine.EngineHealth ERROR Failed to
>>> getVmStats: 'pid'
>>>
>>> --
>>> Ian Neilsen
>>>
>>> Mobile: 0424 379 762
>>> Linkedin: http://au.linkedin.com/in/ianneilsen
>>> Twitter : ineilsen
>>
>>
<br>
<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
Didi<br>
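
Since "hosted-engine --reinitialize-lockspace" refuses to run with an active agent, here is the sketch I referred to above for the lockspace reset. The only assumption is that stopping the HA services on every host is acceptable while the cluster stays in global maintenance:

On every hosted-engine host:
# systemctl stop ovirt-ha-agent ovirt-ha-broker

Then, on one host only:
# hosted-engine --reinitialize-lockspace

Then bring the HA services back up on all hosts:
# systemctl start ovirt-ha-broker ovirt-ha-agent

If the broker can stay up, or the order should differ, please say so before I run this.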
</font></span></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Ian Neilsen<br><br>Mobile: 0424 379 762<br>Linkedin: <a href="http://au.linkedin.com/in/ianneilsen" target="_blank">http://au.linkedin.com/in/ianneilsen</a><div>Twitter : ineilsen</div></div></div></div>
</div>