<div dir="ltr">Ok i mount manualy the domain for hosted engine and agent go up.<div><br></div><div>But vm-status :</div><div><br></div><div><div>--== Host 2 status ==--</div><div><br></div><div>Status up-to-date : False</div>
<div>Hostname : 192.168.99.103</div><div>Host ID : 2</div><div>Engine status : unknown stale-data</div><div>Score : 0</div>
<div>Local maintenance : False</div><div>Host timestamp : 1398333438</div></div><div><br></div><div>And in my engine, host02 Ha is no active.</div></div><div class="gmail_extra"><br><br>
<div class="gmail_quote">2014-04-24 12:48 GMT+02:00 Kevin Tibi <span dir="ltr"><<a href="mailto:kevintibi@hotmail.com" target="_blank">kevintibi@hotmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hi,<div><br></div><div>I try to reboot my hosts and now [supervdsmServer] is <defunct>.</div><div><br></div><div>/var/log/vdsm/supervdsm.log</div><div><br></div><div><div><br></div><div>MainProcess|Thread-120::DEBUG::2014-04-24 12:22:19,955::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None</div>
<div>MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,010::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_export', 5) {}</div>
<div>MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,014::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None</div><div>MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,059::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_iso', 5) {}</div>
<div>MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,063::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None</div></div><div><br></div><div>and one host don't mount the NFS used for hosted engine.</div>
<div><br></div><div><div>MainThread::CRITICAL::2014-04-24 12:36:16,603::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent</div><div class=""><div>Traceback (most recent call last):</div>
</div><div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run</div>
<div> self._run_agent()</div><div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent</div><div> hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()</div>
<div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 299, in start_monitoring</div><div> self._initialize_vdsm()</div><div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 418, in _initialize_vdsm</div>
<div> self._sd_path = env_path.get_domain_path(self._config)</div><div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/path.py", line 40, in get_domain_path</div><div> .format(sd_uuid, parent))</div>
<div>Exception: path to storage domain aea040f8-ab9d-435b-9ecf-ddd4272e592f not found in /rhev/data-center/mnt</div></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-23 17:40 GMT+02:00 Kevin Tibi <span dir="ltr"><<a href="mailto:kevintibi@hotmail.com" target="_blank">kevintibi@hotmail.com</a>></span>:<div>
<div class="h5"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>top</div><div><div>1729 vdsm 20 0 0 0 0 Z <font color="#ff0000">373.8</font> 0.0 252:08.51 ovirt-ha-broker <defunct></div>
</div><div><br></div><div><br></div><div>[root@host01 ~]# ps axwu | grep 1729</div>
<div>vdsm 1729 0.7 0.0 0 0 ? Zl Apr02 240:24 [ovirt-ha-broker] <defunct></div><div><br></div><div><div>[root@host01 ~]# ll /rhev/data-center/mnt/host01.ovirt.lan\:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/ha_agent/</div>
<div>total 2028</div><div>-rw-rw----. 1 vdsm kvm 1048576 23 avril 17:35 hosted-engine.lockspace</div><div>-rw-rw----. 1 vdsm kvm 1028096 23 avril 17:35 hosted-engine.metadata</div></div><div><br></div><div>cat /var/log/vdsm/vdsm.log</div>
<div><br></div><div><div>Thread-120518::DEBUG::2014-04-23 17:38:02,299::task::1185::TaskManager.Task::(prepare) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::finished: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000410963', 'lastCheck': '3.4', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000412357', 'lastCheck': '6.8', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000455292', 'lastCheck': '1.2', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00817113', 'lastCheck': '1.7', 'valid': True}}</div>
<div>Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::595::TaskManager.Task::(_updateState) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::moving from state preparing -> state finished</div><div>Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}</div>
<div>Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}</div><div>Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::990::TaskManager.Task::(_decref) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::ref 0 aborting False</div>
<div>Thread-120518::ERROR::2014-04-23 17:38:02,302::brokerlink::72::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Failed to connect to broker: [Errno 2] No such file or directory</div><div>Thread-120518::ERROR::2014-04-23 17:38:02,302::API::1612::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info</div>
<div>
<div>Traceback (most recent call last):</div></div><div> File "/usr/share/vdsm/API.py", line 1603, in _getHaInfo</div><div> stats = instance.get_all_stats()</div><div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 83, in get_all_stats</div>
<div> with broker.connection():</div><div> File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__</div><div> return self.gen.next()</div><div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 96, in connection</div>
<div> self.connect()</div><div> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 64, in connect</div><div> self._socket.connect(constants.BROKER_SOCKET_FILE)</div><div>
File "<string>", line 1, in connect</div><div>error: [Errno 2] No such file or directory</div><div>Thread-78::DEBUG::2014-04-23 17:38:05,490::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata bs=4096 count=1' (cwd None)</div>
<div>Thread-78::DEBUG::2014-04-23 17:38:05,523::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, 0.000412209 s, 1.3 MB/s\n'; <rc> = 0</div>
</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-23 17:27 GMT+02:00 Martin Sivak <span dir="ltr"><<a href="mailto:msivak@redhat.com" target="_blank">msivak@redhat.com</a>></span>:<div>
<div><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Kevin,<br>
<br>
> same pb.<br>
<br>
Are you missing the lockspace file as well while running on top of GlusterFS?<br>
<div><br>
> ovirt-ha-broker have 400% cpu and is defunct. I can't kill with -9.<br>
<br>
</div>Defunct process eating full four cores? I wonder how is that possible.. What are the status flags of that process when you do ps axwu?<br>
<br>
Can you attach the log files please?<br>
<div><div><br>
--<br>
Martin Sivák<br>
<a href="mailto:msivak@redhat.com" target="_blank">msivak@redhat.com</a><br>
Red Hat Czech<br>
RHEV-M SLA / Brno, CZ<br>
<br>
----- Original Message -----<br>
> same pb. ovirt-ha-broker have 400% cpu and is defunct. I can't kill with -9.<br>
><br>
><br>
> 2014-04-23 13:55 GMT+02:00 Martin Sivak <<a href="mailto:msivak@redhat.com" target="_blank">msivak@redhat.com</a>>:<br>
><br>
> > Hi,<br>
> ><br>
> > > Isn't this file created when hosted engine is started?<br>
> ><br>
> > The file is created by the setup script. If it got lost then there was<br>
> > probably something bad happening in your NFS or Gluster storage.<br>
> ><br>
> > > Or how can I create this file manually?<br>
> ><br>
> > I can give you experimental treatment for this. We do not have any<br>
> > official way as this is something that should not ever happen :)<br>
> ><br>
> > !! But before you do that make sure you do not have any nodes running<br>
> > properly. This will destroy and reinitialize the lockspace database for the<br>
> > whole hosted-engine environment (which you apparently lack, but..). !!<br>
> ><br>
> > You have to create the ha_agent/hosted-engine.lockspace file with the<br>
> > expected size (1MB) and then tell sanlock to initialize it as a lockspace<br>
> > using:<br>
> ><br>
> > # python<br>
> > >>> import sanlock<br>
> > >>> sanlock.write_lockspace(lockspace="hosted-engine",<br>
> > ... path="/rhev/data-center/mnt/<nfs>/<hosted engine storage<br>
> > domain>/ha_agent/hosted-engine.lockspace",<br>
> > ... offset=0)<br>
> > >>><br>
> ><br>
> > Then try starting the services (both broker and agent) again.<br>
> ><br>
> > --<br>
> > Martin Sivák<br>
> > <a href="mailto:msivak@redhat.com" target="_blank">msivak@redhat.com</a><br>
> > Red Hat Czech<br>
> > RHEV-M SLA / Brno, CZ<br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > > On 04/23/2014 11:08 AM, Martin Sivak wrote:<br>
> > > > Hi René,<br>
> > > ><br>
> > > >>>> libvirtError: Failed to acquire lock: No space left on device<br>
> > > ><br>
> > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid<br>
> > > >>>> lockspace found -1 failed 0 name<br>
> > 2851af27-8744-445d-9fb1-a0d083c8dc82<br>
> > > ><br>
> > > > Can you please check the contents of /rhev/data-center/<your nfs<br>
> > > > mount>/<nfs domain uuid>/ha_agent/?<br>
> > > ><br>
> > > > This is how it should look like:<br>
> > > ><br>
> > > > [root@dev-03 ~]# ls -al<br>
> > > ><br>
> > /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/<br>
> > > > total 2036<br>
> > > > drwxr-x---. 2 vdsm kvm 4096 Mar 19 18:46 .<br>
> > > > drwxr-xr-x. 6 vdsm kvm 4096 Mar 19 18:46 ..<br>
> > > > -rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace<br>
> > > > -rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata<br>
> > > ><br>
> > > > The errors seem to indicate that you somehow lost the lockspace file.<br>
> > ><br>
> > > True :)<br>
> > > Isn't this file created when hosted engine is started? Or how can I<br>
> > > create this file manually?<br>
> > ><br>
> > > ><br>
> > > > --<br>
> > > > Martin Sivák<br>
> > > > <a href="mailto:msivak@redhat.com" target="_blank">msivak@redhat.com</a><br>
> > > > Red Hat Czech<br>
> > > > RHEV-M SLA / Brno, CZ<br>
> > > ><br>
> > > > ----- Original Message -----<br>
> > > >> On 04/23/2014 12:28 AM, Doron Fediuck wrote:<br>
> > > >>> Hi Rene,<br>
> > > >>> any idea what closed your ovirtmgmt bridge?<br>
> > > >>> as long as it is down vdsm may have issues starting up properly<br>
> > > >>> and this is why you see the complaints on the rpc server.<br>
> > > >>><br>
> > > >>> Can you try manually fixing the network part first and then<br>
> > > >>> restart vdsm?<br>
> > > >>> Once vdsm is happy hosted engine VM will start.<br>
> > > >><br>
> > > >> Thanks for your feedback, Doron.<br>
> > > >><br>
> > > >> My ovirtmgmt bridge seems to be on or isn't it:<br>
> > > >> # brctl show ovirtmgmt<br>
> > > >> bridge name bridge id STP enabled interfaces<br>
> > > >> ovirtmgmt 8000.0025907587c2 no eth0.200<br>
> > > >><br>
> > > >> # ip a s ovirtmgmt<br>
> > > >> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue<br>
> > > >> state UNKNOWN<br>
> > > >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff<br>
> > > >> inet <a href="http://10.0.200.102/24" target="_blank">10.0.200.102/24</a> brd 10.0.200.255 scope global ovirtmgmt<br>
> > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope link<br>
> > > >> valid_lft forever preferred_lft forever<br>
> > > >><br>
> > > >> # ip a s eth0.200<br>
> > > >> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc<br>
> > > >> noqueue state UP<br>
> > > >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff<br>
> > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope link<br>
> > > >> valid_lft forever preferred_lft forever<br>
> > > >><br>
> > > >> I tried the following yesterday:<br>
> > > >> Copy virtual disk from GlusterFS storage to local disk of host and<br>
> > > >> create a new vm with virt-manager which loads ovirtmgmt disk. I could<br>
> > > >> reach my engine over the ovirtmgmt bridge (so bridge must be working).<br>
> > > >><br>
> > > >> I also started libvirtd with Option -v and I saw the following in<br>
> > > >> libvirtd.log when trying to start ovirt engine:<br>
> > > >> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 :<br>
> > > >> Command result 0, with PID 11491<br>
> > > >> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 :<br>
> > Result<br>
> > > >> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0'<br>
> > is<br>
> > > >> not a chain<br>
> > > >><br>
> > > >> So it could be that something is broken in my hosted-engine network.<br>
> > Do<br>
> > > >> you have any clue how I can troubleshoot this?<br>
> > > >><br>
> > > >><br>
> > > >> Thanks,<br>
> > > >> René<br>
> > > >><br>
> > > >><br>
> > > >>><br>
> > > >>> ----- Original Message -----<br>
> > > >>>> From: "René Koch" <<a href="mailto:rkoch@linuxland.at" target="_blank">rkoch@linuxland.at</a>><br>
> > > >>>> To: "Martin Sivak" <<a href="mailto:msivak@redhat.com" target="_blank">msivak@redhat.com</a>><br>
> > > >>>> Cc: <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a><br>
> > > >>>> Sent: Tuesday, April 22, 2014 1:46:38 PM<br>
> > > >>>> Subject: Re: [ovirt-users] hosted engine health check issues<br>
> > > >>>><br>
> > > >>>> Hi,<br>
> > > >>>><br>
> > > >>>> I rebooted one of my ovirt hosts today and the result is now that I<br>
> > > >>>> can't start hosted-engine anymore.<br>
> > > >>>><br>
> > > >>>> ovirt-ha-agent isn't running because the lockspace file is missing<br>
> > > >>>> (sanlock complains about it).<br>
> > > >>>> So I tried to start hosted-engine with --vm-start and I get the<br>
> > > >>>> following errors:<br>
> > > >>>><br>
> > > >>>> ==> /var/log/sanlock.log <==<br>
> > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid<br>
> > > >>>> lockspace found -1 failed 0 name<br>
> > 2851af27-8744-445d-9fb1-a0d083c8dc82<br>
> > > >>>><br>
> > > >>>> ==> /var/log/messages <==<br>
> > > >>>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22<br>
> > 12:38:17+0200 654<br>
> > > >>>> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0<br>
> > name<br>
> > > >>>> 2851af27-8744-445d-9fb1-a0d083c8dc82<br>
> > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0)<br>
> > entering<br>
> > > >>>> disabled state<br>
> > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous<br>
> > mode<br>
> > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0)<br>
> > entering<br>
> > > >>>> disabled state<br>
> > > >>>><br>
> > > >>>> ==> /var/log/vdsm/vdsm.log <==<br>
> > > >>>> Thread-21::DEBUG::2014-04-22<br>
> > > >>>> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown<br>
> > > >>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire<br>
> > > >>>> lock: No space left on device<br>
> > > >>>> Thread-21::DEBUG::2014-04-22<br>
> > > >>>> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)<br>
> > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations<br>
> > released<br>
> > > >>>> Thread-21::ERROR::2014-04-22<br>
> > > >>>> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)<br>
> > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process<br>
> > failed<br>
> > > >>>> Traceback (most recent call last):<br>
> > > >>>> File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm<br>
> > > >>>> self._run()<br>
> > > >>>> File "/usr/share/vdsm/vm.py", line 3170, in _run<br>
> > > >>>> self._connection.createXML(domxml, flags),<br>
> > > >>>> File<br>
> > > >>>> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",<br>
> > > >>>> line 92, in wrapper<br>
> > > >>>> ret = f(*args, **kwargs)<br>
> > > >>>> File "/usr/lib64/python2.6/site-packages/libvirt.py", line<br>
> > 2665, in<br>
> > > >>>> createXML<br>
> > > >>>> if ret is None:raise libvirtError('virDomainCreateXML()<br>
> > failed',<br>
> > > >>>> conn=self)<br>
> > > >>>> libvirtError: Failed to acquire lock: No space left on device<br>
> > > >>>><br>
> > > >>>> ==> /var/log/messages <==<br>
> > > >>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR<br>
> > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process<br>
> > > >>>> failed#012Traceback (most recent call last):#012 File<br>
> > > >>>> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012<br>
> > > >>>> self._run()#012 File "/usr/share/vdsm/vm.py", line 3170, in<br>
> > _run#012<br>
> > > >>>> self._connection.createXML(domxml, flags),#012 File<br>
> > > >>>> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",<br>
> > line 92,<br>
> > > >>>> in wrapper#012 ret = f(*args, **kwargs)#012 File<br>
> > > >>>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in<br>
> > > >>>> createXML#012 if ret is None:raise<br>
> > libvirtError('virDomainCreateXML()<br>
> > > >>>> failed', conn=self)#012libvirtError: Failed to acquire lock: No<br>
> > space<br>
> > > >>>> left on device<br>
> > > >>>><br>
> > > >>>> ==> /var/log/vdsm/vdsm.log <==<br>
> > > >>>> Thread-21::DEBUG::2014-04-22<br>
> > > >>>> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)<br>
> > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:<br>
> > > >>>> Failed to acquire lock: No space left on device<br>
> > > >>>><br>
> > > >>>><br>
> > > >>>> No space left on device is nonsense as there is enough space (I had<br>
> > this<br>
> > > >>>> issue last time as well where I had to patch machine.py, but this<br>
> > file<br>
> > > >>>> is now Python 2.6.6 compatible.<br>
> > > >>>><br>
> > > >>>> Any idea what prevents hosted-engine from starting?<br>
> > > >>>> ovirt-ha-broker, vdsmd and sanlock are running btw.<br>
> > > >>>><br>
> > > >>>> Btw, I can see in log that json rpc server module is missing - which<br>
> > > >>>> package is required for CentOS 6.5?<br>
> > > >>>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the<br>
> > json<br>
> > > >>>> rpc server module. Please make sure it is installed.<br>
> > > >>>><br>
> > > >>>><br>
> > > >>>> Thanks,<br>
> > > >>>> René<br>
> > > >>>><br>
> > > >>>><br>
> > > >>>><br>
> > > >>>> On 04/17/2014 10:02 AM, Martin Sivak wrote:<br>
> > > >>>>> Hi,<br>
> > > >>>>><br>
> > > >>>>>>>> How can I disable notifications?<br>
> > > >>>>><br>
> > > >>>>> The notification is configured in<br>
> > > >>>>> /etc/ovirt-hosted-engine-ha/broker.conf<br>
> > > >>>>> section notification.<br>
> > > >>>>> The email is sent when the key state_transition exists and the<br>
> > string<br>
> > > >>>>> OldState-NewState contains the (case insensitive) regexp from the<br>
> > > >>>>> value.<br>
> > > >>>>><br>
> > > >>>>>>>> Is it intended to send out these messages and detect that ovirt<br>
> > > >>>>>>>> engine<br>
> > > >>>>>>>> is down (which is false anyway), but not to restart the vm?<br>
> > > >>>>><br>
> > > >>>>> Forget about emails for now and check the<br>
> > > >>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and<br>
> > attach<br>
> > > >>>>> them<br>
> > > >>>>> as well btw).<br>
> > > >>>>><br>
> > > >>>>>>>> oVirt hosts think that hosted engine is down because it seems<br>
> > that<br>
> > > >>>>>>>> hosts<br>
> > > >>>>>>>> can't write to hosted-engine.lockspace due to glusterfs issues<br>
> > (or<br>
> > > >>>>>>>> at<br>
> > > >>>>>>>> least I think so).<br>
> > > >>>>><br>
> > > >>>>> The hosts think so or can't really write there? The lockspace is<br>
> > > >>>>> managed<br>
> > > >>>>> by<br>
> > > >>>>> sanlock and our HA daemons do not touch it at all. We only ask<br>
> > sanlock<br>
> > > >>>>> to<br>
> > > >>>>> get make sure we have unique server id.<br>
> > > >>>>><br>
> > > >>>>>>>> Is is possible or planned to make the whole ha feature optional?<br>
> > > >>>>><br>
> > > >>>>> Well the system won't perform any automatic actions if you put the<br>
> > > >>>>> hosted<br>
> > > >>>>> engine to global maintenance and only start/stop/migrate the VM<br>
> > > >>>>> manually.<br>
> > > >>>>> I would discourage you from stopping agent/broker, because the<br>
> > engine<br>
> > > >>>>> itself has some logic based on the reporting.<br>
> > > >>>>><br>
> > > >>>>> Regards<br>
> > > >>>>><br>
> > > >>>>> --<br>
> > > >>>>> Martin Sivák<br>
> > > >>>>> <a href="mailto:msivak@redhat.com" target="_blank">msivak@redhat.com</a><br>
> > > >>>>> Red Hat Czech<br>
> > > >>>>> RHEV-M SLA / Brno, CZ<br>
> > > >>>>><br>
> > > >>>>> ----- Original Message -----<br>
> > > >>>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:<br>
> > > >>>>>>> On 04/14/2014 10:50 AM, René Koch wrote:<br>
> > > >>>>>>>> Hi,<br>
> > > >>>>>>>><br>
> > > >>>>>>>> I have some issues with hosted engine status.<br>
> > > >>>>>>>><br>
> > > >>>>>>>> oVirt hosts think that hosted engine is down because it seems<br>
> > that<br>
> > > >>>>>>>> hosts<br>
> > > >>>>>>>> can't write to hosted-engine.lockspace due to glusterfs issues<br>
> > (or<br>
> > > >>>>>>>> at<br>
> > > >>>>>>>> least I think so).<br>
> > > >>>>>>>><br>
> > > >>>>>>>> Here's the output of vm-status:<br>
> > > >>>>>>>><br>
> > > >>>>>>>> # hosted-engine --vm-status<br>
> > > >>>>>>>><br>
> > > >>>>>>>><br>
> > > >>>>>>>> --== Host 1 status ==--<br>
> > > >>>>>>>><br>
> > > >>>>>>>> Status up-to-date : False<br>
> > > >>>>>>>> Hostname : 10.0.200.102<br>
> > > >>>>>>>> Host ID : 1<br>
> > > >>>>>>>> Engine status : unknown stale-data<br>
> > > >>>>>>>> Score : 2400<br>
> > > >>>>>>>> Local maintenance : False<br>
> > > >>>>>>>> Host timestamp : 1397035677<br>
> > > >>>>>>>> Extra metadata (valid at timestamp):<br>
> > > >>>>>>>> metadata_parse_version=1<br>
> > > >>>>>>>> metadata_feature_version=1<br>
> > > >>>>>>>> timestamp=1397035677 (Wed Apr 9 11:27:57 2014)<br>
> > > >>>>>>>> host-id=1<br>
> > > >>>>>>>> score=2400<br>
> > > >>>>>>>> maintenance=False<br>
> > > >>>>>>>> state=EngineUp<br>
> > > >>>>>>>><br>
> > > >>>>>>>><br>
> > > >>>>>>>> --== Host 2 status ==--<br>
> > > >>>>>>>><br>
> > > >>>>>>>> Status up-to-date : True<br>
> > > >>>>>>>> Hostname : 10.0.200.101<br>
> > > >>>>>>>> Host ID : 2<br>
> > > >>>>>>>> Engine status : {'reason': 'vm not running<br>
> > on<br>
> > > >>>>>>>> this<br>
> > > >>>>>>>> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}<br>
> > > >>>>>>>> Score : 0<br>
> > > >>>>>>>> Local maintenance : False<br>
> > > >>>>>>>> Host timestamp : 1397464031<br>
> > > >>>>>>>> Extra metadata (valid at timestamp):<br>
> > > >>>>>>>> metadata_parse_version=1<br>
> > > >>>>>>>> metadata_feature_version=1<br>
> > > >>>>>>>> timestamp=1397464031 (Mon Apr 14 10:27:11 2014)<br>
> > > >>>>>>>> host-id=2<br>
> > > >>>>>>>> score=0<br>
> > > >>>>>>>> maintenance=False<br>
> > > >>>>>>>> state=EngineUnexpectedlyDown<br>
> > > >>>>>>>> timeout=Mon Apr 14 10:35:05 2014<br>
> > > >>>>>>>><br>
> > > >>>>>>>> oVirt engine is sending me 2 emails every 10 minutes with the<br>
> > > >>>>>>>> following<br>
> > > >>>>>>>> subjects:<br>
> > > >>>>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart<br>
> > > >>>>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp<br>
> > > >>>>>>>><br>
> > > >>>>>>>> In oVirt webadmin I can see the following message:<br>
> > > >>>>>>>> VM HostedEngine is down. Exit message: internal error Failed to<br>
> > > >>>>>>>> acquire<br>
> > > >>>>>>>> lock: error -243.<br>
> > > >>>>>>>><br>
> > > >>>>>>>> These messages are really annoying as oVirt isn't doing anything<br>
> > > >>>>>>>> with<br>
> > > >>>>>>>> hosted engine - I have an uptime of 9 days in my engine vm.<br>
> > > >>>>>>>><br>
> > > >>>>>>>> So my questions are now:<br>
> > > >>>>>>>> Is it intended to send out these messages and detect that ovirt<br>
> > > >>>>>>>> engine<br>
> > > >>>>>>>> is down (which is false anyway), but not to restart the vm?<br>
> > > >>>>>>>><br>
> > > >>>>>>>> How can I disable notifications? I'm planning to write a Nagios<br>
> > > >>>>>>>> plugin<br>
> > > >>>>>>>> which parses the output of hosted-engine --vm-status and only<br>
> > Nagios<br>
> > > >>>>>>>> should notify me, not hosted-engine script.<br>
> > > >>>>>>>><br>
> > > >>>>>>>> Is is possible or planned to make the whole ha feature<br>
> > optional? I<br>
> > > >>>>>>>> really really really hate cluster software as it causes more<br>
> > > >>>>>>>> troubles<br>
> > > >>>>>>>> then standalone machines and in my case the hosted-engine ha<br>
> > feature<br>
> > > >>>>>>>> really causes troubles (and I didn't had a hardware or network<br>
> > > >>>>>>>> outage<br>
> > > >>>>>>>> yet only issues with hosted-engine ha agent). I don't need any<br>
> > ha<br>
> > > >>>>>>>> feature for hosted engine. I just want to run engine<br>
> > virtualized on<br>
> > > >>>>>>>> oVirt and if engine vm fails (e.g. because of issues with a<br>
> > host)<br>
> > > >>>>>>>> I'll<br>
> > > >>>>>>>> restart it on another node.<br>
> > > >>>>>>><br>
> > > >>>>>>> Hi, you can:<br>
> > > >>>>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and<br>
> > tweak<br>
> > > >>>>>>> the logger as you like<br>
> > > >>>>>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services<br>
> > > >>>>>><br>
> > > >>>>>> Thanks for the information.<br>
> > > >>>>>> So engine is able to run when ovirt-ha-broker and ovirt-ha-agent<br>
> > isn't<br>
> > > >>>>>> running?<br>
> > > >>>>>><br>
> > > >>>>>><br>
> > > >>>>>> Regards,<br>
> > > >>>>>> René<br>
> > > >>>>>><br>
> > > >>>>>>><br>
> > > >>>>>>> --Jirka<br>
> > > >>>>>>>><br>
> > > >>>>>>>> Thanks,<br>
> > > >>>>>>>> René<br>
> > > >>>>>>>><br>
> > > >>>>>>>><br>
> > > >>>>>>><br>
> > > >>>>>> _______________________________________________<br>
> > > >>>>>> Users mailing list<br>
> > > >>>>>> <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
> > > >>>>>> <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
> > > >>>>>><br>
> > > >>>> _______________________________________________<br>
> > > >>>> Users mailing list<br>
> > > >>>> <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
> > > >>>> <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
> > > >>>><br>
> > > >><br>
> > ><br>
> > _______________________________________________<br>
> > Users mailing list<br>
> > <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
> > <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
> ><br>
><br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
</div></div></blockquote></div></div></div><br></div>
</blockquote></div></div></div><br></div>
</blockquote></div><br></div>