<div dir="ltr"><div><div><div><div>Hi all, <br><br></div>I had put a specif email alert during the deploy and then I wanted to change it. <br></div>I did the following: <br><br></div>At one of the hosts ra: <br><br>
<span style="font-family: courier\ new, courier, monospace;">hosted-engine --set-shared-config destination-emails <a href="mailto:alerts@domain.com">alerts@domain.com</a> --type=broker<br><br></span></div><div><span style="font-family: courier\ new, courier, monospace;">systemctl restart ovirt-ha-broker.service<br><span style="font-family:arial,helvetica,sans-serif"><br></span></span></div><div><span style="font-family: courier\ new, courier, monospace;"><span style="font-family:arial,helvetica,sans-serif">I had to do the above since changing the email from GUI did not have any effect. <br><br>After the above the emails are received at the new email address but the cluster seems to have some issue recognizing the state of engine. i am flooded with emails that "
<font size="2"><span style="font-size:11pt">EngineMaybeAway-EngineUnexpectedlyDown</span></font>
"</span><br><br></span></div><div>I have restarted at each host also the ovirt-ha-agent.service. <br></div><div>Did put the cluster to global maintenance and then disabled global maintenance. <br><br></div><div>host agent logs I have: <br><br><span style="font-family:monospace,monospace">MainThread::ERROR::2018-02-18 11:12:20,751::hosted_engine::720::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) cannot get lock on host id 1: host already holds lock on a different host id</span><br><br></div><div>One other host logs: <br></div><div><span style="font-family:monospace,monospace">MainThread::INFO::2018-02-18 11:20:23,692::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Sun Feb 18 11:15:13 2018<br>MainThread::INFO::2018-02-18 11:20:23,692::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)</span><br><br></div><div>The engine status on 3 hosts is: <br><span style="font-family:monospace,monospace">hosted-engine --vm-status<br><br><br>--== Host 1 status ==--<br><br>conf_on_shared_storage : True<br>Status up-to-date : True<br>Hostname : v0<br>Host ID : 1<br>Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}<br>Score : 0<br>stopped : False<br>Local maintenance : False<br>crc32 : cfd15dac<br>local_conf_timestamp : 4721144<br>Host timestamp : 4721144<br>Extra metadata (valid at timestamp):<br> metadata_parse_version=1<br> metadata_feature_version=1<br> timestamp=4721144 (Sun Feb 18 11:20:33 2018)<br> host-id=1<br> score=0<br> vm_conf_refresh_time=4721144 (Sun Feb 18 11:20:33 2018)<br> conf_on_shared_storage=True<br> maintenance=False<br> state=EngineUnexpectedlyDown<br> stopped=False<br> timeout=Tue Feb 24 15:29:44 1970<br><br><br>--== Host 2 status ==--<br><br>conf_on_shared_storage : True<br>Status up-to-date : True<br>Hostname : v1<br>Host ID : 2<br>Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}<br>Score : 0<br>stopped : False<br>Local maintenance : False<br>crc32 : 5cbcef4c<br>local_conf_timestamp : 2499416<br>Host timestamp : 2499416<br>Extra metadata (valid at timestamp):<br> metadata_parse_version=1<br> metadata_feature_version=1<br> timestamp=2499416 (Sun Feb 18 11:20:46 2018)<br> host-id=2<br> score=0<br> vm_conf_refresh_time=2499416 (Sun Feb 18 11:20:46 2018)<br> conf_on_shared_storage=True<br> maintenance=False<br> state=EngineUnexpectedlyDown<br> stopped=False<br> timeout=Thu Jan 29 22:18:42 1970<br><br><br>--== Host 3 status ==--<br><br>conf_on_shared_storage : True<br>Status up-to-date : False<br>Hostname : v2<br>Host ID : 3<br>Engine status : unknown stale-data<br>Score : 3400<br>stopped : False<br>Local maintenance : False<br>crc32 : f064d529<br>local_conf_timestamp : 2920612<br>Host timestamp : 2920611<br>Extra metadata (valid at timestamp):<br> metadata_parse_version=1<br> metadata_feature_version=1<br> timestamp=2920611 (Sun Feb 18 10:47:31 2018)<br> host-id=3<br> score=3400<br> vm_conf_refresh_time=2920612 (Sun Feb 18 10:47:32 2018)<br> conf_on_shared_storage=True<br> maintenance=False<br> state=GlobalMaintenance<br> stopped=False<br></span><br><br></div><div>Putting each host at maintenance then activating them back does not resolve the issue. Seems I have to avoid defining email address during deploy and have it set only later at GUI. 
How can one recover from this situation?

Thanks,
Alex