[ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes

Jiri Moskovcak jmoskovc at redhat.com
Wed Apr 9 12:42:32 UTC 2014


On 04/09/2014 02:32 PM, Daniel Helgenberger wrote:
> On Mi, 2014-04-09 at 09:18 +0200, Jiri Moskovcak wrote:
>> On 04/08/2014 06:09 PM, Daniel Helgenberger wrote:
>>> Hello,
>>>
>>> I have an oVirt 3.4 hosted engine lab setup which I am evaluating for
>>> production use.
>>>
>>> I "simulated" an ungraceful shutdown of all HA nodes (powercut) while
>>> the engine was running. After powering up, the system did not recover
>>> itself (it seemed).
>>> I had to restart the ovirt-hosted-ha service (which was in a locked
>>> state) and then manually run 'hosted-engine --vm-start'.
>>>
>>> What is the supposed procedure after a shutdown (graceful / ungraceful)
>>> of Hosted-Engine HA nodes? Should the engine recover by itself? Should
>>> the running VMs be restarted automatically?
>>
>> When this happens the agent should start the engine VM and the engine
>> should take care of restarting the VMs which were running on that
>> restarted host and are marked as HA. Can you please provide contents of
>> /var/log/ovirt* from the host after the powercut when the engine VM
>> doesn't come up?
>>
> Hello Jirka,
>
> I accidentally sent the message without pointing out the
> interesting part; this is:
>
> <<< start logging ha-agent after reboot:
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,862::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,936::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.50.201
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::hosted_engine::363::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.50.1'}
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911299600
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300304
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300112
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300240
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700723857104
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::hosted_engine::386::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,312::hosted_engine::430::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_cond_start_service) Starting vdsmd
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::CRITICAL::2014-04-08 15:53:34,442::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
> (10 min nothing)
> <<< here I did a 'service ovirt-hosted-ha start'
> /var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:59:16,698::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started
> ....
>
> after this things went quite smoothly.
>

Hi Daniel,
I noticed that in the log and I was just about to ask if that's when you 
manually fixed it. Is there something else around that time in 
/var/log/messages which might be related to it?
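
If it helps with narrowing things down, here is a minimal sketch (not part of any oVirt tooling) for pulling the syslog lines around the time of the CRITICAL entry. It assumes the traditional syslog timestamp prefix such as "Apr  8 15:53:34" and an English month abbreviation; the function name syslog_window, the two-minute default window, and the year argument are illustrative choices only.

```python
from datetime import datetime, timedelta


def syslog_window(lines, center, minutes=2, year=2014):
    """Yield syslog lines stamped within +/- `minutes` of `center`.

    Assumes each line starts with a traditional syslog timestamp
    ("Mon DD HH:MM:SS"), which carries no year, so one is supplied.
    """
    delta = timedelta(minutes=minutes)
    for line in lines:
        # Split off month, day and time; the rest of the line is ignored here.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue
        stamp = "{} {} {} {}".format(year, parts[0], parts[1], parts[2])
        try:
            ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        except ValueError:
            # Not a timestamped line (e.g. a continuation line); skip it.
            continue
        if abs(ts - center) <= delta:
            yield line
```

It could be fed an open handle on /var/log/messages together with datetime(2014, 4, 8, 15, 53, 34), the time of the "Could not start ha-agent" entry in the agent log above.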

Thanks,
Jirka

>> Thanks,
>> Jirka
>
>
>>
>>>
>>> Thanks,
>>> Daniel