[ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes
Daniel Helgenberger
daniel.helgenberger at m-box.de
Wed Apr 9 12:32:53 UTC 2014
On Wed, 2014-04-09 at 09:18 +0200, Jiri Moskovcak wrote:
> On 04/08/2014 06:09 PM, Daniel Helgenberger wrote:
> > Hello,
> >
> > I have an oVirt 3.4 hosted engine lab setup which I am evaluating for
> > production use.
> >
> > I "simulated" an ungraceful shutdown of all HA nodes (power cut) while
> > the engine was running. After powering up, the system apparently did
> > not recover by itself.
> > I had to restart the ovirt-hosted-ha service (which was in a locked
> > state) and then manually run 'hosted-engine --vm-start'.
> >
> > What is the expected procedure after a shutdown (graceful / ungraceful)
> > of Hosted-Engine HA nodes? Should the engine recover by itself? Should
> > the running VMs be restarted automatically?
>
> When this happens the agent should start the engine VM and the engine
> should take care of restarting the VMs which were running on that
> restarted host and are marked as HA. Can you please provide the contents
> of /var/log/ovirt* from the host after the power cut, when the engine VM
> doesn't come up?
>
Hello Jirka,
I accidentally sent the previous message before pointing out the
interesting part, which is:
<<< start logging ha-agent after reboot:
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,862::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,936::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.50.201
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::hosted_engine::363::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.50.1'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911299600
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300304
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300112
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300240
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700723857104
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::hosted_engine::386::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,312::hosted_engine::430::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_cond_start_service) Starting vdsmd
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::CRITICAL::2014-04-08 15:53:34,442::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
(10 min nothing)
<<< here I did a 'service ovirt-hosted-ha start'
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:59:16,698::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started
....
After this, things went quite smoothly.
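
In case it helps anyone pick the relevant entries out of a longer log, here is a minimal Python sketch that parses lines in the grep-prefixed format quoted above and returns the CRITICAL ones. The field layout is inferred only from the excerpt in this message, and the names LINE_RE, parse_line and critical_entries are my own; none of this is part of ovirt-hosted-engine-ha.

```python
import re
from datetime import datetime

# Layout inferred from the quoted excerpt (an assumption, not a documented format):
# [<file>:]<thread>::<LEVEL>::<timestamp>::<module>::<lineno>::<logger>::(<func>) <message>
LINE_RE = re.compile(
    r"^(?:(?P<file>/[^:]+):)?"          # optional "path:" prefix added by grep
    r"(?P<thread>\w+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::"
    r"(?P<logger>[\w.]+)::\((?P<func>\w+)\) (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
    record["lineno"] = int(record["lineno"])
    return record


def critical_entries(lines):
    """Return the parsed records whose level is CRITICAL; unparsable lines are skipped."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r["level"] == "CRITICAL"]


if __name__ == "__main__":
    # Two lines copied from the excerpt above.
    sample = [
        "/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,312::hosted_engine::430::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_cond_start_service) Starting vdsmd",
        "/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::CRITICAL::2014-04-08 15:53:34,442::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent",
    ]
    for rec in critical_entries(sample):
        print(rec["ts"], rec["logger"], rec["msg"])
```

To run it against a real log, pass the lines of an open file (for example the agent.log path shown above) to critical_entries(). Lines that do not fit the assumed layout, such as traceback continuations, are simply ignored.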
> Thanks,
> Jirka
>
> >
> > Thanks,
> > Daniel
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767