Hosted Engine suddenly reboots

Hi guys, I have oVirt 4.0.5 with 3 hosts and 1 storage setup, using iSCSI for data and NFS for hosted engine storage. The storage network is on a private VLAN. Sometimes I see "ETL service stopped" / "ETL service started" in the events log, side by side with a hosted engine stop/start. Also, sometimes I get kicked out of the admin portal for no reason. I had another issue, which was related to https://bugzilla.redhat.com/show_bug.cgi?id=1349829, but it looks like that one is harmless, so maybe I'm not seeing the real problem. Can you please guide me on finding the issue here? Best regards, JP

On Tue, Dec 13, 2016 at 4:34 PM, Juan Pablo <pablo.localhost@gmail.com> wrote:
can you please guide me on finding the issue here?
You should start by checking: /var/log/ovirt-hosted-engine-ha/agent.log. Best,
_______________________________________________ Users mailing list Users@ovirt.org http://lists.phx.ovirt.org/mailman/listinfo/users
-- Didi

Thanks for pointing me in the right direction. I have this line a couple of minutes before the VM restart:
":states::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 1600 due to gateway status"
so it looks like this is causing:
"states::413::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host virt01-int.xxxx.xxxxxx (id 1) score is significantly better than local score, shutting down VM on this host"
Is this a network-related issue? The hosted engine and the hosts are on the same VLAN. Should a gateway check be triggering a hosted engine shutdown?
Thanks! JP
2016-12-13 11:37 GMT-03:00 Yedidyah Bar David <didi@redhat.com>:
You should start by checking: /var/log/ovirt-hosted-engine-ha/agent.log.
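
For anyone following along, here is a minimal sketch of how one might pull the two kinds of messages quoted above out of agent.log. The regular expressions are built only from the two log lines in this thread; the function names (`find_score_events`, `scan_file`) are made up for illustration, and the agent may log other score-related messages that this does not catch.

```python
import re
from typing import Iterable, Iterator, Tuple

# Patterns derived only from the two messages quoted in this thread.
PENALTY_RE = re.compile(r"Penalizing score by (\d+) due to (.+)$")
SHUTDOWN_RE = re.compile(
    r"Host (\S+) \(id (\d+)\) score is significantly better than local score"
)


def find_score_events(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield ("penalty", amount, reason) or ("shutdown", host, host_id)."""
    for line in lines:
        line = line.rstrip("\n")
        match = PENALTY_RE.search(line)
        if match:
            yield ("penalty", match.group(1), match.group(2))
            continue
        match = SHUTDOWN_RE.search(line)
        if match:
            yield ("shutdown", match.group(1), match.group(2))


def scan_file(path: str) -> None:
    """Print every matching event found in the given log file."""
    with open(path, errors="replace") as log:
        for kind, first, second in find_score_events(log):
            print(kind, first, second)
```

Calling `scan_file("/var/log/ovirt-hosted-engine-ha/agent.log")` on each host would then list the penalties and the shutdown decisions, which makes it easier to see whether every restart is preceded by a gateway penalty.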

On Tue, Dec 13, 2016 at 4:58 PM, Juan Pablo <pablo.localhost@gmail.com> wrote:
Is this a network-related issue? The hosted engine and the hosts are on the same VLAN. Should a gateway check be triggering a hosted engine shutdown?
Seems so. A ping to the gateway is an important test, because if it fails it might mean a split-brain. When you are asked for a 'gateway address', it is actually used only for that test. It does not need to be your gateway, but it does need to be something very reliable that always replies.
Best,
-- Didi
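
To judge whether a candidate 'gateway address' really is "a very reliable thing that should always reply", one could measure its reply rate from each host over a period of time. The following is only a sketch under stated assumptions: it shells out to the Linux `ping` command with `-c` and `-W`, the function names (`ping_once`, `reply_rate`) are invented for this example, and it is not a description of how the HA agent performs its own check.

```python
import subprocess
from typing import Callable


def ping_once(address: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request; True if the ping command succeeded."""
    # -c 1: send a single packet; -W: reply timeout in seconds (Linux ping).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reply_rate(
    address: str,
    attempts: int = 20,
    probe: Callable[[str], bool] = ping_once,
) -> float:
    """Return the fraction of probes (0.0 to 1.0) that got a reply."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    replies = sum(1 for _ in range(attempts) if probe(address))
    return replies / attempts
```

A rate below 1.0 from any host, especially around the times of the restarts, would suggest picking a different address to ping. The `probe` parameter exists only so the counting logic can be exercised without touching the network.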

Thanks a lot for your help!
2016-12-13 12:07 GMT-03:00 Yedidyah Bar David <didi@redhat.com>:
Seems so.
ping to the gateway is an important test, because if it fails it might mean a split-brain. When you are asked about a 'gateway address', it's actually used only for that. It does not need to be your gateway, but it does need to be a very reliable thing that should always reply.
participants (2)
- Juan Pablo
- Yedidyah Bar David