[ovirt-users] Can HA Agent control NFS Mount?

Wed May 28 05:07:29 EDT 2014

----- Original Message -----
> From: "Andrew Lau" <andrew at andrewklau.com>
> To: "Doron Fediuck" <dfediuck at redhat.com>
> Cc: "Bob Doolittle" <bob at doolittle.us.com>, "users" <users at ovirt.org>, "Jiri Moskovcak" <jmoskovc at redhat.com>,
> "Sandro Bonazzola" <sbonazzo at redhat.com>
> Sent: Wednesday, May 28, 2014 11:03:38 AM
> Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
> 
> Hi Doron,
> 
> Before the initial thread sways a little more..
> 
> On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck <dfediuck at redhat.com> wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Andrew Lau" <andrew at andrewklau.com>
> >> To: "Bob Doolittle" <bob at doolittle.us.com>
> >> Cc: "users" <users at ovirt.org>
> >> Sent: Monday, May 26, 2014 7:30:41 AM
> >> Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
> >>
> >> On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle <bob at doolittle.us.com>
> >> wrote:
> >> >
> >> > On 05/25/2014 02:51 PM, Joop wrote:
> >> >>
> >> >> On 25-5-2014 19:38, Bob Doolittle wrote:
> >> >>>
> >> >>>
> >> >>> Also curious is that when I say "poweroff" it actually reboots and
> >> >>> comes
> >> >>> up again. Could that be due to the timeouts on the way down?
> >> >>>
> >> >> Ah, that's something my F19 host does too. Some more info: if engine
> >> >> hasn't been started on the host then I can shut it down and it will
> >> >> power off.
> >> >> If engine has been run on it then it will reboot.
> >> >> It's not vdsm (I think) because my shutdown sequence is (on my F19
> >> >> host):
> >> >>  service ovirt-ha-agent stop
> >> >>  service ovirt-ha-broker stop
> >> >>  service vdsmd stop
> >> >>  ssh root at engine01 "init 0"
> >> >> init 0
> >> >>
> >> >> I don't use maintenance mode because when I poweron my host (= my
> >> >> desktop)
> >> >> I want engine to power on automatically which it does most of the time
> >> >> within 10 min.
> >> >
> >> >
> >> > For comparison, I see this issue and I *do* use maintenance mode
> >> > (because
> >> > presumably that's the 'blessed' way to shut things down and I'm scared
> >> > to
> >> > mess this complex system up by straying off the beaten path ;). My
> >> > process
> >> > is:
> >> >
> >> > ssh root at engine "init 0"
> >> > (wait for "vdsClient -s 0 list | grep Status:" to show the vm as down)
> >> > hosted-engine --set-maintenance --mode=global
> >> > poweroff
> >> >
> >> > And then on startup:
> >> > hosted-engine --set-maintenance --mode=none
> >> > hosted-engine --vm-start
> >> >
> >> > There are two issues here. I am not sure if they are related or not.
> >> > 1. The NFS timeout during shutdown (Joop do you see this also? Or just
> >> > #2?)
> >> > 2. The system reboot instead of poweroff (which messes up remote machine
> >> > management)
> >> >
> >>
> >> For 1. I was wondering if perhaps, we could have an option to specify
> >> the mount options. If I understand correctly, applying a soft mount
> >> instead of a hard mount would prevent this from happening. I'm however
> >> not sure of the implications this would have on the data integrity..
> >>
> >> I would really like to see it happen in the ha-agent, as it's the one
> >> which connects/mounts the storage it should also unmount it on boot.
> >> However, its stability is flaky at best. I've noticed that if `df`
> >> hangs because another NFS mount has timed out, the agent will
> >> die. That's not a good sign.. this was what actually caused my
> >> hosted-engine to run twice in one case.
> >>
> >> > Thanks,
> >> >      Bob
> >> >
> >> >
> >> >> I think wdmd or sanlock are causing the reboot instead of poweroff
> >> >>
> >> >> Joop
> >> >>
> >
> > Great to have your feedback guys!
> >
> > So just to clarify some of the issues you mentioned:
> >
> > Hosted engine wasn't designed for a 'single node' use case, as we do
> > want it to be highly available. This is why it's being restarted
> > elsewhere or even on the same server if there is no better alternative.
> >
> > Having said that, it is possible to set global maintenance mode
> > as a first step (in the UI: right click engine vm and choose
> > ha-maintenance).
> > Then you can ssh into the engine vm and init 0.
> >
> > After a short while, the qemu process should gracefully end and release
> > its sanlock lease as well as any other resource, which means you can
> > reboot your hypervisor peacefully.
> >
> 
> What about in a 2-host cluster? Let's say we want to take down 1 host
> for maintenance, so there is a 50% chance it could be running the engine.
> Would setting maintenance mode to local do the same thing and allow a
> clean shutdown/reboot?
> 

Yes. That's the idea behind local (aka host) maintenance[1].
Starting with 3.4, all you need to do is move the host to maintenance in
the UI, and this will also set the local maintenance mode for this host.
So you should be able to do everything with it, and then use 'activate'
in the UI to get it back into production.

[1] http://www.ovirt.org/Features/Self_Hosted_Engine#Maintenance_Flows
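
For anyone who prefers the command line to the UI, the local maintenance
flow can be sketched roughly as below. This is only a minimal sketch: it
assumes you run it as root on the host being serviced, and the DRY_RUN
variable and the function names are hypothetical helpers made up for
illustration, not part of oVirt. The only real command used is
hosted-engine --set-maintenance, with the same --mode option already
shown earlier in this thread.

```shell
#!/bin/sh
# Sketch: toggle hosted-engine local maintenance around a planned
# shutdown or reboot of one host.
# Assumptions: run as root on the hypervisor; DRY_RUN, run,
# enter_local_maintenance and leave_local_maintenance are hypothetical
# helper names, not oVirt commands.

DRY_RUN="${DRY_RUN:-0}"

# In dry-run mode print the command instead of executing it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Put only this host into local (host) maintenance.
enter_local_maintenance() {
    run hosted-engine --set-maintenance --mode=local
}

# Return this host to normal HA operation.
leave_local_maintenance() {
    run hosted-engine --set-maintenance --mode=none
}
```

With DRY_RUN=1 set, calling enter_local_maintenance only prints the
command it would run, which is a cheap way to check the sequence before
trying it on a live host. Checking "hosted-engine --vm-status" before
powering off is probably still a good idea.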