[Users] hosted engine help

Giuseppe Ragusa giuseppe.ragusa at hotmail.com
Tue Mar 11 15:08:40 UTC 2014


> Date: Tue, 11 Mar 2014 15:16:36 +0100
> From: sbonazzo at redhat.com
> To: giuseppe.ragusa at hotmail.com; jbrooks at redhat.com; msivak at redhat.com
> CC: users at ovirt.org; fsimonce at redhat.com; gpadgett at redhat.com
> Subject: Re: [Users] hosted engine help
> 
> On 10/03/2014 22:32, Giuseppe Ragusa wrote:
> > Hi all,
> > 
> >> Date: Mon, 10 Mar 2014 12:56:19 -0400
> >> From: jbrooks at redhat.com
> >> To: msivak at redhat.com
> >> CC: users at ovirt.org
> >> Subject: Re: [Users] hosted engine help
> >>
> >>
> >>
> >> ----- Original Message -----
> >> > From: "Martin Sivak" <msivak at redhat.com>
> >> > To: "Dan Kenigsberg" <danken at redhat.com>
> >> > Cc: users at ovirt.org
> >> > Sent: Saturday, March 8, 2014 11:52:59 PM
> >> > Subject: Re: [Users] hosted engine help
> >> >
> >> > Hi Jason,
> >> >
> >> > can you please attach the full logs? We had a very similar issue before and we
> >> > need to see whether it is the same or not.
> >>
> >> I may have to recreate it -- I switched back to an all-in-one engine after my
> >> setup started refusing to run the engine at all. It's no fun losing your engine!
> >>
> >> This was a migrated-from-standalone setup, maybe that caused additional wrinkles...
> >>
> >> Jason
> >>
> >> >
> >> > Thanks
> > 
> > I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt
> > 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).
> 
> Using GlusterFS for hosted-engine storage is not supported and not recommended.
> The HA daemon may not work properly there.

If it is unsupported (and explicitly "not recommended") even with the interposed NFS layer (the native Gluster-provided NFSv3 export of a volume), then what is the recommended way to set up a fault-tolerant, load-balanced 2-node oVirt cluster (without an external dedicated SAN/NAS)?
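
For reference, the storage layout I'm asking about was built roughly like this (the volume, brick and host names are placeholders from my test setup):

# create a 2-way replicated volume across the two physical nodes
gluster volume create engine replica 2 node1:/bricks/engine node2:/bricks/engine
gluster volume start engine
# Gluster's built-in NFSv3 server exports the volume automatically,
# so hosted-engine --deploy was then pointed at node1:/engine as NFS storage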

> > I roughly followed the guide from Andrew Lau:
> > 
> > http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/
> > 
> > with some variations due to newer packages (resolved bugs) and a different hardware setup (no VLANs in my setup: physically separated networks; a custom
> > second NIC added to the Engine VM template before deploying, etc.)
> > 
> > The self-hosted installation on the first node + Engine VM (configured to manage both oVirt and the storage; Datacenter default set to NFS because no
> > GlusterFS was offered) apparently went smoothly, but the HA agent failed to start at the very end (same errors in the logs as Jason: the storage domain seems
> > "missing") and I was only able to start it all manually with:
> > 
> > hosted-engine --connect-storage
> > hosted-engine --start-pool
> 
> The above commands are used for development and shouldn't be used for starting the engine.

Directly starting the engine (with the command below) failed because of storage unavailability, so I used the above "trick" as a last resort, at least to prove that the engine was able to start and had not somehow been "destroyed" or "lost" in the process (though I do understand that it is an extreme, debug-only action).
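
For clarity, the full manual sequence I resort to each time is the two debug-only commands above followed by the VM start (plus --vm-status just to confirm the result):

hosted-engine --connect-storage   # manually connect the hosted-engine storage domain
hosted-engine --start-pool        # manually start the storage pool (debug-only)
hosted-engine --vm-start          # start the Engine VM itself
hosted-engine --vm-status         # confirm that the VM is actually up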

> > hosted-engine --vm-start
> > 
> > then the Engine came up and I could use it. I even registered the second node (same final error in the HA agent) and tried to add GlusterFS storage
> > domains for further VMs and ISOs (by the way: the original NFS-on-GlusterFS domain for the Engine VM only is not shown inside the Engine web UI), but it
> > always failed to activate the domains (they remain "Inactive").
> > 
> > Furthermore, the engine gets killed some time after starting (from 3 up to 11 hours later), and the only way to get it back is to repeat the above commands.
> 
> Need logs for this.

I will try to reproduce it all, but I recall that the libvirt logs (HostedEngine.log) always contained a clear indication of the PID that killed the Engine VM, and each time it belonged to an instance of sanlock.
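
Next time it happens I will confirm it along these lines (the PID is of course just an example):

grep -i pid /var/log/libvirt/qemu/HostedEngine.log   # the shutdown line names the killing PID
ps -p 12345 -o comm=                                 # identify the process, while it is still alive
grep 12345 /var/log/sanlock.log                      # cross-check against sanlock's own log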

> > I always managed GlusterFS "natively" (not through oVirt) from the command line and verified that the NFS-exported Engine-VM-only volume gets
> > replicated, but I could not test migration because the HA part stays inactive and oVirt refuses to migrate the Engine.
> > 
> > Since I tried many times, with variations and further manual actions in between (like trying to manually mount the NFS Engine domain, restarting the
> > HA agent only, etc.), my logs are "cluttered", so I should start from scratch again and collect all the logs in one sweep.
> 
> +1

;>
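
For the record, the "native" checks I used to verify replication were along these lines (volume name from my setup):

gluster volume info engine      # confirm the replica 2 layout and NFS settings
gluster volume status engine    # check that both bricks and the NFS server are up
gluster volume heal engine info # list files still pending self-heal between the nodes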

> > Tell me what I should capture, and at which points in the whole process, and I will try to follow up as soon as possible.
> 
> What:
> hosted-engine-setup, hosted-engine-ha, vdsm, libvirt and sanlock logs from the physical hosts, plus the engine and server logs from the hosted engine VM.
> 
> When:
> As soon as you see an error.

If the setup design (wholly GlusterFS-based) is somehow flawed, please point me to some hints/docs/guides on the right way to set it up on 2 standalone physical nodes, so as not to waste your time chasing "defects" in something that is not supposed to work anyway.

I will follow your advice and try it accordingly.
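
Assuming the usual default log locations (please correct me if any of these moved in 3.4), I plan to grab:

# on each physical host
tar czf host-logs.tar.gz \
  /var/log/ovirt-hosted-engine-setup \
  /var/log/ovirt-hosted-engine-ha \
  /var/log/vdsm \
  /var/log/libvirt/qemu/HostedEngine.log \
  /var/log/sanlock.log
# on the Engine VM
tar czf engine-logs.tar.gz /var/log/ovirt-engine/engine.log /var/log/ovirt-engine/server.log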

Many thanks again,
Giuseppe

> > Many thanks,
> > Giuseppe
> > 
> >> > --
> >> > Martin Sivák
> >> > msivak at redhat.com
> >> > Red Hat Czech
> >> > RHEV-M SLA / Brno, CZ
> >> >
> >> > ----- Original Message -----
> >> > > On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
> >> > > > On 07/03/2014 01:10, Jason Brooks wrote:
> >> > > > > Hey everyone, I've been testing out oVirt 3.4 w/ hosted engine, and
> >> > > > > while I've managed to bring the engine up, I've only been able to do it
> >> > > > > manually, using "hosted-engine --vm-start".
> >> > > > >
> >> > > > > The ovirt-ha-agent service fails reliably for me, erroring out with
> >> > > > > "RequestError: Request failed: success."
> >> > > > >
> >> > > > > I've pasted error passages from the ha agent and vdsm logs below.
> >> > > > >
> >> > > > > Any pointers?
> >> > > >
> >> > > > looks like a VDSM bug, Dan?
> >> > >
> >> > > Why? The exception is raised from deep inside the ovirt_hosted_engine_ha
> >> > > code.
> 
> 
> -- 
> Sandro Bonazzola
> Better technology. Faster innovation. Powered by community collaboration.
> See how it works at redhat.com