[ovirt-users] Hosted engine on gluster problem
Sahina Bose
sabose at redhat.com
Wed Apr 13 14:21:08 UTC 2016
On 04/12/2016 01:33 PM, Sandro Bonazzola wrote:
>
>
> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl
> <dbond at nrggos.com.au> wrote:
>
> My setup is hyperconverged. I have placed my test results in
> https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>
>
> Ok, so you're aware of the limitation of the single point of
> failure. If you drop the host referenced in the hosted-engine
> configuration for the initial setup, it won't be able to connect to
> shared storage even if the other hosts in the cluster are up, since
> the entry point is down.
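> For reference, the entry point is recorded at deploy time in
> /etc/ovirt-hosted-engine/hosted-engine.conf on each HA host. A rough
> sketch of the relevant keys (the hostnames are placeholders, and
> mnt_options support depends on your version):
>
>   storage=host1.example.com:/hosted-engine
>   domainType=glusterfs
>   mnt_options=backup-volfile-servers=host2.example.com:host3.example.com
>
> With backup-volfile-servers set, the fuse client can fetch the
> volfile from another server when the entry-point host is down.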
> Note that hyperconverged deployment is not supported in 3.6.
This issue does not seem related to the single point of failure. I
tested this on a 3-node setup with each node mounting the volume
hosting the HE as localhost:/engine. Since all nodes have glusterd
running and belong to the same cluster, the mount should continue to
work with any one node down. But the HE VM is restarted once a node is
powered off.
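For reference, each node's mount amounts to something like this (a
sketch; the mount point is the one VDSM uses):

  mount -t glusterfs localhost:/engine \
      /rhev/data-center/mnt/glusterSD/localhost:_engine

Since the volfile is served by the local glusterd, a different node
going down should not affect the mount itself.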
broker.log:

Thread-4602::ERROR::2016-04-13 18:50:28,249::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=7fe3707b-2435-4e71-b831-4daba08cc72c'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 456, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 7fe3707b-2435-4e71-b831-4daba08cc72c not found in /rhev/data-center/mnt/glusterSD
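The lookup that fails in get_domain_path amounts to checking for the
SD uuid under the gluster mount root; a quick manual equivalent
(sketch):

  ls -d /rhev/data-center/mnt/glusterSD/*/7fe3707b-2435-4e71-b831-4daba08cc72c

If the volume has just been unmounted, nothing matches and the broker
raises the BackendFailureException above.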
agent.log:

MainThread::INFO::2016-04-13 18:50:26,020::storage_server::207::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2016-04-13 18:50:28,054::hosted_engine::807::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor for 7fe3707b-2435-4e71-b831-4daba08cc72c
MainThread::INFO::2016-04-13 18:50:28,055::image::184::ovirt_hosted_engine_ha.lib.image.Image::(teardown_images) Teardown images
MainThread::WARNING::2016-04-13 18:50:28,177::hosted_engine::675::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Disconnecting the storage
MainThread::INFO::2016-04-13 18:50:28,177::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server
The gluster mount logs for this time frame contain unmount messages:

[2016-04-13 13:20:28.199429] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/localhost:_engine
[2016-04-13 13:20:28.199934] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff9b3ceddc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff9b53588b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7ff9b5358739] ) 0-: received signum (15), shutting down
[2016-04-13 13:20:28.199970] I [fuse-bridge.c:5704:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/localhost:_engine'.
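Note that signum 15 is SIGTERM, i.e. the fuse client is being told to
shut down - a deliberate unmount matching the
disconnect_storage_server call above, not a crash. While reproducing,
the mount can be watched with, for example:

  watch -n1 'grep glusterSD /proc/mounts'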
>
>
> Short description of setup:
>
> 3 hosts with 2 disks each, set up as gluster replica 3 across the
> 6 disks; volume name hosted-engine.
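> (For reference, created with something along these lines - a sketch
> from memory, exact brick paths differ:
>
> gluster volume create hosted-engine replica 3 \
>   h1:/bricks/b1/he h2:/bricks/b1/he h3:/bricks/b1/he \
>   h1:/bricks/b2/he h2:/bricks/b2/he h3:/bricks/b2/he
>
> i.e. a 2 x 3 distributed-replicate volume.)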
>
> Hostname hosted-storage configured in /etc/hosts to point to
> host1.
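> (I.e. roughly, on each host - the address is a placeholder:
>
> 10.0.0.1  hosted-storage
>
> so the entry-point name resolves to host1 everywhere.)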
>
> Installed hosted engine on host1 with the hosted engine storage
> path = hosted-storage:/hosted-engine
>
> Install of the first engine on h1 was successful. Hosts h2 and h3
> added to the hosted engine. All works fine.
>
> Additional storage and non-hosted engine hosts added etc.
>
> Additional VMs added to hosted-engine storage (oVirt Reports VM
> and Cinder VM). Other additional VMs are hosted by other storage -
> cinder and NFS.
>
> The system is in production.
>
>
> Engine can be migrated around with the web interface.
>
>
> - 3.6.4 upgrade released; followed the upgrade guide. The engine is
> upgraded first; the new CentOS kernel requires a host reboot.
>
> - Engine placed on h2; h3 into maintenance (local), upgrade and
> reboot h3 - no issues - local maintenance removed from h3.
>
> - Engine placed on h3; h2 into maintenance (local), upgrade and
> reboot h2 - no issues - local maintenance removed from h2.
>
> - Engine placed on h3; h1 into maintenance (local), upgrade and
> reboot h1 - engine crashes and does not start elsewhere;
> VM(cinder) on h3 on the same gluster volume pauses.
>
> - Host 1 takes about 5 minutes to reboot (enterprise box with all
> its normal BIOS probing).
>
> - Engine starts after h1 comes back and stabilises
>
> - VM(cinder) unpauses itself, VM(reports) continued fine the
> whole time. I can do no diagnosis on the 2 VMs as the engine is
> not available.
>
> - Local maintenance removed from h1 (the maintenance flips above
> are sketched as CLI commands after this list).
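> (From the CLI the local-maintenance flips would presumably be, on
> the host in question - a sketch; the same can be done from the web
> UI:
>
> hosted-engine --set-maintenance --mode=local
> # upgrade and reboot the host here
> hosted-engine --set-maintenance --mode=none
> )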
>
>
> I don't believe the issue is with gluster itself, as the volume
> remains accessible on all hosts during this time, albeit with a
> missing server (per gluster volume status) as each gluster server
> is rebooted.
>
> Gluster was upgraded as part of the process; no issues were seen here.
>
>
> I have been able to duplicate the issue without the upgrade by
> following the same sort of timeline.
>
>
> ________________________________
> From: Sandro Bonazzola <sbonazzo at redhat.com>
> Sent: Monday, 11 April 2016 7:11 PM
> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak;
> Sahina Bose
> Cc: Bond, Darryl; users
> Subject: Re: [ovirt-users] Hosted engine on gluster problem
>
>
>
> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
> <hawk at tbi.univie.ac.at> wrote:
> Hi Darryl,
>
> I'm still experimenting with my oVirt installation so I tried to
> recreate the problems you've described.
>
> My setup has three HA hosts for virtualization and three machines
> for the gluster replica 3 setup.
>
> I manually migrated the Engine from the initial install host (one)
> to host three. Then shut down host one manually and interrupted the
> fencing mechanisms so the host stayed down. This didn't bother the
> Engine VM at all.
>
> Did you move host one to maintenance before shutting it down?
> Or is this a crash recovery test?
>
>
>
> To make things a bit more challenging I then shut down host three
> while running the Engine VM. Of course the Engine was down for some
> time until host two detected the problem. It started the Engine VM
> and everything seems to be running quite well without the initial
> install host.
>
> Thanks for the feedback!
>
>
>
> My only problem is that the HA agents on hosts two and three refuse
> to start after a reboot because the hosted-engine configuration is
> missing. I wrote another mail to users at ovirt.org about that.
>
> This is weird. Martin, Simone, can you please investigate this?
>
>
>
>
> Cheers
> Richard
>
> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
> > There seems to be a pretty severe bug with using hosted engine
> on gluster.
> >
> > If the host that was used as the initial hosted-engine --deploy
> host goes away, the engine VM will crash and cannot be restarted
> until the host comes back.
>
> is this a hyperconverged setup?
>
>
> >
> > This is regardless of which host the engine was currently running on.
> >
> >
> > The issue seems to be buried in the bowels of VDSM and is not an
> issue with gluster itself.
>
> Sahina, can you please investigate this?
>
>
> >
> > The gluster filesystem is still accessible from the host that
> was running the engine. The issue has been submitted to bugzilla
> but the fix is some way off (4.1).
> >
> >
> > Can my hosted engine be converted to use NFS (using the gluster
> NFS server on the same filesystem) without rebuilding my hosted
> engine (i.e. change domainType=glusterfs to domainType=nfs)?
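> > (I.e. presumably, in /etc/ovirt-hosted-engine/hosted-engine.conf
> > on each HA host - hypothetical and untested:
> >
> > storage=hosted-storage:/hosted-engine
> > domainType=nfs
> >
> > with the gluster NFS export keeping the same path.)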
>
> >
> > What effect would that have on the hosted-engine storage domain
> inside oVirt? I.e. would the same filesystem be mounted twice, or
> would it just break?
> >
> >
> > Will this actually fix the problem, does it have the same issue
> when the hosted engine is on NFS?
> >
> >
> > Darryl
> >
> >
> >
> >
> >
>
>
> --
> /dev/null
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
>
>
>
>
>
> --
> Sandro Bonazzola
> Better technology. Faster innovation. Powered by community collaboration.
> See how it works at redhat.com <http://redhat.com>