<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 04/12/2016 01:33 PM, Sandro
      Bonazzola wrote:<br>
    </div>
    <blockquote
cite="mid:CAPQRNT=_7QBQHvhC5UHoD4vBg9WUcdSRvg2uLn0sPct-=gb4FA@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Mon, Apr 11, 2016 at 11:44 PM,
            Bond, Darryl <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:dbond@nrggos.com.au" target="_blank">dbond@nrggos.com.au</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">My setup
              is hyperconverged. I have placed my test results in <a
                moz-do-not-send="true"
                href="https://bugzilla.redhat.com/show_bug.cgi?id=1298693"
                rel="noreferrer" target="_blank"><a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1298693">https://bugzilla.redhat.com/show_bug.cgi?id=1298693</a></a><br>
              <br>
            </blockquote>
            <div><br>
            </div>
            <div>OK, so you're aware of the single point of failure
              limitation. If you drop the host referenced in the
              hosted-engine configuration from the initial setup, the
              hosted-engine storage cannot be connected even if the
              other hosts in the cluster are up, since the entry point
              is down.</div>
            <div>Note that hyperconverged deployment is not supported in
              3.6.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    <br>
    This issue does not seem to be related to the single point of failure.
    I tested this on a 3-node setup with each node mounting the volume
    hosting the HE as localhost:/engine. Since all nodes have glusterd
    running and belong to the same cluster, the mount should continue to
    work with any one node down.<br>
    But the HE VM is restarted once a node is powered off.<br>
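    <br>
    To illustrate the expectation, a minimal sketch (a hypothetical check of
    my own, not part of oVirt; the mount path is taken from the logs below) -
    with replica 3 and glusterd on every node, this should keep succeeding
    on the surviving nodes while any single node is powered off:<br>
    <pre>
# Hypothetical helper, not oVirt code; the mount path is the one from the logs.
import os

ENGINE_MOUNT = "/rhev/data-center/mnt/glusterSD/localhost:_engine"

def mount_is_alive(path=ENGINE_MOUNT):
    try:
        os.statvfs(path)  # raises OSError (e.g. ENOTCONN) if the FUSE mount has died
        return True
    except OSError:
        return False

print(mount_is_alive())
</pre>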
    <br>
    broker.log:<br>
    Thread-4602::ERROR::2016-04-13
    18:50:28,249::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
    Error handling request, data: 'set-storage-domain FilesystemBackend
    dom_type=glusterfs sd_uuid=7fe3707b-2435-4e71-b831-4daba08cc72c'<br>
    Traceback (most recent call last):<br>
      File
    "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
    line 166, in handle<br>
        data)<br>
      File
    "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
    line 299, in _dispatch<br>
        .set_storage_domain(client, sd_type, **options)<br>
      File
    "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
    line 66, in set_storage_domain<br>
        self._backends[client].connect()<br>
      File
    "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
    line 456, in connect<br>
        self._dom_type)<br>
      File
    "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
    line 108, in get_domain_path<br>
        " in {1}".format(sd_uuid, parent))<br>
    BackendFailureException: path to storage domain
    7fe3707b-2435-4e71-b831-4daba08cc72c not found in
    /rhev/data-center/mnt/glusterSD<br>
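    <br>
    For context, the failure comes from get_domain_path() in
    storage_backends.py. Roughly (my own reconstruction from the traceback
    above, not the actual oVirt source), it does something like this:<br>
    <pre>
# Sketch reconstructed from the traceback (an assumption, not the real code):
# look under the glusterSD mount directory for a mount containing the SD UUID.
import os

class BackendFailureException(Exception):
    pass

def get_domain_path(sd_uuid, parent="/rhev/data-center/mnt/glusterSD"):
    for mount in os.listdir(parent):              # e.g. "localhost:_engine"
        candidate = os.path.join(parent, mount, sd_uuid)
        if os.path.isdir(candidate):
            return candidate
    raise BackendFailureException("path to storage domain {0} not found"
                                  " in {1}".format(sd_uuid, parent))
</pre>
    So once the mount of localhost:_engine is torn down (see the agent and
    gluster logs below), nothing under /rhev/data-center/mnt/glusterSD
    contains the SD UUID any more, hence the exception.<br>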
    <br>
    agent.log:<br>
    MainThread::INFO::2016-04-13
    18:50:26,020::storage_server::207::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
    Connecting storage server<br>
    MainThread::INFO::2016-04-13
    18:50:28,054::hosted_engine::807::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor)
    Stopped VDSM domain monitor for 7fe3707b-2435-4e71-b831-4daba08cc72c<br>
    MainThread::INFO::2016-04-13
    18:50:28,055::image::184::ovirt_hosted_engine_ha.lib.image.Image::(teardown_images)
    Teardown images<br>
    MainThread::WARNING::2016-04-13
    18:50:28,177::hosted_engine::675::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
    Disconnecting the storage<br>
    MainThread::INFO::2016-04-13
    18:50:28,177::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server)
    Disconnecting storage server<br>
    <br>
    <br>
    <br>
    The gluster mount logs for this time frame contain unmount messages -
    the glusterfs client was shut down with SIGTERM (signum 15), i.e. the
    unmount was initiated locally rather than caused by a gluster failure:<br>
    [2016-04-13 13:20:28.199429] I [fuse-bridge.c:4997:fuse_thread_proc]
    0-fuse: unmounting /rhev/data-center/mnt/glusterSD/localhost:_engine<br>
    [2016-04-13 13:20:28.199934] W [glusterfsd.c:1251:cleanup_and_exit]
    (--&gt;/lib64/libpthread.so.0(+0x7dc5) [0x7ff9b3ceddc5]
    --&gt;/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff9b53588b5]
    --&gt;/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7ff9b5358739] )
    0-: received signum (15), shutting down<br>
    [2016-04-13 13:20:28.199970] I [fuse-bridge.c:5704:fini] 0-fuse:
    Unmounting '/rhev/data-center/mnt/glusterSD/localhost:_engine'.<br>
    <br>
    <br>
    <blockquote
cite="mid:CAPQRNT=_7QBQHvhC5UHoD4vBg9WUcdSRvg2uLn0sPct-=gb4FA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <br>
              Short description of setup:<br>
              <br>
              3 hosts with 2 disks each, set up with gluster replica 3
              across the 6 disks; volume name hosted-engine.<br>
              <br>
              Hostname hosted-storage configured in /etc/hosts to point
              to host1.<br>
              <br>
              Installed hosted engine on host1 with the hosted engine
              storage path = hosted-storage:/hosted-engine<br>
              <br>
              Installed the first engine on h1 successfully. Hosts h2 and
              h3 added to the hosted engine. All works fine.<br>
              <br>
              Additional storage and non-hosted engine hosts added etc.<br>
              <br>
              Additional VMs added to hosted-engine storage (oVirt
              Reports VM and Cinder VM). Additional VMs are hosted by
              other storage - cinder and NFS.<br>
              <br>
              The system is in production.<br>
              <br>
              <br>
              Engine can be migrated around with the web interface.<br>
              <br>
              <br>
              - 3.6.4 upgrade released, followed the upgrade guide,
              engine is upgraded first, new CentOS kernel requires host
              reboot.<br>
              <br>
              - Engine placed on h2 - h3 into maintenance (local),
              upgrade and reboot h3 - no issues - local maintenance
              removed from h3.<br>
              <br>
              - Engine placed on h3 - h2 into maintenance (local),
              upgrade and reboot h2 - no issues - local maintenance
              removed from h2.<br>
              <br>
              - Engine placed on h3 - h1 into maintenance (local),
              upgrade and reboot h1 - engine crashes and does not start
              elsewhere, VM(cinder) on h3 on the same gluster volume
              pauses.<br>
              <br>
              - Host 1 takes about 5 minutes to reboot (enterprise box
              with all its normal BIOS probing)<br>
              <br>
              - Engine starts after h1 comes back and stabilises<br>
              <br>
              - VM(cinder) unpauses itself,  VM(reports) continued fine
              the whole time. I can do no diagnosis on the 2 VMs as the
              engine is not available.<br>
              <br>
              - Local maintenance removed from h1<br>
              <br>
              <br>
              I don't believe the issue is with gluster itself as the
              volume remains accessible on all hosts during this time
              albeit with a missing server (gluster volume status) as
              each gluster server is rebooted.<br>
              <br>
              Gluster was upgraded as part of the process, no issues
              were seen here.<br>
              <br>
              <br>
              I have been able to duplicate the issue without the
              upgrade by following the same sort of timeline.<br>
              <br>
              <br>
              ________________________________<br>
              From: Sandro Bonazzola &lt;<a moz-do-not-send="true"
                href="mailto:sbonazzo@redhat.com">sbonazzo@redhat.com</a>&gt;<br>
              Sent: Monday, 11 April 2016 7:11 PM<br>
              To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin
              Sivak; Sahina Bose<br>
              Cc: Bond, Darryl; users<br>
              Subject: Re: [ovirt-users] Hosted engine on gluster
              problem<br>
              <span class=""><br>
                <br>
                <br>
                On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck &lt;<a
                  moz-do-not-send="true"
                  href="mailto:hawk@tbi.univie.ac.at">hawk@tbi.univie.ac.at</a>&gt;
                wrote:<br>
                Hi Darryl,<br>
                <br>
                I'm still experimenting with my oVirt installation so I
                tried to<br>
                recreate the problems you've described.<br>
                <br>
                My setup has three HA hosts for virtualization and three
                machines<br>
                for the gluster replica 3 setup.<br>
                <br>
                I manually migrated the Engine from the initial install
                host (one)<br>
                to host three. Then shut down host one manually and
                interrupted the<br>
                fencing mechanisms so the host stayed down. This didn't
                bother the<br>
                Engine VM at all.<br>
                <br>
                Did you move host one to maintenance before shutting
                down?<br>
                Or is this a crash recovery test?<br>
                <br>
                <br>
                <br>
                To make things a bit more challenging I then shut down
                host three<br>
                while running the Engine VM. Of course the Engine was
                down for some<br>
                time until host two detected the problem. It started the
                Engine VM<br>
                and everything seems to be running quite well without
                the initial<br>
                install host.<br>
                <br>
                Thanks for the feedback!<br>
                <br>
                <br>
                <br>
                My only problem is that the HA agents on hosts two and
                three refuse to<br>
                start after a reboot because the
                configuration of the<br>
              </span>hosted engine is missing. I wrote another mail to <a
                moz-do-not-send="true" href="mailto:users@ovirt.org">users@ovirt.org</a><br>
              <span class="">about that.<br>
                <br>
                This is weird. Martin, Simone, can you please
                investigate this?<br>
                <br>
                <br>
                <br>
                <br>
                Cheers<br>
                Richard<br>
                <br>
                On 04/08/2016 01:38 AM, Bond, Darryl wrote:<br>
                &gt; There seems to be a pretty severe bug with using
                hosted engine on gluster.<br>
                &gt;<br>
                &gt; If the host that was used as the initial
                hosted-engine --deploy host goes away, the engine VM will
                crash and cannot be restarted until the host comes back.<br>
                <br>
                Is this a hyperconverged setup?<br>
                <br>
                <br>
                &gt;<br>
                &gt; This is regardless of which host the engine was
                currently running.<br>
                &gt;<br>
                &gt;<br>
                &gt; The issue seems to be buried in the bowels of VDSM
                and is not an issue with gluster itself.<br>
                <br>
                Sahina, can you please investigate this?<br>
                <br>
                <br>
                &gt;<br>
                &gt; The gluster filesystem is still accessible from the
                host that was running the engine. The issue has been
                submitted to bugzilla but the fix is some way off (4.1).<br>
                &gt;<br>
                &gt;<br>
                &gt; Can my hosted engine be converted to use NFS (using
                the gluster NFS server on the same filesystem) without
                rebuilding my hosted engine (i.e. change
                domainType=glusterfs to domainType=nfs)?<br>
                <br>
                &gt;<br>
                &gt; What effect would that have on the hosted-engine
                storage domain inside oVirt, i.e. would the same
                filesystem be mounted twice or would it just break?<br>
                &gt;<br>
                &gt;<br>
                &gt; Will this actually fix the problem, or does it have
                the same issue when the hosted engine is on NFS?<br>
                &gt;<br>
                &gt;<br>
                &gt; Darryl<br>
                &gt;<br>
                &gt;<br>
                &gt;<br>
                &gt;<br>
                &gt; _______________________________________________<br>
                &gt; Users mailing list<br>
              </span>&gt; <a moz-do-not-send="true"
                href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
              <span class="">&gt; <a moz-do-not-send="true"
                  href="http://lists.ovirt.org/mailman/listinfo/users"
                  rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
                &gt;<br>
                <br>
                <br>
                --<br>
                /dev/null<br>
                <br>
                <br>
                _______________________________________________<br>
                Users mailing list<br>
              </span><a moz-do-not-send="true"
                href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
              <span class=""><a moz-do-not-send="true"
                  href="http://lists.ovirt.org/mailman/listinfo/users"
                  rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
                <br>
                <br>
                <br>
                <br>
                --<br>
                Sandro Bonazzola<br>
                Better technology. Faster innovation. Powered by
                community collaboration.<br>
              </span>See how it works at <a moz-do-not-send="true"
                href="http://redhat.com" rel="noreferrer"
                target="_blank">redhat.com</a><br>
            </blockquote>
          </div>
          <br>
          <br clear="all">
          <div><br>
          </div>
          -- <br>
          <div class="gmail_signature">
            <div dir="ltr">
              <div>
                <div dir="ltr">Sandro Bonazzola<br>
                  Better technology. Faster innovation. Powered by
                  community collaboration.<br>
                  See how it works at <a moz-do-not-send="true"
                    href="http://redhat.com" target="_blank">redhat.com</a><br>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>