On 08/06/2015 02:38 PM, Vered Volansky wrote:
----- Original Message -----
> From: "Nicolas Ecarnot" <nicolas(a)ecarnot.net>
> To: "users(a)ovirt.org" <Users(a)ovirt.org>
> Sent: Wednesday, August 5, 2015 5:32:38 PM
> Subject: [ovirt-users] ovirt+gluster+NFS : storage hicups
>
> Hi,
>
> I used the two links below to setup a test DC :
>
>
> http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
>
> http://community.redhat.com/blog/2014/11/up-and-running-with-ovirt-3-5-pa...
>
> The only thing I did differently is that I did not use a hosted engine;
> instead I dedicated a solid server to it.
> So I have one engine (CentOS 6.6), and 3 hosts (CentOS 7.0)
>
> As in the docs above, my 3 hosts serve 300 GB of replicated gluster
> storage, on top of which ctdb manages a floating virtual IP that NFS
> uses for the master storage domain.
>
> The last point is that the manager also exports an NFS share that I
> use as an export domain.
>
> It took me some time to put this whole setup together, as it is a bit
> more complicated than my other DC with a real SAN and no gluster, but it
> is finally working (I can run VMs, migrate them...)
>
> I have run many harsh tests (from a very naive user's point of view:
> unplug/replug the power cable of a server; does ctdb move the vIP?
> does gluster self-heal? does the VM restart?...)
> When looking closely at each layer one by one, everything seems correct:
> ctdb moves the IP quickly, NFS is OK, gluster seems to rebuild, fencing
> finally worked with the lanplus workaround, and so on...
>
> But from time to time, a severe hiccup appears that I have great
> difficulty diagnosing.
> The messages in the web GUI are neither precise nor consistent:
> - some report a host having network issues, but I can ping it from
> everywhere it needs to be reached (especially from the SPM and the
> manager)
Ping doesn't tell much, as SSH is the protocol actually being used.
Please try connecting over SSH and report back.
Please attach logs (engine + vdsm). Log snippets would be helpful, but full
logs are more important.
In general it smells like an SSH/firewall issue.
> "On host serv-vm-al01, Error: Network error during communication with
> the Host"
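Since ping only exercises ICMP, a TCP connection test is closer to what the
engine actually depends on. A minimal Python sketch, assuming port 22 for
SSH and port 54321 as vdsm's default listening port (verify this for your
version); the host list is hypothetical, with serv-vm-al02 assumed as the
name of the third host:

```python
import socket
from typing import Dict, Iterable, Tuple

# Hypothetical host list modeled on the names in this thread;
# serv-vm-al02 is an assumed name for the third host.
HOSTS = ["serv-vm-al01", "serv-vm-al02", "serv-vm-al03"]
# 22 is SSH; 54321 is assumed to be vdsm's default port.
PORTS = [22, 54321]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all land here.
        return False


def check_hosts(
    hosts: Iterable[str], ports: Iterable[int], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Map every (host, port) pair to whether a TCP connection succeeded."""
    ports = list(ports)  # allow any iterable to be reused for each host
    return {(h, p): tcp_reachable(h, p, timeout) for h in hosts for p in ports}
```

For example, printing check_hosts(HOSTS, PORTS) from both the manager and
the SPM would show whether a host that answers ping still refuses or drops
TCP connections, which would point toward a firewall rule rather than the
network itself.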
>
> - some say a volume is degraded when it's not (gluster commands show
> no issue, and even the oVirt volumes tab is all green)
>
> - "Host serv-vm-al03 cannot access the Storage Domain(s) <UNKNOWN>
> attached to the Data Center"
> Just waiting a couple of seconds leads to a self-heal with no action.
>
> - Repeated "Detected change in status of brick
> serv-vm-al03:/gluster/data/brick of volume data from DOWN to UP."
> but absolutely no action is being taken on this filesystem.
This comes from the earlier issue where the host status was marked
Down: the engine sees its bricks as Down as well, hence the state change
messages.
>
> At this time, no VM is running in this test datacenter, and no action
> is being taken on the hosts. Still, I see some recurring errors coming
> and going, and I find no way to diagnose them.
>
> Among the *actions* I tried in order to solve some of these issues:
> - I found that trying to force the self-healing and playing with
> gluster commands had no effect
> - I found that the gluster-advised actions ("find /gluster
> -exec stat {} \; ...") seem to have no effect either
> - I found that forcing ctdb to move the vIP ("ctdb stop, ctdb
> continue") DID SOLVE most of these issues.
> I believe it's not what ctdb itself does that helps, but maybe one of
> its shell hooks that is cleaning something up?
>
> As this setup is complex, I'm not asking anyone for a silver bullet,
> but maybe you know which layer is the most fragile, and which one I
> should look at more closely?
>
> --
> Nicolas ECARNOT
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>