[ovirt-users] ovirt and gluster storage domain HA testing

Bill James bill.james at j2.com
Wed Feb 17 20:15:57 UTC 2016


This is a GlusterFS domain.

I repeated my test (powering off ovirt1).
This time the VM on ovirt2 continued to run just fine.
I noticed that engine.log said ovirt3 was now the Storage Pool Manager.
So after ovirt1 was all healed, I powered off ovirt3.
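
For anyone reproducing this: heal progress can be checked with something like
the following, using the volume name gv1 from the output below. The first
command lists entries still pending heal per brick; the second only prints
counts.

[root@ovirt2 test ~]# gluster volume heal gv1 info
[root@ovirt2 test ~]# gluster volume heal gv1 statistics heal-count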


ovirt1 powered off:
[root@ovirt2 test ~]# gluster peer status
Number of Peers: 2

Hostname: 10.100.108.31 (ovirt1)
Uuid: 758d0477-7c9d-496b-91a8-cc1113cf09c4
State: Peer in Cluster (Disconnected)

Hostname: ovirt3.test.j2noc.com
Uuid: dfe44d58-7050-4b8d-ba2f-5ddba1cab2e8
State: Peer in Cluster (Connected)
Other names:
ovirt3.test.j2noc.com
[root@ovirt2 test ~]# gluster volume status
Status of volume: gv1
Gluster process                                            TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------------
Brick ovirt2-ks.test.j2noc.com:/gluster-store/brick1/gv1   49152     0          Y       45784
Brick ovirt3-ks.test.j2noc.com:/gluster-store/brick1/gv1   49152     0          Y       8536
NFS Server on localhost                                    2049      0          Y       45779
Self-heal Daemon on localhost                              N/A       N/A        Y       45789
NFS Server on ovirt3.test.j2noc.com                        2049      0          Y       8546
Self-heal Daemon on ovirt3.test.j2noc.com                  N/A       N/A        Y       8551

Task Status of Volume gv1
--------------------------------------------------------------------------------------------
There are no active volume tasks



ovirt3 powered off:
[root@ovirt2 test ~]# gluster peer status
Number of Peers: 2

Hostname: 10.100.108.31
Uuid: 758d0477-7c9d-496b-91a8-cc1113cf09c4
State: Peer in Cluster (Connected)

Hostname: ovirt3.test.j2noc.com
Uuid: dfe44d58-7050-4b8d-ba2f-5ddba1cab2e8
State: Peer in Cluster (Disconnected)
Other names:
ovirt3.test.j2noc.com



The test VM hung and was paused:

2016-02-17 11:53:51,254 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-14) [1f0d3877] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM billjov1.test.j2noc.com has been paused due to unknown storage error.

I got impatient waiting for the test VM (billjov1) to unpause, so I selected
"Run" to resume it.

Attached are engine.log, vdsm.log.xz, and sanlock.log from ovirt2; the ovirt*
hosts are all hardware nodes.
Also attached is vdsm.log.1.xz from yesterday's test.
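
In case it helps with the review: a quick sketch of how to check whether the
backup-volfile-servers option Nir mentions below actually made it into the
gluster mount, assuming vdsm's default log location /var/log/vdsm/vdsm.log:

[root@ovirt2 test ~]# grep -i backup-volfile-servers /var/log/vdsm/vdsm.log
[root@ovirt2 test ~]# mount | grep glusterfs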


Thanks.


On 02/16/2016 11:13 PM, Nir Soffer wrote:
> On Wed, Feb 17, 2016 at 7:57 AM, Sahina Bose <sabose at redhat.com> wrote:
>>
>> On 02/17/2016 05:08 AM, Bill James wrote:
>>> I have an oVirt cluster with a system running ovirt-engine 3.6.2.6-1 on
>>> centos7.2
>>> and 3 hardware nodes running glusterfs 3.7.6-1 and centos7.2.
>>> I created a gluster volume using gluster cli and then went to add a
>>> storage domain.
>>>
>>> I created it with Path ovirt1-ks.test:/gv1 and all works fine, until
>>> ovirt1 goes down.
>>> Then ALL VMs pause till ovirt1 comes back up.
>>> Do I have to list all nodes in the path for this to work?
>>>
>>> Path: ovirt1-ks.test:/gv1 ovirt2-ks.test:/gv1 ovirt3-ks.test:/gv1
>>>
>>> Or how do I prevent ovirt1 from being a single point of failure?
>>
>> Which type of storage domain have you created - NFS or GlusterFS?
>> With NFS, this could be a problem, as the NFS server running on ovirt1 can
>> become a single point of failure. To work around this, you will need to set up
>> HA with CTDB or pacemaker/corosync, depending on the version of NFS you're using.
>>
>> If you're using glusterfs, did the hypervisor node go down as well when
>> ovirt1 went down? The server given in the path (ovirt1) is only required to
>> access the volume while the storage domain is being mounted (i.e. when the
>> hypervisor host is activated or rebooted). You can provide
>> backup-volfile-servers=ovirt2-ks:ovirt3-ks in the Mount Options field while
>> creating the storage domain.
> oVirt is automatically detecting the available bricks and generating the
> backup-volfile-servers option when connecting to a gluster storage domain.
>
> You can see if this worked in vdsm.log, the backup-volfile-servers option
> should appear in the mount command.
>
> Please attach vdsm.log showing the time you connected to the gluster
> storage domain, and the time of the failure.
>
> Attach also /var/log/sanlock.log - this is the best place to detect issues
> accessing storage as it reads and writes to all storage domains frequently.
>
>> Please also provide output of gluster volume info. (I'm assuming the bricks
>> on the remaining servers were online)
> Nir
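
Following up on the backup-volfile-servers suggestion above: a rough sketch of
the equivalent manual mount, using the brick hostnames from the volume status
output and a made-up mountpoint /mnt/gv1 (vdsm normally issues this mount
itself when activating the storage domain):

mount -t glusterfs \
    -o backup-volfile-servers=ovirt2-ks.test.j2noc.com:ovirt3-ks.test.j2noc.com \
    ovirt1-ks.test.j2noc.com:/gv1 /mnt/gv1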


-------------- next part --------------
Attachments (non-text, scrubbed by the list archive):
  engine.log.xz  (application/x-xz, 191352 bytes)
    <http://lists.ovirt.org/pipermail/users/attachments/20160217/d5de0a34/attachment-0003.xz>
  sanlock.log    (text/x-log, 29068 bytes)
    <http://lists.ovirt.org/pipermail/users/attachments/20160217/d5de0a34/attachment-0001.bin>
  vdsm.log.xz    (application/x-xz, 391960 bytes)
    <http://lists.ovirt.org/pipermail/users/attachments/20160217/d5de0a34/attachment-0004.xz>
  vdsm.log.1.xz  (application/x-xz, 737400 bytes)
    <http://lists.ovirt.org/pipermail/users/attachments/20160217/d5de0a34/attachment-0005.xz>

