This is a GlusterFS domain.
I repeated my test (powering off ovirt1).
This time the VM on ovirt2 continued to run just fine.
I noticed that engine.log said ovirt3 was now the Storage Pool Manager.
So after ovirt1 was all healed, I powered off ovirt3.
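
(A quick way to confirm a heal has finished, assuming the volume name gv1 shown below:

gluster volume heal gv1 info

Each brick should report "Number of entries: 0" once healing is complete.)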
ovirt1 powered off:
[root@ovirt2 test ~]# gluster peer status
Number of Peers: 2

Hostname: 10.100.108.31 (ovirt1)
Uuid: 758d0477-7c9d-496b-91a8-cc1113cf09c4
State: Peer in Cluster (Disconnected)

Hostname: ovirt3.test.j2noc.com
Uuid: dfe44d58-7050-4b8d-ba2f-5ddba1cab2e8
State: Peer in Cluster (Connected)
Other names:
ovirt3.test.j2noc.com
[root@ovirt2 test ~]# gluster volume status
Status of volume: gv1
Gluster process                                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt2-ks.test.j2noc.com:/gluster-store/brick1/gv1  49152     0          Y       45784
Brick ovirt3-ks.test.j2noc.com:/gluster-store/brick1/gv1  49152     0          Y       8536
NFS Server on localhost                                   2049      0          Y       45779
Self-heal Daemon on localhost                             N/A       N/A        Y       45789
NFS Server on ovirt3.test.j2noc.com                       2049      0          Y       8546
Self-heal Daemon on ovirt3.test.j2noc.com                 N/A       N/A        Y       8551

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks
ovirt3 powered off:
[root@ovirt2 test ~]# gluster peer status
Number of Peers: 2

Hostname: 10.100.108.31
Uuid: 758d0477-7c9d-496b-91a8-cc1113cf09c4
State: Peer in Cluster (Connected)

Hostname: ovirt3.test.j2noc.com
Uuid: dfe44d58-7050-4b8d-ba2f-5ddba1cab2e8
State: Peer in Cluster (Disconnected)
Other names:
ovirt3.test.j2noc.com
The test VM hung and was paused:

2016-02-17 11:53:51,254 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-14) [1f0d3877] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM billjov1.test.j2noc.com has been paused due to unknown storage error.
I got impatient waiting for the test VM (billjov1) to unpause, so I selected
"Run" to activate it again.
Attached are engine.log, vdsm.log.xz, and sanlock.log from ovirt2; the ovirt*
hosts are all hardware nodes.
Also attached is vdsm.log.1.xz from yesterday's test.
Thanks.
On 02/16/2016 11:13 PM, Nir Soffer wrote:
On Wed, Feb 17, 2016 at 7:57 AM, Sahina Bose <sabose(a)redhat.com> wrote:
>
> On 02/17/2016 05:08 AM, Bill James wrote:
>> I have an oVirt cluster with a system running ovirt-engine 3.6.2.6-1 on
>> centos7.2, and 3 hardware nodes running glusterfs 3.7.6-1 and centos7.2.
>> I created a gluster volume using gluster cli and then went to add a
>> storage domain.
>>
>> I created it with Path ovirt1-ks.test:/gv1 and all works fine, until
>> ovirt1 goes down.
>> Then ALL VMs pause till ovirt1 comes back up.
>> Do I have to list all nodes in the path for this to work?
>>
>> Path: ovirt1-ks.test:/gv1 ovirt2-ks.test:/gv1 ovirt3-ks.test:/gv1
>>
>> Or how do I prevent ovirt1 from being a single point of failure?
>
> Which type of storage domain have you created - NFS or GlusterFS?
> With NFS, this could be a problem, as the nfs server running on ovirt1 can
> become a single point of failure. To work around this, you will need to set up
> HA with CTDB or pacemaker/corosync, depending on the version of NFS you're using.
>
> If you're using glusterfs, did the hypervisor node go down as well when
> ovirt1 went down? The server provided in the path (ovirt1) is only required
> for access to the volume while the storage domain is being mounted (either
> on activation of the hypervisor host or on reboot of the host). You can
> provide backup-volfile-servers=ovirt2-ks:ovirt3-ks in Mount options while
> creating the storage domain.
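
(For reference, with the -ks hostnames shown in the volume status above, the
Mount Options field of the storage domain would look roughly like:

backup-volfile-servers=ovirt2-ks.test.j2noc.com:ovirt3-ks.test.j2noc.com

so the client can fetch the volume file from ovirt2 or ovirt3 if ovirt1 is
down at mount time.)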
oVirt automatically detects the available bricks and generates the
backup-volfile-servers option when connecting to a gluster storage domain.
You can see if this worked in vdsm.log: the backup-volfile-servers option
should appear in the mount command.
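
(One way to check, assuming the default vdsm log location:

grep backup-volfile-servers /var/log/vdsm/vdsm.log

The mount command logged when the domain was connected should include the
option.)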
Please attach vdsm.log showing the time you connected to the gluster
storage domain, and the time of the failure.
Also attach /var/log/sanlock.log - this is the best place to detect issues
accessing storage, as sanlock reads from and writes to all storage domains
frequently.
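
(For a quick look, something like

grep -i error /var/log/sanlock.log | tail -20

should show the most recent errors, if any.)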
> Please also provide output of gluster volume info. (I'm assuming the bricks
> on the remaining servers were online)
Nir