From: "Christopher Pereira" <kripper(a)imatronix.cl>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: devel(a)ovirt.org, "Liron Aravot" <laravot(a)redhat.com>
Sent: Wednesday, April 29, 2015 6:14:32 AM
Subject: Re: [ovirt-devel] "Please activate the master Storage Domain first"
On 28-04-2015 18:14, Nir Soffer wrote:
>> The DC storage master domain is on an (unrecoverable) storage on a remote
>> dead host.
>> Engine is automatically setting another storage as the "Data (Master)".
>> Seconds later, the unrecoverable storage is marked as "Data (Master)"
>> again.
>> There is no way to start the Datacenter.
>>
>> Both storages are gluster. The old (unrecoverable) one worked fine as a
>> master.
> This may be related to this bug:
>
> https://bugzilla.redhat.com/1183977
Ok. I added a comment and explained the issue in more detail on the BZ.
> Are you using latest engine?
Yes,
ovirt-engine-3.6.0-0.0.master.20150427175110.git61dec8c.el7.centos.noarch
>> Any hint?
> If one gluster node dies, and this brings down your data center,
> your gluster is probably not set up correctly. With proper replication
> everything should work after a storage node dies.
Right, in theory vdsm, ovirt-engine and gluster should all be stable
enough that the Master Storage Domain is always alive.
Besides, oVirt DC admins should know that a Master Storage Domain cannot
be removed or firewalled off from the DC without losing the whole DC.
From another point of view, oVirt should be rock solid even if the
Master Storage Domain goes down.
It should not rely on a single SD but choose another available SD as the
new master SD, and that is the way it seems to be implemented (though it
does not always work).
Expected result: the surviving SD should become the new MSD so the DC
can be reactivated.
Issue: the engine tries to set the surviving SD as the new MSD but fails
without reporting a reason.
> Please check this for the recommended configuration:
>
> http://www.ovirt.org/Gluster_Storage_Domain_Reference
Thanks. Yes, we are using replica 3 in "production".
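For anyone following along, a minimal replica 3 volume along the lines of
that reference page looks roughly like this (the host names, brick paths
and volume name below are placeholders, not our actual setup; double-check
the options against the reference page):

    # hypothetical hosts and brick paths -- adjust to your environment
    gluster volume create vmstore replica 3 \
        host1:/gluster/brick1/vmstore \
        host2:/gluster/brick1/vmstore \
        host3:/gluster/brick1/vmstore
    # apply the virt tuning group and vdsm ownership (uid/gid 36), then start
    gluster volume set vmstore group virt
    gluster volume set vmstore storage.owner-uid 36
    gluster volume set vmstore storage.owner-gid 36
    gluster volume start vmstore

With a working replica 3 volume, a single dead node should not take the
volume down, so the master SD should stay reachable.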
In our lab, funny things happen all the time with the master nightly
builds and the latest gluster builds, but this helps us test and fix
issues on the fly and generate extreme test cases, making oVirt more robust.
Regards,
Chris
Hi Chris,
Can you please attach the engine/vdsm logs from the time the issue occurred?
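If you have not customized the locations, the logs are usually in the
default paths, e.g.:

    # on the engine host
    tar czf engine-logs.tar.gz /var/log/ovirt-engine/engine.log*
    # on each hypervisor
    tar czf vdsm-logs.tar.gz /var/log/vdsm/vdsm.log*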
thanks.