[ovirt-users] Cannot activate storage domain
SATHEESARAN
sasundar at redhat.com
Fri Dec 19 09:10:28 UTC 2014
Hi Brent,
From your logs, I can see that the gluster mount has become
read-only.
This happens in only two circumstances:
1. Client-side quorum is enabled on that volume and it is not met
2. Client-side quorum is enabled on that volume and the first brick of
the replica pair went down.
I think you have hit the second scenario.
Client-side quorum is enabled on a gluster volume to prevent network
partition problems, which could otherwise lead to split-brain scenarios.
If you want to restore your storage domain, you can disable client-side
quorum momentarily, either from the RHEVM UI or from the gluster CLI
( # gluster volume reset <vol-name> quorum-type ).
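For example, roughly like this (<vol-name> is a placeholder; the full
option key is cluster.quorum-type):

# gluster volume info <vol-name>
  (check "Options Reconfigured" to see if cluster.quorum-type is set)
# gluster volume reset <vol-name> cluster.quorum-type
  (client-side quorum is now off and the mount should become writable)
# gluster volume set <vol-name> cluster.quorum-type auto
  (re-enable quorum once the failed brick is back up)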
Hope that helps.
-- Satheesaran S
On 12/19/2014 01:09 PM, Sahina Bose wrote:
> [+Sas - thanks for the link to virt-store usecase article inline]
>
> On 12/18/2014 06:56 PM, Brent Hartzell wrote:
>> Hello,
>>
>> I had actually gotten this sorted out, somewhat. If I disable server
>> quorum on the volume, the storage domain will activate. The volume
>> is/was optimized for virt store via oVirt. The brick in question was
>> not the first brick added to the volume through oVirt; however, it
>> appears that it may have been the first brick in the replica set being
>> used, but I'm not certain how to find this out.
>
> The recommended setting is to have both client and server side quorum
> turned on. But turning on server-side quorum with a 2-way replica
> volume would mean that your volume goes offline when one of the bricks
> goes down.
>
> "gluster volume info" command will give you information about the
> volume topology. So will the bricks sub-tab for Volume in oVirt. The
> order in which the bricks are listed, is the order of the replica sets.
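>
> For instance, on a 3 x 2 distributed-replicate volume like yours the
> bricks pair off in listed order (volume and host names below are made
> up):
>
> # gluster volume info vmstore
> Type: Distributed-Replicate
> Number of Bricks: 3 x 2 = 6
> Bricks:
> Brick1: host1:/bricks/b1   <- replica set 1
> Brick2: host2:/bricks/b1   <-
> Brick3: host3:/bricks/b1   <- replica set 2
> Brick4: host4:/bricks/b1   <-
> Brick5: host5:/bricks/b1   <- replica set 3
> Brick6: host6:/bricks/b1   <-
>
> Brick1, Brick3 and Brick5 are the "first" bricks of their replica
> pairs. With quorum-type auto on a replica 2, exactly half of the
> bricks counts as quorum only when that half includes the first brick,
> which is why losing that particular brick turns the mount read-only.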
>
>> Disabling quorum allowed me to get the affected VMs back online;
>> however, is this the recommended procedure? I tried to use
>> replace-brick with another node, but it failed because the failed
>> brick was not available. Would we leave quorum disabled until that
>> brick gets replaced? I.e., rebuild the server with the same
>> hostname/IP and file structure and rebalance the cluster?
>
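> When the source brick is dead and cannot be brought back,
> replace-brick generally needs "commit force". A rough sketch (host and
> path names below are made up, so please test on a scratch volume
> first):
>
> # gluster volume replace-brick <vol-name> deadhost:/bricks/b1 \
>     newhost:/bricks/b1 commit force
> # gluster volume heal <vol-name> full
>
> Self-heal then populates the new brick from its replica partner.
>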
> http://www.gluster.org/community/documentation/index.php/Virt-store-usecase
> - for recommendations on volume tunables.
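>
> By the way, oVirt's "Optimize for Virt Store" sets much the same
> tunables. From the CLI the shipped profile can be applied in one shot,
> assuming your gluster version ships the virt group file:
>
> # gluster volume set <vol-name> group virt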
>
> You could add another brick to your volume to make it a replica 3 and
> then turn on quorum?
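>
> Converting a 3 x 2 volume to 3 x 3 means adding one new brick per
> replica set, all in a single add-brick call. Roughly (host and path
> names are made up):
>
> # gluster volume add-brick <vol-name> replica 3 \
>     host7:/bricks/b1 host8:/bricks/b1 host9:/bricks/b1
>
> Self-heal then populates the new bricks; re-enable quorum once they
> are in sync.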
>
> For help on recovering your volume, I suggest you write to
> gluster-users at gluster.org
>
>
>>
>> ////
>>
>> While sorting that out, I read somewhere about this happening with
>> replica 2 volumes, so I've created a new volume with replica 3 and
>> plan to test this again. Is there any info you can point me to on how
>> to handle this when it happens, or on the correct procedure when a
>> "first" brick fails?
>
>
>>
>>
>> -----Original Message-----
>> From: Sahina Bose [mailto:sabose at redhat.com]
>> Sent: Thursday, December 18, 2014 3:51 AM
>> To: Vered Volansky; Brent Hartzell
>> Cc: users at ovirt.org
>> Subject: Re: [ovirt-users] Cannot activate storage domain
>>
>>
>> On 12/18/2014 01:35 PM, Vered Volansky wrote:
>>> Adding Sahina.
>>>
>>> ----- Original Message -----
>>>> From: "Brent Hartzell" <brent.hartzell at outlook.com>
>>>> To: users at ovirt.org
>>>> Sent: Thursday, December 18, 2014 3:38:11 AM
>>>> Subject: [ovirt-users] Cannot activate storage domain
>>>>
>>>> Have the following:
>>>>
>>>> 6 hosts - virt + Gluster shared
>>>>
>>>> Gluster volume is distributed-replicate - replica 2
>>>>
>>>> Shutting down servers one at a time works, except for one brick. If
>>>> we shut down one specific host (1 brick per host), we're unable to
>>>> activate the storage domain. VMs that were actively running from
>>>> other bricks continue to run. Whatever was running from that specific
>>>> brick fails to run, gets paused, etc.
>>>>
>>>> The error log shows the entries below. I'm not certain what it's
>>>> saying is read-only; nothing is read-only that I can find.
>>>>
>>>> 2014-12-17 19:57:13,362 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-47) [4e9290a2] Command
>>>> SpmStatusVDSCommand(HostName = U23.domainame.net, HostId =
>>>> 0db58e46-68a3-4ba0-a8aa-094893c045a1, storagePoolId =
>>>> 7ccd6ea9-7d80-4170-afa1-64c10c185aa6) execution failed. Exception:
>>>> VDSErrorException: VDSGenericException: VDSErrorException: Failed to
>>>> SpmStatusVDS, error = [Errno 30] Read-only file system, code = 100
>>>>
>>>> 2014-12-17 19:57:13,363 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>>>> (DefaultQuartzScheduler_Worker-47) [4e9290a2]
>>>> hostFromVds::selectedVds - U23.domainname.net, spmStatus returned
>>>> null!
>>>>
>>>> According to oVirt/Gluster, if a brick goes down, the VM should be
>>>> able to be restarted from another brick without issue. This does not
>>>> appear to be the case. If we take other bricks offline, it appears to
>>>> work as expected. Something with this specific brick causes
>>>> everything to break, which then makes any VMs that were running from
>>>> the brick unable to start.
>> Do you have the recommended options for using the volume as a virt
>> store turned on? Is client-side quorum turned on for the volume? Is
>> the brick that causes the issue the first brick in the replica set?
>>
>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>