[ovirt-users] storage redundancy in Ovirt

Nir Soffer nsoffer at redhat.com
Tue Apr 18 17:19:25 UTC 2017


On Tue, Apr 18, 2017 at 12:23 AM Pavel Gashev <Pax at acronis.com> wrote:

> Nir,
>
> A process can chdir into a mount point and then lazily umount it. The
> filesystem remains mounted for as long as the process exists and its
> current directory is on the mounted filesystem.
>
> # truncate -s 1g masterfs.img
> # mkfs.ext4 masterfs.img
> # mkdir masterfs
> # mount -o loop masterfs.img masterfs
> # cd masterfs
> # umount -l .
> # touch file
> # ls
> # cd ..
> # ls masterfs
>
Interesting idea!

The only issue I see is not having a way to tell whether the file system
was actually unmounted. Does process termination guarantee that the file
system was unmounted?

Do you know if the behaviour is documented somewhere?
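
A rough way to check this, assuming the loop-backed masterfs.img from your
example: mount -o loop should set the autoclear flag on the loop device
(recent util-linux does), so the device is only detached once the lazily
unmounted superblock is finally released.

# sketch only - run from another shell
grep masterfs /proc/mounts     # gone immediately after the lazy umount
losetup -j masterfs.img        # still listed -> superblock not released yet
# after the process holding the cwd exits, losetup -j should print nothing,
# which would mean the filesystem was really unmounted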

Nir


>
> ------------------------------
> *From:* Nir Soffer <nsoffer at redhat.com>
> *Sent:* Apr 17, 2017 8:40 PM
> *To:* Adam Litke; Pavel Gashev
> *Cc:* users
>
> *Subject:* Re: [ovirt-users] storage redundancy in Ovirt
>
> On Mon, Apr 17, 2017 at 6:54 PM Adam Litke <alitke at redhat.com> wrote:
>
>> On Mon, Apr 17, 2017 at 11:04 AM, Pavel Gashev <Pax at acronis.com> wrote:
>>
>>> Adam,
>>>
>>>
>>>
>>> You know, Sanlock has a recovery mechanism that kills VDSM, or even
>>> triggers the watchdog to reboot the SPM host in case it has lost the SPM
>>> lock.
>>>
>>> I’m asking because I had issues with my master storage that caused the
>>> SPM host to be rebooted by the watchdog. And I was sure that it’s the
>>> intended behaviour. Isn’t it?
>>>
>>
>> Yes, of course. But an SPM host can fail and still maintain its
>> connection to the storage lease. In this case you still need classic
>> fencing.
>>
>> Something new we are investigating is the use of sanlock's request
>> feature which allows a new host to take the lease away from the current
>> holder.  The current holder would be fenced by sanlock (watchdog if
>> necessary) and only once the lease is free would it be granted to the new
>> requester.
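>>
>> For illustration, this is roughly what that looks like from the sanlock
>> command line (a sketch only; the resource string and force mode below are
>> placeholders, see sanlock(8) for the exact semantics):
>>
>> # inspect the lockspaces and leases currently held on a host
>> sanlock client status
>>
>> # from the new host: ask for the lease held by the current owner; the
>> # owner is fenced (watchdog if necessary) before the lease is granted.
>> # <lockspace>:<resource>:<path>:<offset> and <force_mode> are placeholders.
>> sanlock client request -r <lockspace>:<resource>:<path>:<offset> -f <force_mode>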
>>
>
> We can use the SPM lease to kill vdsm on the non-responsive SPM host,
> and start the SPM on another host, similar to the way we handle vms with
> a lease.
>
> But this does not help with the masterfs mounted on the SPM host. If vdsm
> is killed before it unmounts it, starting the SPM on another host (and
> mounting the masterfs on the new host) will corrupt the masterfs.
>
> When using file based storage (nfs, glusterfs) we don't have a masterfs so
> killing vdsm on the SPM should be good enough to start the SPM on another
> host, even if fencing is not possible.
>
> We can start with enabling sanlock based SPM fencing on file based storage.
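>
> For the block storage case, a manual sanity check before starting the SPM
> elsewhere could look like this (a sketch; the mount point below is an
> assumption based on the default vdsm layout, and <sd-uuid> is a
> placeholder):
>
> # on the old SPM host, if it is reachable at all
> findmnt /rhev/data-center/mnt/blockSD/<sd-uuid>/master \
>     && echo "masterfs still mounted - not safe to start the SPM elsewhere" \
>     || echo "masterfs not mounted here"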
>
> Nir
>
>
>>
>>
>>>
>>>
>>>
>>>
>>> *From: *Adam Litke <alitke at redhat.com>
>>> *Date: *Monday, 17 April 2017 at 17:32
>>> *To: *Pavel Gashev <Pax at acronis.com>
>>> *Cc: *Nir Soffer <nsoffer at redhat.com>, users <users at ovirt.org>
>>>
>>> *Subject: *Re: [ovirt-users] storage redundancy in Ovirt
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Apr 17, 2017 at 9:26 AM, Pavel Gashev <Pax at acronis.com> wrote:
>>>
>>> Nir,
>>>
>>>
>>>
>>> Isn’t SPM managed via Sanlock? I believe there is no need to fence SPM
>>> host. Especially if there are no SPM tasks running.
>>>
>>>
>>>
>>> It's true that the exclusivity of the SPM role is enforced by Sanlock,
>>> but you always need to fence a non-responsive SPM because there is no way
>>> to guarantee that the host is not still manipulating storage (e.g. LV
>>> extensions), and we must ensure that only one host has the masterfs on the
>>> master storage domain mounted.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From: *<users-bounces at ovirt.org> on behalf of Nir Soffer <
>>> nsoffer at redhat.com>
>>> *Date: *Monday, 17 April 2017 at 16:06
>>> *To: *Konstantin Raskoshnyi <konrasko at gmail.com>, Dan Yasny <
>>> dyasny at gmail.com>
>>> *Cc: *users <users at ovirt.org>, FERNANDO FREDIANI <
>>> fernando.frediani at upx.com>
>>> *Subject: *Re: [ovirt-users] storage redundancy in Ovirt
>>>
>>>
>>>
>>> On Mon, Apr 17, 2017 at 8:24 AM Konstantin Raskoshnyi <
>>> konrasko at gmail.com> wrote:
>>>
>>> But actually, it didn't work well. After the main SPM host went down I
>>> see this:
>>>
>>>
>>>
>>>
>>>
>>> 2017-04-17 05:23:15,554Z ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM Init: could not find reported vds or not up - pool: 'STG' vds_spm_id: '1'
>>>
>>> 2017-04-17 05:23:15,567Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM selection - vds seems as spm 'tank5'
>>>
>>> 2017-04-17 05:23:15,567Z WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] spm vds is non responsive, stopping spm selection.
>>>
>>>
>>>
>>> So does that mean it's only possible to automatically switch the SPM
>>> host if the BMC is up?
>>>
>>>
>>>
>>> BMC?
>>>
>>>
>>>
>>> If your SPM is not responsive, the system will try to fence it. Did you
>>> configure power management for all hosts? Did you check that it works?
>>> How did you simulate a non-responsive host?
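>>>
>>> One way to check the fence agent outside of the engine is to run it by
>>> hand against the BMC (a sketch; fence_ipmilan is just an example agent,
>>> and the address and credentials are placeholders):
>>>
>>> # query the power status of the host through its BMC
>>> fence_ipmilan -a <bmc-address> -l <user> -p <password> -o status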
>>>
>>>
>>>
>>> If power management is not configured or fails, the system cannot
>>> move the SPM to another host, unless you manually confirm that the
>>> SPM host was rebooted.
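>>>
>>> To see what the host side thinks, you can also query vdsm directly (a
>>> sketch; the pool UUID is a placeholder, and the exact vdsClient syntax may
>>> differ between versions):
>>>
>>> # on a host, ask vdsm for the SPM status of the storage pool
>>> vdsClient -s 0 getSpmStatus <pool-uuid>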
>>>
>>>
>>>
>>> Nir
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Sun, Apr 16, 2017 at 8:29 PM, Konstantin Raskoshnyi <
>>> konrasko at gmail.com> wrote:
>>>
>>> Oh, fence agent works fine if I select ilo4,
>>>
>>> Thank you for your help!
>>>
>>>
>>>
>>> On Sun, Apr 16, 2017 at 8:22 PM Dan Yasny <dyasny at gmail.com> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 11:19 PM, Konstantin Raskoshnyi <
>>> konrasko at gmail.com> wrote:
>>>
>>> Makes sense.
>>>
>>> I was trying to set it up, but it doesn't work with our staging hardware.
>>>
>>> We have an old ilo100; I'll try again.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>> It is absolutely necessary for any HA to work properly. There's of
>>> course the "confirm host has been shutdown" option, which serves as an
>>> override for the fence command, but it's manual.
>>>
>>>
>>>
>>> On Sun, Apr 16, 2017 at 8:18 PM Dan Yasny <dyasny at gmail.com> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 11:15 PM, Konstantin Raskoshnyi <
>>> konrasko at gmail.com> wrote:
>>>
>>> Fence agent under each node?
>>>
>>>
>>>
>>> When you configure a host, there's the power management tab, where you
>>> need to enter the BMC details for the host. If you don't have fencing
>>> enabled, how do you expect the system to make sure a host running a service
>>> is actually down (and that it is safe to start HA services elsewhere), and
>>> not, for example, just unreachable by the engine? How do you avoid a split
>>> brain -> SBA?
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Apr 16, 2017 at 8:14 PM Dan Yasny <dyasny at gmail.com> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 11:13 PM, Konstantin Raskoshnyi <
>>> konrasko at gmail.com> wrote:
>>>
>>> "Corner cases"?
>>>
>>> I tried to simulate a crash of the SPM server, and oVirt kept trying to
>>> re-establish the connection to the failed node.
>>>
>>>
>>>
>>> Did you configure fencing?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Apr 16, 2017 at 8:10 PM Dan Yasny <dyasny at gmail.com> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 7:29 AM, Nir Soffer <nsoffer at redhat.com> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 2:05 PM Dan Yasny <dyasny at redhat.com> wrote:
>>>
>>>
>>>
>>>
>>>
>>> On Apr 16, 2017 7:01 AM, "Nir Soffer" <nsoffer at redhat.com> wrote:
>>>
>>> On Sun, Apr 16, 2017 at 4:17 AM Dan Yasny <dyasny at gmail.com> wrote:
>>>
>>> When you set up a storage domain, you need to specify a host to perform
>>> the initial storage operations, but once the SD is defined, its details
>>> are in the engine database, and all the hosts get connected to it directly.
>>> If the first host you used to define the SD goes down, all other hosts will
>>> still remain connected and work. SPM is an HA service, and if the current
>>> SPM host goes down, SPM gets started on another host in the DC. In short,
>>> unless your actual NFS exporting host goes down, there is no outage.
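>>>
>>> You can see this directly on the hosts; a quick check (assuming the
>>> default vdsm mount location) would be something like:
>>>
>>> # run on each host: the NFS domain should appear as that host's own
>>> # mount, not be routed through another host
>>> findmnt -t nfs,nfs4 | grep /rhev/data-center/mnt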
>>>
>>>
>>>
>>> There is no storage outage, but if you shut down the SPM host, the SPM
>>> role will not move to another host until the SPM host is online again, or
>>> you manually confirm that the SPM host was rebooted.
>>>
>>>
>>>
>>> In a properly configured setup the SBA should take care of that. That's
>>> the whole point of HA services.
>>>
>>>
>>>
>>> In some cases, like power loss or hardware failure, there is no way to
>>> start the SPM host, and the system cannot recover automatically.
>>>
>>>
>>>
>>> There are always corner cases, no doubt. But in a normal situation,
>>> where an SPM host goes down because of a hardware failure, it gets fenced,
>>> other hosts contend for SPM and start it. No surprises there.
>>>
>>>
>>>
>>>
>>>
>>> Nir
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Nir
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Apr 15, 2017 at 1:53 PM, Konstantin Raskoshnyi <
>>> konrasko at gmail.com> wrote:
>>>
>>> Hi Fernando,
>>>
>>> I see that each host has a direct NFS mount, but yes, if the main host
>>> through which I connected the NFS storage goes down, the storage becomes
>>> unavailable and all VMs are down.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Apr 15, 2017 at 10:37 AM FERNANDO FREDIANI <
>>> fernando.frediani at upx.com> wrote:
>>>
>>> Hello Konstantin.
>>>
>>> That doesn't make much sense, making a whole cluster depend on a single
>>> host. From what I know, every host talks directly to the NFS storage array
>>> or whatever other shared storage you have.
>>>
>>> Have you tested whether that host going down affects the others when the
>>> NFS share is mounted directly from an NFS storage array?
>>>
>>> Fernando
>>>
>>>
>>>
>>> 2017-04-15 12:42 GMT-03:00 Konstantin Raskoshnyi <konrasko at gmail.com>:
>>>
>>> In oVirt you have to attach storage through a specific host.
>>>
>>> If that host goes down, the storage is not available.
>>>
>>>
>>>
>>> On Sat, Apr 15, 2017 at 7:31 AM FERNANDO FREDIANI <
>>> fernando.frediani at upx.com> wrote:
>>>
>>> Well, make it not go through host1: dedicate a storage server to running
>>> NFS and make both hosts connect to it.
>>>
>>> In my view NFS is much easier to manage than any other type of storage,
>>> especially FC and iSCSI, and performance is pretty much the same, so you
>>> won't get better results by going to another type, other than in how it
>>> is managed.
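>>>
>>> If you go that way, the export setup is small. Roughly (a sketch,
>>> assuming the vdsm:kvm uid/gid of 36:36 that oVirt expects, and
>>> placeholder paths):
>>>
>>> # on the dedicated NFS server
>>> mkdir -p /exports/ovirt-data
>>> chown 36:36 /exports/ovirt-data
>>> # /etc/exports entry - both hypervisors mount this export directly:
>>> #   /exports/ovirt-data *(rw,sync,no_subtree_check,anonuid=36,anongid=36)
>>> exportfs -r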
>>>
>>> Fernando
>>>
>>>
>>>
>>> 2017-04-15 5:25 GMT-03:00 Konstantin Raskoshnyi <konrasko at gmail.com>:
>>>
>>> Hi guys,
>>>
>>> I have one NFS storage domain,
>>>
>>> and it's connected through host1.
>>>
>>> host2 also has access to it, and I can easily migrate VMs between them.
>>>
>>>
>>>
>>> The question is: if host1 is down, the whole infrastructure is down,
>>> since all traffic goes through host1.
>>>
>>> Is there any way in oVirt to use redundant storage?
>>>
>>>
>>>
>>> Only glusterfs?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Adam Litke
>>>
>>
>>
>>
>> --
>> Adam Litke
>>
>