Nir,
A process can chdir into the mount point and then lazy umount it. The
filesystem remains mounted as long as the process exists and its current
directory is on the mounted filesystem.
# truncate -s 1g masterfs.img
# mkfs.ext4 masterfs.img
# mkdir masterfs
# mount -o loop masterfs.img masterfs
# cd masterfs
# umount -l .
# touch file
# ls
# cd ..
# ls masterfs
Interesting idea!
The only issue I see is not having a way to tell if the file system was
actually unmounted. Does process termination guarantee that the file system
was unmounted?
Do you know if the behaviour is documented somewhere?
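For what it's worth, a rough way to probe this from the host could be (an
untested sketch, using the loop mount example above):
# grep masterfs /proc/self/mountinfo
# losetup -a
The mount should disappear from the mount table right after the lazy umount,
while the autoclear loop device set up by mount -o loop should stay attached
until the last user of the filesystem (the process holding its cwd there)
goes away - so losetup -a turning up empty would suggest the filesystem was
really released. I have not verified this, though.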
Nir
------------------------------
*From:* Nir Soffer <nsoffer(a)redhat.com>
*Sent:* Apr 17, 2017 8:40 PM
*To:* Adam Litke; Pavel Gashev
*Cc:* users
*Subject:* Re: [ovirt-users] storage redundancy in Ovirt
On Mon, Apr 17, 2017 at 6:54 PM Adam Litke <alitke(a)redhat.com> wrote:
> On Mon, Apr 17, 2017 at 11:04 AM, Pavel Gashev <Pax(a)acronis.com> wrote:
>
>> Adam,
>>
>>
>>
>> You know, Sanlock has a recovery mechanism that kills VDSM, or even
>> triggers the watchdog to reboot the SPM host in case it has lost the SPM lock.
>>
>> I'm asking because I had issues with my master storage that caused the SPM
>> host to be rebooted by the watchdog. And I was sure that it's the intended
>> behaviour. Isn't it?
>>
>
> Yes, of course. But an SPM host can fail and still maintain its
> connection to the storage lease. In this case you still need classic
> fencing.
>
> Something new we are investigating is the use of sanlock's request
> feature which allows a new host to take the lease away from the current
> holder. The current holder would be fenced by sanlock (watchdog if
> necessary) and only once the lease is free would it be granted to the new
> requester.
>
We can use the SPM lease to kill vdsm on the non-responsive SPM host,
and start the SPM on another host, similar to the way we handle vms with
a lease.
But this does not help with the masterfs mounted on the SPM host. If vdsm
is killed before it unmounts it, starting the SPM on another host (and
mounting the masterfs on the new host) will corrupt the masterfs.
When using file based storage (nfs, glusterfs) we don't have a masterfs, so
killing vdsm on the SPM should be good enough to start the SPM on another
host, even if fencing is not possible.
We can start with enabling sanlock based SPM fencing on file based storage.
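For reference, the request flow Adam described is driven from the command
line roughly like this (a sketch from memory - the lockspace/resource names,
lease path and offset below are placeholders, and the exact resource string
and force mode values should be double checked against sanlock(8)):
# sanlock client status
# sanlock client request -r LOCKSPACE:RESOURCE:/path/to/leases:OFFSET -f FORCE_MODE
sanlock on the current holder then kills the registered process (or uses the
watchdog if that fails), and the requester can acquire the lease only once it
is free, which is the behaviour we would want for the SPM lease.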
Nir
>
>
>>
>>
>>
>>
>> *From: *Adam Litke <alitke(a)redhat.com>
>> *Date: *Monday, 17 April 2017 at 17:32
>> *To: *Pavel Gashev <Pax(a)acronis.com>
>> *Cc: *Nir Soffer <nsoffer(a)redhat.com>, users <users(a)ovirt.org>
>>
>> *Subject: *Re: [ovirt-users] storage redundancy in Ovirt
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Apr 17, 2017 at 9:26 AM, Pavel Gashev <Pax(a)acronis.com> wrote:
>>
>> Nir,
>>
>>
>>
>> Isn't SPM managed via Sanlock? I believe there is no need to fence the SPM
>> host. Especially if there are no SPM tasks running.
>>
>>
>>
>> It's true that the exclusivity of the SPM role is enforced by Sanlock,
>> but you always need to fence a non-responsive SPM because there is no way
>> to guarantee that the host is not still manipulating storage (e.g. LV
>> extensions) and we must ensure that only one host has the masterfs on the
>> master storage domain mounted.
>>
>>
>>
>>
>>
>>
>>
>> *From: *<users-bounces(a)ovirt.org> on behalf of Nir Soffer <
>> nsoffer(a)redhat.com>
>> *Date: *Monday, 17 April 2017 at 16:06
>> *To: *Konstantin Raskoshnyi <konrasko(a)gmail.com>, Dan Yasny <
>> dyasny(a)gmail.com>
>> *Cc: *users <users(a)ovirt.org>, FERNANDO FREDIANI <
>> fernando.frediani(a)upx.com>
>> *Subject: *Re: [ovirt-users] storage redundancy in Ovirt
>>
>>
>>
>> On Mon, Apr 17, 2017 at 8:24 AM Konstantin Raskoshnyi <
>> konrasko(a)gmail.com> wrote:
>>
>> But actually, it didn't work well. After the main SPM host went down I see
>> this:
>>
>>
>>
>>
>>
>> 2017-04-17 05:23:15,554Z ERROR
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM Init:
>> could not find reported vds or not up - pool: 'STG' vds_spm_id: '1'
>>
>> 2017-04-17 05:23:15,567Z INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM
>> selection - vds seems as spm 'tank5'
>>
>> 2017-04-17 05:23:15,567Z WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] spm vds is
>> non responsive, stopping spm selection.
>>
>>
>>
>> So that means it's only possible to automatically switch the SPM host if
>> the BMC is up?
>>
>>
>>
>> BMC?
>>
>>
>>
>> If your SPM is not responsive, the system will try to fence it. Did you
>> configure power management for all hosts? Did you check that it
>> works? How did you simulate a non-responsive host?
>>
>>
>>
>> If power management is not configured or fails, the system cannot
>> move the SPM to another host, unless you manually confirm that the
>> SPM host was rebooted.
>>
>>
>>
>> Nir
>>
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sun, Apr 16, 2017 at 8:29 PM, Konstantin Raskoshnyi <
>> konrasko(a)gmail.com> wrote:
>>
>> Oh, the fence agent works fine if I select ilo4.
>>
>> Thank you for your help!
>>
>>
>>
>> On Sun, Apr 16, 2017 at 8:22 PM Dan Yasny <dyasny(a)gmail.com> wrote:
>>
>> On Sun, Apr 16, 2017 at 11:19 PM, Konstantin Raskoshnyi <
>> konrasko(a)gmail.com> wrote:
>>
>> Makes sense.
>>
>> I was trying to set it up, but it doesn't work with our staging hardware.
>>
>> We have an old ilo100, I'll try again.
>>
>> Thanks!
>>
>>
>>
>>
>>
>> It is absolutely necessary for any HA to work properly. There's of
>> course the "confirm host has been shutdown" option, which serves as an
>> override for the fence command, but it's manual.
>>
>>
>>
>> On Sun, Apr 16, 2017 at 8:18 PM Dan Yasny <dyasny(a)gmail.com> wrote:
>>
>> On Sun, Apr 16, 2017 at 11:15 PM, Konstantin Raskoshnyi <
>> konrasko(a)gmail.com> wrote:
>>
>> Fence agent under each node?
>>
>>
>>
>> When you configure a host, there's the power management tab, where you
>> need to enter the bmc details for the host. If you don't have fencing
>> enabled, how do you expect the system to make sure a host running a service
>> is actually down (and it is safe to start HA services elsewhere), and not,
>> for example, just unreachable by the engine? How do you avoid a split
>> brain -> SBA?
>>
>>
>>
>>
>>
>> On Sun, Apr 16, 2017 at 8:14 PM Dan Yasny <dyasny(a)gmail.com> wrote:
>>
>> On Sun, Apr 16, 2017 at 11:13 PM, Konstantin Raskoshnyi <
>> konrasko(a)gmail.com> wrote:
>>
>> "Corner cases"?
>>
>> I tried to simulate a crash of the SPM server and oVirt kept trying to
>> re-establish the connection to the failed node.
>>
>>
>>
>> Did you configure fencing?
>>
>>
>>
>>
>>
>>
>>
>> On Sun, Apr 16, 2017 at 8:10 PM Dan Yasny <dyasny(a)gmail.com> wrote:
>>
>> On Sun, Apr 16, 2017 at 7:29 AM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> On Sun, Apr 16, 2017 at 2:05 PM Dan Yasny <dyasny(a)redhat.com> wrote:
>>
>>
>>
>>
>>
>> On Apr 16, 2017 7:01 AM, "Nir Soffer" <nsoffer(a)redhat.com> wrote:
>>
>> On Sun, Apr 16, 2017 at 4:17 AM Dan Yasny <dyasny(a)gmail.com> wrote:
>>
>> When you set up a storage domain, you need to specify a host to perform
>> the initial storage operations, but once the SD is defined, its details
>> are in the engine database, and all the hosts get connected to it directly.
>> If the first host you used to define the SD goes down, all other hosts will
>> still remain connected and work. SPM is an HA service, and if the current
>> SPM host goes down, SPM gets started on another host in the DC. In short,
>> unless your actual NFS exporting host goes down, there is no outage.
>>
>>
>>
>> There is no storage outage, but if you shut down the SPM host, the SPM
>> role will not move to a new host until the SPM host is online again, or
>> you confirm manually that the SPM host was rebooted.
>>
>>
>>
>> In a properly configured setup the SBA should take care of that. That's
>> the whole point of HA services.
>>
>>
>>
>> In some cases, like power loss or hardware failure, there is no way to
>> start the SPM host, and the system cannot recover automatically.
>>
>>
>>
>> There are always corner cases, no doubt. But in a normal situation,
>> where an SPM host goes down because of a hardware failure, it gets fenced,
>> other hosts contend for SPM and start it. No surprises there.
>>
>>
>>
>>
>>
>> Nir
>>
>>
>>
>>
>>
>>
>>
>> Nir
>>
>>
>>
>>
>>
>> On Sat, Apr 15, 2017 at 1:53 PM, Konstantin Raskoshnyi <
>> konrasko(a)gmail.com> wrote:
>>
>> Hi Fernando,
>>
>> I see each host has a direct NFS mount, but yes, if the main host
>> through which I connected the NFS storage goes down, the storage becomes
>> unavailable and all VMs are down.
>>
>>
>>
>>
>>
>> On Sat, Apr 15, 2017 at 10:37 AM FERNANDO FREDIANI <
>> fernando.frediani(a)upx.com> wrote:
>>
>> Hello Konstantin.
>>
>> That doesn't make much sense, making a whole cluster depend on a single
>> host. From what I know, every host talks directly to the NFS storage array
>> or whatever other shared storage you have.
>>
>> Have you tested whether that host going down affects the others, with the
>> NFS mounted directly on an NFS storage array?
>>
>> Fernando
>>
>>
>>
>> 2017-04-15 12:42 GMT-03:00 Konstantin Raskoshnyi <konrasko(a)gmail.com>:
>>
>> In oVirt you have to attach storage through a specific host.
>>
>> If that host goes down, the storage is not available.
>>
>>
>>
>> On Sat, Apr 15, 2017 at 7:31 AM FERNANDO FREDIANI <
>> fernando.frediani(a)upx.com> wrote:
>>
>> Well, make it not go through host1 and dedicate a storage server for
>> running NFS and make both hosts connect to it.
>>
>> In my view NFS is much easier to manage than any other type of storage,
>> especially FC and iSCSI, and performance is pretty much the same, so you
>> won't get better results by going to another type, other than in management.
>>
>> Fernando
>>
>>
>>
>> 2017-04-15 5:25 GMT-03:00 Konstantin Raskoshnyi <konrasko(a)gmail.com>:
>>
>> Hi guys,
>>
>> I have one nfs storage,
>>
>> it's connected through host1.
>>
>> host2 also has access to it, I can easily migrate vms between them.
>>
>>
>>
>> The question is: if host1 is down, all infrastructure is down, since
>> all traffic goes through host1.
>>
>> Is there any way in oVirt to use redundant storage?
>>
>>
>>
>> Only glusterfs?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> --
>>
>> Adam Litke
>>
>
>
>
> --
> Adam Litke
>