[ovirt-users] storage redundancy in Ovirt

Mon Apr 17 14:26:05 UTC 2017

Hi Nir,
BMC - baseboard management controller; in my case I have iLO.
Yes, I set up power management for all hosts - oVirt sees the iLO status as OK.
I use a remote PDU to shut down the power port; after that I get what is
shown in the picture I attached.
After I switch the power port back on, oVirt is able to read the iLO status,
sees that Linux is down and immediately switches the SPM host.
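
The behaviour described in this thread can be summarised as a small decision
table. The sketch below is a toy model of that reasoning only, not oVirt engine
code: the names SpmHostState, Action and decide are hypothetical, and the rules
are taken from the statements in this thread (fencing needs a reachable BMC;
without working fencing the SPM moves only after a manual confirmation).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_SPM = "keep: current SPM host is responsive"
    MOVE_SPM = "move: safe to select a new SPM host"
    WAIT = "wait: SPM stays assigned to the unreachable host"


@dataclass
class SpmHostState:
    # All field names are illustrative assumptions, not engine attributes.
    responsive: bool                  # engine can talk to the host
    fencing_configured: bool          # power management is set up for the host
    bmc_reachable: bool               # fence agent can reach the BMC (e.g. iLO)
    manually_confirmed_rebooted: bool = False  # admin confirmed the reboot


def decide(state: SpmHostState) -> Action:
    """Return what the thread says happens to the SPM role."""
    if state.responsive:
        return Action.KEEP_SPM
    if state.manually_confirmed_rebooted:
        # Manual override: the admin vouches that the host is really down.
        return Action.MOVE_SPM
    if state.fencing_configured and state.bmc_reachable:
        # Fencing can establish that the host is down, so a move is safe.
        return Action.MOVE_SPM
    # No way to prove the old SPM is down: moving it would risk split brain.
    return Action.WAIT


# PDU port switched off: the host and its iLO both lose power.
pdu_port_off = SpmHostState(responsive=False, fencing_configured=True,
                            bmc_reachable=False)
# Power port switched back on: iLO answers, Linux is still down.
power_restored = SpmHostState(responsive=False, fencing_configured=True,
                              bmc_reachable=True)

print(decide(pdu_port_off).value)
print(decide(power_restored).value)
```

Under these assumptions, cutting power at the PDU is the case where the engine
has to wait, because the BMC goes away together with the host.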

On Mon, Apr 17, 2017 at 6:07 AM Nir Soffer <nsoffer at redhat.com> wrote:

> On Mon, Apr 17, 2017 at 8:24 AM Konstantin Raskoshnyi <konrasko at gmail.com>
> wrote:
>
>> But actually, it didn't work well. After the main SPM host went down I see
>> this:
>>
> [image: Screen Shot 2017-04-16 at 10.22.00 PM.png]
>>
>
>> 2017-04-17 05:23:15,554Z ERROR
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM Init:
>> could not find reported vds or not up - pool: 'STG' vds_spm_id: '1'
>> 2017-04-17 05:23:15,567Z INFO
>>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM
>> selection - vds seems as spm 'tank5'
>> 2017-04-17 05:23:15,567Z WARN
>>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>> (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] spm vds is
>> non responsive, stopping spm selection.
>>
>> So that means it's possible to automatically switch the SPM host only if
>> the BMC is up?
>>
>
> BMC?
>
> If your SPM is not responsive, the system will try to fence it. Did you
> configure power management for all hosts? Did you check that it
> works? How did you simulate a non-responsive host?
>
> If power management is not configured or fails, the system cannot
> move the SPM to another host, unless you manually confirm that the
> SPM host was rebooted.
>
> Nir
>
>
>>
>> Thanks
>>
>> On Sun, Apr 16, 2017 at 8:29 PM, Konstantin Raskoshnyi <
>> konrasko at gmail.com> wrote:
>>
>>> Oh, fence agent works fine if I select ilo4,
>>> Thank you for your help!
>>>
>>> On Sun, Apr 16, 2017 at 8:22 PM Dan Yasny <dyasny at gmail.com> wrote:
>>>
>>>> On Sun, Apr 16, 2017 at 11:19 PM, Konstantin Raskoshnyi <
>>>> konrasko at gmail.com> wrote:
>>>>
>>>>> Makes sense.
>>>>> I was trying to set it up, but it doesn't work with our staging hardware.
>>>>> We have old ilo100; I'll try again.
>>>>> Thanks!
>>>>>
>>>>>
>>>> It is absolutely necessary for any HA to work properly. There's of
>>>> course the "confirm host has been shutdown" option, which serves as an
>>>> override for the fence command, but it's manual
>>>>
>>>>
>>>>> On Sun, Apr 16, 2017 at 8:18 PM Dan Yasny <dyasny at gmail.com> wrote:
>>>>>
>>>>>> On Sun, Apr 16, 2017 at 11:15 PM, Konstantin Raskoshnyi <
>>>>>> konrasko at gmail.com> wrote:
>>>>>>
>>>>>>> Fence agent under each node?
>>>>>>>
>>>>>>
>>>>>> When you configure a host, there's the power management tab, where
>>>>>> you need to enter the BMC details for the host. If you don't have fencing
>>>>>> enabled, how do you expect the system to make sure a host running a service
>>>>>> is actually down (and it is safe to start HA services elsewhere), and not,
>>>>>> for example, just unreachable by the engine? How do you avoid a split brain
>>>>>> -> SBA ?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Sun, Apr 16, 2017 at 8:14 PM Dan Yasny <dyasny at gmail.com> wrote:
>>>>>>>
>>>>>>>> On Sun, Apr 16, 2017 at 11:13 PM, Konstantin Raskoshnyi <
>>>>>>>> konrasko at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> "Corner cases"?
>>>>>>>>> I tried to simulate a crash of the SPM server and oVirt kept trying to
>>>>>>>>> reestablish the connection to the failed node.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Did you configure fencing?
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Apr 16, 2017 at 8:10 PM Dan Yasny <dyasny at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Sun, Apr 16, 2017 at 7:29 AM, Nir Soffer <nsoffer at redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Sun, Apr 16, 2017 at 2:05 PM Dan Yasny <dyasny at redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 16, 2017 7:01 AM, "Nir Soffer" <nsoffer at redhat.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Apr 16, 2017 at 4:17 AM Dan Yasny <dyasny at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> When you set up a storage domain, you need to specify a host
>>>>>>>>>>>>> to perform the initial storage operations, but once the SD is defined, its
>>>>>>>>>>>>> details are in the engine database, and all the hosts get connected to it
>>>>>>>>>>>>> directly. If the first host you used to define the SD goes down, all other
>>>>>>>>>>>>> hosts will still remain connected and work. SPM is an HA service, and if
>>>>>>>>>>>>> the current SPM host goes down, SPM gets started on another host in the DC.
>>>>>>>>>>>>> In short, unless your actual NFS exporting host goes down, there is no
>>>>>>>>>>>>> outage.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> There is no storage outage, but if you shut down the SPM host,
>>>>>>>>>>>> the SPM role will not move to a new host until the SPM host is
>>>>>>>>>>>> online again, or you confirm manually that the SPM host was rebooted.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In a properly configured setup the SBA should take care of
>>>>>>>>>>>> that. That's the whole point of HA services
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In some cases like power loss or hardware failure, there is no
>>>>>>>>>>> way to start
>>>>>>>>>>> the spm host, and the system cannot recover automatically.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There are always corner cases, no doubt. But in a normal
>>>>>>>>>> situation, where an SPM host goes down because of a hardware failure, it
>>>>>>>>>> gets fenced, other hosts contend for SPM and start it. No surprises there.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nir
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Nir
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Apr 15, 2017 at 1:53 PM, Konstantin Raskoshnyi <
>>>>>>>>>>>>> konrasko at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Fernando,
>>>>>>>>>>>>>> I see each host has a direct NFS mount, but yes, if the
>>>>>>>>>>>>>> main host through which I connected the NFS storage goes down, the storage
>>>>>>>>>>>>>> becomes unavailable and all VMs are down.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Apr 15, 2017 at 10:37 AM FERNANDO FREDIANI <
>>>>>>>>>>>>>> fernando.frediani at upx.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello Konstantin.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It doesn't make much sense to make a whole cluster depend on
>>>>>>>>>>>>>>> a single host. From what I know, every host talks directly to the NFS storage
>>>>>>>>>>>>>>> array or whatever other shared storage you have.
>>>>>>>>>>>>>>> Have you tested whether that host going down affects the
>>>>>>>>>>>>>>> others when NFS is mounted directly from an NFS storage array?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fernando
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-04-15 12:42 GMT-03:00 Konstantin Raskoshnyi <
>>>>>>>>>>>>>>> konrasko at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In oVirt you have to attach storage through a specific host.
>>>>>>>>>>>>>>>> If that host goes down, the storage is not available.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Apr 15, 2017 at 7:31 AM FERNANDO FREDIANI <
>>>>>>>>>>>>>>>> fernando.frediani at upx.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, make it not go through host1: dedicate a storage
>>>>>>>>>>>>>>>>> server for running NFS and make both hosts connect to it.
>>>>>>>>>>>>>>>>> In my view NFS is much easier to manage than any other
>>>>>>>>>>>>>>>>> type of storage, especially FC and iSCSI, and performance is pretty much the
>>>>>>>>>>>>>>>>> same, so apart from management you won't get better results by going to
>>>>>>>>>>>>>>>>> another type.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Fernando
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2017-04-15 5:25 GMT-03:00 Konstantin Raskoshnyi <
>>>>>>>>>>>>>>>>> konrasko at gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>> I have one nfs storage,
>>>>>>>>>>>>>>>>>> it's connected through host1.
>>>>>>>>>>>>>>>>>> host2 also has access to it, I can easily migrate
>>>>>>>>>>>>>>>>>> vms between them.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The question is - if host1 is down - all infrastructure
>>>>>>>>>>>>>>>>>> is down, since all traffic goes through host1,
>>>>>>>>>>>>>>>>>> is there any way in oVirt to use redundant storage?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Only glusterfs?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>> Users at ovirt.org
>>>>>>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2017-04-16 at 10.22.00 PM.png
Type: image/png
Size: 33452 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20170417/7329ac10/attachment-0001.png>