[ovirt-devel] vdsm disabling logical volumes

Jiri Moskovcak jmoskovc at redhat.com
Mon May 12 11:26:33 UTC 2014


On 05/12/2014 11:22 AM, Dan Kenigsberg wrote:
> On Wed, May 07, 2014 at 02:16:34PM +0200, Jiri Moskovcak wrote:
>> On 05/07/2014 11:37 AM, Dan Kenigsberg wrote:
>>> On Wed, May 07, 2014 at 09:56:03AM +0200, Jiri Moskovcak wrote:
>>>> On 05/07/2014 09:28 AM, Nir Soffer wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Jiri Moskovcak" <jmoskovc at redhat.com>
>>>>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>>>>> Cc: devel at ovirt.org, "Federico Simoncelli" <fsimonce at redhat.com>, "Allon Mureinik" <amureini at redhat.com>, "Greg
>>>>>> Padgett" <gpadgett at redhat.com>, "Doron Fediuck" <dfediuck at redhat.com>
>>>>>> Sent: Wednesday, May 7, 2014 10:21:28 AM
>>>>>> Subject: Re: [ovirt-devel] vdsm disabling logical volumes
>>>>>>
>>>>>> On 05/05/2014 03:19 PM, Nir Soffer wrote:
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Jiri Moskovcak" <jmoskovc at redhat.com>
>>>>>>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>>>>>>> Cc: devel at ovirt.org, "Federico Simoncelli" <fsimonce at redhat.com>, "Allon
>>>>>>>> Mureinik" <amureini at redhat.com>, "Greg
>>>>>>>> Padgett" <gpadgett at redhat.com>
>>>>>>>> Sent: Monday, May 5, 2014 3:44:21 PM
>>>>>>>> Subject: Re: [ovirt-devel] vdsm disabling logical volumes
>>>>>>>>
>>>>>>>> On 05/05/2014 02:37 PM, Nir Soffer wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Jiri Moskovcak" <jmoskovc at redhat.com>
>>>>>>>>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>>>>>>>>> Cc: devel at ovirt.org, "Federico Simoncelli" <fsimonce at redhat.com>, "Allon
>>>>>>>>>> Mureinik" <amureini at redhat.com>, "Greg
>>>>>>>>>> Padgett" <gpadgett at redhat.com>
>>>>>>>>>> Sent: Monday, May 5, 2014 3:16:37 PM
>>>>>>>>>> Subject: Re: [ovirt-devel] vdsm disabling logical volumes
>>>>>>>>>>
>>>>>>>>>> On 05/05/2014 12:01 AM, Nir Soffer wrote:
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Jiri Moskovcak" <jmoskovc at redhat.com>
>>>>>>>>>>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>>>>>>>>>>> Cc: devel at ovirt.org
>>>>>>>>>>>> Sent: Sunday, May 4, 2014 9:23:49 PM
>>>>>>>>>>>> Subject: Re: [ovirt-devel] vdsm disabling logical volumes
>>>>>>>>>>>>
>>>>>>>>>>>> On 05/04/2014 07:57 PM, Nir Soffer wrote:
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> From: "Jiri Moskovcak" <jmoskovc at redhat.com>
>>>>>>>>>>>>>> To: devel at ovirt.org
>>>>>>>>>>>>>> Sent: Sunday, May 4, 2014 8:08:33 PM
>>>>>>>>>>>>>> Subject: [ovirt-devel] vdsm disabling logical volumes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greetings vdsm developers!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> While working on adding iSCSI support to the hosted engine
>>>>>>>>>>>>>> tools, I ran into a problem with vdsm. It seems that when vdsm
>>>>>>>>>>>>>> is stopped it deactivates ALL logical volumes in its volume
>>>>>>>>>>>>>> group, and when it starts it reactivates only specific logical
>>>>>>>>>>>>>> volumes. This is a problem for the hosted engine tools, because
>>>>>>>>>>>>>> they create logical volumes in the same volume group, and once
>>>>>>>>>>>>>> vdsm deactivates those LVs the hosted engine tools have no way
>>>>>>>>>>>>>> to reactivate them: the services drop root permissions and run
>>>>>>>>>>>>>> as vdsm, and apparently only root can activate LVs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you describe what volumes are you creating, and why?
>>>>>>>>>>>>
>>>>>>>>>>>> We create hosted-engine.lockspace (for sanlock) and
>>>>>>>>>>>> hosted-engine.metadata (keeps data about hosted engine hosts)
>>>>>>>>>>>
>>>>>>>>>>> Do you create these lvs in every vdsm vg?
>>>>>>>>>>
>>>>>>>>>> - only in the first vg created by vdsm while deploying hosted-engine
>>>>>>>
>>>>>>> It seems that the hosted engine has a single point of failure - the
>>>>>>> random vg that contains the hosted engine data.
>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Is this part of the domain structure used by the hosted engine,
>>>>>>>>>>> or does it have nothing to do with the storage domain?
>>>>>>>>>>
>>>>>>>>>> - sorry, I don't understand this question. How can I tell if it has
>>>>>>>>>> something to do with the storage domain? It's for storing data about
>>>>>>>>>> the hosts set up to run the hosted-engine, and data about the state
>>>>>>>>>> of the engine and the state of the VM running the engine.
>>>>>>>>>
>>>>>>>>> Can you tell us exactly what lvs you are creating, and on which vg?
>>>>>>>>>
>>>>>>>>> And how are you creating those lvs - I guess through vdsm?
>>>>>>>>>
>>>>>>>>
>>>>>>>> - no, the hosted-engine tools do that themselves, by calling:
>>>>>>>>
>>>>>>>> lvc = subprocess.Popen(stdin=subprocess.PIPE, stdout=subprocess.PIPE,
>>>>>>>>                        stderr=subprocess.PIPE,
>>>>>>>>                        args=["lvm", "lvcreate", "-L", str(size_bytes)+"B",
>>>>>>>>                              "-n", lv_name, vg_uuid])
>>>>>>>> ..
>>>>>>>
>>>>>>> How do you ensure that another host is not modifying the same vg at the
>>>>>>> same time?
>>>>>>>
>>>>>>> If you are not ensuring this, you will corrupt this vg sooner or later.
>>>>>>>
>>>>>>> When a storage domain is detached from a host, for example when the host
>>>>>>> is in maintenance mode, lvs on the shared storage may be deleted,
>>>>>>> invalidating the device-mapper maps for these devices. If you write to an
>>>>>>> lv with wrong maps, you may be writing to an extent belonging to another
>>>>>>> lv, corrupting that lv's data, or even worse, corrupting the engine vg's
>>>>>>> data.
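>>>>>>>
>>>>>>> (Picking up such changes requires explicitly reloading the mapping;
>>>>>>> roughly, and only as an illustration - the vg/lv names are placeholders:
>>>>>>>
>>>>>>>     import subprocess
>>>>>>>
>>>>>>>     vg_name, lv_name = "some-vg", "some-lv"   # placeholders
>>>>>>>     subprocess.check_call(["lvm", "lvchange", "--refresh",
>>>>>>>                            "%s/%s" % (vg_name, lv_name)])
>>>>>>>
>>>>>>> and even that does nothing to protect against concurrent metadata
>>>>>>> changes.)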
>>>>>>>
>>>>>>> How do you ensure that the lvs are not deleted while you are using them?
>>>>>>>
>>>>>>>>
>>>>>>>>> The output of the lvs command on a host with hosted engine installed
>>>>>>>>> will help us understand what you are doing, and then we can think more
>>>>>>>>> clearly about what would be the best way to support this in vdsm.
>>>>>>>>
>>>>>>>> The output of lvs: http://fpaste.org/99196/93619139/
>>>>>>>>
>>>>>>>> HE created these two LVs:
>>>>>>>> ha_agent-hosted-engine.lockspace
>>>>>>>> ha_agent-hosted-engine.metadata
>>>>>>>
>>>>>>> Why do you create these lvs on a vg owned by vdsm?
>>>>>>>
>>>>>>> If you want total control of these lvs, I suggest that you create your
>>>>>>> own vg and put whatever lvs you like there.
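>>>>>>>
>>>>>>> Roughly something like this (just a sketch - the device path, vg name
>>>>>>> and size below are made-up placeholders; the lv names are yours):
>>>>>>>
>>>>>>>     import subprocess
>>>>>>>
>>>>>>>     dev = "/dev/mapper/my-iscsi-lun"            # placeholder LUN
>>>>>>>     subprocess.check_call(["lvm", "pvcreate", dev])
>>>>>>>     subprocess.check_call(["lvm", "vgcreate", "hosted-engine-vg", dev])
>>>>>>>     for name in ("hosted-engine.lockspace", "hosted-engine.metadata"):
>>>>>>>         subprocess.check_call(["lvm", "lvcreate", "-L", "128M",
>>>>>>>                                "-n", name, "hosted-engine-vg"])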
>>>>>>>
>>>>>>
>>>>>> I would rather not go this way (at least not for 3.5), as it means too
>>>>>> many code changes in hosted-engine. On the other hand, the logic in vdsm
>>>>>> seems wrong, because it is not complementary (it disables all LVs and then
>>>>>> enables just some of them), and it should be fixed anyway. This problem is
>>>>>> blocking one of our 3.5 features, so I've created rhbz#1094657 to track it.
>>>>>
>>>>> Can you elaborate on this? How should vdsm behave better, and why?
>>>>
>>>> Sure. So far I haven't heard any reason why it behaves like this, and
>>>> it seems illogical to disable *all* and then enable just *some*.
>>>>
>>>> How: the disabling and enabling operations should be complementary.
>>>> Why: to be less surprising.
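>>>>
>>>> To illustrate what I mean (a rough sketch only, not the actual vdsm code;
>>>> the vg and lv names are placeholders):
>>>>
>>>>     import subprocess
>>>>
>>>>     VG = "my-vdsm-vg"                              # placeholder
>>>>
>>>>     def on_stop():
>>>>         # everything in the vg goes down, including LVs vdsm did not create
>>>>         subprocess.check_call(["lvm", "vgchange", "-an", VG])
>>>>
>>>>     def on_start():
>>>>         # only a known subset is brought back up
>>>>         for lv in ("metadata", "ids", "leases"):   # illustrative names
>>>>             subprocess.check_call(["lvm", "lvchange", "-ay", VG + "/" + lv])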
>>>
>>> There is an asymmetry between activation and deactivation of an LV. A
>>> mistakenly-active LV can cause data corruption. Making sure that this
>>> does not happen is more important than a new feature.
>>
>> - just out of curiosity, how can a mistakenly-active LV cause data
>> corruption? Something like a stale LV which refers to a volume
>> that doesn't exist anymore?
>>
>>>
>>> We do not want to deactivate and then re-activate the same set of LVs.
>>> That would be illogical. We intentionally deactivate LVs that are no
>>> longer used on the specific host - that's important if a qemu died while
>>> Vdsm was down, leaving a stale LV behind.
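>>>
>>> Conceptually it is closer to this (only a sketch, not the actual Vdsm
>>> code):
>>>
>>>     import subprocess
>>>
>>>     def deactivate_unused(vg):
>>>         # list LV names and attributes; attr char 6 is 'o' when the LV is open
>>>         out = subprocess.check_output(
>>>             ["lvm", "lvs", "--noheadings", "-o", "lv_name,lv_attr", vg])
>>>         for line in out.decode().splitlines():
>>>             name, attr = line.split()
>>>             if attr[5] != "o":                 # not in use on this host
>>>                 subprocess.check_call(["lvm", "lvchange", "-an",
>>>                                        "%s/%s" % (vg, name)])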
>>>
>>> Design-wise, Vdsm would very much like to keep its ownership of
>>> Vdsm-created storage domains. Let us discuss how your feature can be
>>> implemented without this breach of ownership.
>>>
>>
>> Ok, I agree that this should have been discussed with the storage
>> team at the design phase, so let's start from the beginning and try
>> to come up with a better solution.
>>    My problem is that I need storage for the hosted-engine data
>> which is accessible from all hosts. It seems logical to use the same
>> physical storage as we use for "the storage". For NFS it's just a
>> file in /rhev/data-center/mnt/<IP>\mountpoint/<UUID>/ha_agent/.
>> So where/how do you suggest we store such data when using LVM
>> (iSCSI in this case)? Can we use vdsm to set it up, or do we have to
>> duplicate the LVM code and handle it ourselves?
>
> I think that for this to happen, we need to define a Vdsm verb that
> creates a volume on a storage domain that is unrelated to any pool. Such
> a verb is in planning; Federico, can its implementation be hastened in
> favor of the hosted engine?
>
> On its own, this would not solve the problem of Vdsm deactivating all
> unused LVs.
>
> Jiri, could you describe why you keep your LV active, but not open?
>

- the setup flow goes approximately like this:

1. create the LVs for the hosted-engine
2. install the engine into the VM
3. add the host to the engine
   - this causes a re-deploy of vdsm, which deactivates the LVs
4. start the ha-agent and ha-broker services, which use the LVs

- I guess we could move the creation of the LVs to after vdsm is
re-deployed, just before the HE services are started, but that won't fix
the problem if vdsm is restarted later.
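
What we'd need, one way or another, is something run as root right before
the HE services start that re-activates our two LVs - just a sketch, the vg
name below is a placeholder:

    import subprocess

    HE_VG = "the-vdsm-vg-uuid"        # placeholder: the vg holding the HE LVs
    for lv in ("ha_agent-hosted-engine.lockspace",
               "ha_agent-hosted-engine.metadata"):
        subprocess.check_call(["lvm", "lvchange", "-ay", "%s/%s" % (HE_VG, lv)])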

--Jirka

> Dan.
>



