On 10/15/2020 10:01 AM, Michael Thomas wrote:
Getting closer...
I recreated the storage domain and added rbd_default_features=3 to
ceph.conf. Now I see the new disk being created with (what I think
is) the correct set of features:
# rbd info rbd.ovirt.data/volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf
rbd image 'volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf':
size 100 GiB in 25600 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 70aab541cb331
block_name_prefix: rbd_data.70aab541cb331
format: 2
features: layering
op_features:
flags:
create_timestamp: Thu Oct 15 06:53:23 2020
access_timestamp: Thu Oct 15 06:53:23 2020
modify_timestamp: Thu Oct 15 06:53:23 2020
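(For reference, a minimal sketch of the ceph.conf change, assuming the
usual [global] placement:
[global]
rbd_default_features = 3
where 3 is the sum of the layering (1) and striping (2) feature bits.)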
However, I'm still unable to attach the disk to a VM. This time it's
a permissions issue on the ovirt node where the VM is running. It
looks like it can't read the temporary ceph config file that is sent
over from the engine:
Are you using Octopus? If so, the config file that's generated is missing
the "[global]" section header at the top, and Octopus doesn't like that.
It's been patched upstream.
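To illustrate, the generated temporary conf starts directly with the
key/value lines, while Octopus wants them under a section header, i.e.
something like (keys shown are illustrative, not the exact generated
contents):
[global]
mon_host = <your mon addresses>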
>
> https://pastebin.com/pGjMTvcn
>
> The file '/tmp/brickrbd_nwc3kywk' on the ovirt node is only accessible
> by root:
>
> [root@ovirt4 ~]# ls -l /tmp/brickrbd_nwc3kywk
> -rw-------. 1 root root 146 Oct 15 07:25 /tmp/brickrbd_nwc3kywk
>
> ...and I'm guessing that it's being accessed by the vdsm user?
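>
> A quick way to confirm that guess (assuming the reader really is the
> vdsm user) would be something like:
>
> [root@ovirt4 ~]# sudo -u vdsm cat /tmp/brickrbd_nwc3kywk
>
> ...which should fail with "Permission denied" given the 0600 root:root
> mode above.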
>
> --Mike
>
> On 10/14/20 10:59 AM, Michael Thomas wrote:
>> Hi Benny,
>>
>> You are correct, I tried attaching to a running VM (which failed), then
>> tried booting a new VM using this disk (which also failed). I'll use
>> the workaround in the bug report going forward.
>>
>> I'll just recreate the storage domain, since at this point I have
>> nothing in it to lose.
>>
>> Regards,
>>
>> --Mike
>>
>> On 10/14/20 9:32 AM, Benny Zlotnik wrote:
>>> Did you attempt to start a VM with this disk and it failed, or did you
>>> not try at all? If it's the latter, then the error is strange... If
>>> it's the former, there is a known issue with multipath at the moment;
>>> see [1] for a workaround. Otherwise you might have issues detaching
>>> volumes later, because multipath grabs the rbd devices, which makes
>>> `rbd unmap` fail. It will be fixed soon by automatically blacklisting
>>> rbd in the multipath configuration.
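>>>
>>> (Roughly, the manual workaround in [1] amounts to a multipath
>>> blacklist entry for the rbd devnodes, e.g. in
>>> /etc/multipath/conf.d/rbd.conf:
>>>
>>> blacklist {
>>>     devnode "^rbd[0-9]*"
>>> }
>>>
>>> plus a `multipathd reconfigure` to pick it up; see the BZ comment for
>>> the exact snippet.)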
>>>
>>> Regarding editing, you can submit an RFE for this, but it is currently
>>> not possible. The options are indeed to either recreate the storage
>>> domain or edit the database table.
>>>
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1881832#c8
>>>
>>>
>>>
>>>
>>> On Wed, Oct 14, 2020 at 3:40 PM Michael Thomas <wart(a)caltech.edu>
>>> wrote:
>>>>
>>>> On 10/14/20 3:30 AM, Benny Zlotnik wrote:
>>>>> Jeff is right, it's a limitation of kernel rbd; the recommendation
>>>>> is to add `rbd default features = 3` to the configuration. I think
>>>>> there are plans to support rbd-nbd in cinderlib, which would allow
>>>>> using additional features, but I'm not aware of anything concrete.
>>>>>
>>>>> Additionally, the path for the cinderlib log is
>>>>> /var/log/ovirt-engine/cinderlib/cinderlib.log. The error in this
>>>>> case would appear in vdsm.log on the relevant host, and would look
>>>>> something like "RBD image feature set mismatch. You can disable
>>>>> features unsupported by the kernel with 'rbd feature disable'"
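>>>>>
>>>>> For an image that already has the unsupported features, they can
>>>>> also be stripped after creation, e.g. (pool/image are placeholders,
>>>>> and the exact feature list depends on what the kernel complains
>>>>> about):
>>>>>
>>>>> rbd feature disable <pool>/<image> object-map fast-diff deep-flatten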
>>>>
>>>> Thanks for the pointer! Indeed,
>>>> /var/log/ovirt-engine/cinderlib/cinderlib.log has the errors that I
>>>> was looking for. In this case, it was a user error entering the
>>>> RBDDriver options:
>>>>
>>>> 2020-10-13 15:15:25,640 - cinderlib.cinderlib - WARNING - Unknown
>>>> config option use_multipath_for_xfer
>>>>
>>>> ...it should have been 'use_multipath_for_image_xfer'.
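>>>>
>>>> For anyone hitting the same typo: the option sits alongside the usual
>>>> RBD driver settings, e.g. (values illustrative, not my exact setup):
>>>>
>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver
>>>> rbd_pool=<pool>
>>>> rbd_user=<cephx user>
>>>> rbd_ceph_conf=/etc/ceph/ceph.conf
>>>> use_multipath_for_image_xfer=false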
>>>>
>>>> Now my attempts to fix it are failing... If I go to 'Storage ->
>>>> Storage Domains -> Manage Domain', all driver options are uneditable
>>>> except for 'Name'.
>>>>
>>>> Then I thought that maybe I can't edit the driver options while a
>>>> disk still exists, so I tried removing the one disk in this domain.
>>>> But even after multiple attempts, it still fails with:
>>>>
>>>> 2020-10-14 07:26:31,340 - cinder.volume.drivers.rbd - INFO - volume
>>>> volume-5419640e-445f-4b3f-a29d-b316ad031b7a no longer exists in
>>>> backend
>>>> 2020-10-14 07:26:31,353 - cinderlib-client - ERROR - Failure occurred
>>>> when trying to run command 'delete_volume': (psycopg2.IntegrityError)
>>>> update or delete on table "volumes" violates foreign key constraint
>>>> "volume_attachment_volume_id_fkey" on table "volume_attachment"
>>>> DETAIL: Key (id)=(5419640e-445f-4b3f-a29d-b316ad031b7a) is still
>>>> referenced from table "volume_attachment".
>>>>
>>>> See https://pastebin.com/KwN1Vzsp for the full log entries related
>>>> to this removal.
>>>>
>>>> It's not lying, the volume no longer exists in the rbd pool, but the
>>>> cinder database still thinks it's attached, even though I was never
>>>> able to get it to attach to a VM.
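>>>>
>>>> (Verified with something like `rbd ls <pool> | grep 5419640e`, which
>>>> comes back empty.)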
>>>>
>>>> What are my options for cleaning up this stale disk in the cinder
>>>> database?
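>>>>
>>>> Would manual surgery on the cinderlib database be the way to go? I'm
>>>> guessing something along these lines, assuming the engine-side
>>>> "ovirt_cinderlib" database (please correct me if that's wrong):
>>>>
>>>> DELETE FROM volume_attachment
>>>>   WHERE volume_id = '5419640e-445f-4b3f-a29d-b316ad031b7a';
>>>> DELETE FROM volumes
>>>>   WHERE id = '5419640e-445f-4b3f-a29d-b316ad031b7a';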
>>>>
>>>> How can I update the driver options in my storage domain (deleting and
>>>> recreating the domain is acceptable, if possible)?
>>>>
>>>> --Mike
>>>>
>>>