POSIX storage mount path error when creating volumes

My cluster was originally built on 4.3, and things were working as long as my SPM was on 4.3. I just killed off the last 4.3 host, rebuilt it as 4.4, and upgraded my cluster and DC to compatibility level 4.6.

We had CephFS mounted as a POSIX FS, which worked fine, but oddly, on 4.3 we would end up with two mounts for the same volume. The configuration had a comma-separated list of IPs, since that is how Ceph was configured for redundancy, and this is the mount that shows up on both 4.3 and 4.4 hosts:

    /rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/

But the 4.3 hosts would also have a duplicate mount that used the FQDN of one of the servers instead of the comma-separated list. In 4.4 there is only a single mount, and existing VMs start just fine, but you can't create new disks or migrate existing disks onto the POSIX storage volume. Judging by the error I get on the SPM host when it tries to create a volume (migration also fails on the volume creation task), my suspicion is that the mount parser doesn't like the comma in the mount name:

2021-08-31 19:34:07,767-0700 INFO (jsonrpc/6) [vdsm.api] START createVolume(sdUUID='e8ec5645-fc1b-4d64-a145-44aa8ac5ef48', spUUID='2948c860-9bdf-11e8-a6b3-00163e0419f0', imgUUID='7d704b4d-1ebe-462f-b11e-b91039f43637', size='1073741824', volFormat=5, preallocate=1, diskType='DATA', volUUID='be6cb033-4e42-4bf5-a4a3-6ab5bf03edee', desc='{"DiskAlias":"test","DiskDescription":""}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None, addBitmaps=False) from=::ffff:10.1.2.37,43490, flow_id=bb137995-1ffa-429f-b6eb-5b9ca9f8dfd7, task_id=2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (api:48)
2021-08-31 19:34:07,767-0700 INFO (jsonrpc/6) [IOProcessClient] (Global) Starting client (__init__:340)
2021-08-31 19:34:07,782-0700 INFO (ioprocess/3193398) [IOProcess] (Global) Starting ioprocess (__init__:465)
2021-08-31 19:34:07,803-0700 INFO (jsonrpc/6) [vdsm.api] FINISH createVolume return=None from=::ffff:10.1.2.37,43490, flow_id=bb137995-1ffa-429f-b6eb-5b9ca9f8dfd7, task_id=2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (api:54)
2021-08-31 19:34:07,844-0700 INFO (tasks/5) [storage.ThreadPool.WorkerThread] START task 2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (cmd=<bound method Task.commit of <vdsm.storage.task.Task object at 0x7f4894279860>>, args=None) (threadPool:146)
2021-08-31 19:34:07,869-0700 INFO (tasks/5) [storage.StorageDomain] Create placeholder /rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/7d704b4d-1ebe-462f-b11e-b91039f43637 for image's volumes (sd:1718)
2021-08-31 19:34:07,869-0700 ERROR (tasks/5) [storage.TaskManager.Task] (Task='2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed') Unexpected error (task:877)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1945, in createVolume
    initial_size=initialSize, add_bitmaps=addBitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1216, in createVolume
    initial_size=initial_size, add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1174, in create
    imgPath = dom.create_image(imgUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1721, in create_image
    "create_image_rollback", [image_dir])
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 385, in __init__
    self.params = ParamList(argslist)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 298, in __init__
    raise ValueError("ParamsList: sep %s in %s" % (sep, i))
ValueError: ParamsList: sep , in /rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/7d704b4d-1ebe-462f-b11e-b91039f43637
2021-08-31 19:34:07,964-0700 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task 2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (threadPool:148)

This is a pretty major issue, since we can no longer create new VMs. As a workaround, I could change the mount path of the volume to reference only a single IP, but oVirt won't let me edit the mount. I wonder if I could edit it manually in the database, then reboot the hosts one by one to make the change take effect, without having to shut down hundreds of VMs at once?

On Wed, Sep 1, 2021 at 6:21 PM Sketch <ovirt@rednsx.org> wrote:
My cluster was originally built on 4.3, and things were working as long as my SPM was on 4.3. I just killed off the last 4.3 host and rebuilt it as 4.4, and upgraded my cluster and DC to compatibility level 4.6.
We had CephFS mounted as a POSIX FS, which worked fine, but oddly, on 4.3 we would end up with two mounts for the same volume. The configuration had a comma-separated list of IPs, since that is how Ceph was configured for redundancy, and this is the mount that shows up on both 4.3 and 4.4 hosts (/rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/).
This was never supported. We had an old fix that was rejected (https://gerrit.ovirt.org/c/vdsm/+/94027), but it would not help solve the task argument issue below.
But the 4.3 hosts would also have a duplicate mount that used the FQDN of one of the servers instead of the comma-separated list.
In 4.4 there is only a single mount, and existing VMs start just fine, but you can't create new disks or migrate existing disks onto the POSIX storage volume. Judging by the error I get on the SPM host when it tries to create a volume (migration also fails on the volume creation task), my suspicion is that the mount parser doesn't like the comma in the mount name:
2021-08-31 19:34:07,869-0700 ERROR (tasks/5) [storage.TaskManager.Task] (Task='2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed') Unexpected error (task:877)
...
ValueError: ParamsList: sep , in /rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/7d704b4d-1ebe-462f-b11e-b91039f43637
I think the issue is the task arguments parser: persisted task arguments are separated by ",", so an argument that itself contains "," breaks the parser.
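For what it's worth, here is a minimal Python sketch of that failure mode. It is not the vdsm code itself, just the shape of the check that raises the ValueError in the traceback above:

    # Task parameters are joined with "," when the task is recorded,
    # so a single argument that itself contains "," could not be
    # split back unambiguously; the parser rejects it up front.
    SEP = ","

    def validate_params(argslist, sep=SEP):
        for i in argslist:
            if sep in i:
                raise ValueError("ParamsList: sep %s in %s" % (sep, i))
        return sep.join(argslist)

    image_dir = ("/rhev/data-center/mnt/"
                 "10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/"
                 "e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/"
                 "7d704b4d-1ebe-462f-b11e-b91039f43637")
    validate_params([image_dir])  # raises the same ValueError as the log

Any mount path containing a comma trips this check. It also explains why already-running VMs are unaffected: the check appears to run only when an SPM task records its arguments, here the create_image rollback step shown in the traceback.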
This is a pretty major issue, since we can no longer create new VMs. As a workaround, I could change the mount path of the volume to reference only a single IP, but oVirt won't let me edit the mount. I wonder if I could edit it manually in the database, then reboot the hosts one by one to make the change take effect, without having to shut down hundreds of VMs at once?
This should work. Please file a bug for this, so we can consider it for the next release.

Nir
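If anyone attempts the database route, the edit amounts to something like the sketch below. It assumes the connection string lives in the engine database's storage_server_connections table (an assumption worth verifying against your own schema), and the single IP used here is just one of the monitors from the original post:

    # A sketch only: table/column names and credentials are assumptions
    # based on this thread and site-specific settings. Back up the
    # engine database before making any manual change.
    import psycopg2

    conn = psycopg2.connect(dbname="engine", user="engine",
                            password="...", host="localhost")
    with conn, conn.cursor() as cur:
        # Replace the comma-separated monitor list with a single IP.
        cur.execute(
            "UPDATE storage_server_connections "
            "SET connection = %s WHERE connection = %s",
            ("10.1.88.75:/vmstore",
             "10.1.88.75,10.1.88.76,10.1.88.77:/vmstore"))
    conn.close()

Each host then needs to pick up the new connection string, e.g. by moving it to maintenance and reactivating it, or by rebooting as suggested above.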

On Wed, 1 Sep 2021, Nir Soffer wrote:
This is a pretty major issue, since we can no longer create new VMs. As a workaround, I could change the mount path of the volume to reference only a single IP, but oVirt won't let me edit the mount. I wonder if I could edit it manually in the database, then reboot the hosts one by one to make the change take effect, without having to shut down hundreds of VMs at once?
This should work.
I tried editing the database to point to a single IP, and it works: I'm able to mount it and create disks on it. However, attempts to migrate any live VMs between a host with the old mount and one with the new mount fail, presumably because the mountpoint is now named differently. I believe doing this would effectively split my cluster into hosts on two separate storage pools. I wonder if there's some way to force the mount name to be the same?
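On forcing the mount name: the local directory under /rhev/data-center/mnt/ appears to be derived mechanically from the connection string, so a changed connection string always yields a changed mountpoint. Here is a sketch of the apparent mapping, inferred from the mount paths quoted in this thread rather than from the vdsm source:

    def local_mount_dir(remote_path):
        # Escape "_" as "__", then flatten "/" to "_", and place the
        # result under /rhev/data-center/mnt/ (inferred behavior).
        name = remote_path.replace("_", "__").replace("/", "_")
        return "/rhev/data-center/mnt/" + name

    local_mount_dir("10.1.88.75,10.1.88.76,10.1.88.77:/vmstore")
    # -> '/rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore'
    local_mount_dir("10.1.88.75:/vmstore")
    # -> '/rhev/data-center/mnt/10.1.88.75:_vmstore'

If that inference is right, hosts that mounted the domain before and after the database edit will disagree on the path, which matches the failed live migrations.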

We have been using CephFS domains without issues since 4.3, and are currently on 4.4:

[root@ovirt-host1 /]# ls -la /rhev/data-center/mnt/*ovirt*
'/rhev/data-center/mnt/172.16.16.2:3300,172.16.16.3:3300,172.16.16.4:3300:_ovirt__iso':
total 0
drwxrwxrwx. 3 root root   1 Sep  3 15:43 .
drwxr-xr-x. 5 vdsm kvm  182 Aug 19 14:35 ..
drwxr-xr-x. 4 vdsm kvm    2 Aug 19 14:35 def09ff0-e986-44c4-ac1c-470668ec2822

[root@ovirt-host1 /]# mount | grep iso
172.16.16.2:3300,172.16.16.3:3300,172.16.16.4:3300:/ovirt_iso on /rhev/data-center/mnt/172.16.16.2:3300,172.16.16.3:3300,172.16.16.4:3300:_ovirt__iso type ceph (rw,relatime,seclabel,name=mds_ovirt_iso,secret=<hidden>,ms_mode=prefer-crc,dirstat,acl)

k
On 1 Sep 2021, at 23:52, Nir Soffer <nsoffer@redhat.com> wrote:
This was never supported.
We had this old fix that was rejected: https://gerrit.ovirt.org/c/vdsm/+/94027
participants (3)
- Konstantin Shalygin
- Nir Soffer
- Sketch