My cluster was originally built on 4.3, and everything worked as long as
the SPM was on a 4.3 host. I have just killed off the last 4.3 host, rebuilt
it as 4.4, and upgraded the cluster and DC to compatibility level 4.6.
We had CephFS mounted as a POSIX FS, which worked fine, but oddly on 4.3 we
would end up with two mounts for the same volume. The configuration had a
comma-separated list of IPs, since that is how Ceph was configured for
redundancy, and this is the mount that shows up on both 4.3 and 4.4 hosts
(/rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/). But
the 4.3 hosts would also have a duplicate mount that used the FQDN of one
of the servers instead of the comma-separated list.
In 4.4 there is only a single mount, and existing VMs start just fine,
but you can't create new disks or migrate existing disks onto the POSIX
storage volume. Judging from the error I get on the SPM host when it tries
to create a volume, my suspicion is that the mount parser does not like
the comma in the mount name (migration also fails on the volume creation
task):
2021-08-31 19:34:07,767-0700 INFO (jsonrpc/6) [vdsm.api] START
createVolume(sdUUID='e8ec5645-fc1b-4d64-a145-44aa8ac5ef48',
spUUID='2948c860-9bdf-11e8-a6b3-00163e0419f0',
imgUUID='7d704b4d-1ebe-462f-b11e-b91039f43637', size='1073741824',
volFormat=5, preallocate=1, diskType='DATA',
volUUID='be6cb033-4e42-4bf5-a4a3-6ab5bf03edee',
desc='{"DiskAlias":"test","DiskDescription":""}',
srcImgUUID='00000000-0000-0000-0000-000000000000',
srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None,
addBitmaps=False) from=::ffff:10.1.2.37,43490,
flow_id=bb137995-1ffa-429f-b6eb-5b9ca9f8dfd7, task_id=2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed
(api:48)
2021-08-31 19:34:07,767-0700 INFO (jsonrpc/6) [IOProcessClient] (Global) Starting client
(__init__:340)
2021-08-31 19:34:07,782-0700 INFO (ioprocess/3193398) [IOProcess] (Global) Starting
ioprocess (__init__:465)
2021-08-31 19:34:07,803-0700 INFO (jsonrpc/6) [vdsm.api] FINISH createVolume return=None
from=::ffff:10.1.2.37,43490, flow_id=bb137995-1ffa-429f-b6eb-5b9ca9f8dfd7,
task_id=2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (api:54)
2021-08-31 19:34:07,844-0700 INFO (tasks/5) [storage.ThreadPool.WorkerThread] START task
2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (cmd=<bound method Task.commit of
<vdsm.storage.task.Task object at 0x7f4894279860>>, args=None) (threadPool:146)
2021-08-31 19:34:07,869-0700 INFO (tasks/5) [storage.StorageDomain] Create placeholder
/rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/7d704b4d-1ebe-462f-b11e-b91039f43637
for image's volumes (sd:1718)
2021-08-31 19:34:07,869-0700 ERROR (tasks/5) [storage.TaskManager.Task]
(Task='2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed') Unexpected error (task:877)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1945, in createVolume
    initial_size=initialSize, add_bitmaps=addBitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1216, in createVolume
    initial_size=initial_size, add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1174, in create
    imgPath = dom.create_image(imgUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1721, in create_image
    "create_image_rollback", [image_dir])
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 385, in __init__
    self.params = ParamList(argslist)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 298, in __init__
    raise ValueError("ParamsList: sep %s in %s" % (sep, i))
ValueError: ParamsList: sep , in /rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/7d704b4d-1ebe-462f-b11e-b91039f43637
2021-08-31 19:34:07,964-0700 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task
2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (threadPool:148)
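To illustrate what I think is going on, here is a minimal sketch of a
parameter list that refuses any value containing its own separator. This is
my reconstruction from the traceback, not the actual vdsm code: only the
class name ParamList and the error message come from the log, while SEP,
to_list() and check() are names I made up for the example.

```python
# Simplified illustration only; NOT the real vdsm implementation.
SEP = ","  # assumed from the "sep , in ..." part of the error message


class ParamList:
    """Keeps string parameters joined into one separator-delimited string."""

    def __init__(self, params, sep=SEP):
        for p in params:
            if sep in p:
                # Same message shape as the ValueError in the traceback.
                raise ValueError("ParamsList: sep %s in %s" % (sep, p))
        self.sep = sep
        self.params = sep.join(params)

    def to_list(self):
        # Splitting is only unambiguous because no parameter contains sep.
        return self.params.split(self.sep) if self.params else []


def check(path):
    """Return the error text if path is rejected, otherwise None."""
    try:
        ParamList([path])
    except ValueError as e:
        return str(e)
    return None


multi = "/rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/images"
single = "/rhev/data-center/mnt/10.1.88.75:_vmstore/images"

print(check(multi))   # rejected: the path itself contains the separator
print(check(single))  # accepted
```

If that is roughly how the rollback parameters are stored, then any image
directory under a mount point with a comma in its name would be rejected,
which matches what I see.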
This is a pretty major issue, since we can no longer create new VMs. As a
workaround, I could change the mount path of the volume to reference only
a single IP, but oVirt won't let me edit the mount. Could I edit it
manually in the database and then reboot the hosts one by one, so that
the change takes effect without having to shut down hundreds of VMs at
once?
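For reference, this is a small sketch of what the single-IP workaround
would do to the mount path. It assumes the configured path is
10.1.88.75,10.1.88.76,10.1.88.77:/vmstore and that the mount directory is
formed by replacing "/" with "_" under /rhev/data-center/mnt, which is what
the directory name in the log suggests. The function names are hypothetical,
and the host parsing is not meant to handle IPv6 addresses.

```python
MNT_ROOT = "/rhev/data-center/mnt"


def mount_dir(spec):
    # Assumed naming rule: "/" in the remote spec becomes "_".
    return MNT_ROOT + "/" + spec.replace("/", "_")


def single_host_spec(spec):
    # Keep only the first host of a "host1,host2,...:/path" spec.
    hosts, colon, path = spec.partition(":")
    return hosts.split(",")[0] + colon + path


current = "10.1.88.75,10.1.88.76,10.1.88.77:/vmstore"
proposed = single_host_spec(current)

print(mount_dir(current))   # directory name contains commas
print(mount_dir(proposed))  # directory name without commas
```

The obvious downside is that the mount would then depend on that one
address being reachable at mount time.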