
After some weeks of experimentation I am seeing more successes than failures creating single- and triple-node hyperconverged setups, so I am branching out into additional features: in this case, the ability to use SSDs as cache media for hard disks.

I first tried a single node that combined caching and compression, and that fails during the creation of the LVs. I tried again without VDO compression, but the result was identical, while VDO compression without the LV cache worked fine. I tried various combinations, allocating less space etc., but the outcome is always the same and unfortunately rather cryptic (I have substituted the physical disk label with {disklabel} and the host name with {hostname}):

TASK [gluster.infra/roles/backend_setup : Extend volume group] *****************
failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1',
  u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1',
  u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1',
  u'cachedisk': u'/dev/sda4',
  u'cachemetalvname': u'cache_gluster_thinpool_gluster_vg_{disklabel}p1',
  u'cachemode': u'writeback', u'cachemetalvsize': u'70G',
  u'cachelvsize': u'630G'}) =>
  {"ansible_loop_var": "item", "changed": false,
   "err": " Physical volume \"/dev/mapper/vdo_{disklabel}p1\" still in use\n",
   "item": {"cachedisk": "/dev/sda4",
            "cachelvname": "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1",
            "cachelvsize": "630G",
            "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1",
            "cachemetalvsize": "70G", "cachemode": "writeback",
            "cachethinpoolname": "gluster_thinpool_gluster_vg_{disklabel}p1",
            "vgname": "gluster_vg_{disklabel}p1"},
   "msg": "Unable to reduce gluster_vg_{disklabel}p1 by /dev/dm-15.",
   "rc": 5}

The part that points to a race condition is the "still in use" error on the VDO device. Unfortunately I have not been able to pinpoint the raw logs used at that stage, so I could not obtain more information; the first sketch below shows how I would dig further.

At this point quite a bit of the storage setup is already done, so rolling back for a clean new attempt can be complicated, including reboots to reconcile the kernel's view with the data on disk (second sketch below). I don't actually believe the problem is related to running a single node, and I would be quite happy to move the creation of the SSD cache to a later stage (third sketch below), but on top of VDO that looks slightly complex to someone without intimate knowledge of LVs-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.

Needless to say, the feature set (SSD caching & compressed dedup) sounds terribly attractive, but when things don't just work, it is more terrifying than attractive.
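For anyone wanting to dig at the same point: these are the standard LVM/device-mapper commands I would use to see what still holds the VDO device open when the task fails (nothing specific to the Ansible roles, {disklabel} substituted as above):

    # Open counts and the device-mapper stack; a non-zero "Open" count
    # on vdo_{disklabel}p1 means something is still using it.
    dmsetup info -c
    dmsetup ls --tree

    # The LVM view: -a also lists hidden cache/thin internals,
    # +devices shows which PV each LV actually occupies.
    lvs -a -o +devices gluster_vg_{disklabel}p1
    pvs -o +pv_used

Since it smells like udev racing with LVM, running "udevadm settle" before retrying the failed task might already change the outcome.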
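And since rolling back is the painful part, this is the teardown sequence that I believe should return a node to a clean slate without a reinstall. It is destructive, obviously, untested end-to-end on my side, and /dev/{disklabel}p1 is my guess at the VDO backing partition:

    # Detach the cache first, if it got partially created
    lvconvert --uncache gluster_vg_{disklabel}p1/gluster_thinpool_gluster_vg_{disklabel}p1

    # Remove the VG and the PV sitting on top of VDO
    vgremove -f gluster_vg_{disklabel}p1
    pvremove /dev/mapper/vdo_{disklabel}p1

    # Remove the VDO volume itself, then wipe leftover signatures
    # from the backing partition and the SSD partition
    vdo remove --name vdo_{disklabel}p1
    wipefs -a /dev/{disklabel}p1 /dev/sda4

If the kernel still holds stale state after that, "partprobe" (or the dreaded reboot) should reconcile it.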
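As for doing the SSD cache by hand at a later stage: going by lvmcache(7), the manual equivalent of what the role attempts should be roughly the following, with the same names and sizes as in the failed task (a sketch, not something I have verified):

    VG=gluster_vg_{disklabel}p1
    POOL=gluster_thinpool_${VG}

    # Put the SSD partition into the existing volume group
    vgextend ${VG} /dev/sda4

    # Cache data and cache metadata LVs, both placed on the SSD only
    lvcreate -L 630G -n cachelv_${POOL} ${VG} /dev/sda4
    lvcreate -L 70G -n cache_${POOL} ${VG} /dev/sda4

    # Combine them into a writeback cache pool
    lvconvert --type cache-pool --poolmetadata ${VG}/cache_${POOL} \
      --cachemode writeback ${VG}/cachelv_${POOL}

    # Attach the cache pool to the existing thin pool
    lvconvert --type cache --cachepool ${VG}/cachelv_${POOL} ${VG}/${POOL}

If that is really all there is to it, I could leave the cache definition out of the deployment and run these steps afterwards, but confirmation from someone who knows these roles would be very welcome.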