I am seeing more successes than failures at creating single- and triple-node hyperconverged
setups after some weeks of experimentation, so I am branching out to additional features:
in this case the ability to use SSDs as cache media for hard disks.
I first tried a single node that combined caching and compression, and that failed
during the creation of the LVs.
I tried again without VDO compression, but the results were identical, whereas
VDO compression without the LV cache worked fine.
I tried various combinations, using less space etc., but the result is always the same
and unfortunately rather cryptic (I have substituted the physical disk label with {disklabel}):
TASK [gluster.infra/roles/backend_setup : Extend volume group] *****************
failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1',
    u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1',
    u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1',
    u'cachedisk': u'/dev/sda4',
    u'cachemetalvname': u'cache_gluster_thinpool_gluster_vg_{disklabel}p1',
    u'cachemode': u'writeback',
    u'cachemetalvsize': u'70G',
    u'cachelvsize': u'630G'}) =>
  {"ansible_loop_var": "item",
   "changed": false,
   "err": " Physical volume \"/dev/mapper/vdo_{disklabel}p1\" still in use\n",
   "item": {"cachedisk": "/dev/sda4",
            "cachelvname": "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1",
            "cachelvsize": "630G",
            "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1",
            "cachemetalvsize": "70G",
            "cachemode": "writeback",
            "cachethinpoolname": "gluster_thinpool_gluster_vg_{disklabel}p1",
            "vgname": "gluster_vg_{disklabel}p1"},
   "msg": "Unable to reduce gluster_vg_{disklabel}p1 by /dev/dm-15.",
   "rc": 5}
Somewhere in there I see something that points to a race condition ("still in
use").
Unfortunately I have not been able to pinpoint the raw logs used at that stage,
so I wasn't able to obtain more info.
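In case it helps, the only checks I can think of running at that point are along these
lines (just a sketch, reusing the device and VG names from the error above; I have not
captured this output yet):

    dmsetup ls --tree                      # map /dev/dm-15 back to a readable device name
    lsblk /dev/mapper/vdo_{disklabel}p1    # see what is stacked on top of the VDO volume
    pvs -o pv_name,vg_name,pv_used         # check whether that PV still carries allocated extents
    lvs -a -o lv_name,vg_name,devices      # see which LVs sit on which PV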
At this point quite a bit of the storage setup is already done, so rolling back for a clean
new attempt can be a bit complicated, with reboots needed to reconcile the kernel's view with
the data on disk.
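For completeness, my understanding of a manual rollback is roughly the following (only a
sketch using the names from the failed task, it is destructive, and I may well be missing
steps, which is exactly why I would rather avoid it):

    lvremove gluster_vg_{disklabel}p1            # drop the thin pool and any LVs in the VG
    vgremove gluster_vg_{disklabel}p1
    pvremove /dev/mapper/vdo_{disklabel}p1
    vdo remove --name vdo_{disklabel}p1          # tear down the VDO volume underneath
    wipefs -a /dev/sda4                          # clear leftover signatures on the cache partition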
I don't actually believe it's related to the single-node setup, and I'd be quite happy to
move the creation of the SSD cache to a later stage, but in a VDO setup that looks
slightly complex to someone without intimate knowledge of
LVs-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.
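If someone can confirm that attaching the cache after the deployment is safe, my
understanding is that it would boil down to something like this (again only a sketch,
reusing the names and sizes from the failed task; I have not tried this against a
VDO-backed brick):

    pvcreate /dev/sda4
    vgextend gluster_vg_{disklabel}p1 /dev/sda4
    # cache data and cache metadata LVs on the SSD partition
    lvcreate -L 630G -n cachelv_gluster_thinpool_gluster_vg_{disklabel}p1 gluster_vg_{disklabel}p1 /dev/sda4
    lvcreate -L 70G -n cache_gluster_thinpool_gluster_vg_{disklabel}p1 gluster_vg_{disklabel}p1 /dev/sda4
    # combine them into a cache pool, then attach it to the thin pool
    lvconvert --type cache-pool --poolmetadata gluster_vg_{disklabel}p1/cache_gluster_thinpool_gluster_vg_{disklabel}p1 gluster_vg_{disklabel}p1/cachelv_gluster_thinpool_gluster_vg_{disklabel}p1
    lvconvert --type cache --cachemode writeback --cachepool gluster_vg_{disklabel}p1/cachelv_gluster_thinpool_gluster_vg_{disklabel}p1 gluster_vg_{disklabel}p1/gluster_thinpool_gluster_vg_{disklabel}p1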
Needless to say, the feature set (SSD caching & compression/deduplication) sounds terribly
attractive, but when things don't just work, it's rather terrifying.