On Wed, Sep 4, 2019 at 9:27 PM <thomas(a)hoberg.net> wrote:
After some weeks of experimentation I am seeing more successes than failures
at creating single- and triple-node hyperconverged setups, so I am branching
out to additional features: in this case, the ability to use SSDs as cache
media for hard disks.
I first tried a single node that combined caching and compression, and that
fails during the creation of the LVs.
I tried again without the VDO compression, but the results were identical,
while VDO compression without the LV cache worked fine.
I tried various combinations, using less space etc., but the results are
always the same and unfortunately rather cryptic (I have substituted the
physical disk label with {disklabel}):
TASK [gluster.infra/roles/backend_setup : Extend volume group] *****************
failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1',
  u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1',
  u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1',
  u'cachedisk': u'/dev/sda4',
  u'cachemetalvname': u'cache_gluster_thinpool_gluster_vg_{disklabel}p1',
  u'cachemode': u'writeback', u'cachemetalvsize': u'70G',
  u'cachelvsize': u'630G'}) =>
  {"ansible_loop_var": "item", "changed": false,
   "err": " Physical volume \"/dev/mapper/vdo_{disklabel}p1\" still in use\n",
   "item": {"cachedisk": "/dev/sda4",
    "cachelvname": "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1",
    "cachelvsize": "630G",
    "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1",
    "cachemetalvsize": "70G", "cachemode": "writeback",
    "cachethinpoolname": "gluster_thinpool_gluster_vg_{disklabel}p1",
    "vgname": "gluster_vg_{disklabel}p1"},
   "msg": "Unable to reduce gluster_vg_{disklabel}p1 by /dev/dm-15.", "rc": 5}
Somewhere in there I see something that points to a race condition
("still in use").
Unfortunately I have not been able to pinpoint the raw logs used at that
stage, so I could not obtain more information.
At this point quite a bit of the storage setup is already done, so rolling
back for a clean new attempt can be a bit complicated, with reboots needed to
reconcile the kernel's view with the data on disk.
I don't actually believe it's related to the single-node setup, and I'd be
quite happy to move the creation of the SSD cache to a later stage, but in a
VDO setup this looks slightly complex to someone without intimate knowledge
of LVs-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.
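If I read lvmcache(7) correctly, doing it by hand afterwards would be roughly
the following; this is untested guesswork on my part, the VG/LV names are the
ones from my setup and the sizes are the ones from my failed attempt:

# Sketch only: attach an SSD cache to the existing Gluster thin pool after
# deployment. Untested; names/sizes are placeholders from my setup.
- name: Add the SSD partition to the existing volume group
  command: vgextend gluster_vg_{disklabel}p1 /dev/sda4

- name: Create the cache data LV on the SSD
  command: lvcreate -L 630G -n cachelv gluster_vg_{disklabel}p1 /dev/sda4

- name: Create the cache metadata LV on the SSD
  command: lvcreate -L 70G -n cachelv_meta gluster_vg_{disklabel}p1 /dev/sda4

- name: Combine data and metadata LVs into a cache pool
  command: >-
    lvconvert -y --type cache-pool
    --poolmetadata gluster_vg_{disklabel}p1/cachelv_meta
    gluster_vg_{disklabel}p1/cachelv

- name: Attach the cache pool to the Gluster thin pool
  command: >-
    lvconvert -y --type cache --cachemode writeback
    --cachepool gluster_vg_{disklabel}p1/cachelv
    gluster_vg_{disklabel}p1/gluster_thinpool_gluster_vg_{disklabel}p1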
Needless to say, the feature set (SSD caching & compressed dedup) sounds
terribly attractive, but when things don't just work it becomes rather
terrifying instead.
Hi Thomas,
The way we have to write the variables for setting up the cache has changed
with Ansible 2.8. Currently we write something like this:
>>>
gluster_infra_cache_vars:
  - vgname: vg_sdb2
    cachedisk: /dev/sdb3
    cachelvname: cachelv_thinpool_vg_sdb2
    cachethinpoolname: thinpool_vg_sdb2
    cachelvsize: '10G'
    cachemetalvsize: '2G'
    cachemetalvname: cache_thinpool_vg_sdb2
    cachemode: writethrough
===================
Note that cachedisk is provided as /dev/sdb3, which is used to extend the vg
vg_sdb2 ... this works well.
The module will take care of extending the vg with /dev/sdb3.
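Under the hood the role extends the VG through Ansible's lvg module; a
simplified illustration (not the literal task from the role) of what that
amounts to with the example above:
>>>
# Simplified illustration, not the actual task from gluster.infra: with
# Ansible 2.7, listing only the new cache disk in "pvs" simply adds it to
# the volume group.
- name: Extend vg_sdb2 with the cache disk
  lvg:
    vg: vg_sdb2
    pvs: /dev/sdb3
    state: present
=====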
However, with Ansible 2.8 we cannot provide it like this; we have to be more
explicit and also mention the PV underlying this volume group vg_sdb2. So,
with respect to 2.8, we have to write that variable like:
>>>
gluster_infra_cache_vars:
  - vgname: vg_sdb2
    cachedisk: '/dev/sdb2,/dev/sdb3'
    cachelvname: cachelv_thinpool_vg_sdb2
    cachethinpoolname: thinpool_vg_sdb2
    cachelvsize: '10G'
    cachemetalvsize: '2G'
    cachemetalvname: cache_thinpool_vg_sdb2
    cachemode: writethrough
=====================
Note that I have mentioned both /dev/sdb2 and /dev/sdb3.
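Applied to your setup, the variable would look roughly like this; the names
and sizes are copied from your failed task output, and listing the VDO device
as the existing PV is my assumption based on the "still in use" message:
>>>
gluster_infra_cache_vars:
  - vgname: gluster_vg_{disklabel}p1
    # list the existing PV (the VDO device) first, then the SSD partition
    cachedisk: '/dev/mapper/vdo_{disklabel}p1,/dev/sda4'
    cachelvname: cachelv_gluster_thinpool_gluster_vg_{disklabel}p1
    cachethinpoolname: gluster_thinpool_gluster_vg_{disklabel}p1
    cachelvsize: '630G'
    cachemetalvsize: '70G'
    cachemetalvname: cache_gluster_thinpool_gluster_vg_{disklabel}p1
    cachemode: writeback
=====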
This change is backward compatible, that is, it works with 2.7 as well. I
have also raised an issue with Ansible, which can be found here:
https://github.com/ansible/ansible/issues/56501
However, @olafbuitelaar has fixed this in gluster-ansible-infra, and the
patch is merged into master.
If you can check out the master branch, you should be fine.