When I first deployed my oVirt lab (v4.2.7 was the latest and greatest at the time), the Ansible playbook didn't work for me.
So I decided to stop the Gluster processes on one of the nodes, wipe all LVM and recreate it manually. In the end I managed to use my SSD as a write-back cache, but I found out that if your chunk size is larger than the default limit, the cache will never flush to the spinning disks.
For details, see bug 1668163 – "LVM cache cannot flush buffer, change cache type or lvremove LV" (the 'cleaner' cache policy doesn't work either).

As we use either 'replica 2 arbiter 1' (formerly named 'replica 3 arbiter 1') or a pure replica 3, we can afford to have a Gluster node go 'pouf', as long as we have decent bandwidth and we use sharding.

So far I have changed my brick layout at least twice (for the cluster) without the VMs being affected - so you can still try the caching, but please check the comments in bug 1668163 about the chunk size of the cache.
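For reference, the manual recreation described above can be sketched roughly like this. All device and LV names here are hypothetical (your VG, brick LV, and SSD device will differ), and this is only an illustration of the LVM cache commands involved, not the exact commands I ran:

```shell
# Hypothetical layout: gluster_vg holds the brick LV (brick_lv) on the
# spinning disks; /dev/sdc is the SSD being added as a cache.
vgextend gluster_vg /dev/sdc                       # add the SSD to the existing VG

# Create cache-data and cache-metadata LVs on the SSD (sizes are examples)
lvcreate -L 800G -n cachedata gluster_vg /dev/sdc
lvcreate -L 8G   -n cachemeta gluster_vg /dev/sdc

# Combine them into a cache pool, setting the chunk size explicitly -
# this is the parameter discussed in bug 1668163
lvconvert --type cache-pool --chunksize 64k \
          --poolmetadata gluster_vg/cachemeta gluster_vg/cachedata

# Attach the pool to the brick LV in write-back mode
lvconvert --type cache --cachemode writeback \
          --cachepool gluster_vg/cachedata gluster_vg/brick_lv
```

The key point from the bug is the --chunksize: keep it within the default limit, or dirty blocks may never be flushed to the origin LV.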

Best Regards,
Strahil Nikolov

On Sunday, 1 December 2019 at 16:02:36 GMT+2, Thomas Hoberg <thomas@hoberg.net> wrote:

Hi Gobinda,

unfortunately it's long gone, because I went back to an un-cached setup.

It was mostly a trial anyway, I had to re-do the 3-node HCI because it
had died rather horribly on me (a repeating issue I have so far had on
distinct sets of hardware, that I am still trying to hunt down...
separate topic).

And since it was a blank(ed) set of servers, I decided to try the
SSD cache, to see if the Ansible script generation issue had been sorted
out as described upstream. I was rather encouraged to see that the
Ansible script now included the changes that URS had described as
becoming necessary with a new Ansible version.

It doesn't actually make a lot of sense in this setup, because the SSD
cache is a single Samsung EVO 860 1TB unit while the storage is a RAID6
of seven 4TB 2.5" drives (per server): both have similar bandwidth, and IOPS
would be very much workload dependent (the second SSD I intended to use as
a mirror was unfortunately cut from the budget).

The SSD has space left over because the OS doesn't need that much, but I
don't dare use a single SSD as a write-back cache, especially because
the RAID controller (HP P420i) hides all wear information and doesn't seem
to pass TRIM either. And for write-through, I'm not sure it would do
noticeably better than the RAID controller's own cache (which I configured
not to cache the SSD).

So after it failed, I simply went back to no-cache for now. This HCI
cluster is using relatively low-power hardware recalled from retirement
that will host functional VMs, not high-performance workloads. They are
well equipped with RAM and that's always the fastest cache anyway.

I guess you should be able to add and remove the SSD as a cache layer at
any time during operation, because it sits at a level oVirt doesn't
manage, and I'd love to see examples of how it's done. Especially the
removal part would be important to know, in case your SSD signals unexpected
levels of wear and you need to swap it out on the fly.
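As far as I understand LVM (the names below are hypothetical and assume a cached brick LV called gluster_vg/brick_lv), detaching the cache on a live system comes down to one of two lvconvert operations, both of which flush dirty blocks first:

```shell
# Flush dirty blocks and detach the cache, keeping the cache-pool LV
# around so it can be reattached or inspected later:
lvconvert --splitcache gluster_vg/brick_lv

# Or flush, detach, and delete the cache pool in one step:
lvconvert --uncache gluster_vg/brick_lv
```

Note that per bug 1668163, the flush step is exactly what can hang if the cache chunk size is too large, which is why the removal path matters when planning an on-the-fly SSD swap.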

If I come across another opportunity to test (most likely on a single node),
I will update here and make sure to collect a full set of log files,
including the main Ansible config file.

Thank you for your interest and the follow-up,


Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: