Upgrade Memory of oVirt Nodes

Hello everyone,
I have an oVirt 4.3.2.5 hyperconverged 3 node production environment and we want to add some RAM to it.
Can I upgrade the RAM without my users noticing any disruptions and keep the VMs running?
The way I thought I should do it was to migrate any running VMs to the other nodes, then put one node into maintenance mode, shut it down, install the new memory, bring it back up, remove it from maintenance mode, see how the installation reacts, and repeat for the other two nodes. Is this correct or should I follow another way?
Will there be a problem during the time when the nodes will not be identical in their resources?
Thank you for your time,
Souvalioti Maria
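
For reference, the maintenance/activate steps described above can also be driven through the engine's REST API instead of the Admin Portal. A rough sketch with curl (the engine FQDN, password, host name and HOST_UUID below are placeholders, and -k skips TLS verification, so only use it against a trusted engine):

    # Look up the host to get its UUID
    curl -s -k -u admin@internal:PASSWORD \
      "https://engine.example.com/ovirt-engine/api/hosts?search=name%3Dnode1"

    # Put the host into maintenance; the engine live-migrates its VMs away first
    curl -s -k -u admin@internal:PASSWORD -X POST \
      -H "Content-Type: application/xml" -d "<action/>" \
      "https://engine.example.com/ovirt-engine/api/hosts/HOST_UUID/deactivate"

    # ... shut down, install the RAM, power the node back on ...

    # Bring the host back into the cluster
    curl -s -k -u admin@internal:PASSWORD -X POST \
      -H "Content-Type: application/xml" -d "<action/>" \
      "https://engine.example.com/ovirt-engine/api/hosts/HOST_UUID/activate"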

On May 19, 2020 11:16:35 AM GMT+03:00, souvaliotimaria@mail.com wrote:
Hello everyone,
I have an oVirt 4.3.2.5 hyperconverged 3 node production environment and we want to add some RAM to it.
Can I upgrade the RAM without my users noticing any disruptions and keep the VMs running?
The way I thought I should do it was to migrate any running VMs to the other nodes, then put one node into maintenance mode, shut it down, install the new memory, bring it back up, remove it from maintenance mode, see how the installation reacts, and repeat for the other two nodes. Is this correct or should I follow another way?
Will there be a problem during the time when the nodes will not be identical in their resources?
Thank you for your time,
Souvalioti Maria
There is no requirement that your nodes have the same amount of RAM; at least my setup doesn't have the same RAM on each node. The problem with unequal nodes is the scheduling policy, which reminds me to ask: are there any guides for setting a scheduling policy based on memory usage?

Best Regards,
Strahil Nikolov
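
As an aside on the scheduling-policy question, the cluster's current policy can at least be inspected over the REST API; a rough sketch (engine FQDN, password and cluster name are placeholders, -k skips TLS verification). If memory serves, the evenly_distributed and power_saving policies also accept memory-based thresholds (MaxFreeMemoryForOverUtilized / MinFreeMemoryForUnderUtilized), configurable per cluster in the Admin Portal under Edit Cluster > Scheduling Policy.

    # Show the cluster, including a reference to the scheduling policy it uses
    curl -s -k -u admin@internal:PASSWORD \
      "https://engine.example.com/ovirt-engine/api/clusters?search=name%3DDefault"

    # List all scheduling policies known to the engine
    curl -s -k -u admin@internal:PASSWORD \
      "https://engine.example.com/ovirt-engine/api/schedulingpolicies"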

Just like Strahil, I would expect this to work just fine. RAM differences are actually the smallest concern, unless you run out of it in the meantime. And there you may want to be careful and perhaps move VMs around manually with such a small HCI setup.

oVirt will properly optimize the VMs and the hosts to fit, but I don't know what it will do when there simply isn't enough RAM to run the live VMs. Under the best of circumstances it should refuse to shut down the node you want to upgrade. Under less advantageous circumstances some VMs might get paused or shut down (or killed?). I'd be interested to hear your experiences, though I'm somewhat less inclined to try it myself ;-)

I'd play it safe and reduce the number of running VMs to relieve the RAM pressure. I assume there is a reason you want more RAM, but the only way to get there is to reduce usage first, and that doesn't imply "unnoticeable".

On May 20, 2020 2:48:24 AM GMT+03:00, thomas@hoberg.net wrote:
Just like Strahil, I would expect this to work just fine. RAM differences are actually the smallest concern, unless you run out of it in the meantime. And there you may want to be careful and perhaps move VMs around manually with such a small HCI setup.
oVirt will properly optimize the VMs and the hosts to fit, but I don't know what it will do when there simply isn't enough RAM to run the live VMs. Under the best of circumstances it should refuse to shut down the node you want to upgrade. Under less advantageous circumstances some VMs might get paused or shut down (or killed?). I'd be interested to hear your experiences, though I'm somewhat less inclined to try it myself ;-)
I'd play it safe and reduce the number of running VMs to relieve the RAM pressure. I assume there is a reason you want more RAM, but the only way to get there is to reduce usage first, and that doesn't imply "unnoticeable".
Actually, if you set a host into maintenance and there is no space to move the VMs to, oVirt fails to migrate them and the node cannot be put into maintenance. In such a case I usually power off the non-important VMs and then the node goes into maintenance mode (I guess I was fast enough, as there is some timeout setting for going into that mode). From there, you will be able to power off, upgrade and return the node to the cluster.

If a node reaches 90% of memory usage, the KSM service kicks in and starts merging identical memory pages in order to save memory. Of course this is a trade-off of CPU cycles for extra memory, but I haven't seen a hypervisor that is running out of CPU (hosting HPC VMs could change that). Don't expect magic from KSM; in my case (32GB) it gained 5-6 GB from my Linux VMs, but this depends on the software in the VMs.

Best Regards,
Strahil Nikolov
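
A quick way to see whether KSM is active on a node and roughly how much it is saving (these are the standard kernel sysfs counters; the 4 KiB page size is an assumption for x86_64):

    # 1 means KSM is running
    cat /sys/kernel/mm/ksm/run

    # pages_shared = unique shared pages kept; pages_sharing ~ pages saved by merging
    cat /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing

    # Rough estimate of memory saved, assuming 4 KiB pages
    echo "$(( $(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024 )) MiB saved"

    # On oVirt nodes KSM is normally managed by the ksm/ksmtuned services
    systemctl status ksm ksmtuned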

On Tue, May 19, 2020 at 10:18 AM <souvaliotimaria@mail.com> wrote:
Hello everyone,
I have an oVirt 4.3.2.5 hyperconverged 3 node production environment and we want to add some RAM to it.
Can I upgrade the RAM without my users noticing any disruptions and keep the VMs running?
The way I thought I should do it was to migrate any running VMs to the other nodes, then put one node into maintenance mode, shut it down, install the new memory, bring it back up, remove it from maintenance mode, see how the installation reacts, and repeat for the other two nodes. Is this correct or should I follow another way?
Will there be a problem during the time when the nodes will not be identical in their resources?
Thank you for your time, Souvalioti Maria
Hi,
as far as storage is concerned, are you using an arbiter on one of the nodes (keeping only metadata on it, and so only 2 copies of the data) or a full copy on all of them? See here for more info and details: https://www.ovirt.org/develop/release-management/features/gluster/gluster-ar...

If you are using an arbiter on one node, then while you are updating one of the other two nodes you will have only 1 copy of the data active, so take care of it.

Also, if your VMs do heavy I/O, keep in mind that when a "data" node rejoins it can take time for it to resynchronize the data, depending on what was written during its downtime, so wait for all healing operations to complete before proceeding with the next data node. You can see healing information on a node for its volumes with the command

    gluster volume heal volume_name info

More info here: https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/...

Not verified in the field, but in theory oVirt should run those commands behind the scenes for you and not let you put a host into maintenance if there are healing operations that still need to complete. See here: https://www.ovirt.org/develop/release-management/features/gluster/gluster-se...

HIH,
Gianluca
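
A rough way to keep an eye on this from any node before moving on to the next one (the volume names are simply whatever gluster volume list returns on your setup):

    # Check pending heal entries on every volume
    for vol in $(gluster volume list); do
        echo "== $vol =="
        gluster volume heal "$vol" info | grep -E '^(Brick|Number of entries)'
    done

It should be safe to proceed once every brick of every volume reports "Number of entries: 0".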

Hello again,
Hope everyone's OK. I'm really sorry for being so late, but I came back with the update.

We wanted to upgrade the memory because we plan on deploying several VMs on the platform, mostly for services, and thought that now was a better time to do the upgrade than later on.

I manually migrated some of the VMs to the other nodes, trying to keep the physical memory usage of each node roughly equal. The memory upgrade of the nodes went through with no problem (from 32GB to 72GB). The VMs that were active at the time (8 VMs including the HE) didn't show any downtime, slow-down or other issue. The memory usage of the two online nodes at the time was around 75-80%, with the HE consuming the most. The upgrade happened right after I had to replace a failed SAS HDD (hot-plug) which held the mirror of the HDD with the oVirt Node OS. Everything went as my team and I hoped, with no problems on either side.

As for the storage, the deployment we have is GlusterFS with one node as arbiter. When the node rejoined the others, it took around 8-10 minutes for the healing operations to complete, and since then everything's been going perfectly well.

Thank you all for the time you took to respond to me and the valuable information you shared. It means a lot.

Best regards,
Maria
participants (4)
- Gianluca Cecchi
- souvaliotimaria@mail.com
- Strahil Nikolov
- thomas@hoberg.net