XCP-ng, first impressions as an oVirt HCI alternative


Comments & motivational stuff were moved to the end...
Source/license: Xen, the hypervisor, has moved to the Linux Foundation. Perpetual open source, free to use. XCP-ng is a distribution of Xen, produced by a small French company, currently built on a Linux 4.19 LTS kernel and an EL7 userland frozen in July 2020: they promise open source and free to use forever.
Mode of operation: You install XCP-ng on your (bare metal) hardware. You manage nodes via Xen Orchestra, which is a big Node.js application you can run in pretty much any way you want.
Business model: You can buy support for XCP-ng at different levels of quality. Xen Orchestra, as an appliance, exposes different levels of functionality depending on the level of support you buy. But you get the full source code of the appliance and can compile it yourself to unlock the full feature set. There is a script out there that auto-generates the appliance with a single command. In short, you are never forced to pay, but better help can be purchased.
How does it feel to a CentOS/RHEL user? The userland on the nodes is EL7, but you shouldn't touch that. The CLI is classic Xen, nothing like KVM or oVirt. I guess libvirt and virsh should feel similar, if they live up to their cross-hypervisor promise at all. The standard userland on the Orchestra appliance is Debian, but you can build it on pretty much any Linux with that script: all interaction is meant to happen via the web UI.
Installation/setup: There is an image/ISO much like the oVirt node image, based on a Linux 4.19 LTS kernel, an EL7 userland frozen in July 2020, and a freshly maintained Xen with tools. Installation on bare metal or in VMs (e.g. for nested experimentation) is a snap; the HCL isn't extraordinary. I'm still fighting to get the 2.5/5 Gbit USB3 NIC working that I like to use for my smallest test systems.
A single command on one node will download the "free" Orchestra appliance (aka XOA) and install it as a VM on that node. It is set to auto-launch; just point your browser at its IP to start with the GUI.
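At the time of writing, the documented quick-deploy one-liner, run on a node, was roughly this (check the current XCP-ng/XOA docs before trusting it):

    bash -c "$(wget -qO- https://xoa.io/deploy)"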
There are various other ways to build or run the GUI, which can run on anything remotely Linux, inside the nodes or outside: more on this in the next section.
The management appliance (XOA) will run with only 2GB of RAM and 10GB of disk for a couple of hosts. If you grow to dozens of hosts, give it a little more RAM and it will be fine. Compared to the oVirt management engine it is very, very light and seems to have vastly fewer parts that can break.
And if something does break, it hardly matters, because the appliance is pretty much stateless. E.g. pool membership and configuration live on the nodes, so if you connect from another XOA they just carry over. Ditto storage: the configuration that oVirt keeps in the management engine's Postgres database is kept on the nodes in XCP-ng and can be changed from any connected XOA.
Operation: Xen nodes are much more autonomous than oVirt hosts. They use whatever storage they might have locally, or attached via SAN/NAS/Gluster[!!!] and others. They will operate without a management engine, much like "single node HCI oVirt", or they can be joined into a pool, which opens up live migration and HA. A pool is created by telling a node that it's the master now and then having other nodes join in. The master can be changed and nodes can be moved to other pools. Adding nodes to and removing them from a pool is very quick and easy, and the same goes for additional storage repositories: any shared storage added to any node is immediately visible to the pool, and disks can be flipped between local and shared storage very easily (I haven't tried live disk moves, but they might work).
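For the CLI-inclined, the pool mechanics boil down to a handful of xe commands. A rough sketch from memory of the xe tooling (placeholders are mine; the web UI does the same with a few clicks):

    # on each node that should join an existing master's pool:
    xe pool-join master-address=<master-ip> master-username=root master-password=<secret>
    # hand the master role to another pool member:
    xe pool-designate-new-master host-uuid=<host-uuid>
    # attach an NFS export as a pool-wide shared storage repository:
    xe sr-create name-label=shared-nfs type=nfs shared=true \
        device-config:server=<nas-ip> device-config:serverpath=/export/vms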
Having nodes in a pool qualifies them for live migration (CPU architecture caveats apply). If storage is local, it will move with the VM; if storage is shared, only RAM will move.
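From the CLI that is a one-liner; a sketch assuming the usual xe syntax (names are placeholders, and the cross-pool storage-motion variant takes additional remote-*/destination parameters I won't vouch for here):

    # live-migrate a running VM to another pool member:
    xe vm-migrate vm=<vm-name-or-uuid> host=<destination-host> live=true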
You can also move VMs between pools, and even across different x86 variants (e.g. AMD and Intel), when the VMs are down. If you've ever dabbled with "export domains" or "backup domains" in oVirt, you just can't believe how quick and easy these things are in XCP-ng. VMs and their disks can be moved, copied, cloned, backed up and restored with a minimum of fuss, including continuous backups of running machines.
You can label machines as "HA" so they'll always be restarted elsewhere, should a host go down. You can define policies for how to balance workloads across hosts and ensure that HA pairs won't share a host, pretty similar to oVirt.
The "free" Xoa has plenty of "upgrade!" buttons all over the place. So I went ahead and build an appliance from source, that doesn't have these restrictions, just to see what that would get me.
With this script here: https://github.com/ronivay/XenOrchestraInstallerUpdater you can build the XOA on any machine/VM you happen to be running, with one of the many supported Linux variants.
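From memory of that repo's README, the procedure is roughly this (consult the README for the current steps and file names):

    git clone https://github.com/ronivay/XenOrchestraInstallerUpdater
    cd XenOrchestraInstallerUpdater
    cp sample.xo-install.cfg xo-install.cfg   # adjust to taste
    sudo ./xo-install.sh                      # interactive install/update menu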
I built one variant to run as a VM on XCP-ng and used another to run on an external machine. All three XOAs don't step on each other's toes more than you let them: very, very cool!
The built-in hypervisor interfaces and paravirtualized drivers of almost any modern 64-bit Linux make Linux VMs very easy. For Windows guests there are drivers available, which make things as easy as with KVM. Unfortunately they don't seem to be exactly the same, so you may do better to remove guest drivers from machines you want to move. BTW, good luck trying to import OVAs exported from oVirt: those contain tags no other hypervisor wants to accept, sometimes not even oVirt itself. Articles on the forum suggest running Clonezilla at both ends for a storage-less migration of VMs.
I have not yet tested nested virtualization inside XCP-ng (XCP-ng itself works quite well in a nested environment), nor have I done tests with pass-through devices and GPUs. All of that is officially supported, with the typical caveats around older Nvidia drivers.
So far every operation was quick and every button did what I expected it to do. When things failed, I didn't have to go through endless log files to find out what was wrong: the error messages were helpful enough to find the issues so far.
Caveats: I haven't done extensive failure testing yet, except one case: I've been running four host nodes as VMs on VMware Workstation, with XCP-ng VMs nested inside. At one point I managed to run out of storage and had to shut down all "bare metal VMs" hard. Everything came back clean; only the migration job that had tripped the storage failure had to be restarted, after I created some space.
There is no VDO support in the provided kernel. It should be easy enough to add, given the sources and build scripts. But the benefit and the future of VDO are increasingly debated.
The VMs use the VDI disk format (VHD-based), which is less powerful/flexible than qcow2 with regard to thin allocation and trimming. Improvements are on the roadmap, not yet in the product.
Hyperconverged Storage: XCP-ng so far doesn't come with a hyperconverged storage solution built in. Like so many clusters, it moves that responsibility to your storage layer (which could be Gluster...).
They are in the process of making LINSTOR a directly supported HCI option, even with features such as dispersed (erasure-coded) volumes to manage write amplification/resilience as node counts grow beyond three. That's not ready today, and they seem to be taking their sweet time about it. They label it "XOASAN" and want to make it a paid option. But again, the full source code is there, and with the self-compiled XOA appliance you should be able to use it for free.
The current beta release only supports replicated mode and only up to four nodes, but it seems to work reliably. Write amplification is 4x, so write bandwidth drops to 25% and is limited to the network speed, while reads are served by the local node at storage hardware bandwidth.
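To put that 4x figure in perspective (my own back-of-the-envelope arithmetic, assuming a 10 Gbit/s storage network):

    10 Gbit/s network  ~= 1.2 GB/s line rate
    1.2 GB/s / 4 replicas ~= 300 MB/s of sustained guest writes

So guest writes top out well below a single local NVMe drive, while reads stay local and fast.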
The 2-, 3- and 4-node replicated setups work today with the scripts they provide. That's not quite as efficient as the 2R+1A setup (two data replicas plus an arbiter) in oVirt, but it seems rock solid and just works, which was never that easy in oVirt.
Hyperconverged is very attractive at 3 nodes, because good fault resilience can't be had any cheaper. But the industry agrees that it tends to lose its financial attraction as you grow to dozens of machines, and nobody in their right mind would operate a real cloud using HCI. Still, going from 3 to, say, two dozen should be doable and easy: it never was with oVirt.
XCP-ng, or rather LINSTOR, won't support arbitrary numbers of storage nodes or easy linear growth like Gluster could (in theory). So far it's just much, much better at getting things going.
Forum and community: The documentation is rather good, but can be light on things that are "native Xen", as covering those would repeat a lot of the effort already expended by Citrix. It helps that I've been around the block a couple of times since VM/370, but there are holes or missing details where you'll need to ask questions.
The community isn't giant, but comfortably big enough. The forum's user interface is vastly better than this one, but then I haven't seen anything as slow as ovirt.org in a long time.
Technical questions are answered extremely quickly, mostly by the staff of the small French company themselves. More often, though, it's even easier to find answers to questions already asked, which is the most typical case.
The general impression is that there are far fewer moving parts and things that can go wrong. There is no Ansible, not a single extra daemon on the nodes, and a management engine that is very light, with so little state that it doesn't need a DBA to hold and manage it. The Xen hypervisor seems much smarter than KVM on its own, and XOA both has a rich API for doing things and offers an API to the next management layer up.
It may have fewer features overall than oVirt, but I haven't found anything I really missed. It's much easier and quicker to install and operate with nothing but the GUI, which is a godsend: I want to use the farm, not spend my time managing it.
Motivational rant originally at the top... tl;dr
My original priorities for choosing oVirt were:
1. CentOS as RHEL downstream -> stable platform, full vendor vulnerability management included, a big benefit in compliance
2. Integrated HCI -> just slap a couple (3/6/9) of leftover servers together for something fault resilient, no other components required, a quick-start out-of-the-box solution with a few clicks in a GUI
3. Fully open source (& logs), can always read the source to understand what's going on, better than your typical support engineer
4. No support or license contract unless you want/need it, but the ability to switch that on once it paid for itself
The more famous competitors, vSphere and Nutanix didn't offer any of that.
(Citrix) Xen I excluded, because a) Xen seemed "old school + niche" compared to KVM and b) Citrix had reduced "free" to "useless".
I fell in love with Gluster for its design: it felt like a really smart solution. I fell out with Gluster over its operation and performance: I can't count how many times I had to restart daemons and issue "gluster heal" commands to resettle things after little more than a node update.
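For anyone who hasn't lived through it, the ritual was typically some variation of this (from memory; the volume name is a placeholder):

    systemctl restart glusterd            # restart the management daemon
    gluster volume heal <volname>         # kick off self-heal on the volume
    gluster volume heal <volname> info    # watch the list of unhealed entries shrink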
I rediscovered XCP-ng when I discovered that HCI and RHV had been EOL'd.

On Monday, February 14th, 2022 at 3:03 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
If any beginners read my post, I want to tell them they are welcome, and they can be sure to find quality in the code, in the updates, in the innovation, in the enterprise features and in the mailing list support, and they are welcome to contribute to make virtualization greater and greater!
I'm new and I found this post very helpful. And, thank you for the welcome! :) Cheers, Glen

On Tue, Feb 15, 2022 at 00:04, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Hello,
I have read several pessimistic posts from you, each one arguing against decisions of the oVirt community that you disagree with. In general, my impression is that you want the community to take responsibility for Red Hat's decisions.
Like you, I find the end of RHV support very sad, but unlike you I believe oVirt is incredible software and an example of an open-source success, a project that, I do believe, will survive thanks to the awesome people who have been contributing to it for 10 years. Other downstream projects, like OLVM, decided to switch from Xen to KVM.
I've been working with oVirt since the very beginning, in a large and successful production setting. Everyone in my IT team is convinced that oVirt makes our IT very stable and flexible (more than 300 VMs), and we have convinced some partners to adopt it as well. My pain is that the oVirt project is underrated compared to the quality of its code, but you and I are actors in its popularity. I was not initially a developer, but thanks to oVirt I'm now able to write complex playbooks for automatic deployments, and I'm now able to debug Python code as well. What I mean is that the project depends on community members' contributions, each according to their own capacity. For my own part, I can help many beginners on the mailing list with simple tips, just as others can translate into other languages.
Yes, I am aware of competing projects like Proxmox, Xen and now XCP-ng. There is no perfect project. Everybody should get involved in a project that matches their expectations.
In reality, I wonder about the goal of your posts; it seems that nothing goes in the right direction from your point of view... Did you contribute to change that? Did you pay anything to be so demanding?
Thank you to the whole community for providing such wonderful software, and a special mention to the community leaders (Sandro?) and other contributors; we need positive attitudes.
If any beginners read my post, I want to tell them they are welcome, and they can be sure to find quality in the code, in the updates, in the innovation, in the enterprise features and in the mailing list support, and they are welcome to contribute to make virtualization greater and greater!
Thanks Nathanaël, this is a great post!
-- Nathanaël Blanchet
Supervision réseau SIRE 227 avenue Professeur-Jean-Louis-Viala 34193 MONTPELLIER CEDEX 5 Tél. 33 (0)4 67 54 84 55 Fax 33 (0)4 67 54 84 14 blanchet@abes.fr
-- Sandro Bonazzola, Manager, Software Engineering, EMEA R&D RHV, Red Hat EMEA <https://www.redhat.com/>, sbonazzo@redhat.com *Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.*

I think that Nathanaël is right. Yes, the project can be better (which project can't?), and despite the hardship of a major contributor leaving oVirt, I believe that the community can keep it going. I personally picked oVirt not to become proficient in RHV, nor because it has a leading market share, but because it's open source and efficient. If I had more time, I would have contributed more to the Ansible roles, which still need some polishing, but I hope I will be able to do that in the coming months. Best regards, Strahil Nikolov
On Tue, Feb 15, 2022 at 12:24, Sandro Bonazzola <sbonazzo@redhat.com> wrote:


On Tue, Feb 15, 2022 at 8:50 PM Thomas Hoberg <thomas@hoberg.net> wrote:
Am I pessimistic about the future of oVirt? Quite honestly, yes.
Do I want it to fail? Absolutely not! In fact I wanted it to be a viable and reliable product and live up to its motto "designed to manage your entire enterprise infrastructure".
It turned out to be very mixed: it has bugs I consider rather debilitating. It has also survived critical hardware failures that would have resulted in data loss, and didn't, because Gluster provided replicas that survived.
I have reported on both success and failure. If you look through my posts, you will find both.
My impression is that leaders in the oVirt community have not been transparent enough about the quality of the support they provide to the various parts. E.g. it was only very recently that Nir wrote that HCI was only ever supported by Gluster contributors and not tested by the oVirt core teams.
For quite some time, ovirt-system-tests also tested HCI, routinely. Admittedly, this flow never had the breadth of the "plain" (separate storage) flows. The oVirt and Gluster teams did work in cooperation. I am not sure that Nir's statement is actually that significant. It clarifies more the organizational structure within Red Hat than the intended quality of all relevant projects (and products). You can check Red Hat's lifecycle pages for the relevant products to see its plans for them. I think it was already clear, but if not, clarifying: from Red Hat's POV, the replacement for Gluster is Ceph, and the replacement for oVirt is kubevirt/OpenShift Virtualization/OKD Virtualization. This definitely does not mean that oVirt is intended to die - on the contrary - quite a lot of what we did in recent months was in order to make it easier for oVirt to survive after Red Hat's involvement diminishes.
I believe that the EOL of the downstream products will accelerate the dwindling of the community Sandro has described in his post. I honestly want to be wrong; after all, I am losing years of work and expertise.
So I posted this report on XCP-ng, because it may be an option for those who, like me, cannot operate on hope alone, but need to provide a service their users can trust to have a future without an EOL already announced.
I would recommend that you all do a proper assessment of both platforms and potentially others out there and learn from each other.
With factionalism the state of the art cannot progress.
Agreed. Best regards, -- Didi

On Tue, Feb 15, 2022 at 8:50 PM Thomas Hoberg <thomas@hoberg.net> wrote:
For quite some time, ovirt-system-tests also tested HCI, routinely. Admittedly, this flow never had the breadth of the "plain" (separate storage) flows.
I've known virtualization since the days of VM/370. And I immediately ran out and bought the very first VMware [1.0] product, when they did that nice trick of using SMM and binary translation to implement a hypervisor on an architecture that [IMHO] excluded VM on purpose (except for the virtual 8086 mode). I followed the hypervisor wars, but settled on OpenVZ because it delivered compliance, which wasn't firmly established for hypervisors at the time, and much better consolidation. Red Hat then took a very arrogant stance against containers, perhaps because Parallels (OpenVZ) and LXC (Canonical) were competitors, or simply because it had just "won" with Qumranet/KVM against Citrix/Xen: I could not convince my Red Hat contacts that it wasn't an either/or choice, but that both approaches were complementary. Well, SuSE didn't listen either, nor did Oracle for that matter. It took Docker to break that loose, and it took Google brushing up their let-me-containerize-that into Kubernetes to spur Red Hat into action and ditch their first OpenShift (sorry if I don't remember every detail). Nobody had a perfect grasp of where things were going, nor the perfect product offer, and there was quite a lot of politics involved as well: not every decision was based on technical merit. I came back to VM orchestration because I went from compliance-driven production architecture to supporting our research labs, which needed GPU support for ML and the ability to run sets of PoC machines, both stable and without (compliance-oriented) production support teams. They also needed transportable setups that could be shipped around the world and operated without local support. oVirt offered a turn-key HCI solution with a reasonable minimum of three nodes, which could be scaled up or out within one or two orders of magnitude of computing power. Any single fault could reasonably be survived for a day or a week-end, until a spare box could be obtained locally and put in place. Even a double fault wouldn't necessarily result in a complete loss or shutdown, just a defined minimal-operations mode, ready for self-recovery once the hardware was replaced: so perfect, in theory, that I was positively in love... That love is at the root of my current disillusionment, for sure, but it was you who showed the "entire enterprise" t*ts! At the same time it offered the ability to grow into something much bigger with dedicated storage (and skills), because I understood well enough that you won't run a cloud on HCI. oVirt was like a hybrid car that could run on batteries or fuel. Most importantly, it offered running things in a "labs mode" using oVirt and CentOS and in a "production mode" using RHEL and RHV, so if one of our prototypes got customers, we could just switch it over to our production teams, who know how to handle auditors looking into every nook and cranny. So the hybrid car could have airbags and ESP all around for the Autobahn, or be a sand buggy in the dunes. These four options are gone now, rather suddenly and with little warning; there is only one variant left, and it is in full metamorphosis. It's no longer hybrid, HCI is gone. And there is no longer a choice between a compliant, safe version and the devil-may-care agile variant that won't always run after every change and close every vulnerability before the cyberwar reaches you.
OpenShift with kubevirt may well point much better in the direction the industry is going, but it's far from the turnkey, fault-resilient set of boxes you could just put anywhere and have fixed by people who'd simply unpack a replacement box and rewire three cables, letting the HCI cluster heal itself. It's also a complete inversion, where Kubernetes runs VMs as if they were containers, while oVirt ran VMs ...that might have containers in them. And that may be a good thing to have, for those who want it. That even includes me and my company, which is running OpenShift on RHEL in the big data centers. It's just that for the labs and in the field this doesn't fit; an HCI setup is what we need below, while we'll happily run OpenShift and (nested) kubevirt on top, to ensure our software keeps up with the times as it evolves. Now that it's finally in the open that three of the four oVirt/RHV/HCI options are gone, I'd like to help those who, like me, need one or more of them for their business. You should help and support that, because you shouldn't want happy oVirt'lers to be stranded and potentially angry. There isn't even an issue of competition any more, because the Xen guys aren't selling Kubernetes, just running it, with CoreOS, RHEL or whatnot inside VMs.
The oVirt and Gluster teams did work in cooperation. I am not sure that Nir's statement is actually that significant. It clarifies more the organizational structure within Red Hat than the intended quality of all relevant projects (and products). You can check Red Hat's lifecycle pages for the relevant products to see its plans for them.
I think it was already clear, but if not, clarifying: from Red Hat's POV, the replacement for Gluster is Ceph, and the replacement for oVirt is kubevirt/OpenShift Virtualization/OKD Virtualization. This definitely does not mean that oVirt is intended to die - on the contrary - quite a lot of what we did in recent months was in order to make it easier for oVirt to survive after Red Hat's involvement diminishes.
Please put that on the home page, update the wikis, tell the story. It's not a bad story, it's just a complete rewrite of the original book, which was a vSphere-inspired open-source blend.
And I'd love to have an open discussion about why oVirt couldn't also be *the* HCI solution that everybody has at home, running on three Raspberry Pis for a bullet-proof home edge. Better yet: how can it become that going forward?
Agreed.
Best regards,

On Mon, 14 Feb 2022, Thomas Hoberg wrote:
Xen nodes are much more autonomous than oVirt hosts. They use whatever storage they might have locally, or attached via SAN/NAS/Gluster[!!!] and others. They will operate without a management engine [...] Any shared storage added to any node is immediately visible to the pool and disks can be flipped between local and shared storage very easily
It may be worth pointing out that these are not limitations of libvirt (which can even live-migrate VMs between hosts without shared storage), but only of oVirt. The impression I've got from this mailing list is that they are intentional design decisions to enforce "correctness" of the cluster. It does come up often enough on the mailing list that it would be nice if there were a way to designate a multi-host cluster as having both local and shared storage. I think the flexibility would help those running dev or small prod clusters. Maybe someone just needs to step up and write a patch for it?
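For reference, the libvirt capability mentioned looks roughly like this (a sketch; the domain and host names are placeholders):

    # live-migrate a domain, copying its disks to the destination's local storage:
    virsh migrate --live --copy-storage-all <domain> qemu+ssh://<dest-host>/system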

The impression I've got from this mailing list is that they are intentional design decisions to enforce "correctness" of the cluster.
My understanding of a cluster (ever since the VAX) is that it's a fault-tolerance mechanism, and that was originally one of the major selling points of these hypervisor orchestration suites like vSphere, RHV etc. Since that doesn't quite work with local storage, I understand why they left it out of live migration. AFAIK it works just fine for VMs that are down or disks that are not attached.
participants (7)
- Glen Jarvis
- Nathanaël Blanchet
- Sandro Bonazzola
- Sketch
- Strahil Nikolov
- Thomas Hoberg
- Yedidyah Bar David