Dear oVirt Gurus,
The oVirt user VM portal does not seem to work through the squid proxy setup (configured as per the guide). The page loads and login works fine through the proxy, but the asynchronous requests just hang. I've attached a screenshot; you can see the "api" endpoint just hanging in the browser's web inspector:
This works fine when not going through the proxy.
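In case it helps narrow it down, one way to isolate whether squid or the engine is at fault is to issue the same REST call with and without the proxy and compare. A minimal sketch in Python with the requests library; the proxy address, engine FQDN and credentials below are placeholders, not anything from the guide:

import requests

# Placeholder proxy and engine addresses
proxies = {"https": "http://squid.example.com:3128"}

resp = requests.get(
    "https://engine.example.com/ovirt-engine/api",
    headers={"Accept": "application/json"},
    auth=("user@internal", "password"),
    proxies=proxies,       # drop this argument to test the direct path
    verify=False,          # engine CA not trusted on the test box; test only
    timeout=30,            # a hang through the proxy shows up as a timeout here
)
print(resp.status_code, resp.elapsed)

If the call times out only when proxies is set, the problem is on the squid side rather than in the portal itself.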
Is there a way to force noVNC (HTML5) as the console mode through the web UI, or at least have it available as an option if not the default?
The console does not seem to work when logged in with only the basic 'user role'.
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
I'm running Version 220.127.116.11-1.el7, and after a reboot the engine machine could no longer log in to the Administration Portal, failing with this error:
sun.security.validator.ValidatorException: PKIX path validation failed
java.security.cert.CertPathValidatorException: validity check failed
I'm using a self-signed cert.
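"validity check failed" usually means the certificate's notBefore/notAfter window no longer covers the current date (or the host clock jumped at reboot). A quick sketch to dump the validity window of whatever certificate the engine is presenting; the hostname is a placeholder and it assumes the 'cryptography' package is installed:

import ssl
from cryptography import x509

# Fetch the presented certificate without verifying it (it's self-signed anyway)
pem = ssl.get_server_certificate(("engine.example.com", 443))
cert = x509.load_pem_x509_certificate(pem.encode())
print("subject:         ", cert.subject.rfc4514_string())
print("not valid before:", cert.not_valid_before)
print("not valid after: ", cert.not_valid_after)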
Once again my production ovirt cluster is collapsing in on itself. My
servers are intermittently unavailable or degrading, customers are noticing
and calling in. This seems to be yet another gluster failure that I
haven't been able to pin down.
I posted about this a while ago, but didn't get anywhere (no replies that I
found). The problem started out as a glusterfsd process consuming large
amounts of RAM (up to the point where RAM and swap were exhausted and the
kernel OOM killer killed off the glusterfsd process). For reasons not
clear to me at this time, that resulted in any VMs running on that host and
that gluster volume being paused with an I/O error (the glusterfs process is
usually unharmed; why it didn't continue I/O with the other servers is
confusing to me).
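Not a fix, but until the leak is understood it may help to watch glusterfsd's resident memory and alert (or migrate VMs) before the OOM killer gets involved. A rough watchdog sketch, assuming psutil is installed and with an arbitrary threshold that would need tuning per host:

import time
import psutil  # assumption: psutil available on the hypervisor

THRESHOLD_BYTES = 48 * 1024**3  # example threshold only, tune for the host

while True:
    for proc in psutil.process_iter(["name", "memory_info"]):
        if proc.info["name"] == "glusterfsd":
            rss = proc.info["memory_info"].rss
            if rss > THRESHOLD_BYTES:
                # log / alert here, before the kernel OOM killer has to act
                print("glusterfsd pid %d RSS %.1f GiB" % (proc.pid, rss / 1024**3))
    time.sleep(60)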
I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and
data-hdd). The first 3 are replica 2+arb; the 4th (data-hdd) is replica
3. The first 3 are backed by an LVM partition (some thin provisioned) on
an SSD; the 4th is on a Seagate hybrid disk (HDD plus some internal flash for
acceleration). data-hdd is the only thing on that disk. The servers are Dell
R610s with the PERC 6/i RAID card, with the disks individually passed through
to the OS (no RAID enabled).
The above RAM usage issue came from the data-hdd volume. Yesterday, I
caught one of the glusterfsd processes at high RAM usage before the OOM
killer had to run. I was able to migrate the VMs off the machine and, for
good measure, reboot the entire machine (after taking the opportunity to run
the software updates that oVirt said were pending). Upon booting back up, the
necessary volume healing began. However, this time, the healing caused all
three servers to go to very, very high load averages (I saw just under 200
on one server; typically they've been 40-70) with top reporting IO Wait at
7-20%. Network for this volume is a dedicated gig network. According to
bwm-ng, initially the network bandwidth would hit 50MB/s (yes, bytes), but
then tailed off to mostly kB/s rates for a while. All machines' load averages
were still 40+ and gluster volume heal data-hdd info reported 5 items
needing healing. Servers were intermittently experiencing I/O issues, even
on the 3 gluster volumes that appeared largely unaffected. Even OS
activities on the hosts themselves (logging in, running commands) would often
be very delayed. The ovirt engine was seemingly randomly throwing engine
down / engine up / engine failed notifications. Responsiveness on ANY VM
was horrific most of the time, with random VMs being inaccessible.
I let the gluster heal run overnight. By morning, there were still 5 items
needing healing, all three servers were still experiencing high load, and
servers were still largely unstable.
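For what it's worth, the per-brick entry counts from gluster volume heal data-hdd info can be polled and logged, which at least shows whether the heal is actually making progress overnight. A small sketch; the volume name and interval are just what I'd use, adjust as needed:

import re
import subprocess
import time

VOLUME = "data-hdd"  # placeholder volume name

while True:
    # Parse the "Number of entries:" lines printed per brick by the gluster CLI
    out = subprocess.run(
        ["gluster", "volume", "heal", VOLUME, "info"],
        capture_output=True, text=True, check=False,
    ).stdout
    counts = [int(n) for n in re.findall(r"Number of entries:\s*(\d+)", out)]
    print(time.strftime("%H:%M:%S"), "entries per brick:", counts)
    time.sleep(300)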
I've noticed that all of my ovirt outages (and I've had a lot, way more
than is acceptable for a production cluster) have come from gluster. I
still have 3 VMs whose hard disk images were corrupted by my last
gluster crash that I haven't had time to repair / rebuild yet (I believe
this crash was caused by the OOM issue previously mentioned, but I didn't
know it at the time).
Is gluster really ready for production yet? It seems so unstable to
me... I'm looking at replacing gluster with a dedicated NFS server, likely
FreeNAS. Any suggestions? What is the "right" way to do production
storage on this 3-node cluster? Can I get this gluster volume stable
enough to get my VMs to run reliably again until I can deploy an alternative?
Property: default route, host - true, DC - false
I have 4 NICs:
bond0 = 2 x 10G
eno1 = ovirtmgmt
eno2 = VM traffic
eno2 says it's out of sync: the host's network config differs from the DC's,
with "default route: host - true, DC - false".
I have tried "Sync All Networks" but the message still remains.
Where can I look to fix the issue?
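If the "Sync All Networks" button in the UI doesn't clear it, it might be worth triggering the same action through the REST API / Python SDK while watching engine.log and vdsm.log. A sketch with ovirt-engine-sdk-python; the URL, credentials, CA path and host name are placeholders, and I'm assuming the action is exposed as sync_all_networks on the host service:

import ovirtsdk4 as sdk

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="secret",
    ca_file="/etc/pki/ovirt-engine/ca.pem",  # placeholder path
)
hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search="name=myhost")[0]  # placeholder host name
# Same action the "Sync All Networks" button performs
hosts_service.host_service(host.id).sync_all_networks()
connection.close()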
I am currently building a new virtualization cluster with oVirt, using
AMD EPYC processors (AMD EPYC 7351P). At the moment I'm running oVirt
Node Version 4.2.3 @ CentOS 7.4.1708.
We have the situation that the processor type is recognized as "AMD
Opteron G3". With this instruction set the VMs are not able to do AES in
hardware, which results in poor performance in our case.
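Just to confirm it really is the missing instruction set, it's easy to check from inside a guest whether the virtual CPU exposes the 'aes' flag at all. A trivial check (assumes a Linux guest):

# Does the virtual CPU expose the 'aes' flag?
with open("/proc/cpuinfo") as f:
    flags = []
    for line in f:
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            break
print("AES-NI exposed" if "aes" in flags else "no 'aes' flag: AES done in software")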
I found some information telling me that this problem should be solved
with CentOS 7.5.
My questions:
- Is there any further information about AMD EPYC support?
- Is there any information about an update of oVirt Node to CentOS 7.5?
We upgraded oVirt from version 4.1 to 4.2.6 and rebooted all VMs.
We missed two VMs that were still at Cluster Compatibility Version 3.6.
There was a gluster/network I/O problem and VMs got paused. We were able
to recover all the other VMs from the paused state, but we have two VMs
that won't run because:
"Cannot run VM. The Custom Compatibility Version of VM VM_NAME (3.6) is
not supported in Data Center compatibility version 4.1."
Can we force the CCV of the paused VMs to 4.1?
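If doing it through the API is acceptable, a sketch of what it might look like with the Python SDK follows. Connection details and the VM name are placeholders, I'm assuming the property is exposed as custom_compatibility_version on types.Vm, and whether the engine accepts the change for a 3.6 VM is a separate question:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="secret",
    ca_file="/etc/pki/ovirt-engine/ca.pem",  # placeholder path
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search="name=VM_NAME")[0]  # placeholder VM name
# Bump the VM's custom compatibility version to 4.1
vms_service.vm_service(vm.id).update(
    types.Vm(custom_compatibility_version=types.Version(major=4, minor=1))
)
connection.close()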
I recently deployed oVirt on some hosts to evaluate it as one of my main virtualization tools. I like it a lot, but I don't know if this behaviour is normal: I just configured my iSCSI network against an IBM V7000, and from that point on, the host with the "SPM" role has been continuously reading from storage at about 7 or 8 Mbps. I don't know what it is doing, since no storage domains have been created yet. How can I find out what it is doing?
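One way to see which process on the SPM host is responsible for those reads is to sample per-process I/O counters over a short interval. A rough sketch assuming psutil is installed and run as root; the sampling interval is arbitrary:

import time
import psutil  # assumption: psutil installed on the SPM host

INTERVAL = 5  # seconds

# First sample of cumulative read bytes per process
readings = {}
for p in psutil.process_iter():
    try:
        readings[p.pid] = (p.name(), p.io_counters().read_bytes)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

time.sleep(INTERVAL)

# Second sample: compute read rate per process and show the top readers
rates = []
for pid, (name, start_bytes) in readings.items():
    try:
        end_bytes = psutil.Process(pid).io_counters().read_bytes
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue
    rates.append(((end_bytes - start_bytes) / INTERVAL, name, pid))

for rate, name, pid in sorted(rates, reverse=True)[:10]:
    print("%8.1f KiB/s  %s (pid %d)" % (rate / 1024, name, pid))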