4.4 bug? Seems 100% repeatable
by dan.creed@thecreeds.net
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, deployment errors: code 505: Host quigonn installation failed. Failed to configure management network on the host., code 1120: Failed to configure management network on host quigonn due to setup networks failure., code 9000: Failed to verify Power Management configuration for Host quigonn., code 10802: VDSM quigonn command HostSetupNetworksVDS failed: Internal JSON-RPC error: {'reason': 'Three or more nameservers are only supported when using either IPv4 or IPv6 nameservers but not both.'}, fix accordingly and re-deploy."}
This has me utterly confused; I only have one nameserver specified in /etc/resolv.conf.
I'm not sure where it's getting this from.
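One thing that may be worth checking (an assumption on my part, not a
confirmed diagnosis): VDSM builds its nameserver list from the interface
configuration reported by NetworkManager, not only /etc/resolv.conf, so an
IPv6 DNS server pushed by DHCPv6 or router advertisements could be counted
alongside your IPv4 one. Roughly:

# list every DNS server NetworkManager knows about, per interface
nmcli dev show | grep -i dns
# compare with what the stub resolver file actually contains
cat /etc/resolv.conf

If an IP6.DNS entry shows up in the first command, that would explain the
mixed IPv4/IPv6 complaint despite a single nameserver in resolv.conf.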
4 years, 7 months
ovirt 4.4 GA
by dan.creed@thecreeds.net
Just downloaded the 4.4 ISO an hour ago from the link on the home page, and FYI: it still shows the pop-up during install saying it is pre-release, etc.
4 years, 7 months
HostedEngine HA Engine Network Status
by Joseph Goldman
Hi List,
Running a 3-node setup for a client, I'm constantly having the
HostedEngine move itself around; whichever node it's on ends up penalizing
its score so low that it forces a migration to the other node.
Looking at /var/log/ovirt-hosted-engine-ha/agent.log shows a decent
amount of:
MainThread::INFO::2020-05-21 15:47:54,742::states::135::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 319 due to network status
What I want to know is: how do I get more debug output from this, so I can
tell which network status it's concerned about and go about stabilising it.
The system is heavily monitored with ping checks; it never drops link and
never drops ICMP. None of its VMs falter accessing shared NFS space for
disk storage, so I'm not sure what the concern is. The node will
literally, over time, penalise itself down to ~2000, and then the HA agent
will want it to swap nodes. That's not necessarily a bad thing, but it
generates a heap of status emails multiple times a day, which is just
garbage - and makes the HE unavailable sometimes mid-admin task. A sketch
of what I'm planning to try next is below.
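For what it's worth, this is roughly what I intend to try (the exact file
path and option names are my assumption from the hosted-engine docs, so
please verify before relying on them):

# raise the HA agent's verbosity: edit /etc/ovirt-hosted-engine-ha/agent-log.conf,
# set the log level to DEBUG, then restart the agent
systemctl restart ovirt-ha-agent
# check which network test the agent scores on; the default is reportedly a
# ping to the gateway, but it can be switched to a dns or tcp check
hosted-engine --get-shared-config network_test --type=he_shared
hosted-engine --set-shared-config network_test dns --type=he_shared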
Any help is appreciated.
Thanks,
Joe
4 years, 7 months
Are the Host storage requirements changed in oVirt 4.4?
by Stefano Stagnaro
It seems the new minimum local storage for a host is now 63 GiB,
compared to 55 GiB in 4.3.
If so, the four Installation guides should be updated to reflect the new
value.
Thank you and congratulations for the new release.
— Stefano
4 years, 7 months
Re: oVirt 4.4.0 Release is now generally available
by Sandro Bonazzola
On Wed, May 20, 2020 at 15:07 Staniforth, Paul <
P.Staniforth(a)leedsbeckett.ac.uk> wrote:
>
> Thanks and well done everyone.
>
> No virtual release party? 🙂
>
You're welcome to help organize one :-) I would be happy to join!
I wonder what a virtual release party might look like?
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
4 years, 7 months
oVirt and Fedora
by Sandro Bonazzola
If you have followed the oVirt project for a few releases, you already know
oVirt has struggled to keep pace with the fast innovation cycles the Fedora
Project follows.
Back in September 2019, the CentOS project launched CentOS Stream as a rolling
preview of future RHEL kernels and features, providing an upstream
development platform for ecosystem developers that sits between Fedora and
RHEL.
Since then the oVirt project has tried to keep the software working on Fedora,
CentOS Stream, and RHEL/CentOS, but it quickly became evident that the project
lacked the resources to keep running on three platforms. Further,
our user surveys show that oVirt users strongly prefer using oVirt on
CentOS and RHEL.
With the upcoming end of life of Fedora 30, the oVirt project has decided to
stop trying to keep pace with this amazing platform and to focus on
stabilizing the software codebase on RHEL / CentOS Linux. By focusing our
resources and community efforts on RHEL/CentOS Linux and CentOS Stream, we
can provide better support for those platforms and spend more time moving
oVirt forward.
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
4 years, 7 months
Re: Migrate hosted engine to standalone host
by Yedidyah Bar David
On Mon, May 18, 2020 at 1:33 PM Anton Louw via Users <users(a)ovirt.org>
wrote:
>
> Hello,
>
> I am not sure if the below thread went missing somewhere. Just to add on
> as well: when moving the hosted engine to a standalone host, will the
> process look similar to the below?
>
I never tried that, but:
>
> Backup and remove Hosted Engine:
>
> 1. Backup Hosted Engine (engine-backup --scope=all --mode=backup
> --file=Full --log=Log_Full)
>
> 2. Download the backup files from HE using WinSCP
>
> 3. Enable global Maintenance (hosted-engine --set-maintenance
> --mode=global)
>
> 4. Power down hosted engine (hosted-engine --vm-shutdown)
>
> ---------------------------------------------------------------------------------------------
>
> Redeploy Hosted Engine:
>
> Build new CentOS VM in another environment
>
> *Same IP
>
> *Same name
>
> *Same resources
>
> 1. sudo yum install
> https://resources.ovirt.org/pub/yum-repo/ovirt-release43.rpm
>
> 2. sudo yum install -y ovirt-engine
>
> 3. Copy backup files to newly deployed CentOS VM
>
> 4. engine-backup --mode=restore --file=Full --log=Log_Full --provision-db
> --provision-dwh-db --restore-permissions
>
You might want to also pass --he-remove-storage-vm, and perhaps
--he-remove-hosts. These remove them, obviously, only from the database;
they do not connect to the hosts or do anything on them.
You'll then have to add them again. You will not be able to do that if
there are running VMs on them, I think.
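For illustration, the restore line with those options added would look
something like this (reusing the file names from your steps):

engine-backup --mode=restore --file=Full --log=Log_Full \
    --provision-db --provision-dwh-db --restore-permissions \
    --he-remove-storage-vm --he-remove-hosts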
So:
If you need to keep the VMs up, and have some of them on the hosted-engine
hosts, do not use these options, but do test very well, and study the bugs
for which these options were added - you can check the git log for this file,
searching for these:
https://github.com/oVirt/ovirt-engine/commits/master?after=3cd2766ae1576b...
https://github.com/oVirt/ovirt-engine/commit/542e4a318584c8601159b4bd6d57...
Bug-Url: https://bugzilla.redhat.com/1240466
Bug-Url: https://bugzilla.redhat.com/1235200
The engine will still think that it's a hosted-engine, so it will likely be
confused. Perhaps --he-remove-storage-vm is enough for that.
This is likely to be problematic; see the above bugs.
If you don't need to keep the VMs up, it's probably safer to just take all
of them down, pass --he-remove-hosts as well, and then, in the new engine,
add back the hosts. Or even:
install a new engine, do not restore the backup, add the hosts, and import
the storage domains.
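Roughly, that last path would look like this (a sketch only; the repo URL
comes from your own steps, and the domain import is done from the
Administration Portal):

yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release43.rpm
yum install -y ovirt-engine
engine-setup    # fresh setup, no restore
# then, in the Administration Portal: add the hosts, and use
# Storage -> Domains -> Import Domain to attach the existing domains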
> 5. After the restore has completed, run engine-setup
>
> Or are there any additional steps I need to take? I have tried this in my
> one lab environment, and it works without any issues; however, when trying
> this in my second lab environment, everything shows as “down”, i.e. Hosts,
> Data Center and Storage Domains.
>
No idea why they are down. Check engine.log. How long did you wait? Did the
engine have access to the hosts? Can you manually ssh to them from the
engine machine?
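If it helps, the first things to look at from the engine machine would be
something like this (the host name is a placeholder; this is just basic
reachability, nothing oVirt-specific):

tail -f /var/log/ovirt-engine/engine.log    # watch while a host activates
ssh root@host1.example.com                  # can the engine reach the hosts?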
Best regards,
--
Didi
4 years, 7 months
locked snapshot after snapshot preview
by Jiří Sléžka
Hi,
oVirt 4.3.9.4-1.el7, CentOS 7...
A colleague of mine tried yesterday to do a snapshot preview on his VM, but
the operation stalled.
In the Task list in the Manager I can see the operation "Preview VM Snapshot
před AD test of VM install_W10_LTSC-REK" in state Finalizing (6 hours ago).
In the Events I see:
May 19, 2020, 4:00:17 PM
Failed to complete Snapshot-Preview před AD test for VM
install_W10_LTSC-REK.
18acb86e-581c-43f5-bf99-f8d738870976
May 19, 2020, 3:57:40 PM
Failed to run VM install_W10_LTSC-REK due to a failed validation:
[Cannot run VM. The VM is performing an operation on a Snapshot. Please
wait for the operation to finish, and try again.] (User: ******@*********).
d892b7dd-741b-4c61-ac37-b1077eaeaf79
When I grep the correlation id of the failed run from the engine log, I get:
zcat /var/log/ovirt-engine/engine.log-20200520.gz | grep "d892b7dd-741b-4c61-ac37-b1077eaeaf79"
2020-05-19 15:57:40,435+02 INFO
[org.ovirt.engine.core.bll.RunVmCommand] (default task-210)
[d892b7dd-741b-4c61-ac37-b1077eaeaf79] Lock Acquired to object
'EngineLock:{exclusiveLocks='[2f1fb183-55ea-477c-a75b-625f889e4c79=VM]',
sharedLocks=''}'
2020-05-19 15:57:40,489+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-210) [d892b7dd-741b-4c61-ac37-b1077eaeaf79] EVENT_ID:
USER_FAILED_RUN_VM(54), Failed to run VM install_W10_LTSC-REK due to a
failed validation: [Cannot run VM. The VM is performing an operation on
a Snapshot. Please wait for the operation to finish, and try again.]
(User: kra0002(a)CRO.slu.cz).
2020-05-19 15:57:40,489+02 WARN
[org.ovirt.engine.core.bll.RunVmCommand] (default task-210)
[d892b7dd-741b-4c61-ac37-b1077eaeaf79] Validation of action 'RunVm'
failed for user kra0002(a)CRO.slu.cz. Reasons:
VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VM_IS_DURING_SNAPSHOT
2020-05-19 15:57:40,489+02 INFO
[org.ovirt.engine.core.bll.RunVmCommand] (default task-210)
[d892b7dd-741b-4c61-ac37-b1077eaeaf79] Lock freed to object
'EngineLock:{exclusiveLocks='[2f1fb183-55ea-477c-a75b-625f889e4c79=VM]',
sharedLocks=''}'
Grepping the correlation id of the failed snapshot preview shows only INFO
events, no ERRORs or WARNINGs, and it looks to me like everything FINISHed.
When I try to display the locked entities, I get:
cd /usr/share/ovirt-engine/setup/dbutils
./unlock_entity.sh -q -t all -c
Locked VMs
vm_name | snapshot_id
----------------------+--------------------------------------
install_W10_LTSC-REK | 9d53c7b4-8ec5-4d31-8ab1-16bb75ab8f2b
Locked templates
template_name | disk_id
---------------+---------
Locked disks
vm_id | disk_id
--------------------------------------+--------------------------------------
2f1fb183-55ea-477c-a75b-625f889e4c79 | 707db1f6-0859-48ab-b9d0-a97619ed8b0b
Locked snapshots
vm_id | snapshot_id
--------------------------------------+--------------------------------------
2f1fb183-55ea-477c-a75b-625f889e4c79 | 9d53c7b4-8ec5-4d31-8ab1-16bb75ab8f2b
Illegal images
vm_name | image_guid
---------+------------
I am about to try to unlock that snapshot, but to be sure:
* How can I be sure that the snapshot operation is not still running? (It
would be nice to have some sort of progress bar, or better reporting of
what is happening, in the Manager...)
* Does this look like a bug? Should I file a bug report or publish more logs?
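What I have in mind is sketched below (the unlock syntax is my reading of
the script's help, and the vdsm-client call is my assumption for checking
running storage tasks; corrections welcome):

# on the SPM host: is any storage task still actually running?
vdsm-client Host getAllTasksInfo
# if nothing is running, unlock just that one snapshot:
cd /usr/share/ovirt-engine/setup/dbutils
./unlock_entity.sh -t snapshot 9d53c7b4-8ec5-4d31-8ab1-16bb75ab8f2b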
Cheers,
Jiri
4 years, 7 months
Host "Unassigned Logical Networks" list is wrong
by Matthias Leopold
Hi,
I'm having a peculiar issue with assigning logical networks to host
interfaces via label.
I'm used to giving logical networks (tagged VLANs, VM network flag) a
"Network label", which I can choose from a drop-down menu in the UI dialog.
These networks are automatically flagged "assign" and "require" in my
cluster, and I expect them to be synced to the host's physical interface
onto which I dragged and dropped the label. This always seemed to work and
I never worried.
Now I noticed, when looking at the host's "Setup Host Networks" dialog
again, that the last(?) logical networks I provisioned as explained above
show up as "Unassigned Logical Networks", with a mouse-over text of
"Network should be assigned to 'foo' via label 'bar'. However for some
reason it isn't". This has to be some presentation error, because the
networks are in fact assigned, which is also visible in the host's
"Network Interfaces" tab.
The "Sync All Network" buttons in "Host" - "Network Interfaces" and
"Cluster" - "Logical Networks" tabs are inactive.
When hosts are put to maintenance and activated again the error disappears.
My oVirt version is 4.3.7.
Cluster switch type is "Linux Bridge".
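As a cross-check (my assumption being that the REST API reflects the same
state the UI should show), the engine's view of the attachments can be
compared with the dialog; credentials and the host id below are placeholders:

curl -s -k -u admin@internal:PASSWORD \
    "https://engine.example.com/ovirt-engine/api/hosts/HOST_ID/networkattachments"

If the networks show up as attached there too, that would support this being
purely a presentation error.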
This may seem a minor error, but since it affects my production clusters
and a couple of VLANs, I can't afford to play around with host network
configuration. Can anybody explain this? Any help would be appreciated.
thx
Matthias
4 years, 7 months