On Wed, Sep 1, 2021 at 4:26 PM Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
wrote:
On Wed, Sep 1, 2021 at 4:00 PM Yedidyah Bar David
<didi(a)redhat.com> wrote:
>
> >
> > So I think there was something wrong with my system or probably a regression on this in 4.4.8.
> >
> > I see these lines in ansible steps of deploy of RHV 4.3 -> 4.4
> >
> > [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Remove host used to redeploy]
> > [ INFO ] changed: [localhost -> 192.168.222.170]
> >
> > possibly this step should remove the host that I'm reinstalling...?
>
> It should. It removes the host from the DB, before adding it again,
> matching on the uuid (search the code for unique_id_out if you want the
> details). Why?
>
> (I didn't follow all this thread, ignoring the rest for now...)
>
> Best regards,
>
That is the step for which I suspect a regression in 4.4.8 (comparing with
4.4.7) when updating the first hosted-engine host during the upgrade flow
and retaining its hostname details.
I'm going to test with the latest 4.4.8 async 2 and see if it solves the
problem. Otherwise I'm going to open a bugzilla and attach the logs.
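As a side check, since the match is on the uuid, I suppose I could also look
at what the engine has registered for the host on the DB side; something
like this on the engine machine (just a sketch, assuming the default
"engine" database, local postgres access, and that the uuid ends up in
vds_static.vds_unique_id):

  sudo -u postgres psql engine -c "select vds_name, vds_unique_id from vds_static;"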
Gianluca
So I tried with 4.4.8 async 2, but I get the same problem:
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Check actual cluster location]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Enable GlusterFS at cluster level]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Set VLAN ID at datacenter level]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Get active list of active firewalld zones]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Configure libvirt firewalld zone]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Include after_add_host tasks files]
[ INFO ] You can now connect to https://novirt2.localdomain.local:6900/ovirt-engine/ and check the status of this host and eventually remediate it, please continue only when the host is listed as 'up'
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
[ INFO ] changed: [localhost -> localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until /tmp/ansible.wy3ichvk_he_setup_lock is removed, delete it once ready to proceed]
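(For reference, once the host is listed as 'up', removing that lock file on
the host is what lets the deployment proceed, i.e. simply:

  rm /tmp/ansible.wy3ichvk_he_setup_lock

but in my case it never gets there.)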
The host keeps remaining NonResponsive in the local engine, and in
engine.log I see the same error over and over:
2021-09-10 08:44:51,481+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = novirt2.localdomain.local, VdsIdAndVdsVDSCommandParametersBase:{hostId='ca9ff6f7-5a7c-4168-9632-998c52f76cfa', vds='Host[novirt2.localdomain.local,ca9ff6f7-5a7c-4168-9632-998c52f76cfa]'})' execution failed: java.net.ConnectException: Connection refused
So the initial install/configuration of novirt2 never starts.
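The "Connection refused" suggests that nothing is answering yet on the vdsm
port of novirt2; a quick check on the host would be something like this (a
sketch, assuming the default vdsm port 54321):

  systemctl status vdsmd      # is vdsm running at all?
  ss -tlnp | grep 54321       # is anything listening on the vdsm port?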
So the scenario is:
. initial 4.3.10 with 2 hosts (novirt1 and novirt2) and 1 self-hosted
engine (novmgr)
. iSCSI based storage: the hosted_engine storage domain and one data
storage domain
This is a nested env, so through snapshots I can retry and repeat the
steps: novirt1 and novirt2 are two VMs under one oVirt 4.4 env composed of
a single host and an external engine.
At the beginning 1 VM is running under novirt1 and the hosted engine is
running under novirt2.
The steps:
. global maintenance
. stop engine
. backup
. shutdown the engine VM and scratch novirt2
Actually I simulate the scenario where I deploy novirt2 on new hw: the new
host is a clone of the old novirt2 VM.
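For reference, the global maintenance / stop engine / backup steps above
are the usual ones, something like (a sketch; the backup file name is only
an example):

  hosted-engine --set-maintenance --mode=global    # from one of the hosts
  systemctl stop ovirt-engine                      # on the engine VM
  engine-backup --mode=backup --scope=all --file=engine-backup-4.3.tar.gz --log=engine-backup-4.3.log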
I already tested (with the previous 4.4.8 build) that if I go through a
different hostname it works.
As novirt2 and novirt1 (in 4.3) are VMs running on the same hypervisor, I
see in their hw details that they have the same serial number and the
usual random uuid:
novirt1
uuid B1EF9AFF-D4BD-41A1-B26E-7DD0CC440963
serial number 00fa984c-d5a1-e811-906e-00163566263e
novirt2
uuid D584E962-5461-4FA5-AFFA-DB413E17590C
serial number 00fa984c-d5a1-e811-906e-00163566263e
and the new novirt2, being a clone, has a different uuid (from dmidecode):
uuid: 10b9031d-a475-4b41-a134-bad2ede3cf11
serial number: 00fa984c-d5a1-e811-906e-00163566263e
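(The dmidecode values can be read directly on the host, e.g.:

  dmidecode -s system-uuid              # SMBIOS system uuid
  dmidecode -s system-serial-number     # SMBIOS serial number
)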
Unfortunately, at the moment I cannot try the scenario where I deploy the
new novirt2 on the same virtual hw, because in the first 4.3 install I
configured the OS disk as 50Gb, and with this size 4.4.8 complains about
insufficient space. And having the snapshot active in preview, I cannot
resize the disk.
Eventually I could reinstall 4.3 on an 80Gb disk and try the same while
maintaining the same hw... but this would imply that, in general, I cannot
upgrade using different hw while reusing the same hostnames.... correct?
Anyway, if you want to check the generated logs on the local engine side
and on the novirt2 side, here they are:
Contents under /var/log of novmgr (tar.gz format)
https://drive.google.com/file/d/1e4WwN4D8GDBpsGqwpwM40MGcLISeOzGO/view?us...
Contents under /var/log of novirt2 (tar.gz format)
https://drive.google.com/file/d/1uQxlsbPVclW4xcAbCP8dXyIF2HlLqaR-/view?us...
Backup made in 4.3
https://drive.google.com/file/d/19x4cUhXt2NQkmfTNS8AeLOC7mXtj6IYG/view?us...
thanks
Gianluca