[OLVM] Host non responsive after installation

I am using Oracle Linux Virtualization Manager, following this guide: https://docs.oracle.com/en/virtualization/oracle-linux-virtualization-manage...
After adding a host to the engine, the host becomes non responsive due to network errors:
engine.log
2021-04-27 14:53:02,255Z ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-32356) [38586e0e] Host installation failed for host 'c97604b3-5774-4260-92fd-633257aa7498', 'GPU2-2': Network error during communication with the host
Help resolving this would be much appreciated!

On Wed, Apr 28, 2021 at 7:16 AM <alan@softdrive.co> wrote:
I am using Oracle Linux Virtualization Manager, following this guide: https://docs.oracle.com/en/virtualization/oracle-linux-virtualization-manage...
After adding a host to the engine, the host becomes non responsive due to network errors: engine.log 2021-04-27 14:53:02,255Z ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-32356) [38586e0e] Host installation failed for host 'c97604b3-5774-4260-92fd-633257aa7498', 'GPU2-2': Network error during communication with the host
Help resolving this would be much appreciated!
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FI3PF5DIPC37H4...
Hi,
I am not sure how compatible OLVM is with oVirt, but can you please share more details? Which versions of ovirt-engine and vdsm are you running? Are you able to reach the host manually, e.g. via ssh with the specified FQDN/IP?
Thanks, Ales
--
Ales Musil
Software Engineer - RHV Network
Red Hat EMEA <https://www.redhat.com>
amusil@redhat.com IM: amusil <https://red.ht/sig>

oVirt version: 4.3 - ovirt-engine-appliance.x86_64 4.3-20200603.1.0.2.el7
vdsm version: 4.30.46-1.0.3.el7
I can connect to the host manually. If I supply invalid SSH args when adding the host, it fails immediately rather than running for a while and failing later.

On Wed, Apr 28, 2021 at 3:09 PM <alan@softdrive.co> wrote:
oVirt version: 4.3 - ovirt-engine-appliance.x86_64 4.3-20200603.1.0.2.el7 vdsm version: 4.30.46-1.0.3.el7
I can connect to the host manually. If I supply invalid SSH args when adding the host, it fails immediately rather than running for a while and failing later.
Can you please share vdsm.log and supervdsm.log from the host? Thanks, Ales

Logs can be found here: https://www.dropbox.com/s/vdvkj7kp0d4d73v/vdsmlogs.tar.gz?dl=0
Couldn't attach them directly due to size limits.
Thanks! - Alan
On Fri, Apr 30, 2021 at 3:06 AM Ales Musil <amusil@redhat.com> wrote:
Can you please share vdsm.log and supervdsm.log from the host?
Thanks, Ales

I can see some "ssl handshake: SSLError" entries in the logs. I would recommend reinstalling the certificates from the engine.
Regards, Ales
On Sun, May 2, 2021 at 2:50 PM Alan Daniels <alan@softdrive.co> wrote:
Logs can be found here: https://www.dropbox.com/s/vdvkj7kp0d4d73v/vdsmlogs.tar.gz?dl=0
Couldn't attach them directly due to size limits.
Thanks! - Alan
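Before reinstalling certificates, it may help to check from the engine side whether the host's VDSM certificate actually chains to the engine CA, or whether the connection fails for some other reason. The following is a minimal sketch, not an oVirt tool. It assumes VDSM listens on TCP port 54321 (its usual default) and that the engine CA certificate is at /etc/pki/ovirt-engine/ca.pem; both the port and the path are assumptions that should be checked against the actual installation.

```python
import socket
import ssl

# Assumptions (not from the thread): VDSM's usual default port and the
# usual location of the engine CA certificate. Adjust both as needed.
VDSM_PORT = 54321
ENGINE_CA = "/etc/pki/ovirt-engine/ca.pem"


def classify(exc):
    """Map a connection or handshake exception to a short diagnosis."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate not trusted by this CA"
    if isinstance(exc, ssl.SSLError):
        return "other TLS failure"
    if isinstance(exc, socket.timeout):
        return "timed out (traffic may be filtered)"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused (nothing listening)"
    return "other network error"


def check_vdsm_cert(host, port=VDSM_PORT, cafile=ENGINE_CA, timeout=5.0):
    """Attempt a TLS handshake, verifying the host certificate against cafile."""
    context = ssl.create_default_context(cafile=cafile)
    context.check_hostname = False  # verify the chain only, not the host name
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw) as tls:
                return "handshake ok, " + (tls.version() or "unknown TLS version")
    except OSError as exc:
        return classify(exc)
```

Calling `check_vdsm_cert("<HOST>")` returns one of the short strings above. VDSM typically also expects a client certificate, so any result other than "certificate not trusted by this CA" only rules out that one cause; it does not prove the TLS setup is healthy.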

I'll research this. Could you give me some more details on what to do?
Can I use vdsm-client -a <HOST> to test this as well?
On Mon, May 3, 2021 at 2:26 AM Ales Musil <amusil@redhat.com> wrote:
I can see some ssl handshake: SSLError.
I would recommend reinstalling certificates from the engine.
Regards, Ales

Some new information: I found that the VDSM traffic (anything not SSH) was being blocked by the default network security rules (not on the host). After fixing this, I am still unable to add a host though.
If I try vdsm-client -a <HOST> Host getCapabilities I get this error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618)
Rather than the previous error of just timing out. So maybe there is still an issue with the certs.
When I add the host, it now fails with: VDSM GPU2-2 command HostSetupNetworksVDS failed: Message timeout which can be caused by communication issues and I can't SSH into the host anymore.
If I restart the host, it becomes NonOperational because ovirtmgmt is missing.
Thanks! - Alan
On Mon, May 3, 2021 at 9:20 AM Alan Daniels <alan@softdrive.co> wrote:
I'll research this. Could you give me some more details on what to do?
Can I use vdsm-client -a <HOST> to test this as well?
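The situation described in this message, where SSH works but VDSM traffic is filtered somewhere between engine and host, can be spotted with a plain TCP check run from the engine machine. This is a rough sketch under stated assumptions: port 22 for SSH and port 54321 for VDSM (its usual default) are assumed, and the function names are made up for illustration.

```python
import socket

# Assumed ports: 22 for SSH and 54321 for VDSM (the usual default).
PORTS = {"ssh": 22, "vdsm": 54321}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host):
    """Check every port in PORTS and return {"ssh": bool, "vdsm": bool}."""
    return {name: port_open(host, port) for name, port in PORTS.items()}


def summarize(results):
    """Turn the probe result into a one-line diagnosis."""
    if results["ssh"] and results["vdsm"]:
        return "both ports reachable"
    if results["ssh"]:
        return "SSH works but VDSM is unreachable: check firewall or security rules"
    if results["vdsm"]:
        return "VDSM reachable but SSH is not"
    return "host unreachable on both ports"
```

For example, `summarize(probe("<HOST>"))` run on the engine gives a quick hint whether the problem is in the network path rather than in certificates. A successful TCP connection says nothing about TLS, so a certificate error can still follow.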

If it helps, I had to create a network called ovirtmgmt. This was my network setup:
- Configure 2 NICs to be in a bond.
- Configure the bond with ovirtmgmt as its bridge.
- Configure a bridge called ovirtmgmt with a static IP.
- Make sure ovirtmgmt has DEFROUTE=yes.
On 3 May 2021, at 17:17, Alan Daniels <alan@softdrive.co> wrote:
Some new information: I found that the VDSM traffic (anything not SSH) was being blocked by the default network security rules (not on the host). After fixing this, I am still unable to add a host though.
If I try vdsm-client -a <HOST> Host getCapabilities I get this error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618)
Rather than the previous error of just timing out. So maybe there is still an issue with the certs.
When I add the host, it now fails with: VDSM GPU2-2 command HostSetupNetworksVDS failed: Message timeout which can be caused by communication issues and I can't SSH into the host anymore.
If I restart the host, it becomes NonOperational because ovirtmgmt is missing.
Thanks! - Alan
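The bond-and-bridge layout described at the top of this message could look roughly like the following on an EL7 host that uses ifcfg files under /etc/sysconfig/network-scripts. This is only a sketch: the interface names, addresses, and bonding mode are placeholders that do not come from the thread, and the script merely prints candidate file contents rather than writing anything.

```python
# Sketch only: interface names, addresses, and the bonding mode are
# placeholders/assumptions, not values from the thread.

def render_ifcfg(settings):
    """Render a dict as KEY=value lines, the format used by ifcfg files."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def management_network(nics=("eth0", "eth1"), bond="bond0", bridge="ovirtmgmt",
                       ipaddr="192.0.2.10", prefix=24, gateway="192.0.2.1"):
    """Return {filename: contents} for the NICs, the bond, and the bridge."""
    files = {}
    for nic in nics:
        # Each physical NIC is enslaved to the bond.
        files[f"ifcfg-{nic}"] = render_ifcfg({
            "DEVICE": nic, "MASTER": bond, "SLAVE": "yes",
            "ONBOOT": "yes", "BOOTPROTO": "none",
        })
    # The bond carries no IP itself; it is attached to the bridge.
    files[f"ifcfg-{bond}"] = render_ifcfg({
        "DEVICE": bond, "TYPE": "Bond", "BONDING_MASTER": "yes",
        "BONDING_OPTS": '"mode=active-backup miimon=100"',  # assumed mode
        "BRIDGE": bridge, "ONBOOT": "yes", "BOOTPROTO": "none",
    })
    # The bridge holds the static address and the default route.
    files[f"ifcfg-{bridge}"] = render_ifcfg({
        "DEVICE": bridge, "TYPE": "Bridge", "ONBOOT": "yes",
        "BOOTPROTO": "none", "IPADDR": ipaddr, "PREFIX": prefix,
        "GATEWAY": gateway, "DEFROUTE": "yes", "DELAY": 0,
    })
    return files


if __name__ == "__main__":
    for name, body in management_network().items():
        print(f"# {name}")
        print(body)
```

The engine normally writes this configuration itself during host setup, so hand-made files like these are at most a workaround for cases where that step fails.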

Fixed. Thanks for the help! This is specific to our CSP, but some thoughts that might help:
- Utilities like tcpdump were helpful in analysis. It ultimately ended up being network issues (rather than SSL or something in the oVirt setup).
- In my environment, the host had two NICs, and the FQDN of the host resolves to the IP address on the first NIC. However, it needed to use the network that the second NIC was connected to. So adding the host by the second NIC's IP address instead of the FQDN worked (or modifying the hosts file so that the FQDN resolves to that address instead of using DNS).
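A quick way to catch the FQDN mismatch described above is to compare what the engine machine resolves for the host name with the address of the NIC that should carry management traffic. The sketch below is a minimal illustration; the function names are hypothetical, and the host name and address in the usage note are placeholders.

```python
import socket


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses the local resolver gives for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def resolves_to(name, expected_ip, resolver=resolve_ipv4):
    """Return True if expected_ip is among the addresses name resolves to."""
    return expected_ip in resolver(name)
```

For example, `resolves_to("host.example.com", "192.0.2.20")` returning False on the engine would point to the same problem. Because `socket.getaddrinfo` uses the system resolver, an entry added to the hosts file is normally reflected in the result as well.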
participants (4)
- Alan Daniels
- alan@softdrive.co
- Ales Musil
- James Loker-Steele