[ovirt-devel] [ OST Failure Report ] [ oVirt HE master ] [ 17/09/17 ] [ engine-setup ]

Simone Tiraboschi stirabos at redhat.com
Mon Sep 18 10:12:47 UTC 2017


On Mon, Sep 18, 2017 at 12:09 PM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

>
>
> On Sun, Sep 17, 2017 at 11:14 AM, Eyal Edri <eedri at redhat.com> wrote:
>
>>
>>
>> On Sun, Sep 17, 2017 at 11:50 AM, Yaniv Kaul <ykaul at redhat.com> wrote:
>>
>>>
>>>
>>> On Sun, Sep 17, 2017 at 11:47 AM, Eyal Edri <eedri at redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> It looks like the HE suites (both 'master' and '4.1') are failing
>>>> constantly, most likely due to the CentOS 7.4 updates.
>>>>
>>>
> I'm investigating the issue on master.
> In my case I chose to configure the engine VM with a static IP address,
> and engine-setup failed on the engine VM since it wasn't able to check for
> available OVN-related packages.
>
> So we have two distinct issues here:
> 1. we are executing engine-setup with the --offline CLI option, but the OVN
> plugins are ignoring it.
>
> 2. the engine VM has no connectivity.
> I dug into it a bit and found that the default gateway wasn't configured on
> the engine VM, although it's correctly set in the cloud-init meta-data file.
> So it seems that on 7.4 cloud-init is failing to set the default gateway:
>
> [root at enginevm ~]# nmcli con show "System eth0" | grep -i GATEWAY
> connection.gateway-ping-timeout:        0
> ipv4.gateway:                           --
> ipv6.gateway:                           --
> IP4.GATEWAY:                            --
> IP6.GATEWAY:                            fe80::c4ee:3eff:fed5:fad9
> [root at enginevm ~]# nmcli con modify "System eth0" ipv4.gateway
> Error: value for 'ipv4.gateway' is missing.
> [root at enginevm ~]#
> [root at enginevm ~]# nmcli con show "System eth0" | grep -i GATEWAY
> connection.gateway-ping-timeout:        0
> ipv4.gateway:                           --
> ipv6.gateway:                           --
> IP4.GATEWAY:                            --
> IP6.GATEWAY:                            fe80::c4ee:3eff:fed5:fad9
> [root at enginevm ~]# nmcli con modify "System eth0" ipv4.gateway 192.168.1.1
> [root at enginevm ~]# nmcli con reload "System eth0"
> [root at enginevm ~]# nmcli con up "System eth0"
> Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/3)
> [root at enginevm ~]# nmcli con show "System eth0" | grep -i GATEWAY
> connection.gateway-ping-timeout:        0
> ipv4.gateway:                           192.168.1.1
> ipv6.gateway:                           --
> IP4.GATEWAY:                            192.168.1.1
> IP6.GATEWAY:                            fe80::c4ee:3eff:fed5:fad9
> [root at enginevm ~]# mount /dev/sr0 /mnt/
> mount: /dev/sr0 is write-protected, mounting read-only
> [root at enginevm ~]# cat /mnt/meta-data
> instance-id: d8b22f43-1565-44e2-916f-f211c7e07f13
> local-hostname: enginevm.localdomain
> network-interfaces: |
>   auto eth0
>   iface eth0 inet static
>     address 192.168.1.204
>     network 192.168.1.0
>     netmask 255.255.255.0
>     broadcast 192.168.1.255
>     gateway 192.168.1.1
>
>
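
As a stopgap until the cloud-init regression is understood, the manual nmcli
steps from the transcript above can be scripted on the engine VM. This is
just a sketch, not part of any patch: the connection name ("System eth0")
and the gateway address come from my environment and will differ elsewhere.

    #!/bin/bash
    # Workaround sketch: re-apply the default gateway that cloud-init failed to set.
    CON="System eth0"
    GW="192.168.1.1"

    # Read the currently configured IPv4 gateway for the connection
    # (nmcli prints "--" when it is unset).
    current=$(nmcli con show "$CON" | awk '/^ipv4\.gateway:/ {print $2}')

    # Only touch the connection if no gateway is set.
    if [ -z "$current" ] || [ "$current" = "--" ]; then
        nmcli con modify "$CON" ipv4.gateway "$GW"
        nmcli con up "$CON"
    fi
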
An upstream user also reported that, after updating his host and his engine
VM to CentOS 7.4, the engine VM failed to reboot: the 7.4 kernel hung at
"Probing EDD (edd=off to disable)...ok". He manually forced the old 7.3
kernel via the grub menu and his engine VM booted correctly.
I wasn't able to reproduce it here.
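
In case somebody else hits the same hang: the message itself points at the
usual knob, booting with edd=off. I haven't verified that this helps in that
user's case; it's only the standard way to disable EDD probing on CentOS 7:

    # Untested suggestion: disable EDD probing for all installed kernels,
    # then reboot into the 7.4 kernel again.
    grubby --update-kernel=ALL --args="edd=off"
    # To keep it across future kernel updates, also add edd=off to
    # GRUB_CMDLINE_LINUX in /etc/default/grub.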


>
>
>
>> So there is no suspected patch from the oVirt side that might have caused it.
>>>>
>>>
>>> It's the firewall. I've fixed it [1] and specifically [2], but probably
>>> not completely.
>>>
>>
>> Great! I wasn't aware your patch addresses that. I've replied on the patch
>> itself, but I think we need to split the fix into two separate patches.
>>
>>
>>>
>>> Perhaps we should try to take [2] separately.
>>> Y.
>>>
>>> [1] https://gerrit.ovirt.org/#/c/81766/
>>> [2] https://gerrit.ovirt.org/#/c/81766/3/common/deploy-scripts/setup_storage_unified_he_extra_el7.sh
>>>
>>>
>>>
>>>
>>>> It is probably also the reason why the HC suites are failing, since they
>>>> also use HE for deployments.
>>>>
>>>> I think this issue should BLOCK the Alpha release tomorrow, or at the
>>>> minimum we need to verify it's an OST issue and not a real regression.
>>>>
>>>> Links to relevant failures:
>>>> http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/37/consoleFull
>>>> http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/33/console
>>>>
>>>> Error snippet:
>>>>
>>>> 03:01:38
>>>> 03:01:38           --== STORAGE CONFIGURATION ==--
>>>> 03:01:38
>>>> 03:02:47 [ ERROR ] Error while mounting specified storage path: mount.nfs: No route to host
>>>> 03:02:58 [WARNING] Cannot unmount /tmp/tmp2gkFwJ
>>>> 03:02:58 [ ERROR ] Failed to execute stage 'Environment customization': mount.nfs: No route to host
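
The "No route to host" above fits the firewall theory: with the 7.4 updates
the storage host's firewall rejects the NFS ports, and mount.nfs typically
reports a firewall REJECT (icmp-host-prohibited) as "No route to host". I
didn't check what [2] actually changes; the sketch below is just the kind of
firewalld rules a fix on the NFS server side would involve, not the actual
patch:

    # Sketch only: open the services an NFSv3 client needs on the storage host.
    firewall-cmd --permanent --add-service=nfs
    firewall-cmd --permanent --add-service=rpc-bind
    firewall-cmd --permanent --add-service=mountd
    firewall-cmd --reload
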
>>>>
>>>>
>>>> --
>>>>
>>>> Eyal edri
>>>>
>>>>
>>>> ASSOCIATE MANAGER
>>>>
>>>> RHV DevOps
>>>>
>>>> EMEA VIRTUALIZATION R&D
>>>>
>>>>
>>>> Red Hat EMEA <https://www.redhat.com/>
>>>> <https://red.ht/sig> TRIED. TESTED. TRUSTED.
>>>> <https://redhat.com/trusted>
>>>> phone: +972-9-7692018 <+972%209-769-2018>
>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

