Hi,
If I understand it correctly, the HE Hosts try to ping (or SSH, or
otherwise reach) the Engine host. If it reaches it, then it passes the
liveness check. If it cannot reach it, then it fails. So to me this error
means that there is some configuration, somewhere, that is trying to reach
the engine on the old address (which fails when the engine has the new
address).
I do not know where in the *host* configuration this data lives, so I
cannot suggest where you need to change it.
Can 10.16.248.x reach 10.8.236.x and vice-versa?
Maybe multi-home the engine on both networks for now until you figure it out?
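A minimal sketch of how one could test that reachability from a host on each side, assuming a plain TCP connection is a good enough probe. The helper names are made up for illustration, the addresses are the ones quoted in this thread, and the port choices (22 for SSH, 6642 as seen in the ovn-controller log) are my assumption about what is worth probing:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def report(targets, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}


# Addresses as they appear in this thread; the port choices are assumptions.
ENGINE_TARGETS = [
    ("10.8.236.244", 22),    # old engine address, SSH
    ("10.8.236.244", 6642),  # old engine address, port from ovn-controller.log
    ("10.16.248.74", 22),    # new engine address, SSH
    ("10.16.248.74", 6642),  # new engine address, port from the ovn-remote setting
]
```

Running report(ENGINE_TARGETS) once on a host in 10.8.236.x and once on the host in 10.16.248.x would show which side cannot reach which address.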
-derek
On Tue, July 23, 2019 9:13 am, carl langlois wrote:
Hi,
We have managed to stabilize the DNS update in our network. The current
situation is as follows.
I have 3 hosts that can run the engine (hosted-engine).
They were all in the 10.8.236.x network. Now I have moved one of them to the
10.16.248.x network.
If I boot the engine on one of the hosts that is in the 10.8.236.x network,
the engine comes up with status "good". I can access the engine UI. I can
see all my hosts, even the one in the 10.16.248.x network.
But if I boot the engine on the hosted-engine host that was switched to the
10.16.248.x network, the engine boots and I can ssh to it, but the status is
always "fail for liveliness check".
The main difference is that when I boot on the host that is in the
10.16.248.x network, the engine gets an address in the 248.x network.
On the engine I have this in /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log:
2019-07-23 09:05:30|MFzehi|YYTDiS|jTq2w8|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
The engine.log seems okay.
So I need to understand what this "liveliness check" does (or tries to do)
so I can investigate why the engine status is not becoming good.
The initial deployment was done in the 10.8.236.x network. Maybe it has
something to do with that.
Thanks & Regards
Carl
On Thu, Jul 18, 2019 at 8:53 AM Miguel Duarte de Mora Barroso <
mdbarroso(a)redhat.com> wrote:
> On Thu, Jul 18, 2019 at 2:50 PM Miguel Duarte de Mora Barroso
> <mdbarroso(a)redhat.com> wrote:
> >
> > On Thu, Jul 18, 2019 at 1:57 PM carl langlois <crl.langlois(a)gmail.com>
> wrote:
> > >
> > > Hi Miguel,
> > >
> > > > I have managed to change the config for the ovn-controller
> > > > with these commands:
> > > > ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=ssl:10.16.248.74:6642
> > > > ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=10.16.248.65
> > > > and restarting the services.
> >
> > Yes, that's what the script is supposed to do, check [0].
> >
> > Not sure why running vdsm-tool didn't work for you.
> >
> > >
> > > > But even with this I still have the "fail for liveliness check" when
> > > > starting the oVirt engine. One thing I noticed with our new network
> > > > is that reverse DNS (IP -> hostname) does not work. Forward DNS is
> > > > working fine. I am trying to find out from our IT why it is not
> > > > working.
> >
> > Do you guys use OVN? If not, you could disable the provider, install
> > the hosted-engine VM, then, if needed, re-add / re-activate it .
>
> I'm assuming it fails for the same reason you've stated initially -
> i.e. ovn-controller is involved; if it is not, disregard this msg :)
> >
> > [0] - https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/se...
> >
> > >
> > > Regards.
> > > Carl
> > >
> > > On Thu, Jul 18, 2019 at 4:03 AM Miguel Duarte de Mora Barroso <
> mdbarroso(a)redhat.com> wrote:
> > >>
> > >> On Wed, Jul 17, 2019 at 7:07 PM carl langlois
> <crl.langlois(a)gmail.com>
> wrote:
> > >> >
> > >> > Hi
> > >> > Here is the output of the command
> > >> >
> > > >> > [root@ovhost1 ~]# vdsm-tool --vvverbose ovn-config 10.16.248.74 ovirtmgmt
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,581::cmdutils::150::root::(exec_cmd) lshw -json -disable usb -disable pcmcia -disable isapnp -disable ide -disable scsi -disable dmi -disable memory -disable cpuinfo (cwd None)
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,738::cmdutils::158::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,741::routes::109::root::(get_gateway) The gateway 10.16.248.1 is duplicated for the device ovirtmgmt
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,742::routes::109::root::(get_gateway) The gateway 10.16.248.1 is duplicated for the device ovirtmgmt
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,742::cmdutils::150::root::(exec_cmd) /sbin/tc qdisc show (cwd None)
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,744::cmdutils::158::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,745::cmdutils::150::root::(exec_cmd) /sbin/tc class show dev enp2s0f1 classid 0:1388 (cwd None)
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,747::cmdutils::158::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,766::cmdutils::150::root::(exec_cmd) /usr/share/openvswitch/scripts/ovs-ctl status (cwd None)
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,777::cmdutils::158::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,778::vsctl::67::root::(commit) Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge -- list Port -- list Interface
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,778::cmdutils::150::root::(exec_cmd) /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge -- list Port -- list Interface (cwd None)
> > > >> > MainThread::DEBUG::2019-07-17 13:02:52,799::cmdutils::158::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
> > > >> > netlink/events::DEBUG::2019-07-17 13:02:52,802::concurrent::192::root::(run) START thread <Thread(netlink/events, started daemon 140299323660032)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7f99fb618c90>>, args=(), kwargs={})
> > > >> > netlink/events::DEBUG::2019-07-17 13:02:54,805::concurrent::195::root::(run) FINISH thread <Thread(netlink/events, started daemon 140299323660032)>
> > > >> > Using default PKI files
> > >> >
> > >> > I do not see any indication of the config??
> > >>
> > > >> And afterwards when you execute "ovs-vsctl list Open_vSwitch" does it
> > > >> reflect the updated value ?
> > > >>
> > > >> This command would have to be performed in the node where hosted
> > > >> engine will be hosted - not sure if it's possible to determine
> > > >> beforehand which one it will be. If not, you should run it in all
> > > >> the nodes in the cluster, to be sure.
> > >>
> > >> >
> > >> > Regards
> > >> > Carl
> > >> >
> > >> > On Wed, Jul 17, 2019 at 11:40 AM carl langlois <
> crl.langlois(a)gmail.com> wrote:
> > >> >>
> > >> >> Hi
> > >> >>
> > > >> >> I have opened a bug:
> > > >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1730776
> > > >> >>
> > > >> >> I tried the command "vdsm-tool ovn-config 10.16.248.74 ovirtmgmt"
> > > >> >> on one of the hosts but nothing changed. After a restart of the
> > > >> >> ovn-controller I still get
> > >> >>
> > > >> >> 2019-07-17T15:38:52.572Z|00033|reconnect|INFO|ssl:10.8.236.244:6642: waiting 8 seconds before reconnect
> > > >> >> 2019-07-17T15:39:00.578Z|00034|reconnect|INFO|ssl:10.8.236.244:6642: connecting...
> > > >> >> 2019-07-17T15:39:05.720Z|00035|fatal_signal|WARN|terminating with signal 15 (Terminated)
> > > >> >> 2019-07-17T15:39:05.863Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
> > > >> >> 2019-07-17T15:39:05.864Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
> > > >> >> 2019-07-17T15:39:05.864Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
> > > >> >> 2019-07-17T15:39:05.865Z|00004|reconnect|INFO|ssl:10.8.236.244:6642: connecting...
> > > >> >> 2019-07-17T15:39:06.865Z|00005|reconnect|INFO|ssl:10.8.236.244:6642: connection attempt timed out
> > > >> >> 2019-07-17T15:39:06.865Z|00006|reconnect|INFO|ssl:10.8.236.244:6642: waiting 1 seconds before reconnect
> > > >> >> 2019-07-17T15:39:07.867Z|00007|reconnect|INFO|ssl:10.8.236.244:6642: connecting...
> > > >> >> 2019-07-17T15:39:08.867Z|00008|reconnect|INFO|ssl:10.8.236.244:6642: connection attempt timed out
> > > >> >> 2019-07-17T15:39:08.868Z|00009|reconnect|INFO|ssl:10.8.236.244:6642: waiting 2 seconds before reconnect
> > > >> >> 2019-07-17T15:39:10.870Z|00010|reconnect|INFO|ssl:10.8.236.244:6642: connecting...
> > > >> >> 2019-07-17T15:39:12.872Z|00011|reconnect|INFO|ssl:10.8.236.244:6642: connection attempt timed out
> > > >> >> 2019-07-17T15:39:12.872Z|00012|reconnect|INFO|ssl:10.8.236.244:6642: waiting 4 seconds before reconnect
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Wed, Jul 17, 2019 at 10:56 AM Miguel Duarte de Mora Barroso
<
> mdbarroso(a)redhat.com> wrote:
> > >> >>>
> > >> >>> On Wed, Jul 17, 2019 at 3:01 PM carl langlois <
> crl.langlois(a)gmail.com> wrote:
> > >> >>> >
> > >> >>> > Hi Miguel
> > >> >>> >
> > > >> >>> > If I do ovs-vsctl list Open_vSwitch I get
> > > >> >>> >
> > > >> >>> > uuid            : ce94c4b1-7eb2-42e3-8bfd-96e1dec40dea
> > > >> >>> > bridges         : [9b0738ee-594d-4a87-8967-049a8b1a5774]
> > > >> >>> > cur_cfg         : 1
> > > >> >>> > datapath_types  : [netdev, system]
> > > >> >>> > db_version      : "7.14.0"
> > > >> >>> > external_ids    : {hostname="ovhost2", ovn-bridge-mappings="", ovn-encap-ip="10.8.236.150", ovn-encap-type=geneve, ovn-remote="ssl:10.8.236.244:6642", system-id="7c39d07b-1d54-417b-bf56-7a0f1a07f832"}
> > > >> >>> > iface_types     : [geneve, gre, internal, lisp, patch, stt, system, tap, vxlan]
> > > >> >>> > manager_options : []
> > > >> >>> > next_cfg        : 1
> > > >> >>> > other_config    : {}
> > > >> >>> > ovs_version     : "2.7.3"
> > > >> >>> > ssl             : []
> > > >> >>> > statistics      : {}
> > > >> >>> > system_type     : centos
> > > >> >>> > system_version  : "7"
> > > >> >>> >
> > > >> >>> > I can see two addresses that are on the old network.
> > >> >>>
> > >> >>> Yes, those are it.
> > >> >>>
> > > >> >>> Use the tool I mentioned to update that to the correct addresses
> > > >> >>> on the network, and re-try.
> > > >> >>>
> > > >> >>> vdsm-tool ovn-config <engine_ip_on_net> <name of the management network>
> > >> >>>
> > >> >>> > Regards
> > >> >>> > Carl
> > >> >>> >
> > >> >>> >
> > >> >>> > On Wed, Jul 17, 2019 at 8:21 AM carl langlois <
> crl.langlois(a)gmail.com> wrote:
> > >> >>> >>
> > >> >>> >> Hi Miguel,
> > >> >>> >>
> > > >> >>> >> I will surely open a bug. Is there any specific oVirt
> > > >> >>> >> component to select when opening the bug?
> > >> >>>
> > >> >>> ovirt-engine
> > >> >>>
> > >> >>> >>
> > > >> >>> >> When you say that the hosted-engine should have triggered
> > > >> >>> >> the update, do you mean it was supposed to trigger the update
> > > >> >>> >> and that did not work, or that something is missing?
> > >> >>>
> > > >> >>> I sincerely do not know. @Dominik Holler, could you shed some
> > > >> >>> light into this ?
> > >> >>>
> > > >> >>> >> Could I have missed a step when switching the network?
> > > >> >>> >>
> > > >> >>> >> Also, if I try to do "ovs-vsctl list ." it fails: the list
> > > >> >>> >> command requires a table name. Not sure what table to use?
> > >> >>> >>
> > >> >>> >> Regards
> > >> >>> >> Carl
> > >> >>> >>
> > >> >>> >>
> > >> >>> >>
> > >> >>> >> On Wed, Jul 17, 2019 at 4:21 AM Miguel Duarte de
Mora
> Barroso <
> mdbarroso(a)redhat.com> wrote:
> > >> >>> >>>
> > >> >>> >>> On Tue, Jul 16, 2019 at 8:48 PM carl langlois
<
> crl.langlois(a)gmail.com> wrote:
> > >> >>> >>> >
> > >> >>> >>> > Hi
> > >> >>> >>> >
> > > >> >>> >>> > We are in the process of changing our network. Our current
> > > >> >>> >>> > network uses 10.8.236.x and we will change to 10.16.248.x.
> > > >> >>> >>> > We have an HA oVirt cluster (around 10 nodes) currently
> > > >> >>> >>> > configured on 10.8.236.x. So my question is: is it possible
> > > >> >>> >>> > to relocate the oVirt cluster to 10.16.248.x? We have tried
> > > >> >>> >>> > to move everything to the new network without success. All
> > > >> >>> >>> > the nodes seem to boot up properly, and our gluster storage
> > > >> >>> >>> > also works properly.
> > > >> >>> >>> > When we try to start the hosted-engine it comes up but fails
> > > >> >>> >>> > the liveliness check. We have noticed in
> > > >> >>> >>> > /var/log/openvswitch/ovn-controller.log that it is trying to
> > > >> >>> >>> > connect to the old IP address of the hosted-engine VM.
> > > >> >>> >>> > 2019-07-16T18:41:29.483Z|01992|reconnect|INFO|ssl:10.8.236.244:6642: waiting 8 seconds before reconnect
> > > >> >>> >>> > 2019-07-16T18:41:37.489Z|01993|reconnect|INFO|ssl:10.8.236.244:6642: connecting...
> > > >> >>> >>> > 2019-07-16T18:41:45.497Z|01994|reconnect|INFO|ssl:10.8.236.244:6642: connection attempt timed out
> > > >> >>> >>> >
> > > >> >>> >>> > So my question is: where does 10.8.236.244 come from?
> > >> >>> >>>
> > > >> >>> >>> Looks like the ovn controllers were not updated during the
> > > >> >>> >>> network change.
> > > >> >>> >>>
> > > >> >>> >>> The wrong IP is configured within openvswitch, you can see it
> > > >> >>> >>> in the (offending) nodes through "ovs-vsctl list . ". It'll be
> > > >> >>> >>> a key in the 'external_ids' column called 'ovn-remote' .
> > > >> >>> >>>
> > > >> >>> >>> This is not the solution, but a work-around; you could try to
> > > >> >>> >>> configure the ovn controllers via:
> > > >> >>> >>> vdsm-tool ovn-config <engine_ip_on_net> <name of the management network>
> > > >> >>> >>>
> > > >> >>> >>> Despite the provided work-around, I really think the hosted
> > > >> >>> >>> engine should have triggered the ansible role that in turn
> > > >> >>> >>> triggers this reconfiguration.
> > > >> >>> >>>
> > > >> >>> >>> Would you open a bug with this information ?
> > >> >>> >>>
> > >> >>> >>>
> > >> >>> >>> >
> > > >> >>> >>> > The routing table for one of our hosts looks like this:
> > > >> >>> >>> >
> > > >> >>> >>> > Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> > > >> >>> >>> > default         gateway         0.0.0.0         UG    0      0        0 ovirtmgmt
> > > >> >>> >>> > 10.16.248.0     0.0.0.0         255.255.255.0   U     0      0        0 ovirtmgmt
> > > >> >>> >>> > link-local      0.0.0.0         255.255.0.0     U     1002   0        0 eno1
> > > >> >>> >>> > link-local      0.0.0.0         255.255.0.0     U     1003   0        0 eno2
> > > >> >>> >>> > link-local      0.0.0.0         255.255.0.0     U     1025   0        0 ovirtmgmt
> > >> >>> >>> >
> > >> >>> >>> > Any help would be really appreciated.
> > >> >>> >>> >
> > >> >>> >>> > Regards
> > >> >>> >>> > Carl
> > >> >>> >>> >
> > >> >>> >>> >
> > >> >>> >>> >
> > >> >>> >>> >
> > >> >>> >>> >
_______________________________________________
> > >> >>> >>> > Users mailing list -- users(a)ovirt.org
> > >> >>> >>> > To unsubscribe send an email to
users-leave(a)ovirt.org
> > >> >>> >>> > Privacy Statement:
>
https://www.ovirt.org/site/privacy-policy/
> > >> >>> >>> > oVirt Code of Conduct:
>
https://www.ovirt.org/community/about/community-guidelines/
> > >> >>> >>> > List Archives:
>
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DBQUWEPPDK2...
>
--
Derek Atkins 617-623-3745
derek(a)ihtfp.com