Engine status : unknown stale-data on single node
by Wood, Randall
I have a three-node oVirt cluster where one node has stale data for the hosted engine, but the other two nodes do not:
Output of `hosted-engine --vm-status` on a good node:
```
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host ovirt2.low.mdds.tcs-sec.com (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt2.low.mdds.tcs-sec.com
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : f91f57e4
local_conf_timestamp : 9915242
Host timestamp : 9915241
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9915241 (Fri Mar 27 14:38:14 2020)
host-id=1
score=3400
vm_conf_refresh_time=9915242 (Fri Mar 27 14:38:14 2020)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host ovirt1.low.mdds.tcs-sec.com (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt1.low.mdds.tcs-sec.com
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 48f9c0fc
local_conf_timestamp : 9218845
Host timestamp : 9218845
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9218845 (Fri Mar 27 14:38:22 2020)
host-id=2
score=3400
vm_conf_refresh_time=9218845 (Fri Mar 27 14:38:22 2020)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : ovirt3.low.mdds.tcs-sec.com
Host ID : 3
Engine status : unknown stale-data
Score : 3400
stopped : False
Local maintenance : False
crc32 : 620c8566
local_conf_timestamp : 1208310
Host timestamp : 1208310
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1208310 (Mon Dec 16 21:14:24 2019)
host-id=3
score=3400
vm_conf_refresh_time=1208310 (Mon Dec 16 21:14:24 2019)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
!! Cluster is in GLOBAL MAINTENANCE mode !!
```
I tried the steps in https://access.redhat.com/discussions/3511881, but `hosted-engine --vm-status` on the node with stale data shows:
```
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
```
On the stale node, ovirt-ha-agent and ovirt-ha-broker are continually restarting. Since the agent depends on the broker, the broker log is the one to look at; it includes this snippet, repeating roughly every 3 seconds:
```
MainThread::INFO::2020-03-27 15:01:06,584::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-03-27 15:01:06,584::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-03-27 15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-03-27 15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-03-27 15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-03-27 15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-03-27 15:01:06,590::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-03-27 15:01:06,678::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-03-27 15:01:06,678::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-03-27 15:01:06,717::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-03-27 15:01:06,732::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-03-27 15:01:08,940::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: [Errno 5] Input/output error: '/rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/hosted-engine.lockspace'
```
I restarted the stale node yesterday, but it still shows stale data from December of last year.
What is the recommended way for me to try to recover from this?
(This came to my attention when warnings concerning space on the /var/log partition began popping up.)
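Based on the mount path in the broker log, the recovery sequence I'm considering looks roughly like the following. This is only a sketch: the Gluster volume name `engine` and the use of `--reinitialize-lockspace` are my assumptions, not confirmed steps.

```shell
# All of this runs on the stale node (ovirt3) while the cluster is in
# global maintenance. Volume name "engine" is inferred from the
# glusterSD mount path in the broker log.
systemctl stop ovirt-ha-agent ovirt-ha-broker

# Verify the Gluster volume backing the hosted-engine domain is healthy
gluster volume heal engine info

# If reads on the mount still return EIO, unmount it so the broker can
# remount it cleanly on the next start
umount /rhev/data-center/mnt/glusterSD/ovirt2:_engine

# Rebuild the sanlock lockspace metadata, then restart the HA services
hosted-engine --reinitialize-lockspace
systemctl start ovirt-ha-broker ovirt-ha-agent
```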
Thank you,
Randall
Re: Local network
by Tommaso - Shellrent
This is what I've got:
*ovs-vsctl show*
03a038d4-e81c-45e0-94d1-6f18d6504f1f
Bridge br-int
fail_mode: secure
Port "ovn-765f43-0"
Interface "ovn-765f43-0"
type: geneve
options: {csum="true", key=flow, remote_ip="xxx.169.yy.6"}
Port br-int
Interface br-int
type: internal
Port "vnet1"
Interface "vnet1"
Port "ovn-b33f6e-0"
Interface "ovn-b33f6e-0"
type: geneve
options: {csum="true", key=flow, remote_ip="xxx.169.yy.2"}
Port "vnet3"
Interface "vnet3"
Port "ovn-8678d9-0"
Interface "ovn-8678d9-0"
type: geneve
options: {csum="true", key=flow, remote_ip="xxx.169.yy.8"}
Port "ovn-fdd090-0"
Interface "ovn-fdd090-0"
type: geneve
options: {csum="true", key=flow, remote_ip="xxx.169.yy.4"}
ovs_version: "2.11.0"
I suppose that the vNICs are:
Port "vnet1"
Interface "vnet1"
Port "vnet3"
Interface "vnet3"
on the engine:
*ovn-nbctl show*
switch a1f30e99-3ab7-46a4-925d-287871905cab
(ovirt-local_network_definitiva-d58aea97-bb20-4e8f-bcc3-5277754846bb)
port b82f3479-b459-4c26-aff0-053d15c74ddd
addresses: ["56:6f:96:b1:00:4c"]
port 52f09a28-1645-45ff-9b84-1e53a81bb399
addresses: ["56:6f:96:b1:00:4b"]
*ovn-sbctl show*
Chassis "ab5bdfdd-8df4-4e9b-9ce9-565cfd513a4d"
hostname: "pvt-41f18-002.serverlet.com"
Encap geneve
ip: "aaa.31.bbb.224"
options: {csum="true"}
Port_Binding "b82f3479-b459-4c26-aff0-053d15c74ddd"
Port_Binding "52f09a28-1645-45ff-9b84-1e53a81bb399"
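Since `ovn-sbctl` is only expected to work on the engine, here is what I plan to check on the host instead (a hedged sketch; `ovn-controller` and the `external_ids:ovn-remote` key are the standard places to look, but the exact values are assumptions on my part):

```shell
# On each host, ovn-controller (not ovn-sbctl) maintains the connection
# to the central southbound DB on the engine.
systemctl status ovn-controller

# This key should point at the engine's southbound DB (tcp:<engine>:6642)
ovs-vsctl get Open_vSwitch . external_ids:ovn-remote

# If the geneve tunnels are up but the VMs still cannot talk, check
# whether logical flows were actually installed on the integration bridge
ovs-ofctl -O OpenFlow13 dump-flows br-int | head
```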
On 31/03/20 13:39, Staniforth, Paul wrote:
> The engine runs the controller so ovn-sbctl won't work, on the hosts,
> use ovs-vsctl show
>
> Paul S.
> ------------------------------------------------------------------------
> *From:* Tommaso - Shellrent <tommaso(a)shellrent.com>
> *Sent:* 31 March 2020 12:13
> *To:* Staniforth, Paul <P.Staniforth(a)leedsbeckett.ac.uk>;
> users(a)ovirt.org <users(a)ovirt.org>
> *Subject:* Re: [ovirt-users] Local network
>
> *Caution External Mail:* Do not click any links or open any
> attachments unless you trust the sender and know that the content is safe.
>
> Hi.
>
> on the engine all seems fine.
>
> on the host the command "ovn-sbctl show" hangs, and with strace I see
> the following error:
>
>
> connect(5, {sa_family=AF_LOCAL,
> sun_path="/var/run/openvswitch/ovnsb_db.sock"}, 37) = -1 ENOENT (No
> such file or directory)
>
>
>
>
>
>
> On 31/03/20 11:18, Staniforth, Paul wrote:
>>
>> Hello Tommaso,
>> on your oVirt engine host, check the northbound controller:
>> ovn-nbctl show
>> this should show a software switch for each OVN logical network with
>> any ports that are active (in your case you should have 2)
>>
>> check the southbound controller:
>> ovn-sbctl show
>> this should show the software switch on each host with a geneve tunnel.
>>
>> on each host run
>> ovs-vsctl show
>> this should show the virtual switch with a geneve tunnel to each
>> other host and a port for any active vnics
>>
>> Regards,
>> Paul S.
>>
>> ------------------------------------------------------------------------
>> *From:* Tommaso - Shellrent <tommaso(a)shellrent.com>
>> <mailto:tommaso@shellrent.com>
>> *Sent:* 31 March 2020 09:27
>> *To:* users(a)ovirt.org <mailto:users@ovirt.org> <users(a)ovirt.org>
>> <mailto:users@ovirt.org>
>> *Subject:* [ovirt-users] Local network
>>
>> *Caution External Mail:* Do not click any links or open any
>> attachments unless you trust the sender and know that the content is
>> safe.
>>
>> Hi to all.
>>
>> I'm trying to connect two vm, on the same "local storage" host,
>> with an internal isolated network.
>>
>> My setup;
>>
>> VM A:
>>
>> * eth0 with an external ip
>> * eth1, with 192.168.1.1/24
>>
>> VM B
>>
>> * eth0 with an external ip
>> * eth1, with 192.168.1.2/24
>>
>> the eth1 interfaces are connected by a network created on the external
>> provider ovirt-network-ovn, without a subnet defined.
>>
>> Now, the external IPs work fine, but the two VMs cannot communicate
>> over the local network.
>>
>> ping: fails
>> arping: fails
>>
>>
>> any idea to what to check?
>>
>>
>> Regards
>>
>> --
>> --
>> Shellrent - Il primo hosting italiano Security First
>>
>> *Tommaso De Marchi*
>> /COO - Chief Operating Officer/
>> Shellrent Srl
>> Via dell'Edilizia, 19 - 36100 Vicenza
>> Tel. 0444321155 <tel:+390444321155> | Fax 04441492177
>>
>> To view the terms under which this email is distributed, please go to:-
>> http://leedsbeckett.ac.uk/disclaimer/email/
How to install oVirt Node without an ISO
by raphael.garcia@centralesupelec.fr
Hello
Is it possible to install an oVirt node on a CentOS 7 server without an ISO (CD or USB)?
Sorry for this newbie question.
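From what I've read so far, something like the following might work on a plain CentOS 7 server, after which the host is enrolled from the engine's Administration Portal. The release RPM version (4.3) is a guess on my part:

```shell
# Add the oVirt repository on the CentOS 7 server (4.3 assumed here),
# then add the host from the engine's Administration Portal, which
# installs and configures VDSM over SSH.
yum install -y https://resources.ovirt.org/pub/yum-repo/ovirt-release43.rpm
```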