Hello,

On Sun, Aug 8, 2021 at 9:08 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> Usually this is not the problem.
>
> Start checking:
> 1. Export FS is mounted
> 2. NFS server is running (after all, this is a single-node NFS setup)
> 3. Check that vdsmd, supervdsmd and sanlock are running
> 4. If needed, enable debug logging for ovirt-ha-{agent,broker}, as the default log level usually won't show the problem.
>
> Best Regards,
> Strahil Nikolov
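
For anyone who wants to script the service part of the checklist above, here is a minimal sketch. It assumes a systemd host and relies only on `systemctl is-active --quiet`, which exits with 0 when a unit is active. The unit name `nfs-server` is my assumption; the other three names come from the checklist. The `run` parameter is a hypothetical hook so the check can be exercised without systemd.

```python
import subprocess
from typing import Callable, Dict, Iterable

# "nfs-server" is an assumed unit name; vdsmd, supervdsmd and sanlock
# are the services named in the checklist.
SERVICES = ("nfs-server", "vdsmd", "supervdsmd", "sanlock")


def is_active(unit: str, run: Callable = subprocess.run) -> bool:
    """Return True when `systemctl is-active --quiet <unit>` exits with 0."""
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def check_services(units: Iterable[str] = SERVICES,
                   run: Callable = subprocess.run) -> Dict[str, bool]:
    """Map each unit name to whether systemd reports it as active."""
    return {unit: is_active(unit, run) for unit in units}
```

Calling `check_services()` on the host returns a dict such as `{"vdsmd": True, ...}`; any `False` entry is a unit worth inspecting with `systemctl status`.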



1. All NFS shares are exported, and the hosted storage (used by the hosted engine) is mounted by oVirt.
$ mount | grep rhev
localhost:/exports/hosted on /rhev/data-center/mnt/localhost:_exports_hosted type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)

2. NFS is working as expected.
$ exportfs | grep exports
/exports/hosted  127.0.0.1/255.255.255.0
/exports/data 127.0.0.1/255.255.255.0
/exports/iso   127.0.0.1/255.255.255.0
/exports/export 127.0.0.1/255.255.255.0

3. All services seem to run just fine (except the broker and agent).
$ ps -AH | /bin/egrep -e 'vdsm|sanlock'
   2282 ?        00:00:00   sanlock
   2284 ?        00:00:00     sanlock-helper
   5065 ?        00:00:02   supervdsmd
  12259 ?        00:20:15   vdsmd

4. In both cases I can see the problem in the log.

Broker:
--------------------------------------------------------------------------------------------------------------------------------------------------
MainThread::INFO::2021-08-08 19:46:06,962::status_broker::121::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker::(__init__) Status broker initialized.
Listener::INFO::2021-08-08 19:46:06,962::listener::44::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) Initializing RPCServer
Listener::INFO::2021-08-08 19:46:06,963::listener::57::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) RPCServer ready
StatusStorageThread::ERROR::2021-08-08 19:46:06,985::storage_broker::167::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Corrupted metadata from /run/vdsm/storage/9541c195-9f59-4225-91be-53391b4f1bb3/10cb67f7-6be2-47e4-9268-81fca9862057/deadf86f-b937-4172-8359-90c991dc2ecf
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    data = bdata.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte
StatusStorageThread::ERROR::2021-08-08 19:46:06,986::status_broker::98::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    data = bdata.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 94, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 169, in get_raw_stats
    .format(str(e)))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Corrupted read metadata: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte
StatusStorageThread::ERROR::2021-08-08 19:46:06,987::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker
Listener::INFO::2021-08-08 19:46:07,464::broker::77::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Server shutting down
Listener::INFO::2021-08-08 19:46:07,464::monitor::117::ovirt_hosted_engine_ha.broker.monitor.Monitor::(stop_all_submonitors) Stopping all submonitors
MainThread::INFO::2021-08-08 19:46:08,060::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.7 started
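
The broker traceback above boils down to `bdata.decode()` hitting a byte that is not valid UTF-8. The following is only a sketch of how that failure can be reproduced and located on any byte buffer. The helper names are hypothetical, and the 4096-byte block size is an arbitrary assumption for the arithmetic, not something taken from the oVirt metadata layout.

```python
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the offending byte, i.e. the
        # "position" number printed in the traceback.
        return exc.start
    return None


def describe_offset(offset: int, block_size: int = 4096) -> str:
    """Express a byte offset as a block index plus an offset inside that block."""
    block, within = divmod(offset, block_size)
    return f"byte {offset} = block {block} (of {block_size} B) + {within}"


# 0xb9 is a continuation byte, so it cannot start a UTF-8 sequence.
print(first_invalid_utf8(b"abc\xb9def"))
print(describe_offset(4191756))
```

Feeding the function the bytes of the metadata file named in the log should report the same position as the traceback, which at least confirms that the corruption is in the file itself and not in the broker.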

Agent:
--------------------------------------------------------------------------------------------------------------------------------------------------
MainThread::INFO::2021-08-08 19:36:25,467::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': '192.168.1.9', 'network_test': 'tcp', 'tcp_t_address': '192.168.1.2', 'tcp_t_port': '22'}
MainThread::ERROR::2021-08-08 19:36:25,468::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-08-08 19:36:25,470::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1264, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 978, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
    ).format(t=type, o=options, e=e)
ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '192.168.1.9', 'network_test': 'tcp', 'tcp_t_address': '192.168.1.2', 'tcp_t_port': '22'}]
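
The agent traceback above ends in `FileNotFoundError` raised from `sock.connect()` in `unixrpc.py`, which suggests the agent could not find the broker's Unix socket rather than that the network test itself failed. Below is a minimal sketch for probing such a socket. The function name is hypothetical, and the socket path in the comment is an assumption that should be verified on the host.

```python
import os
import socket


def can_connect_unix(path: str, timeout: float = 2.0) -> bool:
    """Return True if a Unix stream socket at `path` accepts a connection."""
    if not os.path.exists(path):
        # Connecting to a missing path is what raises Errno 2 in the log.
        return False
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Path exists but nothing is listening, or permission was denied.
        return False
    finally:
        sock.close()


# Assumed location of the broker socket; check the actual path before use:
# can_connect_unix("/run/ovirt-hosted-engine-ha/broker.socket")
```

If this returns False while the broker is in its restart loop, the agent error is most likely a side effect of the broker failure.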

On Sunday, August 8, 2021, 20:06:46 GMT+3, Gilboa Davara <gilboad@gmail.com> wrote:





On Sun, Aug 8, 2021 at 7:53 PM Gilboa Davara <gilboad@gmail.com> wrote:
> Hello all,
>
> During the night, one of my (smaller) setups, a single-node self-hosted engine (localhost NFS), crashed due to what looks like a massive disk failure (software RAID6, with 10 drives + spare).
> After a reboot, I let the RAID resync (with a fresh drive) and went on to start oVirt.
> However, no such luck.
> Two issues:
> 1. ovirt-ha-broker fails due to a broken hosted engine state (log attached).
> 2. ovirt-ha-agent fails due to the network test (tcp), even though both the remote host and the DNS servers are active (log attached).
>
> Two questions:
> 1. Can I somehow force the agent to disable the network liveness test?
> 2. Can I somehow force the broker to rebuild / fix the hosted engine state?
>
> - Gilboa

FWIW, switching the agent network test to none (via hosted-engine --set-shared-config network_test none --type=he_local) doesn't seem to work.
(Unless I'm missing the point, and the agent is failing due to broker issues rather than a failed network liveness check.)


- Gilboa

 
