
On Sun, Jan 14, 2018 at 4:34 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:57 PM, Jayme <jaymef@gmail.com> wrote:
Sure, not a problem. For the first issue, regarding the agent and broker crashing: the hosted engine VM is up and running at this time, and I have no idea why the logs say the volume doesn't exist, or why /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 is reported as missing when the file actually does exist at that path.
Perhaps not enough permissions?
Can you try reading it as user 'vdsm'? E.g. (the vdsm user's login shell is normally nologin, hence the -s):
su - vdsm -s /bin/bash
cp /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 /dev/null
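It may also be worth checking whether that path is actually a dangling symlink (on file-based storage, the paths under /var/run/vdsm/storage are normally links into /rhev/data-center/mnt/...), and what the ownership/SELinux label of the target looks like. A quick check could be something like this (just a sketch, using the same path as in the log):
ls -l /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/
ls -lLZ /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8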
I assume this problem is most likely also related to (or causing) my other problems when accessing the hosted engine VM's snapshot section of the web GUI.
vdsm log:
jsonrpc/0::ERROR::2018-01-14 09:48:09,302::task::875::storage.TaskManager.Task::(_setError) (Task='37eba553-9c13-4e69-90f7-d0c987cc694c') Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
jsonrpc/0::ERROR::2018-01-14 09:48:09,303::dispatcher::82::storage.Dispatcher::(wrapper) FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
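prepareImage is what creates those /var/run/vdsm/storage/... links, so it might help to ask vdsm directly whether it still knows about that volume. If I remember the vdsm-client syntax on 4.2 correctly, it would be something like the following (the storage pool UUID is a placeholder; the other UUIDs are taken from the path in the logs):
vdsm-client Volume getInfo storagepoolID=<pool-uuid> \
    storagedomainID=248f46f0-d793-4581-9810-c9d965e2f286 \
    imageID=14a20941-1b84-4b82-be8f-ace38d7c037a \
    volumeID=8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8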
agent log:
MainThread::ERROR::2018-01-14 09:49:26,546::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::ERROR::2018-01-14 09:49:37,782::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2018-01-14 09:49:37,783::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor
    .format(type, options, e))
RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
MainThread::ERROR::2018-01-14 09:49:37,783::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
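My guess is that the [Errno 2] in the agent log is not about the metadata file at all, but about the agent failing to reach the broker's unix socket while the broker keeps crash-looping. Something like this should show whether the socket is there at the moment (I believe it lives under /var/run/ovirt-hosted-engine-ha/, but please double-check on your version):
systemctl status ovirt-ha-broker ovirt-ha-agent
ls -l /var/run/ovirt-hosted-engine-ha/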
broker log:
StatusStorageThread::ERROR::2018-01-12 14:03:57,629::status_broker::85::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 81, in run
    entry.data
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 212, in put_stats
    .format(str(e)))
RequestError: failed to write metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
StatusStorageThread::ERROR::2018-01-12 14:03:57,629::storage_broker::160::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Failed to read metadata from /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 151, in get_raw_stats
    f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
OSError: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
StatusStorageThread::ERROR::2018-01-12 14:03:57,630::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 88, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 162, in get_raw_stats
    .format(str(e)))
RequestError: failed to read metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
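Note that the broker opens that metadata file with O_DIRECT (the os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) call above), so a plain cp succeeding as vdsm wouldn't be fully conclusive. A direct-I/O read of the same file, as a sketch, would be:
sudo -u vdsm dd if=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 of=/dev/null bs=1M count=1 iflag=direct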
Syslog:
Jan 12 16:52:34 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:34 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:34 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:34 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:34 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:34 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:34 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:36 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='73141dec-9d8f-4164-9c4e-67c43a102eff') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:36 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:36 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:36 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:36 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:36 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:36 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:36 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:37 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='bc7af1e2-0ab2-4164-ae88-d2bee03500f9') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:37 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:37 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:38 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:38 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:38 cultivar0 systemd: start request repeated too quickly for ovirt-ha-broker.service
Jan 12 16:52:38 cultivar0 systemd: Failed to start oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:40 cultivar0 systemd: ovirt-ha-agent.service holdoff time over, scheduling restart.
Jan 12 16:52:40 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent#012 return action(he)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper#012 return he.start_monitoring()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring#012 self._initialize_broker()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker#012 m.get('options', {}))#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor#012 .format(type, options, e))#012RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
Jan 12 16:52:42 cultivar0 systemd: Unit ovirt-ha-agent.service entered failed state.
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service failed.
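Also, since systemd is now reporting "start request repeated too quickly for ovirt-ha-broker.service", I assume that once the underlying file problem is fixed I'll also have to clear the start-rate limiting before the services stay up, i.e. something like:
systemctl reset-failed ovirt-ha-broker ovirt-ha-agent
systemctl restart ovirt-ha-broker ovirt-ha-agent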
On Sun, Jan 14, 2018 at 9:46 AM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef@gmail.com> wrote:
First, apologies for all the posts to this list lately; I've been having a heck of a time after the 4.2 upgrade, and you've been very helpful, which I appreciate.
Since 4.2 upgrade I'm experiencing a few problems that I'm trying to debug.
Current status: the engine and all hosts are upgraded to 4.2, and the cluster and domain are set to 4.2 compatibility. The hosted engine VM is running and the UI is accessible, and all VMs on the hosts are running, but there is no HA service. The web UI gives a few errors when checking network and snapshots on the hosted engine VM only; it doesn't give errors on any of the other VMs I spot-checked.
1. The HA agent and HA broker are continually crashing on all three hosts, over and over every few seconds. I sent an email to the users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- Why would the volume not exist?
If agent/broker logs do not reveal this, the next step is usually checking vdsm logs and/or system logs. Can you please check/share these? Thanks.
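The usual locations on the host, in case it helps:
tail -n 200 /var/log/vdsm/vdsm.log
tail -n 200 /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log
journalctl -u ovirt-ha-broker -u ovirt-ha-agent --since today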
2. Error when clicking "network interfaces" in the web gui for the hosted VM engine.
3. Similar to #2 above an error is given when clicking "snapshots" in the web gui for the hosted engine VM.
The errors for #2 and #3 are a generic "cannot read property 'a' of null". I've read previous postings on the oVirt mailing list suggesting that you can install a debug-info package to get a human-readable error, but this package does not seem to be compatible with 4.2; it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- perhaps this package is no longer required? I do see some additional details in the ui.log that I can post if helpful.
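In the meantime I can also tail ui.log on the engine VM while reproducing the clicks, to capture the full stack trace, e.g.:
tail -f /var/log/ovirt-engine/ui.log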
There is obviously something odd going on here with the hosted engine VM. All three errors appear to be related to a problem with it, although it is indeed up and running. I'd really like to get the HA broker and agent back up and running, and fix these GUI errors related to the hosted engine VM. Could all three problems be connected to one common issue?
Thanks in advance!
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
-- Didi