On Sun, Jan 14, 2018 at 4:34 PM, Yedidyah Bar David <didi(a)redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:57 PM, Jayme <jaymef(a)gmail.com> wrote:
> Sure, not a problem. Regarding the first issue, the agent and broker
> crashing: the hosted engine VM is up and running at this time, and I have
> no idea why the logs say the volume doesn't exist and why the file
> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
> does not exist, when the file actually does exist at that path.
Perhaps not enough permissions?
Can you try reading it as user 'vdsm'? E.g. (the vdsm user normally has a
nologin shell, hence the '-s'):
su -s /bin/bash vdsm
cp /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 /dev/null
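If that fails, it may also help to check ownership and SELinux labels along
the path (plain coreutils/util-linux, just a suggestion):
ls -lZ /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
# walk every path component and show its owner and mode
namei -om /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8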
>
> I assume this problem is most likely also related to, or causing, my other
> problems when accessing the hosted engine VM's snapshot section of the web GUI.
>
> vdsm log:
>
> jsonrpc/0::ERROR::2018-01-14
> 09:48:09,302::task::875::storage.TaskManager.Task::(_setError)
> (Task='37eba553-9c13-4e69-90f7-d0c987cc694c') Unexpected error
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
in
> _run
> return fn(*args, **kargs)
> File "<string>", line 2, in prepareImage
> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method
> ret = func(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162,
in
> prepareImage
> raise se.VolumeDoesNotExist(leafUUID)
> VolumeDoesNotExist: Volume does not exist:
> (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
> jsonrpc/0::ERROR::2018-01-14
> 09:48:09,303::dispatcher::82::storage.Dispatcher::(wrapper) FINISH
> prepareImage error=Volume does not exist:
> (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
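As a side note, VolumeDoesNotExist from prepareImage usually means vdsm cannot
find that leaf volume on the storage domain itself, so it may be worth listing
the image directory on the domain directly. A rough sketch, assuming a
file-based hosted-engine storage domain; the mount directory name below is a
placeholder you would need to fill in:
ls -l /rhev/data-center/mnt/<your_he_storage_mount>/248f46f0-d793-4581-9810-c9d965e2f286/images/14a20941-1b84-4b82-be8f-ace38d7c037a/
# you would expect to see 8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 together with
# its .meta and .lease files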
>
> agent log:
>
> MainThread::ERROR::2018-01-14
> 09:49:26,546::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Trying to restart agent
> MainThread::ERROR::2018-01-14
> 09:49:37,782::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
> Failed to start necessary monitors
> MainThread::ERROR::2018-01-14
> 09:49:37,783::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Traceback (most recent call last):
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 131, in _run_agent
> return action(he)
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 55, in action_proper
> return he.start_monitoring()
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 416, in start_monitoring
> self._initialize_broker()
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 535, in _initialize_broker
> m.get('options', {}))
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> line 83, in start_monitor
> .format(type, options, e))
> RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
>
> MainThread::ERROR::2018-01-14
> 09:49:37,783::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Trying to restart agent
>
>
> broker log:
>
> StatusStorageThread::ERROR::2018-01-12
> 14:03:57,629::status_broker::85::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run)
> Failed to update state.
> Traceback (most recent call last):
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
> line 81, in run
> entry.data
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
> line 212, in put_stats
> .format(str(e)))
> RequestError: failed to write metadata: [Errno 2] No such file or directory:
> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
> StatusStorageThread::ERROR::2018-01-12
> 14:03:57,629::storage_broker::160::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats)
> Failed to read metadata from
> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
> Traceback (most recent call last):
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
> line 151, in get_raw_stats
> f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
> OSError: [Errno 2] No such file or directory:
> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
> StatusStorageThread::ERROR::2018-01-12
> 14:03:57,630::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run)
> Failed to read state.
> Traceback (most recent call last):
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
> line 88, in run
> self._storage_broker.get_raw_stats()
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
> line 162, in get_raw_stats
> .format(str(e)))
> RequestError: failed to read metadata: [Errno 2] No such file or directory:
> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
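The broker is dying on os.open() with ENOENT for that path. If I recall
correctly, that path normally goes through a symlink that vdsm (re)creates
when it prepares the image, so a dangling link would explain ENOENT even
though 'ls' on the link itself looks fine. A quick check (just a sketch):
readlink -f /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
# -L follows the link, so this fails if the link target is gone
ls -lL /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8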
>
> Syslog:
>
> Jan 12 16:52:34 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH
> prepareImage error=Volume does not exist:
> (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
> Jan 12 16:52:34 cultivar0 python: detected unhandled Python exception in
> '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
> Jan 12 16:52:34 cultivar0 abrt-server: Not saving repeating crash in
> '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
> Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service: main process
> exited, code=exited, status=1/FAILURE
> Jan 12 16:52:34 cultivar0 systemd: Unit ovirt-ha-broker.service entered
> failed state.
> Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service failed.
> Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service holdoff time
> over, scheduling restart.
> Jan 12 16:52:34 cultivar0 systemd: Cannot add dependency job for unit
> lvm2-lvmetad.socket, ignoring: Unit is masked.
> Jan 12 16:52:34 cultivar0 systemd: Started oVirt Hosted Engine High
> Availability Communications Broker.
> Jan 12 16:52:34 cultivar0 systemd: Starting oVirt Hosted Engine High
> Availability Communications Broker...
> Jan 12 16:52:36 cultivar0 journal: vdsm storage.TaskManager.Task ERROR
> (Task='73141dec-9d8f-4164-9c4e-67c43a102eff') Unexpected error#012Traceback
> (most recent call last):#012 File
> "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in
> _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in
> prepareImage#012 File
> "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method#012 ret = func(*args, **kwargs)#012 File
> "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in
> prepareImage#012 raise
> se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not
> exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
> Jan 12 16:52:36 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH
> prepareImage error=Volume does not exist:
> (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
> Jan 12 16:52:36 cultivar0 python: detected unhandled Python exception in
> '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
> Jan 12 16:52:36 cultivar0 abrt-server: Not saving repeating crash in
> '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
> Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service: main process
> exited, code=exited, status=1/FAILURE
> Jan 12 16:52:36 cultivar0 systemd: Unit ovirt-ha-broker.service entered
> failed state.
> Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service failed.
>
> Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service holdoff time
> over, scheduling restart.
> Jan 12 16:52:36 cultivar0 systemd: Cannot add dependency job for unit
> lvm2-lvmetad.socket, ignoring: Unit is masked.
> Jan 12 16:52:36 cultivar0 systemd: Started oVirt Hosted Engine High
> Availability Communications Broker.
> Jan 12 16:52:36 cultivar0 systemd: Starting oVirt Hosted Engine High
> Availability Communications Broker...
> Jan 12 16:52:37 cultivar0 journal: vdsm storage.TaskManager.Task ERROR
> (Task='bc7af1e2-0ab2-4164-ae88-d2bee03500f9') Unexpected error#012Traceback
> (most recent call last):#012 File
> "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in
> _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in
> prepareImage#012 File
> "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method#012 ret = func(*args, **kwargs)#012 File
> "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in
> prepareImage#012 raise
> se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not
> exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
> Jan 12 16:52:37 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH
> prepareImage error=Volume does not exist:
> (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
> Jan 12 16:52:37 cultivar0 python: detected unhandled Python exception in
> '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
> Jan 12 16:52:38 cultivar0 abrt-server: Not saving repeating crash in
> '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
> Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service: main process
> exited, code=exited, status=1/FAILURE
> Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered
> failed state.
> Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
> Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service holdoff time
> over, scheduling restart.
> Jan 12 16:52:38 cultivar0 systemd: Cannot add dependency job for unit
> lvm2-lvmetad.socket, ignoring: Unit is masked.
> Jan 12 16:52:38 cultivar0 systemd: start request repeated too quickly for
> ovirt-ha-broker.service
> Jan 12 16:52:38 cultivar0 systemd: Failed to start oVirt Hosted Engine High
> Availability Communications Broker.
> Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered
> failed state.
> Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
> Jan 12 16:52:40 cultivar0 systemd: ovirt-ha-agent.service holdoff time over,
> scheduling restart.
> Jan 12 16:52:40 cultivar0 systemd: Cannot add dependency job for unit
> lvm2-lvmetad.socket, ignoring: Unit is masked.
> Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High
> Availability Communications Broker.
> Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High
> Availability Communications Broker...
> Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High
> Availability Monitoring Agent.
> Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High
> Availability Monitoring Agent...
> Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to
> start necessary monitors
> Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call
> last):#012 File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 131, in _run_agent#012 return action(he)#012 File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 55, in action_proper#012 return he.start_monitoring()#012 File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 416, in start_monitoring#012 self._initialize_broker()#012 File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 535, in _initialize_broker#012 m.get('options', {}))#012 File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> line 83, in start_monitor#012 .format(type, options, e))#012RequestError:
> Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
> Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
> Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service: main process
> exited, code=exited, status=157/n/a
> Jan 12 16:52:42 cultivar0 systemd: Unit ovirt-ha-agent.service entered
> failed state.
> Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service failed.
>
>
>
> On Sun, Jan 14, 2018 at 9:46 AM, Yedidyah Bar David <didi(a)redhat.com> wrote:
>>
>> On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef(a)gmail.com> wrote:
>> > First, apologies for all the posts to this list lately, I've been having a
>> > heck of a time after 4.2 upgrade and you've been helpful, I appreciate
>> > that.
>> >
>> > Since the 4.2 upgrade I'm experiencing a few problems that I'm trying to
>> > debug.
>> >
>> > Current status is engine and all hosts are upgraded to 4.2, and cluster
>> > and
>> > domain set to 4.2 compatibility. Hosted Engine VM is running and ui
>> > accessible etc, all VMs on hosts are running but no HA service. Web UI
>> > is
>> > giving a few errors when checking network and snapshots on the hosted
>> > engine
>> > VM only; it doesn't give errors on any of the other VMs that I spot
>> > checked.
>> >
>> > 1. HA-agent and HA-broker are continually crashing on all three hosts
>> > over
>> > and over every few seconds. I sent an email to users list with more
>> > details
>> > on this problem but unfortunately haven't heard anything back yet. The
>> > general error in the logs seems to be:
>> > VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not
>> > exist:
>> > (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist,
>> > why not?
>>
>> If agent/broker logs do not reveal this, the next step is usually checking
>> vdsm logs and/or system logs. Can you please check/share these? Thanks.
>>
>> >
>> > 2. Error when clicking "network interfaces" in the web gui for the
>> > hosted engine VM.
>> >
>> > 3. Similar to #2 above, an error is given when clicking "snapshots" in the
>> > web gui for the hosted engine VM.
>> >
>> > The errors for #2 and #3 are a generic "cannot read property 'a' of null".
>> > I've read previous postings on the ovirt mailing list that suggest you can
>> > install the debug-info package to get a human-readable error... but this
>> > package does not seem to be compatible with 4.2, it expects 4.1: Requires:
>> > "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this
>> > package is no longer required? I do see some additional details in the
>> > ui.log that I can post if helpful.
>> >
>> > There is obviously something odd going on here with the hosted engine
>> > VM.
>> > All three errors appear to be related to a problem with it, although it is
>> > indeed up and running. I'd really like to get the HA broker and agent back
>> > up and running, and fix these GUI errors related to the hosted engine VM. All
>> > three problems may be connected to one common issue?
>> >
>> > Thanks in advance!
>> >
>> >
>> >
>> > _______________________________________________
>> > Users mailing list
>> > Users(a)ovirt.org
>> > http://lists.ovirt.org/mailman/listinfo/users
>> >
>>
>>
>>
>> --
>> Didi
>
>
--
Didi