[ovirt-users] stuck host in hosted engine migration 3.6->4.0

Piotr Kliczewski piotr.kliczewski at gmail.com
Mon Jul 25 10:07:04 UTC 2016


Gervais,

I checked the logs and I see:

jsonrpc.Executor/1::ERROR::2016-07-19
16:19:10,283::task::868::Storage.TaskManager.Task::(_setError)
Task=`b27c8bbd-ca35-44ca-97ae-88c4e91f6eec`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2700, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 285, in validateSdUUID
    sdDom.validate()
  File "/usr/share/vdsm/storage/fileSD.py", line 485, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or
entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)
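
If the NFS mount was flaky at that point, you can reproduce roughly
what fileSD.validate() checks with a small snippet on the host. Just a
sketch; the mount point below is a placeholder for your actual NFS
server:export path under /rhev/data-center/mnt (an assumption based on
the default vdsm layout):

import os

# Placeholder path: substitute your NFS server:export directory as
# mounted under /rhev/data-center/mnt.
SD_PATH = '/rhev/data-center/mnt/<server>:<export>/248f46f0-d793-4581-9810-c9d965e2f286'

try:
    # validate() essentially needs the domain metadata to be readable;
    # listing dom_md is a cheap approximation of that check.
    os.listdir(os.path.join(SD_PATH, 'dom_md'))
    print('domain metadata is readable')
except OSError as e:
    # EIO/ESTALE here would match the "partially accessible or
    # entirely inaccessible" error above.
    print('domain is NOT accessible: %s' % e)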

Thread-21821::ERROR::2016-07-19
16:19:14,348::api::195::root::(_getHaInfo) failed to retrieve Hosted
Engine HA info
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 174,
in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 176, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options
{'dom_type': 'nfs3', 'sd_uuid':
'248f46f0-d793-4581-9810-c9d965e2f286'}: Request failed: <class
'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
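
This one indicates that the HA broker could not set up its storage
backend, which again points at the NFS domain. It is worth confirming
that the hosted-engine export was actually mounted at the time; a
rough sketch, assuming the standard hosted-engine.conf location and
its 'storage=' key:

# Check that the hosted-engine storage export shows up in /proc/mounts.
conf = {}
with open('/etc/ovirt-hosted-engine/hosted-engine.conf') as f:
    for line in f:
        if '=' in line:
            key, value = line.strip().split('=', 1)
            conf[key] = value

storage = conf.get('storage', '')
with open('/proc/mounts') as f:
    mounted = bool(storage) and storage in f.read()
print('hosted-engine storage %r mounted: %s' % (storage, mounted))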


After a couple of occurrences of the above, vdsm was restarted and
'Connection reset by peer' errors started to appear. In between the
connection resets I can see:

Thread-76::ERROR::2016-07-19
16:21:25,024::api::195::root::(_getHaInfo) failed to retrieve Hosted
Engine HA info
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 174,
in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of
errors has exceeded the limit (1)
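
So the broker itself was not reachable at that point. Besides checking
the ovirt-ha-broker service and broker.log under
/var/log/ovirt-hosted-engine-ha/, you can probe its listener directly;
a minimal sketch, assuming the default Unix socket path (verify it
against your installed version):

import socket

# Assumed default broker socket path; check your installed
# ovirt-hosted-engine-ha version if it does not exist.
BROKER_SOCKET = '/var/run/ovirt-hosted-engine-ha/broker.socket'

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    s.connect(BROKER_SOCKET)
    print('broker socket reachable')
except socket.error as e:
    print('broker socket unreachable: %s' % e)
finally:
    s.close()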

and a few minutes later:

Thread-315::ERROR::2016-07-19
16:26:58,541::vm::765::virt.vm::(_startUnderlyingVm)
vmId=`4013c829-c9d7-4b72-90d5-6fe58137504c`::The vm start process
failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 706, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1995, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py",
line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 916, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: resource busy: Failed to acquire lock: error -243
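
Error -243 comes from sanlock and usually means the VM lease was still
held, either by another host or as a stale lease left over from the
restart. 'sanlock client status' on the host lists the held leases; a
small wrapper that filters for the hosted-engine storage domain:

import subprocess

# 'sanlock client status' lists the lockspaces and resource leases the
# local sanlock daemon currently holds.
out = subprocess.check_output(['sanlock', 'client', 'status'])
for line in out.splitlines():
    # Lines mentioning the storage domain UUID are the interesting ones.
    if b'248f46f0-d793-4581-9810-c9d965e2f286' in line:
        print(line)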

and then, the next day:

Thread-6834::ERROR::2016-07-20
17:18:10,030::task::868::Storage.TaskManager.Task::(_setError)
Task=`f6d8d5df-a55f-4ccb-af11-f1b44b9757d0`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3473, in stopMonitoringDomain
    raise se.StorageDomainIsMemberOfPool(sdUUID)
StorageDomainIsMemberOfPool: Storage domain is member of pool:
'domain=248f46f0-d793-4581-9810-c9d965e2f286'
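
That last one may simply be a consequence of the hosted-engine storage
domain being imported into the data center in 4.0: monitoring cannot
be stopped for a domain that is attached to a pool. You can confirm
the membership from the host, assuming the legacy vdsClient tool is
still installed:

import subprocess

# getStorageDomainInfo reports a 'pool' attribute; a non-empty pool
# UUID means the domain is attached, so the error above is expected.
print(subprocess.check_output([
    'vdsClient', '-s', '0', 'getStorageDomainInfo',
    '248f46f0-d793-4581-9810-c9d965e2f286',
]))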

In the logs I can see that vdsm was restarted on 2016-07-21 at
14:55:03,607, after which the issues stopped occurring.

Was there a hardware (storage) issue around that time?

I can see from your previous email that the issues started to occur
again on 2016-07-22.
Do you see any errors like those above?
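
If it is the SSL 'unexpected eof' errors again, a quick TLS handshake
test against vdsm can help narrow it down. A minimal sketch (54321 is
vdsm's default management port; note that vdsm normally expects a
client certificate, so a handshake failure here is a hint, not proof):

import socket
import ssl

# 54321 is vdsm's default management port. This only checks whether a
# TLS handshake completes at all.
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
sock = socket.create_connection(('localhost', 54321), timeout=5)
try:
    ssock = ctx.wrap_socket(sock)
    print('TLS handshake OK, cipher: %s' % (ssock.cipher(),))
    ssock.close()
except ssl.SSLError as e:
    print('TLS handshake failed: %s' % e)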

Thanks,
Piotr

On Fri, Jul 22, 2016 at 3:05 PM, Gervais de Montbrun
<gervais at demontbrun.com> wrote:
> Hi Simone,
>
> I did hit the issue you link to below when running `hosted-engine --deploy`
> on this server while setting it up for 3.6. I've commented on the bug with
> my experiences. I did get the host working under 3.6 with no errors, but
> this one has cropped up since upgrading to 4.0.1.
>
> I did not have the same issue on all of my hosts, but the error I am
> experiencing now:
>
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 09:59:56,062::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 10:00:11,240::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 10:00:21,158::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 10:00:21,441::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 10:00:26,717::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 10:00:31,856::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 10:00:36,982::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
> JsonRpc (StompReactor)::ERROR::2016-07-22
> 10:00:52,180::betterAsyncore::113::vds.dispatcher::(recv) SSL error during
> reading data: unexpected eof
>
>
> is happening on all of them.
> :-(
>
> Cheers,
> Gervais
>
>
>
> On Jul 22, 2016, at 5:35 AM, Simone Tiraboschi <stirabos at redhat.com> wrote:
>
> On Thu, Jul 21, 2016 at 8:08 PM, Gervais de Montbrun
> <gervais at demontbrun.com> wrote:
>
> Hi Martin
>
> Logs are attached.
>
> Thank you for any help you can offer.
> :-)
>
> Cheers,
> Gervais
>
>
> see also this one: https://bugzilla.redhat.com/show_bug.cgi?id=1358530
>
> the results are pretty similar.
>
> On Jul 21, 2016, at 10:20 AM, Martin Perina <mperina at redhat.com> wrote:
>
> So could you please share logs?
>
> Thanks
>
> Martin
>
> On Thu, Jul 21, 2016 at 3:17 PM, Gervais de Montbrun
> <gervais at demontbrun.com> wrote:
>
>
> Hi Oved,
>
> Thanks for the suggestion.
>
> I tried setting "management_ip = 0.0.0.0" but got the same result.
> BTW, management_ip='0.0.0.0' (quoted, as suggested in the post) doesn't
> work for me at all; vdsmd wouldn't start.
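>
> For reference, this is the exact snippet I have in /etc/vdsm/vdsm.conf
> (the [addresses] section is where I understood the option belongs;
> please correct me if that's wrong):
>
> # /etc/vdsm/vdsm.conf -- trying to force vdsm to listen on IPv4 only.
> # With the quoted value '0.0.0.0' vdsmd refused to start; the bare
> # value below starts, but the SSL errors remain.
> [addresses]
> management_ip = 0.0.0.0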
>
> Cheers,
> Gervais
>
>
>
> On Jul 20, 2016, at 10:50 AM, Oved Ourfali <oourfali at redhat.com> wrote:
>
> Also, this thread seems similar.
> Also talking about IPV4/IPV6 issue.
> Does it help?
>
> [1] http://lists.ovirt.org/pipermail/users/2016-June/040602.html
>
> On Wed, Jul 20, 2016 at 4:43 PM, Martin Perina <mperina at redhat.com> wrote:
>
>
> Hi,
>
> could you please create a bug and attach engine host logs (all from
> /var/log/ovirt-engine) and VDSM logs (from /var/log/vdsm)?
>
> Thanks
>
> Martin Perina
>
>
> On Wed, Jul 20, 2016 at 1:50 PM, Gervais de Montbrun
> <gervais at demontbrun.com> wrote:
>
>
> Hi Qiong,
>
> I am experiencing the exact same issue. All four of my hosts are
> throwing the same error in vdsm.log. If you find a solution, please
> let me know.


