On Thu, Dec 28, 2017 at 6:02 PM Dan Kenigsberg <danken(a)redhat.com> wrote:
On Thu, Dec 28, 2017 at 5:36 PM, Yaniv Kaul <ykaul(a)redhat.com>
wrote:
> Just saw this on[1]:
> 2017-12-28 10:09:24,980-0500 INFO (MainThread) [vdsm.api] FINISH
> prepareForShutdown return=None from=internal,
> task_id=1a83a4d3-43e3-4833-883b-dfd2fe4d76be (api:52)
> 2017-12-28 10:09:24,980-0500 INFO (MainThread) [vds] Stopping threads
> (vdsmd:159)
> 2017-12-28 10:09:24,980-0500 INFO (MainThread) [vds] Exiting (vdsmd:170)
> 2017-12-28 10:09:25,011-0500 INFO (mailbox-hsm)
> [storage.MailBox.HsmMailMonitor] HSM_MailboxMonitor - Incoming mail
> monitoring thread stopped, clearing outgoing mail (mailbox:511)
This seems to be the issue, the monitor thread is trying to access storage
*after* it was stopped. Relying on being able to access storage to clean up
things during shutdown is never going to work.
> 2017-12-28 10:09:25,011-0500 INFO (mailbox-hsm)
> [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM -
> ['/usr/bin/dd',
>
'of=/rhev/data-center/9b97451a-0312-4c69-85e2-f86d6d273636/mastersd/dom_md/inbox',
> 'iflag=fullblock', 'oflag=direct', 'conv=notrunc',
'bs=4096', 'count=1',
> 'seek=1'] (mailbox:387)
> 2017-12-28 10:09:25,012-0500 ERROR (mailbox-hsm)
> [storage.MailBox.HsmMailMonitor] FINISH thread <Thread(mailbox-hsm,
started
> daemon 139968009594624)> failed (concurrent:201)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line
> 194, in run
> ret = func(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line
514,
> in _run
> self._sendMail() # Clear outgoing mailbox
> File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line
394,
> in _sendMail
> _mboxExecCmd(self._outCmd, data=self._outgoingMail)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line
84,
> in _mboxExecCmd
> return misc.execCmd(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/common/commands.py", line
53,
> in execCmd
> command = cmdutils.wrap_command(command, with_ioclass=ioclass,
> AttributeError: 'NoneType' object has no attribute 'wrap_command'
> 2017-12-28 10:09:28,965-0500 INFO (MainThread) [vds] (PID: 23232) I am
the
> actual vdsm 4.20.9-84.git615770f.el7.centos
lago-basic-suite-master-host-0
> (3.10.0-693.2.2.el7.x86_64) (vdsmd:148)
> 2017-12-28 10:09:28,966-0500 INFO (MainThread) [vds] VDSM will run with
cpu
> affinity: frozenset([1]) (vdsmd:254)
> 2017-12-28 10:09:28,970-0500 INFO (MainThread) [storage.HSM] START HSM
init
> (hsm:366)
>
>
> [1]
>
http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x8...
It's quite ugly, but I think that it is harmless.
prepareForShutdown calls self._pool.hsmMailer.stop() but never waits
for it to actually finish. It then exists and Python starts freeing
its modules, such as cmdutils. When hsmMailer wakes up, cmdutils is
already None.
The fix could be as simple as
https://gerrit.ovirt.org/85810 but I'm
not sure if it is worth the slowdown in vdsm shutdown.
We can wait for the mailer, but only after we stopped all other threads.
Nir