<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Thu, Dec 28, 2017 at 6:02 PM Dan Kenigsberg <<a href="mailto:danken@redhat.com">danken@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Thu, Dec 28, 2017 at 5:36 PM, Yaniv Kaul <<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>> wrote:<br>
> Just saw this on[1]:<br>
> 2017-12-28 10:09:24,980-0500 INFO (MainThread) [vdsm.api] FINISH<br>
> prepareForShutdown return=None from=internal,<br>
> task_id=1a83a4d3-43e3-4833-883b-dfd2fe4d76be (api:52)<br>
> 2017-12-28 10:09:24,980-0500 INFO (MainThread) [vds] Stopping threads<br>
> (vdsmd:159)<br>
> 2017-12-28 10:09:24,980-0500 INFO (MainThread) [vds] Exiting (vdsmd:170)<br>
> 2017-12-28 10:09:25,011-0500 INFO (mailbox-hsm)<br>
> [storage.MailBox.HsmMailMonitor] HSM_MailboxMonitor - Incoming mail<br>
> monitoring thread stopped, clearing outgoing mail (mailbox:511)<br></blockquote><div><br></div><div>This seems to be the issue, the monitor thread is trying to access storage</div><div>*after* it was stopped. Relying on being able to access storage to clean up</div><div>things during shutdown is never going to work. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> 2017-12-28 10:09:25,011-0500 INFO (mailbox-hsm)<br>
> [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM -<br>
> ['/usr/bin/dd',<br>
> 'of=/rhev/data-center/9b97451a-0312-4c69-85e2-f86d6d273636/mastersd/dom_md/inbox',<br>
> 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1',<br>
> 'seek=1'] (mailbox:387)<br>
> 2017-12-28 10:09:25,012-0500 ERROR (mailbox-hsm)<br>
> [storage.MailBox.HsmMailMonitor] FINISH thread <Thread(mailbox-hsm, started<br>
> daemon 139968009594624)> failed (concurrent:201)<br>
> Traceback (most recent call last):<br>
> File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line<br>
> 194, in run<br>
> ret = func(*args, **kwargs)<br>
> File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line 514,<br>
> in _run<br>
> self._sendMail() # Clear outgoing mailbox<br>
> File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line 394,<br>
> in _sendMail<br>
> _mboxExecCmd(self._outCmd, data=self._outgoingMail)<br>
> File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line 84,<br>
> in _mboxExecCmd<br>
> return misc.execCmd(*args, **kwargs)<br>
> File "/usr/lib/python2.7/site-packages/vdsm/common/commands.py", line 53,<br>
> in execCmd<br>
> command = cmdutils.wrap_command(command, with_ioclass=ioclass,<br>
> AttributeError: 'NoneType' object has no attribute 'wrap_command'<br>
> 2017-12-28 10:09:28,965-0500 INFO (MainThread) [vds] (PID: 23232) I am the<br>
> actual vdsm 4.20.9-84.git615770f.el7.centos lago-basic-suite-master-host-0<br>
> (3.10.0-693.2.2.el7.x86_64) (vdsmd:148)<br>
> 2017-12-28 10:09:28,966-0500 INFO (MainThread) [vds] VDSM will run with cpu<br>
> affinity: frozenset([1]) (vdsmd:254)<br>
> 2017-12-28 10:09:28,970-0500 INFO (MainThread) [storage.HSM] START HSM init<br>
> (hsm:366)<br>
><br>
><br>
> [1]<br>
> <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/3113/artifact/exported-artifacts/basic-suite-master__logs/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/3113/artifact/exported-artifacts/basic-suite-master__logs/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log</a><br>
<br>
<br>
It's quite ugly, but I think that it is harmless.<br>
prepareForShutdown calls self._pool.hsmMailer.stop() but never waits<br>
for it to actually finish. It then exists and Python starts freeing<br>
its modules, such as cmdutils. When hsmMailer wakes up, cmdutils is<br>
already None.<br>
<br>
The fix could be as simple as <a href="https://gerrit.ovirt.org/85810" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/85810</a> but I'm<br>
not sure if it is worth the slowdown in vdsm shutdown.<br></blockquote><div><br></div><div>We can wait for the mailer, but only after we stopped all other threads.</div><div><br></div><div>Nir</div><div> </div></div></div>