
Hi,

I just stumbled upon a defunct supervdsm process:

root 2824 2815 0 Apr04 ? 00:01:01 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
qemu 31321 2824 0 May02 ? 00:00:00 [supervdsmServer] <defunct>

any chance to debug this?
might this be an instance of this closed (but never fixed) bug? https://bugzilla.redhat.com/show_bug.cgi?id=841486

--
Mit freundlichen Grüßen / Regards
Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
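
For readers who want to check their own hosts for the same symptom, here is a minimal sketch that scans `ps -ef` style output for entries marked `<defunct>` and reports each zombie together with its parent pid. The function name `find_defunct` and the `SAMPLE` constant are illustrative and not part of vdsm; the sample text is the listing from the report above.

```python
def find_defunct(ps_output):
    """Return (pid, ppid, user) for every line that ps marks as <defunct>."""
    zombies = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # blank or malformed line
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line
        if "<defunct>" in fields[7]:
            zombies.append((int(fields[1]), int(fields[2]), fields[0]))
    return zombies


# The two lines from the report above, one process per line.
SAMPLE = """\
root 2824 2815 0 Apr04 ? 00:01:01 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
qemu 31321 2824 0 May02 ? 00:00:00 [supervdsmServer] <defunct>
"""

if __name__ == "__main__":
    for pid, ppid, user in find_defunct(SAMPLE):
        print("zombie pid=%d parent=%d user=%s" % (pid, ppid, user))
```

The parent pid is the interesting part: a zombie stays in the process table until its parent (here pid 2824, the supervdsmServer process) collects its exit status.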

On Fri, May 09, 2014 at 01:16:07PM +0000, Sven Kieske wrote:
> Hi,
> I just stumbled upon a defunct supervdsm process:
> root 2824 2815 0 Apr04 ? 00:01:01 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
> qemu 31321 2824 0 May02 ? 00:00:00 [supervdsmServer] <defunct>
> any chance to debug this?
> might this be an instance of this closed (but never fixed) bug? https://bugzilla.redhat.com/show_bug.cgi?id=841486

It seems so. Please reopen the bug with new information: is there something interesting in May02's supervdsm.log? Such as a Timeout during validateAccess? It might be a case for the zombiereaper module to solve.

Dan.

I don't see anything in supervdsm.log, just a single error related to a VM network.

In the vdsm log I got these timeouts repeatedly:

Thread-25::DEBUG::2014-05-02 09:02:43,652::fileSD::222::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/_home_DATA/9dc0fcb5-b0b0-47dd-b41f-d8709fd8cab2/dom_md/metadata bs=4096 count=1' (cwd None)
Thread-25::DEBUG::2014-05-02 09:02:43,667::fileSD::222::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n495 bytes (495 B) copied, 0.000160501 s, 3.1 MB/s\n'; <rc> = 0
VM Channels Listener::DEBUG::2014-05-02 09:02:43,940::vmChannels::91::vds::(_handle_timeouts) Timeout on fileno 65.
VM Channels Listener::DEBUG::2014-05-02 09:02:45,384::vmChannels::91::vds::(_handle_timeouts) Timeout on fileno 106.

On 09.05.2014 15:49, Dan Kenigsberg wrote:
> It seems so. Please reopen the bug with new information: is there something interesting in May02's supervdsm.log? Such as a Timeout during validateAccess? It might be a case for the zombiereaper module to solve.
> Dan.

On Fri, May 09, 2014 at 02:39:29PM +0000, Sven Kieske wrote:
> I don't see anything in supervdsm.log

Still, I believe that the bug is worth reopening - it's a process leak, and it should be avoided. I believe that it can be easily solved by adding zombiereaper to supervdsm.

> just a single error related to a VM network.
> In the vdsm log I got these timeouts repeatedly:
> Thread-25::DEBUG::2014-05-02 09:02:43,652::fileSD::222::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/_home_DATA/9dc0fcb5-b0b0-47dd-b41f-d8709fd8cab2/dom_md/metadata bs=4096 count=1' (cwd None)
> Thread-25::DEBUG::2014-05-02 09:02:43,667::fileSD::222::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n495 bytes (495 B) copied, 0.000160501 s, 3.1 MB/s\n'; <rc> = 0
> VM Channels Listener::DEBUG::2014-05-02 09:02:43,940::vmChannels::91::vds::(_handle_timeouts) Timeout on fileno 65.
> VM Channels Listener::DEBUG::2014-05-02 09:02:45,384::vmChannels::91::vds::(_handle_timeouts) Timeout on fileno 106.

This is unrelated - that's Vdsm complaining about guest agents not heart-beating. It's log noise of very little value (particularly if you never run guest agents). It should not appear more than once per guest.

Solving this isn't hard, but it would never get priority without an acclaimed user's BZ (hint hint).

Dan.
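
The "not more than once per guest" behaviour can be sketched as follows. This is an illustration only, not the content of any vdsm patch; the class `TimeoutReporter` and its methods are hypothetical names. The idea is to remember which file descriptors have already been reported and to stay silent until the channel shows activity again.

```python
import logging

log = logging.getLogger("vds")


class TimeoutReporter(object):
    """Log a channel timeout only the first time it is seen for a fileno."""

    def __init__(self):
        self._reported = set()

    def report_timeout(self, fileno):
        """Log the timeout unless it was already reported; return True if logged."""
        if fileno in self._reported:
            return False
        self._reported.add(fileno)
        log.debug("Timeout on fileno %d.", fileno)
        return True

    def report_activity(self, fileno):
        """The guest agent talked again, so a later timeout is news once more."""
        self._reported.discard(fileno)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    reporter = TimeoutReporter()
    for fd in (65, 106, 65, 106, 65):
        reporter.report_timeout(fd)  # only the first 65 and first 106 are logged
```

Clearing the flag on activity keeps a genuine later loss of heartbeat visible while suppressing the repeated lines for guests that never run an agent.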

Well, you asked for it: https://bugzilla.redhat.com/show_bug.cgi?id=1096312

Have a nice weekend!

I also added myself to the other BZ; please reopen it.

On 09.05.2014 17:05, Dan Kenigsberg wrote:
> This is unrelated - that's Vdsm complaining about guest agents not heart-beating. It's log noise of very little value (particularly if you never run guest agents). It should not appear more than once per guest.
> Solving this isn't hard, but it would never get priority without an acclaimed user's BZ (hint hint).

I just commented on this bug report to get things going.

Why isn't it fixed yet? What is blocking it? Can I do anything to speed things up?

Thanks in advance.

On 09.05.2014 15:16, Sven Kieske wrote:
> Hi,
> I just stumbled upon a defunct supervdsm process:
> root 2824 2815 0 Apr04 ? 00:01:01 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
> qemu 31321 2824 0 May02 ? 00:00:00 [supervdsmServer] <defunct>
> any chance to debug this?
> might this be an instance of this closed (but never fixed) bug? https://bugzilla.redhat.com/show_bug.cgi?id=841486

On Wed, Jun 18, 2014 at 08:21:28AM +0000, Sven Kieske wrote:
> I just commented on this bug report to get things going.
> Why isn't it fixed yet? What is blocking it? Can I do anything to speed things up?

Does the suggested patch http://gerrit.ovirt.org/#/c/27627/ solve the problem and not introduce new ones? If so, please mark it as verified.

It awaits verification and more reviews, while we are busy with even-more-urgent 3.5 chores.

Regards, Dan.

On 18.06.2014 11:05, Dan Kenigsberg wrote:
> Does the suggested patch http://gerrit.ovirt.org/#/c/27627/ solve the problem and not introduce new ones? If so, please mark it as verified.
> It awaits verification and more reviews, while we are busy with even-more-urgent 3.5 chores.
> Regards, Dan.

Thanks for pushing for the review with me in order to avoid the log spam. You don't happen to know anything about the other bug (supervdsm defunct), though?

Unfortunately I can't test the mentioned patch myself.

On 06/18/2014 05:20 AM, Sven Kieske wrote:
> On 18.06.2014 11:05, Dan Kenigsberg wrote:
>> Does the suggested patch http://gerrit.ovirt.org/#/c/27627/ solve the problem and not introduce new ones? If so, please mark it as verified.
>> It awaits verification and more reviews, while we are busy with even-more-urgent 3.5 chores.
>> Regards, Dan.
> Thanks for pushing for the review with me in order to avoid the log spam. You don't happen to know anything about the other bug (supervdsm defunct), though?
> Unfortunately I can't test the mentioned patch myself.

If you're hitting the issue, can you hand-edit the .py file with what's shown at the link below?
http://gerrit.ovirt.org/#/c/27627/1/vdsm/virt/vmchannels.py

On Wed, Jun 18, 2014 at 09:20:06AM +0000, Sven Kieske wrote:
> On 18.06.2014 11:05, Dan Kenigsberg wrote:
>> Does the suggested patch http://gerrit.ovirt.org/#/c/27627/ solve the problem and not introduce new ones? If so, please mark it as verified.
>> It awaits verification and more reviews, while we are busy with even-more-urgent 3.5 chores.
>> Regards, Dan.
> Thanks for pushing for the review with me in order to avoid the log spam. You don't happen to know anything about the other bug (supervdsm defunct), though?

Oh sorry, this email thread is overloaded a bit. I'm adding Yaniv to the nag list. I hope he or Dima can prepare a quick fix for the pid leak.

On 06/18/2014 01:55 PM, Dan Kenigsberg wrote:
> On Wed, Jun 18, 2014 at 09:20:06AM +0000, Sven Kieske wrote:
>> On 18.06.2014 11:05, Dan Kenigsberg wrote:
>>> Does the suggested patch http://gerrit.ovirt.org/#/c/27627/ solve the problem and not introduce new ones? If so, please mark it as verified.
>>> It awaits verification and more reviews, while we are busy with even-more-urgent 3.5 chores.
>>> Regards, Dan.
>> Thanks for pushing for the review with me in order to avoid the log spam. You don't happen to know anything about the other bug (supervdsm defunct), though?
> Oh sorry, this email thread is overloaded a bit. I'm adding Yaniv to the nag list. I hope he or Dima can prepare a quick fix for the pid leak.

I don't understand how http://gerrit.ovirt.org/#/c/27627 is related to the defunct problem.

Sven, do you have a consistent flow that produces those defunct processes? That would help. If you do, please update [1] with your steps; if you don't have such a known flow, logs might help too.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=841486

Thanks
--
Yaniv Bronhaim.

On 18.06.2014 13:22, ybronhei wrote:
> On 06/18/2014 01:55 PM, Dan Kenigsberg wrote:
>> On Wed, Jun 18, 2014 at 09:20:06AM +0000, Sven Kieske wrote:
>>> On 18.06.2014 11:05, Dan Kenigsberg wrote:
>>>> Does the suggested patch http://gerrit.ovirt.org/#/c/27627/ solve the problem and not introduce new ones? If so, please mark it as verified.
>>>> It awaits verification and more reviews, while we are busy with even-more-urgent 3.5 chores.
>>>> Regards, Dan.
>>> Thanks for pushing for the review with me in order to avoid the log spam. You don't happen to know anything about the other bug (supervdsm defunct), though?
>> Oh sorry, this email thread is overloaded a bit. I'm adding Yaniv to the nag list. I hope he or Dima can prepare a quick fix for the pid leak.
> I don't understand how http://gerrit.ovirt.org/#/c/27627 is related to the defunct problem. Sven, do you have a consistent flow that produces those defunct processes? That would help. If you do, please update [1] with your steps; if you don't have such a known flow, logs might help too.
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=841486
> Thanks

I'm sorry for the confusion. The mentioned patch at http://gerrit.ovirt.org/#/c/27627 does not fix the defunct problem; the log spam was merely discovered while debugging the defunct process.

I'll try to send some logs on Friday and help debug the defunct process.
participants (4)
- Andrew Cathrow
- Dan Kenigsberg
- Sven Kieske
- ybronhei