[node-devel] Maybe the reason for install failed sometimes

Bohai (ricky) bohai at huawei.com
Mon Aug 26 02:19:37 UTC 2013



> -----Original Message-----
> From: Fabian Deutsch [mailto:fabiand at redhat.com]
> Sent: Friday, August 23, 2013 2:37 PM
> To: Bohai (ricky)
> Cc: node-devel at ovirt.org; Luohao (A); Haofeng; boh.ricky at gmail.com
> Subject: Re: [node-devel] Maybe the reason for install failed sometimes
> 
> On Friday, 23.08.2013, at 03:22 +0000, Bohai (ricky) wrote:
> > Hi,
> >
> > I failed to install ovirt-node. For details, please see the excerpt from
> > ovirt.log at the end of this mail.
> >
> > I read the code and found this call to subprocess_closefds:
> > def findfs(label):
> >     system("partprobe /dev/mapper/*")
> >     system("udevadm settle")
> >     blkid_cmd = "/sbin/blkid -c /dev/null -l -o device -t LABEL=\"" + label + "\""
> >     blkid = subprocess_closefds(blkid_cmd, shell=True, stdout=PIPE, stderr=STDOUT)
> >     blkid_output = blkid.stdout.read().strip()
> >     return blkid_output
> >
> > I think the command has not finished when we read stdout, so the real
> > blkid output can't be read.
> >
> > After calling subprocess_closefds, do we need to wait for the
> > process to terminate?
> 
> Hey Ricky,
> 
> subprocess_closefds is just wrapping subprocess.Popen and adding
> close_fds=True. subprocess.Popen by default is waiting for the cmd to
> exit. At least it behaved like this in a small test.
> 

Hey Fabian,
I'm not sure I agree.

I read the Popen source code in Python-2.7.5/Lib/subprocess.py
and did not find any code that waits by default.

In [1], the documentation only says that subprocess.call waits for the command to complete.
Below is the source code of subprocess.call; it relies on the wait() method to do that.

--------------------------------------------------------------------------------------------------
def call(*popenargs, **kwargs):
    """Run command with arguments.  Wait for command to complete, then
    return the returncode attribute.

    The arguments are the same as for the Popen constructor.  Example:

    retcode = call(["ls", "-l"])
    """
    return Popen(*popenargs, **kwargs).wait()
--------------------------------------------------------------------------------------------------

So, are you sure that subprocess.Popen waits by default?
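
One way to check this is a small test like the sketch below. It is only an
illustration: the child command is a hypothetical stand-in (a Python one-liner
that sleeps and then prints) rather than blkid, and the function name is made
up for this mail. It checks whether the child is still running right after
the Popen constructor returns.

```python
import subprocess
import sys

# Hypothetical child command used only for this test: sleep, then print.
CHILD = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]

def popen_returns_before_exit():
    """Start CHILD and report whether it was still running after Popen()."""
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    # poll() returns None while the child has not exited yet.
    still_running = proc.poll() is None
    # communicate() drains stdout and waits for the child to exit.
    out, _ = proc.communicate()
    return still_running, out.strip(), proc.returncode

if __name__ == "__main__":
    print(popen_returns_before_exit())
```

If poll() returns None here, the constructor came back while the child was
still running. Note also that returncode is only set once poll(), wait() or
communicate() has been called; reading stdout alone does not collect it.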

> > For example, we could call "Popen.communicate" to wait for the command to finish.
> 
> But maybe we should take the warning just above [1] seriously.
> Using proc.stdout.read() can lead to deadlocks, and maybe those are the
> problem. But if deadlocks were the problem, I'd expect the whole
> python process to block. Maybe this assumption is wrong, though.
> 
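
A minimal sketch of what a communicate()-based variant could look like is
below. It is not the ovirt-node implementation: run_and_capture is a
hypothetical helper, the blkid parameter and the None return on failure are my
assumptions, and passing an argument list instead of shell=True is a design
choice of this sketch. The blkid options are the ones from findfs above.

```python
import subprocess

def run_and_capture(argv):
    """Run argv, wait for it to exit, return (returncode, stripped output)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, close_fds=True)
    # communicate() reads until EOF and then waits, so returncode is set.
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace").strip()

def findfs(label, blkid="/sbin/blkid"):
    """Return the device carrying LABEL=label, or None if blkid fails.

    Assumption: a non-zero exit status is treated as "not found".
    """
    argv = [blkid, "-c", "/dev/null", "-l", "-o", "device",
            "-t", "LABEL=" + label]
    returncode, output = run_and_capture(argv)
    if returncode != 0:
        return None
    return output
```

With this shape the caller can tell "blkid failed" apart from "blkid printed a
device", instead of getting an empty string back in both cases.
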
> But at least that part of our codebase is targeted with the node-3.1
> feature (page has to be created) [2]
> 
> One idea is to use blivet [3] for the storage stuff, this is a module
> used by anaconda. Maybe even other parts of anaconda can be used to e.g.
> handle the grub and uefi stuff. This would mean joint efforts with the
> anaconda team.
> 
> Greetings
> fabian
> 
> ---
> [1]
> http://docs.python.org/2/library/subprocess.html#subprocess.Popen.stdin
> [2] http://www.ovirt.org/Features/Node/StorageAndInstallerModuleRewrite
> [3] https://git.fedorahosted.org/git/?p=blivet.git
> 
> > There are many calls to subprocess_closefds like this in the ovirt-node code.
> > I guess they may be the reason why the install sometimes fails.
> >
> > What do you think about this?
> >
> >
> > --------------------------/tmp/ovirt.log-----------------------------------
> > 301 2013-08-23 01:16:17,938 - DEBUG - install - Trial 1 to find candidate (None)
> > 302 2013-08-23 01:16:19,290 - DEBUG - ovirtfunctions - partprobe
> > 303 2013-08-23 01:16:19,290 - DEBUG - ovirtfunctions -
> > 304 2013-08-23 01:16:19,469 - DEBUG - ovirtfunctions - partprobe /dev/mapper/*
> > 305 2013-08-23 01:16:19,469 - DEBUG - ovirtfunctions -
> > 306 2013-08-23 01:16:19,670 - DEBUG - ovirtfunctions - udevadm settle
> > 307 2013-08-23 01:16:19,670 - DEBUG - ovirtfunctions -
> > 308 2013-08-23 01:16:19,681 - DEBUG - install - Trial 2 to find candidate (RootBackup)
> > 309 2013-08-23 01:16:19,681 - DEBUG - install - Found candidate: RootBackup
> > 310 2013-08-23 01:16:19,719 - DEBUG - ovirtfunctions - partprobe /dev/mapper/*
> > 311 2013-08-23 01:16:19,719 - DEBUG - ovirtfunctions -
> > 312 2013-08-23 01:16:19,845 - DEBUG - ovirtfunctions - udevadm settle
> > 313 2013-08-23 01:16:19,845 - DEBUG - ovirtfunctions -
> > 314 2013-08-23 01:16:20,030 - INFO - install -
> > 315 2013-08-23 01:16:20,030 - INFO - install -
> > 316 2013-08-23 01:16:20,037 - DEBUG - install - Traceback (most recent call last):
> > 317   File "/usr/lib/python2.7/site-packages/ovirtnode/install.py", line 424, in ovirt_boot_setup
> > 318     self.partN = int(self.disk[-1:])
> > 319 ValueError: invalid literal for int() with base 10: ''
> > ------------------------------------------------------------------------------------
> >
> > Best regards to you.
> > Ricky
> >
> > _______________________________________________
> > node-devel mailing list
> > node-devel at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/node-devel
> 


