[node-devel] Maybe the reason for install failed sometimes

Fabian Deutsch fabiand at redhat.com
Fri Aug 23 06:36:38 UTC 2013


On Friday, 23.08.2013, at 03:22 +0000, Bohai (ricky) wrote:
> Hi, 
> 
> I failed to install ovirt-node. For details, please see the excerpt from ovirt.log
> at the end of the mail.
> 
> I read the code and found a call to subprocess_closefds like the one below:
> def findfs(label):
>     system("partprobe /dev/mapper/*")
>     system("udevadm settle")
>     blkid_cmd = "/sbin/blkid -c /dev/null -l -o device -t LABEL=\"" + label + "\""
>     blkid = subprocess_closefds(blkid_cmd, shell=True, stdout=PIPE, stderr=STDOUT)
>     blkid_output = blkid.stdout.read().strip()
>     return blkid_output
> 
> I think that when we read stdout the command has not finished yet, so the real blkid output can't be read.
> 
> After calling subprocess_closefds, do we need to wait for the process to terminate?

Hey Ricky,

subprocess_closefds just wraps subprocess.Popen and adds
close_fds=True. Popen itself returns without waiting for the cmd to
exit, but stdout.read() blocks until the cmd closes its stdout, which
normally happens when it exits. At least it behaved like this in a
small test.
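
A small test along these lines can show where the waiting happens. It is
only a sketch: the helper name time_popen_and_read is made up for
illustration, and the child is a Python one-liner standing in for blkid.
In it, Popen returns as soon as the child has started, and it is
stdout.read() that blocks until the child closes its stdout.

```python
import subprocess
import sys
import time


def time_popen_and_read(delay=1.0):
    """Return (seconds until Popen returned, seconds until read() returned, output)."""
    # Hypothetical stand-in for blkid: sleep first, then print a device name.
    child = "import time; time.sleep(%r); print('/dev/sda2')" % delay
    start = time.time()
    proc = subprocess.Popen([sys.executable, "-c", child],
                            close_fds=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    popen_elapsed = time.time() - start   # Popen does not wait for the child
    output = proc.stdout.read().strip()   # blocks until EOF on the pipe
    read_elapsed = time.time() - start
    proc.stdout.close()
    proc.wait()                           # reap the child, set returncode
    return popen_elapsed, read_elapsed, output
```

With a one second delay, the first number should stay well below a second
while the second one cannot.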

> For example, by calling "Popen.communicate" to wait for the command to finish.

But maybe we should take the warning just above [1] seriously.
Using proc.stdout.read() can lead to deadlocks, and maybe those are the
problem. But if we had deadlocks I'd expect the whole python process
to block - but maybe this assumption is wrong.
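
To make the suggestion concrete, findfs could be written with
Popen.communicate(), which drains the pipe and then waits for the process
to exit. This is a sketch under assumptions, not a tested patch:
run_and_capture is a hypothetical helper that does not exist in
ovirtfunctions, the partprobe and udevadm calls are left out, and
returning None for "not found" is my choice, not current behaviour.

```python
import subprocess


def run_and_capture(argv):
    """Run argv, return (returncode, stripped combined stdout/stderr)."""
    proc = subprocess.Popen(argv,
                            close_fds=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    # communicate() reads until EOF and then waits for the child to exit.
    output, _ = proc.communicate()
    return proc.returncode, output.decode("utf-8", "replace").strip()


def findfs(label):
    """Return the device carrying LABEL=label, or None if blkid finds none."""
    # Passing a list avoids shell=True and quoting problems in the label.
    rc, output = run_and_capture(["/sbin/blkid", "-c", "/dev/null", "-l",
                                  "-o", "device", "-t", "LABEL=" + label])
    if rc != 0 or not output:
        return None
    return output
```

Checking the return code as well makes an empty or failed lookup visible
to the caller instead of passing an empty string along.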

But at least that part of our codebase is targeted by the node-3.1
feature (the page still has to be created) [2]

One idea is to use blivet [3], a module used by anaconda, for the
storage stuff. Maybe even other parts of anaconda can be used, e.g. to
handle the grub and uefi stuff. This would mean a joint effort with the
anaconda team.

Greetings
fabian

---
[1] http://docs.python.org/2/library/subprocess.html#subprocess.Popen.stdin
[2] http://www.ovirt.org/Features/Node/StorageAndInstallerModuleRewrite
[3] https://git.fedorahosted.org/git/?p=blivet.git

> There are lots of calls to subprocess_closefds like this in the ovirt-node code,
> and I guess they may be the reason why the install sometimes fails.
> 
> What do you think about this?
> 
> 
> --------------------------/tmp/ovirt.log-----------------------------------
> 301 2013-08-23 01:16:17,938 - DEBUG - install - Trial 1 to find candidate (None)
> 302 2013-08-23 01:16:19,290 - DEBUG - ovirtfunctions - partprobe
> 303 2013-08-23 01:16:19,290 - DEBUG - ovirtfunctions -
> 304 2013-08-23 01:16:19,469 - DEBUG - ovirtfunctions - partprobe /dev/mapper/*
> 305 2013-08-23 01:16:19,469 - DEBUG - ovirtfunctions -
> 306 2013-08-23 01:16:19,670 - DEBUG - ovirtfunctions - udevadm settle
> 307 2013-08-23 01:16:19,670 - DEBUG - ovirtfunctions -
> 308 2013-08-23 01:16:19,681 - DEBUG - install - Trial 2 to find candidate (RootBackup)
> 309 2013-08-23 01:16:19,681 - DEBUG - install - Found candidate: RootBackup
> 310 2013-08-23 01:16:19,719 - DEBUG - ovirtfunctions - partprobe /dev/mapper/*
> 311 2013-08-23 01:16:19,719 - DEBUG - ovirtfunctions -
> 312 2013-08-23 01:16:19,845 - DEBUG - ovirtfunctions - udevadm settle
> 313 2013-08-23 01:16:19,845 - DEBUG - ovirtfunctions -
> 314 2013-08-23 01:16:20,030 - INFO - install -
> 315 2013-08-23 01:16:20,030 - INFO - install -
> 316 2013-08-23 01:16:20,037 - DEBUG - install - Traceback (most recent call last):
> 317   File "/usr/lib/python2.7/site-packages/ovirtnode/install.py", line 424, in ovirt_boot_setup
> 318     self.partN = int(self.disk[-1:])
> 319 ValueError: invalid literal for int() with base 10: ''
> ------------------------------------------------------------------------------------
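
The ValueError above is what an empty device string produces: ''[-1:] is
still '', and int('') fails with exactly this message. Assuming self.disk
ended up empty because the label lookup returned nothing (the log does not
show this directly), a guard along these lines would turn it into a
clearer error. partition_number is a hypothetical helper, not existing
ovirt-node code, and it keeps the single-digit logic of the original line.

```python
def partition_number(disk):
    """Return the trailing partition number of a device path such as /dev/sda2."""
    if not disk:
        # An empty lookup result would otherwise surface as int('') further down.
        raise RuntimeError("no device found for the candidate label")
    last = disk[-1:]
    if not last.isdigit():
        raise RuntimeError("device %r does not end in a partition number" % disk)
    return int(last)
```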
> 
> Best regards to you.
> Ricky
> 