[Users] Weird crash of oVirt-3.3.2-el6

Hi All,

Just saw that something has gone 'wrong' in my oVirt installation and I can't make sense of it. Full logs are stored in a safe place, but I would like to start with the following from /var/log/messages on ovirt04:

Jan 30 07:31:52 ovirt04 vdsm TaskManager.Task ERROR Task=`2a424ca3-19f6-4e01-9e42-ec377655c6f1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3049, in getVolumeSize
    apparentsize = str(volClass.getVSize(dom, imgUUID, volUUID, bs=1))
  File "/usr/share/vdsm/storage/fileVolume.py", line 418, in getVSize
    return int(sdobj.oop.os.stat(volPath).st_size / bs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 312, in callCrabRPCFunction
    raise Exception("No free file handlers in pool")
Exception: No free file handlers in pool

From that point on it goes downhill. Ovirt04 was the SPM; that got transferred to ovirt02, and all VMs on ovirt04 are set to paused, but only in the logs and the webui. In reality they are running fine (virsh -r list on ovirt04 confirms that). I can select all paused VMs and 'run' them, which clears up the webui and doesn't harm the VMs so far as I can tell, but it shouldn't be necessary.

Joop

Hi,

afaik your error boils down to: "No free file handlers in pool", which means vdsm has too many open files. You can increase the limit by following this tutorial:

http://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-desc...

HTH

On 30.01.2014 10:30, Joop wrote:
--
Kind regards

Sven Kieske
System Administrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Managing Director (Geschäftsführer): Robert Meyer
Tax no. (St.Nr.): 331/5721/1033, VAT ID (USt-IdNr.): DE814773217, HRA 6640, AG Bad Oeynhausen
General partner (Komplementärin): Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
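In case the linked tutorial rots, its approach can be sketched roughly as follows. This is a sketch only: the limit values shown are illustrative, and the limits.d snippet relies on the usual pam_limits mechanism rather than any official oVirt guidance.

```shell
# Inspect the current file-descriptor limits (system-wide and per-shell).
sys_max=$(cat /proc/sys/fs/file-max)     # kernel-wide ceiling on open files
echo "system-wide max open files: $sys_max"
echo "this shell's soft nofile limit: $(ulimit -Sn)"

# Raising the per-user limit is typically done with a limits.d snippet,
# e.g. (values are illustrative, not oVirt defaults):
#   vdsm  soft  nofile  24576
#   vdsm  hard  nofile  24576
# written to a file such as /etc/security/limits.d/99-vdsm.conf, followed
# by a restart of vdsmd so the new limit is picked up.
```

A restart of the service is needed because pam_limits only applies the new values to sessions started after the change.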

On 30-1-2014 10:42, Sven Kieske wrote:
Hi,
afaik your error boils down to:
"No free file handlers in pool", which means vdsm has too many open files.
you can increase the limit by following this tutorial:
http://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-desc...
Checking according to the blog, I'm not even close to the limit from sysctl. Besides that, the node now runs 24 VMs, but it has had 70 VMs during my upgrade from 3.2.x to 3.3.2. I'm stripping the logs of a lot of entries from before and after the event so that they become manageable to look at.

Joop

As far as I know, there is a limit for the whole system, and a limit for a specific user.

What's the output of `lsof | wc -l`?

What's the output of `ulimit -a` for the same user that oVirt runs as?

----- Original Message -----
From: "Joop" <jvdwege@xs4all.nl> To: users@ovirt.org Sent: Thursday, January 30, 2014 11:47:47 AM Subject: Re: [Users] Weird crash of oVirt-3.3.2-el6
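Both of the checks above can be gathered in one pass. Since `ulimit` is a shell builtin, reading it for another user means starting a shell as that user; the sketch below assumes the vdsm account can be switched to with sudo, which may not hold on a hardened host, so it also falls back to /proc.

```shell
# Count all open files system-wide (needs root for a complete view).
lsof 2>/dev/null | wc -l

# `ulimit` is a builtin, so it must run inside a shell owned by the
# target user; this assumes sudo is permitted for the vdsm account:
sudo -u vdsm sh -c 'ulimit -a' 2>/dev/null

# Fallback: read the live limits of a running vdsm process from /proc.
pid=$(pgrep -o vdsm 2>/dev/null)
[ -n "$pid" ] && cat "/proc/$pid/limits"
```

The /proc fallback is often the more reliable route, since it reports the limits the running process actually inherited rather than what a fresh login shell would get.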

On 30-1-2014 12:02, Meital Bourvine wrote:
As far as I know, there is a limit for the whole system, and a limit for a specific user.

What's the output of `lsof | wc -l`?

7095

What's the output of `ulimit -a` for the same user that oVirt runs as?

How to obtain that, since ulimit is a bash builtin(?). `sudo -u vdsm bash` followed by `ulimit -a` returns permission denied. There is a /etc/security/limits.d/99-vdsm.conf which has limits of 4096 procs and 3*4096 nofiles. Found this:

return-limits() {
    for process in "$@"; do
        process_pids=$(ps -C "$process" -o pid --no-headers | cut -d " " -f 2)
        if [ -z "$process_pids" ]; then
            echo "[no $process running]"
        else
            for pid in $process_pids; do
                echo "[$process #$pid -- limits]"
                cat "/proc/$pid/limits"
            done
        fi
    done
}

Running that, return-limits vdsm:

Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            10485760     unlimited    bytes
Max core file size        unlimited    unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             4096         4096         processes
Max open files            12288        12288        files
Max locked memory         65536        65536        bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       612118       612118       signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us

Joop
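A complementary check to a limits dump like the one above is to compare each process's actual open-descriptor count against its own soft limit; a 12288-file cap only matters if some single process is approaching it. A sketch, assuming a Linux /proc and `vdsm` as the process name:

```shell
# Compare each vdsm process's open fd count against its own soft limit.
for pid in $(pgrep vdsm); do
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid $pid: $open fds open, soft limit $limit"
done
```

Note that a system-wide `lsof | wc -l` of 7095 says little here: the "No free file handlers in pool" error is raised per handler pool, so the per-process view is the one worth watching.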
participants (3)
- Joop
- Meital Bourvine
- Sven Kieske