On Tue, Jan 12, 2016 at 9:32 AM, Markus Stockhausen <stockhausen@collogia.de> wrote:
Hi there,

we got a nasty situation yesterday in our OVirt 3.5.6 environment.
We ran a LSM that failed during the cleanup operation. To be precise
when the process deleted an image on the source NFS storage.

Can you share with us your NFS server details? 
Is the NFS connection healthy (can be seen with nfsstat)
Generally, delete on NFS should be a pretty quick operation. 
Y.
 

Engine log gives:

2016-01-11 20:49:45,120 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (org.ovirt.thread.pool-8-thread-14) [77277f0] START, DeleteImageGroupVDSCommand( storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, ignoreFailoverLimit = false, storageDomainId = 272ec473-6041-42ee-bd1a-732789dd18d4, imageGroupId = aed132ef-703a-44d0-b875-db8c0d2c1a92, postZeros = false, forceDelete = false), log id: b52d59c
...
2016-01-11 20:50:45,206 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (org.ovirt.thread.pool-8-thread-14) [77277f0] Failed in DeleteImageGroupVDS method

VDSM (SPM) log gives:

Thread-97::DEBUG::2016-01-11 20:49:45,737::fileSD::384::Storage.StorageDomain::(deleteImage) Removing file: /rhev/data-center/mnt/1.2.3.4:_var_nas2_OVirtIB/272ec473-6041-42ee-bd1a-732789dd18d4/images/_remojzBd1r/0d623afb-291e-4f4c-acba-caecb125c4ed
...
Thread-97::ERROR::2016-01-11 20:50:45,737::task::866::Storage.TaskManager.Task::(_setError) Task=`cd477878-47b4-44b1-85a3-b5da19543a5e`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1549, in deleteImage
    pool.deleteImage(dom, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1884, in deleteImage
    domain.deleteImage(domain.sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/fileSD.py", line 385, in deleteImage
    self.oop.os.remove(volPath)
  File "/usr/share/vdsm/storage/outOfProcess.py", line 245, in remove
    self._iop.unlink(path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 455, in unlink
    return self._sendCommand("unlink", {"path": path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 385, in _sendCommand
    raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out

Reading the docs I got the idea that vdsm default 60 second timeout
for IO operations might be changed within /etc/vdsm/vdsm.conf

[irs]
process_pool_timeout = 180

Can anyone confirm that this will solve the problem?

Markus






_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users