[Users] oVirt storage is down and doesn't come up

Limor Gavish lgavish at gmail.com
Mon Apr 15 14:10:16 UTC 2013


Thank you very much for your reply.
I ran the commands you asked (see below), but a directory named after the
UUID of the master domain is not mounted. We tried restarting VDSM and then
the entire machine, but it didn't help.
We did manage to manually mount "/home/BO_Ovirt_Storage" to a temporary
directory.
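
In case it helps, here is roughly how I ran the checks end to end (a minimal
sketch: the database name, domain UUID, and export are the ones from this
thread, running psql as the postgres user is an assumption on my side, and the
/rhev mount path is only inferred from the pattern of the other NFS mounts
shown below):

#!/bin/bash
# Sketch of the master-domain check described in Tal's message below.
SD_UUID=1083422e-a5db-41b6-b667-b9ef1ef244f0

# 1. Look up the storage connection of the master domain in the engine DB
#    (assumes local psql access as the postgres user).
sudo -u postgres psql -d engine -c "
  select sds.id, ssc.connection
  from storage_domain_static sds
  join storage_server_connections ssc on sds.storage = ssc.id
  where sds.id = '$SD_UUID';"

# 2. Check whether VDSM currently has that connection mounted.
mount | grep BO_Ovirt_Storage

# 3. If it were mounted, the mount point (path inferred from the other
#    domains' mounts) should contain a directory named after the UUID.
ls -la "/rhev/data-center/mnt/bufferoverflow.home:_home_BO__Ovirt__Storage/$SD_UUID"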

postgres=# \connect engine;
You are now connected to database "engine" as user "postgres".
engine=# select current_database();
 current_database
------------------
 engine
(1 row)
engine=# select sds.id, ssc.connection from storage_domain_static sds
join storage_server_connections ssc on sds.storage=ssc.id where sds.id
='1083422e-a5db-41b6-b667-b9ef1ef244f0';
                  id                  |                 connection
--------------------------------------+--------------------------------------------
 1083422e-a5db-41b6-b667-b9ef1ef244f0 |
bufferoverflow.home:/home/BO_Ovirt_Storage
(1 row)

[wil at bufferoverflow ~]$ mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs
(rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
securityfs on /sys/kernel/security type securityfs
(rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts
(rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup
(rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup
(rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
(rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup
(rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup
(rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup
(rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup
(rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup
(rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup
(rw,nosuid,nodev,noexec,relatime,perf_event)
/dev/sda3 on / type ext4 (rw,relatime,data=ordered)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs
(rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
tmpfs on /tmp type tmpfs (rw)
configfs on /sys/kernel/config type configfs (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
kernelpanic.home:/home/KP_Data_Domain on
/rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
bufferoverflow.home:/home/BO_ISO_Domain on
/rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)

[wil at bufferoverflow ~]$ ls -la /home/
total 36
drwxr-xr-x.  6 root root  4096 Mar 22 11:25 .
dr-xr-xr-x. 19 root root  4096 Apr 12 18:53 ..
drwxr-xr-x.  3 vdsm kvm   4096 Mar 27 17:33 BO_ISO_Domain
drwxr-xr-x.  3 vdsm kvm   4096 Mar 27 17:33 BO_Ovirt_Storage
drwx------.  2 root root 16384 Mar  6 09:11 lost+found
drwx------. 27 wil  wil   4096 Apr 15 01:50 wil
[wil at bufferoverflow ~]$ cd /home/BO_Ovirt_Storage/
[wil at bufferoverflow BO_Ovirt_Storage]$ ls -la
total 12
drwxr-xr-x. 3 vdsm kvm  4096 Mar 27 17:33 .
drwxr-xr-x. 6 root root 4096 Mar 22 11:25 ..
drwxr-xr-x  5 vdsm kvm  4096 Mar 20 23:06
1083422e-a5db-41b6-b667-b9ef1ef244f0
-rwxr-xr-x  1 vdsm kvm     0 Mar 27 17:33 __DIRECT_IO_TEST__

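For completeness, this is roughly how we mounted the export to a temporary
directory to confirm the data is still there (a minimal sketch; the
/mnt/bo_tmp mountpoint is just an arbitrary scratch directory I picked, not
anything oVirt uses):

# Temporary manual mount of the export (NFSv3, like the other domains above).
sudo mkdir -p /mnt/bo_tmp
sudo mount -t nfs -o vers=3 bufferoverflow.home:/home/BO_Ovirt_Storage /mnt/bo_tmp

# The master domain's UUID directory shows up there as expected.
ls -la /mnt/bo_tmp/1083422e-a5db-41b6-b667-b9ef1ef244f0

# Unmount again when done.
sudo umount /mnt/bo_tmp
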
Thanks,
Limor


On Mon, Apr 15, 2013 at 4:02 PM, Tal Nisan <tnisan at redhat.com> wrote:

>
> Hi Limor,
> First, we should probably check which mount corresponds to the master
> storage domain that is reported as not found. This should be checked against
> the oVirt server database, so please run:
>
> select sds.id, ssc.connection from storage_domain_static sds join
> storage_server_connections ssc on sds.storage=ssc.id
> where sds.id='1083422e-a5db-41b6-b667-b9ef1ef244f0';
>
> You can run this via psql or a Postgres UI if you have one.
> In the results you will see the storage connection in the format
> %hostname%:/%mountName%. Then, on the VDSM server, check the mount list to
> see that this connection is mounted; the mount itself should contain a
> directory named after the UUID of the master domain. Let me know the result.
>
> Tal.
>
>
>
>
> On 04/12/2013 07:29 PM, Limor Gavish wrote:
>
> Hi,
>
>  For some reason, without our doing anything, all the storage domains went
> down, and restarting VDSM or even the entire machine does not bring them up.
> I am not using LVM.
> The following errors appear several times in vdsm.log (full logs are
> attached):
>
>  Thread-22::WARNING::2013-04-12
> 19:00:08,597::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['
>  Volume group "1083422e-a5db-41b6-b667-b9ef1ef244f0" not found']
> Thread-22::DEBUG::2013-04-12
> 19:00:08,598::lvm::402::OperationMutex::(_reloadvgs) Operation 'lvm reload
> operation' released the operation mutex
> Thread-22::DEBUG::2013-04-12
> 19:00:08,681::resourceManager::615::ResourceManager::(releaseResource)
> Trying to release resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3'
> Thread-22::DEBUG::2013-04-12
> 19:00:08,681::resourceManager::634::ResourceManager::(releaseResource)
> Released resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' (0 active
> users)
> Thread-22::DEBUG::2013-04-12
> 19:00:08,681::resourceManager::640::ResourceManager::(releaseResource)
> Resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' is free, finding
> out if anyone is waiting for it.
> Thread-22::DEBUG::2013-04-12
> 19:00:08,682::resourceManager::648::ResourceManager::(releaseResource) No
> one is waiting for resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3',
> Clearing records.
>  Thread-22::ERROR::2013-04-12
> 19:00:08,682::task::850::TaskManager.Task::(_setError)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 939, in connectStoragePool
>     masterVersion, options)
>   File "/usr/share/vdsm/storage/hsm.py", line 986, in _connectStoragePool
>     res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 695, in connect
>     self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 1232, in __rebuild
>     masterVersion=masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 1576, in getMasterDomain
>      raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
> StoragePoolMasterNotFound: Cannot find master domain:
> 'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3,
> msdUUID=1083422e-a5db-41b6-b667-b9ef1ef244f0'
> Thread-22::DEBUG::2013-04-12
> 19:00:08,685::task::869::TaskManager.Task::(_run)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Task._run:
> e35a22ac-771a-4916-851f-2fe9d60a0ae6
> ('5849b030-626e-47cb-ad90-3ce782d831b3', 1,
> '5849b030-626e-47cb-ad90-3ce782d831b3',
> '1083422e-a5db-41b6-b667-b9ef1ef244f0', 3942) {} failed - stopping task
> Thread-22::DEBUG::2013-04-12
> 19:00:08,685::task::1194::TaskManager.Task::(stop)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::stopping in state preparing
> (force False)
> Thread-22::DEBUG::2013-04-12
> 19:00:08,685::task::974::TaskManager.Task::(_decref)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::ref 1 aborting True
> Thread-22::INFO::2013-04-12
> 19:00:08,686::task::1151::TaskManager.Task::(prepare)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::aborting: Task is aborted:
> 'Cannot find master domain' - code 304
>
>  [wil at bufferoverflow ~]$ sudo vgs --noheadings --units b --nosuffix
> --separator \| -o
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free
>   No volume groups found
>
>  [wil at bufferoverflow ~]$ mount
> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> devtmpfs on /dev type devtmpfs
> (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> securityfs on /sys/kernel/security type securityfs
> (rw,nosuid,nodev,noexec,relatime)
> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> devpts on /dev/pts type devpts
> (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> cgroup on /sys/fs/cgroup/systemd type cgroup
> (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> cgroup on /sys/fs/cgroup/cpuset type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuset)
> cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> cgroup on /sys/fs/cgroup/memory type cgroup
> (rw,nosuid,nodev,noexec,relatime,memory)
> cgroup on /sys/fs/cgroup/devices type cgroup
> (rw,nosuid,nodev,noexec,relatime,devices)
> cgroup on /sys/fs/cgroup/freezer type cgroup
> (rw,nosuid,nodev,noexec,relatime,freezer)
> cgroup on /sys/fs/cgroup/net_cls type cgroup
> (rw,nosuid,nodev,noexec,relatime,net_cls)
> cgroup on /sys/fs/cgroup/blkio type cgroup
> (rw,nosuid,nodev,noexec,relatime,blkio)
> cgroup on /sys/fs/cgroup/perf_event type cgroup
> (rw,nosuid,nodev,noexec,relatime,perf_event)
> /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> mqueue on /dev/mqueue type mqueue (rw,relatime)
> tmpfs on /tmp type tmpfs (rw)
> configfs on /sys/kernel/config type configfs (rw,relatime)
> binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> kernelpanic.home:/home/KP_Data_Domain on
> /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
> bufferoverflow.home:/home/BO_ISO_Domain on
> /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)
>
>  [wil at bufferoverflow ~]$ sudo find / -name
> 5849b030-626e-47cb-ad90-3ce782d831b3
> /run/vdsm/pools/5849b030-626e-47cb-ad90-3ce782d831b3
>
>  [wil at bufferoverflow ~]$ sudo find / -name
> 1083422e-a5db-41b6-b667-b9ef1ef244f0
> /home/BO_Ovirt_Storage/1083422e-a5db-41b6-b667-b9ef1ef244f0
>
>  I would greatly appreciate any help,
> Limor Gavish
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>