1. We do not have the logs from before the problem.
2. The output of 'tree /rhev/data-center/' is below:
--------
$ tree /rhev/data-center/
/rhev/data-center/
├── hsm-tasks
└── mnt
    ├── bufferoverflow.home:_home_BO__ISO__Domain
    │   ├── 45d24e2a-705e-440f-954c-fda3cab61298
    │   │   ├── dom_md
    │   │   │   ├── ids
    │   │   │   ├── inbox
    │   │   │   ├── leases
    │   │   │   ├── metadata
    │   │   │   └── outbox
    │   │   └── images
    │   │       └── 11111111-1111-1111-1111-111111111111
    │   │           ├── Fedora-18-x86_64-DVD.iso
    │   │           └── Fedora-18-x86_64-Live-Desktop.iso
    │   └── __DIRECT_IO_TEST__
    ├── bufferoverflow.home:_home_BO__Ovirt__Storage
    └── kernelpanic.home:_home_KP__Data__Domain
        ├── a8286508-db45-40d7-8645-e573f6bacdc7
        │   ├── dom_md
        │   │   ├── ids
        │   │   ├── inbox
        │   │   ├── leases
        │   │   ├── metadata
        │   │   └── outbox
        │   └── images
        │       ├── 0df45336-de35-4dc0-9958-95b27d5d4701
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c
        │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c.lease
        │       │   └── b245184f-f8e3-479b-8559-8b6af2473b7c.meta
        │       ├── 0e1ebaf7-3909-44cd-8560-d05a63eb4c4e
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e
        │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e.lease
        │       │   └── 562b9043-bde8-4595-bbea-fa8871f0e19e.meta
        │       ├── 32ebb85a-0dde-47fe-90c7-7f4fb2c0f1e5
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66
        │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66.lease
        │       │   └── 4774095e-db3d-4561-8284-53eabfd28f66.meta
        │       └── a7e13a25-1694-4509-9e6b-e88583a4d970
        │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │           └── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        └── __DIRECT_IO_TEST__

16 directories, 35 files
--------------------
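For reference, the directory names under /rhev/data-center/mnt appear to be derived from each NFS connection string: any underscores already in the exported path are doubled, then slashes become single underscores (this is our reading of the names in the tree, not vdsm's actual code, and nfs_to_mountdir is a hypothetical helper name). A minimal sketch of that mapping:

```shell
# Sketch (inferred from the directory names above, not vdsm source):
# map an NFS connection string to its /rhev/data-center/mnt directory name.
nfs_to_mountdir() {
    local conn="$1"
    local host="${conn%%:*}"    # part before the first ':'
    local path="${conn#*:}"     # the exported path
    path="${path//_/__}"        # double any existing underscores first
    path="${path//\//_}"        # then turn every '/' into a single '_'
    printf '%s:%s\n' "$host" "$path"
}

nfs_to_mountdir "bufferoverflow.home:/home/BO_ISO_Domain"
# bufferoverflow.home:_home_BO__ISO__Domain
```

Doubling the underscores before converting the slashes keeps the two distinguishable in the encoded name.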
3. We have 3 domains:
BO_Ovirt_Storage (data domain, on the same machine as the engine and vdsm, via NFS)
BO_ISO_Domain (ISO domain, same machine, via NFS)
KP_Data_Domain (data domain on an NFS mount on a different machine)
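Given that layout, the check Tal asked for boils down to looking for a directory named after the master domain uuid inside the mounted path. A small helper we could use to re-run it (our own sketch with a hypothetical function name, not part of vdsm):

```shell
# Hypothetical helper (not vdsm code): does a mounted storage domain
# contain a directory named after the master domain uuid?
has_master_domain() {
    local mount_dir="$1" msd_uuid="$2"
    [ -d "$mount_dir/$msd_uuid" ]
}

# Usage against the data domain from this thread:
if has_master_domain \
    "/rhev/data-center/mnt/bufferoverflow.home:_home_BO__Ovirt__Storage" \
    "1083422e-a5db-41b6-b667-b9ef1ef244f0"; then
    echo "master domain directory present"
else
    echo "master domain directory missing"
fi
```

In our case the mount point exists but is empty, so the check reports the directory as missing.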
Yuval
On Wed, Apr 17, 2013 at 4:28 PM, Yeela Kaplan <ykaplan(a)redhat.com> wrote:
Hi Limor,
1) Your log starts exactly after the vdsm restart. I need to see the full
vdsm log from before the domains went down in order to understand the
problem. Can you attach them?
2) can you send the printout of 'tree /rhev/data-center/'
3) How many domains are attached to your DC, and what type are they (ISO,
export, data)? The DC is NFS, right?
Thanks,
Yeela
----- Original Message -----
> From: "Limor Gavish" <lgavish(a)gmail.com>
> To: "Tal Nisan" <tnisan(a)redhat.com>
> Cc: "Yuval M" <yuvalme(a)gmail.com>, users(a)ovirt.org, "Nezer Zaidenberg" <nzaidenberg(a)mac.com>
> Sent: Monday, April 15, 2013 5:10:16 PM
> Subject: Re: [Users] oVirt storage is down and doesn't come up
>
> Thank you very much for your reply.
> I ran the commands you asked (see below), but a directory named after the
> uuid of the master domain is not mounted. We tried restarting VDSM and
> rebooting the entire machine, but it didn't help.
> We did manage to manually mount "/home/BO_Ovirt_Storage" onto a temporary
> directory.
>
> postgres=# \connect engine;
> You are now connected to database "engine" as user "postgres".
> engine=# select current_database();
> current_database
> ------------------
> engine
> (1 row)
> engine=# select sds.id, ssc.connection from storage_domain_static sds join
> storage_server_connections ssc on sds.storage = ssc.id where sds.id
> = '1083422e-a5db-41b6-b667-b9ef1ef244f0';
> id | connection
> --------------------------------------+--------------------------------------------
> 1083422e-a5db-41b6-b667-b9ef1ef244f0 | bufferoverflow.home:/home/BO_Ovirt_Storage
> (1 row)
>
> [wil@bufferoverflow ~] $ mount
> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> devtmpfs on /dev type devtmpfs
> (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> securityfs on /sys/kernel/security type securityfs
> (rw,nosuid,nodev,noexec,relatime)
> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> devpts on /dev/pts type devpts
> (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> cgroup on /sys/fs/cgroup/systemd type cgroup
> (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> cgroup on /sys/fs/cgroup/cpuset type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuset)
> cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> cgroup on /sys/fs/cgroup/memory type cgroup
> (rw,nosuid,nodev,noexec,relatime,memory)
> cgroup on /sys/fs/cgroup/devices type cgroup
> (rw,nosuid,nodev,noexec,relatime,devices)
> cgroup on /sys/fs/cgroup/freezer type cgroup
> (rw,nosuid,nodev,noexec,relatime,freezer)
> cgroup on /sys/fs/cgroup/net_cls type cgroup
> (rw,nosuid,nodev,noexec,relatime,net_cls)
> cgroup on /sys/fs/cgroup/blkio type cgroup
> (rw,nosuid,nodev,noexec,relatime,blkio)
> cgroup on /sys/fs/cgroup/perf_event type cgroup
> (rw,nosuid,nodev,noexec,relatime,perf_event)
> /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> mqueue on /dev/mqueue type mqueue (rw,relatime)
> tmpfs on /tmp type tmpfs (rw)
> configfs on /sys/kernel/config type configfs (rw,relatime)
> binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> kernelpanic.home:/home/KP_Data_Domain on
> /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
> bufferoverflow.home:/home/BO_ISO_Domain on
> /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)
>
> [wil@bufferoverflow ~]$ ls -la /home/
> total 36
> drwxr-xr-x. 6 root root 4096 Mar 22 11:25 .
> dr-xr-xr-x. 19 root root 4096 Apr 12 18:53 ..
> drwxr-xr-x. 3 vdsm kvm 4096 Mar 27 17:33 BO_ISO_Domain
> drwxr-xr-x. 3 vdsm kvm 4096 Mar 27 17:33 BO_Ovirt_Storage
> drwx------. 2 root root 16384 Mar 6 09:11 lost+found
> drwx------. 27 wil wil 4096 Apr 15 01:50 wil
> [wil@bufferoverflow ~]$ cd /home/BO_Ovirt_Storage/
> [wil@bufferoverflow BO_Ovirt_Storage]$ ls -la
> total 12
> drwxr-xr-x. 3 vdsm kvm 4096 Mar 27 17:33 .
> drwxr-xr-x. 6 root root 4096 Mar 22 11:25 ..
> drwxr-xr-x 5 vdsm kvm 4096 Mar 20 23:06 1083422e-a5db-41b6-b667-b9ef1ef244f0
> -rwxr-xr-x 1 vdsm kvm 0 Mar 27 17:33 __DIRECT_IO_TEST__
>
> Thanks,
> Limor
>
>
> On Mon, Apr 15, 2013 at 4:02 PM, Tal Nisan <tnisan(a)redhat.com> wrote:
>
> Hi Limor,
> First we should probably start by checking which mount is the master
> storage domain that appears as not found. This should be checked against
> the oVirt server database; please run:
>
> select sds.id, ssc.connection from storage_domain_static sds join
> storage_server_connections ssc on sds.storage = ssc.id
> where sds.id = '1083422e-a5db-41b6-b667-b9ef1ef244f0';
>
> You can run this via psql or a Postgres UI if you have one.
> In the results you will see the storage connection in the format
> %hostname%:/%mountName%. Then, on the VDSM server, check in the mount list
> that it is mounted; the mount itself should contain a directory named
> after the uuid of the master domain. Let me know the result.
>
> Tal.
>
> On 04/12/2013 07:29 PM, Limor Gavish wrote:
>
> Hi,
>
> For some reason, without our doing anything, all the storage domains went
> down, and restarting VDSM or the entire machine does not bring them up.
> I am not using LVM.
> The following errors appear several times in vdsm.log (full logs are
> attached):
>
> Thread-22::WARNING::2013-04-12
> 19:00:08,597::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 []
> ['  Volume group "1083422e-a5db-41b6-b667-b9ef1ef244f0" not found']
> Thread-22::DEBUG::2013-04-12
> 19:00:08,598::lvm::402::OperationMutex::(_reloadvgs) Operation 'lvm reload
> operation' released the operation mutex
> Thread-22::DEBUG::2013-04-12
> 19:00:08,681::resourceManager::615::ResourceManager::(releaseResource)
> Trying to release resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3'
> Thread-22::DEBUG::2013-04-12
> 19:00:08,681::resourceManager::634::ResourceManager::(releaseResource)
> Released resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' (0 active
> users)
> Thread-22::DEBUG::2013-04-12
> 19:00:08,681::resourceManager::640::ResourceManager::(releaseResource)
> Resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' is free, finding
> out if anyone is waiting for it.
> Thread-22::DEBUG::2013-04-12
> 19:00:08,682::resourceManager::648::ResourceManager::(releaseResource) No
> one is waiting for resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3',
> Clearing records.
> Thread-22::ERROR::2013-04-12
> 19:00:08,682::task::850::TaskManager.Task::(_setError)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Unexpected error
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/task.py", line 857, in _run
> return fn(*args, **kargs)
> File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
> res = f(*args, **kwargs)
> File "/usr/share/vdsm/storage/hsm.py", line 939, in connectStoragePool
> masterVersion, options)
> File "/usr/share/vdsm/storage/hsm.py", line 986, in _connectStoragePool
> res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
> File "/usr/share/vdsm/storage/sp.py", line 695, in connect
> self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
> File "/usr/share/vdsm/storage/sp.py", line 1232, in __rebuild
> masterVersion=masterVersion)
> File "/usr/share/vdsm/storage/sp.py", line 1576, in getMasterDomain
> raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
> StoragePoolMasterNotFound: Cannot find master domain:
> 'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3,
> msdUUID=1083422e-a5db-41b6-b667-b9ef1ef244f0'
> Thread-22::DEBUG::2013-04-12
> 19:00:08,685::task::869::TaskManager.Task::(_run)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Task._run:
> e35a22ac-771a-4916-851f-2fe9d60a0ae6
> ('5849b030-626e-47cb-ad90-3ce782d831b3', 1,
> '5849b030-626e-47cb-ad90-3ce782d831b3',
> '1083422e-a5db-41b6-b667-b9ef1ef244f0', 3942) {} failed - stopping task
> Thread-22::DEBUG::2013-04-12
> 19:00:08,685::task::1194::TaskManager.Task::(stop)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::stopping in state preparing
> (force False)
> Thread-22::DEBUG::2013-04-12
> 19:00:08,685::task::974::TaskManager.Task::(_decref)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::ref 1 aborting True
> Thread-22::INFO::2013-04-12
> 19:00:08,686::task::1151::TaskManager.Task::(prepare)
> Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::aborting: Task is aborted:
> 'Cannot find master domain' - code 304
>
> [wil@bufferoverflow ~]$ sudo vgs --noheadings --units b --nosuffix
> --separator \| -o
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free
> No volume groups found
>
> [wil@bufferoverflow ~]$ mount
> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> devtmpfs on /dev type devtmpfs
> (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> securityfs on /sys/kernel/security type securityfs
> (rw,nosuid,nodev,noexec,relatime)
> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> devpts on /dev/pts type devpts
> (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> cgroup on /sys/fs/cgroup/systemd type cgroup
> (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> cgroup on /sys/fs/cgroup/cpuset type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuset)
> cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> cgroup on /sys/fs/cgroup/memory type cgroup
> (rw,nosuid,nodev,noexec,relatime,memory)
> cgroup on /sys/fs/cgroup/devices type cgroup
> (rw,nosuid,nodev,noexec,relatime,devices)
> cgroup on /sys/fs/cgroup/freezer type cgroup
> (rw,nosuid,nodev,noexec,relatime,freezer)
> cgroup on /sys/fs/cgroup/net_cls type cgroup
> (rw,nosuid,nodev,noexec,relatime,net_cls)
> cgroup on /sys/fs/cgroup/blkio type cgroup
> (rw,nosuid,nodev,noexec,relatime,blkio)
> cgroup on /sys/fs/cgroup/perf_event type cgroup
> (rw,nosuid,nodev,noexec,relatime,perf_event)
> /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> mqueue on /dev/mqueue type mqueue (rw,relatime)
> tmpfs on /tmp type tmpfs (rw)
> configfs on /sys/kernel/config type configfs (rw,relatime)
> binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> kernelpanic.home:/home/KP_Data_Domain on
> /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
> bufferoverflow.home:/home/BO_ISO_Domain on
> /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)
>
> [wil@bufferoverflow ~]$ sudo find / -name
> 5849b030-626e-47cb-ad90-3ce782d831b3
> /run/vdsm/pools/5849b030-626e-47cb-ad90-3ce782d831b3
>
> [wil@bufferoverflow ~]$ sudo find / -name
> 1083422e-a5db-41b6-b667-b9ef1ef244f0
> /home/BO_Ovirt_Storage/1083422e-a5db-41b6-b667-b9ef1ef244f0
>
> I would greatly appreciate any help,
> Limor Gavish
> _______________________________________________
> Users mailing list Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users