On Mon, Feb 10, 2020 at 2:27 PM Jorick Astrego <jorick(a)netbulae.eu> wrote:
On 2/10/20 11:09 AM, Amit Bawer wrote:
compared it with a host having a working nfs domain
On Mon, Feb 10, 2020 at 11:11 AM Jorick Astrego <jorick(a)netbulae.eu>
wrote:
>
> On 2/9/20 10:27 AM, Amit Bawer wrote:
>
>
>
> On Thu, Feb 6, 2020 at 11:07 AM Jorick Astrego <jorick(a)netbulae.eu>
> wrote:
>
>> Hi,
>>
>> Something weird is going on with our oVirt Node 4.3.8 install mounting an
>> NFS share.
>>
>> We have an NFS domain for a couple of backup disks, and we have a couple
>> of 4.2 nodes connected to it.
>>
>> Now I'm adding a fresh cluster of 4.3.8 nodes and the backupnfs mount
>> doesn't work.
>>
>> (annoyingly, you cannot copy the text from the events view)
>>
>> The domain is up and working
>>
>> ID: f5d2f7c6-093f-46d6-a844-224d92db5ef9
>> Size: 10238 GiB
>> Available: 2491 GiB
>> Used: 7747 GiB
>> Allocated: 3302 GiB
>> Over Allocation Ratio: 37%
>> Images: 7
>> Path: *.*.*.*:/data/ovirt
>> NFS Version: AUTO
>> Warning Low Space Indicator: 10% (1023 GiB)
>> Critical Space Action Blocker: 5 GiB
>>
>> But somehow the node appears to think it's an LVM volume? It tries
>> to find the volume group but fails... which is not so strange, as it is
>> an NFS volume:
>>
>> 2020-02-05 14:17:54,190+0000 WARN (monitor/f5d2f7c) [storage.LVM] Reloading VGs failed (vgs=[u'f5d2f7c6-093f-46d6-a844-224d92db5ef9'] rc=5 out=[] err=[' Volume group "f5d2f7c6-093f-46d6-a844-224d92db5ef9" not found', ' Cannot process volume group f5d2f7c6-093f-46d6-a844-224d92db5ef9']) (lvm:470)
>> 2020-02-05 14:17:54,201+0000 ERROR (monitor/f5d2f7c) [storage.Monitor] Setting up monitor for f5d2f7c6-093f-46d6-a844-224d92db5ef9 failed (monitor:330)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 327, in _setupLoop
>>     self._setupMonitor()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 349, in _setupMonitor
>>     self._produceDomain()
>>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 159, in wrapper
>>     value = meth(self, *a, **kw)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 367, in _produceDomain
>>     self.domain = sdCache.produce(self.sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
>>     domain.getRealDomain()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
>>     return self._cache._realProduce(self._sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
>>     domain = self._findDomain(sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
>>     return findMethod(sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
>>     raise se.StorageDomainDoesNotExist(sdUUID)
>> StorageDomainDoesNotExist: Storage domain does not exist: (u'f5d2f7c6-093f-46d6-a844-224d92db5ef9',)
>>
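As a quick check of the two lookups the monitor seems to attempt, something
like this can be run on the affected node (the glob stands for the masked
mount point shown below; vgs comes from lvm2):

# for a file-based (NFS) domain the UUID should exist as a directory under the mount point
ls -d /rhev/data-center/mnt/*:_data_ovirt/f5d2f7c6-093f-46d6-a844-224d92db5ef9
# for a block domain it would be an LVM volume group instead; on an NFS-only
# setup this is expected to fail, matching the "Reloading VGs failed" warning above
vgs f5d2f7c6-093f-46d6-a844-224d92db5ef9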
>> The volume is actually mounted fine on the node:
>>
>> On NFS server
>>
>> Feb 5 15:47:09 back1en rpc.mountd[4899]: authenticated mount request
>> from *.*.*.*:673 for /data/ovirt (/data/ovirt)
>>
>> On the host
>>
>> mount|grep nfs
>>
>> *.*.*.*:/data/ovirt on /rhev/data-center/mnt/*.*.*.*:_data_ovirt type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=*.*.*.*,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=*.*.*.*)
>>
>> And I can see the files:
>>
>> ls -alrt /rhev/data-center/mnt/*.*.*.*:_data_ovirt
>> total 4
>> drwxr-xr-x. 5 vdsm kvm 61 Oct 26 2016 1ed0a635-67ee-4255-aad9-b70822350706
>>
>>
> What is ls -lart showing for 1ed0a635-67ee-4255-aad9-b70822350706?
>
> ls -arlt 1ed0a635-67ee-4255-aad9-b70822350706/
> total 4
> drwxr-xr-x. 2 vdsm kvm 93 Oct 26 2016 dom_md
> drwxr-xr-x. 5 vdsm kvm 61 Oct 26 2016 .
> drwxr-xr-x. 4 vdsm kvm 40 Oct 26 2016 master
> drwxr-xr-x. 5 vdsm kvm 4096 Oct 26 2016 images
> drwxrwxrwx. 3 root root 86 Feb 5 14:37 ..
>
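A quick way to tell which storage domain that directory tree actually
belongs to is to read its metadata file (path assumed from the ls output
above; dom_md/metadata records the domain UUID, type and role):

# the metadata file identifies the domain this tree belongs to
cat /rhev/data-center/mnt/*:_data_ovirt/1ed0a635-67ee-4255-aad9-b70822350706/dom_md/metadata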
On a host with a working NFS domain we have the following storage hierarchy;
feece142-9e8d-42dc-9873-d154f60d0aac is the NFS domain in my case:
/rhev/data-center/
├── edefe626-3ada-11ea-9877-525400b37767
...
│ ├── feece142-9e8d-42dc-9873-d154f60d0aac -> /rhev/data-center/mnt/10.35.18.45:_exports_data/feece142-9e8d-42dc-9873-d154f60d0aac
│ └── mastersd -> /rhev/data-center/mnt/blockSD/a6a14714-6eaa-4054-9503-0ea3fcc38531
└── mnt
├── 10.35.18.45:_exports_data
│ └── feece142-9e8d-42dc-9873-d154f60d0aac
│ ├── dom_md
│ │ ├── ids
│ │ ├── inbox
│ │ ├── leases
│ │ ├── metadata
│ │ ├── outbox
│ │ └── xleases
│ └── images
│ ├── 915e6f45-ea13-428c-aab2-fb27798668e5
│ │ ├── b83843d7-4c5a-4872-87a4-d0fe27a2c3d2
│ │ ├── b83843d7-4c5a-4872-87a4-d0fe27a2c3d2.lease
│ │ └── b83843d7-4c5a-4872-87a4-d0fe27a2c3d2.meta
│ ├── b3be4748-6e18-43c2-84fb-a2909d8ee2d6
│ │ ├── ac46e91d-6a50-4893-92c8-2693c192fbc8
│ │ ├── ac46e91d-6a50-4893-92c8-2693c192fbc8.lease
│ │ └── ac46e91d-6a50-4893-92c8-2693c192fbc8.meta
│ ├── b9edd81a-06b0-421c-85a3-f6618c05b25a
│ │ ├── 9b9e1d3d-fc89-4c08-87b6-557b17a4b5dd
│ │ ├── 9b9e1d3d-fc89-4c08-87b6-557b17a4b5dd.lease
│ │ └── 9b9e1d3d-fc89-4c08-87b6-557b17a4b5dd.meta
│ ├── f88a6f36-fcb2-413c-8fd6-c2b090321542
│ │ ├── d8f8b2d7-7232-4feb-bce4-dbf0d37dba9b
│ │ ├── d8f8b2d7-7232-4feb-bce4-dbf0d37dba9b.lease
│ │ └── d8f8b2d7-7232-4feb-bce4-dbf0d37dba9b.meta
│ └── fe59753e-f3b5-4840-8d1d-31c49c2448f0
│ ├── ad0107bc-46d2-4977-b6c3-082adbf3083d
│ ├── ad0107bc-46d2-4977-b6c3-082adbf3083d.lease
│ └── ad0107bc-46d2-4977-b6c3-082adbf3083d.meta
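For comparison, a similar listing can be produced on the problem node; tree
may not be installed on oVirt Node, in which case find gives a rougher
equivalent:

# show what is actually present under the NFS mounts on the affected host
tree -L 3 /rhev/data-center/mnt/
# or, without tree installed:
find /rhev/data-center/mnt/ -maxdepth 3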
Maybe I got confused by your ls command output, but I was looking to see
what the directory tree for your NFS domain looks like, which should be
rooted under /rhev/data-center/mnt/<nfs server>:<exported path>.
In your output, only 1ed0a635-67ee-4255-aad9-b70822350706 is there, which
is not the NFS domain f5d2f7c6-093f-46d6-a844-224d92db5ef9 in question.
So to begin with, we need to figure out why the
f5d2f7c6-093f-46d6-a844-224d92db5ef9 folder is not to be found on the NFS
storage mounted on that node; as far as I understand it should be there,
since the same NFS mount path and server are shared between all hosts
connected to this SD.
Maybe compare the mount options between the working nodes and the
non-working node, and check the export options on the NFS server itself;
maybe it has some client-specific export settings?
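A minimal way to do that comparison, assuming shell access to both nodes
and to the NFS server:

# on a working node and on the new 4.3.8 node, compare the NFS mount options
nfsstat -m
# on the NFS server, check what is exported, to which clients and with which options
exportfs -v
cat /etc/exports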
Hmm, I didn't notice that.
It's okay, I almost missed that as well and went looking for format
version problems. Glad to hear you have managed to sort this out.
I did a check on the NFS server and I found the
"1ed0a635-67ee-4255-aad9-b70822350706" directory in the exportdom path
(/data/exportdom).
This was an old NFS export domain that was deleted a while ago. I remember
finding an issue somewhere about old domains still being active after
removal, but I cannot find it now.
I unexported the directory on the NFS server and now I have the correct
mount and it activates fine.
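Roughly, the check and cleanup on the server looked like this (exportfs is
part of nfs-utils; adjust the client spec to whatever exportfs -v lists for
/data/exportdom):

# list current exports with their options to spot the stale export-domain path
exportfs -v
# unexport the old export-domain path (and remove it from /etc/exports afterwards)
exportfs -u "*:/data/exportdom"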
Thanks!
With kind regards,
Jorick Astrego