On 2/6/20 6:22 PM, Amit Bawer wrote:
On Thu, Feb 6, 2020 at 2:54 PM Jorick Astrego <jorick(a)netbulae.eu
<mailto:jorick@netbulae.eu>> wrote:
On 2/6/20 1:44 PM, Amit Bawer wrote:
>
>
> On Thu, Feb 6, 2020 at 1:07 PM Jorick Astrego <jorick(a)netbulae.eu
> <mailto:jorick@netbulae.eu>> wrote:
>
> Here you go, this is from the activation I just did a couple
> of minutes ago.
>
> I was hoping to see how it was first connected to the host, but the
> log doesn't go that far back. Anyway, the storage domain type is set
> by the engine, and vdsm never tries to guess it as far as I saw.
I put the host in maintenance and activated it again, this should
give you some more info. See attached log.
> Could you query the engine db about the misbehaving domain and
> paste the results?
>
> # su - postgres
> Last login: Thu Feb 6 07:17:52 EST 2020 on pts/0
> -bash-4.2$ LD_LIBRARY_PATH=/opt/rh/rh-postgresql10/root/lib64/
> /opt/rh/rh-postgresql10/root/usr/bin/psql engine
> psql (10.6)
> Type "help" for help.
> engine=# select * from storage_domain_static where id =
> 'f5d2f7c6-093f-46d6-a844-224d92db5ef9' ;
engine=# select * from storage_domain_static where id =
'f5d2f7c6-093f-46d6-a844-224d92db5ef9' ;
column                                | value
--------------------------------------+--------------------------------------
id                                    | f5d2f7c6-093f-46d6-a844-224d92db5ef9
storage                               | b8b456f0-27c3-49b9-b5e9-9fa81fb3cdaa
storage_name                          | backupnfs
storage_domain_type                   | 1
storage_type                          | 1
storage_domain_format_type            | 4
_create_date                          | 2018-01-19 13:31:25.899738+01
_update_date                          | 2019-02-14 14:36:22.3171+01
recoverable                           | t
last_time_used_as_master              | 1530772724454
storage_description                   |
storage_comment                       |
wipe_after_delete                     | f
warning_low_space_indicator           | 10
critical_space_action_blocker         | 5
first_metadata_device                 |
vg_metadata_device                    |
discard_after_delete                  | f
backup                                | f
warning_low_confirmed_space_indicator | 0
block_size                            | 512
(1 row)
Thanks for sharing,
The storage_type in the db is indeed NFS (1). The
storage_domain_format_type is 4; for oVirt 4.3 the default
storage_domain_format_type is 5, and a datacenter upgrade is usually
required for a 4.2 to 4.3 migration. I am not sure that is possible in
your current setup, since you have 4.2 nodes using this storage as well.
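For reference, the check described above can be written down as a small Python sketch. It is only an illustration: the column values are copied from the query result in this thread, the meaning of storage_type 1 (NFS) and the 4.3 default format of 5 are taken from the explanation above rather than from the engine source, and the helper name and dictionary layout are made up for the example.

```python
# Values copied from the storage_domain_static row pasted in this thread.
row = {
    "id": "f5d2f7c6-093f-46d6-a844-224d92db5ef9",
    "storage_name": "backupnfs",
    "storage_type": 1,
    "storage_domain_format_type": 4,
}

# Assumptions taken from the explanation in this thread, not from engine code.
NFS_STORAGE_TYPE = 1      # storage_type value reported as NFS
DEFAULT_FORMAT_4_3 = 5    # default storage_domain_format_type for oVirt 4.3


def describe_domain(row):
    """Hypothetical helper: summarize what the row says about the domain."""
    notes = []
    if row["storage_type"] == NFS_STORAGE_TYPE:
        notes.append("storage_type is NFS")
    else:
        notes.append("storage_type is not NFS (%d)" % row["storage_type"])

    fmt = int(row["storage_domain_format_type"])
    if fmt < DEFAULT_FORMAT_4_3:
        notes.append(
            "format v%d is older than the 4.3 default v%d"
            % (fmt, DEFAULT_FORMAT_4_3))
    return notes


notes = describe_domain(row)
for note in notes:
    print(note)
```

For this row the helper reports both findings: the domain is registered as NFS, and its format is one version behind the 4.3 default.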
Regarding the repeating monitor failure for the SD:
2020-02-05 14:17:54,190+0000 WARN (monitor/f5d2f7c) [storage.LVM]
Reloading VGs failed (vgs=[u'f5d2f7c6-093f-46d6-a844-224d92db5ef9']
rc=5 out=[] err=[' Volume group
"f5d2f7c6-093f-46d6-a844-224d92db5ef9" not found', ' Cannot process
volume group f5d2f7c6-093f-46d6-a844-224d92db5ef9']) (lvm:470)
This error means that the monitor first tried to query the SD as a VG
and failed. That is expected for the fallback code that is called to
find a domain missing from the SD cache:
def _findUnfetchedDomain(self, sdUUID):
    ...
    for mod in (blockSD, glusterSD, localFsSD, nfsSD):
        try:
            return mod.findDomain(sdUUID)
        except se.StorageDomainDoesNotExist:
            pass
        except Exception:
            self.log.error(
                "Error while looking for domain `%s`",
                sdUUID, exc_info=True)

    raise se.StorageDomainDoesNotExist(sdUUID)
2020-02-05 14:17:54,201+0000 ERROR (monitor/f5d2f7c) [storage.Monitor] Setting up monitor for f5d2f7c6-093f-46d6-a844-224d92db5ef9 failed (monitor:330)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 327, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 349, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 159, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 367, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'f5d2f7c6-093f-46d6-a844-224d92db5ef9',)
This part of the error means that the lookup failed for every possible
domain type, including NFS.
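To make the sequence concrete, here is a minimal, self-contained sketch of the same fallback pattern. It is not the vdsm code: StubBackend is a hypothetical stand-in for the blockSD, glusterSD, localFsSD and nfsSD modules (the real ones query LVM or scan mounts), the exception class stands in for se.StorageDomainDoesNotExist, and the logging of unexpected exceptions is left out for brevity.

```python
class StorageDomainDoesNotExist(Exception):
    """Stand-in for vdsm's se.StorageDomainDoesNotExist."""


class StubBackend:
    """Hypothetical stand-in for a domain module such as blockSD or nfsSD."""

    def __init__(self, name, known_domains=()):
        self.name = name
        self.known = set(known_domains)

    def findDomain(self, sd_uuid):
        if sd_uuid not in self.known:
            raise StorageDomainDoesNotExist(sd_uuid)
        return "%s:%s" % (self.name, sd_uuid)


def find_unfetched_domain(sd_uuid, backends, tried=None):
    """Try each backend in order; raise only if all of them fail."""
    for mod in backends:
        if tried is not None:
            tried.append(mod.name)  # record the lookup order
        try:
            return mod.findDomain(sd_uuid)
        except StorageDomainDoesNotExist:
            pass  # expected: not this type, try the next one
    raise StorageDomainDoesNotExist(sd_uuid)


SD = "f5d2f7c6-093f-46d6-a844-224d92db5ef9"
ORDER = ["blockSD", "glusterSD", "localFsSD", "nfsSD"]

# Case 1: only the NFS stub knows the domain, so it is found last.
backends = [StubBackend(n) for n in ORDER[:3]] + [StubBackend("nfsSD", [SD])]
tried = []
found = find_unfetched_domain(SD, backends, tried)
print(found, tried)

# Case 2: no stub knows the domain, so the final raise is reached.
missing = False
try:
    find_unfetched_domain(SD, [StubBackend(n) for n in ORDER])
except StorageDomainDoesNotExist:
    missing = True
print("domain missing everywhere:", missing)
```

In the first case the block-type lookup fails before the NFS one succeeds, which corresponds to the harmless LVM warning. The second case corresponds to the traceback above: every type was tried and none could see the domain.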
Are you able to create a new NFS storage domain on the same storage
server (but on another export path, so as not to harm the existing one)?
If you succeed in connecting to it from the 4.3 datacenter, it could
mean the v4 format is the issue;
otherwise it could mean that 4.3 requires different NFS settings.
Well, this will be a problem either way. When I add a new NFS domain, it
will not be storage_domain_format_type 5, as the DC is still on 4.2.
Also, if I do add a type 5 NFS domain, won't the 4.2 nodes try to mount
it, fail, and then become non-responsive, taking the whole running
cluster down?
Regards,
Jorick Astrego