Dear Amit,
That is not exactly the issue we're facing. I have two clusters: an old
4.2 cluster and a fresh 4.3 cluster. The DC is set to 4.2 because of this.
I wish to bring the 4.3 cluster online, migrate everything to it, and
then remove the 4.2 cluster.
This should be a supported path, imho.
Otherwise I could upgrade the current 4.2 cluster, although it is going to
be decommissioned, but then I have the same problem: hosts upgraded to
4.3 cannot connect to the NFS domain and cannot be activated until
I have upgraded them all and set the cluster and DC to 4.3.
I could detach the NFS domain for a while, but we use this mount for
backup disks of several VMs, and I'd rather not migrate everything in a
hurry or do without these backups for a longer time.
Regards,
Jorick Astrego
On 2/8/20 11:51 AM, Amit Bawer wrote:
I doubt you can use 4.3.8 nodes with a 4.2 cluster without
upgrading it first. But maybe members of this list could say differently.
On Friday, February 7, 2020, Jorick Astrego <jorick(a)netbulae.eu> wrote:
On 2/6/20 6:22 PM, Amit Bawer wrote:
>
>
> On Thu, Feb 6, 2020 at 2:54 PM Jorick Astrego <jorick(a)netbulae.eu> wrote:
>
>
> On 2/6/20 1:44 PM, Amit Bawer wrote:
>>
>>
>> On Thu, Feb 6, 2020 at 1:07 PM Jorick Astrego
>> <jorick(a)netbulae.eu> wrote:
>>
>> Here you go, this is from the activation I just did a
>> couple of minutes ago.
>>
>> I was hoping to see how it was first connected to the host, but
>> the log doesn't go that far back. Anyway, the storage domain type
>> is set by the engine, and vdsm never tries to guess it as far as
>> I saw.
>
> I put the host in maintenance and activated it again, this
> should give you some more info. See attached log.
>
>> Could you query the engine db about the misbehaving domain
>> and paste the results?
>>
>> # su - postgres
>> Last login: Thu Feb 6 07:17:52 EST 2020 on pts/0
>> -bash-4.2$
>> LD_LIBRARY_PATH=/opt/rh/rh-postgresql10/root/lib64/
>> /opt/rh/rh-postgresql10/root/usr/bin/psql engine
>> psql (10.6)
>> Type "help" for help.
>> engine=# select * from storage_domain_static where id =
>> 'f5d2f7c6-093f-46d6-a844-224d92db5ef9' ;
>
>
> engine=# select * from storage_domain_static where id =
> 'f5d2f7c6-093f-46d6-a844-224d92db5ef9' ;
> column                                | value
> --------------------------------------+--------------------------------------
> id                                    | f5d2f7c6-093f-46d6-a844-224d92db5ef9
> storage                               | b8b456f0-27c3-49b9-b5e9-9fa81fb3cdaa
> storage_name                          | backupnfs
> storage_domain_type                   | 1
> storage_type                          | 1
> storage_domain_format_type            | 4
> _create_date                          | 2018-01-19 13:31:25.899738+01
> _update_date                          | 2019-02-14 14:36:22.3171+01
> recoverable                           | t
> last_time_used_as_master              | 1530772724454
> storage_description                   |
> storage_comment                       |
> wipe_after_delete                     | f
> warning_low_space_indicator           | 10
> critical_space_action_blocker         | 5
> first_metadata_device                 |
> vg_metadata_device                    |
> discard_after_delete                  | f
> backup                                | f
> warning_low_confirmed_space_indicator | 0
> block_size                            | 512
> (1 row)
>
>
>
> Thanks for sharing,
>
> The storage_type in the db is indeed NFS (1), and
> storage_domain_format_type is 4. For oVirt 4.3 the default
> storage_domain_format_type is 5, and a datacenter upgrade is usually
> required for a 4.2 to 4.3 migration. I am not sure that is
> possible in your current setup, since you have 4.2 nodes using
> this storage as well.
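
As a rough illustration of the check described above, here is a minimal Python sketch that reads the relevant columns of a storage_domain_static row and reports whether the domain is NFS and whether its format is below the 4.3 default. Only the two facts stated in this thread are encoded (storage_type 1 is NFS, format type 5 is the 4.3 default); the helper name describe_domain and the dict layout are hypothetical, not part of the engine or vdsm.

```python
# Values stated in this thread; other codes are deliberately not interpreted.
NFS_STORAGE_TYPE = 1    # storage_type 1 means NFS
DEFAULT_FORMAT_4_3 = 5  # default storage_domain_format_type for oVirt 4.3


def describe_domain(row):
    """Summarise a storage_domain_static row (hypothetical helper)."""
    fmt = int(row["storage_domain_format_type"])
    return {
        "name": row["storage_name"],
        "is_nfs": row["storage_type"] == NFS_STORAGE_TYPE,
        "format": fmt,
        # True when the domain is still on an older format than the 4.3 default.
        "below_4_3_default": fmt < DEFAULT_FORMAT_4_3,
    }


# The row from the query output above, reduced to the columns of interest.
row = {
    "storage_name": "backupnfs",
    "storage_type": 1,
    "storage_domain_format_type": "4",
}
summary = describe_domain(row)
print(summary)
```

For the backupnfs row this flags an NFS domain that is still on the older format, which matches the reading given above.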
>
> Regarding the repeating monitor failure for the SD:
>
> 2020-02-05 14:17:54,190+0000 WARN (monitor/f5d2f7c)
> [storage.LVM] Reloading VGs failed
> (vgs=[u'f5d2f7c6-093f-46d6-a844-224d92db5ef9'] rc=5 out=[]
> err=[' Volume group "f5d2f7c6-093f-46d6-a844-224d92db5ef9" not
> found', ' Cannot process volume group
> f5d2f7c6-093f-46d6-a844-224d92db5ef9']) (lvm:470)
>
> This error means that the monitor first tried to query the SD as a
> VG and failed. That is expected from the fallback code
> called to find a domain missing from the SD cache:
>
>     def _findUnfetchedDomain(self, sdUUID):
>         ...
>         for mod in (blockSD, glusterSD, localFsSD, nfsSD):
>             try:
>                 return mod.findDomain(sdUUID)
>             except se.StorageDomainDoesNotExist:
>                 pass
>             except Exception:
>                 self.log.error(
>                     "Error while looking for domain `%s`",
>                     sdUUID, exc_info=True)
>
>         raise se.StorageDomainDoesNotExist(sdUUID)
>
> 2020-02-05 14:17:54,201+0000 ERROR (monitor/f5d2f7c)
> [storage.Monitor] Setting up monitor for
> f5d2f7c6-093f-46d6-a844-224d92db5ef9 failed (monitor:330)
> Traceback (most recent call last):
> File
> "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
> 327, in _setupLoop
> self._setupMonitor()
> File
> "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
> 349, in _setupMonitor
> self._produceDomain()
> File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line
> 159, in wrapper
> value = meth(self, *a, **kw)
> File
> "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
> 367, in _produceDomain
> self.domain = sdCache.produce(self.sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 110, in produce
> domain.getRealDomain()
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 51, in getRealDomain
> return self._cache._realProduce(self._sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 134, in _realProduce
> domain = self._findDomain(sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 151, in _findDomain
> return findMethod(sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 176, in _findUnfetchedDomain
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'f5d2f7c6-093f-46d6-a844-224d92db5ef9',)
>
> This part of the error means that the lookup failed for every
> possible domain type, including NFS.
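
To make the two log messages easier to relate, here is a small self-contained Python sketch of the fallback order quoted above. It is only a simulation: FakeBackend, find_unfetched_domain and the local StorageDomainDoesNotExist class are hypothetical stand-ins for the vdsm modules, and nothing here touches real storage. It shows that the block (LVM) lookup is always tried first, which is why the "Volume group not found" warning appears even for an NFS domain, and that the final exception is raised only after every backend has failed.

```python
class StorageDomainDoesNotExist(Exception):
    """Stand-in for vdsm's se.StorageDomainDoesNotExist."""


class FakeBackend:
    """Hypothetical stand-in for blockSD, glusterSD, localFsSD or nfsSD."""

    def __init__(self, name, known=()):
        self.name = name
        self.known = set(known)

    def findDomain(self, sd_uuid):
        if sd_uuid not in self.known:
            raise StorageDomainDoesNotExist(sd_uuid)
        return "%s:%s" % (self.name, sd_uuid)


def find_unfetched_domain(backends, sd_uuid, tried):
    """Try each backend in order; record the order in `tried`."""
    for mod in backends:
        tried.append(mod.name)
        try:
            return mod.findDomain(sd_uuid)
        except StorageDomainDoesNotExist:
            pass  # not this type, fall through to the next backend
    # Reached only when no backend could find the domain.
    raise StorageDomainDoesNotExist(sd_uuid)


SD_UUID = "f5d2f7c6-093f-46d6-a844-224d92db5ef9"
ORDER = ("blockSD", "glusterSD", "localFsSD", "nfsSD")

# Case 1: no backend can see the domain, as on the failing 4.3 host.
tried_missing = []
try:
    find_unfetched_domain([FakeBackend(n) for n in ORDER], SD_UUID, tried_missing)
    found_missing = True
except StorageDomainDoesNotExist:
    found_missing = False

# Case 2: the NFS backend can see the domain; block is still tried first.
tried_nfs = []
backends = [FakeBackend(n) for n in ORDER[:3]] + [FakeBackend("nfsSD", [SD_UUID])]
result = find_unfetched_domain(backends, SD_UUID, tried_nfs)
print(tried_missing, found_missing, result)
```

In the second case the block lookup still fails before the NFS lookup succeeds, so the LVM warning on its own does not indicate a problem; the traceback is the message that matters.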
>
> Are you able to create a new NFS storage domain on the same
> storage server (on another export path, so as not to harm the
> existing one)?
> If you do succeed in connecting to it from the 4.3 datacenter, it
> could mean the v4 format is the issue;
> otherwise it could mean that 4.3 requires different NFS
> settings.
Well, this will be a problem either way: when I add a new NFS domain it
will not be storage_domain_format_type 5, as the DC is still on 4.2.
Also, if I do add a type 5 NFS domain, won't the 4.2 nodes try to
mount it, fail, and then become non-responsive, taking the whole
running cluster down?
Regards,
Jorick Astrego
Met vriendelijke groet, With kind regards,
Jorick Astrego
Netbulae Virtualization Experts
------------------------------------------------------------------------
Tel: 053 20 30 270    info(a)netbulae.eu    Staalsteden 4-3A    KvK 08198180
Fax: 053 20 30 271    www.netbulae.eu      7547 TA Enschede    BTW NL821234584B01
------------------------------------------------------------------------