Hi Gadi,
Thanks for the response.
I've only just had a chance to restore that node, which completed
fine, and the DC and its storage domains activated ok. So the
ovirt-node is now fully functional again on 3.3.2; the
ovirt-engine server remains untouched on 3.4 and is working fine.
I collected the requested info, but it doesn't list the ID you pulled
out of the vdsm.log:
[root@ovirt-node ~]# vdsClient -s 0 getStorageDomainsList
e637bb04-a8b7-4c77-809c-d58051494c52
7b083758-45f9-4896-913d-11fe02043e6e
c2c4ade6-049e-4159-a294-a0c151f4983d
0a897f2e-1b01-4577-9f91-cd136ef4a978
I tried querying that ID anyway and got the storage domain doesn't
exist error, but maybe that's because it really doesn't exist!?
[root@ovirt-node ~]# vdsClient -s 0 getStorageDomainInfo
9a4a80a1-5377-4a94-ade3-e58183e916ae
Storage domain does not exist: ('9a4a80a1-5377-4a94-ade3-e58183e916ae',)
I grabbed the info for all the Storage Domain IDs in the list without
issue (collected with a quick loop; sketch after the output below):
[root@ovirt-node ~]#
## Storage Domain: e637bb04-a8b7-4c77-809c-d58051494c52 ##
uuid = e637bb04-a8b7-4c77-809c-d58051494c52
pool = ['c713062f-300f-4256-9ac8-2d3fcfcdb002']
lver = -1
version = 3
role = Regular
remotePath = /vdsm_store/s2data1_s2usr_boot1
spm_id = -1
type = LOCALFS
class = Data
master_ver = 0
name = s2data1_s2usr_boot1
####
## Storage Domain: 7b083758-45f9-4896-913d-11fe02043e6e ##
uuid = 7b083758-45f9-4896-913d-11fe02043e6e
pool = ['f027ec99-913f-4f00-ac95-ad484c9c6a4b',
'c713062f-300f-4256-9ac8-2d3fcfcdb002']
lver = -1
version = 0
role = Regular
remotePath = ovirt-engine:/iso
spm_id = -1
type = NFS
class = Iso
master_ver = 0
name = ISO1_ZFS
####
## Storage Domain: c2c4ade6-049e-4159-a294-a0c151f4983d ##
uuid = c2c4ade6-049e-4159-a294-a0c151f4983d
pool = ['c713062f-300f-4256-9ac8-2d3fcfcdb002']
lver = -1
version = 3
role = Regular
remotePath = /vdsm_store/s2data1_s2mgt_app1
spm_id = -1
type = LOCALFS
class = Data
master_ver = 0
name = s2data1_s2mgt_app1
####
## Storage Domain: 0a897f2e-1b01-4577-9f91-cd136ef4a978 ##
uuid = 0a897f2e-1b01-4577-9f91-cd136ef4a978
pool = ['c713062f-300f-4256-9ac8-2d3fcfcdb002']
lver = 1
version = 3
role = Master
remotePath = /vdsm_store/s2data1_s2mgt_boot1
spm_id = 1
type = LOCALFS
class = Data
master_ver = 1
name = s2data1_s2mgt_boot1
####
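For reference, the per-domain output above was gathered with a quick
loop along these lines (the individual commands aren't shown above):

for sd in $(vdsClient -s 0 getStorageDomainsList); do
    # print a header, the domain info, and a trailing marker per domain
    echo "## Storage Domain: $sd ##"
    vdsClient -s 0 getStorageDomainInfo "$sd"
    echo "####"
done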
I checked the logfile for today, and since the restore there are no
errors about this storage domain not existing:
[root@ovirt-node vdsm]# grep "04-29" vdsm.log | grep "StorageDomainDoesNotExist"
[root@ovirt-node vdsm]#
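If it would help, I can do a wider sweep for that UUID across the
current and rotated vdsm logs with something along these lines (zgrep
instead for any compressed rotations):

grep -l "9a4a80a1-5377-4a94-ade3-e58183e916ae" /var/log/vdsm/vdsm.log*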
I also checked the IDs listed on the ovirt-engine server just on the
off-chance, but the ID throwing the error doesn't exist on that either:
[root@ovirt-engine ~]# vdsClient -s 0 getStorageDomainsList
feb04d94-4ea8-471c-b759-3ed95943e9a3
b3c02266-2426-4285-b4dc-0acba75af530
7b083758-45f9-4896-913d-11fe02043e6e
79178b6b-8d98-45e4-93f2-3ce1d7a270a5
None of the storage domains live on the root VG partitions; they are
completely separate disks in separate VGs. I did delete some Storage
Domains on both the ovirt-engine and the ovirt-node a number of weeks
back, but if this were the cause I would have expected issues on both
servers, not just one.
Have you found anything else that looks interesting in the log?
Thanks, Paul
On 28/04/2014 07:07, Gadi Ickowicz wrote:
Hi Paul,
I am still looking into this log, but from a quick first assessment, it looks like (for
some reason I don't know yet...) there is a storage domain that is missing. This is
visible in the following error traceback in the vdsm log:
Thread-29::ERROR::2014-04-27 12:43:05,825::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 9a4a80a1-5377-4a94-ade3-e58183e916ae monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 204, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
('9a4a80a1-5377-4a94-ade3-e58183e916ae',)
However, I do not see any information in the vdsm log itself about what this domain is
(yet - it may be there and I am still looking). The reason vdsm is trying to access this
storage domain (9a4a80a1-5377-4a94-ade3-e58183e916ae) is that it appears to be part of the
storage pool (datacenter) according to the pool's metadata, as seen in the following
lines from the initial connection to the datacenter, when vdsm first starts up:
Thread-13::DEBUG::2014-04-27
12:42:05,604::persistentDict::234::Storage.PersistentDict::(refresh) read lines
(FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=s2data1_s2mgt_boot1',
'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60',
'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1',
'POOL_DESCRIPTION=BDS_DataCentre2',
'POOL_DOMAINS=7b083758-45f9-4896-913d-11fe02043e6e:Active,e637bb04-a8b7-4c77-809c-d58051494c52:Active,0a897f2e-1b01-4577-9f91-cd136ef4a978:Active,c2c4ade6-049e-4159-a294-a0c151f4983d:Active,9a4a80a1-5377-4a94-ade3-e58183e916ae:Active',
'POOL_SPM_ID=-1', 'POOL_SPM_LVER=0',
'POOL_UUID=c713062f-300f-4256-9ac8-2d3fcfcdb002',
'REMOTE_PATH=/vdsm_store/s2data1_s2mgt_boot1', 'ROLE=Master',
'SDUUID=0a897f2e-1b01-4577-9f91-cd136ef4a978', 'TYPE=LOCALFS',
'VERSION=3', '_SHA_CKSUM=afe618d7596d75d0fb96453bcdd34a1255534454']
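If you want to look at the raw pool metadata directly, it is kept in the
master domain's dom_md/metadata file; for a LOCALFS domain the path should
be something like this (inferred from the REMOTE_PATH and SDUUID above, so
adjust if your layout differs):

cat /vdsm_store/s2data1_s2mgt_boot1/0a897f2e-1b01-4577-9f91-cd136ef4a978/dom_md/metadata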
Is it possible you had another domain attached to this datacenter before the upgrade that
somehow lives on the root vg partitions and gets destroyed during the upgrade process, so
that reverting brings it back?
If you currently have this system back on 3.3.2 it should be up; could you run the
following commands on the ovirt-node:
vdsClient -s 0 getStorageDomainsList
    <- lists all storage domain IDs that vdsm (the ovirt-node) can currently see (hopefully the domain in question's ID is listed there)
vdsClient -s 0 getStorageDomainInfo 9a4a80a1-5377-4a94-ade3-e58183e916ae
    <- displays information about that storage domain. If this succeeds we should know a bit more about the domain
Thanks,
Gadi Ickowicz
----- Original Message -----
From: regpm(a)mccleary.me.uk
To: "Gadi Ickowicz" <gickowic(a)redhat.com>, users(a)ovirt.org
Sent: Sunday, April 27, 2014 3:10:27 PM
Subject: Re: [ovirt-users] Storage Domain Not Found After upgrade from 3.3.2 to 3.4
Hi,
Yes, you're correct, Gadi. I have a single Ovirt engine, which has two
Datacenters; one is local storage on the Engine server and the other is
local storage on the Ovirt Node. I've renamed the servers in the
attached vdsm log output (from the ovirt-node) to ovirt-engine
(10.50.0.18) and ovirt-node (10.50.0.19).
BDS_DataCentre1 is on the ovirt-engine server and this works fine after
the upgrade.
BDS_DataCentre2 is on the ovirt-node and this is the one that fails to
activate due to the storage domains not being accessible.
The Master storage domain on the ovirt-node is s2data1_s2mgt_boot1.
There are two other storage domains as well: s2data1_s2mgt_app1 and
s2data1_s2usr_boot1. The underlying filesystems are mounted fine, and
as I said, if I restore the server (root vg partitions; the storage
domain filesystems are not touched) then it works fine again. So the
upgrade is not playing nicely for some reason, but it's not clear to me
from the log what the issue is.
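For what it's worth, I'm verifying the mounts with something along these
lines (paths as per the storage domain names above):

mount | grep vdsm_store
df -h /vdsm_store/s2data1_s2mgt_boot1 /vdsm_store/s2data1_s2mgt_app1 /vdsm_store/s2data1_s2usr_boot1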
Thanks,
Paul
On 27/04/2014 07:56, Gadi Ickowicz wrote:
> Hi,
>
> Could you please attach the vdsm log as a file (it is easier to read) for the failing
node?
>
> Also - I am a bit confused regarding what exactly your setup is - Do you have only a
single engine (the all-in-one) and 2 dcs, one for the all in one and one for the oVirt
node, which is the one that is failing?
>
> Thanks,
> Gadi Ickowicz
>
> ----- Original Message -----
> From: regpm(a)mccleary.me.uk
> To: users(a)ovirt.org
> Sent: Friday, April 25, 2014 10:46:35 PM
> Subject: [ovirt-users] Storage Domain Not Found After upgrade from 3.3.2 to 3.4
>
> Hi,
>
> I have an All-in-one installation engine node and an Ovirt node. They
> each use local storage and are thus configured in separate Data
> Centres. I upgraded both from 3.3.2 to 3.4 and this completed without
> error. I completed the engine-setup upgrade and this completed ok. I
> rebooted both servers; the Ovirt engine node worked fine and I
> could start its VMs. The Ovirt Node's datacenter is not activating,
> which seems to be due to none of the storage domains coming online.
> I've checked and the storage is mounted and available fine on the Ovirt
> node.
>
> Looking at the vdsm.log I can see errors stating that it can't find the
> storage domain. I have restored the entire node back to the pre-upgrade
> state and it works fine. It breaks again when I upgrade it. The
> approach I used was to put the Ovirt node in maintenance mode and run
> yum update. Anybody have similar issues or understand the log errors below?
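> (Roughly what was run on the node, for reference - maintenance mode set
> from the engine web UI first:)
>
> yum update
> reboot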
>
> < SNIP SNIP original log output>
>
>
> Thanks, Paul
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users