Ok, getting somewhere here. 

I ran rpcinfo -p and found no nfs entries registered with the portmapper, so I restarted NFS:

systemctl stop nfs
systemctl start nfs

Suddenly the shares are mounted and the data center is up again.

I was able to add an export domain over NFS.
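
For anyone hitting the same symptom, here is a minimal Python sketch of the check described above: it looks through `rpcinfo -p` output for the NFS program number (100003). The function names and the two sample outputs are illustrative assumptions, not taken from the affected hosts.

```python
import subprocess

# ONC RPC program number assigned to the NFS service.
NFS_PROGRAM = 100003

# Illustrative rpcinfo -p output (assumed, not captured from the real hosts).
SAMPLE_WITHOUT_NFS = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    4   udp    111  portmapper
"""

SAMPLE_WITH_NFS = SAMPLE_WITHOUT_NFS + """\
    100005    3   tcp  20048  mountd
    100003    3   tcp   2049  nfs
"""


def registered_programs(rpcinfo_output):
    """Return the set of RPC program numbers listed in rpcinfo -p output."""
    programs = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Skip the header row and blank lines; data rows start with a number.
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def nfs_registered(rpcinfo_output):
    """True if the NFS program is registered with the portmapper."""
    return NFS_PROGRAM in registered_programs(rpcinfo_output)


def query_portmapper(host="localhost"):
    """Run rpcinfo -p against a host and return its raw text output."""
    result = subprocess.run(
        ["rpcinfo", "-p", host], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a live system, `nfs_registered(query_portmapper("storage-host"))` (with a hypothetical host name) returning False would match what I saw before restarting the service.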

Why would nfs shit the bed?  

I still can't get the iSCSI domain mounted properly, though, and that's where all the disks are located :/


On Thu, Oct 4, 2018 at 11:00 AM Vincent Royer <vincent@epicenergy.ca> wrote:
Thanks for your help, Oliver,

To give you some background here:

Host 1 is on oVirt 4.2, attached to NFS storage.
Host 2 I upgraded to oVirt 4.2.5 and then 4.2.6. Since then it has had trouble with NFS due to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1595549. The host was up and could run the hosted engine, but no VMs could be migrated to it.

I decided to switch from NFS to iSCSI so that I could stay on current releases, so I began the work of attaching an iSCSI domain.

The iSCSI domain attached, and I transferred most of the disks to it. Then it started melting down, saying that Host 1 could not mount it, and the whole DC went down.

Current status: the data center is "non responsive". It keeps trying "Reconstructing master domain on Data Center" over and over, but always fails. The master domain status is "inactive", and clicking Activate fails. I put the new iSCSI domain in maintenance until I figure the rest out. I can't add or remove any other domains; oVirt says I need to attach the master first.

Both hosts are "UP". Host 1 health is "bad". Host 2 health is "ok", and it is running the HE. Host 1 (the 4.2 host) says "this host needs to be reinstalled", but the reinstall option is grayed out.

I am wary of updating Host 1 because of the NFS storage bug... I fear it won't ever be able to attach the old domain again.

If I try mounting the NFS shares in Cockpit from either node, I get "mount.nfs: Remote I/O error". However, on another blank CentOS machine sitting on the same network, I can mount the shares normally.

Vincent Royer

On Thu, Oct 4, 2018 at 1:04 AM Oliver Riesener <Oliver.Riesener@hs-bremen.de> wrote:

If your hosts are up and running and your domain doesn't go active within a few minutes:

* Activate your storage domain under:

Storage -> Storage Domain -> (open your domain) -> Data Center -> (right-click your data center name) -> Activate.

On 10/4/18 9:50 AM, Oliver Riesener wrote:

Hi Vincent,

OK, your master domain isn't available at the moment, but don't panic.

First of all, we need the status of your hosts. No hosts -> no storage!

* Did you hard-reboot them without using Confirm "Host has been rebooted"?

* Are they activated in the Data Center / Cluster? Green arrow?


On 10/4/18 7:46 AM, Vincent Royer wrote:
I was attempting to migrate from NFS to iSCSI storage domains. I have reached a state where I can no longer activate the old master storage domain, and thus no others will activate either.

I'm ready to give up on the installation and just move to an HCI deployment instead.  Wipe all the hosts clean and start again. 

My plan was to create and use an export domain, then wipe the nodes and set them up HCI where I could re-import.  But without being able to activate a master domain, I can't create the export domain.

I'm not sure why it can't find the master anymore, as nothing has happened to the NFS storage, but the error in vdsm says it just can't find it:

StoragePoolMasterNotFound: Cannot find master domain: u'spUUID=5a77bed1-0238-030c-0122-0000000003b3, msdUUID=d3165759-07c2-46ae-b7b8-b6226a929d68'
2018-10-03 22:40:33,751-0700 INFO  (jsonrpc/3) [storage.TaskManager.Task] (Task='83f33db5-90f3-4064-87df-0512ab9b6378') aborting: Task is aborted: "Cannot find master domain: u'spUUID=5a77bed1-0238-030c-0122-0000000003b3, msdUUID=d3165759-07c2-46ae-b7b8-b6226a929d68'" - code 304 (task:1181)
2018-10-03 22:40:33,751-0700 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH connectStoragePool error=Cannot find master domain: u'spUUID=5a77bed1-0238-030c-0122-0000000003b3, msdUUID=d3165759-07c2-46ae-b7b8-b6226a929d68' (dispatcher:82)
2018-10-03 22:40:33,751-0700 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call StoragePool.connect failed (error 304) in 0.17 seconds (__init__:573)
2018-10-03 22:40:34,200-0700 INFO  (jsonrpc/1) [api.host] START getStats() from=::ffff:172.16.100.13,39028 (api:46)
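
To see which pool and master domain VDSM is complaining about across a longer log, the UUIDs can be pulled out of the error text. This is a minimal sketch that assumes the message format shown in the excerpt above; the function and constant names are hypothetical and it is not part of VDSM.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Matches the "Cannot find master domain" message as it appears in the
# excerpt above (assumed format; other VDSM versions may word it differently).
MASTER_NOT_FOUND = re.compile(
    r"Cannot find master domain: u?'spUUID=(?P<pool>" + UUID + r"), "
    r"msdUUID=(?P<master>" + UUID + r")'"
)

# One line copied from the log excerpt, used here as sample input.
LOG_EXCERPT = [
    "2018-10-03 22:40:33,751-0700 ERROR (jsonrpc/3) [storage.Dispatcher] "
    "FINISH connectStoragePool error=Cannot find master domain: "
    "u'spUUID=5a77bed1-0238-030c-0122-0000000003b3, "
    "msdUUID=d3165759-07c2-46ae-b7b8-b6226a929d68' (dispatcher:82)",
    "2018-10-03 22:40:34,200-0700 INFO  (jsonrpc/1) [api.host] START getStats() "
    "from=::ffff:172.16.100.13,39028 (api:46)",
]


def missing_master_domains(log_lines):
    """Return the set of (pool UUID, master domain UUID) pairs reported missing."""
    found = set()
    for line in log_lines:
        match = MASTER_NOT_FOUND.search(line)
        if match:
            found.add((match.group("pool"), match.group("master")))
    return found
```

Passing the lines of a VDSM log file to `missing_master_domains` gives the distinct pool/master pairs, which can then be compared against the domain UUID directory that is actually present on the NFS mount.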

When I look in cockpit on the hosts, the storage domain is mounted and seems fine. 



_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/LTZ6SIFYDFEMSZ4ACUNVC5KETWG7BBIZ/
-- 
Kind regards


Oliver Riesener

--
Hochschule Bremen
Elektrotechnik und Informatik
Oliver Riesener
Neustadtswall 30
D-28199 Bremen

Tel: 0421 5905-2405, Fax: -2400
e-mail:oliver.riesener@hs-bremen.de