Alright, some combination of messing around got me mostly back up and running here. iSCSI is working reliably and VMs are running on the new iSCSI domain. The old NFS domain is empty save for a couple of OVF_STORE disks.

The procedure now, as I understand it, is to shut down all VMs, put all storage domains into maintenance except the old NFS and new iSCSI domains, then push the NFS one into maintenance, which *should* promote the iSCSI domain to master. After that, I'll try to move the HE storage to iSCSI too, although it would be very helpful to have access to this document for that procedure.
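
Before I flip that switch, I'll double-check which domain actually holds the master role. A rough sketch of the check via the REST API, if I'm reading it right (the engine FQDN, password, and data-center UUID below are placeholders for my setup):

curl -k -u 'admin@internal:password' \
  'https://engine.example.com/ovirt-engine/api/datacenters/<dc-uuid>/storagedomains' \
  | grep -i '<master>'

Exactly one attached domain should report <master>true</master> once the promotion happens.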

thanks for your assistance!

On Thu, Oct 4, 2018 at 2:23 PM Oliver Riesener <Oliver.Riesener@hs-bremen.de> wrote:
Hi Vincent,

nice to hear the news :-)
 
I have read the BZ and see you ran into NFS trouble and have solved it now.

I took a look at my CentOS server for NFS data domains and
see the server running v4 and the clients (node) mounting with protocol vers=4.1.
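
If you want to verify the negotiated version on a node yourself, a quick sketch (the exact output format varies):

# list mounted NFS filesystems with their negotiated options, e.g. vers=4.1
nfsstat -m
# or filter the mount table by type
mount -t nfs4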

I run the latest (and greatest) oVirt stable 4.2.6.4-1 on CentOS 7.5+ with
the engine installed, and an ovirt-node 4.2.6.4.

* If you can migrate your running VMs and can switch your SPM,
  I would upgrade and reboot the hosts one by one, now.

* A reboot seems to be the minimum; remember, you do that `virt. thing´,
  therefore you can access and boot your bare metal and host OS ;-)


OK, back to iSCSI: I have also had an EqualLogic running as an iSCSI target for years.

* I have allowed multi-host access to the volumes which oVirt uses.
  The access control lists contain the raw IP addresses of my oVirt hosts.

  oVirt handles the volume access masterfully with multipathd and LVM VGs and LVs.
  Unused LVs are offline (host specific) and released volumes are deactivated;
  see the command sketch after this list.
  
* It's also possible that you have to reinstall your hosts (from the GUI)
  to upgrade or install the needed packages which handle iSCSI client access.

* If you are then free from errors and your iSCSI data domain is still missing, we can talk
  about VG activation and domain import.
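
A few commands I'd run on a host to check that layer. This is only a sketch; the package names match CentOS 7, and the VG name comes from your own setup:

# is the iSCSI initiator / multipath stack installed?
rpm -q iscsi-initiator-utils device-mapper-multipath
# active iSCSI sessions and multipath maps
iscsiadm -m session
multipath -ll
# a block storage domain appears as a VG, its images as LVs
vgs
lvs
# if the domain VG is visible but inactive, manual activation would be
# (normally vdsm handles this; only for rescue situations):
vgchange -ay <domain-vg-name>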

Cheers

Oliver


On 04.10.2018 at 22:00, Vincent Royer <vincent@epicenergy.ca> wrote:

OK, getting somewhere here.

Did an rpcinfo -p and found no NFS entries in portmap.

systemctl stop nfs
systemctl start nfs

Suddenly the shares are mounted and the datacenter is up again.
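
For the record, roughly how I verified it afterwards (the server name is a placeholder):

# NFS services should be registered with the portmapper again
rpcinfo -p | grep -E 'nfs|mountd'
# and the exports should be visible
showmount -e nfs-server.example.com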

Was able to add an export domain over NFS.

Why would NFS shit the bed?

Still can't seem to get iSCSI mounted properly now, and that's where all the disks are located :/
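
In case it helps diagnose: this is the kind of thing I've been trying on the hosts (the portal IP is a placeholder):

# discover targets on the portal
iscsiadm -m discovery -t sendtargets -p 172.16.100.50:3260
# log in to the discovered targets
iscsiadm -m node -l
# confirm the sessions are up
iscsiadm -m session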


On Thu, Oct 4, 2018 at 11:00 AM Vincent Royer <vincent@epicenergy.ca> wrote:
Thanks for your help, Oliver.

To give you some background here:

Host 1 is on oVirt 4.2, attached to NFS storage.
Host 2 I upgraded to oVirt 4.2.5 and then 4.2.6; since then it has had trouble with NFS due to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1595549. The host was up and could run the hosted engine, but could not migrate any VMs to it.

I decided to switch from NFS to iSCSI so that I could stay on current releases, and began the work of attaching an iSCSI domain.

The iSCSI domain attached, and I transferred most of the disks to it. Then everything started melting down, saying that Host 1 could not mount it, and the whole DC went down.

Current status: the data center is "non responsive". It keeps trying "Reconstructing master domain on Data Center" over and over again, but always fails. The master domain status is "inactive", and clicking activate fails. I put the new iSCSI domain in maintenance until I figure the rest out. I can't add or remove any other domains; oVirt says I need to attach the master first.

Both hosts are "UP". Host 1 health is "bad"; Host 2 health is "ok", and it is running the HE. Host 1 (the 4.2 host) says "this host needs to be reinstalled", but the reinstall option is grayed out.

I am wary of updating Host 1 because of the NFS storage bug... I fear it won't ever be able to attach the old domain again.

If I try mounting the NFS shares in Cockpit from either node, they say "mount.nfs: Remote I/O error". However, on another blank CentOS machine sitting on the same network, I can mount the shares normally.
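
For completeness, the mount test I ran on the blank CentOS box (server and export path are placeholders):

mkdir -p /mnt/nfstest
mount -t nfs -o vers=4.1 nfs-server.example.com:/export/data /mnt/nfstest
# works here; the equivalent command on the oVirt nodes is what returns
# mount.nfs: Remote I/O error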

Vincent Royer
778-825-1057


SUSTAINABLE MOBILE ENERGY SOLUTIONS





On Thu, Oct 4, 2018 at 1:04 AM Oliver Riesener <Oliver.Riesener@hs-bremen.de> wrote:

When your hosts are up and running and your domain doesn't go active within minutes:

* Activate your Storage Domain under:

Storage -> Storage Domain -> (Open your Domain)  -> Data Center -> (Right Click Your Data Center Name) -> Activate.
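
If the GUI action keeps failing, the same activation can be tried through the REST API. A minimal sketch, assuming a placeholder engine FQDN and credentials plus the UUIDs from your setup:

curl -k -u 'admin@internal:password' \
  -H 'Content-Type: application/xml' \
  -X POST -d '<action/>' \
  'https://engine.example.com/ovirt-engine/api/datacenters/<dc-uuid>/storagedomains/<sd-uuid>/activate'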

On 10/4/18 9:50 AM, Oliver Riesener wrote:

Hi Vincent,

OK, your master domain isn't available at the moment, but no panic.

First of all, we need the status of your hosts. No HOSTS -> No Storage!

* Did you reboot them hard, without confirming "Host has been rebooted"?

* Are they activated in the Data Center / Cluster? Green arrow?


On 10/4/18 7:46 AM, Vincent Royer wrote:
I was attempting to migrate from NFS to iSCSI storage domains. I have reached a state where I can no longer activate the old master storage domain, and thus no others will activate either.

I'm ready to give up on the installation and just move to an HCI deployment instead.  Wipe all the hosts clean and start again. 

My plan was to create and use an export domain, then wipe the nodes and set them up as HCI, where I could re-import. But without being able to activate a master domain, I can't create the export domain.

I'm not sure why it can't find the master anymore, as nothing has happened to the NFS storage, but the error in vdsm says it just can't find it:

StoragePoolMasterNotFound: Cannot find master domain: u'spUUID=5a77bed1-0238-030c-0122-0000000003b3, msdUUID=d3165759-07c2-46ae-b7b8-b6226a929d68'
2018-10-03 22:40:33,751-0700 INFO  (jsonrpc/3) [storage.TaskManager.Task] (Task='83f33db5-90f3-4064-87df-0512ab9b6378') aborting: Task is aborted: "Cannot find master domain: u'spUUID=5a77bed1-0238-030c-0122-0000000003b3, msdUUID=d3165759-07c2-46ae-b7b8-b6226a929d68'" - code 304 (task:1181)
2018-10-03 22:40:33,751-0700 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH connectStoragePool error=Cannot find master domain: u'spUUID=5a77bed1-0238-030c-0122-0000000003b3, msdUUID=d3165759-07c2-46ae-b7b8-b6226a929d68' (dispatcher:82)
2018-10-03 22:40:33,751-0700 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call StoragePool.connect failed (error 304) in 0.17 seconds (__init__:573)
2018-10-03 22:40:34,200-0700 INFO  (jsonrpc/1) [api.host] START getStats() from=::ffff:172.16.100.13,39028 (api:46)

When I look in Cockpit on the hosts, the storage domain is mounted and seems fine.
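
One more data point: the domain's own metadata is readable on the host. Roughly what I checked (the mount-point glob depends on the export path):

# for a file domain, ROLE and MASTER_VERSION live in dom_md/metadata
grep -E 'ROLE|MASTER_VERSION' \
  /rhev/data-center/mnt/*/d3165759-07c2-46ae-b7b8-b6226a929d68/dom_md/metadata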



-- 
With kind regards


Oliver Riesener

--
Hochschule Bremen
Elektrotechnik und Informatik
Oliver Riesener
Neustadtswall 30
D-28199 Bremen

Tel: 0421 5905-2405, Fax: -2400
e-mail: oliver.riesener@hs-bremen.de
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OZ3TOO554D4YAZJ6FPGI4SAJ6CKWZRFH/