Somehow my NFS domain became master again. I went into the database and
updated the connections for NFS, and I noticed that once I updated the IP
for the iSCSI entry in the "storage_server_connections" table, the
interface kept moving "(master)" between the iSCSI and NFS domains... very odd.
I ran these commands and now NFS is up:
update storage_server_connections set connection='10.0.0.10:/tank/ovirt/data'
  where id='a89fa66b-8737-4bb8-a089-d9067f61b58a';
update storage_server_connections set connection='10.0.0.10:/tank/ovirt/import_export'
  where id='521a8477-9e88-4f2d-96e2-d3667ec407df';
update storage_server_connections set connection='192.168.202.245:/tank/ovirt/iso'
  where id='fb55cfea-c7ef-49f2-b77f-16ddd2de0f7a';
update storage_server_connections set connection='10.0.0.10'
  where id='d6da7fbf-5056-44a7-9fc8-e76a1ff9f525';
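Before running updates like those, a quick sanity check on the values might
catch a mix-up. This is only a sketch; the format rules are my assumption
from how the entries above look (NFS connections as "host:/export", iSCSI
connections as a bare host or IP, with the IQN kept in a separate column):

```python
# Sketch: classify storage_server_connections values before updating.
# Assumption (inferred from the rows above, not from oVirt docs):
#   NFS   -> "host:/export/path"
#   iSCSI -> bare host or IP (the IQN lives in another column)
import re

def connection_kind(value):
    """Classify a connection string as 'nfs', 'iscsi', or 'unknown'."""
    if re.fullmatch(r"[\w.\-]+:/\S+", value):
        return "nfs"
    if re.fullmatch(r"[\w.\-]+", value):
        return "iscsi"
    return "unknown"

# The values from the updates above:
for conn in ["10.0.0.10:/tank/ovirt/data",
             "10.0.0.10:/tank/ovirt/import_export",
             "192.168.202.245:/tank/ovirt/iso",
             "10.0.0.10"]:
    print(conn, "->", connection_kind(conn))
```

If one of the four rows classifies differently than expected, that is the
row worth double-checking before committing.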
Once I activated the NFS master domain, all my other domains went active,
including iSCSI.
My concern now is whether the iSCSI domain is usable. The API path at
"/api/storagedomains/4eeb8415-c912-44bf-b482-2673849705c9/storageconnections"
shows only an empty element:
<storage_connections/>
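For scripting that check, something like this can tell an empty
<storage_connections/> element apart from one with entries (the sample XML
is the response pasted above; fetching it with your own URL and credentials
is left out):

```python
# Count the storage connections recorded for a domain in a REST response.
# The sample below is the (empty) response pasted above.
import xml.etree.ElementTree as ET

def connection_count(xml_text):
    root = ET.fromstring(xml_text)
    # Each child element of <storage_connections> is one connection entry.
    return len(list(root))

sample = "<storage_connections/>"
print(connection_count(sample))  # 0 means no connections recorded
```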
If I go to edit the iSCSI domain and check the LUN, the warning I get is:
This operation might be unrecoverable and destructive!
The following LUNs are already in use:
- 1IET_00010001 (Used by VG: 3nxXNr-bIHu-9YS5-Kfzc-A2Na-sMhb-jihwdt)
That alone makes me very hesitant to approve the operation. I could use
some wisdom on whether this is safe or not.
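One thing that might settle it is confirming whether the VG named in the
warning is the existing storage domain's VG (in which case "in use" is
expected, and re-initializing the LUN would destroy it). A sketch of that
cross-check; the vgs output below is hypothetical, on a real host it would
come from `vgs --noheadings -o vg_name,vg_uuid`:

```python
# Sketch: match the VG UUID from the GUI warning against vgs output.
# If the VG name equals the storage domain UUID, the LUN already holds
# the domain, and formatting it would be destructive.
warning_vg_uuid = "3nxXNr-bIHu-9YS5-Kfzc-A2Na-sMhb-jihwdt"

# Hypothetical sample; generate the real thing with:
#   vgs --noheadings -o vg_name,vg_uuid
sample_vgs_output = """\
  4eeb8415-c912-44bf-b482-2673849705c9 3nxXNr-bIHu-9YS5-Kfzc-A2Na-sMhb-jihwdt
  vg_ovirtnode02                       AAAAAA-xxxx-xxxx-xxxx-xxxx-xxxx-xxxxxx
"""

def vg_name_for_uuid(vgs_output, uuid):
    for line in vgs_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == uuid:
            return fields[0]
    return None

print(vg_name_for_uuid(sample_vgs_output, warning_vg_uuid))
```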
Thanks,
- Trey
On Tue, Oct 21, 2014 at 3:17 PM, Trey Dockendorf <treydock(a)gmail.com> wrote:
John,
Thanks for the reply. The Discover function in the GUI works... it's once I
try to log in (clicking the arrow next to the target) that things just hang
indefinitely.
# iscsiadm -m session
tcp: [2] 10.0.0.10:3260,1
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi
# iscsiadm -m node
10.0.0.10:3260,1 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi
# multipath -ll
1IET_00010001 dm-3 IET,VIRTUAL-DISK
size=500G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 8:0:0:1 sdd 8:48 active ready running
1ATA_WDC_WD5003ABYZ-011FA0_WD-WMAYP0DNSAEZ dm-2 ATA,WDC WD5003ABYZ-0
size=466G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 3:0:0:0 sdc 8:32 active ready running
The first entry, 1IET_00010001, is the iSCSI LUN.
The vdsm log when I click the arrow next to the target in the interface is:
Thread-14::DEBUG::2014-10-21
15:12:49,900::BindingXMLRPC::251::vds::(wrapper) client [192.168.202.99]
flowID [7177dafe]
Thread-14::DEBUG::2014-10-21
15:12:49,901::task::595::TaskManager.Task::(_updateState)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state init ->
state preparing
Thread-14::INFO::2014-10-21
15:12:49,901::logUtils::44::dispatcher::(wrapper) Run and protect:
connectStorageServer(domType=3,
spUUID='00000000-0000-0000-0000-000000000000', conList=[{'connection':
'10.0.0.10', 'iqn': 'iqn.2014-04.edu.tamu.brazos.)
Thread-14::DEBUG::2014-10-21
15:12:49,902::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo
-n /sbin/iscsiadm -m node -T
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
10.0.0.10:3260,1 --op=new' (cwd None)
Thread-14::DEBUG::2014-10-21
15:12:56,684::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> =
''; <rc> = 0
Thread-14::DEBUG::2014-10-21
15:12:56,685::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo
-n /sbin/iscsiadm -m node -T
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
10.0.0.10:3260,1 -l' (cwd None)
Thread-14::DEBUG::2014-10-21
15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> =
''; <rc> = 0
Thread-14::DEBUG::2014-10-21
15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo
-n /sbin/iscsiadm -m node -T
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
10.0.0.10:3260,1 -n node.startup -v manual --op)
Thread-14::DEBUG::2014-10-21
15:12:56,767::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> =
''; <rc> = 0
Thread-14::DEBUG::2014-10-21
15:12:56,767::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload
operation' got the operation mutex
Thread-14::DEBUG::2014-10-21
15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
/sbin/lvm vgs --config " devices { preferred_names =
[\\"^/dev/mapper/\\"]
ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3)
Thread-14::DEBUG::2014-10-21
15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' No
volume groups found\n'; <rc> = 0
Thread-14::DEBUG::2014-10-21
15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload
operation' released the operation mutex
Thread-14::DEBUG::2014-10-21
15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD uuids: ()
Thread-14::DEBUG::2014-10-21
15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs: {}
Thread-14::INFO::2014-10-21
15:12:56,974::logUtils::47::dispatcher::(wrapper) Run and protect:
connectStorageServer, Return response: {'statuslist': [{'status': 0,
'id':
'00000000-0000-0000-0000-000000000000'}]}
Thread-14::DEBUG::2014-10-21
15:12:56,974::task::1185::TaskManager.Task::(prepare)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::finished: {'statuslist':
[{'status': 0, 'id': '00000000-0000-0000-0000-000000000000'}]}
Thread-14::DEBUG::2014-10-21
15:12:56,975::task::595::TaskManager.Task::(_updateState)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state preparing ->
state finished
Thread-14::DEBUG::2014-10-21
15:12:56,975::resourceManager::940::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-14::DEBUG::2014-10-21
15:12:56,975::resourceManager::977::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-14::DEBUG::2014-10-21
15:12:56,975::task::990::TaskManager.Task::(_decref)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::ref 0 aborting False
Thread-13::DEBUG::2014-10-21
15:13:18,281::task::595::TaskManager.Task::(_updateState)
Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::moving from state init ->
state preparing
Thread-13::INFO::2014-10-21
15:13:18,281::logUtils::44::dispatcher::(wrapper) Run and protect:
repoStats(options=None)
Thread-13::INFO::2014-10-21
15:13:18,282::logUtils::47::dispatcher::(wrapper) Run and protect:
repoStats, Return response: {}
Thread-13::DEBUG::2014-10-21
15:13:18,282::task::1185::TaskManager.Task::(prepare)
Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::finished: {}
Thread-13::DEBUG::2014-10-21
15:13:18,282::task::595::TaskManager.Task::(_updateState)
Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::moving from state preparing ->
state finished
Thread-13::DEBUG::2014-10-21
15:13:18,282::resourceManager::940::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-13::DEBUG::2014-10-21
15:13:18,282::resourceManager::977::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-13::DEBUG::2014-10-21
15:13:18,283::task::990::TaskManager.Task::(_decref)
Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::ref 0 aborting False
The lines prefixed with "Thread-13" just repeat over and over, with only
the Task value changing.
I'm unsure what can be done to restore things. The iSCSI connection is good
and I'm able to see the logical volumes:
# lvscan
  ACTIVE   '/dev/4eeb8415-c912-44bf-b482-2673849705c9/metadata' [512.00 MiB] inherit
  ACTIVE   '/dev/4eeb8415-c912-44bf-b482-2673849705c9/leases' [2.00 GiB] inherit
  ACTIVE   '/dev/4eeb8415-c912-44bf-b482-2673849705c9/ids' [128.00 MiB] inherit
  ACTIVE   '/dev/4eeb8415-c912-44bf-b482-2673849705c9/inbox' [128.00 MiB] inherit
  ACTIVE   '/dev/4eeb8415-c912-44bf-b482-2673849705c9/outbox' [128.00 MiB] inherit
  ACTIVE   '/dev/4eeb8415-c912-44bf-b482-2673849705c9/master' [1.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/aced9726-5a28-4d52-96f5-89553ba770af' [100.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/87bf28aa-be25-4a93-9b23-f70bfd8accc0' [1.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/27256587-bf87-4519-89e7-260e13697de3' [20.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/ac2cb7f9-1df9-43dc-9fda-8a9958ef970f' [20.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/d8c41f05-006a-492b-8e5f-101c4e113b28' [100.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/83f17e9b-183e-4bad-ada5-bcef1c5c8e6a' [20.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/cf79052e-b4ef-4bda-96dc-c53b7c2acfb5' [20.00 GiB] inherit
  ACTIVE   '/dev/vg_ovirtnode02/lv_swap' [46.59 GiB] inherit
  ACTIVE   '/dev/vg_ovirtnode02/lv_root' [418.53 GiB] inherit
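In case it helps anyone scripting a look at this, here's a sketch that pulls
the inactive LVs out of lvscan output and prints the lvchange commands that
would activate them. Normally vdsm activates image LVs itself, so treat this
as a diagnostic aid rather than a recommended fix:

```python
# Sketch: list inactive LVs from `lvscan` output and print the lvchange
# commands that would activate them. vdsm normally manages activation of
# image LVs itself, so this is for inspection, not a recommended fix.
import re

def inactive_lvs(lvscan_output):
    # lvscan lines look like:
    #   inactive '/dev/<vg>/<lv>' [20.00 GiB] inherit
    # (\s matches newlines too, so email-wrapped output still parses)
    return re.findall(r"inactive\s+'([^']+)'", lvscan_output)

# Trimmed sample from the lvscan output above:
sample = """\
  ACTIVE   '/dev/4eeb8415-c912-44bf-b482-2673849705c9/metadata' [512.00 MiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/aced9726-5a28-4d52-96f5-89553ba770af' [100.00 GiB] inherit
  inactive '/dev/4eeb8415-c912-44bf-b482-2673849705c9/27256587-bf87-4519-89e7-260e13697de3' [20.00 GiB] inherit
"""

for lv in inactive_lvs(sample):
    print("lvchange -ay", lv)
```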
Thanks,
- Trey
On Tue, Oct 21, 2014 at 2:49 PM, Sandra Taylor <jtt77777(a)gmail.com> wrote:
> Hi Trey,
> Sorry for your trouble.
> Don't know if I can help but I run iscsi here as my primary domain so
> I've had some experience with it.
> I don't know the answer to the master domain question.
>
> Does iscsi show connected using iscsiadm -m session and -m node?
> In the vdsm log there should be the iscsiadm commands that were
> executed to connect.
> Does multipath -ll show anything?
>
> -John
>
> On Tue, Oct 21, 2014 at 3:18 PM, Trey Dockendorf <treydock(a)gmail.com>
> wrote:
> > I was able to get iSCSI over TCP working... but now the task of adding
> > the LUN in the GUI has been stuck at the "spinning" icon for about 20
> > minutes.
> >
> > I see these entries in vdsm.log over and over with the Task value
> > changing:
> >
> > Thread-14::DEBUG::2014-10-21
> > 14:16:50,086::task::595::TaskManager.Task::(_updateState)
> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state init ->
> > state preparing
> > Thread-14::INFO::2014-10-21
> > 14:16:50,086::logUtils::44::dispatcher::(wrapper) Run and protect:
> > repoStats(options=None)
> > Thread-14::INFO::2014-10-21
> > 14:16:50,086::logUtils::47::dispatcher::(wrapper) Run and protect:
> > repoStats, Return response: {}
> > Thread-14::DEBUG::2014-10-21
> > 14:16:50,087::task::1185::TaskManager.Task::(prepare)
> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::finished: {}
> > Thread-14::DEBUG::2014-10-21
> > 14:16:50,087::task::595::TaskManager.Task::(_updateState)
> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state
> > preparing -> state finished
> > Thread-14::DEBUG::2014-10-21
> > 14:16:50,087::resourceManager::940::ResourceManager.Owner::(releaseAll)
> > Owner.releaseAll requests {} resources {}
> > Thread-14::DEBUG::2014-10-21
> > 14:16:50,087::resourceManager::977::ResourceManager.Owner::(cancelAll)
> > Owner.cancelAll requests {}
> > Thread-14::DEBUG::2014-10-21
> > 14:16:50,087::task::990::TaskManager.Task::(_decref)
> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::ref 0 aborting False
> >
> > What can I do to get my storage back online? Right now my iSCSI is
> > master (something I did not want), which is odd considering the NFS data
> > domain was added as master when I set up oVirt. Nothing will come back
> > until I get the master domain online, and I'm unsure what to do now.
> >
> > Thanks,
> > - Trey
> >
> > On Tue, Oct 21, 2014 at 12:58 PM, Trey Dockendorf <treydock(a)gmail.com>
> > wrote:
> >>
> >> I had a catastrophic failure of the IB switch that was used by all my
> >> storage domains. I had one data domain that was NFS and one that was
> >> iSCSI. I managed to get the iSCSI LUN detached using the docs [1], but
> >> now I noticed that somehow my master domain went from the NFS domain to
> >> the iSCSI domain and I'm unable to switch them back.
> >>
> >> How does one change the master? Right now I am having issues getting
> >> iSCSI over TCP to work, so I'm sort of stuck with 30 VMs down and an
> >> entire cluster inaccessible.
> >>
> >> Thanks,
> >> - Trey
> >>
> >> [1] http://www.ovirt.org/Features/Manage_Storage_Connections
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
>