[ovirt-users] Changing iSCSI LUN host IP and changing master domain

Trey Dockendorf treydock at gmail.com
Tue Oct 21 18:14:19 EDT 2014


John,

Thanks again for the reply.  Yes, the API at the path you mentioned shows
the domain.  This must have been a bug, as things began working after I
changed values in the database.  Somehow, setting the new IP for the storage
connection in the database for both NFS and iSCSI resulted in the NFS
domain becoming master again, and once NFS (master) was active the iSCSI
domain "magically" went active too.  I don't pretend to know how this
happened, and even my boss laughed when I shrugged at the question "how did
you fix it?".  I'd be glad to supply the devs with whatever information I
can, but I can't change much now, as the goal of today was to get back
online and that's been achieved.
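
For anyone repeating the API check, here is a minimal sketch of how the
storage domain URL can be put together.  The engine host name
(engine.example.com) and the helper name storage_domain_url are assumptions
made up for illustration; only the /api/storagedomains/<id> path and the
domain ID come from this thread.  It only builds the URL; fetching it still
needs engine credentials.

```python
from urllib.parse import quote


def storage_domain_url(engine_base_url, domain_id):
    """Build the REST URL for one storage domain.

    engine_base_url is a hypothetical engine address; the
    /api/storagedomains/<id> path is the one quoted in the thread.
    """
    base = engine_base_url.rstrip("/")
    return "{}/api/storagedomains/{}".format(base, quote(domain_id, safe=""))


if __name__ == "__main__":
    # Domain ID taken from the lvscan output quoted below.
    print(storage_domain_url("https://engine.example.com",
                             "4eeb8415-c912-44bf-b482-2673849705c9"))
```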

One thing I did that may have kept iSCSI from coming back: once I lost the
IB fabric, in order to disconnect the iSCSI that was running over iSER, I
issued "vgchange -an <domain ID>" and then logged out of the iSCSI session
on each oVirt node.  One of my hosts would not re-activate once everything
was back online; running "vgchange -ay <domain ID>" and then removing the
host from maintenance worked.  Since I had to switch from one network to
another and from iSER to iSCSI over TCP, I wanted all active connections
closed, and the only way I could make the block devices disconnect cleanly
was to disable the volume group on the LUN.
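
The order of operations described above can be sketched as a dry run that
only prints the commands.  This is an illustration under assumptions, not
what was actually typed: the function names are hypothetical, the IQN and
portal are the ones from the later TCP session quoted below (the original
iSER portal is not shown in this thread), and "iscsiadm ... -u" is used for
the logout step.

```python
import shlex


def disconnect_commands(vg_name, target_iqn, portal):
    """Deactivate the domain's volume group first, then log out of iSCSI."""
    return [
        ["vgchange", "-an", vg_name],
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "-u"],
    ]


def reactivate_commands(vg_name):
    """Re-activate the volume group on a host that will not come back."""
    return [["vgchange", "-ay", vg_name]]


if __name__ == "__main__":
    # Illustrative values only; the VG name is the storage domain ID.
    vg = "4eeb8415-c912-44bf-b482-2673849705c9"
    iqn = "iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi"
    portal = "10.0.0.10:3260"
    for cmd in disconnect_commands(vg, iqn, portal) + reactivate_commands(vg):
        # Print instead of executing, so nothing is changed on the host.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Nothing is executed here; the commands would have to be run as root on each
node, and the host taken out of maintenance afterwards.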

Thanks,
- Trey

On Tue, Oct 21, 2014 at 4:06 PM, Sandra Taylor <jtt77777 at gmail.com> wrote:

> Trey,
> The thread that keeps repeating is the call to repoStats. I believe
> it's part of the storage monitoring and in my environment it repeats
> every 15 seconds
> Mine looks like
> Thread-168::INFO::2014-10-21
> 15:02:42,616::logUtils::44::dispatcher::(wrapper) Run and protect:
> repoStats(options=None)
> Thread-168::INFO::2014-10-21
> 15:02:42,617::logUtils::47::dispatcher::(wrapper) Run and protect:
> repoStats, Return response: {'86f0a388-dc9d-4e44-a599-b3f2c9e58922':
> {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.00066814',
> 'lastCheck': '1.8', 'valid': True}}
>
> but yours isn't returning anything; the response is just: {}
>
> But I think that the problem is that the hsm isn't finding volume
> groups in its call to lvm vgs, and thus no storage domains (see "No
> volume groups found" and "Found SD uuids: ()" below)
>
> Thread-14::DEBUG::2014-10-21
> 15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
> /sbin/lvm vgs --config " devices { preferred_names =
> [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0
> disable_after_error_count=3)
> Thread-14::DEBUG::2014-10-21
> 15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
> No volume groups found\n'; <rc> = 0
> Thread-14::DEBUG::2014-10-21
> 15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm
> reload operation' released the operation mutex
> Thread-14::DEBUG::2014-10-21
> 15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD
> uuids: ()
> Thread-14::DEBUG::2014-10-21
> 15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs:
> {}
>
> But I don't really know how that's possible considering you show what
> looks to be a domain in the lvscan.
> The only thing that comes to mind is that there was a bug in some of
> the iscsi initiator tools where there was an error returned if a
> session was already logged in but that doesn't look to be the case by
> the logs. Or maybe something like lvmetad caching but vdsm uses its
> own config to turn lvmetad off  (at /var/run/vdsm/lvm I think)
>
> Does the storage domain with that id exist ?
> It should be seen at
> /api/storagedomains/4eeb8415-c912-44bf-b482-2673849705c9
>
> -John
>
>
>
> On Tue, Oct 21, 2014 at 4:17 PM, Trey Dockendorf <treydock at gmail.com>
> wrote:
> > John,
> >
> > Thanks for the reply.  The Discover function in the GUI works...it's once I
> > try to log in (click the arrow next to the target) that things just hang
> > indefinitely.
> >
> > # iscsiadm -m session
> > tcp: [2] 10.0.0.10:3260,1
> > iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi
> >
> > # iscsiadm -m node
> > 10.0.0.10:3260,1 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi
> >
> > # multipath -ll
> > 1IET_00010001 dm-3 IET,VIRTUAL-DISK
> > size=500G features='0' hwhandler='0' wp=rw
> > `-+- policy='round-robin 0' prio=1 status=active
> >   `- 8:0:0:1 sdd 8:48 active ready running
> > 1ATA_WDC_WD5003ABYZ-011FA0_WD-WMAYP0DNSAEZ dm-2 ATA,WDC WD5003ABYZ-0
> > size=466G features='0' hwhandler='0' wp=rw
> > `-+- policy='round-robin 0' prio=1 status=active
> >   `- 3:0:0:0 sdc 8:32 active ready running
> >
> > The first entry, 1IET_00010001 is the iSCSI LUN.
> >
> > The log when I click the arrow in the interface for the target is this:
> >
> > Thread-14::DEBUG::2014-10-21
> > 15:12:49,900::BindingXMLRPC::251::vds::(wrapper) client [192.168.202.99]
> > flowID [7177dafe]
> > Thread-14::DEBUG::2014-10-21
> > 15:12:49,901::task::595::TaskManager.Task::(_updateState)
> > Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state init ->
> > state preparing
> > Thread-14::INFO::2014-10-21
> > 15:12:49,901::logUtils::44::dispatcher::(wrapper) Run and protect:
> > connectStorageServer(domType=3,
> > spUUID='00000000-0000-0000-0000-000000000000', conList=[{'connection':
> > '10.0.0.10', 'iqn': 'iqn.2014-04.edu.tamu.brazos.)
> > Thread-14::DEBUG::2014-10-21
> > 15:12:49,902::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n
> > /sbin/iscsiadm -m node -T
> > iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
> > 10.0.0.10:3260,1 --op=new' (cwd None)
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,684::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> =
> > ''; <rc> = 0
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,685::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n
> > /sbin/iscsiadm -m node -T
> > iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
> > 10.0.0.10:3260,1 -l' (cwd None)
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> =
> > ''; <rc> = 0
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n
> > /sbin/iscsiadm -m node -T
> > iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
> > 10.0.0.10:3260,1 -n node.startup -v manual --op)
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,767::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> =
> > ''; <rc> = 0
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,767::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload
> > operation' got the operation mutex
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
> > /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"]
> > ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3)
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  No
> > volume groups found\n'; <rc> = 0
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload
> > operation' released the operation mutex
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD uuids: ()
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs: {}
> > Thread-14::INFO::2014-10-21
> > 15:12:56,974::logUtils::47::dispatcher::(wrapper) Run and protect:
> > connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id':
> > '00000000-0000-0000-0000-000000000000'}]}
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,974::task::1185::TaskManager.Task::(prepare)
> > Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::finished: {'statuslist':
> > [{'status': 0, 'id': '00000000-0000-0000-0000-000000000000'}]}
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,975::task::595::TaskManager.Task::(_updateState)
> > Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state preparing ->
> > state finished
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,975::resourceManager::940::ResourceManager.Owner::(releaseAll)
> > Owner.releaseAll requests {} resources {}
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,975::resourceManager::977::ResourceManager.Owner::(cancelAll)
> > Owner.cancelAll requests {}
> > Thread-14::DEBUG::2014-10-21
> > 15:12:56,975::task::990::TaskManager.Task::(_decref)
> > Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::ref 0 aborting False
> > Thread-13::DEBUG::2014-10-21
> > 15:13:18,281::task::595::TaskManager.Task::(_updateState)
> > Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::moving from state init ->
> > state preparing
> > Thread-13::INFO::2014-10-21
> > 15:13:18,281::logUtils::44::dispatcher::(wrapper) Run and protect:
> > repoStats(options=None)
> > Thread-13::INFO::2014-10-21
> > 15:13:18,282::logUtils::47::dispatcher::(wrapper) Run and protect:
> > repoStats, Return response: {}
> > Thread-13::DEBUG::2014-10-21
> > 15:13:18,282::task::1185::TaskManager.Task::(prepare)
> > Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::finished: {}
> > Thread-13::DEBUG::2014-10-21
> > 15:13:18,282::task::595::TaskManager.Task::(_updateState)
> > Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::moving from state preparing ->
> > state finished
> > Thread-13::DEBUG::2014-10-21
> > 15:13:18,282::resourceManager::940::ResourceManager.Owner::(releaseAll)
> > Owner.releaseAll requests {} resources {}
> > Thread-13::DEBUG::2014-10-21
> > 15:13:18,282::resourceManager::977::ResourceManager.Owner::(cancelAll)
> > Owner.cancelAll requests {}
> > Thread-13::DEBUG::2014-10-21
> > 15:13:18,283::task::990::TaskManager.Task::(_decref)
> > Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::ref 0 aborting False
> >
> > The lines prefixed with "Thread-13" just repeat over and over, only changing
> > the Task value.
> >
> > Unsure what could be done to restore things.  The iscsi connection is good
> > and I'm able to see the logical volumes:
> >
> > # lvscan
> >   ACTIVE            '/dev/4eeb8415-c912-44bf-b482-2673849705c9/metadata' [512.00 MiB] inherit
> >   ACTIVE            '/dev/4eeb8415-c912-44bf-b482-2673849705c9/leases' [2.00 GiB] inherit
> >   ACTIVE            '/dev/4eeb8415-c912-44bf-b482-2673849705c9/ids' [128.00 MiB] inherit
> >   ACTIVE            '/dev/4eeb8415-c912-44bf-b482-2673849705c9/inbox' [128.00 MiB] inherit
> >   ACTIVE            '/dev/4eeb8415-c912-44bf-b482-2673849705c9/outbox' [128.00 MiB] inherit
> >   ACTIVE            '/dev/4eeb8415-c912-44bf-b482-2673849705c9/master' [1.00 GiB] inherit
> >   inactive          '/dev/4eeb8415-c912-44bf-b482-2673849705c9/aced9726-5a28-4d52-96f5-89553ba770af' [100.00 GiB] inherit
> >   inactive          '/dev/4eeb8415-c912-44bf-b482-2673849705c9/87bf28aa-be25-4a93-9b23-f70bfd8accc0' [1.00 GiB] inherit
> >   inactive          '/dev/4eeb8415-c912-44bf-b482-2673849705c9/27256587-bf87-4519-89e7-260e13697de3' [20.00 GiB] inherit
> >   inactive          '/dev/4eeb8415-c912-44bf-b482-2673849705c9/ac2cb7f9-1df9-43dc-9fda-8a9958ef970f' [20.00 GiB] inherit
> >   inactive          '/dev/4eeb8415-c912-44bf-b482-2673849705c9/d8c41f05-006a-492b-8e5f-101c4e113b28' [100.00 GiB] inherit
> >   inactive          '/dev/4eeb8415-c912-44bf-b482-2673849705c9/83f17e9b-183e-4bad-ada5-bcef1c5c8e6a' [20.00 GiB] inherit
> >   inactive          '/dev/4eeb8415-c912-44bf-b482-2673849705c9/cf79052e-b4ef-4bda-96dc-c53b7c2acfb5' [20.00 GiB] inherit
> >   ACTIVE            '/dev/vg_ovirtnode02/lv_swap' [46.59 GiB] inherit
> >   ACTIVE            '/dev/vg_ovirtnode02/lv_root' [418.53 GiB] inherit
> >
> > Thanks,
> > - Trey
> >
> >
> >
> > On Tue, Oct 21, 2014 at 2:49 PM, Sandra Taylor <jtt77777 at gmail.com> wrote:
> >>
> >> Hi Trey,
> >> Sorry for your trouble.
> >> Don't know if I can help but I run iscsi here as my primary domain so
> >> I've had some experience with it.
> >> I don't know the answer to the master domain question.
> >>
> >> Does iscsi show connected  using iscsiadm -m session and   -m node  ?
> >> in the vdsm log there should be the iscsiadm commands that were
> >> executed to connect.
> >> Does multipath -ll show anything?
> >>
> >> -John
> >>
> >> On Tue, Oct 21, 2014 at 3:18 PM, Trey Dockendorf <treydock at gmail.com>
> >> wrote:
> >> > I was able to get iSCSI over TCP working...but now the task of adding the
> >> > LUN to the GUI has been stuck at the "spinning" icon for about 20 minutes.
> >> >
> >> > I see these entries in vdsm.log over and over with the Task value changing:
> >> >
> >> > Thread-14::DEBUG::2014-10-21
> >> > 14:16:50,086::task::595::TaskManager.Task::(_updateState)
> >> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state init ->
> >> > state
> >> > preparing
> >> > Thread-14::INFO::2014-10-21
> >> > 14:16:50,086::logUtils::44::dispatcher::(wrapper) Run and protect:
> >> > repoStats(options=None)
> >> > Thread-14::INFO::2014-10-21
> >> > 14:16:50,086::logUtils::47::dispatcher::(wrapper) Run and protect:
> >> > repoStats, Return response: {}
> >> > Thread-14::DEBUG::2014-10-21
> >> > 14:16:50,087::task::1185::TaskManager.Task::(prepare)
> >> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::finished: {}
> >> > Thread-14::DEBUG::2014-10-21
> >> > 14:16:50,087::task::595::TaskManager.Task::(_updateState)
> >> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state preparing ->
> >> > state finished
> >> > Thread-14::DEBUG::2014-10-21
> >> > 14:16:50,087::resourceManager::940::ResourceManager.Owner::(releaseAll)
> >> > Owner.releaseAll requests {} resources {}
> >> > Thread-14::DEBUG::2014-10-21
> >> > 14:16:50,087::resourceManager::977::ResourceManager.Owner::(cancelAll)
> >> > Owner.cancelAll requests {}
> >> > Thread-14::DEBUG::2014-10-21
> >> > 14:16:50,087::task::990::TaskManager.Task::(_decref)
> >> > Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::ref 0 aborting False
> >> >
> >> > What can I do to get my storage back online?  Right now my iSCSI is
> >> > master (something I did not want), which is odd considering the NFS data
> >> > domain was added as master when I set up oVirt.  Nothing will come back until
> >> > I get the master domain online, and I'm unsure what to do now.
> >> >
> >> > Thanks,
> >> > - Trey
> >> >
> >> > On Tue, Oct 21, 2014 at 12:58 PM, Trey Dockendorf <treydock at gmail.com>
> >> > wrote:
> >> >>
> >> >> I had a catastrophic failure of the IB switch that was used by all my
> >> >> storage domains.  I had one data domain that was NFS and one that was iSCSI.
> >> >> I managed to get the iSCSI LUN detached using the docs [1], but now I noticed
> >> >> that somehow my master domain went from the NFS domain to the iSCSI domain
> >> >> and I'm unable to switch them back.
> >> >>
> >> >> How does one change the master?  Right now I am having issues getting
> >> >> iSCSI over TCP to work, so am sort of stuck with 30 VMs down and an entire
> >> >> cluster inaccessible.
> >> >>
> >> >> Thanks,
> >> >> - Trey
> >> >>
> >> >> [1] http://www.ovirt.org/Features/Manage_Storage_Connections
> >> >
> >> >
> >> >
> >
> >
>