Unfortunately, I came under heavy pressure to get this VM back up, so I
did more Googling and attempted the recovery myself. I've gotten closer,
but I'm still not quite there.
I found this post:
http://lists.ovirt.org/pipermail/users/2015-November/035686.html
which pointed me to the unlock tool, and that successfully unlocked the
disk. Unfortunately, it did not delete the task, nor did oVirt do so on
its own after the disk was unlocked.
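For reference, this is roughly what I ran on the engine host (the dbutils
script from that thread; exact flags can differ by oVirt version, so check
-h first):

    # on the engine host; flags may vary by version (see ./unlock_entity.sh -h)
    cd /usr/share/ovirt-engine/setup/dbutils
    ./unlock_entity.sh -q -t all            # list locked entities
    ./unlock_entity.sh -t disk <disk-uuid>  # unlock the stuck disk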
So I found taskcleaner.sh in the same directory and attempted to clean
the task out... except it doesn't seem to see the task (neither the
show-tasks options nor the delete-all options seemed to work). I did still
have the task UUID from the GUI, so I attempted to use that, but all I got
back was a "t" on one line and a "0" on the next, so I have no idea what
that was supposed to mean. In any case, the web UI still shows the task,
still won't let me start the VM, and appears convinced it's still copying.
I've tried restarting the engine and vdsm on the SPM; neither has helped.
I can't find any evidence of the task on the command line; only in the UI.
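For completeness, here's approximately what I tried; I'm not sure I had
the taskcleaner flags right for my version (check -h), and the database
query is my own guess at where the task would live:

    cd /usr/share/ovirt-engine/setup/dbutils
    ./taskcleaner.sh -h              # list the flags this version supports
    ./taskcleaner.sh -t <task-uuid>  # attempt removal by the UUID from the GUI

    # my guess: see if the engine DB still has any record of the task
    su - postgres -c "psql engine -c 'select task_id, action_type, status from async_tasks;'"

    # ask vdsm directly on the SPM host
    vdsm-client Host getAllTasksInfo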
I'd create a new VM if I could rescue the image, but I'm not sure I can
manage to get this image accepted by another VM.
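Would it even be safe to inspect the image directly on storage? I was
thinking of something like this (the path layout and IDs below are my
best guess / placeholders):

    # on a host with the storage domain mounted; all IDs are illustrative
    qemu-img info  /rhev/data-center/mnt/<mount>/<sd-uuid>/images/<disk-uuid>/<vol-uuid>
    qemu-img check /rhev/data-center/mnt/<mount>/<sd-uuid>/images/<disk-uuid>/<vol-uuid>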
How do I recover now?
--Jim
On Mon, Mar 19, 2018 at 9:38 AM, Jim Kusznir <jim(a)palousetech.com> wrote:
Hi all:
Sorry for yet another semi-related message to the list. In my attempts to
troubleshoot and verify some suspicions about the nature of the
performance problems I posted under "Major Performance Issues with
gluster", I attempted to move one of my problem VMs back to the original
(SSD-backed) storage. It appeared to be moving fine, but it froze at 84%
last night. This morning (8 hours later), it's still at 84%.
I need to get that VM back up and running, but I don't know how... it
seems to be stuck in limbo.
The only other thing I explicitly did last night that may have caused an
issue was finally setting up and activating georep (geo-replication) to an
offsite backup machine. That too seems to have gone a bit wonky. On the
oVirt server side, it shows as normal: all volumes except data-hdd show a
last-synced time of 3am (which matches my bandwidth graphs for the WAN
connections involved). data-hdd (the new disk-backed storage volume with
most of my data in it) shows "not yet synced", but I'm also no longer
seeing any bandwidth usage.
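For the record, I've been checking the session state from the master side
like this (the slave host and volume names are placeholders):

    gluster volume geo-replication status
    gluster volume geo-replication data-hdd <backup-host>::<slave-vol> status detail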
I logged into the georep destination box and found the system load a bit
high, a bunch of gluster and rsync processes running, and both data and
data-hdd using MORE disk space than the original (data-hdd using 4x more
disk space than is on the master node). Not sure what to do about this; I
paused the replication from the cluster, but that doesn't seem to have had
any effect on the georep destination.
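I did the pause with something like the following (slave names are
placeholders again); resume should restart it:

    gluster volume geo-replication data-hdd <backup-host>::<slave-vol> pause
    gluster volume geo-replication data-hdd <backup-host>::<slave-vol> resume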
I promise I'll stop trying things until I get guidance from the list!
Please do help; I need the VM HDD unstuck so I can start it.
Thanks!
--Jim