[ovirt-users] Hung task finalizing live migration

Gervais de Montbrun gervais at demontbrun.com
Sat Sep 10 14:39:02 UTC 2016


Hi Maton,

I have also seen tasks in a weird state on my cluster. I've had a VM get "stuck" during a migration: the web GUI says "migrating to", but the migration actually finished hours ago. If I click "Cancel Migration", the GUI tells me that the VM is not migrating, yet I can't perform any action on the VM because I am then told that it can't be acted upon while it is migrating. I have also tried to kill the task, but none are listed.

What has worked for me is to put my hosted engine into global maintenance mode, then ssh into the hosted engine and run the "engine-setup" command. I am not saying this is the best course of action, but when the engine comes back online the task is cleared.
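
Roughly, that sequence could be scripted as in the sketch below. Treat it as an illustration rather than a tested procedure: the engine hostname is a placeholder, the run() dry-run wrapper is only there so the commands can be reviewed before anything is executed, and the final step of leaving global maintenance is my assumption about what you would want once engine-setup has finished.

```shell
#!/bin/sh
# Sketch of the workaround: global maintenance, then engine-setup on the engine VM.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical hostname of the hosted-engine VM; replace with your own.
ENGINE_HOST="${ENGINE_HOST:-engine.example.com}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clear_stuck_task() {
    # Run on one of the hosted-engine hosts: stop the HA agents from
    # restarting or migrating the engine VM while it is being worked on.
    run hosted-engine --set-maintenance --mode=global
    # engine-setup is interactive, so ask ssh for a terminal with -t.
    run ssh -t "root@$ENGINE_HOST" engine-setup
    # Assumption: leave global maintenance again once the engine is back up.
    run hosted-engine --set-maintenance --mode=none
}
```

Calling clear_stuck_task with the defaults just prints the three commands; set DRY_RUN=0 (and ENGINE_HOST) only once you are happy with what it would do.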

Cheers,
Gervais



> On Sep 10, 2016, at 11:06 AM, Maton, Brett <matonb at ltresources.co.uk> wrote:
> 
> Does anyone know how to fix this broken task?
> 
> It has persisted through a reboot of all hosts and the engine; something needs deleting from the database to clear the task and release the locked disk.
> 
> On 8 September 2016 at 13:25, Maton, Brett <matonb at ltresources.co.uk> wrote:
> Thanks for the pointer Mikhail, however I don't get any tasks listed with that command:
> 
> vdsClient -s 0 getAllTasksStatuses
> 
> /usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
>   from vdsm import utils, vdscli, constants
> 
> {'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {}}
> 
> 
> On 8 September 2016 at 09:51, Краснобаев Михаил <milo1 at ya.ru> wrote:
> Hi,
>  
> There is a way to cancel a running task; look here: http://lists.ovirt.org/pipermail/users/2014-November/028946.html
> I was able to stop snapshot deletion this way.
>  
> Best, Mikhail.
>  
> 08.09.2016, 08:14, "Maton, Brett" <matonb at ltresources.co.uk>:
>> Any suggestions?
>> 
>> The task has been hung for 5 days now; I can't start the machine or destroy it.
>> 
>> 
>> On 7 September 2016 at 06:49, Maton, Brett <matonb at ltresources.co.uk> wrote:
>> Sorry just hit reply....
>> 
>> I'm seeing these errors in the logs which look related to the problem:
>> 
>> 
>> 2016-09-07 06:46:35,123 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler6) [19c58c0d] Failed invoking callback end method 'onFailed' for command '07608003-ca05-4e2e-b917-85ce525c011b' with exception 'null', the callback is marked for end method retries
>> 2016-09-07 06:46:45,184 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (DefaultQuartzScheduler7) [19c58c0d] Error in invocating CTOR of command 'LiveMigrateDisk': null
>> 2016-09-07 06:46:45,185 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler7) [19c58c0d] Failed invoking callback end method 'onFailed' for command '07608003-ca05-4e2e-b917-85ce525c011b' with exception 'null', the callback is marked for end method retries
>> 
>> On 5 September 2016 at 06:46, Nir Soffer <nsoffer at redhat.com> wrote:
>> Hi Maton,
>> 
>> Please reply to the list, not to me directly.
>> 
>> Ala, can you look at this? Is this a known issue?
>> 
>> Thanks,
>> Nir
>> 
>> On Mon, Sep 5, 2016 at 8:43 AM, Maton, Brett <matonb at ltresources.co.uk> wrote:
>> > Log files as requested
>> >
>> > https://ufile.io/4fc35 vdsm log
>> > https://ufile.io/e9836 engine 03-Sep
>> > https://ufile.io/15f37 engine 04-Sep
>> >
>> > vdsm log stops on the 01-Sep...
>> >
>> > Couple of entries from the event log:
>> >
>> > Sep 3, 2016 7:31:07 PM    Snapshot 'Auto-generated for Live Storage
>> > Migration' deletion for VM 'lv01' has been completed.
>> > Sep 3, 2016 6:46:46 PM    Snapshot 'Auto-generated for Live Storage
>> > Migration' deletion for VM 'lv01' was initiated by SYSTEM
>> >
>> > And the related tasks
>> >
>> > Removing Snapshot Auto-generated for Live Storage Migration of VM lv01
>> > Sep 3, 2016 6:46:44 PM        N/A    29f45ca9
>> > Validating    Sep 3, 2016 6:46:44 PM    until    Sep 3, 2016 6:46:44 PM
>> > Executing    Sep 3, 2016 6:46:44 PM    until    Sep 3, 2016 7:31:06 PM
>> >
>> > Finalizing    Sep 3, 2016 7:31:06 PM        N/A
>> >
>> >
>> >
>> > On 4 September 2016 at 14:27, Nir Soffer <nsoffer at redhat.com> wrote:
>> >>
>> >> On Sun, Sep 4, 2016 at 12:40 PM, Maton, Brett <matonb at ltresources.co.uk> wrote:
>> >>>
>> >>> How do I fix / kill a hung vdsm task?
>> >>>
>> >>> It seems to have completed the task but is stuck finalising.
>> >>>
>> >>> Removing Snapshot Auto-generated for Live Storage Migration
>> >>> Validating
>> >>> Executing
>> >>> (hour glass) Finalizing
>> >>>
>> >>> Task has been 'stuck' finalising for over 13 hours
>> >>
>> >>
>> >> Can you share engine and vdsm logs since the time the merge was started?
>> >>
>> >> Nir
>> >
>> >
>> 
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>  
> -- 
> Best regards, Краснобаев Михаил.
>  
>  
> 
> 
