[JIRA] (OVIRT-609) Jenkins snapshot creation failed

Evgheni Dereveanchin (oVirt JIRA) jira at ovirt-jira.atlassian.net
Fri Jun 24 09:14:00 UTC 2016


    [ https://ovirt-jira.atlassian.net/browse/OVIRT-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604#comment-17604 ] 

Evgheni Dereveanchin commented on OVIRT-609:
--------------------------------------------

Here are the host logs. As suspected, I did not find any errors: the VM was suspended at 09:06:47 MST and resumed at 09:17:26 MST, with Thread-7164775 returning successfully after copying 33498670080 bytes of RAM to the storage domain.

{quote}Thread-7164775::DEBUG::2016-06-23 09:06:45,911::BindingXMLRPC::1133::vds::(wrapper) client [66.187.230.60]::call vmSnapshot with ('e7a7b735-0310-4f88-9ed9-4fed85835a01', [{'baseVolumeID': 'f37836c6-4bbe-4c8d-abf4-275cf461262e', 'domainID': 'ba023ff2-4e0e-4a32-86f3-923414206667', 'volumeID': '3b105e9b-53fe-4452-be71-2ac2182ecfec', 'imageID': '140adf46-fce4-4dba-980d-37d91416b12b'}], 'ba023ff2-4e0e-4a32-86f3-923414206667,00000002-0002-0002-0002-000000000150,2beb0ee6-b70b-4f48-bdd9-d89650383d61,daef68b9-5967-4047-9b17-1f55b68e5d8a,3580f2a1-a55a-47d0-9e67-627afbc0f2da,6c20093d-a5f3-407a-8986-ca26a488cb20') {}
...
Thread-7164775::DEBUG::2016-06-23 09:06:47,459::vm::4432::vm.Vm::(snapshot) vmId=`e7a7b735-0310-4f88-9ed9-4fed85835a01`::<domainsnapshot>
        <disks>
                <disk name="vda" snapshot="external" type="file">
                        <source file="/rhev/data-center/00000002-0002-0002-0002-000000000150/ba023ff2-4e0e-4a32-86f3-923414206667/images/140adf46-fce4-4dba-980d-37d91416b12b/3b105e9b-53fe-4452-be71-2ac2182ecfec" type="file"/>
                </disk>
        </disks>
        <memory file="/rhev/data-center/00000002-0002-0002-0002-000000000150/ba023ff2-4e0e-4a32-86f3-923414206667/images/2beb0ee6-b70b-4f48-bdd9-d89650383d61/daef68b9-5967-4047-9b17-1f55b68e5d8a" snapshot="external"/>
</domainsnapshot>
...
libvirtEventLoop::DEBUG::2016-06-23 09:06:47,645::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`e7a7b735-0310-4f88-9ed9-4fed85835a01`::event Suspended detail 0 opaque None
...
Thread-7164775::DEBUG::2016-06-23 09:17:26,338::outOfProcess::169::Storage.oop::(padToBlockSize) Truncating file /rhev/data-center/00000002-0002-0002-0002-000000000150/ba023ff2-4e0e-4a32-86f3-923414206667/images/2beb0ee6-b70b-4f48-bdd9-d89650383d61/daef68b9-5967-4047-9b17-1f55b68e5d8a to 33498670080 bytes
...
libvirtEventLoop::DEBUG::2016-06-23 09:17:26,317::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`e7a7b735-0310-4f88-9ed9-4fed85835a01`::event Resumed detail 0 opaque None
...
Thread-7164775::DEBUG::2016-06-23 09:17:26,450::BindingXMLRPC::1140::vds::(wrapper) return vmSnapshot with {'status': {'message': 'Done', 'code': 0}, 'quiesce': False}
{quote}

On Engine the process timed out after 3 minutes, while in reality it took 11 minutes. This suggests the snapshot is likely completely healthy. I'll take a sosreport from the host in case we need to investigate further; maybe [~landgraf] can check the logs for more clues.
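
For anyone who wants to recheck the timings, here is a rough sketch that derives them from the log lines quoted above. The lines are abridged copies of that excerpt (long arguments trimmed), the helper name `timestamp_of` is made up for illustration, and the 180-second value simply restates the "3 minutes" observed on Engine; it was not read from any Engine configuration.

```python
import re
from datetime import datetime

# Abridged copies of the vdsm log lines quoted above (arguments trimmed).
LOG = """\
Thread-7164775::DEBUG::2016-06-23 09:06:45,911::BindingXMLRPC::1133::vds::(wrapper) call vmSnapshot
libvirtEventLoop::DEBUG::2016-06-23 09:06:47,645::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent) event Suspended detail 0 opaque None
libvirtEventLoop::DEBUG::2016-06-23 09:17:26,317::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent) event Resumed detail 0 opaque None
Thread-7164775::DEBUG::2016-06-23 09:17:26,450::BindingXMLRPC::1140::vds::(wrapper) return vmSnapshot
"""

TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def timestamp_of(log, marker):
    """Return the timestamp of the first log line containing `marker`."""
    for line in log.splitlines():
        if marker in line:
            match = TS_RE.search(line)
            if match is None:
                raise ValueError("no timestamp on line: %r" % line)
            return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
    raise ValueError("marker not found: %r" % marker)


# How long the guest stayed paused while its memory was written out.
paused_s = (timestamp_of(LOG, "event Resumed")
            - timestamp_of(LOG, "event Suspended")).total_seconds()

# How long the vmSnapshot call took on the host, from call to return.
call_s = (timestamp_of(LOG, "return vmSnapshot")
          - timestamp_of(LOG, "call vmSnapshot")).total_seconds()

MEMORY_BYTES = 33498670080   # size of the memory file, from the log
ENGINE_TIMEOUT_S = 180       # assumption: the "3 minutes" seen on Engine

print("VM paused for      %.3f s" % paused_s)
print("vmSnapshot call    %.3f s" % call_s)
print("average write rate %.1f MiB/s" % (MEMORY_BYTES / paused_s / 2**20))
print("exceeded timeout   %s" % (call_s > ENGINE_TIMEOUT_S))
```

The write rate is only an average over the paused interval, so it says nothing about how latency varied while the memory was being dumped.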

> Jenkins snapshot creation failed
> --------------------------------
>
>                 Key: OVIRT-609
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-609
>             Project: oVirt - virtualization made easy
>          Issue Type: Bug
>            Reporter: Evgheni Dereveanchin
>            Assignee: infra
>
> [~ngoldin at redhat.com] issued a live snapshot creation on the Jenkins VM to prepare it for cluster move. This failed and it's not really clear why. Relevant event logs are below; they suggest that the hypervisor started dumping VM memory to the snapshot, which caused a storage slowdown.
> {quote}2016-Jun-23, 18:06 Snapshot 'ngoldin_before_cluster_move' creation for VM 'jenkins-phx-ovirt-org' was initiated by admin.
> 2016-Jun-23, 18:09 Failed to create live snapshot 'ngoldin_before_cluster_move' for VM 'jenkins-phx-ovirt-org'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
> 2016-Jun-23, 18:13 Host ovirt-srv02 has network interface which exceeded the defined threshold [95%] (em1: transmit rate[100%], receive rate [0%])
> 2016-Jun-23, 18:13 Storage domain Production experienced a high latency of 18.7802 seconds from host ovirt-srv11. This may cause performance and functional issues. Please consult your Storage Administrator.{quote}


