Re: [JIRA] (OVIRT-609) Jenkins snapshot creation failed

24 Jun 2016

      ...
Jenkins snapshot creation failed
--------------------------------
Key: OVIRT-609
                URL: https://ovirt-jira.atlassian.net/browse/OVIRT-609
            Project: oVirt - virtualization made easy
         Issue Type: Bug
           Reporter: Evgheni Dereveanchin
           Assignee: infra
[~ngoldin@redhat.com] started a live snapshot of the Jenkins VM to prepare
it for the cluster move. This failed and it's not really clear why. The
relevant event logs are below; they suggest that the hypervisor started
dumping VM memory to the snapshot, which caused a storage slowdown.
{quote}2016-Jun-23, 18:06 Snapshot 'ngoldin_before_cluster_move' creation
for VM 'jenkins-phx-ovirt-org' was initiated by admin.
2016-Jun-23, 18:09 Failed to create live snapshot
'ngoldin_before_cluster_move' for VM 'jenkins-phx-ovirt-org'. VM restart is
recommended. Note that using the created snapshot might cause data
inconsistency.
2016-Jun-23, 18:13 Host ovirt-srv02 has network interface which exceeded
I suggest halting work on the production DC until we move at least a few
hypervisors to use the vdsm scratch pad hook for local disk and migrate
their VMs to use it, so we'll see a significant improvement in storage
performance before moving on with the production DC.
On Jun 24, 2016 11:01 AM, "Evgheni Dereveanchin (oVirt JIRA)" <
jira@ovirt-jira.atlassian.net> wrote:

    [
https://ovirt-jira.atlassian.net/browse/OVIRT-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600#comment-17600
]

Evgheni Dereveanchin commented on OVIRT-609:
--------------------------------------------

Here are some relevant messages from engine.log:
{quote}
grep 1394b752 /var/log/ovirt-engine/engine.log
2016-06-23 09:06:34,099 INFO
[org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand]
(ajp--127.0.0.1-8702-1) [1394b752] Lock Acquired to object EngineLock
[exclusiveLocks= key: e7a7b735-0310-4f88-9ed9-4fed85835a01 value: VM
2016-06-23 09:06:35,708 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-8-thread-15) Correlation ID: 1394b752, Job ID:
a8fab0bf-d45e-46eb-8314-e22db8e6a3f4, Call Stack: null, Custom Event ID:
-1, Message: Snapshot 'ngoldin_before_cluster_move' creation for VM
'jenkins-phx-ovirt-org' was initiated by admin.
2016-06-23 09:09:46,038 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-8-thread-14) Correlation ID: 1394b752, Job ID:
a8fab0bf-d45e-46eb-8314-e22db8e6a3f4, Call Stack:
org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
java.util.concurrent.TimeoutException (Failed with error VDS_NETWORK_ERROR
and code 5022)
2016-06-23 09:09:47,859 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-8-thread-14) Correlation ID: 1394b752, Job ID:
a8fab0bf-d45e-46eb-8314-e22db8e6a3f4, Call Stack:
org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
java.util.concurrent.TimeoutException (Failed with error VDS_NETWORK_ERROR
and code 5022){quote}

Looks like VDSM was slow to respond (probably due to storage slowness),
while the snapshot itself likely completed fine. I'll review host logs
and share my findings.

the defined threshold [95%] (em1: transmit rate[100%], receive rate [0%])
...
2016-Jun-23, 18:13 Storage domain Production experienced a high latency
of 18.7802 seconds from host ovirt-srv11. This may cause performance and
functional issues. Please consult your Storage Administrator.{quote}
--
This message was sent by Atlassian JIRA
(v1000.98.4#100004)
_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra

Eyal Edri