upgrade from 3.5 to 3.6 causing problems with migration

Hi.

Last night, I upgraded my engine from 3.5 to 3.6. That went flawlessly. Today, I'm trying to upgrade vdsm on the hosts from 3.5 to 3.6 (along with applying other RHEL 7.1 updates). However, when I try to put each host into maintenance mode and migrations start to occur, they all FAIL now! Even worse, when they fail, the hosts are left DOWN! If there's a failure, I'd expect the host to simply abort the migration. Any help in debugging this would be VERY much appreciated!
2015-11-06 10:09:16,065 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-4) [] Correlation ID: 658ba478, Job ID: 524e8c44-04e0-42d3-89f9-9f4e4d397583, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: eportfolio, Source: virt1).
2015-11-06 10:10:17,112 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-22) [2f0dee16] Correlation ID: 7da3ac1b, Job ID: 93c0b1f2-4c8e-48cf-9e63-c1ba91be425f, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: ftp1, Source: virt1).
2015-11-06 10:15:08,273 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-45) [] Correlation ID: 5394ef76, Job ID: 994065fc-a142-4821-934a-c2297d86ec12, Call Stack: null, Custom Event ID: -1, Message: Migration failed while Host is in 'preparing for maintenance' state.
2015-11-06 10:19:13,712 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-36) [] Correlation ID: 6e422728, Job ID: 994065fc-a142-4821-934a-c2297d86ec12, Call Stack: null, Custom Event ID: -1, Message: Migration failed while Host is in 'preparing for maintenance' state.
2015-11-06 10:42:37,852 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-12) [] Correlation ID: e7f6300, Job ID: 1ea16622-0fa0-4e92-89e5-9dc235c03ef8, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: ipa, Source: virt1).
2015-11-06 10:43:59,732 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-40) [] Correlation ID: 39cfdf9, Job ID: 72be29bc-a02b-4a90-b5ec-8b995c2fa692, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: labtesteval, Source: virt1).
2015-11-06 10:52:11,893 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-23) [] Correlation ID: 5c435149, Job ID: 1dcd1e14-baa6-44bc-a853-5d33107b759c, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: www-vhost, Source: virt1).
The complete engine log, virt1, virt2, and virt3 vdsm logs are here:

http://www.eecs.yorku.ca/~jas/ovirt-debug/11062015

Jason.

On Fri, Nov 6, 2015 at 5:21 PM, Jason Keltz <jas@cse.yorku.ca> wrote:
> Last night, I upgraded my engine from 3.5 to 3.6. That went flawlessly. [...]
> The complete engine log, virt1, virt2, and virt3 vdsm logs are here:
> http://www.eecs.yorku.ca/~jas/ovirt-debug/11062015
Is vdsmd service still active on those hosts?

On 11/06/2015 02:02 PM, Simone Tiraboschi wrote:
> Is vdsmd service still active on those hosts?
Hi Simone..

Yes..

virt1:

sh-4.2# systemctl -l status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Thu 2015-11-05 22:47:46 EST; 15h ago
 Main PID: 16520 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─16520 /usr/bin/python /usr/share/vdsm/vdsm
           ├─30038 /usr/libexec/ioprocess --read-pipe-fd 67 --write-pipe-fd 66 --max-threads 10 --max-queued-requests 10
           ├─30055 /usr/libexec/ioprocess --read-pipe-fd 76 --write-pipe-fd 75 --max-threads 10 --max-queued-requests 10
           └─30062 /usr/libexec/ioprocess --read-pipe-fd 81 --write-pipe-fd 84 --max-threads 10 --max-queued-requests 10

Nov 06 10:09:15 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/62ff4ada-ee98-491e-bfb5-7adda7b513ee.com.redhat.rhevm.vdsm already removed
Nov 06 10:09:15 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/62ff4ada-ee98-491e-bfb5-7adda7b513ee.org.qemu.guest_agent.0 already removed
Nov 06 10:10:15 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/aa487207-7ff4-465a-9d9b-2a103d50dc77.com.redhat.rhevm.vdsm already removed
Nov 06 10:10:15 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/aa487207-7ff4-465a-9d9b-2a103d50dc77.org.qemu.guest_agent.0 already removed
Nov 06 10:42:36 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/899e3a1c-dcc4-4426-9dd4-0f1e8b94a2b8.com.redhat.rhevm.vdsm already removed
Nov 06 10:42:36 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/899e3a1c-dcc4-4426-9dd4-0f1e8b94a2b8.org.qemu.guest_agent.0 already removed
Nov 06 10:43:57 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/77766cb4-0625-4432-a16f-def5e702102a.com.redhat.rhevm.vdsm already removed
Nov 06 10:43:57 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/77766cb4-0625-4432-a16f-def5e702102a.org.qemu.guest_agent.0 already removed
Nov 06 10:52:08 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/3bb83ad5-dc00-445d-8708-2db18c53e0e6.com.redhat.rhevm.vdsm already removed
Nov 06 10:52:08 virt1.cs.yorku.ca vdsm[16520]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/3bb83ad5-dc00-445d-8708-2db18c53e0e6.org.qemu.guest_agent.0 already removed

virt2:

# systemctl -l status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Fri 2015-11-06 10:40:36 EST; 3h 55min ago
  Process: 5035 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 6427 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─ 6334 /usr/libexec/ioprocess --read-pipe-fd 39 --write-pipe-fd 38 --max-threads 10 --max-queued-requests 10
           ├─ 6342 /usr/libexec/ioprocess --read-pipe-fd 48 --write-pipe-fd 46 --max-threads 10 --max-queued-requests 10
           ├─ 6354 /usr/libexec/ioprocess --read-pipe-fd 55 --write-pipe-fd 54 --max-threads 10 --max-queued-requests 10
           ├─ 6427 /usr/bin/python /usr/share/vdsm/vdsm
           ├─15071 /usr/libexec/ioprocess --read-pipe-fd 65 --write-pipe-fd 63 --max-threads 10 --max-queued-requests 10
           ├─15125 /usr/libexec/ioprocess --read-pipe-fd 73 --write-pipe-fd 72 --max-threads 10 --max-queued-requests 10
           └─15140 /usr/libexec/ioprocess --read-pipe-fd 83 --write-pipe-fd 82 --max-threads 10 --max-queued-requests 10

Nov 06 10:40:37 virt2.eecs.yorku.ca python[6427]: DIGEST-MD5 parse_server_challenge()
Nov 06 10:40:37 virt2.eecs.yorku.ca python[6427]: DIGEST-MD5 ask_user_info()
Nov 06 10:40:37 virt2.eecs.yorku.ca python[6427]: DIGEST-MD5 make_client_response()
Nov 06 10:40:37 virt2.eecs.yorku.ca python[6427]: DIGEST-MD5 client step 3
Nov 06 10:42:34 virt2.eecs.yorku.ca vdsm[6427]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/899e3a1c-dcc4-4426-9dd4-0f1e8b94a2b8.com.redhat.rhevm.vdsm already removed
Nov 06 10:42:34 virt2.eecs.yorku.ca vdsm[6427]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/899e3a1c-dcc4-4426-9dd4-0f1e8b94a2b8.org.qemu.guest_agent.0 already removed
Nov 06 10:43:56 virt2.eecs.yorku.ca vdsm[6427]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/77766cb4-0625-4432-a16f-def5e702102a.com.redhat.rhevm.vdsm already removed
Nov 06 10:43:56 virt2.eecs.yorku.ca vdsm[6427]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/77766cb4-0625-4432-a16f-def5e702102a.org.qemu.guest_agent.0 already removed
Nov 06 10:52:08 virt2.eecs.yorku.ca vdsm[6427]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/3bb83ad5-dc00-445d-8708-2db18c53e0e6.com.redhat.rhevm.vdsm already removed
Nov 06 10:52:08 virt2.eecs.yorku.ca vdsm[6427]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/3bb83ad5-dc00-445d-8708-2db18c53e0e6.org.qemu.guest_agent.0 already removed

virt3:

# systemctl -l status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Fri 2015-11-06 10:06:28 EST; 4h 30min ago
  Process: 12442 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 12446 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 12561 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─ 9066 /usr/libexec/ioprocess --read-pipe-fd 40 --write-pipe-fd 36 --max-threads 10 --max-queued-requests 10
           ├─ 9074 /usr/libexec/ioprocess --read-pipe-fd 48 --write-pipe-fd 47 --max-threads 10 --max-queued-requests 10
           ├─ 9085 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 56 --max-threads 10 --max-queued-requests 10
           ├─12561 /usr/bin/python /usr/share/vdsm/vdsm
           ├─12726 /usr/libexec/ioprocess --read-pipe-fd 65 --write-pipe-fd 63 --max-threads 10 --max-queued-requests 10
           ├─12744 /usr/libexec/ioprocess --read-pipe-fd 72 --write-pipe-fd 74 --max-threads 10 --max-queued-requests 10
           └─12750 /usr/libexec/ioprocess --read-pipe-fd 83 --write-pipe-fd 82 --max-threads 10 --max-queued-requests 10

Nov 06 10:09:15 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/62ff4ada-ee98-491e-bfb5-7adda7b513ee.com.redhat.rhevm.vdsm already removed
Nov 06 10:09:15 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/62ff4ada-ee98-491e-bfb5-7adda7b513ee.org.qemu.guest_agent.0 already removed
Nov 06 10:10:17 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/aa487207-7ff4-465a-9d9b-2a103d50dc77.com.redhat.rhevm.vdsm already removed
Nov 06 10:10:17 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/aa487207-7ff4-465a-9d9b-2a103d50dc77.org.qemu.guest_agent.0 already removed
Nov 06 10:15:05 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/d66235b4-7c8d-4da2-a572-82ecc9c53a59.com.redhat.rhevm.vdsm already removed
Nov 06 10:15:05 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/d66235b4-7c8d-4da2-a572-82ecc9c53a59.org.qemu.guest_agent.0 already removed
Nov 06 10:19:13 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/c64d80f9-a3bd-44f3-89c4-f1b9b509cdde.com.redhat.rhevm.vdsm already removed
Nov 06 10:19:13 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/c64d80f9-a3bd-44f3-89c4-f1b9b509cdde.org.qemu.guest_agent.0 already removed
Nov 06 10:42:56 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/62ff4ada-ee98-491e-bfb5-7adda7b513ee.com.redhat.rhevm.vdsm already removed
Nov 06 10:42:56 virt3.eecs.yorku.ca vdsm[12561]: vdsm root WARNING File: /var/lib/libvirt/qemu/channels/62ff4ada-ee98-491e-bfb5-7adda7b513ee.org.qemu.guest_agent.0 already removed

--- I don't know what those warnings mean...

Jason.
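(Those "already removed" warnings are harmless: vdsm does a best-effort cleanup of the per-VM guest-agent channel sockets once a VM leaves the host, and warns rather than fails if the file is already gone. A minimal sketch of the idiom that produces that exact message, assuming a cleanup helper along the lines of vdsm's rmFile; names here are illustrative:)

import errno
import logging
import os

def rm_file(path):
    """Best-effort unlink: warn rather than fail if the file is already gone."""
    try:
        os.unlink(path)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # This produces the "File: ... already removed" warning in the journal above.
            logging.warning("File: %s already removed", path)
        else:
            raise

# After a migration both ends try to tidy up the channel sockets;
# whichever cleanup runs second only logs the warning.
rm_file("/var/lib/libvirt/qemu/channels/example-vm-id.com.redhat.rhevm.vdsm")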

Hi everyone,

I have been playing with the latest oVirt for the last little while. I go back with RHEV to the early days when it was in BETA and I was running it in production :} It has certainly come a long, long way...

I see lots of upgrade testing / new feature discussion going on here, so I wanted to share this with the community to take advantage of and ideally make even better. I plan to integrate this setup with OpenStack and add IDM and other features as soon as possible.

If you want to try it out, you don't need real hardware and still get all the functionality of KVM. I used Ravello Systems to build it - https://ravellosystems.com - you can utilize the free trial for 2 weeks.

You can find the blueprint here -> https://www.ravellosystems.com/repo/blueprints/64554219

Thanks for helping with the original issues I ran into getting it all working properly.

Kyle

Here is the description:

A perfect nested lab environment to learn on (test upgrades / new features) with oVirt (RHEV) and ManageIQ (CloudForms).
I have been testing OpenStack blueprints from the Ravello repo quite a bit and saw that no one had built an oVirt setup.
It's been a while since I used oVirt / RHEV (I had been using it since back in the day when it was BETA!)
I already had an ESXi setup working with ManageIQ / CloudForms, so I figured why not add oVirt / RHEV as well... OpenStack will be next...

Here is a blueprint (built on Fedora 22) that includes a 2-node oVirt cluster - oVirt Engine Version: 3.5.4.2-1.fc20 (upstream Red Hat Enterprise Virtualization - RHEV).
It also includes a ManageIQ (upstream CloudForms) instance to manage / orchestrate the oVirt environment, plus a Fedora 22 Desktop to use as a local management / jumpbox for your environment.

1 oVirt Manager - oVirt Engine Version: 3.5.4.2-1.fc20
2 oVirt KVM hypervisors
1 oVirt hypervisor template (used to add more nodes to your cluster)
1 FreeNAS appliance to offer up shared NFS storage for the cluster
1 ManageIQ appliance
1 Fedora 22 Desktop (configured with VNC access)

http://www.ovirt.org/Home
http://manageiq.org/
http://www.freenas.org/

Give the services and web UI a few minutes to start up after booting. You will find id/password information in the description of each virtual machine in the Ravello UI. Some VMs require an ssh key.

If you want to build your own hypervisor nodes, see the additional steps below (this has already been done on the template I provided).

There are two small issues that you need to work around:

Due to an issue with the user-mode CPU detection in libvirt, this patch needs to be applied to /usr/share/libvirt/cpu_map.xml. The patch forces the CPU type to be an Opteron G2 independent of the CPUID. To apply it, log on to the hypervisor in rescue mode, apply the patch, and then issue the command "persist /usr/share/libvirt/cpu_map.xml". This needs to be done after step 8 above.

https://gist.github.com/geertj/56425d0fdc7c54d4bc9f

Here is a good blog that details a similar setup for RHEV:

https://www.ravellosystems.com/blog/run-red-hat-enterprise-virtualization-kvm-ec2/

Feel free to ask questions or provide comments.

Have fun,
Kyle

On Fri, Nov 06, 2015 at 02:37:56PM -0500, Jason Keltz wrote:
> > Is vdsmd service still active on those hosts?
>
> Hi Simone..
> Yes.. [...]
It seems that 3.6's vdsm-4.17.10.1 cannot consume a Random Number Generator device that was created on 3.5. Please open a bug on this - it should be fixed ASAP. Thread-156::DEBUG::2015-11-06 10:43:19,134::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.migrationCreate' in bridge with [u'77766cb4-0625-4432-a16f-def5e702102a', {u'guestFQDN': u'', u'acpiEnable': u'true', u'emulatedMachine': u'rhel6.5.0', u'afterMigrationStatus': u'', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', u'vmId': u'77766cb4-0625-4432-a16f-def5e702102a', u'memGuaranteedSize': 666, u'transparentHugePages': u'true', u'displaySecurePort': u'5905', u'spiceSslCipherSuite': u'DEFAULT', u'cpuType': u'SandyBridge', u'smp': u'1', u'migrationDest': u'libvirt', u'custom': {u'device_652f5624-9607-43b3-9cb1-d4b60145858cdevice_f6cf6e79-c87b-4859-ac35-90f21e402d06': u'VmDevice {vmId=77766cb4-0625-4432-a16f-def5e702102a, deviceId=f6cf6e79-c87b-4859-ac35-90f21e402d06, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=1}, managed=false, plugged=true, readOnly=false, deviceAlias=channel0, customProperties={}, snapshotId=null, logicalName=null}', u'device_652f5624-9607-43b3-9cb1-d4b60145858c': u'VmDevice {vmId=77766cb4-0625-4432-a16f-def5e702102a, deviceId=652f5624-9607-43b3-9cb1-d4b60145858c, device=ide, type=CONTROLLER, bootOrder=0, specParams={}, address={slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}, managed=false, plugged=true, readOnly=false, deviceAlias=ide0, customProperties={}, snapshotId=null, logicalName=null}', u'device_652f5624-9607-43b3-9cb1-d4b60145858cdevice_f6cf6e79-c87b-4859-ac35-90f21e402d06device_ebc927d9-a3b8-48db-b7d4-2bc05463debcdevice_3f79972b-2fd4-468e-abde-8b448c2121c4': u'VmDevice {vmId=77766cb4-0625-4432-a16f-def5e702102a, deviceId=3f79972b-2fd4-468e-abde-8b448c2121c4, device=spicevmc, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=3}, managed=false, plugged=true, readOnly=false, deviceAlias=channel2, customProperties={}, snapshotId=null, logicalName=null}', u'device_652f5624-9607-43b3-9cb1-d4b60145858cdevice_f6cf6e79-c87b-4859-ac35-90f21e402d06device_ebc927d9-a3b8-48db-b7d4-2bc05463debc': u'VmDevice {vmId=77766cb4-0625-4432-a16f-def5e702102a, deviceId=ebc927d9-a3b8-48db-b7d4-2bc05463debc, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=2}, managed=false, plugged=true, readOnly=false, deviceAlias=channel1, customProperties={}, snapshotId=null, logicalName=null}'}, u'vmType': u'kvm', u'_srcDomXML': u"<domain type='kvm' id='25'>\n <name>labtesteval</name>\n <uuid>77766cb4-0625-4432-a16f-def5e702102a</uuid>\n <memory unit='KiB'>1024000</memory>\n <currentMemory unit='KiB'>1024000</currentMemory>\n <vcpu placement='static' current='1'>16</vcpu>\n <cputune>\n <shares>1020</shares>\n </cputune>\n <resource>\n <partition>/machine</partition>\n </resource>\n <sysinfo type='smbios'>\n <system>\n <entry name='manufacturer'>oVirt</entry>\n <entry name='product'>oVirt Node</entry>\n <entry name='version'>7-1.1503.el7.centos.2.8</entry>\n <entry name='serial'>4C4C4544-0034-4E10-8031-B6C04F445A31</entry>\n <entry name='uuid'>77766cb4-0625-4432-a16f-def5e702102a</entry>\n </system>\n </sysinfo>\n <os>\n <type arch='x86_64' machine='rhel6.5.0'>hvm</type>\n <smbios mode='sysinfo'/>\n </os>\n <features>\n <acpi/>\n </features>\n <cpu mode='custom' match='exact'>\n <model 
fallback='allow'>SandyBridge</model>\n <topology sockets='16' cores='1' threads='1'/>\n </cpu>\n <clock offset='variable' adjustment='-18000' basis='utc'>\n <timer name='rtc' tickpolicy='catchup'/>\n <timer name='pit' tickpolicy='delay'/>\n <timer name='hpet' present='no'/>\n </clock>\n <on_poweroff>destroy</on_poweroff>\n <on_reboot>restart</on_reboot>\n <on_crash>destroy</on_crash>\n <devices>\n <emulator>/usr/libexec/qemu-kvm</emulator>\n <disk type='file' device='cdrom'>\n <driver name='qemu' type='raw'/>\n <source startupPolicy='optional'/>\n <backingStore/>\n <target dev='hdc' bus='ide'/>\n <readonly/>\n <serial></serial>\n <alias name='ide0-1-0'/>\n <address type='drive' controller='0' bus='1' target='0' unit='0'/>\n </disk>\n <disk type='file' device='disk' snapshot='no'>\n <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>\n <source file='/rhev/data-center/aca61eac-ed8a-484b-bbee-3645148f6405/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/15552565-cde9-46f1-b5fa-988aa0e5977f/c33da882-d20e-43b8-8aee-ea8ddcd1f3a0'>\n <seclabel model='selinux' labelskip='yes'/>\n </source>\n <backingStore/>\n <target dev='vda' bus='virtio'/>\n <serial>15552565-cde9-46f1-b5fa-988aa0e5977f</serial>\n <boot order='1'/>\n <alias name='virtio-disk0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>\n </disk>\n <disk type='file' device='disk' snapshot='no'>\n <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>\n <source file='/rhev/data-center/aca61eac-ed8a-484b-bbee-3645148f6405/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/663cbb1a-95a2-445d-b32c-1e34299ea6e3/755ab7bb-dbe8-47be-b663-a47fc4b3c571'>\n <seclabel model='selinux' labelskip='yes'/>\n </source>\n <backingStore/>\n <target dev='vdb' bus='virtio'/>\n <serial>663cbb1a-95a2-445d-b32c-1e34299ea6e3</serial>\n <alias name='virtio-disk1'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>\n </disk>\n <controller type='scsi' index='0' model='virtio-scsi'>\n <alias name='scsi0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>\n </controller>\n <controller type='virtio-serial' index='0' ports='16'>\n <alias name='virtio-serial0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>\n </controller>\n <controller type='ide' index='0'>\n <alias name='ide0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>\n </controller>\n <controller type='usb' index='0'>\n <alias name='usb0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>\n </controller>\n <controller type='pci' index='0' model='pci-root'>\n <alias name='pci.0'/>\n </controller>\n <interface type='bridge'>\n <mac address='00:1a:4a:32:3d:21'/>\n <source bridge='yorku'/>\n <bandwidth>\n </bandwidth>\n <target dev='vnet10'/>\n <model type='virtio'/>\n <filterref filter='vdsm-no-mac-spoofing'/>\n <link state='up'/>\n <alias name='net0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>\n </interface>\n <interface type='bridge'>\n <mac address='00:1a:4a:32:3d:22'/>\n <source bridge='mrpriv'/>\n <bandwidth>\n </bandwidth>\n <target dev='vnet11'/>\n <model type='virtio'/>\n <filterref filter='vdsm-no-mac-spoofing'/>\n <link state='up'/>\n <alias name='net1'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>\n </interface>\n <channel type='unix'>\n <source mode='bind' 
path='/var/lib/libvirt/qemu/channels/77766cb4-0625-4432-a16f-def5e702102a.com.redhat.rhevm.vdsm'/>\n <target type='virtio' name='com.redhat.rhevm.vdsm' state='disconnected'/>\n <alias name='channel0'/>\n <address type='virtio-serial' controller='0' bus='0' port='1'/>\n </channel>\n <channel type='unix'>\n <source mode='bind' path='/var/lib/libvirt/qemu/channels/77766cb4-0625-4432-a16f-def5e702102a.org.qemu.guest_agent.0'/>\n <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>\n <alias name='channel1'/>\n <address type='virtio-serial' controller='0' bus='0' port='2'/>\n </channel>\n <channel type='spicevmc'>\n <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>\n <alias name='channel2'/>\n <address type='virtio-serial' controller='0' bus='0' port='3'/>\n </channel>\n <input type='mouse' bus='ps2'/>\n <input type='keyboard' bus='ps2'/>\n <graphics type='spice' tlsPort='5905' autoport='yes' keymap='en-us' passwdValidTo='2015-11-03T00:37:31' connected='disconnect'>\n <listen type='network' address='130.63.94.34' network='vdsm-yorku'/>\n <channel name='main' mode='secure'/>\n <channel name='display' mode='secure'/>\n <channel name='inputs' mode='secure'/>\n <channel name='cursor' mode='secure'/>\n <channel name='playback' mode='secure'/>\n <channel name='record' mode='secure'/>\n <channel name='smartcard' mode='secure'/>\n <channel name='usbredir' mode='secure'/>\n </graphics>\n <video>\n <model type='qxl' ram='65536' vram='32768' vgamem='16384' heads='1'/>\n <alias name='video0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>\n </video>\n <memballoon model='virtio'>\n <alias name='balloon0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>\n </memballoon>\n <rng model='virtio'>\n <backend model='random'>/dev/random</backend>\n <alias name='rng0'/>\n <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>\n </rng>\n </devices>\n <seclabel type='dynamic' model='selinux' relabel='yes'>\n <label>system_u:system_r:svirt_t:s0:c91,c286</label>\n <imagelabel>system_u:object_r:svirt_image_t:s0:c91,c286</imagelabel>\n </seclabel>\n</domain>\n", u'memSize': 1000, u'smpCoresPerSocket': u'1', u'vmName': u'labtesteval', u'nice': u'0', u'username': u'Unknown', u'bootMenuEnable': u'false', u'guestDiskMapping': {}, u'copyPasteEnable': u'true', u'displayIp': u'130.63.94.34', u'keyboardLayout': u'en-us', u'displayPort': u'-1', u'smartcardEnable': u'false', u'clientIp': u'', u'fileTransferEnable': u'true', u'nicModel': u'rtl8139,pv', u'elapsedTimeOffset': 308058.1552519798, u'kvmEnable': u'true', u'displayNetwork': u'yorku', u'devices': [{u'target': 1024000, u'specParams': {u'model': u'virtio'}, u'alias': u'balloon0', u'deviceId': u'f8d2d555-df6b-4ce1-ac08-18eee4e74845', u'address': {u'slot': u'0x09', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'memballoon', u'type': u'balloon'}, {u'alias': u'rng0', u'specParams': {u'source': u'random'}, u'deviceId': u'eb6d7c3a-c1ee-426b-a719-a949621808cc', u'address': {u'slot': u'0x0a', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'model': u'virtio', u'type': u'rng'}, {u'device': u'', u'alias': u'rng0', u'type': u'rng', u'address': {u'slot': u'0x0a', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}, {u'device': u'unix', u'alias': u'channel0', u'type': u'channel', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': 
u'1'}}, {u'device': u'unix', u'alias': u'channel1', u'type': u'channel', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}, {u'device': u'spicevmc', u'alias': u'channel2', u'type': u'channel', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'3'}}, {u'index': u'0', u'alias': u'scsi0', u'specParams': {}, u'deviceId': u'3beb42a9-69b5-454d-b9f8-326509d23984', u'address': {u'slot': u'0x05', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'scsi', u'model': u'virtio-scsi', u'type': u'controller'}, {u'specParams': {}, u'alias': u'virtio-serial0', u'deviceId': u'dee9b89f-5518-4349-b885-918774e6eb3e', u'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'virtio-serial', u'type': u'controller'}, {u'device': u'usb', u'alias': u'usb0', u'type': u'controller', u'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x2'}}, {u'device': u'ide', u'alias': u'ide0', u'type': u'controller', u'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x1'}}, {u'specParams': {u'vram': u'32768', u'heads': u'1'}, u'alias': u'video0', u'deviceId': u'f54f3bd7-2a59-4c25-90ed-30b40621e28a', u'address': {u'slot': u'0x02', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'qxl', u'type': u'video'}, {u'device': u'spice', u'specParams': {u'copyPasteEnable': u'true', u'displayNetwork': u'yorku', u'keyMap': u'en-us', u'displayIp': u'130.63.94.34', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard'}, u'type': u'graphics', u'tlsPort': u'5905'}, {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:32:3d:21', u'linkActive': True, u'network': u'yorku', u'specParams': {u'inbound': {}, u'outbound': {}}, u'filter': u'vdsm-no-mac-spoofing', u'alias': u'net0', u'deviceId': u'aed60117-08a1-4ba3-a7fd-e1b7b57202be', u'address': {u'slot': u'0x03', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'bridge', u'type': u'interface', u'name': u'vnet10'}, {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:32:3d:22', u'linkActive': True, u'network': u'mrpriv', u'specParams': {u'inbound': {}, u'outbound': {}}, u'filter': u'vdsm-no-mac-spoofing', u'alias': u'net1', u'deviceId': u'72f26169-9052-4d52-8d95-c1c084bd4916', u'address': {u'slot': u'0x04', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'bridge', u'type': u'interface', u'name': u'vnet11'}, {u'index': u'2', u'iface': u'ide', u'name': u'hdc', u'alias': u'ide0-1-0', u'specParams': {u'path': u''}, u'readonly': u'True', u'deviceId': u'f2509c75-eb84-4e89-b6c3-83a26a65cc6a', u'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'cdrom', u'shared': u'false', u'path': u'', u'type': u'disk'}, {u'poolID': u'aca61eac-ed8a-484b-bbee-3645148f6405', u'volumeInfo': {u'domainID': u'c8653dba-63ec-4447-872c-0ffcc6af9c0f', u'volType': u'path', u'leaseOffset': 0, u'volumeID': u'c33da882-d20e-43b8-8aee-ea8ddcd1f3a0', u'leasePath': u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/15552565-cde9-46f1-b5fa-988aa0e5977f/c33da882-d20e-43b8-8aee-ea8ddcd1f3a0.lease', u'imageID': u'15552565-cde9-46f1-b5fa-988aa0e5977f', u'path': 
u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/15552565-cde9-46f1-b5fa-988aa0e5977f/c33da882-d20e-43b8-8aee-ea8ddcd1f3a0'}, u'index': 0, u'iface': u'virtio', u'apparentsize': u'42949672960', u'specParams': {}, u'imageID': u'15552565-cde9-46f1-b5fa-988aa0e5977f', u'readonly': u'False', u'shared': u'false', u'truesize': u'14287695872', u'type': u'disk', u'domainID': u'c8653dba-63ec-4447-872c-0ffcc6af9c0f', u'reqsize': u'0', u'format': u'raw', u'deviceId': u'15552565-cde9-46f1-b5fa-988aa0e5977f', u'address': {u'slot': u'0x07', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'disk', u'path': u'/rhev/data-center/aca61eac-ed8a-484b-bbee-3645148f6405/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/15552565-cde9-46f1-b5fa-988aa0e5977f/c33da882-d20e-43b8-8aee-ea8ddcd1f3a0', u'propagateErrors': u'off', u'optional': u'false', u'name': u'vda', u'bootOrder': u'1', u'volumeID': u'c33da882-d20e-43b8-8aee-ea8ddcd1f3a0', u'alias': u'virtio-disk0', u'volumeChain': [{u'domainID': u'c8653dba-63ec-4447-872c-0ffcc6af9c0f', u'volType': u'path', u'leaseOffset': 0, u'volumeID': u'c33da882-d20e-43b8-8aee-ea8ddcd1f3a0', u'leasePath': u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/15552565-cde9-46f1-b5fa-988aa0e5977f/c33da882-d20e-43b8-8aee-ea8ddcd1f3a0.lease', u'imageID': u'15552565-cde9-46f1-b5fa-988aa0e5977f', u'path': u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/15552565-cde9-46f1-b5fa-988aa0e5977f/c33da882-d20e-43b8-8aee-ea8ddcd1f3a0'}]}, {u'poolID': u'aca61eac-ed8a-484b-bbee-3645148f6405', u'reqsize': u'0', u'index': u'1', u'iface': u'virtio', u'apparentsize': u'21474836480', u'specParams': {}, u'imageID': u'663cbb1a-95a2-445d-b32c-1e34299ea6e3', u'readonly': u'False', u'shared': u'false', u'truesize': u'476389376', u'type': u'disk', u'domainID': u'c8653dba-63ec-4447-872c-0ffcc6af9c0f', u'volumeInfo': {u'domainID': u'c8653dba-63ec-4447-872c-0ffcc6af9c0f', u'volType': u'path', u'leaseOffset': 0, u'volumeID': u'755ab7bb-dbe8-47be-b663-a47fc4b3c571', u'leasePath': u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/663cbb1a-95a2-445d-b32c-1e34299ea6e3/755ab7bb-dbe8-47be-b663-a47fc4b3c571.lease', u'imageID': u'663cbb1a-95a2-445d-b32c-1e34299ea6e3', u'path': u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/663cbb1a-95a2-445d-b32c-1e34299ea6e3/755ab7bb-dbe8-47be-b663-a47fc4b3c571'}, u'format': u'raw', u'deviceId': u'663cbb1a-95a2-445d-b32c-1e34299ea6e3', u'address': {u'slot': u'0x08', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'disk', u'path': u'/rhev/data-center/aca61eac-ed8a-484b-bbee-3645148f6405/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/663cbb1a-95a2-445d-b32c-1e34299ea6e3/755ab7bb-dbe8-47be-b663-a47fc4b3c571', u'propagateErrors': u'off', u'optional': u'false', u'name': u'vdb', u'volumeID': u'755ab7bb-dbe8-47be-b663-a47fc4b3c571', u'alias': u'virtio-disk1', u'volumeChain': [{u'domainID': u'c8653dba-63ec-4447-872c-0ffcc6af9c0f', u'volType': u'path', u'leaseOffset': 0, u'volumeID': u'755ab7bb-dbe8-47be-b663-a47fc4b3c571', u'leasePath': u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/663cbb1a-95a2-445d-b32c-1e34299ea6e3/755ab7bb-dbe8-47be-b663-a47fc4b3c571.lease', u'imageID': u'663cbb1a-95a2-445d-b32c-1e34299ea6e3', u'path': 
u'/rhev/data-center/mnt/virtfs-fs:_nfs_data/c8653dba-63ec-4447-872c-0ffcc6af9c0f/images/663cbb1a-95a2-445d-b32c-1e34299ea6e3/755ab7bb-dbe8-47be-b663-a47fc4b3c571'}]}], u'status': u'Up', u'timeOffset': u'-18000', u'maxVCpus': u'16', u'guestIPs': u'', u'statusTime': u'9387795610', u'display': u'qxl'}]
Thread-157::ERROR::2015-11-06 10:43:54,839::vm::751::virt.vm::(_startUnderlyingVm) vmId=`77766cb4-0625-4432-a16f-def5e702102a`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 714, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 2762, in _completeIncomingMigration
    self._domDependentInit()
  File "/usr/share/vdsm/virt/vm.py", line 1733, in _domDependentInit
    self._getUnderlyingVmDevicesInfo()
  File "/usr/share/vdsm/virt/vm.py", line 1680, in _getUnderlyingVmDevicesInfo
    self._getUnderlyingRngDeviceInfo()
  File "/usr/share/vdsm/virt/vm.py", line 4064, in _getUnderlyingRngDeviceInfo
    if caps.RNG_SOURCES[dev.specParams['source']] == source and \
KeyError: 'source'
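For what it's worth, the failing line in the traceback is a direct dict lookup on specParams. A minimal sketch of the failure mode (illustrative only, not the actual vdsm source; the RNG_SOURCES mapping below is an assumed stand-in):

    # Minimal sketch of the KeyError in _getUnderlyingRngDeviceInfo.
    # vm.py line 4064 does roughly:
    #     if caps.RNG_SOURCES[dev.specParams['source']] == source and \
    # If the device arrived from a 3.5 engine, specParams may carry no
    # 'source' key at all, so the lookup raises KeyError: 'source'.
    RNG_SOURCES = {'random': '/dev/random'}  # assumed mapping, for illustration

    class Dev(object):
        specParams = {}  # RNG device as sent by a 3.5 engine: no 'source'

    dev = Dev()
    RNG_SOURCES[dev.specParams['source']]  # raises KeyError: 'source'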

Hi! I'm trying to reproduce your issue. Can you help me with the exact scenario?

1. You had 3.5 running. What version of VDSM was on the hosts?
2. You replaced the engine and restarted it. Now it is 3.6, right?
3. You put a host into maintenance. Did the failure occur when VMs were migrating from it? Or did you put the host into maintenance, replace VDSM on it, and the failure occurred when VMs were migrating to it from other hosts?

Shmuel

Hi Shmuel,

Thanks very much for looking into my problem!

I installed 3.6 on the engine and rebooted it. The 3 hosts were still running vdsm from 3.5; I checked back in the yum log, and it was 4.16.26-0.el7.

On the first host upgrade (virt1), I made a mistake. After bringing in the 3.6 repo, I upgraded the packages with just "yum update", even though I know I should have put the host into maintenance mode first. After the updates installed, I put the host into maintenance mode, and it migrated the VMs off, during which I saw more than one failed VM migration. I'm willing to accept the failures there because I should have put the host into maintenance mode first. Live and learn!

I had two other hosts on which to do this right. For virt2 and virt3, I put the hosts into maintenance mode first. However, the same problem occurred with failed migrations. I proceeded anyway, brought the failed VMs back up elsewhere, applied the updates, and rebooted the hosts. So now, 3.6 is installed on the engine and the 3 hosts, and they are all rebooted. I tried another migration, and again there were failures, so this isn't specific to the 3.5-to-3.6 transition itself.

By the way, I'm using ovirtmgmt for migrations. virt1, virt2, and virt3 have a dedicated 10G link via Intel X540 to a 10G switch. The engine is on that network as well, but over a 1G link. I was able to run iperf tests between the nodes and saw nearly 10G speed. During the failed migrations, I also don't have any problem with ovirtmgmt, so I don't think the network is an issue...

I found this bug in bugzilla over the weekend:

https://bugzilla.redhat.com/show_bug.cgi?id=1142776

I was nearly positive that this had something to do with the failed migrations. As a final test, I decided to migrate the VMs from one host to another, one at a time. I was nearly done migrating all the VMs from virt3 to virt1: I had migrated 5 VMs successfully, one at a time, without any failures. When I migrated the 6th, boom - it didn't migrate, and the VM was down. It was a pretty basic VM as well, with very little traffic.

On the bug report above, I included an additional link with the engine, virt1, virt2, and virt3 logs for Saturday, when I was doing this experimentation, because there are a couple more failures recorded. I'll include that link here:

http://www.eecs.yorku.ca/~jas/ovirt-debug/11072015

The last VM that I attempted to transfer one at a time was "webapp". It was transferred from virt3 to virt1.

I'm really puzzled that more people haven't experienced this issue. I've disabled the load balancing feature because I'm really concerned that if it load balances my VMs, they might not come back up! I don't *think* this was happening when I was purely on 3.5, but I can't remember doing big migrations. I most certainly was able to put a host into maintenance mode without having VMs go down!

In another email, Dan Kenigsberg says that "It seems that 3.6's vdsm-4.17.10.1 cannot consume a Random Number Generator device that was created on 3.5." Thanks also to Dan for looking into that! I'm still waiting for more details before opening additional bug reports, because this puzzles me... if that were the case, then ALL of the VMs were created on 3.5, ALL with a random number generator device, and all would fail migration, but they don't. I have a feeling that there are a few issues at play here.

Jason.

Hello, and sorry for dropping in so late.

The issue is that the 3.5 engine created the RNG device without sending the device key (which should have been 'rng', but this wasn't properly documented in the API; fixed in [1]). This caused the getUnderlyingRngDevice method to fail to match the device (fixed in [2]), so it was treated as an unknown device (where the notion of 'source' isn't known). The 3.6 engine should handle it correctly [3].

The implication is that when a VM is created in a 3.5 environment and moved to a 3.6 environment, the matching will work, but there will be 2 RNG devices in place of the single original one. The same goes for migration.

I'm not sure about the fix yet. To rescue the 3.6 VM, we would have to either remove the duplicate device without specParams (meaning that the address would be lost) or remove the original device while adding its specParams to the new device. A temporary fix would be creating a hook that does this; a rough sketch follows below.

[1] https://gerrit.ovirt.org/#/c/43166/
[2] https://gerrit.ovirt.org/#/c/40095/
[3] https://gerrit.ovirt.org/#/c/43165/

Regards,
mpolednik
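To make the hook idea concrete: vdsm hooks read and modify the domain XML through the hooking module. Below is a rough, untested sketch of what such a dedup hook might look like, assuming the duplicate actually shows up as a second <rng> element in the incoming domain XML (both the hook point and that XML shape are assumptions here, not something confirmed in this thread):

    #!/usr/bin/python
    # Untested sketch of a dedup hook, e.g. dropped into
    # /usr/libexec/vdsm/hooks/before_vm_migrate_destination/.
    # It keeps the first <rng> element and drops any duplicates.
    import hooking

    domxml = hooking.read_domxml()        # xml.dom.minidom document
    rng_devices = domxml.getElementsByTagName('rng')
    for dup in list(rng_devices)[1:]:     # keep the first, drop the rest
        dup.parentNode.removeChild(dup)
    hooking.write_domxml(domxml)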

Martin,

Thanks for your message and for looking at the debug logs. What I don't understand is why, in my last test case, I was able to migrate 5 VMs from one host to another completely successfully, and only on the 6th did the problem occur. Why would this RNG issue not have come up with every single migration? What is it that made it happen on the 6th? I still have a feeling that there is something else at play here as well. All of my VMs were created on 3.5, and all of them have RNG devices.

Assuming that you've created a bug entry, can you please give me the bugzilla ID so I can add myself to it? I'm anxious for a patch that fixes the existing issue. All of my VMs were created on 3.5, and I don't want to have to hold my breath every time I migrate VMs from one host to another.

Jason.
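One hypothetical way to narrow down which VMs would trip the KeyError is to scan each VM's device list (the same structure vdsm dumps in the logs earlier in this thread) for an RNG device whose specParams lacks 'source'. A sketch, assuming vdsm represents the device with type 'rng' (the sample data below is made up for illustration):

    # Hypothetical diagnostic: flag RNG devices whose specParams lack the
    # 'source' key -- the condition behind the KeyError in the traceback.
    def suspect_rng_devices(devices):
        return [dev for dev in devices
                if dev.get('type') == 'rng'
                and 'source' not in dev.get('specParams', {})]

    # made-up sample in the shape of the vdsm device dump above
    devices = [
        {'type': 'rng', 'device': 'virtio', 'specParams': {}},
        {'type': 'rng', 'device': 'virtio', 'specParams': {'source': 'random'}},
    ]
    print(suspect_rng_devices(devices))  # -> only the first device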
participants (6)
- Dan Kenigsberg
- Jason Keltz
- Kyle Bassett
- Martin Polednik
- Shmuel Melamud
- Simone Tiraboschi