[ovirt-users] Corrupted VM's
Nir Soffer
nsoffer at redhat.com
Tue Oct 6 12:24:08 UTC 2015
On Tue, Oct 6, 2015 at 10:18 AM, Neil <nwilson123 at gmail.com> wrote:
> Hi guys,
>
> I had a strange issue on the 3rd of September and I've only now got round to
> checking what caused it. About 4 or 5 Windows Server VMs got
> completely corrupted. When I pressed start I'd just get a blank screen and
> nothing would display. I tried various things, but no matter what, I wouldn't
> even get the SeaBIOS display showing that the VM was POSTing.
> The remaining 10 VMs were fine; it was just these 4 or 5 that got
> corrupted, and to recover I had to do a full DR restore of the VMs.
>
> I'm concerned that the issue might appear again, which is why I'm mailing
> the list now. Does anyone have any clues as to what might have caused this?
> All logs on the FC SAN were fine and all hosts appeared normal...
>
> The following are my versions...
>
> CentOS release 6.5 (Final)
> ovirt-release34-1.0.3-1.noarch
> ovirt-host-deploy-1.2.3-1.el6.noarch
> ovirt-engine-lib-3.4.4-1.el6.noarch
> ovirt-iso-uploader-3.4.4-1.el6.noarch
> ovirt-engine-cli-3.4.0.5-1.el6.noarch
> ovirt-engine-setup-base-3.4.4-1.el6.noarch
> ovirt-engine-websocket-proxy-3.4.4-1.el6.noarch
> ovirt-engine-backend-3.4.4-1.el6.noarch
> ovirt-engine-tools-3.4.4-1.el6.noarch
> ovirt-engine-dbscripts-3.4.4-1.el6.noarch
> ovirt-engine-3.4.4-1.el6.noarch
> ovirt-engine-setup-3.4.4-1.el6.noarch
> ovirt-engine-sdk-python-3.4.4.0-1.el6.noarch
> ovirt-image-uploader-3.4.3-1.el6.noarch
> ovirt-host-deploy-java-1.2.3-1.el6.noarch
> ovirt-engine-setup-plugin-websocket-proxy-3.4.4-1.el6.noarch
> ovirt-engine-setup-plugin-ovirt-engine-common-3.4.4-1.el6.noarch
> ovirt-engine-restapi-3.4.4-1.el6.noarch
> ovirt-engine-userportal-3.4.4-1.el6.noarch
> ovirt-engine-webadmin-portal-3.4.4-1.el6.noarch
> ovirt-engine-setup-plugin-ovirt-engine-3.4.4-1.el6.noarch
>
> CentOS release 6.5 (Final)
> vdsm-python-zombiereaper-4.14.11.2-0.el6.noarch
> vdsm-cli-4.14.11.2-0.el6.noarch
> vdsm-python-4.14.11.2-0.el6.x86_64
> vdsm-4.14.11.2-0.el6.x86_64
> vdsm-xmlrpc-4.14.11.2-0.el6.noarch
>
> Below are the sanlock.logs from two of my hosts and attached is my
> ovirt-engine.log from the date of the issue...
>
> Node02
> 2015-09-03 10:34:53+0200 33184492 [7369]: 0e6991ae aio timeout 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 7
> 2015-09-03 10:34:53+0200 33184492 [7369]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202
> delta_length 10 last_success 33184461
> 2015-09-03 10:35:04+0200 33184503 [7369]: 0e6991ae aio timeout 0
> 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 ioto 10 to_count 8
> 2015-09-03 10:35:04+0200 33184503 [7369]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 10:35:04+0200 33184503 [7369]: s1 renewal error -202
> delta_length 11 last_success 33184461
> 2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free r
> 2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0
> 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 result 1048576:0 match reap
> 2015-09-03 11:03:00+0200 33186178 [7369]: 0e6991ae aio timeout 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 9
> 2015-09-03 11:03:00+0200 33186178 [7369]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 11:03:00+0200 33186178 [7369]: s1 renewal error -202
> delta_length 10 last_success 33186147
> 2015-09-03 11:03:07+0200 33186185 [7369]: 0e6991ae aio collect 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
> 2015-09-03 11:10:18+0200 33186616 [7369]: 0e6991ae aio timeout 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 10
> 2015-09-03 11:10:18+0200 33186616 [7369]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 11:10:18+0200 33186616 [7369]: s1 renewal error -202
> delta_length 10 last_success 33186586
> 2015-09-03 11:10:21+0200 33186620 [7369]: 0e6991ae aio collect 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
> 2015-09-03 12:39:14+0200 33191953 [7369]: 0e6991ae aio timeout 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 11
> 2015-09-03 12:39:14+0200 33191953 [7369]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 12:39:14+0200 33191953 [7369]: s1 renewal error -202
> delta_length 10 last_success 33191922
> 2015-09-03 12:39:19+0200 33191957 [7369]: 0e6991ae aio collect 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
> 2015-09-03 12:40:10+0200 33192008 [7369]: 0e6991ae aio timeout 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 12
> 2015-09-03 12:40:10+0200 33192008 [7369]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 12:40:10+0200 33192008 [7369]: s1 renewal error -202
> delta_length 10 last_success 33191977
> 2015-09-03 12:40:12+0200 33192011 [7369]: 0e6991ae aio collect 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
> 2015-09-03 12:43:17+0200 33192196 [7369]: 0e6991ae aio timeout 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 13
> 2015-09-03 12:43:17+0200 33192196 [7369]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 12:43:17+0200 33192196 [7369]: s1 renewal error -202
> delta_length 10 last_success 33192165
> 2015-09-03 12:43:25+0200 33192203 [7369]: 0e6991ae aio collect 0
> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
> 2015-09-03 13:02:43+0200 33193361 [5807]: cmd 9 target pid 23383 not found
> 2015-09-03 13:13:24+0200 33194002 [5807]: cmd 9 target pid 24611 not found
> 2015-09-03 13:35:10+0200 33195308 [5807]: cmd 9 target pid 26392 not found
> 2015-09-03 13:53:32+0200 33196411 [5807]: cmd 9 target pid 28213 not found
> 2015-09-03 14:33:42+0200 33198820 [5807]: cmd 9 target pid 30732 not found
>
>
> Node3
> 2015-09-03 10:34:53+0200 33181297 [7509]: 0e6991ae aio timeout 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 7
> 2015-09-03 10:34:53+0200 33181297 [7509]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 10:34:53+0200 33181297 [7509]: s1 renewal error -202
> delta_length 10 last_success 33181266
> 2015-09-03 10:35:04+0200 33181308 [7509]: 0e6991ae aio timeout 0
> 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 ioto 10 to_count 8
> 2015-09-03 10:35:04+0200 33181308 [7509]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 10:35:04+0200 33181308 [7509]: s1 renewal error -202
> delta_length 11 last_success 33181266
> 2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free r
> 2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0
> 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 result 1048576:0 match reap
> 2015-09-03 11:03:00+0200 33182983 [7509]: 0e6991ae aio timeout 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 9
> 2015-09-03 11:03:00+0200 33182983 [7509]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 11:03:00+0200 33182983 [7509]: s1 renewal error -202
> delta_length 10 last_success 33182953
> 2015-09-03 11:03:07+0200 33182990 [7509]: 0e6991ae aio collect 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
> 2015-09-03 11:10:29+0200 33183432 [7509]: s1 renewed 33183417 delta_length
> 21 too long
> 2015-09-03 12:31:46+0200 33188310 [5666]: cmd 9 target pid 3657 not found
> 2015-09-03 12:39:14+0200 33188758 [7509]: 0e6991ae aio timeout 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 10
> 2015-09-03 12:39:14+0200 33188758 [7509]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 12:39:14+0200 33188758 [7509]: s1 renewal error -202
> delta_length 10 last_success 33188727
> 2015-09-03 12:39:19+0200 33188762 [7509]: 0e6991ae aio collect 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
> 2015-09-03 12:40:10+0200 33188813 [7509]: 0e6991ae aio timeout 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 11
> 2015-09-03 12:40:10+0200 33188813 [7509]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 12:40:10+0200 33188813 [7509]: s1 renewal error -202
> delta_length 10 last_success 33188783
> 2015-09-03 12:40:12+0200 33188816 [7509]: 0e6991ae aio collect 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
> 2015-09-03 12:43:17+0200 33189001 [7509]: 0e6991ae aio timeout 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 12
> 2015-09-03 12:43:17+0200 33189001 [7509]: s1 delta_renew read rv -202
> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
> 2015-09-03 12:43:17+0200 33189001 [7509]: s1 renewal error -202
> delta_length 10 last_success 33188970
> 2015-09-03 12:43:25+0200 33189008 [7509]: 0e6991ae aio collect 0
> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
> 2015-09-03 12:54:43+0200 33189687 [5666]: cmd 9 target pid 6503 not found
> 2015-09-03 13:00:01+0200 33190004 [5666]: cmd 9 target pid 7021 not found
> 2015-09-03 13:01:20+0200 33190083 [5666]: cmd 9 target pid 8009 not found
> 2015-09-03 13:06:38+0200 33190401 [5666]: cmd 9 target pid 9119 not found
> 2015-09-03 13:12:31+0200 33190754 [5666]: cmd 9 target pid 10248 not found
> 2015-09-03 14:03:36+0200 33193819 [5666]: cmd 9 target pid 13381 not found
> 2015-09-03 14:05:56+0200 33193959 [5666]: cmd 9 target pid 14367 not found
> 2015-09-03 14:16:02+0200 33194565 [5666]: cmd 9 target pid 15553 not found
> 2015-09-03 14:17:01+0200 33194624 [5666]: cmd 9 target pid 16385 not found
> 2015-09-03 14:23:19+0200 33195002 [5666]: cmd 9 target pid 17456 not found
> 2015-09-03 14:47:25+0200 33196448 [5666]: cmd 9 target pid 20262 not found
> 2015-09-03 15:02:45+0200 33197368 [5666]: cmd 9 target pid 21619 not found
> 2015-09-03 15:03:37+0200 33197420 [5666]: cmd 9 target pid 22321 not found
> 2015-09-03 15:07:43+0200 33197666 [5666]: cmd 9 target pid 23381 not found
> 2015-09-03 16:33:39+0200 33202822 [5666]: cmd 9 target pid 29063 not found
> 2015-09-09 11:36:22+0200 33703385 [5666]: cmd 9 target pid 22695 not found
> 2015-09-09 11:51:15+0200 33704278 [5666]: cmd 9 target pid 24089 not found
> 2015-09-09 11:58:25+0200 33704709 [5666]: cmd 9 target pid 25110 not found
> 2015-09-21 09:29:36+0200 34732579 [5666]: cmd 9 target pid 8527 not found
>
> Please shout if you need more info. Unfortunately, because I've left this
> for so long, the logs might have rotated already.
>
It looks like sanlock had trouble reading from and writing to storage (the
renewal errors in the logs above).
This may be a storage hardware issue or a qemu issue; we don't have enough
data to tell.
I suggest you open a bug about this and add all the info you can get, such
as which storage this is, logs from the hosts, logs from the storage server, etc.
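
When gathering info for the bug, a per-host summary of the renewal errors may
be easier to read than the raw sanlock.log. The following is only a minimal
sketch, not an official sanlock or oVirt tool. It assumes each log entry is on
a single line in the real file (the mail client wrapped them above), that the
second field is the host's monotonic time in seconds, and that last_success
uses the same clock. The names renewal_errors and summarize are made up for
this example.

```python
import re
from collections import Counter

# Matches entries such as:
# 2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202 delta_length 10 last_success 33184461
# Assumption: one entry per line, unlike the wrapped text in the mail.
RENEWAL_ERROR = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})[+-]\d{4} "
    r"(?P<now>\d+) \[\d+\]: (?P<space>s\d+) renewal error (?P<rv>-?\d+) "
    r"delta_length (?P<delta>\d+) last_success (?P<last>\d+)"
)


def renewal_errors(lines):
    """Yield one dict per 'renewal error' entry found in the given lines."""
    for line in lines:
        m = RENEWAL_ERROR.match(line.strip())
        if m:
            yield {
                "when": f"{m['date']} {m['time']}",
                "rv": int(m["rv"]),
                # Seconds since the last successful lease renewal,
                # assuming both fields use the same monotonic clock.
                "since_last_success": int(m["now"]) - int(m["last"]),
            }


def summarize(lines):
    """Return (errors per hour, longest gap since a successful renewal)."""
    errors = list(renewal_errors(lines))
    # "YYYY-MM-DD HH" is the first 13 characters of the timestamp.
    per_hour = Counter(e["when"][:13] for e in errors)
    worst = max((e["since_last_success"] for e in errors), default=0)
    return per_hour, worst
```

To use it, pass an open log file, for example summarize(open(path)) where
path points at a host's sanlock.log, and attach the per-hour counts from each
host to the bug alongside the raw logs.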
Nir