On Tue, Oct 6, 2015 at 10:18 AM, Neil <nwilson123@gmail.com> wrote:
Hi guys,

I had a strange issue on the 3rd of September and have only now got round to checking what caused it. About 4 or 5 Windows Server VMs got completely corrupted. When I pressed start I just got a blank screen and nothing would display. I tried various things, but no matter what I did I never even got the SeaBIOS display showing that the VM was POSTing.
The remaining 10 VMs were fine; only these 4 or 5 got corrupted, and to recover I had to do a full DR restore of the VMs.

I'm concerned that the issue might appear again, which is why I'm mailing the list now. Does anyone have any clues as to what might have caused this? All logs on the FC SAN were fine and all hosts appeared normal...

The following are my versions (engine packages first, then the vdsm packages on the hosts)...

CentOS release 6.5 (Final)
ovirt-release34-1.0.3-1.noarch
ovirt-host-deploy-1.2.3-1.el6.noarch
ovirt-engine-lib-3.4.4-1.el6.noarch
ovirt-iso-uploader-3.4.4-1.el6.noarch
ovirt-engine-cli-3.4.0.5-1.el6.noarch
ovirt-engine-setup-base-3.4.4-1.el6.noarch
ovirt-engine-websocket-proxy-3.4.4-1.el6.noarch
ovirt-engine-backend-3.4.4-1.el6.noarch
ovirt-engine-tools-3.4.4-1.el6.noarch
ovirt-engine-dbscripts-3.4.4-1.el6.noarch
ovirt-engine-3.4.4-1.el6.noarch
ovirt-engine-setup-3.4.4-1.el6.noarch
ovirt-engine-sdk-python-3.4.4.0-1.el6.noarch
ovirt-image-uploader-3.4.3-1.el6.noarch
ovirt-host-deploy-java-1.2.3-1.el6.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.4.4-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.4.4-1.el6.noarch
ovirt-engine-restapi-3.4.4-1.el6.noarch
ovirt-engine-userportal-3.4.4-1.el6.noarch
ovirt-engine-webadmin-portal-3.4.4-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.4.4-1.el6.noarch

CentOS release 6.5 (Final)
vdsm-python-zombiereaper-4.14.11.2-0.el6.noarch
vdsm-cli-4.14.11.2-0.el6.noarch
vdsm-python-4.14.11.2-0.el6.x86_64
vdsm-4.14.11.2-0.el6.x86_64
vdsm-xmlrpc-4.14.11.2-0.el6.noarch

Below are the sanlock.logs from two of my hosts and attached is my ovirt-engine.log from the date of the issue...

Node02
2015-09-03 10:34:53+0200 33184492 [7369]: 0e6991ae aio timeout 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 7
2015-09-03 10:34:53+0200 33184492 [7369]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202 delta_length 10 last_success 33184461
2015-09-03 10:35:04+0200 33184503 [7369]: 0e6991ae aio timeout 0 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 ioto 10 to_count 8
2015-09-03 10:35:04+0200 33184503 [7369]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 10:35:04+0200 33184503 [7369]: s1 renewal error -202 delta_length 11 last_success 33184461
2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free r
2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 result 1048576:0 match reap
2015-09-03 11:03:00+0200 33186178 [7369]: 0e6991ae aio timeout 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 9
2015-09-03 11:03:00+0200 33186178 [7369]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 11:03:00+0200 33186178 [7369]: s1 renewal error -202 delta_length 10 last_success 33186147
2015-09-03 11:03:07+0200 33186185 [7369]: 0e6991ae aio collect 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
2015-09-03 11:10:18+0200 33186616 [7369]: 0e6991ae aio timeout 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 10
2015-09-03 11:10:18+0200 33186616 [7369]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 11:10:18+0200 33186616 [7369]: s1 renewal error -202 delta_length 10 last_success 33186586
2015-09-03 11:10:21+0200 33186620 [7369]: 0e6991ae aio collect 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
2015-09-03 12:39:14+0200 33191953 [7369]: 0e6991ae aio timeout 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 11
2015-09-03 12:39:14+0200 33191953 [7369]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 12:39:14+0200 33191953 [7369]: s1 renewal error -202 delta_length 10 last_success 33191922
2015-09-03 12:39:19+0200 33191957 [7369]: 0e6991ae aio collect 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
2015-09-03 12:40:10+0200 33192008 [7369]: 0e6991ae aio timeout 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 12
2015-09-03 12:40:10+0200 33192008 [7369]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 12:40:10+0200 33192008 [7369]: s1 renewal error -202 delta_length 10 last_success 33191977
2015-09-03 12:40:12+0200 33192011 [7369]: 0e6991ae aio collect 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
2015-09-03 12:43:17+0200 33192196 [7369]: 0e6991ae aio timeout 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 13
2015-09-03 12:43:17+0200 33192196 [7369]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 12:43:17+0200 33192196 [7369]: s1 renewal error -202 delta_length 10 last_success 33192165
2015-09-03 12:43:25+0200 33192203 [7369]: 0e6991ae aio collect 0 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
2015-09-03 13:02:43+0200 33193361 [5807]: cmd 9 target pid 23383 not found
2015-09-03 13:13:24+0200 33194002 [5807]: cmd 9 target pid 24611 not found
2015-09-03 13:35:10+0200 33195308 [5807]: cmd 9 target pid 26392 not found
2015-09-03 13:53:32+0200 33196411 [5807]: cmd 9 target pid 28213 not found
2015-09-03 14:33:42+0200 33198820 [5807]: cmd 9 target pid 30732 not found


Node3
2015-09-03 10:34:53+0200 33181297 [7509]: 0e6991ae aio timeout 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 7
2015-09-03 10:34:53+0200 33181297 [7509]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 10:34:53+0200 33181297 [7509]: s1 renewal error -202 delta_length 10 last_success 33181266
2015-09-03 10:35:04+0200 33181308 [7509]: 0e6991ae aio timeout 0 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 ioto 10 to_count 8
2015-09-03 10:35:04+0200 33181308 [7509]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 10:35:04+0200 33181308 [7509]: s1 renewal error -202 delta_length 11 last_success 33181266
2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free r
2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 result 1048576:0 match reap
2015-09-03 11:03:00+0200 33182983 [7509]: 0e6991ae aio timeout 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 9
2015-09-03 11:03:00+0200 33182983 [7509]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 11:03:00+0200 33182983 [7509]: s1 renewal error -202 delta_length 10 last_success 33182953
2015-09-03 11:03:07+0200 33182990 [7509]: 0e6991ae aio collect 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
2015-09-03 11:10:29+0200 33183432 [7509]: s1 renewed 33183417 delta_length 21 too long
2015-09-03 12:31:46+0200 33188310 [5666]: cmd 9 target pid 3657 not found
2015-09-03 12:39:14+0200 33188758 [7509]: 0e6991ae aio timeout 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 10
2015-09-03 12:39:14+0200 33188758 [7509]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 12:39:14+0200 33188758 [7509]: s1 renewal error -202 delta_length 10 last_success 33188727
2015-09-03 12:39:19+0200 33188762 [7509]: 0e6991ae aio collect 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
2015-09-03 12:40:10+0200 33188813 [7509]: 0e6991ae aio timeout 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 11
2015-09-03 12:40:10+0200 33188813 [7509]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 12:40:10+0200 33188813 [7509]: s1 renewal error -202 delta_length 10 last_success 33188783
2015-09-03 12:40:12+0200 33188816 [7509]: 0e6991ae aio collect 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
2015-09-03 12:43:17+0200 33189001 [7509]: 0e6991ae aio timeout 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 12
2015-09-03 12:43:17+0200 33189001 [7509]: s1 delta_renew read rv -202 offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
2015-09-03 12:43:17+0200 33189001 [7509]: s1 renewal error -202 delta_length 10 last_success 33188970
2015-09-03 12:43:25+0200 33189008 [7509]: 0e6991ae aio collect 0 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
2015-09-03 12:54:43+0200 33189687 [5666]: cmd 9 target pid 6503 not found
2015-09-03 13:00:01+0200 33190004 [5666]: cmd 9 target pid 7021 not found
2015-09-03 13:01:20+0200 33190083 [5666]: cmd 9 target pid 8009 not found
2015-09-03 13:06:38+0200 33190401 [5666]: cmd 9 target pid 9119 not found
2015-09-03 13:12:31+0200 33190754 [5666]: cmd 9 target pid 10248 not found
2015-09-03 14:03:36+0200 33193819 [5666]: cmd 9 target pid 13381 not found
2015-09-03 14:05:56+0200 33193959 [5666]: cmd 9 target pid 14367 not found
2015-09-03 14:16:02+0200 33194565 [5666]: cmd 9 target pid 15553 not found
2015-09-03 14:17:01+0200 33194624 [5666]: cmd 9 target pid 16385 not found
2015-09-03 14:23:19+0200 33195002 [5666]: cmd 9 target pid 17456 not found
2015-09-03 14:47:25+0200 33196448 [5666]: cmd 9 target pid 20262 not found
2015-09-03 15:02:45+0200 33197368 [5666]: cmd 9 target pid 21619 not found
2015-09-03 15:03:37+0200 33197420 [5666]: cmd 9 target pid 22321 not found
2015-09-03 15:07:43+0200 33197666 [5666]: cmd 9 target pid 23381 not found
2015-09-03 16:33:39+0200 33202822 [5666]: cmd 9 target pid 29063 not found
2015-09-09 11:36:22+0200 33703385 [5666]: cmd 9 target pid 22695 not found
2015-09-09 11:51:15+0200 33704278 [5666]: cmd 9 target pid 24089 not found
2015-09-09 11:58:25+0200 33704709 [5666]: cmd 9 target pid 25110 not found
2015-09-21 09:29:36+0200 34732579 [5666]: cmd 9 target pid 8527 not found

Please shout if you need more info. Unfortunately, because I've left this for so long, the logs might have rotated already.

It looks like sanlock had trouble reading from and writing to storage (the renewal errors in the logs above).

This may be a storage hardware issue or a qemu issue; we don't have enough data to tell.

I suggest you open a bug about this and add all the info you can get, such as which storage this is, logs from the hosts, logs from the storage server, etc.
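
If it helps when filling in the bug, here is a rough sketch of how the renewal errors could be summarized per host. It is only an illustration: the regular expression is inferred from the "renewal error" lines quoted above, not from any sanlock documentation, and the function name renewal_errors is made up for this example. For each error it reports how many seconds had passed since the last successful renewal (the log's monotonic counter minus last_success).

```python
import re

# Sample lines copied verbatim from the Node02 log quoted above.
SAMPLE = """\
2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202 delta_length 10 last_success 33184461
2015-09-03 10:35:04+0200 33184503 [7369]: s1 renewal error -202 delta_length 11 last_success 33184461
2015-09-03 11:03:00+0200 33186178 [7369]: s1 renewal error -202 delta_length 10 last_success 33186147
2015-09-03 13:02:43+0200 33193361 [5807]: cmd 9 target pid 23383 not found
"""

# Assumption: this pattern is inferred from the lines above only.
RENEWAL_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<mono>\d+) \[\d+\]: (?P<space>\S+) renewal error "
    r"(?P<rv>-?\d+) delta_length (?P<delta>\d+) last_success (?P<last>\d+)$"
)


def renewal_errors(lines):
    """Yield (timestamp, lockspace, rv, seconds_since_last_success)."""
    for line in lines:
        m = RENEWAL_RE.match(line.strip())
        if m:
            gap = int(m["mono"]) - int(m["last"])
            yield (m["ts"], m["space"], int(m["rv"]), gap)


if __name__ == "__main__":
    errors = list(renewal_errors(SAMPLE.splitlines()))
    for ts, space, rv, gap in errors:
        print(f"{ts} {space} rv={rv} seconds_since_last_success={gap}")
    if errors:
        print("longest gap:", max(e[3] for e in errors), "seconds")
```

To run it against a real host, replace SAMPLE.splitlines() with the lines of that host's sanlock.log (the path depends on the installation). Comparing the longest gap across hosts and across dates may show whether the 3rd of September was unusual.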

Nir