On Tue, Oct 6, 2015 at 4:00 PM, Neil <nwilson123(a)gmail.com> wrote:
> Hi Nir,
>
> Thank you for coming back to me. I see in the ovirt-engine log that one VM
> said it ran out of space. Do you think perhaps the SAN itself was
> over-allocated somehow? Could this cause the issue shown in the logs?

I don't think it can cause sanlock failures, as sanlock writes to an
already allocated 1 MiB area.
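
If you want to see how long each host went without a successful lease
renewal, a rough sketch like the one below might help. It is not part of
sanlock or vdsm. The names RENEWAL_ERROR, RenewalError and
parse_renewal_errors are made up for illustration, and it assumes that the
log lines are unwrapped (one entry per line) and that the number after the
wall-clock time is in the same seconds counter as last_success.

```python
import re
from typing import NamedTuple

# Matches an unwrapped sanlock.log line such as:
# 2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202 delta_length 10 last_success 33184461
RENEWAL_ERROR = re.compile(
    r"^(?P<wall>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[+-]\d{4} "
    r"(?P<mono>\d+) \[\d+\]: (?P<space>\S+) renewal error (?P<rv>-?\d+) "
    r"delta_length (?P<delta>\d+) last_success (?P<last>\d+)$"
)


class RenewalError(NamedTuple):
    wall: str           # wall-clock timestamp as logged
    lockspace: str      # short lockspace name from the log, e.g. "s1"
    rv: int             # error code reported on the line
    since_success: int  # seconds between this error and the last good renewal


def parse_renewal_errors(lines):
    """Return one RenewalError per matching line; other lines are skipped."""
    errors = []
    for line in lines:
        m = RENEWAL_ERROR.match(line.strip())
        if m:
            errors.append(RenewalError(
                wall=m["wall"],
                lockspace=m["space"],
                rv=int(m["rv"]),
                # Assumption: both fields use the same seconds counter.
                since_success=int(m["mono"]) - int(m["last"]),
            ))
    return errors


# Lines taken from the Node02 log in this thread, joined back onto one line each.
sample = [
    "2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202 delta_length 10 last_success 33184461",
    "2015-09-03 10:35:04+0200 33184503 [7369]: s1 renewal error -202 delta_length 11 last_success 33184461",
    "2015-09-03 13:02:43+0200 33193361 [5807]: cmd 9 target pid 23383 not found",
]

for err in parse_renewal_errors(sample):
    print(err.wall, err.lockspace, err.rv, err.since_success)
```

Running the same function over the log from each host and comparing the
wall-clock times would show whether the hosts hit the timeouts at the same
moments.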
Nir
On Tue, Oct 6, 2015 at 2:24 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
> On Tue, Oct 6, 2015 at 10:18 AM, Neil <nwilson123(a)gmail.com> wrote:
>
>> Hi guys,
>>
>> I had a strange issue on the 3rd of September and I've only now got round to
>> checking what caused it. Basically, about 4 or 5 Windows Server VMs got
>> completely corrupted. When I pressed start I'd just get a blank screen and
>> nothing would display. I tried various things, but no matter what, I wouldn't
>> even get the SeaBIOS display showing that the VM was posting.
>> The remaining 10 VMs were fine; it was just these 4 or 5 that got
>> corrupted, and to recover I had to do a full DR restore of the VMs.
>>
>> I'm concerned that the issue might appear again, which is why I'm
>> mailing the list now. Does anyone have any clues as to what might have
>> caused this? All logs on the FC SAN were fine and all hosts appeared
>> normal...
>>
>> The following are my versions...
>>
>> CentOS release 6.5 (Final)
>> ovirt-release34-1.0.3-1.noarch
>> ovirt-host-deploy-1.2.3-1.el6.noarch
>> ovirt-engine-lib-3.4.4-1.el6.noarch
>> ovirt-iso-uploader-3.4.4-1.el6.noarch
>> ovirt-engine-cli-3.4.0.5-1.el6.noarch
>> ovirt-engine-setup-base-3.4.4-1.el6.noarch
>> ovirt-engine-websocket-proxy-3.4.4-1.el6.noarch
>> ovirt-engine-backend-3.4.4-1.el6.noarch
>> ovirt-engine-tools-3.4.4-1.el6.noarch
>> ovirt-engine-dbscripts-3.4.4-1.el6.noarch
>> ovirt-engine-3.4.4-1.el6.noarch
>> ovirt-engine-setup-3.4.4-1.el6.noarch
>> ovirt-engine-sdk-python-3.4.4.0-1.el6.noarch
>> ovirt-image-uploader-3.4.3-1.el6.noarch
>> ovirt-host-deploy-java-1.2.3-1.el6.noarch
>> ovirt-engine-setup-plugin-websocket-proxy-3.4.4-1.el6.noarch
>> ovirt-engine-setup-plugin-ovirt-engine-common-3.4.4-1.el6.noarch
>> ovirt-engine-restapi-3.4.4-1.el6.noarch
>> ovirt-engine-userportal-3.4.4-1.el6.noarch
>> ovirt-engine-webadmin-portal-3.4.4-1.el6.noarch
>> ovirt-engine-setup-plugin-ovirt-engine-3.4.4-1.el6.noarch
>>
>> CentOS release 6.5 (Final)
>> vdsm-python-zombiereaper-4.14.11.2-0.el6.noarch
>> vdsm-cli-4.14.11.2-0.el6.noarch
>> vdsm-python-4.14.11.2-0.el6.x86_64
>> vdsm-4.14.11.2-0.el6.x86_64
>> vdsm-xmlrpc-4.14.11.2-0.el6.noarch
>>
>> Below are the sanlock.logs from two of my hosts and attached is my
>> ovirt-engine.log from the date of the issue...
>>
>> Node02
>> 2015-09-03 10:34:53+0200 33184492 [7369]: 0e6991ae aio timeout 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 7
>> 2015-09-03 10:34:53+0200 33184492 [7369]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202
>> delta_length 10 last_success 33184461
>> 2015-09-03 10:35:04+0200 33184503 [7369]: 0e6991ae aio timeout 0
>> 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 ioto 10 to_count 8
>> 2015-09-03 10:35:04+0200 33184503 [7369]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 10:35:04+0200 33184503 [7369]: s1 renewal error -202
>> delta_length 11 last_success 33184461
>> 2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free r
>> 2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0
>> 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 result 1048576:0 match reap
>> 2015-09-03 11:03:00+0200 33186178 [7369]: 0e6991ae aio timeout 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 9
>> 2015-09-03 11:03:00+0200 33186178 [7369]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 11:03:00+0200 33186178 [7369]: s1 renewal error -202
>> delta_length 10 last_success 33186147
>> 2015-09-03 11:03:07+0200 33186185 [7369]: 0e6991ae aio collect 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
>> 2015-09-03 11:10:18+0200 33186616 [7369]: 0e6991ae aio timeout 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 10
>> 2015-09-03 11:10:18+0200 33186616 [7369]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 11:10:18+0200 33186616 [7369]: s1 renewal error -202
>> delta_length 10 last_success 33186586
>> 2015-09-03 11:10:21+0200 33186620 [7369]: 0e6991ae aio collect 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
>> 2015-09-03 12:39:14+0200 33191953 [7369]: 0e6991ae aio timeout 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 11
>> 2015-09-03 12:39:14+0200 33191953 [7369]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 12:39:14+0200 33191953 [7369]: s1 renewal error -202
>> delta_length 10 last_success 33191922
>> 2015-09-03 12:39:19+0200 33191957 [7369]: 0e6991ae aio collect 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
>> 2015-09-03 12:40:10+0200 33192008 [7369]: 0e6991ae aio timeout 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 12
>> 2015-09-03 12:40:10+0200 33192008 [7369]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 12:40:10+0200 33192008 [7369]: s1 renewal error -202
>> delta_length 10 last_success 33191977
>> 2015-09-03 12:40:12+0200 33192011 [7369]: 0e6991ae aio collect 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
>> 2015-09-03 12:43:17+0200 33192196 [7369]: 0e6991ae aio timeout 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 13
>> 2015-09-03 12:43:17+0200 33192196 [7369]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 12:43:17+0200 33192196 [7369]: s1 renewal error -202
>> delta_length 10 last_success 33192165
>> 2015-09-03 12:43:25+0200 33192203 [7369]: 0e6991ae aio collect 0
>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
>> 2015-09-03 13:02:43+0200 33193361 [5807]: cmd 9 target pid 23383 not
>> found
>> 2015-09-03 13:13:24+0200 33194002 [5807]: cmd 9 target pid 24611 not
>> found
>> 2015-09-03 13:35:10+0200 33195308 [5807]: cmd 9 target pid 26392 not
>> found
>> 2015-09-03 13:53:32+0200 33196411 [5807]: cmd 9 target pid 28213 not
>> found
>> 2015-09-03 14:33:42+0200 33198820 [5807]: cmd 9 target pid 30732 not
>> found
>>
>>
>> Node3
>> 2015-09-03 10:34:53+0200 33181297 [7509]: 0e6991ae aio timeout 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 7
>> 2015-09-03 10:34:53+0200 33181297 [7509]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 10:34:53+0200 33181297 [7509]: s1 renewal error -202
>> delta_length 10 last_success 33181266
>> 2015-09-03 10:35:04+0200 33181308 [7509]: 0e6991ae aio timeout 0
>> 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 ioto 10 to_count 8
>> 2015-09-03 10:35:04+0200 33181308 [7509]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 10:35:04+0200 33181308 [7509]: s1 renewal error -202
>> delta_length 11 last_success 33181266
>> 2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free r
>> 2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0
>> 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 result 1048576:0 match reap
>> 2015-09-03 11:03:00+0200 33182983 [7509]: 0e6991ae aio timeout 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 9
>> 2015-09-03 11:03:00+0200 33182983 [7509]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 11:03:00+0200 33182983 [7509]: s1 renewal error -202
>> delta_length 10 last_success 33182953
>> 2015-09-03 11:03:07+0200 33182990 [7509]: 0e6991ae aio collect 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
>> 2015-09-03 11:10:29+0200 33183432 [7509]: s1 renewed 33183417
>> delta_length 21 too long
>> 2015-09-03 12:31:46+0200 33188310 [5666]: cmd 9 target pid 3657 not found
>> 2015-09-03 12:39:14+0200 33188758 [7509]: 0e6991ae aio timeout 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 10
>> 2015-09-03 12:39:14+0200 33188758 [7509]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 12:39:14+0200 33188758 [7509]: s1 renewal error -202
>> delta_length 10 last_success 33188727
>> 2015-09-03 12:39:19+0200 33188762 [7509]: 0e6991ae aio collect 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
>> 2015-09-03 12:40:10+0200 33188813 [7509]: 0e6991ae aio timeout 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 11
>> 2015-09-03 12:40:10+0200 33188813 [7509]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 12:40:10+0200 33188813 [7509]: s1 renewal error -202
>> delta_length 10 last_success 33188783
>> 2015-09-03 12:40:12+0200 33188816 [7509]: 0e6991ae aio collect 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
>> 2015-09-03 12:43:17+0200 33189001 [7509]: 0e6991ae aio timeout 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 12
>> 2015-09-03 12:43:17+0200 33189001 [7509]: s1 delta_renew read rv -202
>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>> 2015-09-03 12:43:17+0200 33189001 [7509]: s1 renewal error -202
>> delta_length 10 last_success 33188970
>> 2015-09-03 12:43:25+0200 33189008 [7509]: 0e6991ae aio collect 0
>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
>> 2015-09-03 12:54:43+0200 33189687 [5666]: cmd 9 target pid 6503 not found
>> 2015-09-03 13:00:01+0200 33190004 [5666]: cmd 9 target pid 7021 not found
>> 2015-09-03 13:01:20+0200 33190083 [5666]: cmd 9 target pid 8009 not found
>> 2015-09-03 13:06:38+0200 33190401 [5666]: cmd 9 target pid 9119 not found
>> 2015-09-03 13:12:31+0200 33190754 [5666]: cmd 9 target pid 10248 not
>> found
>> 2015-09-03 14:03:36+0200 33193819 [5666]: cmd 9 target pid 13381 not
>> found
>> 2015-09-03 14:05:56+0200 33193959 [5666]: cmd 9 target pid 14367 not
>> found
>> 2015-09-03 14:16:02+0200 33194565 [5666]: cmd 9 target pid 15553 not
>> found
>> 2015-09-03 14:17:01+0200 33194624 [5666]: cmd 9 target pid 16385 not
>> found
>> 2015-09-03 14:23:19+0200 33195002 [5666]: cmd 9 target pid 17456 not
>> found
>> 2015-09-03 14:47:25+0200 33196448 [5666]: cmd 9 target pid 20262 not
>> found
>> 2015-09-03 15:02:45+0200 33197368 [5666]: cmd 9 target pid 21619 not
>> found
>> 2015-09-03 15:03:37+0200 33197420 [5666]: cmd 9 target pid 22321 not
>> found
>> 2015-09-03 15:07:43+0200 33197666 [5666]: cmd 9 target pid 23381 not
>> found
>> 2015-09-03 16:33:39+0200 33202822 [5666]: cmd 9 target pid 29063 not
>> found
>> 2015-09-09 11:36:22+0200 33703385 [5666]: cmd 9 target pid 22695 not
>> found
>> 2015-09-09 11:51:15+0200 33704278 [5666]: cmd 9 target pid 24089 not
>> found
>> 2015-09-09 11:58:25+0200 33704709 [5666]: cmd 9 target pid 25110 not
>> found
>> 2015-09-21 09:29:36+0200 34732579 [5666]: cmd 9 target pid 8527 not found
>>
>> Please shout if you need more info. Unfortunately, because I've left this
>> for so long, the logs might have rotated already.
>>
>
> It looks like sanlock had trouble reading from and writing to storage
> (renewal errors).
>
> This may be a storage hardware issue or a qemu issue; we don't have any data
> to tell.
>
> I suggest you open a bug about this and add all the info you can get, such
> as which storage this is, logs from the hosts, logs from the storage server, etc.
>
> Nir
>
>