[ovirt-users] Corrupted VM's

Nir Soffer nsoffer at redhat.com
Wed Oct 7 11:12:11 UTC 2015


On Tue, Oct 6, 2015 at 4:00 PM, Neil <nwilson123 at gmail.com> wrote:

> Hi Nir,
>
> Thank you for coming back to me. I see in the ovirt-engine log that one
> VM said it ran out of space; do you think perhaps the SAN itself was
> over-allocated somehow? Could this cause the issue shown in the logs?
>

I don't think it can cause sanlock failures, as sanlock writes to the
already allocated 1 MiB area.

Nir
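
For reference, the renewal errors in the sanlock logs quoted below are
failures of sanlock's periodic direct I/O against the first 1 MiB of the
storage domain's ids volume (the "delta_renew read ... offset 0 ... /ids"
lines, with "ioto 10" being the 10-second I/O timeout). A minimal sketch of
that access pattern, assuming Python 3.7+ on one of the hosts and reusing the
ids path from the logs in this thread, could look like this:

    # Minimal sketch, assuming Python 3.7+ on a host that can see the storage
    # domain; the ids path below is taken from the sanlock logs in this thread.
    # It times a single 1 MiB direct read at offset 0, the same region sanlock
    # renews its delta lease against; a read taking close to 10 seconds would
    # line up with the "ioto 10" timeouts in the logs.
    import mmap
    import os
    import time

    IDS_PATH = "/dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids"

    # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, 1024 * 1024)
    fd = os.open(IDS_PATH, os.O_RDONLY | os.O_DIRECT)
    try:
        start = time.monotonic()
        nread = os.preadv(fd, [buf], 0)  # direct read of the first 1 MiB
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)

    print("read %d bytes in %.3f seconds" % (nread, elapsed))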

On Tue, Oct 6, 2015 at 2:24 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>
>> On Tue, Oct 6, 2015 at 10:18 AM, Neil <nwilson123 at gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I had a strange issue on the 3rd of September and I've only now got round
>>> to checking what caused it. Basically, about 4 or 5 Windows Server VMs got
>>> completely corrupted. When I pressed start I'd just get a blank screen and
>>> nothing would display; I tried various things, but no matter what, I
>>> wouldn't even get the SeaBIOS display showing that the VM was posting.
>>> The remaining 10 VMs were fine, it was just these 4 or 5 that got
>>> corrupted, and to recover I had to do a full DR restore of the VMs.
>>>
>>> I'm concerned that the issue might appear again, which is why I'm
>>> mailing the list now. Does anyone have any clues as to what might have
>>> caused this? All logs on the FC SAN were fine and all hosts appeared
>>> normal...
>>>
>>> The following are my versions...
>>>
>>> CentOS release 6.5 (Final)
>>> ovirt-release34-1.0.3-1.noarch
>>> ovirt-host-deploy-1.2.3-1.el6.noarch
>>> ovirt-engine-lib-3.4.4-1.el6.noarch
>>> ovirt-iso-uploader-3.4.4-1.el6.noarch
>>> ovirt-engine-cli-3.4.0.5-1.el6.noarch
>>> ovirt-engine-setup-base-3.4.4-1.el6.noarch
>>> ovirt-engine-websocket-proxy-3.4.4-1.el6.noarch
>>> ovirt-engine-backend-3.4.4-1.el6.noarch
>>> ovirt-engine-tools-3.4.4-1.el6.noarch
>>> ovirt-engine-dbscripts-3.4.4-1.el6.noarch
>>> ovirt-engine-3.4.4-1.el6.noarch
>>> ovirt-engine-setup-3.4.4-1.el6.noarch
>>> ovirt-engine-sdk-python-3.4.4.0-1.el6.noarch
>>> ovirt-image-uploader-3.4.3-1.el6.noarch
>>> ovirt-host-deploy-java-1.2.3-1.el6.noarch
>>> ovirt-engine-setup-plugin-websocket-proxy-3.4.4-1.el6.noarch
>>> ovirt-engine-setup-plugin-ovirt-engine-common-3.4.4-1.el6.noarch
>>> ovirt-engine-restapi-3.4.4-1.el6.noarch
>>> ovirt-engine-userportal-3.4.4-1.el6.noarch
>>> ovirt-engine-webadmin-portal-3.4.4-1.el6.noarch
>>> ovirt-engine-setup-plugin-ovirt-engine-3.4.4-1.el6.noarch
>>>
>>> CentOS release 6.5 (Final)
>>> vdsm-python-zombiereaper-4.14.11.2-0.el6.noarch
>>> vdsm-cli-4.14.11.2-0.el6.noarch
>>> vdsm-python-4.14.11.2-0.el6.x86_64
>>> vdsm-4.14.11.2-0.el6.x86_64
>>> vdsm-xmlrpc-4.14.11.2-0.el6.noarch
>>>
>>> Below are the sanlock.logs from two of my hosts and attached is my
>>> ovirt-engine.log from the date of the issue...
>>>
>>> Node02
>>> 2015-09-03 10:34:53+0200 33184492 [7369]: 0e6991ae aio timeout 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 7
>>> 2015-09-03 10:34:53+0200 33184492 [7369]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202
>>> delta_length 10 last_success 33184461
>>> 2015-09-03 10:35:04+0200 33184503 [7369]: 0e6991ae aio timeout 0
>>> 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 ioto 10 to_count 8
>>> 2015-09-03 10:35:04+0200 33184503 [7369]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 10:35:04+0200 33184503 [7369]: s1 renewal error -202
>>> delta_length 11 last_success 33184461
>>> 2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free r
>>> 2015-09-03 10:35:05+0200 33184504 [7369]: 0e6991ae aio collect 0
>>> 0x7fbd70000910:0x7fbd70000920:0x7fbd7feff000 result 1048576:0 match reap
>>> 2015-09-03 11:03:00+0200 33186178 [7369]: 0e6991ae aio timeout 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 9
>>> 2015-09-03 11:03:00+0200 33186178 [7369]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 11:03:00+0200 33186178 [7369]: s1 renewal error -202
>>> delta_length 10 last_success 33186147
>>> 2015-09-03 11:03:07+0200 33186185 [7369]: 0e6991ae aio collect 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
>>> 2015-09-03 11:10:18+0200 33186616 [7369]: 0e6991ae aio timeout 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 10
>>> 2015-09-03 11:10:18+0200 33186616 [7369]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 11:10:18+0200 33186616 [7369]: s1 renewal error -202
>>> delta_length 10 last_success 33186586
>>> 2015-09-03 11:10:21+0200 33186620 [7369]: 0e6991ae aio collect 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
>>> 2015-09-03 12:39:14+0200 33191953 [7369]: 0e6991ae aio timeout 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 11
>>> 2015-09-03 12:39:14+0200 33191953 [7369]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 12:39:14+0200 33191953 [7369]: s1 renewal error -202
>>> delta_length 10 last_success 33191922
>>> 2015-09-03 12:39:19+0200 33191957 [7369]: 0e6991ae aio collect 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
>>> 2015-09-03 12:40:10+0200 33192008 [7369]: 0e6991ae aio timeout 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 ioto 10 to_count 12
>>> 2015-09-03 12:40:10+0200 33192008 [7369]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 12:40:10+0200 33192008 [7369]: s1 renewal error -202
>>> delta_length 10 last_success 33191977
>>> 2015-09-03 12:40:12+0200 33192011 [7369]: 0e6991ae aio collect 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd9094b000 result 1048576:0 other free
>>> 2015-09-03 12:43:17+0200 33192196 [7369]: 0e6991ae aio timeout 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 ioto 10 to_count 13
>>> 2015-09-03 12:43:17+0200 33192196 [7369]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 12:43:17+0200 33192196 [7369]: s1 renewal error -202
>>> delta_length 10 last_success 33192165
>>> 2015-09-03 12:43:25+0200 33192203 [7369]: 0e6991ae aio collect 0
>>> 0x7fbd700008c0:0x7fbd700008d0:0x7fbd7feff000 result 1048576:0 other free
>>> 2015-09-03 13:02:43+0200 33193361 [5807]: cmd 9 target pid 23383 not
>>> found
>>> 2015-09-03 13:13:24+0200 33194002 [5807]: cmd 9 target pid 24611 not
>>> found
>>> 2015-09-03 13:35:10+0200 33195308 [5807]: cmd 9 target pid 26392 not
>>> found
>>> 2015-09-03 13:53:32+0200 33196411 [5807]: cmd 9 target pid 28213 not
>>> found
>>> 2015-09-03 14:33:42+0200 33198820 [5807]: cmd 9 target pid 30732 not
>>> found
>>>
>>>
>>> Node3
>>> 2015-09-03 10:34:53+0200 33181297 [7509]: 0e6991ae aio timeout 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 7
>>> 2015-09-03 10:34:53+0200 33181297 [7509]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 10:34:53+0200 33181297 [7509]: s1 renewal error -202
>>> delta_length 10 last_success 33181266
>>> 2015-09-03 10:35:04+0200 33181308 [7509]: 0e6991ae aio timeout 0
>>> 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 ioto 10 to_count 8
>>> 2015-09-03 10:35:04+0200 33181308 [7509]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 10:35:04+0200 33181308 [7509]: s1 renewal error -202
>>> delta_length 11 last_success 33181266
>>> 2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free r
>>> 2015-09-03 10:35:05+0200 33181309 [7509]: 0e6991ae aio collect 0
>>> 0x7f45d0000910:0x7f45d0000920:0x7f45f03c9000 result 1048576:0 match reap
>>> 2015-09-03 11:03:00+0200 33182983 [7509]: 0e6991ae aio timeout 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 9
>>> 2015-09-03 11:03:00+0200 33182983 [7509]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 11:03:00+0200 33182983 [7509]: s1 renewal error -202
>>> delta_length 10 last_success 33182953
>>> 2015-09-03 11:03:07+0200 33182990 [7509]: 0e6991ae aio collect 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
>>> 2015-09-03 11:10:29+0200 33183432 [7509]: s1 renewed 33183417
>>> delta_length 21 too long
>>> 2015-09-03 12:31:46+0200 33188310 [5666]: cmd 9 target pid 3657 not found
>>> 2015-09-03 12:39:14+0200 33188758 [7509]: 0e6991ae aio timeout 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 10
>>> 2015-09-03 12:39:14+0200 33188758 [7509]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 12:39:14+0200 33188758 [7509]: s1 renewal error -202
>>> delta_length 10 last_success 33188727
>>> 2015-09-03 12:39:19+0200 33188762 [7509]: 0e6991ae aio collect 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
>>> 2015-09-03 12:40:10+0200 33188813 [7509]: 0e6991ae aio timeout 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 ioto 10 to_count 11
>>> 2015-09-03 12:40:10+0200 33188813 [7509]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 12:40:10+0200 33188813 [7509]: s1 renewal error -202
>>> delta_length 10 last_success 33188783
>>> 2015-09-03 12:40:12+0200 33188816 [7509]: 0e6991ae aio collect 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45f03c9000 result 1048576:0 other free
>>> 2015-09-03 12:43:17+0200 33189001 [7509]: 0e6991ae aio timeout 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 ioto 10 to_count 12
>>> 2015-09-03 12:43:17+0200 33189001 [7509]: s1 delta_renew read rv -202
>>> offset 0 /dev/0e6991ae-6238-4c61-96d2-ca8fed35161e/ids
>>> 2015-09-03 12:43:17+0200 33189001 [7509]: s1 renewal error -202
>>> delta_length 10 last_success 33188970
>>> 2015-09-03 12:43:25+0200 33189008 [7509]: 0e6991ae aio collect 0
>>> 0x7f45d00008c0:0x7f45d00008d0:0x7f45ec434000 result 1048576:0 other free
>>> 2015-09-03 12:54:43+0200 33189687 [5666]: cmd 9 target pid 6503 not found
>>> 2015-09-03 13:00:01+0200 33190004 [5666]: cmd 9 target pid 7021 not found
>>> 2015-09-03 13:01:20+0200 33190083 [5666]: cmd 9 target pid 8009 not found
>>> 2015-09-03 13:06:38+0200 33190401 [5666]: cmd 9 target pid 9119 not found
>>> 2015-09-03 13:12:31+0200 33190754 [5666]: cmd 9 target pid 10248 not
>>> found
>>> 2015-09-03 14:03:36+0200 33193819 [5666]: cmd 9 target pid 13381 not
>>> found
>>> 2015-09-03 14:05:56+0200 33193959 [5666]: cmd 9 target pid 14367 not
>>> found
>>> 2015-09-03 14:16:02+0200 33194565 [5666]: cmd 9 target pid 15553 not
>>> found
>>> 2015-09-03 14:17:01+0200 33194624 [5666]: cmd 9 target pid 16385 not
>>> found
>>> 2015-09-03 14:23:19+0200 33195002 [5666]: cmd 9 target pid 17456 not
>>> found
>>> 2015-09-03 14:47:25+0200 33196448 [5666]: cmd 9 target pid 20262 not
>>> found
>>> 2015-09-03 15:02:45+0200 33197368 [5666]: cmd 9 target pid 21619 not
>>> found
>>> 2015-09-03 15:03:37+0200 33197420 [5666]: cmd 9 target pid 22321 not
>>> found
>>> 2015-09-03 15:07:43+0200 33197666 [5666]: cmd 9 target pid 23381 not
>>> found
>>> 2015-09-03 16:33:39+0200 33202822 [5666]: cmd 9 target pid 29063 not
>>> found
>>> 2015-09-09 11:36:22+0200 33703385 [5666]: cmd 9 target pid 22695 not
>>> found
>>> 2015-09-09 11:51:15+0200 33704278 [5666]: cmd 9 target pid 24089 not
>>> found
>>> 2015-09-09 11:58:25+0200 33704709 [5666]: cmd 9 target pid 25110 not
>>> found
>>> 2015-09-21 09:29:36+0200 34732579 [5666]: cmd 9 target pid 8527 not found
>>>
>>> Please shout if you need more info; unfortunately, because I've left this
>>> for so long, the logs might have rotated already.
>>>
>>
>> It looks like sanlock had trouble reading from and writing to storage
>> (renewal errors).
>>
>> This may be a storage hardware issue or a qemu issue; we don't have
>> enough data to tell.
>>
>> I suggest you open a bug about this and add all the info you can get,
>> such as which storage this is, logs on the hosts, logs on the storage
>> server, etc.
>>
>> Nir
>>
>>
>
>
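
When putting together such a bug report, a quick per-host summary of the
sanlock renewal failures can be useful alongside the full logs. A minimal
sketch, assuming Python 3 and the /var/log/sanlock.log format quoted above
(the "renewal error" lines), might look like this:

    # Minimal sketch, assuming Python 3 and the sanlock.log format quoted in
    # this thread. It counts delta-lease renewal errors per hour and per error
    # code, which gives a compact timeline to attach to a bug report.
    import collections
    import re
    import sys

    # Example line:
    # 2015-09-03 10:34:53+0200 33184492 [7369]: s1 renewal error -202 ...
    PATTERN = re.compile(
        r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\S+ .*renewal error (-\d+)")

    errors_per_hour = collections.Counter()
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/sanlock.log"
    with open(log_path) as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                hour, err = match.groups()
                errors_per_hour[(hour, err)] += 1

    for (hour, err), count in sorted(errors_per_hour.items()):
        print("%s:00 error %s: %d renewals failed" % (hour, err, count))

Running it on each host and attaching the output together with the raw
sanlock, vdsm and storage logs should give whoever triages the bug a quick
picture of when the storage path was timing out.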