On Thu, Apr 21, 2016 at 1:10 AM, Clint Boggio <clint(a)theboggios.com> wrote:
Bug is filed.
[Bug 1329000] Snapshot Images Flagged as "ILLEGAL" After Backup Script Is Run
Thanks
Is there a technique for recovery from this condition, or should I back up the
data on the VMs that are afflicted and still running and start over?
I'm not sure what the condition is. It may be that the chain on the host side
is good and the engine references volumes that are not part of it; those
records are not needed and can be deleted. Or volumes may really be missing on
the host side.

To check whether volumes are missing on the host side, you can inspect the real
chain used by libvirt with virsh:
# virsh
virsh # list
Please enter your authentication name: vdsm@ovirt
Please enter your password: shibboleth
 Id    Name                           State
----------------------------------------------------
 2     lsm-test                       running
virsh # dumpxml 2
<domain type='kvm' id='2'>
[...]
<devices>
[...]
<disk type='block' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
  <source dev='/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/1e999a77-8fbb-4792-9224-0693be3242b9/images/bb26f6eb-d54d-43f3-8d18-e260efb1df7e/4786bb86-da94-44af-b012-51d899cc7225'/>
  <backingStore type='block' index='1'>
    <format type='qcow2'/>
    <source dev='/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/1e999a77-8fbb-4792-9224-0693be3242b9/images/bb26f6eb-d54d-43f3-8d18-e260efb1df7e/../bb26f6eb-d54d-43f3-8d18-e260efb1df7e/a07c0fec-242f-444b-8892-e4a0b22e08a7'/>
    <backingStore type='block' index='2'>
      <format type='qcow2'/>
      <source dev='/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/1e999a77-8fbb-4792-9224-0693be3242b9/images/bb26f6eb-d54d-43f3-8d18-e260efb1df7e/../bb26f6eb-d54d-43f3-8d18-e260efb1df7e/../bb26f6eb-d54d-43f3-8d18-e260efb1df7e/c76a763a-8208-4fa6-ab60-6bdab15b6159'/>
      <backingStore type='block' index='3'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/1e999a77-8fbb-4792-9224-0693be3242b9/images/bb26f6eb-d54d-43f3-8d18-e260efb1df7e/../bb26f6eb-d54d-43f3-8d18-e260efb1df7e/../bb26f6eb-d54d-43f3-8d18-e260efb1df7e/../bb26f6eb-d54d-43f3-8d18-e260efb1df7e/7c0d5c23-710d-445b-868c-9add6219436d'/>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <serial>bb26f6eb-d54d-43f3-8d18-e260efb1df7e</serial>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
[...]
Here we can see the real chain (removing everything but the volume id):

1. 4786bb86-da94-44af-b012-51d899cc7225
2. a07c0fec-242f-444b-8892-e4a0b22e08a7
3. c76a763a-8208-4fa6-ab60-6bdab15b6159
4. 7c0d5c23-710d-445b-868c-9add6219436d
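As a quick cross-check, qemu-img should print the same chain directly from the
top volume. This is a sketch using the top volume path from the dumpxml output
above (qemu-img only reads image metadata here; adjust the path for your own
VM):

# qemu-img info --backing-chain \
    /rhev/data-center/f9374c0e-ae24-4bc1-a596-f61d5f05bc5f/1e999a77-8fbb-4792-9224-0693be3242b9/images/bb26f6eb-d54d-43f3-8d18-e260efb1df7e/4786bb86-da94-44af-b012-51d899cc7225

Each "backing file:" entry should point to the next volume id in the list above.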
If the engine complains about a snapshot that is not part of this chain, the
problem is in the engine database and we can safely remove the snapshot from
the database.
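If you want to see what the engine has recorded before touching anything, a
read-only query like this should do it. This is a sketch from memory of the
3.6-era schema - verify the table and column names on your version, replace
<vm uuid> with the id of the affected VM, and take an engine database backup
before any cleanup:

# su - postgres -c "psql engine"
engine=# SELECT s.snapshot_id, s.description, s.status,
engine-#        i.image_guid, i.imagestatus
engine-#   FROM snapshots s
engine-#   LEFT JOIN images i ON i.vm_snapshot_id = s.snapshot_id
engine-#  WHERE s.vm_id = '<vm uuid>';

Volumes the engine marked ILLEGAL should show up with imagestatus = 4.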
If the engine complains about a volume that is in this chain but the volume is
missing on disk, this is an issue on the host side. I'm not sure it is possible
to restore such a missing file unless you have a backup.
It would be useful if you dump the XML of the VMs with this issue and attach it
to the bug.
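For example, using a read-only connection so the vdsm credentials are not
needed (the VM name is the one from the listing above; adjust as needed):

# virsh -r dumpxml lsm-test > lsm-test.xml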
Nir
>
> Thank you all for your help.
>
>> On Apr 20, 2016, at 3:19 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>>> On Wed, Apr 20, 2016 at 10:42 PM, Clint Boggio <clint(a)theboggios.com> wrote:
>>> I grepped through the engine logs until I found the reference to the illegal
>>> disk in question. The log indicates that the image has been flagged
>>> illegal because the original disk is no longer present. So it is very
>>> possible that the backup script, somehow through the miracle of 1's and
>>> 0's, deleted the base VM disks.
>>>
>>> ######################
>>> # BEGIN
>>> ######################
>>>
>>> 2016-03-27 18:57:41,769
>>> INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand]
>>> (org.ovirt.thread.pool-8-thread-11) [30680dce] START, CreateSnapshotVDSCommand(
>>> CreateSnapshotVDSCommandParameters:{runAsync='true',
>>> storagePoolId='85a72afd-7bde-4065-a1bc-7fc6e22e6bf6',
>>> ignoreFailoverLimit='false',
>>> storageDomainId='045c7fda-ab98-4905-876c-00b5413a619f',
>>> imageGroupId='ad486d26-4594-4d16-a402-68b45d82078a',
>>> imageSizeInBytes='268435456000', volumeFormat='COW',
>>> newImageId='e87e0c7c-4f6f-45e9-90ca-cf34617da3f6',
>>> newImageDescription='', imageInitialSizeInBytes='0',
>>> imageId='d538e0ef-2f55-4c74-b8f1-8900fd6b814b',
>>> sourceImageGroupId='ad486d26-4594-4d16-a402-68b45d82078a'}), log id: 7648bbd2
>>> 2016-03-27 18:57:42,835
>>> INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand]
>>> (org.ovirt.thread.pool-8-thread-11) [30680dce] FINISH, CreateSnapshotVDSCommand,
>>> return: e87e0c7c-4f6f-45e9-90ca-cf34617da3f6, log id: 7648bbd2
>>> 2016-03-27 18:58:24,395
>>> INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand]
>>> (org.ovirt.thread.pool-8-thread-20) [30680dce] START, GetImageInfoVDSCommand(
>>> GetImageInfoVDSCommandParameters:{runAsync='true',
>>> storagePoolId='85a72afd-7bde-4065-a1bc-7fc6e22e6bf6',
>>> ignoreFailoverLimit='false',
>>> storageDomainId='045c7fda-ab98-4905-876c-00b5413a619f',
>>> imageGroupId='ad486d26-4594-4d16-a402-68b45d82078a',
>>> imageId='e87e0c7c-4f6f-45e9-90ca-cf34617da3f6'}), log id: 6d2d19f6
>>> 2016-03-28 14:14:49,454
>>> INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand]
>>> (pool-7-thread-3) [718f57] START, MergeVDSCommand(HostName = KVM04,
>>> MergeVDSCommandParameters:{runAsync='true',
>>> hostId='b51933a3-9201-4446-a3e3-906a2ec1b467',
>>> vmId='6ef30172-b010-46fa-9482-accd30682232',
>>> storagePoolId='85a72afd-7bde-4065-a1bc-7fc6e22e6bf6',
>>> storageDomainId='045c7fda-ab98-4905-876c-00b5413a619f',
>>> imageGroupId='ad486d26-4594-4d16-a402-68b45d82078a',
>>> imageId='e87e0c7c-4f6f-45e9-90ca-cf34617da3f6',
>>> baseImageId='6e008200-3c21-4285-96b8-07c29c0cb72c',
>>> topImageId='d538e0ef-2f55-4c74-b8f1-8900fd6b814b',
>>> bandwidth='0'}), log id: 2cc2db4
>>> 2016-03-28 17:01:22,368
>>> INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand]
>>> (default task-77) [410b6a44] START, CreateSnapshotVDSCommand(
>>> CreateSnapshotVDSCommandParameters:{runAsync='true',
>>> storagePoolId='85a72afd-7bde-4065-a1bc-7fc6e22e6bf6',
>>> ignoreFailoverLimit='false',
>>> storageDomainId='045c7fda-ab98-4905-876c-00b5413a619f',
>>> imageGroupId='ad486d26-4594-4d16-a402-68b45d82078a',
>>> imageSizeInBytes='268435456000', volumeFormat='COW',
>>> newImageId='919d6991-43e4-4f26-868e-031a01011191',
>>> newImageDescription='', imageInitialSizeInBytes='0',
>>> imageId='e87e0c7c-4f6f-45e9-90ca-cf34617da3f6',
>>> sourceImageGroupId='ad486d26-4594-4d16-a402-68b45d82078a'}), log id: 4ed3e9ca
>>> 2016-03-28 18:36:28,404
>>> INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand]
>>> (pool-7-thread-1) [6911a44f] START, MergeVDSCommand(HostName = KVM04,
>>> MergeVDSCommandParameters:{runAsync='true',
>>> hostId='b51933a3-9201-4446-a3e3-906a2ec1b467',
>>> vmId='6ef30172-b010-46fa-9482-accd30682232',
>>> storagePoolId='85a72afd-7bde-4065-a1bc-7fc6e22e6bf6',
>>> storageDomainId='045c7fda-ab98-4905-876c-00b5413a619f',
>>> imageGroupId='ad486d26-4594-4d16-a402-68b45d82078a',
>>> imageId='919d6991-43e4-4f26-868e-031a01011191',
>>> baseImageId='e87e0c7c-4f6f-45e9-90ca-cf34617da3f6',
>>> topImageId='919d6991-43e4-4f26-868e-031a01011191',
>>> bandwidth='0'}), log id: d09cb70
>>> 2016-03-28 18:39:53,773
>>> INFO [org.ovirt.engine.core.bll.MergeCommandCallback]
>>> (DefaultQuartzScheduler_Worker-99) [6911a44f] Merge command has completed for
>>> images 'e87e0c7c-4f6f-45e9-90ca-cf34617da3f6'..'919d6991-43e4-4f26-868e-031a01011191'
>>> 2016-03-28 18:41:23,003 ERROR
>>> [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand]
>>> (DefaultQuartzScheduler_Worker-44) [a00e3a8] Merging of snapshot
>>> 'a1b3c247-2c6f-4731-9e62-c15f5cfb9a72' images
>>> 'e87e0c7c-4f6f-45e9-90ca-cf34617da3f6'..'919d6991-43e4-4f26-868e-031a01011191'
>>> failed. Images have been marked illegal and can no longer be previewed or
>>> reverted to. Please retry Live Merge on the snapshot to complete the operation.
>>
>> This is a live merge failure - we have a similar bug causing this, and I have
>> reproduced a similar failure today. This may be the same bug; we must inspect
>> the logs to be sure.
>>
>> Typically the merge succeeds on the vdsm side, but for some reason the engine
>> fails to detect the merge success and marks the volumes as illegal.
>>
>>>
>>> ##################
>>> # END
>>> ##################
>>>
>>> If that's the case, then why (how) are the afflicted machines that have
>>> not been rebooted still running without their backing disks?
>>
>> It is possible to unlink a file while it is being used by another process.
>> The directory entry is removed so another process cannot access the file,
>> but processes that already opened the file are not affected.
>>
>> But this looks like the live merge issue, not like your backup script trying
>> too hard.
>>
>>>
>>> I can upload the logs and a copy of the backup script. Do you all have
>>> a repository you'd like me to upload to? Let me know and I'll upload
>>> them right now.
>>
>> Please file a bug and attach the files there.
>>
>> Nir
>>