[ovirt-users] Disks Illegal State
Nir Soffer
nsoffer at redhat.com
Wed Apr 20 13:33:06 EDT 2016
On Wed, Apr 20, 2016 at 5:34 PM, Clint Boggio <clint at theboggios.com> wrote:
> The "vdsm-tool dump-volume-chains" command on the iSCSI storage domain
> shows one disk in "ILLEGAL" state while the gui shows 8 disk images in
> the same state.
Interesting - it would be useful to find the missing volume ids in the
engine log and understand why they are marked as illegal.
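For example (a hypothetical grep sketch; the log path assumes a default
engine installation, and the volume id should be replaced with the one
reported illegal in your setup):

```shell
# Hypothetical helper: show every line mentioning a volume uuid in the
# given log files, to see when and by which flow it was marked illegal.
trace_volume() {
    vol=$1; shift
    grep -hi -- "$vol" "$@"
}

# On the engine host (default engine log location):
# trace_volume bf962809-3de7-4264-8c68-6ac12d65c151 /var/log/ovirt-engine/engine.log*
```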
>
> ###########################################
> # BEGIN COMMAND OUTPUT
> ###########################################
>
>
>
> [root at KVM01 ~]# vdsm-tool dump-volume-chains 045c7fda-ab98-4905-876c-00b5413a619f
>
> Images volume chains (base volume first)
>
> image: 477e73af-e7db-4914-81ed-89b3fbc876f7
>
> - c8320522-f839-472e-9707-a75f6fbe5cb6
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 882c73fc-a833-4e2e-8e6a-f714d80c0f0d
>
> - 689220c0-70f8-475f-98b2-6059e735cd1f
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 0ca8c49f-452e-4f61-a3fc-c4bf2711e200
>
> - dac06a5c-c5a8-4f82-aa8d-5c7a382da0b3
> status: OK, voltype: LEAF, format: RAW, legality: LEGAL,
> type: PREALLOCATED
>
>
> image: 0ca0b8f8-8802-46ae-a9f8-45d5647feeb7
>
> - 51a6de7b-b505-4c46-ae2a-25fb9faad810
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: ae6d2c62-cfbb-4765-930f-c0a0e3bc07d0
>
> - b2d39c7d-5b9b-498d-a955-0e99c9bd5f3c
> status: OK, voltype: INTERNAL, format: COW, legality:
> LEGAL, type: SPARSE
>
> - bf962809-3de7-4264-8c68-6ac12d65c151
> status: ILLEGAL, voltype: LEAF, format: COW, legality:
> ILLEGAL, type: SPARSE
Let's check the vdsm and engine logs and find when and why this disk became
illegal.
If this was the result of a live merge that failed while finalizing the merge
on the engine side, we can safely delete the illegal volume.
If this is the case, we should find a live merge for volume
bf962809-3de7-4264-8c68-6ac12d65c151, and the live merge
should be successful on the vdsm side. At that point, vdsm sets the old
volume's state to illegal. The engine should ask to delete this volume
later in this flow.
Adding Ala and Adam to look at this case.
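As a sketch of what to look for on the vdsm side (the grep patterns are
illustrative, not exact vdsm message formats; the path assumes default
vdsm logging):

```shell
# Hypothetical helper: from vdsm logs, pull lines about a given volume
# that mention merge or legality, to see whether the live merge finished
# and when the volume was flipped to ILLEGAL.
merge_history() {
    vol=$1; shift
    grep -hi -- "$vol" "$@" | grep -Ei 'merge|illegal|legality'
}

# On the host that ran the VM during the live merge:
# merge_history bf962809-3de7-4264-8c68-6ac12d65c151 /var/log/vdsm/vdsm.log*
```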
>
>
> image: ff8c64c4-d52b-4812-b541-7f291f98d961
>
> - 85f77cd5-2f86-49a9-a411-8539114d3035
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 70fc19a2-75da-41bd-a1f6-eb857ed2f18f
>
> - a8f27397-395f-4b62-93c4-52699f59ea4b
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 2b315278-65f5-45e8-a51e-02b9bc84dcee
>
> - a6e2150b-57fa-46eb-b205-017fe01b0e4b
> status: OK, voltype: INTERNAL, format: COW, legality:
> LEGAL, type: SPARSE
>
> - 2d8e5c14-c923-49ac-8660-8e57b801e329
> status: OK, voltype: INTERNAL, format: COW, legality:
> LEGAL, type: SPARSE
>
> - 43100548-b849-4762-bfc5-18a0f281df2e
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: bf4594b0-242e-4823-abfd-9398ce5e31b7
>
> - 4608ce2e-f288-40da-b4e5-2a5e7f3bf837
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 00efca9d-932a-45b3-92c3-80065c1a40ce
>
> - a0bb00bc-cefa-4031-9b59-3cddc3a53a0a
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 5ce704eb-3508-4c36-b0ce-444ebdd27e66
>
> - e41f2c2d-0a79-49f1-8911-1535a82bd735
> status: OK, voltype: LEAF, format: RAW, legality: LEGAL,
> type: PREALLOCATED
>
>
> image: 11288fa5-0019-4ac0-8a7d-1d455e5e1549
>
> - 5df31efc-14dd-427c-b575-c0d81f47c6d8
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: a091f7df-5c64-4b6b-a806-f4bf3aad53bc
>
> - 38138111-2724-44a4-bde1-1fd9d60a1f63
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: c0b302c4-4b9d-4759-bb80-de1e865ecd58
>
> - d4db9ba7-1b39-4b48-b319-013ebc1d71ce
> status: OK, voltype: LEAF, format: RAW, legality: LEGAL,
> type: PREALLOCATED
>
>
> image: 21123edb-f74f-440b-9c42-4c16ba06a2b7
>
> - f3cc17aa-4336-4542-9ab0-9df27032be0b
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: ad486d26-4594-4d16-a402-68b45d82078a
>
> - e87e0c7c-4f6f-45e9-90ca-cf34617da3f6
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: c30c7f11-7818-4592-97ca-9d5be46e2d8e
>
> - cb53ad06-65e8-474d-94c3-9acf044d5a09
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 998ac54a-0d91-431f-8929-fe62f5d7290a
>
> - d11aa0ee-d793-4830-9120-3b118ca44b6c
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: a1e69838-0bdf-42f3-95a4-56e4084510a9
>
> - f687c727-ec06-49f1-9762-b0195e0b549a
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: a29598fe-f94e-4215-8508-19ac24b082c8
>
> - 29b9ff26-2386-4fb5-832e-b7129307ceb4
> status: OK, voltype: LEAF, format: RAW, legality: LEGAL,
> type: PREALLOCATED
>
>
> image: b151d4d7-d7fc-43ff-8bb2-75cf947ed626
>
> - 34676d55-695a-4d2a-a7fa-546971067829
> status: OK, voltype: LEAF, format: COW, legality: LEGAL,
> type: SPARSE
>
>
> image: 352a3a9a-4e1a-41bf-af86-717e374a7562
>
> - adcc7655-9586-48c1-90d2-1dc9a851bbe1
> status: OK, voltype: LEAF, format: RAW, legality: LEGAL,
> type: PREALLOCATED
>
>
> ###########################################
> # END COMMAND OUTPUT
> ###########################################
>
>
> ###########################################
> # BEGIN LOG OUTPUT FROM ENGINE
> ###########################################
>
> The below output is an excerpt from the engine.log while attempting to start one of the afflicted VMs. Following the storage chain out, I have discovered that "919d6991-43e4-4f26-868e-031a01011191" does not exist. This is likely due to a failure of the backup python script that the client is using. I can provide that script if you all would like.
How is this related to backup? Did you restore the files using your
backup script? Or maybe the backup script deleted files by mistake?
>
> 2016-04-20 08:56:58,285 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-8) [] Correlation ID: abd1342, Job ID: 1ad2ee48-2c2c-437e-997b-469e09498e41, Call Stack: null, Custom Event ID: -1, Message: VM Bill-V was started by admin at internal (Host: KVM03).
> 2016-04-20 08:57:00,392 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM Bill-V is down with error. Exit message: Unable to get volume size for domain 045c7fda-ab98-4905-876c-00b5413a619f volume 919d6991-43e4-4f26-868e-031a01011191.
> 2016-04-20 08:57:00,393 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '6ef30172-b010-46fa-9482-accd30682232(Bill-V) is running in db and not running in VDS 'KVM03'
> 2016-04-20 08:57:00,498 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-5) [] Correlation ID: abd1342, Job ID: 1ad2ee48-2c2c-437e-997b-469e09498e41, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM Bill-V on Host KVM03.
We need the entire engine log to investigate.
>
> ###########################################
> # END LOG OUTPUT FROM ENGINE
> ###########################################
>
> I have followed the storage chain out to where the UUID-named snapshots live, and discovered that all of the "ILLEGAL" snapshots appear to be broken symbolic links.
>
> Attached is a screenshot of the snapshots as they appear in the GUI. ALL of the UUIDs illustrated appear as broken symbolic links in the storage domains.
>
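One quick way to enumerate those dangling links (the path layout is the
usual /rhev mount tree on a host; the storage domain uuid below is taken
from the dump above, adjust as needed):

```shell
# Hypothetical helper: list dangling symbolic links under a directory
# tree. find's -xtype l matches symlinks whose target no longer exists.
list_broken_links() {
    find "$1" -xtype l -print
}

# Example, for the block storage domain from this thread:
# list_broken_links /rhev/data-center/mnt/blockSD/045c7fda-ab98-4905-876c-00b5413a619f/images
```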
Unless you think that the issue is your backup script deleting files,
I think the best way to proceed would be to file a bug:
https://bugzilla.redhat.com/enter_bug.cgi?product=ovirt-engine
Use:
oVirt Team: storage
Severity: high
Please include the information in this mail, and complete vdsm
and engine logs.
Nir
> On Tue, 2016-04-19 at 21:28 +0300, Nir Soffer wrote:
>> On Mon, Apr 18, 2016 at 3:16 PM, Clint Boggio <clint at theboggios.com>
>> wrote:
>> >
>> > OVirt 3.6, 4 node cluster with dedicated engine. Main storage
>> > domain is iscsi, ISO and Export domains are NFS.
>> >
>> > Several of my VM snapshot disks show to be in an "illegal state".
>> > The system will not allow me to manipulate the snapshots in any
>> > way, nor clone the active system, or create a new snapshot.
>> >
>> > In the logs I see that the system complains about not being able to
>> > "get volume size for xxx", and also that the system appears to
>> > believe that the image is "locked" and is currently in the snapshot
>> > process.
>> >
>> > Of the VM's with this status, one rebooted and was lost due to
>> > "cannot get volume size for domain xxx".
>> >
>> Can you share the vdsm log showing these errors?
>>
>> Also, it may be helpful to get the output of this command:
>>
>> vdsm-tool dump-volume-chains SDUUID
>>
>> Nir
>>
>> >
>> > I fear that in this current condition, should any of the other
>> > machine reboot, they too will be lost.
>> >
>> > How can I troubleshoot this problem further, and hopefully
>> > alleviate the condition ?
>> >
>> > Thank you for your help.
>> >
>> > Clint
>> > _______________________________________________
>> > Users mailing list
>> > Users at ovirt.org
>> > http://lists.ovirt.org/mailman/listinfo/users