
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iSCSI, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VMs with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
I fear that in this current condition, should any of the other machines reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition?
Thank you for your help.
Clint

On 18 April 2016 at 13:16, Clint Boggio <clint@theboggios.com> wrote:
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
I've experienced a similar thing before and referred to this [1] comment on BZ1302215 to help me out. Followed those steps and my issue was resolved. I'm still not aware of the root cause of this problem though. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13

Hi Clint,
Did you try to remove the relevant snapshots with illegal disks? If you did, did the removal fail or pass?
See this: https://bugzilla.redhat.com/show_bug.cgi?id=1275836
Thanks,
Raz Tamir
Red Hat Israel
On Mon, Apr 18, 2016 at 3:20 PM, Ollie Armstrong <ollie@fubra.com> wrote:
On 18 April 2016 at 13:16, Clint Boggio <clint@theboggios.com> wrote:
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
I've experienced a similar thing before and referred to this [1] comment on BZ1302215 to help me out. Followed those steps and my issue was resolved. I'm still not aware of the root cause of this problem though.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13

Yes I did try that and the system would not allow it.
On Apr 18, 2016, at 7:27 AM, Raz Tamir <ratamir@redhat.com> wrote:
Hi Clint,
Did you try to remove the relevant snapshots with illegal disks? If you did, did the removal fail or pass?
See this: https://bugzilla.redhat.com/show_bug.cgi?id=1275836
Thanks,
Raz Tamir
Red Hat Israel
On Mon, Apr 18, 2016 at 3:20 PM, Ollie Armstrong <ollie@fubra.com> wrote:
On 18 April 2016 at 13:16, Clint Boggio <clint@theboggios.com> wrote:
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
I've experienced a similar thing before and referred to this [1] comment on BZ1302215 to help me out. Followed those steps and my issue was resolved. I'm still not aware of the root cause of this problem though.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13

Thank you, I will read the link provided and follow that course.
On Apr 18, 2016, at 7:20 AM, Ollie Armstrong <ollie@fubra.com> wrote:
On 18 April 2016 at 13:16, Clint Boggio <clint@theboggios.com> wrote: Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
I've experienced a similar thing before and referred to this [1] comment on BZ1302215 to help me out. Followed those steps and my issue was resolved. I'm still not aware of the root cause of this problem though.

From: users-bounces@ovirt.org [users-bounces@ovirt.org] on behalf of "Clint Boggio" [clint@theboggios.com]
Sent: Monday, 18 April 2016 14:16
To: users@ovirt.org
Subject: [ovirt-users] Disks Illegal State
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iscsi, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VM's with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
I fear that in this current condition, should any of the other machine reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
Thank you for your help.
Hi Clint,
for us the problem always boils down to the following steps. Might be simpler as we use NFS for all of our domains and have direct access to the image files.
1) Check if snapshot disks are currently used. Capture the qemu command line with a "ps -ef" on the nodes. There you can see what images qemu is started with. For each of the files check the backing chain:
# qemu-img info /rhev/.../bbd05dd8-c3bf-4d15-9317-73040e04abae
image: bbd05dd8-c3bf-4d15-9317-73040e04abae
file format: qcow2
virtual size: 50G (53687091200 bytes)
disk size: 133M
cluster_size: 65536
backing file: ../f8ebfb39-2ac6-4b87-b193-4204d1854edc/595b95f4-ce1a-4298-bd27-3f6745ae4e4c
backing file format: raw
Format specific information:
    compat: 0.10
# qemu-img info .../595b95f4-ce1a-4298-bd27-3f6745ae4e4c (see above)
...
I don't know how you can accomplish this on iSCSI (and LVM-based images inside, IIRC). We usually follow the backing chain and test whether all the files exist and are linked correctly, especially whether everything matches the oVirt GUI. I guess this is the most important part for you.
2) In most of our cases everything is fine and only the oVirt database is wrong. So we fix it at our own risk. Because of your explanation I do not recommend that for you; it is just for documentation purposes.
engine# su - postgres
psql engine postgres
select image_group_id,imagestatus from images where imagestatus = 4;
... list of illegal images
update images set imagestatus = 1 where imagestatus = 4 and <other criteria>;
commit
select description,status from snapshots where status <> 'OK';
... list of locked snapshots
update snapshots set status = 'OK' where status <> 'OK' and <other criteria>;
commit
\q
Restart engine and everything should be in sync again.
Best regards.
Markus
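For block (iSCSI) domains, a rough equivalent of Markus's step 1 can be sketched by walking LV tags instead of file paths. This is only a sketch and rests on assumptions about the standard VDSM block-domain layout -- the volume group is named after the storage domain UUID, each volume is a logical volume named after the volume UUID, and the parent volume UUID is recorded in a PU_<uuid> tag, with the all-zero UUID marking the base of the chain -- so verify those assumptions on your own hosts before relying on it:

#!/bin/bash
# Sketch only: walk a volume chain on a block (iSCSI) storage domain by
# following the PU_<parent-uuid> LV tags that VDSM keeps on each logical
# volume.  Read-only; nothing is activated or modified.
SD_UUID=$1    # storage domain UUID (the LVM volume group name)
CUR=$2        # leaf volume UUID, e.g. taken from the running qemu command line

while :; do
    if ! lvs "$SD_UUID/$CUR" >/dev/null 2>&1; then
        echo "broken chain: volume $CUR does not exist in VG $SD_UUID"
        exit 1
    fi
    echo "volume $CUR: present"
    # The parent volume UUID is recorded in the PU_<uuid> tag of the LV.
    PARENT=$(lvs --noheadings -o lv_tags "$SD_UUID/$CUR" | tr ',' '\n' | sed -n 's/^ *PU_//p')
    if [ "$PARENT" = "00000000-0000-0000-0000-000000000000" ]; then
        echo "reached the base volume -- chain is complete"
        exit 0
    fi
    if [ -z "$PARENT" ]; then
        echo "no PU_ tag on $CUR -- cannot follow the chain further"
        exit 1
    fi
    CUR=$PARENT
done

Usage would be something like ./walk-chain.sh <sd-uuid> <leaf-volume-uuid>. Running qemu-img info against the volumes would additionally require the LVs to be active, which is better left to vdsm on a production host.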

Markus, thank you so much for the information. I'll be focusing on resolution of this problem this week and I'll keep you in the loop. On Apr 18, 2016, at 7:39 AM, Markus Stockhausen <stockhausen@collogia.de> wrote:
From: users-bounces@ovirt.org [users-bounces@ovirt.org] on behalf of "Clint Boggio" [clint@theboggios.com]
Sent: Monday, 18 April 2016 14:16
To: users@ovirt.org
Subject: [ovirt-users] Disks Illegal State
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iscsi, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VM's with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
I fear that in this current condition, should any of the other machine reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
Thank you for your help.
Hi Clint,
for us the problem always boils down to the following steps. Might be simpler as we use NFS for all of our domains and have direct access to the image files.
1) Check if snapshot disks are currently used. Capture the qemu command line with a "ps -ef" on the nodes. There you can see what images qemu is started with. For each of the files check the backing chain:
# qemu-img info /rhev/.../bbd05dd8-c3bf-4d15-9317-73040e04abae
image: bbd05dd8-c3bf-4d15-9317-73040e04abae
file format: qcow2
virtual size: 50G (53687091200 bytes)
disk size: 133M
cluster_size: 65536
backing file: ../f8ebfb39-2ac6-4b87-b193-4204d1854edc/595b95f4-ce1a-4298-bd27-3f6745ae4e4c
backing file format: raw
Format specific information:
    compat: 0.10
# qemu-img info .../595b95f4-ce1a-4298-bd27-3f6745ae4e4c (see above)
...
I don't know how you can accomplish this on iSCSI (and LVM-based images inside, IIRC). We usually follow the backing chain and test whether all the files exist and are linked correctly, especially whether everything matches the oVirt GUI. I guess this is the most important part for you.
2) In most of our cases everything is fine and only the oVirt database is wrong. So we fix it at our own risk. Because of your explanation I do not recommend that for you; it is just for documentation purposes.
engine# su - postgres
psql engine postgres
select image_group_id,imagestatus from images where imagestatus = 4;
... list of illegal images
update images set imagestatus = 1 where imagestatus = 4 and <other criteria>;
commit
select description,status from snapshots where status <> 'OK';
... list of locked snapshots
update snapshots set status = 'OK' where status <> 'OK' and <other criteria>;
commit
\q
Restart engine and everything should be in sync again.
Best regards.
Markus

We're seeing this in RHEV 3.5 with snapshot management on VMs with multiple disks. It would be awesome to have an "fsck"-type script that could be run daily and report on any problems with the snapshot disks. On Mon, Apr 18, 2016 at 10:59 PM, Clint Boggio <clint@theboggios.com> wrote:
Markus, thank you so much for the information. I'll be focusing on resolution of this problem this week and I'll keep you in the loop.
On Apr 18, 2016, at 7:39 AM, Markus Stockhausen <stockhausen@collogia.de> wrote:
From: users-bounces@ovirt.org [users-bounces@ovirt.org] on behalf of "Clint Boggio" [clint@theboggios.com]
Sent: Monday, 18 April 2016 14:16
To: users@ovirt.org
Subject: [ovirt-users] Disks Illegal State
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iscsi, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VM's with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
I fear that in this current condition, should any of the other machine reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
Thank you for your help.
Hi Clint,
for us the problem always boils down to the following steps. Might be simpler as we use NFS for all of our domains and have direct access to the image files.
1) Check if snapshot disks are currently used. Capture the qemu command line with a "ps -ef" on the nodes. There you can see what images qemu is started with. For each of the files check the backing chain:
# qemu-img info /rhev/.../bbd05dd8-c3bf-4d15-9317-73040e04abae
image: bbd05dd8-c3bf-4d15-9317-73040e04abae
file format: qcow2
virtual size: 50G (53687091200 bytes)
disk size: 133M
cluster_size: 65536
backing file: ../f8ebfb39-2ac6-4b87-b193-4204d1854edc/595b95f4-ce1a-4298-bd27-3f6745ae4e4c
backing file format: raw
Format specific information:
    compat: 0.10
# qemu-img info .../595b95f4-ce1a-4298-bd27-3f6745ae4e4c (see above)
...
I don't know how you can accomplish this on iSCSI (and LVM-based images inside, IIRC). We usually follow the backing chain and test whether all the files exist and are linked correctly, especially whether everything matches the oVirt GUI. I guess this is the most important part for you.
2) In most of our cases everything is fine and only the oVirt database is wrong. So we fix it at our own risk. Because of your explanation I do not recommend that for you; it is just for documentation purposes.
engine# su - postgres
psql engine postgres
select image_group_id,imagestatus from images where imagestatus = 4;
... list of illegal images
update images set imagestatus = 1 where imagestatus = 4 and <other criteria>;
commit
select description,status from snapshots where status <> 'OK';
... list of locked snapshots
update snapshots set status = 'OK' where status <> 'OK' and <other criteria>;
commit
\q
Restart engine and everything should be in sync again.
Best regards.
Markus
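On the point above about an "fsck"-type daily report: nothing like that ships with RHEV/oVirt as far as I know, but a rough cron-able sketch could simply combine the two checks already shown in this thread -- vdsm-tool dump-volume-chains on a hypervisor and read-only imagestatus/snapshot-status queries on the engine database. The host name, domain UUIDs and mail address below are placeholders, not values from this environment:

#!/bin/bash
# Sketch of a daily snapshot-disk health report.  Run on the engine host.
# HOST, SD_UUIDS and MAILTO are placeholders; all queries are read-only.
HOST="kvm-host.example.com"            # any hypervisor with vdsm-tool installed
SD_UUIDS="<storage-domain-uuid>"       # space-separated list of domain UUIDs
MAILTO="admin@example.com"
REPORT=$(mktemp)

# 1) Host side: volumes that VDSM itself reports as ILLEGAL.
for sd in $SD_UUIDS; do
    ssh "$HOST" vdsm-tool dump-volume-chains "$sd" | grep -B2 ILLEGAL >> "$REPORT"
done

# 2) Engine side: images marked illegal (imagestatus = 4) and snapshots not 'OK'.
su - postgres -c "psql engine -t -c \"select image_group_id, imagestatus from images where imagestatus = 4;\"" >> "$REPORT"
su - postgres -c "psql engine -t -c \"select description, status from snapshots where status <> 'OK';\"" >> "$REPORT"

# Mail the report only if something turned up (assumes local mail is configured).
grep -q '[^[:space:]]' "$REPORT" && mail -s "oVirt snapshot disk check" "$MAILTO" < "$REPORT"
rm -f "$REPORT"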

In my case Markus, the backing disks are MIA and show only as bright red broken symbolic links. Using the postgres commands to set them as OK would be folly, and likely cause more trouble. If the snapshot disks are truly gone (and they are), what procedure would I use to inform the database and set the VMs to a usable status again? On Mon, 2016-04-18 at 12:39 +0000, Markus Stockhausen wrote:
From: users-bounces@ovirt.org [users-bounces@ovirt.org] on behalf of "Clint Boggio" [clint@theboggios.com]
Sent: Monday, 18 April 2016 14:16
To: users@ovirt.org
Subject: [ovirt-users] Disks Illegal State
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iscsi, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VM's with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
I fear that in this current condition, should any of the other machine reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
Thank you for your help.
Hi Clint,
for us the problem always boils down to the following steps. Might be simpler as we use NFS for all of our domains and have direct access to the image files.
1) Check if snapshot disks are currently used. Capture the qemu command line with a "ps -ef" on the nodes. There you can see what images qemu is started with. For each of the files check the backing chain:
# qemu-img info /rhev/.../bbd05dd8-c3bf-4d15-9317-73040e04abae
image: bbd05dd8-c3bf-4d15-9317-73040e04abae
file format: qcow2
virtual size: 50G (53687091200 bytes)
disk size: 133M
cluster_size: 65536
backing file: ../f8ebfb39-2ac6-4b87-b193-4204d1854edc/595b95f4-ce1a-4298-bd27-3f6745ae4e4c
backing file format: raw
Format specific information:
    compat: 0.10
# qemu-img info .../595b95f4-ce1a-4298-bd27-3f6745ae4e4c (see above)
...
I don't know how you can accomplish this on iSCSI (and LVM-based images inside, IIRC). We usually follow the backing chain and test whether all the files exist and are linked correctly, especially whether everything matches the oVirt GUI. I guess this is the most important part for you.
2) In most of our cases everything is fine and only the oVirt database is wrong. So we fix it at our own risk. Because of your explanation I do not recommend that for you; it is just for documentation purposes.
engine# su - postgres
psql engine postgres
select image_group_id,imagestatus from images where imagestatus = 4;
... list of illegal images
update images set imagestatus = 1 where imagestatus = 4 and <other criteria>;
commit
select description,status from snapshots where status <> 'OK';
... list of locked snapshots
update snapshots set status = 'OK' where status <> 'OK' and <other criteria>;
commit
\q
Restart engine and everything should be in sync again.
Best regards.
Markus
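On the question above about informing the database: before deciding how to clean up, it may help to see exactly which VMs and snapshots the engine associates with the illegal images. The following is a read-only sketch; the join columns (images.vm_snapshot_id, snapshots.vm_id, vm_static.vm_guid) are recalled from the 3.6 schema and should be verified with \d images and \d snapshots in psql before trusting the output:

#!/bin/bash
# Read-only: list the VMs and snapshots that reference images the engine has
# marked illegal (imagestatus = 4).  Run on the engine host.  Column names are
# assumptions based on the 3.6 schema -- verify before use.
su - postgres -c 'psql engine' <<'SQL'
select vm.vm_name,
       s.description as snapshot_description,
       i.image_group_id,
       i.image_guid
  from images i
  join snapshots s  on s.snapshot_id = i.vm_snapshot_id
  join vm_static vm on vm.vm_guid    = s.vm_id
 where i.imagestatus = 4
 order by vm.vm_name;
SQL

This only enumerates what is affected; it does not change anything in the database.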

On Mon, Apr 18, 2016 at 3:16 PM, Clint Boggio <clint@theboggios.com> wrote:
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iscsi, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VM's with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
Can you share the vdsm log showing these errors?
Also it may be helpful to get the output of this command:
vdsm-tool dump-volume-chains SDUUID
Nir
I fear that in this current condition, should any of the other machine reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
Thank you for your help.
Clint

The "vdsm-tool dump-volume-chains" command on the iSCSI storage domain shows one disk in "ILLEGAL" state while the gui shows 8 disk images in the same state. ########################################### # BEGIN COMMAND OUTPUT ########################################### [root@KVM01 ~]# vdsm-tool dump-volume-chains 045c7fda-ab98-4905-876c- 00b5413a619f Images volume chains (base volume first) image: 477e73af-e7db-4914-81ed-89b3fbc876f7 - c8320522-f839-472e-9707-a75f6fbe5cb6 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 882c73fc-a833-4e2e-8e6a-f714d80c0f0d - 689220c0-70f8-475f-98b2-6059e735cd1f status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 0ca8c49f-452e-4f61-a3fc-c4bf2711e200 - dac06a5c-c5a8-4f82-aa8d-5c7a382da0b3 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED image: 0ca0b8f8-8802-46ae-a9f8-45d5647feeb7 - 51a6de7b-b505-4c46-ae2a-25fb9faad810 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: ae6d2c62-cfbb-4765-930f-c0a0e3bc07d0 - b2d39c7d-5b9b-498d-a955-0e99c9bd5f3c status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE - bf962809-3de7-4264-8c68-6ac12d65c151 status: ILLEGAL, voltype: LEAF, format: COW, legality: ILLEGAL, type: SPARSE image: ff8c64c4-d52b-4812-b541-7f291f98d961 - 85f77cd5-2f86-49a9-a411-8539114d3035 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 70fc19a2-75da-41bd-a1f6-eb857ed2f18f - a8f27397-395f-4b62-93c4-52699f59ea4b status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 2b315278-65f5-45e8-a51e-02b9bc84dcee - a6e2150b-57fa-46eb-b205-017fe01b0e4b status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE - 2d8e5c14-c923-49ac-8660-8e57b801e329 status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE - 43100548-b849-4762-bfc5-18a0f281df2e status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: bf4594b0-242e-4823-abfd-9398ce5e31b7 - 4608ce2e-f288-40da-b4e5-2a5e7f3bf837 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 00efca9d-932a-45b3-92c3-80065c1a40ce - a0bb00bc-cefa-4031-9b59-3cddc3a53a0a status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 5ce704eb-3508-4c36-b0ce-444ebdd27e66 - e41f2c2d-0a79-49f1-8911-1535a82bd735 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED image: 11288fa5-0019-4ac0-8a7d-1d455e5e1549 - 5df31efc-14dd-427c-b575-c0d81f47c6d8 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: a091f7df-5c64-4b6b-a806-f4bf3aad53bc - 38138111-2724-44a4-bde1-1fd9d60a1f63 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: c0b302c4-4b9d-4759-bb80-de1e865ecd58 - d4db9ba7-1b39-4b48-b319-013ebc1d71ce status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED image: 21123edb-f74f-440b-9c42-4c16ba06a2b7 - f3cc17aa-4336-4542-9ab0-9df27032be0b status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: ad486d26-4594-4d16-a402-68b45d82078a - e87e0c7c-4f6f-45e9-90ca-cf34617da3f6 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: c30c7f11-7818-4592-97ca-9d5be46e2d8e - cb53ad06-65e8-474d-94c3-9acf044d5a09 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 998ac54a-0d91-431f-8929-fe62f5d7290a - d11aa0ee-d793-4830-9120-3b118ca44b6c status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: a1e69838-0bdf-42f3-95a4-56e4084510a9 
- f687c727-ec06-49f1-9762-b0195e0b549a status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: a29598fe-f94e-4215-8508-19ac24b082c8 - 29b9ff26-2386-4fb5-832e-b7129307ceb4 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED image: b151d4d7-d7fc-43ff-8bb2-75cf947ed626 - 34676d55-695a-4d2a-a7fa-546971067829 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE image: 352a3a9a-4e1a-41bf-af86-717e374a7562 - adcc7655-9586-48c1-90d2-1dc9a851bbe1 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED ########################################### # END COMMAND OUTPUT ######## ################################### ########################################### # BEGIN LOG OUTPUT FROM ENGINE ########################################### The below output is an excerpt from the engine.log while attemtping to start one of the afflicted VM's. Following the storage chain out i have discovered that "919d6991-43e4-4f26-868e-031a01011191" does not exist. This is likely due to a failure of the backup python script that the client is using. I can provide that script if you all would like. 2016-04-20 08:56:58,285 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-8) [] Correlation ID: abd1342, Job ID: 1ad2ee48-2c2c-437e-997b-469e09498e41, Call Stack: null, Custom Event ID: -1, Message: VM Bill-V was started by admin@internal (Host: KVM03). 2016-04-20 08:57:00,392 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM Bill-V is down with error. Exit message: Unable to get volume size for domain 045c7fda-ab98-4905-876c- 00b5413a619f volume 919d6991-43e4-4f26-868e-031a01011191. 2016-04-20 08:57:00,393 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '6ef30172-b010-46fa-9482-accd30682232(Bill-V) is running in db and not running in VDS 'KVM03' 2016-04-20 08:57:00,498 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-5) [] Correlation ID: abd1342, Job ID: 1ad2ee48-2c2c-437e-997b-469e09498e41, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM Bill-V on Host KVM03. ########################################### # END LOG OUTPUT FROM ENGINE ########################################### I have followed the storage chain out to where the UUID'ed snapshots live, and discovered that all of the "ILLEGAL" snapshots show to be broken symbolic links. Attached is a screenshot of the snapshots as they appear in the GUI. ALL of the UUID's illustrated show as broken symbolic links in the storage domains. On Tue, 2016-04-19 at 21:28 +0300, Nir Soffer wrote:
On Mon, Apr 18, 2016 at 3:16 PM, Clint Boggio <clint@theboggios.com> wrote:
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iscsi, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VM's with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
Can you share the vdsm log showing these errors?
Also it may helpful to get the output of this command:
vdsm-tool dump-volume-chains SDUUID
Nir
I fear that in this current condition, should any of the other machine reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
Thank you for your help.
Clint

On Wed, Apr 20, 2016 at 5:34 PM, Clint Boggio <clint@theboggios.com> wrote:
The "vdsm-tool dump-volume-chains" command on the iSCSI storage domain shows one disk in "ILLEGAL" state while the gui shows 8 disk images in the same state.
Interesting - it would be useful to find the missing volume IDs in the engine log and understand why they are marked as illegal.
########################################### # BEGIN COMMAND OUTPUT ###########################################
[root@KVM01 ~]# vdsm-tool dump-volume-chains 045c7fda-ab98-4905-876c-00b5413a619f
Images volume chains (base volume first)
image: 477e73af-e7db-4914-81ed-89b3fbc876f7
- c8320522-f839-472e-9707-a75f6fbe5cb6 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 882c73fc-a833-4e2e-8e6a-f714d80c0f0d
- 689220c0-70f8-475f-98b2-6059e735cd1f status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 0ca8c49f-452e-4f61-a3fc-c4bf2711e200
- dac06a5c-c5a8-4f82-aa8d-5c7a382da0b3 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED
image: 0ca0b8f8-8802-46ae-a9f8-45d5647feeb7
- 51a6de7b-b505-4c46-ae2a-25fb9faad810 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: ae6d2c62-cfbb-4765-930f-c0a0e3bc07d0
- b2d39c7d-5b9b-498d-a955-0e99c9bd5f3c status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE
- bf962809-3de7-4264-8c68-6ac12d65c151 status: ILLEGAL, voltype: LEAF, format: COW, legality: ILLEGAL, type: SPARSE
Let's check the vdsm and engine logs and find when and why this disk became illegal. If this was the result of a live merge that failed while finalizing the merge on the engine side, we can safely delete the illegal volume.
If this is the case, we should find a live merge for volume bf962809-3de7-4264-8c68-6ac12d65c151, and the live merge should be successful on the vdsm side. At that point, vdsm sets the old volume state to illegal. The engine should ask to delete this volume later in this flow.
Adding Ala and Adam to look at this case.
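One quick way to check whether that is the failure mode described above is to pull every line that mentions the illegal volume out of both logs and look at the merge job around the time the volume went ILLEGAL. A rough sketch, assuming the default 3.6 log locations (rotated, compressed logs would need zgrep instead of grep):

# Collect the history of the illegal volume from both sides.
VOL="bf962809-3de7-4264-8c68-6ac12d65c151"

# On the engine host: the live merge command, its job ID and how it ended.
grep "$VOL" /var/log/ovirt-engine/engine.log* > "engine-$VOL.log"

# On each hypervisor that ran the VM: vdsm's side of the merge.
grep "$VOL" /var/log/vdsm/vdsm.log* > "vdsm-$VOL.log"

# Then look at the merge-related lines around the time the volume went illegal.
grep -i merge "engine-$VOL.log" | tail -n 20

If the vdsm side shows the merge completing while the engine side never issued the follow-up delete, that would match the scenario described above.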
image: ff8c64c4-d52b-4812-b541-7f291f98d961
- 85f77cd5-2f86-49a9-a411-8539114d3035 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 70fc19a2-75da-41bd-a1f6-eb857ed2f18f
- a8f27397-395f-4b62-93c4-52699f59ea4b status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 2b315278-65f5-45e8-a51e-02b9bc84dcee
- a6e2150b-57fa-46eb-b205-017fe01b0e4b status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE
- 2d8e5c14-c923-49ac-8660-8e57b801e329 status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE
- 43100548-b849-4762-bfc5-18a0f281df2e status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: bf4594b0-242e-4823-abfd-9398ce5e31b7
- 4608ce2e-f288-40da-b4e5-2a5e7f3bf837 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 00efca9d-932a-45b3-92c3-80065c1a40ce
- a0bb00bc-cefa-4031-9b59-3cddc3a53a0a status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 5ce704eb-3508-4c36-b0ce-444ebdd27e66
- e41f2c2d-0a79-49f1-8911-1535a82bd735 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED
image: 11288fa5-0019-4ac0-8a7d-1d455e5e1549
- 5df31efc-14dd-427c-b575-c0d81f47c6d8 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: a091f7df-5c64-4b6b-a806-f4bf3aad53bc
- 38138111-2724-44a4-bde1-1fd9d60a1f63 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: c0b302c4-4b9d-4759-bb80-de1e865ecd58
- d4db9ba7-1b39-4b48-b319-013ebc1d71ce status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED
image: 21123edb-f74f-440b-9c42-4c16ba06a2b7
- f3cc17aa-4336-4542-9ab0-9df27032be0b status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: ad486d26-4594-4d16-a402-68b45d82078a
- e87e0c7c-4f6f-45e9-90ca-cf34617da3f6 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: c30c7f11-7818-4592-97ca-9d5be46e2d8e
- cb53ad06-65e8-474d-94c3-9acf044d5a09 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 998ac54a-0d91-431f-8929-fe62f5d7290a
- d11aa0ee-d793-4830-9120-3b118ca44b6c status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: a1e69838-0bdf-42f3-95a4-56e4084510a9
- f687c727-ec06-49f1-9762-b0195e0b549a status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: a29598fe-f94e-4215-8508-19ac24b082c8
- 29b9ff26-2386-4fb5-832e-b7129307ceb4 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED
image: b151d4d7-d7fc-43ff-8bb2-75cf947ed626
- 34676d55-695a-4d2a-a7fa-546971067829 status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE
image: 352a3a9a-4e1a-41bf-af86-717e374a7562
- adcc7655-9586-48c1-90d2-1dc9a851bbe1 status: OK, voltype: LEAF, format: RAW, legality: LEGAL, type: PREALLOCATED
########################################### # END COMMAND OUTPUT ###########################################
########################################### # BEGIN LOG OUTPUT FROM ENGINE ###########################################
The below output is an excerpt from the engine.log while attempting to start one of the afflicted VMs. Following the storage chain out I have discovered that "919d6991-43e4-4f26-868e-031a01011191" does not exist. This is likely due to a failure of the backup Python script that the client is using. I can provide that script if you all would like.
How is this related to the backup? Did you restore the files using your backup script, or maybe the backup script deleted files by mistake?
2016-04-20 08:56:58,285 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-8) [] Correlation ID: abd1342, Job ID: 1ad2ee48-2c2c-437e-997b-469e09498e41, Call Stack: null, Custom Event ID: -1, Message: VM Bill-V was started by admin@internal (Host: KVM03).
2016-04-20 08:57:00,392 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM Bill-V is down with error. Exit message: Unable to get volume size for domain 045c7fda-ab98-4905-876c-00b5413a619f volume 919d6991-43e4-4f26-868e-031a01011191.
2016-04-20 08:57:00,393 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '6ef30172-b010-46fa-9482-accd30682232(Bill-V) is running in db and not running in VDS 'KVM03'
2016-04-20 08:57:00,498 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-5) [] Correlation ID: abd1342, Job ID: 1ad2ee48-2c2c-437e-997b-469e09498e41, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM Bill-V on Host KVM03.
We need the entire engine log to investigate.
########################################### # END LOG OUTPUT FROM ENGINE ###########################################
I have followed the storage chain out to where the UUID'ed snapshots live, and discovered that all of the "ILLEGAL" snapshots show to be broken symbolic links.
Attached is a screenshot of the snapshots as they appear in the GUI. ALL of the UUID's illustrated show as broken symbolic links in the storage domains.
Unless you think that the issue is your backup script deleting files, I think the best way to proceed would be to file a bug:
https://bugzilla.redhat.com/enter_bug.cgi?product=ovirt-engine
Use:
oVirt Team: storage
Severity: high
Please include the information in this mail, and complete vdsm and engine logs.
Nir
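For attaching the logs, something along these lines is usually enough; the paths are the 3.6 defaults, and ovirt-log-collector (if installed on the engine) automates a broader version of the same collection:

# Bundle the logs for the bug report.  Run the first command on the engine
# host and the second on each hypervisor that hosted the affected VMs.
STAMP=$(date +%Y%m%d)

tar czf "engine-logs-$STAMP.tar.gz" /var/log/ovirt-engine/engine.log*

tar czf "vdsm-logs-$(hostname -s)-$STAMP.tar.gz" /var/log/vdsm/vdsm.log*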
On Tue, 2016-04-19 at 21:28 +0300, Nir Soffer wrote:
On Mon, Apr 18, 2016 at 3:16 PM, Clint Boggio <clint@theboggios.com> wrote:
OVirt 3.6, 4 node cluster with dedicated engine. Main storage domain is iscsi, ISO and Export domains are NFS.
Several of my VM snapshot disks show to be in an "illegal state". The system will not allow me to manipulate the snapshots in any way, nor clone the active system, or create a new snapshot.
In the logs I see that the system complains about not being able to "get volume size for xxx", and also that the system appears to believe that the image is "locked" and is currently in the snapshot process.
Of the VM's with this status, one rebooted and was lost due to "cannot get volume size for domain xxx".
Can you share the vdsm log showing these errors?
Also it may helpful to get the output of this command:
vdsm-tool dump-volume-chains SDUUID
Nir
I fear that in this current condition, should any of the other machine reboot, they too will be lost.
How can I troubleshoot this problem further, and hopefully alleviate the condition ?
Thank you for your help.
Clint
participants (6)
- Clint Boggio
- Colin Coe
- Markus Stockhausen
- Nir Soffer
- Ollie Armstrong
- Raz Tamir