Ovirt backups lead to unresponsive VM

Hi all,

I have a cluster with 3 nodes, using oVirt 4.1 in a self-hosted setup on top of GlusterFS. Guest agents are installed on the VMs. On some VMs (especially one Windows Server 2016 64-bit VM with a 500 GB disk), I almost always observe that the VM becomes unresponsive during its backup: the dashboard shows a question mark as the VM status, and the VM does not respond to ping or to anything else.

For scheduled backups I use:
https://github.com/wefixit-AT/oVirtBackup

The script does the following:
1. Snapshot the VM (this completes without any failure)
2. Clone the snapshot (this step renders the VM unresponsive)
3. Export the clone
4. Delete the clone
5. Delete the snapshot

Do you have any similar experience? Any suggestions to address this?
I have never seen this issue with hosted Linux VMs.
The cluster has enough storage to accommodate the clone.

Thanx,
Alex

Hi,

We have a cluster of 17 nodes, backed by GlusterFS storage, and using this same script for backup. We have no issues with it so far. Have you checked the engine log file?

--
Respectfully
Mahdi A. Mahdi

Hi,

I have observed this logged at the host when the issue occurs:

VDSM command GetStoragePoolInfoVDS failed: Connection reset by peer

or

VDSM host.domain command GetStatsVDS failed: Connection reset by peer

I have not been able to correlate these with anything in the engine logs.

Are you hosting Windows Server 2016 and Windows 10 VMs?
The weird thing is that I have the same setup on other clusters with no issues.

Thanx,
Alex

On Sun, Jan 28, 2018 at 9:21 PM, Mahdi Adnan <mahdi.adnan@outlook.com> wrote:
Hi,
We have a cluster of 17 nodes, backed by GlusterFS storage, and using this same script for backup. We have no issues with it so far. Have you checked the engine log file?
--
Respectfully *Mahdi A. Mahdi*
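
One way to correlate these host-side messages with backup runs is to count them per command. The following is a minimal sketch, assuming the log lines contain text shaped like the two messages quoted above; the function names and the regular expression are illustrative and not part of oVirt or VDSM.

```python
import re
from collections import Counter

# Matches the two message shapes quoted in this thread. The optional group
# captures a host name when one appears before the word "command".
PATTERN = re.compile(
    r"VDSM (?:(?P<host>\S+) )?command (?P<command>\w+) failed: (?P<reason>.+)$"
)


def parse_failure(line):
    """Return (host, command, reason) for a matching line, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("host"), match.group("command"), match.group("reason").strip()


def count_failures(lines):
    """Count failures per (command, reason) over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_failure(line)
        if parsed is not None:
            _host, command, reason = parsed
            counts[(command, reason)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "VDSM command GetStoragePoolInfoVDS failed: Connection reset by peer",
        "VDSM host.domain command GetStatsVDS failed: Connection reset by peer",
    ]
    for key, total in count_failures(sample).items():
        print(key, total)
```

Feeding it the lines logged around the time of the clone step would show whether the resets cluster around the backup window.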

I have Windows VMs, both client and server. If you provide the engine.log file, we might have a look at it.

--
Respectfully
Mahdi A. Mahdi

On Wed, Jan 24, 2018 at 3:19 PM Alex K <rightkicktech@gmail.com> wrote:
Hi all,
I have a cluster with 3 nodes, using ovirt 4.1 in a self hosted setup on top glusterfs. On some VMs (especially one Windows server 2016 64bit with 500 GB of disk). Guest agents are installed at VMs. i almost always observe that during the backup of the VM the VM is rendered unresponsive (dashboard shows a question mark at the VM status and VM does not respond to ping or to anything).
For scheduled backups I use:
https://github.com/wefixit-AT/oVirtBackup
The script does the following:
1. snapshot VM (this is done ok without any failure)
This is a very cheap operation
2. Clone snapshot (this steps renders the VM unresponsive)
This copies 500 GB of data. In the gluster case it copies 1500 GB, since in glusterfs the client does the replication. Maybe your network or gluster server is too slow? Can you describe the network topology? Please also attach the volume info for the gluster volume; maybe it is not configured in the best way.
3. Export Clone
This copies 500 GB to the export domain. If the export domain is on glusterfs as well, you now copy another 1500 GB of data.
4. Delete clone
5. Delete snapshot
It is not clear why you need to clone the VM before you export it; you can save half of the data copies.

If you use 4.2, you can back up the VM *while the VM is running* by:
- Take a snapshot
- Get the VM OVF from the engine API
- Download the VM disks using ovirt-imageio and store the snapshots in your backup storage
- Delete the snapshot

In this flow, you would copy 500 GB.

Daniel, please correct me if I'm wrong regarding doing this online.

Regardless, a VM should not become non-responsive while cloning. Please file a bug for this and attach engine, vdsm, and glusterfs logs.

Nir
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
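
The copy volumes mentioned in this reply can be checked with a few lines of arithmetic. This is only a sketch: the replica count of 3 is inferred from the 500 GB to 1500 GB figure above rather than stated, and the function and key names are made up for illustration.

```python
def copy_volumes_gb(disk_gb, gluster_replicas, export_on_gluster):
    """Estimate the data written by each backup flow, in GB.

    Assumes the gluster client writes one full copy per replica, as
    described in the reply above.
    """
    clone = disk_gb * gluster_replicas
    export = disk_gb * (gluster_replicas if export_on_gluster else 1)
    return {
        "clone": clone,
        "export": export,
        "clone_then_export": clone + export,
        # Snapshot-and-download flow: the disk is read and stored once.
        "snapshot_download": disk_gb,
    }


if __name__ == "__main__":
    # 500 GB disk, assumed replica 3, export domain not on gluster.
    print(copy_volumes_gb(500, 3, export_on_gluster=False))
    # Same disk with the export domain on gluster as well.
    print(copy_volumes_gb(500, 3, export_on_gluster=True))
```

With these assumptions the clone step alone accounts for 1500 GB of writes, which is why it is the most likely place for storage or network pressure to show up.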

Thank you, Nir, for the below. I have put some comments inline.

On Tue, Feb 13, 2018 at 7:33 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Jan 24, 2018 at 3:19 PM Alex K <rightkicktech@gmail.com> wrote:
Hi all,
I have a cluster with 3 nodes, using ovirt 4.1 in a self hosted setup on top glusterfs. On some VMs (especially one Windows server 2016 64bit with 500 GB of disk). Guest agents are installed at VMs. i almost always observe that during the backup of the VM the VM is rendered unresponsive (dashboard shows a question mark at the VM status and VM does not respond to ping or to anything).
For scheduled backups I use:
https://github.com/wefixit-AT/oVirtBackup
The script does the following:
1. snapshot VM (this is done ok without any failure)
This is a very cheap operation
2. Clone snapshot (this steps renders the VM unresponsive)
This copies 500 GB of data. In the gluster case it copies 1500 GB, since in glusterfs the client does the replication.
Maybe your network or gluster server is too slow? Can you describe the network topology?
Please attach also the volume info for the gluster volume, maybe it is not configured in the best way?
The network is 1 Gbit. The 3 hosts are decent, new hardware, each with 32 GB RAM, 16 CPU cores and 2 TB of storage in RAID 10. The 7 hosted VMs exhibit high performance. The VMs are Windows 2016 and Windows 10. The network topology: two networks are defined in oVirt. ovirtmgmt is the management and access network, and "storage" is a separate network where each server is connected with two network cables to a managed switch with mode 6 load balancing. This storage network is used for gluster traffic. I have attached the volume configuration.
3. Export Clone
This copies 500 GB to the export domain. If the export domain is on glusterfs as well, you now copy another 1500 GB of data.
The export domain is a Synology NAS with an NFS share. If the cloning succeeds, the export completes OK.
4. Delete clone
5. Delete snapshot
It is not clear why you need to clone the VM before you export it; you can save half of the data copies.
Because I cannot export the VM while it is running. oVirt does not provide such an option.
If you use 4.2, you can back up the VM *while the VM is running* by:
- Take a snapshot
- Get the VM OVF from the engine API
- Download the VM disks using ovirt-imageio and store the snapshots in your backup storage
- Delete the snapshot
In this flow, you would copy 500 GB.
I was not aware of this option. From a quick check of the site, it seems that it is still half implemented? Is there any script that I could use to test this? I am interested in having these backups scheduled.
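
To put the 1 Gbit storage network in perspective, here is a rough lower bound on how long the copies discussed in this thread would take. It is a sketch under loud assumptions: decimal units, the full line rate available to the copy, and no allowance for the bonded links, protocol overhead, disk speed, or competing VM traffic. The `efficiency` parameter is a made-up knob for trying less optimistic rates.

```python
def transfer_hours(data_gb, link_gbit_per_s, efficiency=1.0):
    """Hours needed to move data_gb over a link, at a given efficiency.

    Uses decimal units: 1 GB = 1e9 bytes, 1 Gbit/s = 1e9 bits per second.
    """
    total_bytes = data_gb * 1e9
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # Clone step as estimated in this thread: 1500 GB written through gluster.
    print(f"clone, 1500 GB: {transfer_hours(1500, 1):.2f} h")
    # Snapshot-and-download flow: 500 GB.
    print(f"download, 500 GB: {transfer_hours(500, 1):.2f} h")
```

Even in this ideal case the clone keeps the storage link saturated for hours, so real figures on a shared network are likely to be worse.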

Hi all,

Are there any examples of using ovirt-imageio to back up a VM, or somewhere I could find details of the REST API for this functionality? I might attempt to write a Python script for this purpose.

Thanx,
Alex

On Sun, Feb 18, 2018 at 8:04 PM Alex K <rightkicktech@gmail.com> wrote:
Are there any examples on using ovirt-imageio to backup a VM or where I could find details of RESTAPI for this functionality? I might attempt to write a python script for this purpose.
Here:
- https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/download_...
- https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/upload_di...

You probably need to add the VM configuration to complete the backup.
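
The ordering of the backup flow discussed in this thread can be sketched independently of the SDK. The four callables below are hypothetical placeholders, not SDK functions; in a real script they would be implemented with the SDK examples linked above. The point of the sketch is only that the snapshot gets deleted even when saving the configuration or downloading the disks fails.

```python
def run_backup(take_snapshot, save_vm_config, download_disks, delete_snapshot):
    """Run the snapshot-based backup steps in order.

    All four arguments are caller-supplied callables (hypothetical names).
    The snapshot is always deleted once it has been created, whether or
    not the later steps succeed.
    """
    snapshot = take_snapshot()
    try:
        save_vm_config(snapshot)
        download_disks(snapshot)
    finally:
        delete_snapshot(snapshot)
    return snapshot


if __name__ == "__main__":
    # Dry run with stand-in steps that only print what they would do.
    run_backup(
        take_snapshot=lambda: "snapshot-1",
        save_vm_config=lambda s: print("save VM configuration for", s),
        download_disks=lambda s: print("download disks of", s),
        delete_snapshot=lambda s: print("delete", s),
    )
```

Passing the steps in as callables keeps the sketch testable without an engine, and leaves the actual API calls to whatever the SDK examples show.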
participants (3)
- Alex K
- Mahdi Adnan
- Nir Soffer