Workflow after restoring engine from backup

Hi All,

I had an issue with the storage that hosted my engine VM. The disk got corrupted and I needed to restore the engine from a backup. That worked as expected; I just haven't started the engine yet. I know that after the backup was taken, some machines were migrated around before the engine disks failed. My question is what will happen once I start the engine service with the restored backup on it. Will it query the hosts for the running VMs, or will it assume that the VMs are still on the hosts where they resided at the time of the backup? Would I need to change the DB manually to let the engine know where the VMs are running at this point? What will happen to HA VMs? I feel that it might try to start them a second time. My biggest issue is that I can't get a service window to shut down all VMs and then let the engine restart them.

Is there a known workflow for that?

Thank you,
Sven

On Sun, Mar 18, 2018 at 11:45 PM, Sven Achtelik <Sven.Achtelik@eps.aero> wrote:
> Hi All,
>
> I had an issue with the storage that hosted my engine VM. The disk got corrupted and I needed to restore the engine from a backup.

How did you back up, and how did you restore? Which version was used for each?

> That worked as expected; I just haven't started the engine yet.

OK.

> I know that after the backup was taken, some machines were migrated around before the engine disks failed.

Are these machines HA?

> My question is what will happen once I start the engine service with the restored backup on it. Will it query the hosts for the running VMs

It will, but HA machines are handled differently. See also:

https://bugzilla.redhat.com/show_bug.cgi?id=1441322
https://bugzilla.redhat.com/show_bug.cgi?id=1446055

> or will it assume that the VMs are still on the hosts where they resided at the time of the backup?

It does, initially, but then it updates the status according to what it gets from the hosts.

But polling the hosts takes time, especially if you have many, and HA policy might require faster handling. So if it first polls a host that had a machine on it at backup time, sees that the machine is gone, and has not yet polled the new host, HA handling starts immediately, which might eventually lead to starting the VM on another host.

To prevent that, the fixes for the bugs above make the restore process mark HA VMs that do not have leases on the storage as "dead".

> Would I need to change the DB manually to let the engine know where the VMs are running at this point?

You might need to, if you have HA VMs and a too-old version of restore.

> What will happen to HA VMs? I feel that it might try to start them a second time. My biggest issue is that I can't get a service window to shut down all VMs and then let the engine restart them.
>
> Is there a known workflow for that?

I am not aware of a tested procedure for handling the above if you have a too-old version, but you can check the patches linked from the bugs above and manually run the SQL command(s) they include. They are essentially comment 4 of the first bug.

Good luck and best regards,
--
Didi
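
Editor's note: the polling race described in the reply above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not oVirt engine code: the function name `poll_hosts`, its parameters, and the host and VM names are all hypothetical, and the real engine logic is certainly more involved. It only shows why the order in which hosts report matters, and why marking HA VMs as down during restore would avoid the problem.

```python
# Toy model only: NOT oVirt engine code. All names here are hypothetical.

def poll_hosts(db_view, actual, poll_order, ha_vms, marked_down=frozenset()):
    """Return the HA VMs for which this toy model would trigger a restart.

    db_view:     VM -> host according to the restored (stale) database
    actual:      VM -> host where the VM really runs now
    poll_order:  order in which hosts report their running VMs
    ha_vms:      set of VMs flagged as highly available
    marked_down: HA VMs the restore step has already marked as down
    """
    restarts = []
    seen = {}  # VM -> host, filled in as hosts report
    for host in poll_order:
        running_here = {vm for vm, h in actual.items() if h == host}
        for vm in running_here:
            seen[vm] = host
        for vm, expected_host in db_view.items():
            if vm not in ha_vms or vm in marked_down:
                continue
            # The stale record says the VM is on this host, the host does not
            # report it, and no other host has reported it yet.
            if expected_host == host and vm not in running_here and vm not in seen:
                restarts.append(vm)
    return restarts


# The backup says vm1 is on hostA, but it was migrated to hostB afterwards.
stale = {"vm1": "hostA"}
real = {"vm1": "hostB"}

old_host_first = poll_hosts(stale, real, ["hostA", "hostB"], {"vm1"})
new_host_first = poll_hosts(stale, real, ["hostB", "hostA"], {"vm1"})
with_restore_fix = poll_hosts(stale, real, ["hostA", "hostB"], {"vm1"},
                              marked_down={"vm1"})
```

In this model a restart is triggered only when the old host happens to report first, which matches the timing dependence described in the reply. With the VM marked as down beforehand, no restart is triggered in either order.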

Hi Didi,

my backups were taken with the engine-backup utility. I have 3 data centers, two of them with just one host and the third one with 3 hosts running the engine. The backup is three days old and was taken on engine version 4.1 (4.1.7); the restored engine is running on 4.1.9. I have three HA VMs that would be affected. All others are just normal VMs. Sounds like it would be safest to shut down the HA VMs to make sure that nothing happens? Or can I disable the HA action in the DB for now?

Thank you,
Sven

On Mon, Mar 19, 2018 at 11:03 AM, Sven Achtelik <Sven.Achtelik@eps.aero> wrote:
> Hi Didi,
>
> my backups were taken with the engine-backup utility. I have 3 data centers, two of them with just one host and the third one with 3 hosts running the engine. The backup is three days old and was taken on engine version 4.1 (4.1.7); the restored engine is running on 4.1.9.

Since the bug I mentioned was fixed in 4.1.3, you should be covered.

> I have three HA VMs that would be affected. All others are just normal VMs. Sounds like it would be safest to shut down the HA VMs to make sure that nothing happens?

If you can have downtime, I agree it sounds safer to shut down the VMs.

> Or can I disable the HA action in the DB for now?

No need to. If you restored with the 4.1.9 engine-backup, it should have done this for you. If you still have the restore log, you can verify this by checking it. It should contain 'Resetting HA VM status', followed by the result of the SQL that it ran.

Best regards,
-- Didi
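
Editor's note: the log check suggested above could be scripted roughly as follows. This is a minimal sketch, assuming only that the restore log is a plain-text file containing the string 'Resetting HA VM status' quoted in the reply. The helper names and the `context` parameter are hypothetical, and the log location is deliberately left to the caller because the thread does not state it.

```python
from pathlib import Path

# String quoted in the reply above; everything else here is illustrative.
MARKER = "Resetting HA VM status"


def lines_after_marker(log_text, marker=MARKER, context=5):
    """Return the first line containing `marker` plus up to `context`
    following lines, or an empty list if the marker never appears."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            return lines[i:i + 1 + context]
    return []


def check_restore_log(path, context=5):
    """Read a restore log from `path` (supplied by the caller) and return
    the marker line together with the lines that follow it."""
    text = Path(path).read_text(errors="replace")
    return lines_after_marker(text, context=context)
```

An empty result would suggest that the restore did not run the HA reset step, in which case the manual SQL from the linked bugs may be worth a look before starting the engine.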

It looks like I won't get a chance to shut down the HA VMs. I checked the restore log and it did mention that it changed the HA VM entries. Just to make sure, I looked at the DB, and for the VMs in question it looks like this:

engine=# select vm_guid,status,vm_host,exit_status,exit_reason from vm_dynamic Where vm_guid IN (SELECT vm_guid FROM vm_static WHERE auto_startup='t' AND lease_sd_id is NULL);
                vm_guid                | status | vm_host  | exit_status | exit_reason
---------------------------------------+--------+----------+-------------+-------------
 8733d4a6-0844-xxxx-804f-6b919e93e076  |      0 | DXXXX    |           2 |          -1
 4eeaa622-17f9-xxxx-b99a-cddb3ad942de  |      0 | xxxxAPP  |           2 |          -1
 fbbdc0a0-23a4-4d32-xxxx-a35c59eb790d  |      0 | xxxxDB0  |           2 |          -1
 45a4e7ce-19a9-4db9-xxxxx-66bd1b9d83af |      0 | xxxxxWOR |           2 |          -1
(4 rows)

Should that be enough for a safe start of the engine without any HA action kicking in?

On Fri, Mar 23, 2018 at 10:35 AM, Sven Achtelik <Sven.Achtelik@eps.aero> wrote:
> It looks like I won't get a chance to shut down the HA VMs. I checked the restore log and it did mention that it changed the HA VM entries. Just to make sure, I looked at the DB, and for the VMs in question it looks like this:
>
> engine=# select vm_guid,status,vm_host,exit_status,exit_reason from vm_dynamic Where vm_guid IN (SELECT vm_guid FROM vm_static WHERE auto_startup='t' AND lease_sd_id is NULL);
>                 vm_guid                | status | vm_host  | exit_status | exit_reason
> ---------------------------------------+--------+----------+-------------+-------------
>  8733d4a6-0844-xxxx-804f-6b919e93e076  |      0 | DXXXX    |           2 |          -1
>  4eeaa622-17f9-xxxx-b99a-cddb3ad942de  |      0 | xxxxAPP  |           2 |          -1
>  fbbdc0a0-23a4-4d32-xxxx-a35c59eb790d  |      0 | xxxxDB0  |           2 |          -1
>  45a4e7ce-19a9-4db9-xxxxx-66bd1b9d83af |      0 | xxxxxWOR |           2 |          -1
> (4 rows)
>
> Should that be enough for a safe start of the engine without any HA action kicking in?

Looks OK, but also check run_on_vds and migrating_to_vds. See also bz 1446055.

Best regards,
-- Didi
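
Editor's note: the extra check on run_on_vds and migrating_to_vds could look roughly like the sketch below. It reuses the filter from the query earlier in the thread and adds the two columns. To keep the example self-contained it runs against an in-memory SQLite stand-in with invented rows, not against the real PostgreSQL engine database. Placing both columns in vm_dynamic is an assumption here, and the helper names and sample GUIDs are hypothetical.

```python
import sqlite3

# Same filter as the query in the thread, extended with the two columns
# mentioned in the reply. Assumption: both columns live in vm_dynamic.
CHECK_SQL = """
SELECT vm_guid, status, run_on_vds, migrating_to_vds
FROM vm_dynamic
WHERE vm_guid IN (
    SELECT vm_guid FROM vm_static
    WHERE auto_startup = 't' AND lease_sd_id IS NULL
)
"""


def stale_ha_rows(conn):
    """Return HA VMs without a storage lease that still reference a host."""
    rows = conn.execute(CHECK_SQL).fetchall()
    return [r for r in rows if r[2] is not None or r[3] is not None]


def demo_connection():
    """Build an in-memory stand-in with invented rows (not real engine data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vm_static (vm_guid TEXT, auto_startup TEXT, lease_sd_id TEXT);
        CREATE TABLE vm_dynamic (vm_guid TEXT, status INTEGER,
                                 run_on_vds TEXT, migrating_to_vds TEXT);
        INSERT INTO vm_static VALUES
            ('vm-a', 't', NULL), ('vm-b', 't', NULL), ('vm-c', 'f', NULL);
        INSERT INTO vm_dynamic VALUES
            ('vm-a', 0, NULL, NULL),
            ('vm-b', 0, 'host-1', NULL),
            ('vm-c', 1, 'host-2', NULL);
    """)
    return conn


leftover = stale_ha_rows(demo_connection())
```

An empty list would correspond to the situation reported in the next message, where neither field has an entry for the HA VMs. Any row that comes back still points at a host and would deserve a closer look before the engine is started.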

I did look at this. For the VMs in question there are no entries in the run_on_vds and migrating_to_vds fields. I'm thinking of giving this a try.
participants (2)
- Sven Achtelik
- Yedidyah Bar David