Thanks for the reply. I looked at the VDSM logs and found these entries:
2020-01-07 08:10:43,107-0600 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:709)
2020-01-07 08:10:48,112-0600 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=680125a0-7986-438b-a42a-2612bde04006 (api:48)
2020-01-07 08:10:48,112-0600 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=680125a0-7986-438b-a42a-2612bde04006 (api:54)
2020-01-07 08:10:48,113-0600 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:709)
Looks like something is wrong with a storage pool maybe?
I am really in the dark here. Not sure what this means.
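For anyone who wants to scan a longer VDSM log for the same pattern, here is a rough Python sketch. It assumes each log entry sits on a single line in the layout shown in the excerpt above (timestamp, level, thread, logger, message, source location). The function names and the SAMPLE constant are made up for illustration and are not part of VDSM. It only counts how often the vmrecovery thread reports that it is waiting for the storage pool, and how often getConnectedStoragePoolsList came back with an empty 'poollist'.

```python
import re

# The four entries quoted above, one entry per line.
SAMPLE = """\
2020-01-07 08:10:43,107-0600 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:709)
2020-01-07 08:10:48,112-0600 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=680125a0-7986-438b-a42a-2612bde04006 (api:48)
2020-01-07 08:10:48,112-0600 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=680125a0-7986-438b-a42a-2612bde04006 (api:54)
2020-01-07 08:10:48,113-0600 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:709)
"""

# Assumed layout, taken from the excerpt:
# <timestamp> <LEVEL> (<thread>) [<logger>] <message> (<module>:<line>)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<message>.*?)\s+"
    r"\((?P<source>[\w.]+:\d+)\)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(text):
    """Count 'waiting for storage pool' messages and empty pool-list replies."""
    waiting = 0
    empty_pool_replies = 0
    for line in text.splitlines():
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that do not follow the assumed layout
        message = entry["message"]
        if "waiting for storage pool" in message:
            waiting += 1
        if (message.startswith("FINISH getConnectedStoragePoolsList")
                and "'poollist': []" in message):
            empty_pool_replies += 1
    return {"waiting": waiting, "empty_pool_replies": empty_pool_replies}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against a real log, read the file contents and pass them to summarize() instead of SAMPLE. The counts only tell you how often the pattern repeats; they do not say why the host sees no connected storage pool.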
-----Original Message-----
From: Yedidyah Bar David (didi(a)redhat.com) <didi(a)redhat.com>
Sent: Tuesday, January 7, 2020 12:06 AM
To: Bob Franzke <Bob.Franzke(a)mdaemon.com>
Cc: users <users(a)ovirt.org>
Subject: [ovirt-users] Re: OVirt Engine Server Died - Steps for Rebuilding the Ovirt
Engine System
On Mon, Jan 6, 2020 at 6:01 PM Bob Franzke <Bob.Franzke(a)mdaemon.com> wrote:
> I just had some VMs go offline over the weekend. I really cannot figure out how to tell
> why without the engine working.
If you suspect that they failed, as opposed to being shut down from inside them (by an
admin or whatever), then you can check vdsm logs on the host they ran on, and might find a
clue.
> I don’t really need to 'control' the VMs, but it seems that without
> the engine it's not just the control aspect. It’s the visibility it gives you into the
> state of your environment. We also used oVirt as a lab setup for users to access and build
> VMs as needed. This is completely offline now without a working engine. It seems like having
> an engine available all the time would be pretty important generally.
>
> I have never understood the idea of having the machine that controls VMs be in the
> same infrastructure it's controlling. It seems like a very 'chicken or the egg' sort of thing
> to me. If the engine decides to move itself from one host to another, and it fails for
> some reason because the process of moving itself caused a problem (stopping services,
> etc.), then I am not sure what you would end up with there. It seems very iffy to me, but maybe I am
> reading too much into it. Again, I admittedly don’t know enough about oVirt to know if this
> thinking is off base or not. In my own experience with networking systems, you would
> never set things up like this. Each system is autonomous and can take over for the other
> if one part fails. But then again, if the oVirt engine had been set up this way, maybe I
> wouldn't be in the position I am in now, with no working engine. Lots to sort out. Thanks
> for the help.
Each host participating in the hosted-engine cluster has two small daemons, called agent
and broker, in the package ovirt-hosted-engine-ha, that should take care of the engine
VM.
You are right that this is a chicken-and-egg problem, and this is the solution that oVirt
includes.
-----Original Message-----
From: Yedidyah Bar David (didi(a)redhat.com) <didi(a)redhat.com>
Sent: Monday, January 6, 2020 8:26 AM
To: Bob Franzke <Bob.Franzke(a)mdaemon.com>
Cc: users <users(a)ovirt.org>
Subject: [ovirt-users] Re: OVirt Engine Server Died - Steps for
Rebuilding the Ovirt Engine System
On Mon, Jan 6, 2020 at 4:19 PM Bob Franzke <Bob.Franzke(a)mdaemon.com> wrote:
>
> So I am getting the impression that without a working ovirt engine, you are sort of
> cooked from being able to control VMs such that your whole organization can potentially
> come down to the availability of a single machine? Is this really correct?
Correct.
This does not mean that the engine itself is necessarily critical - if it's down,
your VMs should still be ok. If _controlling_ VMs is considered critical for you, then yes
- you do need to make sure your engine is alive and well.
> Are there HA options available for the engine server itself?
The standard option is using hosted-engine with several hosts - you get HA
out-of-the-box.
I also heard about people using standalone active/standby clustering/HA solutions for the
engine.
>
> -----Original Message-----
> From: Yedidyah Bar David (didi(a)redhat.com) <didi(a)redhat.com>
> Sent: Monday, January 6, 2020 12:57 AM
> To: Bob Franzke <Bob.Franzke(a)mdaemon.com>
> Cc: users <users(a)ovirt.org>
> Subject: [ovirt-users] Re: OVirt Engine Server Died - Steps for
> Rebuilding the Ovirt Engine System
>
> On Mon, Jan 6, 2020 at 12:00 AM Bob Franzke <Bob.Franzke(a)mdaemon.com> wrote:
> >
> > Thanks for the reply here. Still waiting on a server to rebuild this with.
> > It should be here tomorrow. The engine was running on a bare-metal server, and was not a VM.
> >
> > In the meantime we had a few of the VMs go dark for some reason. I discovered
> > the vdsm-client commands and tried figuring out what happened. Is there any way I can
> > start a VM via the command line on one of the VM hosts? Is the vdsm-client command the way to
> > do this without a working engine?
>
> It is, in principle, but that's not supported and is risky - because the engine
> will not know what you do.
>
> See also e.g.:
>
>
> https://www.ovirt.org/develop/release-management/features/integration/cockpit.html
>
> >
> > -----Original Message-----
> > From: Yedidyah Bar David (didi(a)redhat.com) <didi(a)redhat.com>
> > Sent: Tuesday, December 24, 2019 1:50 AM
> > To: Bob Franzke <Bob.Franzke(a)mdaemon.com>
> > Cc: users <users(a)ovirt.org>
> > Subject: [ovirt-users] Re: OVirt Engine Server Died - Steps for
> > Rebuilding the Ovirt Engine System
> >
> > On Mon, Dec 23, 2019 at 7:08 PM Bob Franzke <Bob.Franzke(a)mdaemon.com> wrote:
> > >
> > > > Which nightly backups? Do they run engine-backup?
> > >
> > > Yes sorry. The backups are the backups created when running the
> > > engine-backup script. So I have the files and the DB backed up and off onto different
> > > storage. I just grabbed a copy of the entire /etc directory as well just in case there was
> > > something needed in there that is not included in the engine-backup solution.
> > >
> > > > In either case, assuming this is a production env, I suggest first
> > > > testing on a separate env to see how it all looks.
> > >
> > > This is a production environment. My plan is to get a new server ordered
> > > and built, removing the old server from the equation (old server is old and needs to be
> > > replaced anyway). Then rebuild the oVirt bits and restore the data from my backups.
> >
> > I assume, from your first post, that you refer to the host running the engine,
> > and that this is a standalone engine, not hosted-engine.
> > Right? Meaning, it's running on bare-metal, not inside a VM managed by
> > itself.
> >
> > For testing you can try stuff on an isolated VM somewhere, no need to wait for
> > your new server to arrive.
> >
> > >
> > > I mostly just needed a quick set of setup steps to take here. From what I gather I
> > > need to basically:
> > >
> > > 1. Reinstall CentOS
> > > 2. Reconfigure storage (this server has several iSCSI LUNs it's attached to
> > > currently. I don’t know if they are required for this or what).
> >
> > I obviously have no idea what your storage design and requirements are, but this
> > is largely a local matter, unrelated to the hosts that run VMs. The engine machine's
> > storage is (normally) not used for that, only for the engine itself (and its db, etc.).
> >
> > > 3. Install PostgreSQL (maybe? Or does the ovirt engine script do
> > > this for you?)
> > > 4. Install oVirt/run ovirt-engine script maybe?
> >
> > Add the relevant repo, by installing the relevant ovirt-release* package (see the web
> > site), and then 'yum install ovirt-engine' - this should pull in postgresql
> > etc. for you.
> >
> > > 5. Restore DB and data
> >
> > Yes. Basically run 'engine-backup --mode=restore' and then
> > 'engine-setup'. Please check the backup/restore documentation on the web site.
> > If your current engine used only defaults (meaning, engine+dwh+their DBs all on
> > the engine machine, provisioned by engine-setup), then the restore command should be
> > something like:
> >
> > engine-backup --mode=restore --file=your-backup-file --provision-all-databases
> >
> > Again, please test on a test VM somewhere, and make sure it's isolated
> > - that it can't reach your hosts and start to manage them (unless
> > that's what you want, of course).
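The restore sequence described above (engine-backup in restore mode, then engine-setup) can be written down as a small dry-run script, which may help when rehearsing on an isolated test VM. This is only a sketch: the helper names are hypothetical, "your-backup-file" is the placeholder from the message above, and only the options quoted in that message are used. Nothing is executed; the script just prints the commands in order so they can be reviewed against the backup/restore documentation first.

```python
import shlex


def build_restore_command(backup_file):
    """Return the engine-backup restore command as an argument list.

    Only the options quoted in the thread are used; other options that a
    given engine-backup version may need are deliberately left out.
    """
    return [
        "engine-backup",
        "--mode=restore",
        f"--file={backup_file}",
        "--provision-all-databases",
    ]


def restore_plan(backup_file):
    """Return the ordered steps: restore from the backup, then run engine-setup."""
    return [build_restore_command(backup_file), ["engine-setup"]]


if __name__ == "__main__":
    # Dry run: print each step as a shell-quoted command line, run nothing.
    for step in restore_plan("your-backup-file"):
        print(shlex.join(step))
```

Keeping the steps as argument lists makes it easy to hand them to subprocess.run later, once the plan has been checked on a machine that cannot reach the production hosts.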
> >
> > >
> > > I am not sure of the details of the list outlined above (what to run where,
> > > etc.). I am looking for consultants to help me out here as it's clear I am a bit behind the
> > > curve on this one. So far not much has worked out on that front. Does the above list seem
> > > reasonable in terms of needed steps to get this going again?
> >
> > See above.
> >
> > For consultants, you might want to check:
> >
> >
> > https://www.ovirt.org/community/user-stories/users-and-providers.html
> >
> > And/or post again to the list with a subject line that's more likely to
> > attract them ("Looking for an oVirt consultant...").
> >
> > Good luck and best regards,
> >
> > >
> > >
> > > -----Original Message-----
> > > From: Yedidyah Bar David (didi(a)redhat.com) <didi(a)redhat.com>
> > > Sent: Sunday, December 22, 2019 1:58 AM
> > > To: bob.franzke(a)mdaemon.com
> > > Cc: users <users(a)ovirt.org>
> > > Subject: [ovirt-users] Re: OVirt Engine Server Died - Steps for
> > > Rebuilding the Ovirt Engine System
> > >
> > > On Fri, Dec 20, 2019 at 8:55 PM <bob.franzke(a)mdaemon.com> wrote:
> > > >
> > > > Full disclosure here... I am not an oVirt expert. I am a network
> > > > engineer who has been forced to take over sysadmin duties for a departed co-worker. I
> > > > have little experience with oVirt, so apologies up front for anything I say that comes
> > > > across as stupid or "RTM" questions. Normally I would do just that, but I am in a
> > > > bind and am trying to figure this out quickly. We have an oVirt installation that
> > > > consists of 4 nodes and a server that hosts the ovirt-engine, all running CentOS 7. The
> > > > server that hosts the engine has a pair of failing hard drives and I need to replace the
> > > > hardware ASAP. I need to outline the steps needed to build a new server to replace
> > > > the ovirt engine server. I have backed up the entire /etc directory and the
> > > > backups being done nightly by the engine itself.
> > >
> > > Which nightly backups? Do they run engine-backup?
> > >
> > > > I also backed up the iSCSI info and took a printout of the whole disk
> > > > arrangement. The disk has gotten so bad at this point that the DB won't back up any
> > > > longer. I get a fatal:backup failed error when
> > > > trying to run the ovirt backup tool. Also the oVirt management site
> > > > is not rendering and I am not sure why.
> > > >
> > > > Is there anything else I need to make sure I back up in order to
> > > > migrate the engine from one server to another?
> > >
> > > Generally speaking, if you used engine-backup for backups, it should be
> > > enough - it backs up all it needs from /etc.
> > >
> > > If you didn't use that, /etc won't be enough. You also need a
> > > database backup.
> > >
> > > If you do not have a backup of the database, you'll need to create a
> > > new engine from scratch. You can then import the existing storage domains and add the
> > > hosts. This will require downtime, and you'll lose some stuff, so if you do have an
> > > engine-backup backup, better use that.
> > >
> > > In either case, assuming this is a production env, I suggest first testing
> > > on a separate env to see how it all looks.
> > >
> > > > Also, until I can get the engine running again, is there any tool
> > > > available to manage the VMs on the hosts themselves? The VMs on the hosts are running but
> > > > I need a way to manage them in case something happens while the engine is being
> > > > repaired.
> > >
> > > Some management is possible via cockpit. It's much less than what the
> > > engine allows.
> > >
> > > If you search the list archives, you can find suggestions by people to
> > > directly use libvirt/virsh after poking a bit inside your storage domain. I'd not
> > > recommend doing that, unless you know very well what you are doing and have no other
> > > solution (e.g. if storage is corrupted enough so that import to a new engine fails).
> > >
> > > > Any info on this as well as what to back up and the steps to move the
> > > > engine from one server to another would be much, much appreciated.
> > >
> > > You can search the site for backup, restore, and import storage domain,
> > > and should find the relevant pages. Please note that the pages under /develop are written
> > > during development and are usually not updated after a feature is complete. The official
> > > documentation is under /documentation. That, in turn, is often outdated as well :-(.
> > > You can use RHV docs in addition. These are more up-to-date and should be
> > > 99% applicable to oVirt.
> > >
> > > > Sorry, I know this is a real RTM type post but I am in a bind and need a
> > > > solution rather quickly. Thanks in advance.
> > >
> > > Good luck!
> > > --
> > > Didi
> > > _______________________________________________
> > > Users mailing list -- users(a)ovirt.org
> > > To unsubscribe send an email to users-leave(a)ovirt.org
> > > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> > > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> > > List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FU4AIR7SCTQOQRWLPLPUH5XHDXYI4DD7/
> > >
> >
> >
> > --
> > Didi
>
>
> --
> Didi
--
Didi