On Fri, Aug 25, 2017 at 10:47 PM, Charles Gruener <cjg9411@rit.edu> wrote:
> Thank you so much for the detailed response!
Glad it helped :-)
> > You can try looking at the dwh_history_timekeeping table in the engine
> > (not dwh) database:
> >
> > su - postgres -c 'psql engine -c "select * from dwh_history_timekeeping;"'
> >
> > Most likely you'll find there more than one line with var_name 'lastSync'.
> And that I most certainly did. There were two rows for each var_name. I simply
> deleted the one of each pair that made the most sense to delete, reran
> engine-setup, and all appears to be working now!
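Good. For the record, in case someone else hits the same thing: a sketch of how
one might inspect the duplicates and then remove just one of the two rows, using
PostgreSQL's ctid system column to pick a specific row. The '(0,2)' value below is
only a placeholder - take the real one from the output of the select, and back up
the DB first:

su - postgres -c 'psql engine -c "select ctid, * from dwh_history_timekeeping where var_name = '\''lastSync'\'';"'
su - postgres -c 'psql engine -c "delete from dwh_history_timekeeping where ctid = '\''(0,2)'\'';"'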
> > How this happened is quite interesting/useful to know, because it
> > should not normally happen, and is most likely a bug. If you can
> > reproduce this, please file a bug with relevant details. Thanks!
> I’m pretty sure this was a self-inflicted issue. A while back, when things broke,
> we actually had two oVirt heads running, but we didn’t catch it for a while.
> Basically, we had migrated from a VM running the head (on a separate VM solution)
> to a hardware solution.
How? Using engine-backup? Some kind of duplication/imaging?
> Someone ended up turning the VM back on and it started wreaking
> havoc on our installation.
Ouch.
This is one of my bad dreams re backup/restore/migration.
We try to emphasize in various guides that you must stop and disable
the engine service on the old machine. If you can think of anything
that could have further helped in your own situation/flow/case, do not
hesitate to ping us! Saying "This was obviously our own fault, the
software was just fine" is helpful only to some extent. That said, I
have not so far heard of exactly the same case, although this does not
mean there aren't any.
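For reference, the "stop and disable" part on the old machine usually means
something like the following (service names as on a standard engine-setup
install; ovirt-engine-dwhd only if dwh runs on the same machine):

systemctl stop ovirt-engine ovirt-engine-dwhd
systemctl disable ovirt-engine ovirt-engine-dwhd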
> This was likely a leftover from that condition.
Can you think how, exactly?
Can't tell exactly from your emails, but it seems you had engine+dwh
on the same machine.
Did you have DBs on a separate machine (which is not the default)? If
so, it makes sense.
The two machines' processes both updated the same DBs.
But if you did use local DBs, and accessed them with host 'localhost'
(which is the default), the above should not have happened. Each
machine would then write to its own DBs.
This is still quite bad - because you then have two engines talking
to the same hosts - but in a different way.
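If you want to check which case you were in, the setup-generated config should
tell you (path as on a standard install - I'm going from memory, so adjust as
needed):

grep -i host /etc/ovirt-engine/engine.conf.d/10-setup-database.conf

ENGINE_DB_HOST=localhost means each machine wrote to its own local DB; anything
else means both machines wrote to the same remote one.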
Also: if I were you, I'd not keep trusting this system. If it works,
fine. It might break in the future - the above is definitely not part
of the design, not tested, not supported, etc. If at all possible,
perhaps consider reinstalling from scratch (not engine-backup
backup/restore). You can import the existing storage domains, if they
are not damaged as well. Can't even tell you how to test this. If the
individual VMs' disks seem ok, you might backup/restore these.
> If it happens to return, I’ll be sure to file a bug.
Very well.
> One last question: Data for the Storage section of the Global Utilization part of the
> dashboard is empty. We are using Ceph via Cinder for our storage. Is that the issue?
I really have no idea, but it sounds reasonable. If you do not find an
existing open bug/RFE, please open one. Or start a new thread on this
list with a suitable subject header.
> Side note: we are now being bitten by this bug -
> https://bugzilla.redhat.com/show_bug.cgi?id=1465825
> Thanks again for the assistance.
> Charles
Best,
--
Didi