Re: Fix corrupt self-hosted engine
by Alex K
For the record,
after fixing the major fs issues with guestfish, and since the DB was
not starting up, I removed everything from the DB data dir and recreated it as
follows:
rm -rf /var/opt/rh/rh-postgresql10/lib/pgsql/data/*
/opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
systemctl restart rh-postgresql10-postgresql.service
Then I proceeded with the restore, asking it to provision all
missing databases:
engine-backup --mode=restore --file=engine-backup.gz \
--provision-all-databases \
--log=restore.log --restore-permissions
Following this, I ran engine-setup, as instructed by the restore operation.
I regained access to the engine web UI and saw that the running VMs were
shown as up without issues.
The only problem I observed was one VM that could not start due to an
illegal volume, but that's another story.
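The recovery sequence described above (wipe and re-initialise the PostgreSQL data directory, restore with --provision-all-databases, then run engine-setup) could be wrapped in a small script like the one below. This is only a sketch: the paths and commands are the ones from the post, while the DRY_RUN switch, the run helper and the BACKUP_FILE variable are hypothetical additions so that the destructive steps are printed rather than executed by default.

```bash
#!/usr/bin/env bash
# Sketch of the DB re-initialisation and restore described above.
# Paths and commands are the ones from the post; DRY_RUN, run() and
# BACKUP_FILE are hypothetical additions. DRY_RUN=1 (default) only prints.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
PGDATA_DIR="/var/opt/rh/rh-postgresql10/lib/pgsql/data"
BACKUP_FILE="${BACKUP_FILE:-engine-backup.gz}"

run() {
    # Print each command; execute it only when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

restore_engine_db() {
    # 1. Throw away the corrupt cluster (irreversible!) and create a fresh one.
    #    The glob is expanded here, before run() sees it.
    run rm -rf "$PGDATA_DIR"/*
    run /opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
    run systemctl restart rh-postgresql10-postgresql.service
    # 2. Restore, letting engine-backup create the databases that are missing.
    run engine-backup --mode=restore --file="$BACKUP_FILE" \
        --provision-all-databases --log=restore.log --restore-permissions
    # 3. Finish with engine-setup, as the restore output instructs.
    run engine-setup
}

restore_engine_db
```

Run it once as-is to review the commands, then with DRY_RUN=0 on the engine VM if they look right. Wiping the data directory cannot be undone, so it is worth checking that the backup file is readable first.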
On Thu, Nov 19, 2020 at 9:42 PM Alex K <rightkicktech(a)gmail.com> wrote:
>
>
> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkicktech(a)gmail.com> wrote:
>
>> Hi Didi,
>>
>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <didi(a)redhat.com>
>> wrote:
>>
>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkicktech(a)gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a corrupt self-hosted engine (with several file system errors,
>>>> postgres not able to start) and thus it does not give access to the web UI.
>>>> This happened following an unlucky split-brain resolution (I am running 2
>>>> nodes). The two hosts are also running VMs, which I would like to keep
>>>> running as they are needed.
>>>>
>>>> When trying to boot into rescue mode (using the
>>>> systemd.unit=emergency.target boot parameter) I get a cursor and nothing
>>>> else.
>>>>
>>>
>>> This means that more than just the DB is corrupt...
>>>
>>>
>>>>
>>>> I have backups of engine files with scope all (using the engine-backup
>>>> tool).
>>>> What is the best approach to try to fix the engine, or should I redeploy?
>>>>
>>>
>>> If you are careful, and know what you are doing, you can try something
>>> like the following. I am not giving many details; hopefully you can find
>>> tutorials on the net about how to use the things I suggest:
>>>
>>> 1. Move to global maintenance
>>>
>>> 2. Stop the current dead vm (if needed)
>>>
>>> 3. Find current vm conf, edit it to boot from a rescue iso image of your
>>> preference or from net/PXE etc., and start the vm with '--vm-conf' pointing
>>> to your edited file.
>>>
>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or
>>> use '--add-console-password' and remote viewer, if needed)
>>>
>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>
>>> 6. Copy your backup into the vm and restore with engine-backup
>>>
>>> 7. Then cleanly stop the machine, exit global maint, and let HA start it
>>> (or start it yourself with --vm-start).
>>>
>>> At the time, we had a bug [1] to document this. The result is [2]. It
>>> does not detail how to boot/reinstall the OS etc., only how to restore
>>> (e.g. if the DB is dead but the fs is OK).
>>> For something somewhat similar to what you want, see also [3], which
>>> uses guestfish. Might be useful, depending on how badly your disk is
>>> corrupted.
>>>
>> I went with the guestfish approach. It fixed some fs issues, and now
>> yum etc. seem fine apart from postgres.
>> I had previously tried to uninstall/install packages, so I ended up
>> installing them again with yum install ovirt\*setup\*.
>> Now I think I have to run engine-setup, but I get the error:
>>
>> Failed to execute stage 'Environment setup': Cannot connect to Engine
>> database using existing credentials: engine@localhost:5432
>>
> It seems that I need to have PostgreSQL running to be able to run
> engine-backup --mode=restore. Are there any steps for manually preparing
> PostgreSQL for oVirt so that I can attempt the restore?
>
>>
>> So I guess I need to follow [2]. What do you think?
>>
>>
>>> How did you run into a split brain? There is a lock on the shared
>>> storage that should prevent this.
>>>
>>> Good luck and best regards,
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>> [2]
>>> https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_S...
>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>> --
>>> Didi
>>>
>>
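Didi's steps 1 to 7 in the quoted reply could be sketched on a hosted-engine host roughly as follows. This is a hedged sketch, not an official procedure: --vm-start, --vm-conf and --console come from the thread, --set-maintenance and --vm-poweroff are standard hosted-engine options not mentioned there, and the location of the live config (/var/run/ovirt-hosted-engine-ha/vm.conf), the copy's name (/root/he-rescue.conf) and the DRY_RUN switch are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper for Didi's steps; run on a hosted-engine host.
# Assumptions: the live HE VM config is at /var/run/ovirt-hosted-engine-ha/vm.conf
# and the edited copy goes to /root/he-rescue.conf. DRY_RUN=1 only prints.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
HE_CONF="${HE_CONF:-/var/run/ovirt-hosted-engine-ha/vm.conf}"
RESCUE_CONF="${RESCUE_CONF:-/root/he-rescue.conf}"

run() {
    # Print each command; execute it only when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rescue_prepare() {
    # Step 1: global maintenance, so the HA agents leave the engine VM alone.
    run hosted-engine --set-maintenance --mode=global
    # Step 2: stop the dead engine VM.
    run hosted-engine --vm-poweroff
    # Step 3 (first half): copy the current VM config for hand-editing
    # (add the rescue ISO / PXE boot device to the copy).
    run cp "$HE_CONF" "$RESCUE_CONF"
}

rescue_boot() {
    # Step 3 (second half): start the VM from the edited config.
    run hosted-engine --vm-start --vm-conf="$RESCUE_CONF"
    # Step 4: attach to the console (virsh console or remote-viewer also work).
    run hosted-engine --console
}

rescue_finish() {
    # Steps 5-6 (reinstall, engine-backup --mode=restore) happen inside the VM.
    # Step 7: once the VM has been shut down cleanly, leave global maintenance
    # so HA starts it again (or start it yourself with --vm-start).
    run hosted-engine --set-maintenance --mode=none
}

case "${1:-}" in
    prepare) rescue_prepare ;;
    boot) rescue_boot ;;
    finish) rescue_finish ;;
    *) echo "usage: $0 prepare|boot|finish" >&2 ;;
esac
```

The intended order would be: `prepare`, edit the copied config by hand, `boot`, do the reinstall and restore inside the VM, shut it down cleanly, then `finish`. The config edit is deliberately left manual because its format depends on the oVirt version.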
4 years, 3 months
Can't delete a disk after a failed upload
by lorenzobarbati02@gmail.com
Hi,
After trying to upload an ISO via the web interface, the upload remains in the "paused by system" state.
When I click "cancel" in the upload menu, the status gets stuck at "finalizing cleanup".
Is there an alternative way to delete this disk?
Found a host rebooting - ways to watch?
by Chris Adams
I just noticed that one of my oVirt physical hosts has been rebooting
due to an apparent hardware voltage fault. It's a Dell, and I've got
their tools installed and am monitoring status, but the issue clears
itself. It has apparently been doing this for a while now, and we didn't
catch it because (a) there weren't any VMs on it (there probably were the
first time, but they were restarted elsewhere fast enough that nobody
noticed) and (b) it reboots fast enough that it shows up in our monitoring
system for at most one pass and then clears, so our NOC either didn't
see it or assumed it was okay since it cleared.
oVirt has been logging alerts when it happens, but seeing that requires
someone to log in and check the logs (and we've got a bunch of different
systems to manage, including multiple oVirt clusters, so nobody is doing
that on a regular basis). We monitor most things with SNMP and/or CLI
checks (we have PRTG, Nagios, and LibreNMS for various different
things).
What are people doing to monitor the health of their oVirt systems? Is
it possible to get alerts emailed to admins? Is there any SNMP support
in oVirt to allow external systems to monitor its health? This setup is
on 4.3.10 if that matters.
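Since the problem is that the fault clears before anyone looks, one option for the CLI checks mentioned above is a check that stays in WARNING for a while after any reboot. The sketch below is a hypothetical Nagios-style plugin function, not something oVirt or Dell provide; it compares the host's uptime in seconds against a window of your choosing and uses the standard Nagios plugin exit codes.

```bash
#!/usr/bin/env bash
# Nagios-style check: WARNING while the host has been up for less than
# warn_secs seconds, so a quick reboot stays visible for a while.
# Exit codes follow the Nagios plugin convention: 0 OK, 1 WARNING, 3 UNKNOWN.
check_recent_reboot() {
    local uptime_secs="$1" warn_secs="$2"
    # /proc/uptime gives fractional seconds; callers pass the integer part.
    if ! [[ "$uptime_secs" =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN - could not read uptime"
        return 3
    fi
    if (( uptime_secs < warn_secs )); then
        echo "WARNING - host rebooted ${uptime_secs}s ago"
        return 1
    fi
    echo "OK - up for ${uptime_secs}s"
    return 0
}
```

In a plugin script, the last lines could be `check_recent_reboot "$(cut -d. -f1 /proc/uptime)" 21600` followed by `exit $?` (21600 seconds, i.e. 6 hours, is an arbitrary window). This only shows that a reboot happened; the oVirt alerts or the Dell logs are still needed to see why.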
--
Chris Adams <cma(a)cmadams.net>
Re: EPYC CPU not being detected correctly on cluster
by Lucia Jelinkova
Hi,
oVirt CPU detection depends on the CPU models known to libvirt (which in
turn depend on QEMU). Could you please run the following command to see what
libvirt reports?
virsh domcapabilities
That should give you the list of CPUs known to libvirt with a usability
flag for each CPU.
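To pick the usable models out of the long domcapabilities output, a filter like the one below may help. It is a text-based sketch rather than a real XML parser: it assumes each `<model>` element of the custom CPU mode is printed on its own line with a usable='yes' or usable='no' attribute, as in typical virsh output. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# List the CPU models that libvirt marks as usable, reading
# "virsh domcapabilities" XML on stdin (one <model> element per line assumed).
usable_cpu_models() {
    # Keep only <model usable='yes' ...>NAME</model> elements, then strip the tags.
    grep -oE "<model usable=['\"]yes['\"][^>]*>[^<]+</model>" |
        sed -E 's/<[^>]+>//g'
}
```

For example, `virsh domcapabilities | usable_cpu_models | grep -i epyc`. If nothing is printed, libvirt does not consider any EPYC model usable on that host, which would be consistent with the missing model_EPYC flag.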
If you find out that the CPU is not usable by libvirt, you might want to
dig deeper by querying QEMU directly.
Locate any VM running on the system with:
sudo virsh list --all
Use the name of a VM in the following command:
sudo virsh qemu-monitor-command [your-vm's-name] --pretty \
'{"execute":"query-cpu-definitions"}'
That gives you the list of all CPU models supported by QEMU, and for each
model it lists the CPU features that are not available on your system.
Regards,
Lucia
On Thu, Nov 19, 2020 at 9:38 PM Vinícius Ferrão via Users <users(a)ovirt.org>
wrote:
> Hi,
>
> I have a strange issue with two hosts (not using the hypervisor image) with
> EPYC CPUs. On the engine I got this message:
>
> The host CPU does not match the Cluster CPU Type and is running in a
> degraded mode. It is missing the following CPU flags: model_EPYC. Please
> update the host CPU microcode or change the Cluster CPU Type.
>
> But it is an EPYC CPU and the firmware is updated to the latest version,
> yet for some reason oVirt does not like it.
>
> Here is the relevant output from VDSM:
>
> "cpuCores": "128",
> "cpuFlags":
> "ibs,vme,abm,sep,ssse3,perfctr_core,sse4_2,skip-l1dfl-vmentry,cx16,pae,misalignsse,avx2,smap,movbe,vgif,rdctl-no,extapic,clflushopt,de,sse4_1,xsaveerptr,perfctr_llc,fma,mca,sse,rdtscp,monitor,umip,mwaitx,cr8_legacy,mtrr,stibp,bmi2,pclmulqdq,amd-ssbd,lbrv,pdpe1gb,constant_tsc,vmmcall,f16c,ibrs,fsgsbase,invtsc,nopl,lm,3dnowprefetch,smca,ht,tsc_adjust,popcnt,cpb,bmi1,mmx,arat,aperfmperf,bpext,cqm_occup_llc,virt-ssbd,tce,pse,xsave,xgetbv1,topoext,sha_ni,amd_ppin,rdrand,cpuid,tsc_scale,extd_apicid,cqm,rep_good,tsc,sse4a,flushbyasid,pschange-mc-no,mds-no,ibpb,smep,clflush,tsc-deadline,fxsr,pat,avx,pfthreshold,v_vmsave_vmload,osvw,xsavec,cdp_l3,clzero,svm_lock,nonstop_tsc,adx,hw_pstate,spec-ctrl,arch-capabilities,xsaveopt,skinit,rdt_a,svm,rdpid,lahf_lm,fpu,rdseed,fxsr_opt,sse2,nrip_save,vmcb_clean,sme,cat_l3,cqm_mbm_local,irperf,overflow_recov,avic,mce,mmxext,msr,cx8,hypervisor,wdt,mba,nx,decodeassists,cmp_legacy,x2apic,perfctr_nb,succor,pni,xsaves,clwb,cqm_llc,syscall,apic,pge,npt,pse36,cmov,ssbd,pausefilter,sev,aes,wbnoinvd,cqm_mbm_total,spec_ctrl,model_qemu32,model_Opteron_G3,model_Nehalem-IBRS,model_qemu64,model_Conroe,model_kvm64,model_Penryn,model_SandyBridge,model_pentium,model_pentium2,model_kvm32,model_Nehalem,model_Opteron_G2,model_pentium3,model_Opteron_G1,model_SandyBridge-IBRS,model_486,model_Westmere-IBRS,model_Westmere",
> "cpuModel": "AMD EPYC 7H12 64-Core Processor",
> "cpuSockets": "2",
> "cpuSpeed": "3293.405",
> "cpuThreads": "256",
>
> Any idea why this happens, or what to do to fix it?
>
> Thanks,
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/WP6XL6ODTLJ...
>
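The engine's complaint can be cross-checked against the cpuFlags string VDSM reported in the quoted message: the error names model_EPYC as the missing flag, and the model_* entries in that list are apparently what the compatibility check looks at. The helpers below are a small sketch with hypothetical names; paste the cpuFlags value as the first argument.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting the comma-separated "cpuFlags" value
# reported by VDSM (passed as the first argument).

# Print every model_* entry without its prefix, one per line.
vdsm_cpu_models() {
    tr ',' '\n' <<<"$1" | sed -n 's/^model_//p'
}

# Succeed only if model_<NAME> appears as a whole entry in the flags list.
has_cpu_model() {
    case ",$1," in
        *",model_$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Applied to the quoted cpuFlags, vdsm_cpu_models lists qemu32/qemu64, kvm32/kvm64, Opteron_G1 to Opteron_G3 and Intel-named models such as Conroe, Penryn, Nehalem, Westmere and SandyBridge (some with -IBRS variants), but no EPYC, so `has_cpu_model "$flags" EPYC` fails, matching the engine's message.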
Upgrade hyperconverged self-hosted ovirt from 4.3 to 4.4
by ralf@os-s.de
Hi,
has anyone attempted an upgrade from 4.3 to 4.4 in a hyperconverged self-hosted setup?
The posted guidelines seem a bit contradictory and incomplete.
Has anyone tried it who could share their experiences? I am currently having problems when deploying the hosted engine and restoring: the host becomes unresponsive and has hung tasks.
Kind regards,
Ralf
change domain name from export domain
by jb
Hello,
I needed to rename all the storage domains on my NFS storage. Renaming the
VM data and second data domains was no problem. But when I try to rename the
export domain, a red popup opens at the top with this message:
Uncaught exception occurred. Please try reloading the page. Details:
(TypeError) : ZOj(...) is null
Please have your administrator check the UI logs
I also noticed that when I rename the path and hit Enter, nothing is
written to engine.log.
Is this a known issue?
Best regards
Jonathan
oVirt 4: 2 NICs
by Facundo Badaracco
Hi everyone!
Hope someone can help me with this.
I have 3 servers with CentOS 8 and oVirt 4 installed. Each server has 2 NICs.
Server A = HE (HA)
NIC1 = 192.169.2.24, NIC2 = no IP
Server B = HE (HA)
NIC1 = 192.169.2.25, NIC2 = no IP
Server C = simple host
NIC1 = 192.169.2.26, NIC2 = no IP
How can I configure the second NIC in each server so that clients can use it
to connect to the VMs? I want one NIC for management and the other for
client connections.
Can't use ovirt web interface (500 error)
by lorenzobarbati02@gmail.com
Hi,
After updating to the latest version of oVirt (standalone, installed with engine-setup), I am no longer able to use the web UI. After I log in, I get a modal error titled "operation canceled" with the content "a request to the server failed, error 500".
Looking at the network requests, the request to "/ovirt-engine/webadmin/GenericApiGWTService" returns an error 500 with the reply "The call failed on the server; see server log for details".
These are the last lines of the engine.log: https://pastebin.com/uFgZASuW
Is anyone else experiencing the same problem, or does anyone know how to fix it?
How do you manage OVN?
by Alex McWhirter
I'm not sure if I'm missing something, but it seems there is no way built
into oVirt to manage OVN beyond network / subnet creation, in
particular routing both between networks and to external networks.
Of course you have the OVN utilities, but it seems that the provider API
is the preferred method of interaction?
As far as I can tell, the only utility that can use this API as intended
is ManageIQ, which is a bit of a behemoth if you only need the OVN portion
of things.
So is that it, then? Interface with the API directly or use ManageIQ?
Just curious what others are doing in regards to OVN.