Hosts not coming back into oVirt
by Arif Ali
Hi all,
We recently deployed oVirt version 4.3.1 in a self-hosted engine environment.
We used the cockpit-based steps to install the engine, and were able to add
the rest of the oVirt nodes without any specific problems.
We tested the HA of the hosted engine without a problem, and then at one
point turned off the machine that was hosting the engine, to mimic a failure
and see how it copes; the engine VM was able to move over successfully,
but some of the oVirt hosts started to go into an Unassigned state. Out of a
total of 6 oVirt hosts, I have 4 of them in this state.
Clicking on a host, I see the following message in the events. I can reach
the hosts from the engine, and can ping the machines, so I'm not sure why
they are no longer working:
VDSM <snip> command Get Host Capabilities failed: Message timeout which
can be caused by communication issues
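In case it helps with debugging, VDSM can be queried directly on an affected
host to check whether it is up and answering at all (a diagnostic sketch only;
<host-fqdn> is a placeholder and 54321 is the default VDSM port):

# on the affected host
systemctl status vdsmd
vdsm-client Host getCapabilities   # ask VDSM locally, bypassing the engine
# from the engine VM, check that the VDSM port is reachable
nc -zv <host-fqdn> 54321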
Mind you, I have been trying to resolve this issue since Monday, and have
tried various things, like rebooting and re-installing the oVirt hosts,
without much luck.
So any assistance on this would be appreciated; maybe I've missed something
really simple and am overlooking it.
--
regards,
Arif Ali
Re: High iowait and low throughput on NFS, oflag=direct fixes it?
by Strahil
What are the mount options?
I think I read somewhere about poor NFS performance, and that the 'async'
mount option was a kind of solution.
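(To see which options are actually in effect on the oVirt node, something
like the following should do; the exact output will depend on your setup:)

# show the active NFS mounts and their options on the node
grep nfs /proc/mounts
# or, with defaults resolved per mount
nfsstat -m

If the defaults turn out to be the issue, oVirt also lets you set additional
NFS mount options, retransmissions and timeout when editing the storage
domain, if I remember the dialog correctly.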
Best Regards,
Strahil Nikolov
On Mar 25, 2019 15:45, Frank Wall <fw(a)moov.de> wrote:
>
> Hi,
>
> I've been using oVirt for years and have just discovered a rather strange
> issue that causes EXTREMELY high iowait when using NFSv3 storage.
>
> Here's a quick test on a CentOS 7.6 VM running on any oVirt 4.2.x node:
> (5 oVirt nodes, all showing the same results)
>
> # CentOS VM
> $ dd if=/dev/zero of=TEST02 bs=1M count=3000
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 141.649 s, 22.2 MB/s
>
> # iostat output
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> vdb 0.00 0.00 1.00 50.00 0.00 23.02 924.39 121.62 2243.47 2301.00 2242.32 19.61 100.00
>
>
> As you can see, iowait is beyond bad for both read and write requests.
> During this test the underlying NFS storage server was idle, disks barely
> doing anything. iowait on the NFS storage server was very low.
>
> However, when using oflag=direct the test shows a completely different result:
>
>
> # CentOS VM
> $ dd if=/dev/zero of=TEST02 bs=1M count=3000 oflag=direct
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 21.0724 s, 149 MB/s
>
> # iostat output
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> vdb 0.00 0.00 4.00 483.00 0.02 161.00 677.13 2.90 5.96 0.00 6.01 1.99 97.10
>
>
> This test shows the *expected* performance in this small oVirt setup.
> Notice how iowait remains healthy, although the throughput is 7x higher now.
>
> I think this 2nd test may prove multiple things: the NFS storage is fast
> enough and there's no networking/switch issue either.
>
> Still, under normal conditions WRITE/READ operations are really slow and
> iowait goes through the roof.
>
> Do these results make sense to anyone? Any hints how to find what's wrong here?
> Any tests I should run or sysctls/tunables that would make sense?
>
> FWIW, iperf result looks good between the oVirt Node and the NFS storage:
>
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 10.9 GBytes 9.36 Gbits/sec
>
>
> Regards
> - Frank
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GUDZWGRLIVC...
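(One more diagnostic angle, not raised in the thread itself: the first,
buffered dd run is shaped by the guest's dirty-page writeback settings, so it
can be worth comparing them between a well-behaving and a misbehaving VM;
purely a sketch:)

# inside the CentOS guest
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs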
share ISO storage domain between 4.2 and 4.3 ??
by Matthias Leopold
Hi,
My test and production oVirt environments share the ISO domain. When I
upgrade the test environment to 4.3 the ISO domain will be used by oVirt
4.2 and 4.3 at the same time. Is that a problem?
thx
Matthias
4.2.8 to 4.3.2 upgrade
by Leo David
Hi everyone,
I have seen a lot of threads here regarding the 4.3.x release and problems
at different layers, most of them related to the underlying Gluster storage.
I would still like to do the upgrade, to benefit from the newly added features.
My questions would be:
1. Did anyone successfully go through this process? Did any problems occur
during or after the upgrade?
2. Any sincere recommendation like "if it works, don't fix it", considering
the platform is running in production?
I would really appreciate your opinion.
Thank you very much !
Leo
Re: Can't connect to storage
by Strahil
You can check whether any host's network is out of sync.
Also try to ssh from the engine to each host.
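(A quick loop like the one below, run from the engine VM, checks both ssh
reachability and whether VDSM is alive; the host names are placeholders:)

for h in host1 host2 host3 host4; do
    ssh root@"$h" 'hostname; systemctl is-active vdsmd'
done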
Best Regards,
Strahil Nikolov
On Mar 24, 2019 16:05, Julio Cesar Bustamante <julio.cesar.bustamante(a)gmail.com> wrote:
>
> Hi everyone
>
> I have a problem with oVirt Manager 4.2: the nodes can't connect to the storage. They show these messages. What can I do to solve it?
Can't connect to storage
by Julio Cesar Bustamante
Hi everyone
I have a problem with oVirt Manager 4.2: the nodes can't connect to the
storage. They show these messages. What can I do to solve it?
Add Storage Domain to existing Datacenter - side effect
by jeanbaptiste.coupiac@nfrance.com
Hello Guys,
We are evaluating oVirt, and this morning I had to deal with a side effect
(small impact, because the oVirt data center is currently a small one).
1. I added a new storage domain from an iSCSI SAN
2. Not all hosts' HBAs were authorized to access this LUN (lack of
configuration on the SAN side)
3. One oVirt node out of the four in my data center couldn't access
the LUN => this node went into a "Non Responsive" state => all
VMs running on this host were migrated to another one
My question is regarding the "operational mode":
* If I add a storage domain which is not yet fully configured / is badly
configured (regarding the HBA access granted on the SAN side), will every
oVirt node in my data center that cannot log in to the new LUN move to a
"Non Operational" state after a few seconds? (And can this state change have
some "major effect", such as VM migration between hosts without any apparent
logic?)
Did I miss something?
Is there a guard? Can the cluster fencing policy option "Skip fencing on cluster
connectivity issues" with Threshold: xx% protect against this type of error?
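(For what it's worth, a pre-check along these lines on every host, before
attaching the domain, would catch a missing SAN-side authorization; both
commands only read state:)

# on each oVirt node: list iSCSI sessions and confirm the new LUN shows up
iscsiadm -m session -P 3
multipath -ll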
Regards,
Jean-Baptiste
OVirt Gluster Fail
by commramius@tiscali.it
I have an oVirt 4.1 installation on Gluster.
During maintenance on one machine, the Hosted Engine froze. From that point on it was no longer possible to manage anything.
The VMs went into a paused state and could no longer be managed.
I waited for the machine to come back up, but at that point none of the bricks were reachable any more.
I am now in a situation where the mount for the engine is no longer brought up.
Gluster sees the peers as connected and the brick services running, but it cannot heal; the messages I get for each machine are the following:
# gluster volume heal engine info
Brick 192.170.254.3:/bricks/engine/brick
<gfid:cf9eac3a-b532-4557-8d81-2fca07a0d3f5>
.
.
.
<gfid:cf5eaf66-b532-5444-8d81-2fca07a0d3f5>
Status: Connected
Number of entries: 190
Brick 192.170.254.4:/bricks/engine/brick
Status: Transport endpoint is not connected
Number of entries: -
Brick 192.170.254.6:/bricks/engine/brick
Status: Transport endpoint is not connected
Number of entries: -
This is the case for all the bricks (some have no heal pending because the VMs inside them had been shut down).
In practice, every brick sees only localhost as connected.
What can I do to restore the VMs?
Is there a way to read the data from the physical machine and export it so that it can be reused?
Unfortunately we need to access that data.
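(For reference, a couple of checks that might help; only a sketch based on
the output above, with the node address and volume name taken from the heal
output:)

# on each node, confirm peer and brick status
gluster peer status
gluster volume status engine
# if enough bricks come back to restore quorum, the volume can be mounted
# read-only from any node so the data can be copied off
mkdir -p /mnt/engine-recovery
mount -t glusterfs -o ro 192.170.254.3:/engine /mnt/engine-recovery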
Can anyone help me?
Thanks
Andrea