Re: Change IP Node and Manager
by Strahil
I think that you need to:
1. Put in maintenance
2. Unregister
3. Remove host
4. Change IP
5. Add host to oVirt again
Please, could someone more experienced share their thoughts as well?
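For illustration, a rough sketch of how steps 1 and 3 could be driven through the engine REST API (the engine URL, credentials and host UUID below are placeholders; the same actions are available from the Admin Portal):

ENGINE="https://engine.example.com/ovirt-engine/api"    # placeholder URL
AUTH="admin@internal:password"                           # placeholder credentials
HOST="00000000-0000-0000-0000-000000000000"              # host UUID from GET $ENGINE/hosts

# 1. Put the host into maintenance
curl -sk -u "$AUTH" -H 'Content-Type: application/xml' \
     -X POST -d '<action/>' "$ENGINE/hosts/$HOST/deactivate"

# 3. Remove the host from the cluster
curl -sk -u "$AUTH" -X DELETE "$ENGINE/hosts/$HOST"

# 4./5. Change the IP on the node itself (nmcli/ifcfg, /etc/hosts, DNS),
#       then re-add it from Compute > Hosts > New in the Admin Portal.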
Best Regards,
Strahil Nikolov
On Apr 15, 2019 22:49, "Sebastian Antunez N." <antunez.sebastian(a)gmail.com> wrote:
>
> Hello Guys
>
> I have 8 hosts with oVirt 4.1 and need to change the IP on all nodes.
>
> Is there a procedure to make the IP change? I have searched for information but cannot find a process to follow.
>
> Could someone help me understand how to change the IP on the nodes? I know that I must put the nodes in maintenance, but I do not know if I should change the manager first, add an additional IP and then re-add the nodes, etc.
>
> Thanks for the help
>
> Sebastian
Re: hosted engine does not start
by Strahil
Try with the VNC console: 'hosted-engine --add-console-password'
Then connect to the IP:port that the command reports and check what is going on.
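For illustration, a minimal sketch of that connection (the host name and port are placeholders; use the IP:port that the command prints):

hosted-engine --add-console-password                  # sets a temporary console password
remote-viewer vnc://kvm320.durchhalten.intern:5900    # 5900 is only an example port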
Maybe you will need a rescue DVD to mount all filesystems and then unmount them cleanly.
After that, just power it off and power it on normally.
If you can't use the custom engine config, use the XML definition from the VDSM log.
You will also need this alias, so you can use virsh freely (define/start/destroy):
alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
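A minimal sketch of what that looks like, assuming the HostedEngine XML from vdsm.log has been saved to a file (the path is a placeholder):

alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
virsh define /root/hosted-engine.xml    # register the domain from the saved XML
virsh start HostedEngine                # boot the engine VM
virsh destroy HostedEngine              # hard power-off if it hangs again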
Best Regards,
Strahil Nikolov
On Apr 15, 2019 22:35, Stefan Wolf <shb256(a)gmail.com> wrote:
>
> Hello all,
>
> After a power loss the hosted engine won't start up anymore.
> I have the current oVirt installed.
> Storage is glusterfs and it is up and running.
>
> It is trying to start up the hosted engine but it does not work, and I can't see where the problem is.
>
> [root@kvm320 ~]# hosted-engine --vm-status
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : kvm380.durchhalten.intern
> Host ID : 1
> Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
> Score : 1800
> stopped : False
> Local maintenance : False
> crc32 : 3ad6d0bd
> local_conf_timestamp : 14594
> Host timestamp : 14594
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=14594 (Mon Apr 15 21:25:12 2019)
>     host-id=1
>     score=1800
>     vm_conf_refresh_time=14594 (Mon Apr 15 21:25:12 2019)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=GlobalMaintenance
>     stopped=False
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : kvm320.durchhalten.intern
> Host ID : 2
> Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
> Score : 0
> stopped : False
> Local maintenance : False
> crc32 : e7d4840d
> local_conf_timestamp : 21500
> Host timestamp : 21500
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=21500 (Mon Apr 15 21:25:22 2019)
>     host-id=2
>     score=0
>     vm_conf_refresh_time=21500 (Mon Apr 15 21:25:22 2019)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=ReinitializeFSM
>     stopped=False
>
> --== Host 3 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : kvm360.durchhalten.intern
> Host ID : 3
> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 1800
> stopped : False
> Local maintenance : False
> crc32 : cf9221cb
> local_conf_timestamp : 22121
> Host timestamp : 22120
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=22120 (Mon Apr 15 21:25:18 2019)
>     host-id=3
>     score=1800
>     vm_conf_refresh_time=22121 (Mon Apr 15 21:25:18 2019)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=GlobalMaintenance
>     stopped=False
>
> [root@kvm320 ~]# virsh -r list
> Id Name Status
> ----------------------------------------------------
> 6 HostedEngine laufend
>
> [root@kvm320 ~]# hosted-engine --console
> The engine VM is running on this host
> Verbunden mit der Domain: HostedEngine
> Escape-Zeichen ist ^]
> Fehler: Interner Fehler: Zeichengerät <null> kann nicht gefunden warden
>
> In English it should be this:
>
> [root@mgmt~]# hosted-engine --console
> The engine VM is running on this host
> Connected to domain HostedEngine
> Escape character is ^]
> error: internal error: cannot find character device
>
> This is in the log:
>
> [root@kvm320 ~]# tail -f /var/log/ovirt-hosted-engine-ha/agent.log
> MainThread::INFO::2019-04-15 21:28:33,032::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 1800)
> MainThread::INFO::2019-04-15 21:28:43,050::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
> MainThread::INFO::2019-04-15 21:28:43,165::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 1800)
> MainThread::INFO::2019-04-15 21:28:53,183::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
> MainThread::INFO::2019-04-15 21:28:53,300::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 1800)
> MainThread::INFO::2019-04-15 21:29:03,317::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
> MainThread::INFO::2019-04-15 21:29:03,434::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 1800)
> MainThread::INFO::2019-04-15 21:29:13,453::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
> MainThread::INFO::2019-04-15 21:29:13,571::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 1600 due to gateway status
> MainThread::INFO::2019-04-15 21:29:13,571::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 1800)
> MainThread::INFO::2019-04-15 21:29:22,589::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
> MainThread::INFO::2019-04-15 21:29:22,712::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 1800)
>
> But it is not reachable over the network:
>
> [root@kvm320 ~]# ping 192.168.200.211
> PING 192.168.200.211 (192.168.200.211) 56(84) bytes of data.
> From 192.168.200.231 icmp_seq=1 Destination Host Unreachable
> From 192.168.200.231 icmp_seq=2 Destination Host Unreachable
> From 192.168.200.231 icmp_seq=3 Destination Host Unreachable
> From 192.168.200.231 icmp_seq=4 Destination Host Unreachable
>
> I tried to stop and start the VM again, but it didn't help.
>
> Maybe someone can give me some advice on how to get the hosted engine running again.
>
> Thx, Stefan
oVirt 4.3.2 - Cannot update Host via UI
by Strahil Nikolov
Hello guys,
I have the following issue after successfully updating my engine from 4.3.1 to 4.3.2 - I cannot update any host via the UI.
The event log shows the update starting, but there is no process running on the host, yum.log is not updated, and the engine log doesn't show anything meaningful.
Any hint where to look?
Thanks in advance.
Best Regards,
Strahil Nikolov
How to fix ovn apparent inconsistency?
by Gianluca Cecchi
Hello,
Passing from the old manually configured OVN to the current OVN in 4.3.1, it
seems I have some problems with OVN now.
I cannot assign an OVN network to a VM (powered on or off doesn't change anything).
When I add/edit a vNIC, the OVN networks are not among the possible choices.
The environment is composed of three hosts and one engine (external, on vSphere).
Over time the mgmt network has been configured on a network named ovirtmgmntZ2Z3.
On the engine it seems there are 2 switches for every defined OVN network
(ovn192 and ovn172).
Below is some command output, in case any inconsistency has remained that I
can purge.
Thanks in advance.
Gianluca
- On manager ovmgr1:
[root@ovmgr1 ~]# ovs-vsctl show
eae54ff9-b86c-4050-8241-46f44336ba94
ovs_version: "2.10.1"
[root@ovmgr1 ~]#
[root@ovmgr1 ~]# ovn-nbctl show
switch 32367d8a-460f-4447-b35a-abe9ea5187e0 (ovn192)
port affc5570-3e5a-439c-9fdf-d75d6810e3a3
addresses: ["00:1a:4a:17:01:73"]
port f639d541-2118-4c24-b478-b7a586eb170c
addresses: ["00:1a:4a:17:01:75"]
switch 6110649a-db2b-4de7-8fbc-601095cfe510 (ovn192)
switch 64c4c17f-cd67-4e29-939e-2b952495159f (ovn172)
port 32c348d9-12e9-4bcf-a43f-69338c887cfc
addresses: ["00:1a:4a:17:01:72 dynamic"]
port 3c77c2ea-de00-43f9-a5c5-9b3ffea5ec69
addresses: ["00:1a:4a:17:01:74 dynamic"]
switch 04501f6b-3977-4ba1-9ead-7096768d796d (ovn172)
port 0a2a47bc-ea0d-4f1d-8f49-ec903e519983
addresses: ["00:1a:4a:17:01:65 dynamic"]
port 8fc7bed4-7663-4903-922b-05e490c6a5a1
addresses: ["00:1a:4a:17:01:64 dynamic"]
port f2b64f89-b719-484c-ac02-2a1ac8eaacdb
addresses: ["00:1a:4a:17:01:59 dynamic"]
port f7389c88-1ea1-47c2-92fd-6beffb2e2190
addresses: ["00:1a:4a:17:01:58 dynamic"]
[root@ovmgr1 ~]#
- On host ov200 (10.4.192.32 on ovirtmgmntZ2Z3):
[root@ov200 ~]# ovs-vsctl show
ae0a1256-7250-46a2-a1b6-8f0ae6105c20
Bridge br-int
fail_mode: secure
Port br-int
Interface br-int
type: internal
Port "ovn-ddecf0-0"
Interface "ovn-ddecf0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.33"}
Port "ovn-b8872a-0"
Interface "ovn-b8872a-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.34"}
ovs_version: "2.10.1"
[root@ov200 ~]#
- On host ov300 (10.4.192.33 on ovirtmgmntZ2Z3):
[root@ov300 ~]# ovs-vsctl show
f1a41e9c-16fb-4aa2-a386-2f366ade4d3c
Bridge br-int
fail_mode: secure
Port br-int
Interface br-int
type: internal
Port "ovn-b8872a-0"
Interface "ovn-b8872a-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.34"}
Port "ovn-1dce5b-0"
Interface "ovn-1dce5b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.32"}
ovs_version: "2.10.1"
[root@ov300 ~]#
- On host ov301 (10.4.192.34 on ovirtmgmntZ2Z3):
[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
Bridge br-int
fail_mode: secure
Port "ovn-1dce5b-0"
Interface "ovn-1dce5b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.32"}
Port "ovn-ddecf0-0"
Interface "ovn-ddecf0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.33"}
Port br-int
Interface br-int
type: internal
ovs_version: "2.10.1"
[root@ov301 ~]#
In web admin gui:
In network -> networks ->
- ovn192
Id: 8fd63a10-a2ba-4c56-a8e0-0bc8d70be8b5
VDSM Name: ovn192
External ID: 32367d8a-460f-4447-b35a-abe9ea5187e0
- ovn172
Id: 7546d5d3-a0e3-40d5-9d22-cf355da47d3a
VDSM Name: ovn172
External ID: 64c4c17f-cd67-4e29-939e-2b952495159f
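A hedged sketch of a possible cleanup, assuming the duplicates really are leftovers from the old manual setup: the switches whose UUIDs match the External IDs above (32367d8a... for ovn192, 64c4c17f... for ovn172) are the ones the engine knows about, so the other two look like candidates for removal. Review everything first, and be careful with the second ovn172 copy since it still carries ports:

ovn-nbctl show                     # re-check which switch holds which ports
ovn-nbctl list Logical_Switch      # full records, including external_ids
ovn-nbctl ls-del 6110649a-db2b-4de7-8fbc-601095cfe510   # the empty duplicate of ovn192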
oVirt 4.3.2.1-1.el7 Errors at VM boot
by Wood Peter
Hi all,
A few weeks ago I did a clean install of the latest oVirt-4.3.2 and
imported some VMs from oVirt-3. Three nodes running oVirt Node and oVirt
Engine installed on a separate system.
I noticed that sometimes some VMs will boot successfully but the Web UI
will still show "Powering UP" for days after the VM has been up. I can
power down the VM and power it back up, and it may update the Web UI status to
UP.
While debugging the above issue I noticed that some VMs will trigger errors
during boot. I can power on a VM on one node, see the errors below started
happening every 4-5 seconds, then power down the VM, errors stop, then
power up the VM on a different node without a problem. Another VM though
may trigger the errors on the same node.
Everything is very inconsistent. I can't find a pattern. I tried different
VMs, different nodes, and I'm getting mixed results. Hopefully the errors
will give some clue.
Here is what I'm seeing scrolling every 4-5 seconds:
-------------------------
On oVirt Node:
==> vdsm.log <==
2019-04-12 10:50:31,543-0700 ERROR (jsonrpc/3) [jsonrpc.JsonRpcServer]
Internal server error (__init__:350)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345,
in _handle_request
res = method(**params)
File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in
_dynamicMethod
result = fn(*methodArgs)
File "<string>", line 2, in getAllVmStats
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in
method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1388, in
getAllVmStats
statsList = self._cif.getAllVmStats()
File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 567, in
getAllVmStats
return [v.getStats() for v in self.vmContainer.values()]
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1766, in
getStats
oga_stats = self._getGuestStats()
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1967, in
_getGuestStats
stats = self.guestAgent.getGuestInfo()
File "/usr/lib/python2.7/site-packages/vdsm/virt/guestagent.py", line
505, in getGuestInfo
del qga['appsList']
KeyError: 'appsList'
==> mom.log <==
2019-04-12 10:50:31,547 - mom.VdsmRpcBase - ERROR - Command
Host.getAllVmStats with args {} failed:
(code=-32603, message=Internal JSON-RPC error: {'reason': "'appsList'"})
----------------------
On oVirt Engine
2019-04-12 10:50:35,692-07 WARN
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Unexpected return
value: Status [code=-32603, message=Internal JSON-RPC error: {'reason':
"'appsList'"}]
2019-04-12 10:50:35,693-07 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Failed in
'GetAllVmStatsVDS' method
2019-04-12 10:50:35,693-07 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Command
'GetAllVmStatsVDSCommand(HostName = sdod-ovnode-03,
VdsIdVDSCommandParametersBase:{hostId='12e38ad3-6327-4c94-8be4-88912d283729'})'
execution failed: VDSGenericException: VDSErrorException: Failed to
GetAllVmStatsVDS, error = Internal JSON-RPC error: {'reason':
"'appsList'"}, code = -32603
Thank you,
-- Peter
Change IP Node and Manager
by Sebastian Antunez N.
Hello Guys
I have 8 hosts with oVirt 4.1 and need to change the IP on all nodes.
Is there a procedure to make the IP change? I have searched for
information but cannot find a process to follow.
Could someone help me understand how to change the IP on the nodes? I know
that I must put the nodes in maintenance, but I do not know if I should
change the manager first, add an additional IP and then re-add the nodes, etc.
Thanks for the help
Sebastian
Poor I/O Performance (again...)
by Jim Kusznir
Hi all:
I've had I/O performance problems pretty much since the beginning of using
oVirt. I've applied several upgrades as time went on, but strangely, none
of them have alleviated the problem. VM disk I/O is still very slow to the
point that running VMs is often painful; it notably affects nearly all my
VMs, and makes me leery of starting any more. I'm currently running 12 VMs
and the hosted engine on the stack.
My configuration started out with 1Gbps networking and hyperconverged
gluster running on a single SSD on each node. It worked, but I/O was
painfully slow. I also started running out of space, so I added an SSHD on
each node, created another gluster volume, and moved VMs over to it. I
also ran that on a dedicated 1Gbps network. I had recurring disk failures
(seems that disks only lasted about 3-6 months; I warrantied all three at
least once, and some twice before giving up). I suspect the Dell PERC 6/i
was partly to blame; the raid card refused to see/acknowledge the disk, but
plugging it into a normal PC showed no signs of problems. In any case,
performance on that storage was notably bad, even though the gig-e
interface was rarely taxed.
I put in 10Gbps ethernet and moved all the storage onto that nonetheless,
as several people here said that 1Gbps just wasn't fast enough. Some
aspects improved a bit, but disk I/O is still slow. And I was still having
problems with the SSHD data gluster volume eating disks, so I bought a
dedicated NAS server (supermicro 12 disk dedicated FreeNAS NFS storage
system on 10Gbps ethernet). Set that up. I found that it was actually
FASTER than the SSD-based gluster volume, but still slow. Lately it's been
getting slower, too... I don't know why. The FreeNAS server reports network
loads around 4MB/s on its 10Gbe interface, so its not network constrained.
At 4MB/s, I'd sure hope the 12 spindle SAS interface wasn't constrained
either..... (and disk I/O operations on the NAS itself complete much
faster).
So, running a test on my NAS against an ISO file I haven't accessed in
months:
# dd
if=en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso
of=/dev/null bs=1024k count=500
500+0 records in
500+0 records out
524288000 bytes transferred in 2.459501 secs (213168465 bytes/sec)
Running it on one of my hosts:
root@unifi:/home/kusznir# time dd if=/dev/sda of=/dev/null bs=1024k
count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 7.21337 s, 72.7 MB/s
(I don't know if this is a true apples to apples comparison, as I don't
have a large file inside this VM's image). Even this is faster than I
often see.
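For a rougher but more comparable read test on the Linux side, something like the following might help rule the page cache in or out (the file path and sizes are placeholders):

dd if=/var/tmp/testfile.iso of=/dev/null bs=1M count=500 iflag=direct   # bypass the page cache
sync && echo 3 > /proc/sys/vm/drop_caches                               # or flush caches first...
dd if=/var/tmp/testfile.iso of=/dev/null bs=1M count=500                # ...and re-read through it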
I have a VoIP Phone server running as a VM. Voicemail and other recordings
usually fail due to IO issues opening and writing the files. Often, the
first 4 or so seconds of the recording is missed; sometimes the entire
thing just fails. I didn't use to have this problem, but it's definitely
been getting worse. I finally bit the bullet and ordered a physical server
dedicated for my VoIP System...But I still want to figure out why I'm
having all these IO problems. I read on the list of people running 30+
VMs...I feel that my IO can't take any more VMs with any semblance of
reliability. We have a Quickbooks server on here too (windows), and the
performance is abysmal; my CPA is charging me extra because of all the lost
staff time waiting on the system to respond and generate reports.....
I'm at my wits' end... I started with gluster on SSD with a 1Gbps network,
migrated to 10Gbps network, and now to dedicated high performance NAS box
over NFS, and still have performance issues.....I don't know how to
troubleshoot the issue any further, but I've never had these kinds of
issues when I was playing with other VM technologies. I'd like to get to
the point where I can resell virtual servers to customers, but I can't do
so with my current performance levels.
I'd greatly appreciate help troubleshooting this further.
--Jim
Re: Tuning Gluster Writes
by Strahil
Hi,
What are your dirty cache settings on the gluster servers?
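For reference, a quick sketch of dumping the knobs that question usually refers to, run on each gluster server (just a readout; tune only after seeing what is currently set):

sysctl vm.dirty_ratio vm.dirty_background_ratio \
       vm.dirty_bytes vm.dirty_background_bytes vm.dirty_expire_centisecs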
Best Regards,
Strahil Nikolov
On Apr 13, 2019 00:44, Alex McWhirter <alex(a)triadic.us> wrote:
>
> I have 8 machines acting as gluster servers. They each have 12 drives
> raid 50'd together (3 sets of 4 drives raid 5'd then 0'd together as
> one).
>
> They connect to the compute hosts and to each other over lacp'd 10GB
> connections split across two cisco nexus switched with VPC.
>
> Gluster has the following set.
>
> performance.write-behind-window-size: 4MB
> performance.flush-behind: on
> performance.stat-prefetch: on
> server.event-threads: 4
> client.event-threads: 8
> performance.io-thread-count: 32
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
> performance.strict-o-direct: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> features.shard: on
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: off
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> auth.allow: *
> user.cifs: off
> transport.address-family: inet
> nfs.disable: off
> performance.client-io-threads: on
>
>
> I have the following sysctl values on gluster client and servers, using
> libgfapi, MTU 9K
>
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
> net.ipv4.tcp_rmem = 4096 87380 134217728
> net.ipv4.tcp_wmem = 4096 65536 134217728
> net.core.netdev_max_backlog = 300000
> net.ipv4.tcp_moderate_rcvbuf =1
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_congestion_control=htcp
>
> reads with this setup are perfect, benchmarked in VM to be about 770MB/s
> sequential with disk access times of < 1ms. Writes on the other hand are
> all over the place. They peak around 320MB/s sequential write, which is
> what I expect, but it seems as if there is some blocking going on.
>
> During the write test I will hit 320MB/s briefly, then 0MB/s as disk
> access times shoot to over 3000ms, then back to 320MB/s. It averages out
> to about 110MB/s afterwards.
>
> Gluster version is 3.12.15 ovirt is 4.2.7.5
>
> Any ideas on what i could tune to eliminate or minimize that blocking?
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z7F72BKYKAG...
oVirt 4.3.2 missing/wrong status of VM
by Strahil Nikolov
As I couldn't find the exact mail thread, I'm attaching my /usr/lib/python2.7/site-packages/vdsm/virt/guestagent.py which fixes the missing/wrong status of VMs.
You will need to restart vdsmd (I'm not sure how safe that is with running guests) for the fix to take effect.
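A rough sketch of applying the attached file (the attachment itself is not included in this digest, and the source path of the patched copy below is a placeholder; presumably the change guards the del qga['appsList'] line shown in the "Errors at VM boot" thread above, so a guest-agent reply without that key no longer raises KeyError):

cp /usr/lib/python2.7/site-packages/vdsm/virt/guestagent.py{,.orig}    # keep a backup
cp /path/to/patched/guestagent.py /usr/lib/python2.7/site-packages/vdsm/virt/guestagent.py
systemctl restart vdsmd    # ideally with the host in maintenance / no running guests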
Best Regards,
Strahil Nikolov
Second host fails to activate (hosted-engine)
by Ricardo Alonso
After installing the second host via the web GUI (4.3.2.1-1.el7), it fails to activate, saying that it wasn't possible to connect to the default storage pool (glusterfs). These are the logs:
vdsm.log
2019-04-09 15:54:07,409-0400 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:58130 (protocoldetector:61)
2019-04-09 15:54:07,419-0400 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:58130 (protocoldetector:125)
2019-04-09 15:54:07,419-0400 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95)
2019-04-09 15:54:07,420-0400 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompserver:124)
2019-04-09 15:54:07,461-0400 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-04-09 15:54:07,466-0400 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-04-09 15:54:07,469-0400 INFO (jsonrpc/0) [vdsm.api] START getStorageDomainInfo(sdUUID=u'd99fb087-66d5-4adf-9c0c-80e60de17917', options=None) from=::1,58130, task_id=00c843c2-ab43-4813-9ded-29f6742c33b2 (api:48)
2019-04-09 15:54:07,484-0400 INFO (jsonrpc/0) [vdsm.api] FINISH getStorageDomainInfo error='VERSION' from=::1,58130, task_id=00c843c2-ab43-4813-9ded-29f6742c33b2 (api:52)
2019-04-09 15:54:07,484-0400 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='00c843c2-ab43-4813-9ded-29f6742c33b2') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
return fn(*args, **kargs)
File "<string>", line 2, in getStorageDomainInfo
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2741, in getStorageDomainInfo
dom = self.validateSdUUID(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 305, in validateSdUUID
sdDom = sdCache.produce(sdUUID=sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
domain.getRealDomain()
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
return self._cache._realProduce(self._sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
domain = self._findDomain(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
return findMethod(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 56, in findDomain
return GlusterStorageDomain(GlusterStorageDomain.findDomainPath(sdUUID))
File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 394, in __init__
manifest = self.manifestClass(domainPath)
File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 179, in __init__
sd.StorageDomainManifest.__init__(self, sdUUID, domaindir, metadata)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 332, in __init__
self._domainLock = self._makeDomainLock()
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 553, in _makeDomainLock
domVersion = self.getVersion()
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 424, in getVersion
return self.getMetaParam(DMDK_VERSION)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 421, in getMetaParam
return self._metadata[key]
File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 91, in __getitem__
return dec(self._dict[key])
File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 202, in __getitem__
return self._metadata[key]
KeyError: 'VERSION'
2019-04-09 15:54:07,484-0400 INFO (jsonrpc/0) [storage.TaskManager.Task] (Task='00c843c2-ab43-4813-9ded-29f6742c33b2') aborting: Task is aborted: u"'VERSION'" - code 100 (task:1181)
2019-04-09 15:54:07,484-0400 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH getStorageDomainInfo error='VERSION' (dispatcher:87)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
result = ctask.prepare(func, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
return m(self, *a, **kw)
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
raise self.error
KeyError: 'VERSION'
2019-04-09 15:54:07,484-0400 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call StorageDomain.getInfo failed (error 350) in 0.01 seconds (__init__:312)
2019-04-09 15:54:07,502-0400 INFO (jsonrpc/3) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d', u'vfs_type': u'glusterfs', u'connection': u'poseidon:/engine', u'user': u'kvm'}], options=None) from=::1,58130, task_id=d71268fe-0088-44e1-99e8-7bcc868b3b2e (api:48)
2019-04-09 15:54:07,521-0400 INFO (jsonrpc/3) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 0, 'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]} from=::1,58130, task_id=d71268fe-0088-44e1-99e8-7bcc868b3b2e (api:54)
2019-04-09 15:54:07,521-0400 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 0.02 seconds (__init__:312)
2019-04-09 15:54:07,533-0400 INFO (jsonrpc/4) [vdsm.api] START getStorageDomainStats(sdUUID=u'd99fb087-66d5-4adf-9c0c-80e60de17917', options=None) from=::1,58130, task_id=47c23532-80c9-4186-948b-be9ed264bfbf (api:48)
agent.log
MainThread::INFO::2019-04-09 16:07:23,647::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.1 started
MainThread::INFO::2019-04-09 16:07:23,699::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: potential.o2pos.com.br
MainThread::INFO::2019-04-09 16:07:23,825::hosted_engine::524::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2019-04-09 16:07:23,827::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.8.1'}
MainThread::ERROR::2019-04-09 16:07:23,828::hosted_engine::540::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2019-04-09 16:07:23,828::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
self._initialize_broker()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 537, in _initialize_broker
m.get('options', {}))
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 86, in start_monitor
).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'ping', options: {'addr': '192.168.8.1'}]
MainThread::ERROR::2019-04-09 16:07:23,829::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2019-04-09 16:07:23,829::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
broker.log
MainThread::INFO::2019-04-09 16:08:00,892::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.1 started
MainThread::INFO::2019-04-09 16:08:00,892::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2019-04-09 16:08:00,893::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2019-04-09 16:08:00,895::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2019-04-09 16:08:00,895::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2019-04-09 16:08:00,896::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2019-04-09 16:08:00,896::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-load
MainThread::INFO::2019-04-09 16:08:00,896::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2019-04-09 16:08:00,897::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor ping
MainThread::INFO::2019-04-09 16:08:00,897::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2019-04-09 16:08:00,897::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2019-04-09 16:08:00,898::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2019-04-09 16:08:00,898::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2019-04-09 16:08:00,899::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2019-04-09 16:08:00,899::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-load
MainThread::INFO::2019-04-09 16:08:00,899::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2019-04-09 16:08:00,900::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor ping
MainThread::INFO::2019-04-09 16:08:00,900::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2019-04-09 16:08:00,900::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2019-04-09 16:08:00,957::storage_backends::345::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2019-04-09 16:08:00,958::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2019-04-09 16:08:00,993::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2019-04-09 16:08:01,025::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2019-04-09 16:08:01,322::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command Image.prepare with args {'imageID': 'e525f96e-ffa3-43a8-a368-d473f064944a', 'storagepoolID': '00000000-0000-0000-0000-000000000000', 'volumeID': '12c2075c-4796-4185-b7f3-ed9f366d95ef', 'storagedomainID': 'd99fb087-66d5-4adf-9c0c-80e60de17917'} failed:
(code=100, message='VERSION')
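For illustration, one hedged way to see what the vdsm traceback is complaining about: the KeyError means the storage-domain metadata read from the gluster mount has no VERSION key. The mount path below follows the usual glusterSD layout for the 'poseidon:/engine' volume and is an assumption:

cat /rhev/data-center/mnt/glusterSD/poseidon:_engine/d99fb087-66d5-4adf-9c0c-80e60de17917/dom_md/metadata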