oVirt Maximums
by Brian Wilson
Is there a resource available that describes the configurable maximums for the oVirt Engine and oVirt Nodes?
Things like:
How many Logical Networks per host?
If one does not exist, does anyone have experience with reaching limits that caused issues, specifically with regard to the number of networks on hosts?
5 years, 11 months
VM randomly unresponsive
by fsoyer
Hi all,
I am still trying to understand my problem between (I suppose) oVirt and Gluster.
After my recent posts titled 'VMs unexpectidly restarted', which produced neither a solution nor any lead to investigate, I submit to you another (related?) problem.
In parallel with the problem of VMs going down (which has not recurred since Oct 16), I randomly get events in the GUI saying "VM xxxxx is not responding." For example, VM "patjoub1" on 2018-11-11 14:34. Never at the same hour, not every day, often this VM patjoub1 but not always: I have seen it on two others. All VM disks are on a volume DATA02 (with leases on the same volume).
Searching in engine.log, I found :
2018-11-11 14:34:32,953+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'Up' --> 'NotResponding'
2018-11-11 14:34:33,116+01 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] Invalid or unknown guest architecture type '' received from guest agent
2018-11-11 14:34:33,176+01 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] EVENT_ID: VM_NOT_RESPONDING(126), VM patjoub1 is not responding.
...
...
2018-11-11 14:34:48,278+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-48) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'NotResponding' --> 'Up'
So it comes back up 15s later, and neither the VM nor the monitoring sees any downtime.
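To see how often and for how long this happens, the 'NotResponding' round trips can be pulled out of engine.log. The following is only a sketch: the regular expression is modelled on the two VmAnalyzer lines quoted above, and the function name is made up for illustration.

```python
import re
from datetime import datetime

# Modelled on the VmAnalyzer lines quoted above, e.g.
# 2018-11-11 14:34:32,953+01 INFO [...] VM '<uuid>'(patjoub1) moved from 'Up' --> 'NotResponding'
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\S*\s.*"
    r"VM '[^']+'\((?P<vm>[^)]+)\) moved from '(?P<old>\w+)' --> '(?P<new>\w+)'"
)


def not_responding_windows(lines):
    """Yield (vm, start, end, seconds) for each NotResponding -> other state round trip."""
    started = {}  # VM name -> time it entered NotResponding
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        vm = match.group("vm")
        if match.group("new") == "NotResponding":
            started[vm] = ts
        elif match.group("old") == "NotResponding" and vm in started:
            start = started.pop(vm)
            yield vm, start, ts, (ts - start).total_seconds()


if __name__ == "__main__":
    # The engine.log path is the usual default; adjust if yours differs.
    with open("/var/log/ovirt-engine/engine.log") as log:
        for vm, start, end, seconds in not_responding_windows(log):
            print(f"{vm}: {start} -> {end} ({seconds:.1f}s)")
```

Run against a few weeks of logs, this gives a list of dates and durations that can be compared with the vdsm and Gluster logs.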
At this time, I see in vdsm.log of the nodes :
2018-11-11 14:33:49,450+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata (monitor:498)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
delay = result.delay()
File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata', 1, 'Read timeout')
2018-11-11 14:33:49,450+0100 INFO (check/loop) [storage.Monitor] Domain ffc53fd8-c5d1-4070-ae51-2e91835cd937 became INVALID (monitor:469)
2018-11-11 14:33:59,451+0100 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata' is blocked for 20.00 seconds (check:282)
2018-11-11 14:34:09,480+0100 INFO (event/37) [storage.StoragePool] Linking /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937 to /rhev/data-center/6efda7f8-b62f-11e8-9d16-00163e263d21/ffc53fd8-c5d1-4070-ae51-2e91835cd937 (sp:1230)
OK: so DATA02 was marked as blocked for 20s? I definitely have a problem with Gluster? I'll inevitably find the reason in the Gluster logs? Uh: not at all.
Please see gluster logs here :
https://seafile.systea.fr/d/65df86cca9d34061a1e4/
Unfortunately I discovered this morning that I do not have the sanlock.log for this date. I don't understand why: log rotation seems OK with "rotate 3", but I have no backup files :(.
But, as luck would have it, the same event occurred this morning! Same VM patjoub1, 2018-11-13 08:01:37. So I have added today's sanlock.log; maybe it can help.
IMPORTANT NOTE: don't forget that the Gluster logs are shifted by one hour. For this event at 14:34, search at 13:34 in the Gluster logs.
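For anyone correlating the logs, the one-hour shift can be handled mechanically. This sketch assumes the Gluster logs are written in UTC (which would explain the shift against the +01 offsets in engine.log and vdsm.log); the function name is illustrative only.

```python
from datetime import datetime, timezone


def to_gluster_log_time(stamp: str) -> str:
    """Convert an engine.log / vdsm.log timestamp (with UTC offset) to UTC.

    Assumption: Gluster logs use UTC, so the result is the time to search for there.
    """
    # The first 23 characters are "YYYY-MM-DD HH:MM:SS,mmm"; the rest is the offset.
    local_part, offset = stamp[:23], stamp[23:]
    if len(offset) == 3:  # engine.log prints "+01"; vdsm.log prints "+0100"
        offset += "00"
    local = datetime.strptime(local_part + offset, "%Y-%m-%d %H:%M:%S,%f%z")
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(to_gluster_log_time("2018-11-11 14:34:32,953+01"))
```

For the event above, the engine timestamp 14:34:32 maps to 13:34:32 in the Gluster logs.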
As a reminder, my configuration:
Gluster 3.12.13
oVirt 4.2.3
3 nodes where the third is arbiter (volumes in replica 2)
The nodes are never overloaded (CPU average 5%, no peak detected at the time of the event, mem 128G used at 15% (only 10 VMs on this cluster)). Network underused, gluster is on a separate network on a bond (2 NICs) 1+1Gb mode 4 = 2Gb, used in peak at 10%.
Here is the configuration for the given volume :
# gluster volume status DATA02
Status of volume: DATA02
Gluster process                                                  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick victorstorage.local.systea.fr:/home/data02/data02/brick    49158     0          Y       4990
Brick gingerstorage.local.systea.fr:/home/data02/data02/brick    49153     0          Y       8460
Brick eskarinastorage.local.systea.fr:/home/data01/data02/brick  49158     0          Y       2470
Self-heal Daemon on localhost                                    N/A       N/A        Y       8771
Self-heal Daemon on eskarinastorage.local.systea.fr              N/A       N/A        Y       11745
Self-heal Daemon on victorstorage.local.systea.fr                N/A       N/A        Y       17055
Task Status of Volume DATA02
------------------------------------------------------------------------------
There are no active volume tasks
# gluster volume info DATA02
Volume Name: DATA02
Type: Replicate
Volume ID: 48bf5871-339b-4f39-bea5-9b5848809c83
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: victorstorage.local.systea.fr:/home/data02/data02/brick
Brick2: gingerstorage.local.systea.fr:/home/data02/data02/brick
Brick3: eskarinastorage.local.systea.fr:/home/data01/data02/brick (arbiter)
Options Reconfigured:
network.ping-timeout: 30
server.allow-insecure: on
cluster.granular-entry-heal: enable
features.shard-block-size: 64MB
performance.stat-prefetch: on
server.event-threads: 3
client.event-threads: 8
performance.io-thread-count: 32
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%
So: is there someone around who could help me understand what happened? Pleeease :/
--
Regards,
Frank
5 years, 11 months
Cloud-init reset network configuration to default dhcp after reboot and regular run
by Mike Lykov
Hi All!
I'm trying to configure networking in VMs created from a template, and I see
strange behaviour from cloud-init.
cloud-init is installed and enabled in the template.
ver cloud-init-0.7.9-24.el7.centos.1.x86_64
guest Centos 7.5
ovirt 4.2.7 from ovirt-releases-42-pre repo
1. I use "run once" with initial run - use cloud-init - networks
in-guest net iface : eth0
add new - static
enter address, mask, gw
ipv6 none
Run (once), and cloud-init configures ifcfg-eth0 for that address
(successfully).
2. I shut down that VM and use "Run" (regular) without "use cloud init"
in VM properties, expecting the above configuration to be saved (and
the VM to boot with it).
But because "use cloud init" is not checked and the cloud-init service
is enabled, it starts, cannot find a datasource, and resets the
configuration to the default (dhcp).
In cloud-init.log
2018-11-20 13:53:24,153 - util.py[WARNING]: No instance datasource found! Likely bad things to come!
2018-11-20 13:53:24,153 - util.py[DEBUG]: No instance datasource found! Likely bad things to come!
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cloudinit/cmd/main.py", line 236, in main_init
init.fetch(existing=existing)
File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 343, in fetch
return self._get_data_source(existing=existing)
File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 253, in _get_data_source
pkg_list, self.reporter)
File "/usr/lib/python2.7/site-packages/cloudinit/sources/__init__.py", line 320, in find_source
raise DataSourceNotFoundException(msg)
DataSourceNotFoundException: Did not find any data source, searched classes: (DataSourceNoCloudNet)
2018-11-20 13:53:24,194 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'subnets': [{'type': 'dhcp'}], 'type': 'physical', 'name': 'eth0', 'mac_address': '56:6f:21:4a:00:04'}]}
It reverts the config as described here:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/NE27UO4WNZIC...
3. Then I found this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1439373#c5
and during "run once" I disabled network config as described in that comment.
Shutdown, Run (not once) and voila! The IP address is static!
2018-11-20 15:07:31,601 - handlers.py[DEBUG]: finish: init-network/search-NoCloudNet: SUCCESS: no network data found from DataSource NoCloudNet
.....
2018-11-20 15:07:31,602 - util.py[WARNING]: No instance datasource found! Likely bad things to come!
2018-11-20 15:07:31,602 - util.py[DEBUG]: No instance datasource found! Likely bad things to come!
2018-11-20 15:07:31,639 - stages.py[DEBUG]: network config disabled by system_cfg
2018-11-20 15:07:31,639 - stages.py[INFO]: network config is disabled by system_cfg
2018-11-20 15:07:31,639 - main.py[DEBUG]: [net] Exiting without datasource in local mode
2018-11-20 15:07:31,640 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2018-11-20 15:07:31,640 - util.py[DEBUG]: Read 12 bytes from /proc/uptime
2018-11-20 15:07:31,640 - util.py[DEBUG]: cloud-init mode 'init' took 0.287 seconds (0.29)
2018-11-20 15:07:31,640 - handlers.py[DEBUG]: finish: init-network: SUCCESS: searching for network datasources
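For readers who want to reproduce step 3, here is a minimal sketch of the workaround, assuming the bug comment refers to the usual cloud-init drop-in that sets `network: {config: disabled}` (consistent with the "network config is disabled by system_cfg" lines in the log). The helper name is made up; the file name follows the cloud-init convention but any `.cfg` name in that directory should do.

```python
from pathlib import Path

# Conventional drop-in name and content for disabling cloud-init network handling.
DROP_IN_NAME = "99-disable-network-config.cfg"
DROP_IN_BODY = "network: {config: disabled}\n"


def disable_cloud_init_network(cfg_dir="/etc/cloud/cloud.cfg.d"):
    """Write the drop-in telling cloud-init to leave the network config alone."""
    path = Path(cfg_dir) / DROP_IN_NAME
    path.write_text(DROP_IN_BODY)
    return path


if __name__ == "__main__":
    # Needs root inside the guest; run it once the static config is in place.
    print("wrote", disable_cloud_init_network())
```

Note that with this drop-in present, cloud-init will not apply network settings on later "run once" boots either, so it suits VMs whose address is already final.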
"cloud-init used to use a "marker" file that it created on initial
execution. If that "marker" file existed it would not rerun on reboot. "
- is this not working in oVirt / this cloud-init version?
--
Mike
5 years, 11 months
Repository ovirt-4.2 is listed more than once in the configuration
by Markus Frei
Hello everybody
I am frequently receiving the following messages from ovirt:
Repository ovirt-4.2-epel is listed more than once in the configuration
Repository ovirt-4.2-centos-gluster312 is listed more than once in the configuration
Repository ovirt-4.2-virtio-win-latest is listed more than once in the configuration
Repository ovirt-4.2-centos-qemu-ev is listed more than once in the configuration
Repository ovirt-4.2-centos-opstools is listed more than once in the configuration
Repository centos-sclo-rh-release is listed more than once in the configuration
Repository ovirt-4.2-centos-ovirt42 is listed more than once in the configuration
Repository ovirt-4.2 is listed more than once in the configuration
I have the following repos configured in the directory /etc/yum.repos.d:
ovirt-4.2-dependencies.repo
ovirt-4.2-pre-dependencies.repo
ovirt-4.2-pre.repo
ovirt-4.2.repo
I deleted the 'pre' repos several times but apparently they are being recreated somehow.
Can anybody please explain this behaviour, and also the meaning of these 'pre' repos in general?
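One way to narrow this down is to check which repository ids are defined in more than one of those files. The sketch below only diagnoses; it does not explain why the 'pre' files come back. It assumes the .repo files are plain INI files, and the function name is illustrative.

```python
import configparser
from collections import defaultdict
from pathlib import Path


def find_duplicate_repo_ids(repo_dir="/etc/yum.repos.d"):
    """Map each repo id defined in more than one .repo file to those file names."""
    seen = defaultdict(list)
    for repo_file in sorted(Path(repo_dir).glob("*.repo")):
        # strict=False tolerates repeated sections; interpolation=None keeps '%' literal.
        parser = configparser.ConfigParser(strict=False, interpolation=None)
        parser.read(repo_file)
        for repo_id in parser.sections():
            seen[repo_id].append(repo_file.name)
    return {repo_id: files for repo_id, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    for repo_id, files in find_duplicate_repo_ids().items():
        print(f"{repo_id}: {', '.join(files)}")
```

If the ids from the warnings show up in both the regular and the 'pre' files, that would account for the "listed more than once" messages.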
Thank you in advance.
Kind regards,
Chris
5 years, 11 months
Best Openstack version to integrate with oVirt 4.2.7
by Gianluca Cecchi
Hello,
do you think it is OK to use the Rocky version of OpenStack to integrate its
services with oVirt 4.2.7 on CentOS 7?
I see on https://repos.fedorapeople.org/repos/openstack/ that, if Rocky is
too new, the older releases available are, from newer to
older:
Queens
Pike
Ocata
Newton
At the moment I have two separate lab environments:
oVirt with 4.2.7
Openstack with Rocky (single host with packstack allinone)
just trying first integration steps with these versions, it seems I'm not
able to communicate with glance, because I get in engine.log
2018-11-10 17:32:58,386+01 ERROR
[org.ovirt.engine.core.bll.provider.storage.AbstractOpenStackStorageProviderProxy]
(default task-51) [e2fccee7-1bb2-400f-b8d3-b87b679117d1] Not Found
(OpenStack response error code: 404)
Nothing in glance logs on openstack, apparently.
In my test I'm using
http://xxx.xxx.xxx.xxx:9292 as provider url
checked the authentication check box and
glance user with its password
35357 as the port and services as the tenant
a telnet on port 9292 of openstack server from engine to openstack is ok
similar with cinder I get:
2018-11-10 17:45:42,226+01 ERROR
[org.ovirt.engine.core.bll.provider.storage.AbstractOpenStackStorageProviderProxy]
(default task-50) [32a31aa7-fe3f-460c-a8b9-cc9b277deab7] Not Found
(OpenStack response error code: 404)
So before digging further I would like to be certain which one is currently
the best combination, possibly keeping the oVirt version fixed at 4.2.7.
Thanks,
Gianluca
5 years, 11 months
FCoE won't initialize on reboot, oVirt 4.2
by Jacob Green
I have HP BL460c Gen9 blades with BCM57840 NetXtreme II
10/20-Gigabit Ethernet via a HP Virtual connection. In my ovirt 4.1
environment I have fibre channel working great.
However, in the new environment that I ultimately want to bring the data
domain over to, I am having issues with oVirt interfering with the host's
ability to see the Fiber Channel storage. If I build a clean CentOS 7
installation and get my FCoE module installed and Fiber Channel set up
on my appropriate interfaces, it works and sees the fiber channel
interface every time I type fcoeadm -i; I can reboot a million times and
it does not go away. However, once I turn it into an oVirt 4.2 node, add
it to my environment, and reboot the blade, it is hit or miss whether
fcoeadm -i is going to return interface information.
If I then type "systemctl restart network" my fiber channel comes
online, but I should not need to do this. I can see in my dmesg logs
that the fiber channel is initializing on boot.
[ 39.465578] cnic: QLogic cnicDriver v2.5.22 (July 20, 2015)
[ 39.465594] bnx2x 0000:06:00.2 eno51: Added CNIC device
[ 39.475618] bnx2x 0000:06:00.3 eno52: Added CNIC device
[ 39.495575] bnx2fc: QLogic FCoE Driver bnx2fc v2.11.8 (October 15, 2015)
[ 39.505971] bnx2fc: bnx2fc: FCoE initialized for eno52.
[ 39.506299] bnx2fc: [06]: FCOE_INIT passed
[ 39.516308] bnx2fc: bnx2fc: FCoE initialized for eno51.
[ 39.516654] bnx2fc: [06]: FCOE_INIT passed
Reminder: I have not added an FC storage domain yet, because I need to
turn off and detach the domain from the old 4.1 environment first.
However, that should not keep the fiber channel interfaces from coming up...
And I need to know it's working before I do that.
Below is what fcoeadm -i should return when it is working.
____________________________________________________________________________________________________
fcoeadm -i
Description: BCM57840 NetXtreme II 10/20-Gigabit Ethernet
Revision: 11
Manufacturer: Broadcom Limited
Serial Number: 5CB901C7EE00
Driver: bnx2x 1.712.30-0
Number of Ports: 1
Symbolic Name: bnx2fc (QLogic BCM57840) v2.11.8 over eno51
OS Device Name: host10
Node Name: 0x50060b0000c27a05
Port Name: 0x50060b0000c27a04
Fabric Name: 0x1000c4f57c218ff4
Speed: unknown
Supported Speed: 1 Gbit, 10 Gbit
MaxFrameSize: 2048 bytes
FC-ID (Port ID): 0x0a025c
State: Online
Description: BCM57840 NetXtreme II 10/20-Gigabit Ethernet
Revision: 11
Manufacturer: Broadcom Limited
Serial Number: 5CB901C7EE00
Driver: bnx2x 1.712.30-0
Number of Ports: 1
Symbolic Name: bnx2fc (QLogic BCM57840) v2.11.8 over eno52
OS Device Name: host11
Node Name: 0x50060b0000c27a07
Port Name: 0x50060b0000c27a06
Fabric Name: 0x1000c4f57c21979d
Speed: unknown
Supported Speed: 1 Gbit, 10 Gbit
MaxFrameSize: 2048 bytes
FC-ID (Port ID): 0x14037b
State: Online
____________________________________________________________________________________________________
However, if I reboot the node from the oVirt console, wait a few
minutes after it has rebooted, and then type fcoeadm -i, I get the following.
fcoeadm -i
fcoeadm: No action was taken
Try 'fcoeadm --help' for more information.
It is not until I perform a systemctl restart network that I get the
correct output from above.
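To measure how often the interfaces fail to come up across reboots, a small check on the fcoeadm -i output could be logged at boot. This is a sketch under the assumption that the output always has the field layout shown above ("Symbolic Name: ... over <iface>" followed by "State: ..."); the function names are made up.

```python
import subprocess


def online_fcoe_ports(fcoeadm_output: str):
    """Return the interface names whose port block reports 'State: Online'."""
    online, current = [], None
    for raw in fcoeadm_output.splitlines():
        line = raw.strip()
        if line.startswith("Symbolic Name:") and " over " in line:
            # e.g. "Symbolic Name: bnx2fc (QLogic BCM57840) v2.11.8 over eno51"
            current = line.rsplit(" over ", 1)[1]
        elif line.startswith("State:") and current:
            if line.split(":", 1)[1].strip() == "Online":
                online.append(current)
            current = None
    return online


def fcoe_is_up() -> bool:
    """Run 'fcoeadm -i' and report whether at least one port is online."""
    result = subprocess.run(["fcoeadm", "-i"], capture_output=True, text=True)
    return bool(online_fcoe_ports(result.stdout))


if __name__ == "__main__":
    print("FCoE online" if fcoe_is_up() else "FCoE not initialized")
```

Logging the result shortly after boot, and again after the network restart, would show whether the failure correlates with anything else in the boot sequence.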
Any help or insight into fiber channel with ovirt 4.2 would be greatly
appreciated.
--
Jacob Green
Systems Admin
American Alloy Steel
713-300-5690
5 years, 11 months
SPICE QXL Crashes Linux Guests
by Alex McWhirter
I'm having an odd issue that I find hard to believe could be a bug and
not some kind of user error, but I'm at a loss for where else to look.
When booting a Linux ISO with QXL SPICE graphics, the boot hangs as soon
as kernel modesetting kicks in. I tried the latest Debian, Fedora, and
CentOS. Sometimes it randomly works, but most often it does not. QXL /
VGA with VNC works fine. However, if I wait a few minutes after starting
the VM for the graphics to start, there are no issues and I can install
as usual.
After the install, I reboot, and it hangs on reboot right after graphics
switch back to text mode with QXL SPICE, but not with VNC. So I force
power off, reboot, and wait a while for it to boot. If I did a text-only
install, it hangs after I type a few characters in a SPICE console.
If I did a graphical install, then as long as I wait long enough for X
to start, it works perfectly fine.
I tried to capture some logs, but since the whole guest OS hangs it's
rather hard to pull off. I did see an occasional error about the mouse
driver, so that's really all I have to go on.
As for the SPICE client, I'm using virt-viewer on Windows 10 x64. I tried
various versions of virt-viewer just to be sure, with no change. I also have
a large number of Windows guests with QXL SPICE. These all work with no
issue. Having the guest agent installed in the Linux guest seems to make no
difference.
There are no out-of-the-ordinary logs on the VDSM hosts, but I can
provide anything you may need. It's not specific to any one host; I have
10 VM hosts in the cluster and they all do it. They are Westmere boxes,
if that makes a difference.
Any ideas on how I should approach this? VNC works well enough for
text-only Linux guests, but not being able to reboot my GUI Linux guests
without also closing my SPICE connection is a small pain.
As for oVirt versions, I'm on the latest; this is a rather fresh
install. I just set it up a few days ago, but I've been a long-time oVirt
user. I am using a squid SPICE proxy, if that makes a difference.
5 years, 11 months
Out of sync, hosts network config differs from DC
by femi adegoke
Property: default route, host - true, DC - false
I have 4 nics.
bond0 = 2 x 10g
eno1 = ovirtmgmt
eno2 = for vm traffic.
eno2 says it's out of sync, hosts network config differs from DC,
default route: host - true, DC - false.
I have tried "sync all networks" but the message still remains.
See attached.
Where can I look to fix the issue?
5 years, 11 months
UPS with USB Support
by florentl
Hi everybody,
I'm currently setting up an ovirt solution.
I have three servers with glusterfs. They run hosted engine.
I configured the power management of the nodes to use the iDRAC agent (I
have Dell servers).
The communication is OK, but I don't really know whether configuring power
management is very useful with only three servers.
My three servers are powered via a UPS which has only one USB port.
I'm wondering what is the best option to shutdown the datacenter in case
of power failure.
I have read in previous posts that I have to write a script.
So I think I'm going to:
- install apcupsd on the node connected to the UPS via USB
- configure apcupsd.conf to launch a script and send a mail
- write a script to properly shut down all the VMs and to put the
datacenter in maintenance mode
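The VM shutdown part of such a script could be sketched with the oVirt Python SDK (ovirtsdk4), as below. This is only an outline under several assumptions: the SDK is installed on the node, the engine is still reachable when the script runs, the hosted engine VM is named "HostedEngine" and must be left running, and the connection parameters are placeholders. Putting the datacenter into maintenance is not covered here.

```python
def vms_to_shut_down(vm_names, keep=("HostedEngine",)):
    """Drop the VMs that must stay up (assumed: the hosted engine) from the list."""
    return [name for name in vm_names if name not in keep]


def shut_down_all(url, username, password, ca_file):
    """Ask the engine to cleanly shut down every running VM except those kept."""
    import ovirtsdk4 as sdk  # imported lazily so the helper above works without it

    connection = sdk.Connection(
        url=url, username=username, password=password, ca_file=ca_file
    )
    try:
        vms_service = connection.system_service().vms_service()
        running = {vm.name: vm.id for vm in vms_service.list(search="status=up")}
        for name in vms_to_shut_down(running):
            vms_service.vm_service(running[name]).shutdown()
    finally:
        connection.close()


if __name__ == "__main__":
    # Placeholder values; replace with your engine URL and credentials.
    shut_down_all(
        url="https://engine.example.invalid/ovirt-engine/api",
        username="admin@internal",
        password="changeme",
        ca_file="/etc/pki/ovirt-engine/ca.pem",
    )
```

The script would be called from the apcupsd event that fires when the UPS goes on battery, leaving enough runtime for the guests to finish shutting down before the hosts are powered off.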
What do you think about this solution ?
Is there a better way to manage the shutdown of the VMs and the datacenter
using the power management associated with the oVirt hosts?
Regards,
Florent
5 years, 11 months