A possible bug on Fedora 27
by Valentin Bajrami
Hi Community,
Recently we discovered that our VM's became unstable after upgrading
from Fedora 26 to Fedora 27. The journalctl log shows the following
Jan 29 20:03:28 host1.project.local libvirtd[2741]: 2018-01-29
19:03:28.789+0000: 2741: error : qemuMonitorIO:705 : internal error: End
of file from qemu monitor
Jan 29 20:09:14 host1.project.local libvirtd[2741]: 2018-01-29
19:09:14.111+0000: 2741: error : qemuMonitorIO:705 : internal error: End
of file from qemu monitor
Jan 29 20:10:29 host1.project.local libvirtd[2741]: 2018-01-29
19:10:29.584+0000: 2741: error : qemuMonitorIO:705 : internal error: End
of file from qemu monitor
A similar bug report is already present here:
https://bugzilla.redhat.com/show_bug.cgi?id=1523314 but doesn't reflect
our problem entirely. This bug seems to be triggered only when a VM is
shut down gracefully. In our case this is being triggered without
attempting to shutdown a VM. Again, this is causing the VM's to be
unstable and eventually they'll shut down by themselves.
Do you have any clue what could be causing this?
--
Met vriendelijke groeten,
Valentin Bajrami
6 years, 10 months
qemu-kvm images corruption
by Nicolas Ecarnot
TL;DR:
How to avoid images corruption?
Hello,
On two of our old 3.6 DC, a recent series of VM migrations lead to some
issues :
- I'm putting a host into maintenance mode
- most of the VM are migrating nicely
- one remaining VM never migrates, and the logs are showing :
* engine.log : "...VM has been paused due to I/O error..."
* vdsm.log : "...Improbable extension request for volume..."
After digging amongst the RH BZ tickets, I saved the day by :
- stopping the VM
- lvchange -ay the adequate /dev/...
- qemu-img check [-r all] /rhev/blahblah
- lvchange -an...
- boot the VM
- enjoy!
Yesterday this worked for a VM where only one error occurred on the qemu
image, and the repair was easily done by qemu-img.
Today, facing the same issue on another VM, it failed because the errors
were very numerous, and also because of this message :
[...]
Rebuilding refcount structure
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
[...]
The PV/VG/LV are far from being full, so I guess I don't where to look at.
I tried many ways to solve it but I'm not comfortable at all with qemu
images, corruption and solving, so I ended up exporting this VM (to an
NFS export domain), importing it into another DC : this had the side
effect to use qemu-img convert from qcow2 to qcow2, and (maybe?????) to
solve some errors???
I also copied it into another qcow2 file with the same qemu-img convert
way, but it is leading to another clean qcow2 image without errors.
I saw that on 4.x some bugs are fixed about VM migrations, but this is
not the point here.
I checked my SANs, my network layers, my blades, the OS (CentOS 7.2) of
my hosts, but I see nothing special.
The real reason behind my message is not to know how to repair anything,
rather than to understand what could have lead to this situation?
Where to keep a keen eye?
--
Nicolas ECARNOT
6 years, 10 months
ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Reponding while up
by Christopher Cox
Like the subject says.. I tried to clear the status from the vm_dynamic
for a VM, but it just goes back to 8.
Any hints on how to get things back to a known state?
I tried marking the node in maint, but it can't move the "Unknown" VMs,
so that doesn't work. I tried rebooting a VM, that doesn't work.
The state of the VMs is up.... and I think they are running on the node
they say they are running on, we just have the Unknown problem with VMs
on that one node. So... can't move them, reboot VMs doens't fix....
Any trick to restoring state so that oVirt is ok???
(what a mess)
6 years, 10 months
VM paused due unknown storage error
by Misak Khachatryan
Hi,
After upgrade to 4.2 i'm getting "VM paused due unknown storage
error". When i was upgrading i had some gluster problem with one of
the hosts, which i was fixed readding it to gluster peers. Now i see
something weir in bricks configuration, see attachment - one of the
bricks uses 0% of space.
How I can diagnose this? Nothing wrong in logs as I can see.
Best regards,
Misak Khachatryan
6 years, 10 months
Using upstream QEMU
by Harry Mallon
Hello all,
Has anyone used oVirt with non-oVirt provided QEMU versions?
I need a feature provided by upstream QEMU, but it is disabled in the oVirt/CentOS7 QEMU RPM.
I have two possible methods to avoid the issue:
1. Fedora has a more recent QEMU which is closer to 'stock'. I see that oVirt 4.2 has no Fedora support, but is it possible to install the host onto a Fedora machine? I am trying to use the master branch rpms as recommended in the "No Fedora Support" note with no luck currently.
2. Is it safe/sensible to use oVirt with a CentOS7 host running an upstream QEMU version?
Thanks,
Harry
Harry Mallon
CODEX | Senior Software Engineer
60 Poland Street | London | England | W1F 7NT
E harry.mallon(a)codex.online | T +44 203 7000 989
6 years, 10 months
ovirt 4.2.1 pre hosted engine deploy failure
by Gianluca Cecchi
Hello,
at the end of the command
hosted-engine --deploy
I get
[ INFO ] TASK [Detect ovirt-hosted-engine-ha version]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Set ha_version]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Create configuration templates]
[ INFO ] TASK [Create configuration archive]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Create ovirt-hosted-engine-ha run directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Copy configuration files to the right location on host]
[ INFO ] TASK [Copy configuration archive to storage]
[ ERROR ] [WARNING]: Failure using method (v2_runner_on_failed) in
callback plugin
[ ERROR ] (<ansible.plugins.callback.1_otopi_json.CallbackModule object at
0x2dd7d90>):
[ ERROR ] 'ascii' codec can't encode character u'\u2018' in position 496:
ordinal not in
[ ERROR ] range(128)
[ ERROR ] Failed to execute stage 'Closing up': Failed executing
ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] Generating answer file
'/var/lib/ovirt-hosted-engine-setup/answers/answers-20180129164431.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable,
please check the issue,fix and redeploy
Log file is located at
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180129160956-a7itm9.log
[root@ov42 ~]#
Is there any known bug for this?
In log file I have:
2018-01-29 16:44:28,159+0100 ERROR
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173
[WARNING]: Failure using method (v2_runner_on_failed) in callback plugin
2018-01-29 16:44:28,160+0100 DEBUG otopi.plugins.otopi.dialog.human
human.format:69 newline sent to logger
2018-01-29 16:44:28,160+0100 ERROR
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173
(<ansible.plugins.callback.1_otopi_json.CallbackModule object at
0x2dd7d90>):
2018-01-29 16:44:28,160+0100 DEBUG otopi.plugins.otopi.dialog.human
human.format:69 newline sent to logger
2018-01-29 16:44:28,160+0100 ERROR
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173 'ascii'
codec can't encode character u'\u2018' in position 496: ordinal not in
2018-01-29 16:44:28,161+0100 DEBUG otopi.plugins.otopi.dialog.human
human.format:69 newline sent to logger
2018-01-29 16:44:28,161+0100 ERROR
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173
range(128)
2018-01-29 16:44:28,161+0100 DEBUG otopi.plugins.otopi.dialog.human
human.format:69 newline sent to logger
2018-01-29 16:44:28,161+0100 DEBUG otopi.context context._executeMethod:143
method exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in
_executeMethod
method['method']()
File
"/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/target_vm.py",
line 193, in _closeup
r = ah.run()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/ansible_utils.py",
line 175, in run
raise RuntimeError(_('Failed executing ansible-playbook'))
RuntimeError: Failed executing ansible-playbook
2018-01-29 16:44:28,162+0100 ERROR otopi.context context._executeMethod:152
Failed to execute stage 'Closing up': Failed executing ansible-playbook
I'm testing deploy of nested self hosted engine with HE on NFS.
Thanks,
Gianluca
6 years, 10 months
Node network setup
by spfma.tech@e.mail.fr
--=_40d9d471a95f6be5f29c7f93c9060b7c
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Hi, I am trying to setup a cluster of two nodes, with self hoste Engine=
. Things went fine for the first machine, but it as rather messy about t=
he second one. I would like to have load balancing and failover for both=
management network and storage (NFS repository). So what should I exa=
ctly do to get a working network stack which can be recognized when I tr=
y to add this host to the cluster ? Have tried configuring bonds and b=
riges using Cockpit, using manual "ifcfg" files, but all the time I see=
the bridges and the bonds not linked in the Engine interface, so the ne=
w host cannot be enrolled. If I try to link "ovirtmgmt" to the the assoc=
iated bond, I have a connectivity loss because it is the management devi=
ce, and I have te restart the network services. As management configurat=
ion is not OK, I can't setup the storage connection. And if I just try=
to activate the host, I will install and configure things and then comp=
lain about missing "ovirtmgmt" and "nfs" networks, which both exist and=
work and Centos level. The interface, bonds and bridge names are copy=
/paste from the first server. # brctl show ovirtmgmt=0Abridge name bri=
dge id STP enabled interfaces=0Aovirtmgmt 8000.44a842394200 no bond0 # i=
p addr show bond0=0A33: bond0: mtu 1500 qdisc noqueue master ovirtmgmt=
state UP qlen 1000=0A link/ether 44:a8:42:39:42:00 brd ff:ff:ff:ff:ff:f=
f=0A inet6 fe80::46a8:42ff:fe39:4200/64 scope link =0A valid_lft forever=
preferred_lft forever=0A# ip addr show em1=0A2: em1: mtu 1500 qdisc mq=
master bond0 state UP qlen 1000=0A link/ether 44:a8:42:39:42:00 brd ff:=
ff:ff:ff:ff:ff=0A# ip addr show em3=0A4: em3: mtu 1500 qdisc mq master=
bond0 state UP qlen 1000=0A link/ether 44:a8:42:39:42:00 brd ff:ff:ff:f=
f:ff:ff By the way, is it mandatory to stop and disable NetworkManager=
or not ? Thanks for any kind of help :-)
--=_40d9d471a95f6be5f29c7f93c9060b7c
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable
<div>Hi,</div>=0A<div>=0A<div>I am trying to setup a cluster of two node=
s, with self hoste Engine.</div>=0A<div>Things went fine for the first m=
achine, but it as rather messy about the second one.</div>=0A<div>I woul=
d like to have load balancing and failover for both management network a=
nd storage (NFS repository).</div>=0A<div> </div>=0A<div>So what sh=
ould I exactly do to get a working network stack which can be recognized=
when I try to add this host to the cluster ?</div>=0A<div> </div>=
=0A<div>Have tried configuring bonds and briges using Cockpit, using man=
ual "ifcfg" files, but all the time I see the bridges and the bonds not=
linked in the Engine interface, so the new host cannot be enrolled.</di=
v>=0A<div>If I try to link "ovirtmgmt" to the the associated bond, I hav=
e a connectivity loss because it is the management device, and I have te=
restart the network services. As management configuration is not OK, I=
can't setup the storage connection.</div>=0A<div> </div>=0A<div>An=
d if I just try to activate the host, I will install and configure thing=
s and then complain about missing "ovirtmgmt" and "nfs" networks, which=
both exist and work and Centos level.</div>=0A<div> </div>=0A<div>=
The interface, bonds and bridge names are copy/paste from the first serv=
er.</div>=0A<div> </div>=0A<div># brctl show ovirtmgmt<br />bridge=
name bridge id S=
TP enabled interfaces<br />ovirtmgmt  =
; 8000.44a842394200 no &=
nbsp; bond0</div>=0A<div># ip addr show bond0<br />33:=
bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc no=
queue master ovirtmgmt state UP qlen 1000<br /> link/e=
ther 44:a8:42:39:42:00 brd ff:ff:ff:ff:ff:ff<br /> ine=
t6 fe80::46a8:42ff:fe39:4200/64 scope link <br />  =
; valid_lft forever preferred_lft forever<br /># ip addr sho=
w em1<br />2: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 150=
0 qdisc mq master bond0 state UP qlen 1000<br /> link/=
ether 44:a8:42:39:42:00 brd ff:ff:ff:ff:ff:ff<br /># ip addr show em3<br=
/>4: em3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc=
mq master bond0 state UP qlen 1000<br /> link/ether 4=
4:a8:42:39:42:00 brd ff:ff:ff:ff:ff:ff</div>=0A<div> </div>=0A<div>=
By the way, is it mandatory to stop and disable NetworkManager or not ?<=
/div>=0A<div> </div>=0A<div>Thanks for any kind of help :-)</div>=
=0A</div>
--=_40d9d471a95f6be5f29c7f93c9060b7c--
6 years, 10 months
Upgrade via reinstall?
by Jamie Lawrence
Hello,
I currently have an Ovirt 4.1.8 installation with a hosted engine using Gluster for storage, with the DBs hosted on a dedicated PG cluster.
For reasons[1], it seems possibly simpler for me to upgrade our installation by reinstalling rather than upgrading. In this case, I can happily bring down the running VMs/otherwise do things that one normally can't.
Is there any technical reason I can't/shouldn't rebuild from bare-metal, including creating a fresh hosted engine, without losing anything? I suppose a different way of asking this is, is there anything on the engine/host filesystems that I should preserve/restore for this to work?
Thanks,
-j
[1] If this isn't an option, I'll go in to them in order to figure out a plan B; just avoiding a lot of backstory that isn't needed for the question.
6 years, 10 months
engine add hosts
by 李强华
hello~,I want to add hosts ,but my host offline , can not connect to internal .engine add hosts , the message: installing hosts node failed . My engine was setup successful , i use engine-setup --offline OK,please help~
[cid:4eed0bd5-8485-41c4-b73b-b82855c988e7]
[cid:06876a73-d34b-435f-8479-32aec82c5a0b]
6 years, 10 months