Migrating from RHV to oVirt
by Andrés Jiménez
Hi all,
I have a 5-host cluster with over 300 VMs in production on Red Hat Virtualization 4.2.8, and I would like
to switch to a pure CentOS + oVirt 4.2 setup.
Management is based on an RHVM hosted engine on a dedicated iSCSI storage domain, and our hosts are a mix of RHVH nodes (3) and
CentOS hosts with oVirt 4.2 repositories (2).
I was wondering if I can just set up a new hosted engine on a CentOS host and restore a backup of our current ovirt-
engine (RHVM) into it. Would that work straight away without affecting my running VMs?
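For reference, the procedure I have in mind is roughly the following. This is only a sketch pieced together from documentation, and the --restore-from-file option is an assumption on my part, so please correct me if it does not exist in this version:

  # On the current RHVM engine VM: take a full engine backup
  engine-backup --mode=backup --scope=all --file=engine-backup.tar.gz --log=engine-backup.log

  # On the new CentOS host: deploy a fresh hosted engine and restore the backup into it
  hosted-engine --deploy --restore-from-file=engine-backup.tar.gz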
Cheers,
--
Andrés Jiménez Gómez
Linube
e-mail: soporte(a)linube.com
www.linube.com
Please Help, oVirt Node Hosted Engine Deployment Problems 4.3.2
by Todd Barton
I'm having to rebuild an environment that dates back to the early 3.x days. A lot has changed, and I'm attempting to use the oVirt Node based setup to build a new environment, but I can't get through the hosted engine deployment process via the cockpit (I've tried the command line as well). I've tried static DHCP addresses and static IPs, and confirmed I have resolvable host names. This is a test environment, so I can work through any issues in deployment.
When the cockpit is displaying the "waiting for host to come up" task, the cockpit gets disconnected. It appears to happen when the bridge network is set up. At that point the deployment is messed up and I can't return to the cockpit. I've tried this with one or two NICs/interfaces and every permutation of static and dynamic IP addresses. I've spent a week trying different setups, so I've got to be doing something stupid.
Attached is a screen capture of the resulting IP info after my latest failed attempt. I used two NICs, one for the Gluster and bridge network and the other for oVirt cockpit access. I can't access the cockpit on either IP address after the failure.
I've attempted this as both a single-host and a three-host hyper-converged environment... same issue in both.
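In case it's useful, this is how I've been resetting the host between attempts and where I've been looking for errors (commands from memory, so double-check them):

  # Reset a failed hosted-engine deployment attempt on the host
  ovirt-hosted-engine-cleanup

  # Logs I've been checking after each failure
  less /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-*.log
  less /var/log/vdsm/vdsm.log
  journalctl -u cockpit --since "1 hour ago"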
Can someone please help me or give me some thoughts on what is wrong?
Thanks!
Todd Barton
Stale hosted engine node information harmful?
by Andreas Elvers
I have 5 nodes (node01 to node05). Originally all those nodes were part of our default datacenter/cluster, with an NFS storage domain for VM disks, engine and ISO images. All five nodes were engine HA nodes.
Later, node01, node02 and node03 were re-installed to have engine HA removed. Then those nodes were removed from the default cluster. Eventually node01, node02 and node03 were completely re-installed to host our new Ceph/Gluster based datacenter. The engine is still running on the old default datacenter. Now I wish to move it over to the Ceph/Gluster datacenter.
When I look at the current output of "hosted-engine --vm-status" I see:
--== Host node01.infra.solutions.work (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : node01.infra.solutions.work
Host ID : 1
Engine status : unknown stale-data
Score : 0
stopped : True
Local maintenance : False
crc32 : e437bff4
local_conf_timestamp : 155627
Host timestamp : 155877
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=155877 (Fri Aug 3 13:09:19 2018)
host-id=1
score=0
vm_conf_refresh_time=155627 (Fri Aug 3 13:05:08 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True
--== Host node02.infra.solutions.work (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : node02.infra.solutions.work
Host ID : 2
Engine status : unknown stale-data
Score : 0
stopped : True
Local maintenance : False
crc32 : 11185b04
local_conf_timestamp : 154757
Host timestamp : 154856
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=154856 (Fri Aug 3 13:22:19 2018)
host-id=2
score=0
vm_conf_refresh_time=154757 (Fri Aug 3 13:20:40 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True
--== Host node03.infra.solutions.work (id: 3) status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : node03.infra.solutions.work
Host ID : 3
Engine status : unknown stale-data
Score : 0
stopped : False
Local maintenance : True
crc32 : 9595bed9
local_conf_timestamp : 14363
Host timestamp : 14362
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=14362 (Thu Aug 2 18:03:25 2018)
host-id=3
score=0
vm_conf_refresh_time=14363 (Thu Aug 2 18:03:25 2018)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
--== Host node04.infra.solutions.work (id: 4) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : node04.infra.solutions.work
Host ID : 4
Engine status : {"health": "good", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 245854b1
local_conf_timestamp : 317498
Host timestamp : 317498
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=317498 (Thu May 2 09:44:47 2019)
host-id=4
score=3400
vm_conf_refresh_time=317498 (Thu May 2 09:44:47 2019)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
--== Host node05.infra.solutions.work (id: 5) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : node05.infra.solutions.work
Host ID : 5
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 0711afa0
local_conf_timestamp : 318044
Host timestamp : 318044
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=318044 (Thu May 2 09:44:45 2019)
host-id=5
score=3400
vm_conf_refresh_time=318044 (Thu May 2 09:44:45 2019)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
The old node01, node02 and node03 entries are still present in the output.
The new incarnations of node01, node02 and node03 will be the destination for deploying the new home of our engine, to which I wish to restore the backup. But I'm not sure if (and how) the old data should be removed first.
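From what I have read, cleaning the stale entries might be something like the following, run from one of the current HA hosts (node04 or node05), but I am only guessing here:

  # Guessing: drop the stale HA metadata for the old host IDs 1-3
  hosted-engine --clean-metadata --host-id=1 --force-clean
  hosted-engine --clean-metadata --host-id=2 --force-clean
  hosted-engine --clean-metadata --host-id=3 --force-clean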
oVirt 4.3.3 - servers cannot be reached
by Wood Peter
Hi,
I set up AD authentication, and from the command line everything looks good.
Unfortunately, in the Web UI users sometimes log in successfully, but most of
the time the login screen just hangs and after 2-3 minutes it displays
"Unable to log in because servers cannot be reached. Try again later."
In engine.log I see this:
2019-05-02 11:12:11,581-07 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-72) [] EVENT_ID: USER_VDC_LOGIN_FAILED(114), User
peter(a)ad.mycompany.com connecting from '10.12.29.48' failed to log in :
'Unable to log in because servers cannot be reached. Try again later.'.
2019-05-02 11:12:11,583-07 ERROR
[org.ovirt.engine.core.sso.servlets.InteractiveAuthServlet] (default
task-68) [] Cannot authenticate user 'peter(a)ad.mycompany.com' connecting
from '10.12.29.48': Unable to log in because servers cannot be reached. Try
again later.
Even when my login attempt in the Web UI is hanging, I can still
successfully run the login test from the shell:
ovirt-engine-extensions-tool aaa login-user --profile=ad.mycompany.com --user-name=peter
The above command never fails, which makes me wonder why I am getting the
"servers cannot be reached" error.
I assume the AD servers cannot be reached, yet from the command line it
works perfectly every time.
Any idea what could be the problem, or where to look for the error?
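In case it helps, these are the kinds of checks I've been running from the engine host (the "-authz" extension name below is just my guess at what the LDAP setup tool named it for our profile):

  # Confirm the engine host can resolve the AD domain controllers
  dig +short _ldap._tcp.ad.mycompany.com SRV

  # Look the user up through the aaa extension itself
  ovirt-engine-extensions-tool aaa search --extension-name=ad.mycompany.com-authz \
    --entity=principal --entity-name=peter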
Thank you,
-- Peter
Re: Stale hosted engine node information harmful?
by Strahil
A Red Hat Solution recommends restarting ovirt-ha-agent & ovirt-ha-broker.
I usually set global maintenance and wait 20-30 seconds. Then I stop ovirt-ha-agent.service & ovirt-ha-broker.service on all nodes. Once they are stopped everywhere, I start the two services on all nodes and wait 4-5 minutes.
Finally, verify the status from each host before removing global maintenance.
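Roughly, the sequence is (from memory, so adapt to your setup):

  hosted-engine --set-maintenance --mode=global    # on one host
  systemctl stop ovirt-ha-agent ovirt-ha-broker    # on every node
  systemctl start ovirt-ha-broker ovirt-ha-agent   # on every node, once all are stopped
  hosted-engine --vm-status                        # check on each host
  hosted-engine --set-maintenance --mode=none      # once everything looks sane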
Best Regards,
Strahil Nikolov

On May 2, 2019 12:30, Andreas Elvers <andreas.elvers+ovirtforum(a)solutions.work> wrote:
>
> [...]
Re: All hosts non-operational after upgrading from 4.2 to 4.3
by Strahil
Are you able to access your iSCSI storage via the /rhev/data-center/mnt... mount point?
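For example, something along these lines on one of the hosts (the UUID is the one from your log; the paths are only indicative):

  iscsiadm -m session -P 1    # confirm the iSCSI sessions are really logged in
  vgs | grep 07bb1bf8         # a block storage domain should show up as a VG named after its UUID
  ls -l /rhev/data-center/mnt/blockSD/07bb1bf8-3b3e-4dc0-bc43-375b09e06683/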
Best Regards,
Strahil Nikolov

On Apr 5, 2019 19:04, John Florian <jflorian(a)doubledog.org> wrote:
>
> I am in a severe pinch here. A while back I upgraded from 4.2.8 to 4.3.3 and had only one step remaining: setting the cluster compatibility level to 4.3 (from 4.2). When I tried this it gave the usual warning that each VM would have to be rebooted to complete, but then came the first surprise: it told me this could not be completed until each host was in maintenance mode. Quirky, I thought, but I stopped all VMs and put both hosts into maintenance mode. I then set the cluster to 4.3. Things didn't want to become active again, and I eventually noticed I was being told the DC needed to be 4.3 as well. I don't remember that from before, but oh well, that was easy.
>
> However, the DC and SD remain down. The hosts are non-operational. I've powered everything off and started fresh but still wind up in the same state. Hosts will look like they're active for a bit (green triangle) but then go non-operational after about a minute. It appears that my iSCSI sessions are active/logged in. The one glaring thing I see in the logs is this in vdsm.log:
>
> 2019-04-05 12:03:30,225-0400 ERROR (monitor/07bb1bf) [storage.Monitor] Setting up monitor for 07bb1bf8-3b3e-4dc0-bc43-375b09e06683 failed (monitor:329)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 326, in _setupLoop
> self._setupMonitor()
> File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 348, in _setupMonitor
> self._produceDomain()
> File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 158, in wrapper
> value = meth(self, *a, **kw)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 366, in _produceDomain
> self.domain = sdCache.produce(self.sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
> domain.getRealDomain()
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
> return self._cache._realProduce(self._sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
> domain = self._findDomain(sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
> return findMethod(sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist: (u'07bb1bf8-3b3e-4dc0-bc43-375b09e06683',)
>
> How do I proceed to get back operational?
Re: Fwd: [Gluster-users] Announcing Gluster release 5.5
by Strahil
Hi Darrell,
Will it fix the cluster brick sudden-death issue?
Best Regards,
Strahil Nikolov

On Mar 21, 2019 21:56, Darrell Budic <budic(a)onholyground.com> wrote:
>
> This release of Gluster 5.5 appears to fix the Gluster 3.12->5.3 migration problems many oVirt users have encountered.
>
> I’ll try and test it out this weekend and report back. If anyone else gets a chance to check it out, let us know how it goes!
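> For anyone testing, the sanity checks I'd run after upgrading each node are roughly the following (replace <volname> with your volume name):
>
>   gluster --version
>   gluster peer status
>   gluster volume status
>   gluster volume heal <volname> info    # make sure heals finish before moving to the next node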
>
> -Darrell
>
>> Begin forwarded message:
>>
>> From: Shyam Ranganathan <srangana(a)redhat.com>
>> Subject: [Gluster-users] Announcing Gluster release 5.5
>> Date: March 21, 2019 at 6:06:33 AM CDT
>> To: announce(a)gluster.org, gluster-users Discussion List <gluster-users(a)gluster.org>
>> Cc: GlusterFS Maintainers <maintainers(a)gluster.org>
>>
>> The Gluster community is pleased to announce the release of Gluster
>> 5.5 (packages available at [1]).
>>
>> Release notes for the release can be found at [2].
>>
>> Major changes, features and limitations addressed in this release:
>>
>> - Release 5.4 introduced an incompatible change that prevented rolling
>> upgrades, and hence was never announced to the lists. As a result we are
>> jumping a release version and going to 5.5 from 5.3, that does not have
>> the problem.
>>
>> Thanks,
>> Gluster community
>>
>> [1] Packages for 5.5:
>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/
>>
>> [2] Release notes for 5.5:
>> https://docs.gluster.org/en/latest/release-notes/5.5/