Add Gluster-only nodes?
by mathias.westerlund@wdmab.se
Hello!
We have a medium-sized hyperconverged setup where we run SSD-based bricks on the hosts. We also have an iSCSI storage domain towards some spinner boxes for slow bulk storage, but now we would like to add a medium-sized storage option for workloads with mid-level capacity and I/O requirements.
Our hyperconverged layer is NVMe; our spinner layer is SATA. We now want to add a SATA SSD storage option.
We have some 25-slot hosts that are going to fill this use case, and we are now wondering: is there a way we can deploy Gluster on these nodes using oVirt without giving them the VM role?
Because this storage is aimed at things such as databases and other intensive (but not as intensive as our NVMe layer) use cases, we don't want any VM load on these machines, given the sheer number of bricks per machine and the Red Hat best practice of one core per brick.
Can I use Cockpit/oVirt to deploy the above, or do I have to set up a Gluster cluster manually and then add it to oVirt?
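In case the manual route turns out to be necessary, a rough sketch of the workflow — the hostnames, brick paths, and volume name below are placeholders, not taken from the original post:

```shell
# On each new storage-only node: install Gluster and start the daemon
dnf install -y glusterfs-server
systemctl enable --now glusterd

# From one node: form the trusted pool (hostnames are hypothetical)
gluster peer probe stor2.example.com
gluster peer probe stor3.example.com

# Create a replica-3 volume from one brick per node (paths are placeholders)
gluster volume create data replica 3 \
    stor1.example.com:/gluster/brick1/data \
    stor2.example.com:/gluster/brick1/data \
    stor3.example.com:/gluster/brick1/data
gluster volume start data
```

In the Administration Portal you could then create a cluster with only the Gluster service enabled (virt service disabled), add these hosts to it, and attach the volume as a storage domain — that should keep VM scheduling off the storage nodes entirely.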
3 years, 8 months
New Setup - hosts - are not synchronized with their Logical Network configuration: ovirtmgmt.
by morgan cox
Hi
After installing new hosts, they show as status 'Up'; however, I get a warning such as:
Host ng2-ovirt-kvm4's following network(s) are not synchronized with their Logical Network configuration: ovirtmgmt.
which means ovirtmgmt is not working.
I realise the issue is that it has no usable network config — it's set to DHCP and there is no DHCP server.
The way I have set up the network on the hosts is as follows
(I used the ovirt-node 4.4.4 iso to install the nodes):
/var/lib/vdsm/persistence/
├── netconf -> /var/lib/vdsm/persistence/netconf.JkcBbtya
└── netconf.JkcBbtya
├── bonds
│ └── bond0
├── devices
└── nets
├── ovirtmgmt
└── td-hv
td-hv is a bonded bridge + VLAN; it has the network config with the correct IP, and its firewall access allows the ports used by oVirt (see config below).
Can I make the existing td-hv profile replace ovirtmgmt (i.e. add the ovirtmgmt functionality to the existing network device td-hv)?
Or should I take the existing settings from td-hv, add them to ovirtmgmt, and remove td-hv?
Config below - any advice would be great.
---
/etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="mode=active-backup downdelay=0 miimon=100 updelay=0"
TYPE=Bond
BONDING_MASTER=yes
NAME=bond0
UUID=5ab633aa-cd30-4ca8-9109-dbb4541f039b
DEVICE=bond0
ONBOOT=yes
HWADDR=
MACADDR=3C:A8:2A:15:F6:12
MTU=1500
LLDP=no
BRIDGE=ovirtmgmt
---
---
[root@ng2-ovirt-kvm4 mcox]# cat /etc/sysconfig/network-scripts/ifcfg-bond0.1700
VLAN=yes
TYPE=Vlan
PHYSDEV=bond0
VLAN_ID=1700
REORDER_HDR=yes
GVRP=no
MVRP=no
HWADDR=
NAME=bond0.1700
UUID=e2b318e8-83af-4573-a637-fe20326f2c1a
DEVICE=bond0.1700
ONBOOT=yes
MTU=1500
LLDP=no
BRIDGE=td-hv
---
---
[root@ng2-ovirt-kvm4 mcox]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
STP=no
TYPE=Bridge
HWADDR=
MTU=1500
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
DHCP_CLIENT_ID=mac
IPV4_DHCP_TIMEOUT=2147483647
IPV4_FAILURE_FATAL=no
IPV6_DISABLED=yes
IPV6INIT=no
NAME=ovirtmgmt
UUID=202586b0-ebe7-48ae-b316-e2cfd2dc4cc8
DEVICE=ovirtmgmt
ONBOOT=yes
AUTOCONNECT_SLAVES=yes
LLDP=no
---
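For reference, one possible approach — a sketch only; the address, prefix, and gateway below are placeholders (use the real values from the td-hv profile), and the cleanest path is usually the engine's "Setup Host Networks" dialog rather than hand-editing:

```shell
# Give ovirtmgmt the static settings currently carried by td-hv
# (IP/gateway/DNS values here are placeholders)
nmcli connection modify ovirtmgmt \
    ipv4.method manual \
    ipv4.addresses 192.0.2.10/24 \
    ipv4.gateway 192.0.2.1 \
    ipv4.dns 192.0.2.1
nmcli connection up ovirtmgmt

# Then, in the Administration Portal: Hosts -> Network Interfaces ->
# "Setup Host Networks" (tick "Save network configuration"), or use the
# "Sync All Networks" button to clear the out-of-sync flag.
```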
3 years, 8 months
Need help recovering ovirt engine (was: New Management VLAN for hyperconverged cluster)
by David White
I was able to fix the connectivity issues between all 3 hosts.
It turned out that I hadn't completely deleted the old vlan settings from the host. I re-ran "nmcli connection delete" on the old vlan. After that, I had to edit a network-scripts file and change/fix the bridge to use ifcfg-ovirtmgmt. After I did all that, the problematic host was accessible again. All 3 Gluster peers are now able to see each other and communicate over the management network.
From the command line, I was then able to successfully run "hosted-engine --connect-storage" without errors. I was also able to then run "hosted-engine --vm-start".
Unfortunately, the engine itself is still unstable, and when I access the web UI / oVirt Manager, it shows that all 3 hosts are inaccessible and down.
I don't understand how the web UI is operational at all if the engine thinks that all 3 hosts are inaccessible. What's going on there?
Although the initial problem was my own doing (I changed the management VLAN), I'm deeply concerned with how unstable everything became - and has continued to be - ever since I lost connectivity to that one host. I thought the point of all of this was that things would (should) continue to work if one of the hosts went away.
Anyway, at this point all 3 hosts are able to communicate with each other over the management network, but the engine still thinks that all 3 hosts are down, and is unable to manage anything.
Any suggestions on how to proceed would be much appreciated.
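[For what it's worth, the FileNotFoundError in the agent log quoted below usually means the ovirt-ha-broker unix socket is missing, so the agent cannot reach the broker. A generic first step to try on each host — a sketch, not specific to this setup:]

```shell
# Restart the HA broker first, then the agent (the agent's XML-RPC
# connection fails with ENOENT when the broker's unix socket is gone)
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent

# Watch whether the agent can now reach storage and score the hosts
hosted-engine --vm-status
journalctl -u ovirt-ha-agent -f
```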
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, April 7, 2021 8:28 PM, David White <dmwhite823(a)protonmail.com> wrote:
> I still haven't been able to resurrect the 1st host, so I've spent some time trying to get the hosted engine stable. I would welcome input on how to fix the problematic host so that it can be accessible again.
>
> As per my original email, this all started when I tried to change the management vlan. I honestly cannot remember what I did (if anything) to the actual hosts when this all started, but my troubleshooting steps today have been to try to fiddle with the vlan settings and /etc/sysconfig/network-scripts/ files on the problematic host to switch from the original vlan (1) to the new vlan (10).
>
> Until then, I'm troubleshooting why the hosted engine isn't really working, since the other two hosts are operational.
>
> The hosted engine is "running" -- I can access and navigate around the oVirt Manager.
> However, it appears that all of the storage domains are down, and all of the hosts are "NonOperational". I was, however, able to put two of the hosts into Maintenance Mode, including the problematic 1st host.
>
> This is what I see on the 2nd host:
>
> [root@cha2-storage network-scripts]# gluster peer status
> Number of Peers: 2
>
> Hostname: cha1-storage.mgt.example.com
> Uuid: 348de1f3-5efe-4e0c-b58e-9cf48071e8e1
> State: Peer in Cluster (Disconnected)
>
> Hostname: cha3-storage.mgt.example.com
> Uuid: 0563c3e8-237d-4409-a09a-ec51719b0da6
> State: Peer in Cluster (Connected)
>
> [root@cha2-storage network-scripts]# hosted-engine --vm-status
> The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
>
> [root@cha2-storage network-scripts]# hosted-engine --connect-storage
> Traceback (most recent call last):
> File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
> File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
> exec(code, run_globals)
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/connect_storage_server.py", line 30, in <module>
> timeout=ohostedcons.Const.STORAGE_SERVER_TIMEOUT,
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 312, in connect_storage_server
> sserver.connect_storage_server(timeout=timeout)
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 394, in connect_storage_server
> 'Connection to storage server failed'
> RuntimeError: Connection to storage server failed
>
> The ovirt-engine-ha service seems to be continuously trying to load / activate, but failing:
> [root@cha2-storage network-scripts]# systemctl status -l ovirt-ha-agent
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
> Active: activating (auto-restart) (Result: exit-code) since Wed 2021-04-07 20:24:46 EDT; 60ms ago
> Process: 124306 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
> Main PID: 124306 (code=exited, status=157)
>
> Some recent entries in /var/log/ovirt-hosted-engine-ha/agent.log
> MainThread::ERROR::2021-04-07 20:22:59,115::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> MainThread::INFO::2021-04-07 20:22:59,115::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
> MainThread::INFO::2021-04-07 20:23:09,717::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.6 started
> MainThread::INFO::2021-04-07 20:23:09,742::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
> MainThread::INFO::2021-04-07 20:23:09,837::hosted_engine::548::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2021-04-07 20:23:09,838::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': '10.1.0.1', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
> MainThread::ERROR::2021-04-07 20:23:09,839::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
> MainThread::ERROR::2021-04-07 20:23:09,842::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
> response = self._proxy.start_monitor(type, options)
> File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
> return self.__send(self.__name, args)
> File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
> verbose=self.__verbose
> File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
> return self.single_request(host, handler, request_body, verbose)
> File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
> http_conn = self.send_request(host, handler, request_body, verbose)
> File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
> self.send_content(connection, request_body)
> File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
> connection.endheaders(request_body)
> File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
> self._send_output(message_body, encode_chunked=encode_chunked)
> File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
> self.send(msg)
> File "/usr/lib64/python3.6/http/client.py", line 974, in send
> self.connect()
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
> self.sock.connect(base64.b16decode(self.host))
> FileNotFoundError: [Errno 2] No such file or directory
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
> return action(he)
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
> return he.start_monitoring()
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
> self._initialize_broker()
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
> m.get('options', {}))
> File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
> ).format(t=type, o=options, e=e)
> ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '10.1.0.1', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}]
>
> MainThread::ERROR::2021-04-07 20:23:09,842::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> MainThread::INFO::2021-04-07 20:23:09,842::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, April 7, 2021 5:36 PM, David White via Users <users(a)ovirt.org> wrote:
>
> > I'm working on setting up my environment prior to production, and have run into an issue.
> >
> > I got most things configured, but due to a limitation on one of my switches, I decided to change the management vlan that the hosts communicate on. Over the course of changing that vlan, I wound up resetting my router to default settings.
> >
> > I have the router operational again, and I also have 1 of my switches operational.
> > Now, I'm trying to bring the oVirt cluster back online.
> > This is oVirt 4.5 running on RHEL 8.3.
> >
> > The old vlan is 1, and the new vlan is 10.
> >
> > Currently, hosts 2 & 3 are accessible over the new vlan, and can ping each other.
> > I'm able to ssh to both hosts, and when I run "gluster peer status", I see that they are connected to each other.
> >
> > However, host 1 is not accessible from anything. I can't ping it, and it cannot get out.
> >
> > As part of my troubleshooting, I've done the following:
> > From the host console, I ran `nmcli connection delete` to delete the old vlan (VLAN 1).
> > I moved the /etc/sysconfig/network-scripts/interface.1 file to interface.10, and edited the file accordingly to make sure the vlan and device settings are set to 10 instead of 1, and I rebooted the host.
> >
> > The engine seems to be running, but I don't understand why.
> > From each of the hosts that are working (host 2 and host 3), I ran "hosted-engine --check-liveliness" and both hosts indicate that the engine is NOT running.
> >
> > Yet the engine loads in a web browser, and I'm able to log into /ovirt-engine/webadmin/.
> > The engine thinks that all 3 hosts are nonresponsive. See screenshot below:
> >
> > [Screenshot from 2021-04-07 17-33-48.png]
> >
> > What I'm really looking for help with is to get the first host back online.
> > Once it is healthy and gluster is healthy, I feel confident I can get the engine operational again.
> >
> > What else should I look for on this host?
> >
> > Sent with ProtonMail Secure Email.
3 years, 8 months
Which host OS version or oVirt Node version should I choose for my production setup?
by dhanaraj.ramesh@yahoo.com
We had done a successful POC of oVirt Node & HE with version 4.4.5 and are now planning for production. However, since oVirt Node & HE 4.4.5 are based on CentOS 8.3, and Red Hat has announced CentOS 8.3 EOL by the end of this year, is there any progress/plan to release oVirt Node based on CentOS Stream soon?
If we choose to go with the CentOS 8.3-based oVirt Node, what could the consequences be in the near term, especially when it comes to the next production release upgrade?
Kindly suggest which host version I should choose for a production setup.
3 years, 8 months
Adding physical RAM to the hosts
by David White
As readers will already know, I have three oVirt hosts in a hyperconverged cluster.
I had a DIMM with too many errors on it, so I scavenged from a spare server that I had in my office precisely for this purpose.
I went ahead and replaced all of the RAM, because the DIMMs in my spare were a different speed. While I was at it, I doubled the RAM on this 1 oVirt host.
So now, two hosts have 32GB each.
And the 3rd host now has 64GB.
So in total, I should have 128GB of RAM available to the cluster.
The engine recognizes that the host now has 64GB of RAM. Screenshot below:
[Screenshot from 2021-04-12 19-59-59.png]
However, the engine dashboard still shows that only ~93GB of RAM are available to the cluster (screenshot below).
[Screenshot from 2021-04-12 19-58-15.png]
Should I be worried that the oVirt dashboard hasn't updated to ~128GB (or a bit less, I guess)? Do I need to do anything else?
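For what it's worth, the dashboard gap is roughly consistent with per-host overhead rather than missing hardware. A back-of-the-envelope check — the ~12GB-per-host reservation here is an assumption for illustration (hosted-engine VM plus OS overhead), not a measured value:

```shell
# Physical RAM per host (GB)
hosts="32 32 64"

total=0
for h in $hosts; do
    total=$((total + h))
done
echo "physical total: ${total} GB"

# Hypothetical per-host reservation (hosted-engine + OS overhead)
reserve=12
avail=$((total - 3 * reserve))
echo "rough available: ${avail} GB"   # lands near the ~93 GB the dashboard shows
```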
Sent with ProtonMail Secure Email.
3 years, 8 months
Re: Engine Upgrade to 4.4.5 Version Number Question
by Yedidyah Bar David
On Tue, Apr 13, 2021 at 7:42 AM Nur Imam Febrianto <nur_imam(a)outlook.com> wrote:
>
> Fixed the issue by changing OVESETUP_ENGINE_CORE/enable=bool:'False' to OVESETUP_ENGINE_CORE/enable=bool:'True'.
>
> I still don't know why it showed False, but I can confirm that after changing it to True and running engine-setup again, my Engine was upgraded to 4.4.5.
>
>
>
> Thanks for your help.
Glad that it worked, and thanks for the report!
I strongly recommend that you spend some time investigating
what changed it, who, and why.
This is most likely a mistake someone made at some point. It might be
broader than just this file, or it might be some other kind of problem
(e.g. a machine crashed in the middle of engine-setup and left files
corrupted).
If you can't find why/who, and suspect it's a bug, we'd definitely
like to get more information.
Some things you can check (other than usual site-specific things like
who logged in, when you changed stuff there, etc.):
- All /var/log/ovirt-engine/setup/* logs, their timestamps, and their
contents (e.g. grep them all for "OVESETUP_ENGINE_CORE/enable").
- /etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf* - the file
itself, and backups (made by engine-setup or otherwise), timestamps
etc.
- /etc/ovirt-engine/uninstall.d/* - engine-setup (and other related
tools) keep information there about all files they created/changed -
mainly used by engine-cleanup
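A concrete form of the first two checks above — the paths follow the standard engine layout; adjust if your installation differs:

```shell
# List setup logs newest-first, then find every place the flag was recorded
ls -lt /var/log/ovirt-engine/setup/
grep -H "OVESETUP_ENGINE_CORE/enable" /var/log/ovirt-engine/setup/*.log

# Compare the answer file and any backups of it
ls -l /etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf*
grep -H "OVESETUP_ENGINE_CORE" /etc/ovirt-engine-setup.conf.d/*.conf
```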
Good luck and best regards,
--
Didi
3 years, 8 months
Re: Engine Upgrade to 4.4.5 Version Number Question
by Yedidyah Bar David
On Sun, Apr 11, 2021 at 10:19 AM Nur Imam Febrianto
<nur_imam(a)outlook.com> wrote:
>
> Only DWH are deployed on another machine.
Please provide more details.
The setup log you provided shows nothing as configured - all of
engine+dwh+grafana are disabled.
Best regards,
> I don't edit any configuration manually beside the one that used for SSO. Any clue why is this happened?
>
> Thanks before.
>
> Regards,
>
> Nur Imam Febrianto
> Sent from Nine
> ________________________________
> From: Yedidyah Bar David <didi(a)redhat.com>
> Sent: Sunday, April 11, 2021 13:45
> To: Nur Imam Febrianto
> Cc: oVirt Users
> Subject: Re: [ovirt-users] Engine Upgrade to 4.4.5 Version Number Question
>
> On Sat, Apr 10, 2021 at 8:08 AM Nur Imam Febrianto <nur_imam(a)outlook.com> wrote:
> >
> > Here is the setup logs.
>
> Do you have "everything" (engine, dwh, etc.) on the same machine, or
> different stuff set up on different machines?
>
> Any chance you manually edited
> /etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf for any
> reason?
>
> I see there:
>
> 2021-03-18 22:45:25,952+0700 DEBUG otopi.context
> context.dumpEnvironment:775 ENV
> OVESETUP_ENGINE_CORE/enable=bool:'False'
>
> This normally happens only if the engine was not configured on this machine.
>
> Best regards,
>
> >
> > Thanks.
> >
> >
> >
> > From: Yedidyah Bar David
> > Sent: 07 April 2021 12:36
> > To: Nur Imam Febrianto
> > Cc: oVirt Users
> > Subject: Re: [ovirt-users] Engine Upgrade to 4.4.5 Version Number Question
> >
> >
> >
> > On Tue, Apr 6, 2021 at 6:49 PM Nur Imam Febrianto <nur_imam(a)outlook.com> wrote:
> > >
> > > I’m currently trying to upgrade our cluster from 4.4.4 to 4.4.5. All using oVirt Node.
> > >
> > > All hosts were successfully upgraded to 4.4.5 (I can see the image layer changed from 4.4.4 to 4.4.5, and Cockpit also shows the same version), but on the Engine VM - after running engine-upgrade-check, upgrading ovirt-setup, running engine-setup successfully, and rebooting the engine - whenever I open the engine web page it still shows Version 4.4.4.5-1.el8. Is this version correct? My second cluster, which uses its own hosted engine, shows a different version (4.4.5.11-1.el8). Is anybody having the same issue?
> >
> > Please share the setup log (in /var/log/ovirt-engine/setup). Thanks.
> >
> > Best regards,
> > --
> > Didi
> >
> >
>
>
>
> --
> Didi
>
--
Didi
3 years, 8 months
oVirt with Oracle Linux 8.3 & Cockpit - Direct LUNs
by hansontodd@gmail.com
Is oVirt compatible with Oracle Linux 8.3? I have KVM & Cockpit running on Oracle Linux 8.3, but I don't see a way to map direct LUNs (like RDMs) to a VM in the Cockpit interface.
Do I need to move back to CentOS 7.x or RHEL 7.x with oVirt to be able to use direct LUNs? Can oVirt and Cockpit coexist?
3 years, 8 months
ipxe support with Q35 chipset and UEFI
by Gianluca Cecchi
Hello,
I'm doing some tests with iPXE (latest version, cloned from the git repo a few days ago).
I'm using oVirt 4.4.5 with VMs configured with different chipset/firmware types.
It seems that with a dhcpd.conf directive of the form

if exists user-class and option user-class = "iPXE" {
    filename "http://my_http_server/...";
} else {
    ...
}

the VM boot catches it when I use the Q35 chipset with BIOS, while it falls through to the "else" branch when using the Q35 chipset with UEFI (not the SecureBoot one).
Does this mean that the Q35 UEFI doesn't support iPXE?
BTW: if anyone has suggestions for a utility that lets me boot via network and present the user with a general menu from which to choose standard PXE with BIOS or UEFI boot - so I can install both Linux-based systems and others, such as ESXi hosts, with both BIOS and UEFI - that would be welcome.
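One common way to get such a menu is the usual iPXE chainloading pattern: hand BIOS clients undionly.kpxe and UEFI clients ipxe.efi, then give anything already running iPXE an HTTP menu script. A dhcpd.conf sketch — the filenames and URL are placeholders, and DHCP option 93 (client architecture) may already be declared by your dhcpd version:

```
option client-architecture code 93 = unsigned integer 16;

if exists user-class and option user-class = "iPXE" {
    # Already inside iPXE: hand out a menu script over HTTP
    filename "http://my_http_server/boot/menu.ipxe";
} elsif option client-architecture = 00:07 {
    # UEFI client: chainload the iPXE EFI binary
    filename "ipxe.efi";
} else {
    # Legacy BIOS client: chainload the iPXE PXE binary
    filename "undionly.kpxe";
}
```

The menu.ipxe script can then use iPXE's `menu`/`item`/`choose` commands to offer Linux installers and ESXi's mboot entries from a single place, regardless of how the client arrived.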
Gianluca
3 years, 8 months