Log in to the self-hosted engine during deployment
by jost.rakovec@snt.si
Is it possible to connect to the self-hosted engine VM during deployment, over SSH or some other way, since Ansible already uses SSH? I would like to check something.
Something like this:
ssh -C -o ControlMaster=auto -o ControlPersist=60s -o User="root" -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ControlPath=/root/.ansible/cp/45a9ce675c -tt olvman.example.local /bin/sh -c
Thanks
3 years, 9 months
Gluster volume engine stuck in healing with 1 unsynced entry & HostedEngine paused
by souvaliotimaria@mail.com
Hello everyone,
Any help with the following problem would be greatly appreciated.
In my lab, the day before yesterday, we had power issues: a UPS went offline, followed by a power outage of the NFS/DNS server I set up to serve oVirt with ISOs and to act as a DNS server (our other DNS servers run as VMs inside the oVirt environment). We found a broadcast storm on the switch the oVirt nodes are connected to (caused by a faulty NIC on the aforementioned UPS), and later we also had to re-establish several of the virtual connections. As a result, one of the hosts became NonResponsive, two machines became unresponsive and three VMs shut down.
The oVirt environment (version 4.3.5.2) is a replica 2 + arbiter 1 setup running GlusterFS with the recommended data, engine and vmstore volumes.
So far, whenever there was some kind of problem, oVirt was usually able to resolve it on its own.
This time, however, after we recovered from the above state, the data and vmstore volumes healed successfully, but the engine volume got stuck in the healing process (Up, unsynced entries, needs healing). In the web GUI the HostedEngine VM is shown as paused due to a storage I/O error, while the output of the virsh list --all command shows the HostedEngine as running. How is that possible?
I tried to trigger the healing process for the volume manually, but nothing happened after running
gluster volume heal engine
The command
gluster volume heal engine info
shows the following
[root@ov-no3 ~]# gluster volume heal engine info
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1
Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1
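To see at a glance which bricks still report pending heals, the heal info output above can be summarized per brick. A minimal sketch in POSIX shell and awk, assuming the output format shown above (a `Brick` line, optional entry paths, then `Status:` and `Number of entries:`); the function name `pending_per_brick` is hypothetical:

```shell
#!/bin/sh
# Summarize `gluster volume heal <volume> info` output read from stdin.
# Prints "<brick> <entries>" for every brick with a non-zero entry count
# and returns 1 if any brick has pending heals, 0 otherwise.
pending_per_brick() {
  awk '
    /^Brick / { brick = $2 }                # remember the current brick
    /^Number of entries:/ {
      n = $NF + 0                           # a "-" count becomes 0
      if (n > 0) { print brick, n; pending = 1 }
    }
    END { exit pending ? 1 : 0 }
  '
}
```

Usage: `gluster volume heal engine info | pending_per_brick`. For the output above it would list the ov-no2 and ov-no3 bricks with one entry each, which matches the single image file both of them report.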
This morning I came across this Reddit post https://www.reddit.com/r/gluster/comments/fl3yb7/entries_stuck_in_heal_pe... where it seems that after a graceful reboot of one of the oVirt hosts, Gluster came back online once it had completed the appropriate healing processes. The thing is, from what I have read, when there are unsynced entries in Gluster a host cannot be put into maintenance mode so that it can be rebooted. Is that correct?
Should I try to restart the glusterd service?
Could someone tell me what I should do?
Thank you all for your time and help,
Maria Souvalioti
3 years, 9 months
VM CPU Pinning
by lavi.buchnik@exlibrisgroup.com
Hi,
When trying to pin a VM's CPUs to physical CPUs, I get this error if the number of physical CPU slots is greater than 35 (e.g. 40):
Error attempting to pin CPUs.
Full error: Fault reason is "Operation Failed". Fault detail is "[size must be between 0 and 4000, Attribute: vmStatic.cpuPinning]". HTTP response code is "400". HTTP response message is "Bad Request".
It works for me with up to 35 physical CPUs, but above that I get the error.
Can you please tell me what this error means and how to overcome it?
Version we are using is: 4.3.6.6-1.0.9.el7
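The fault names `vmStatic.cpuPinning` with a size limit of 4000, which reads like a length check on the pinning string itself; that is an inference from the message, not something confirmed here. oVirt's pinning field takes entries of the form `vCPU#pCPUs` joined by underscores, where the physical CPU set can be a comma-separated list or a range such as `1-4`. A minimal sketch, assuming each vCPU is pinned to every physical CPU, that compares how long the string gets when every CPU is spelled out versus written as a range; the function `pinning_string` is hypothetical:

```shell
#!/bin/sh
# Build an oVirt-style CPU pinning string ("vCPU#pCPUs_vCPU#pCPUs...")
# in which each of $1 vCPUs is pinned to physical CPUs 0..$2-1.
# Mode "list" spells every physical CPU out ("0,1,2,...");
# mode "range" uses the compact range form ("0-39").
pinning_string() {
  vcpus=$1 pcpus=$2 mode=$3
  last=$((pcpus - 1))
  if [ "$mode" = range ]; then
    set_spec="0-$last"
  else
    set_spec=0
    p=1
    while [ "$p" -le "$last" ]; do
      set_spec="$set_spec,$p"
      p=$((p + 1))
    done
  fi
  out=""
  v=0
  while [ "$v" -lt "$vcpus" ]; do
    if [ -n "$out" ]; then
      out="${out}_"                 # entries are separated by underscores
    fi
    out="${out}${v}#${set_spec}"
    v=$((v + 1))
  done
  printf '%s\n' "$out"
}
```

With 40 vCPUs each pinned to all 40 physical CPUs listed explicitly, the string is 4509 characters long, above 4000, while the range form of the same pinning is 309 characters. The exact threshold the poster hit depends on how their pinning was actually written, so this only illustrates how quickly the explicit form grows.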
Thanks,
Lavi
3 years, 9 months
Update Package Conflict
by penguin pages
Fresh install of minimal CentOS 8
Then deploy:
- EPEL
- Add ovirt repo https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
Install on all nodes:
- cockpit-ovirt-dashboard
- gluster-ansible-roles
- vdsm-gluster
- ovirt-host
- ovirt-ansible-roles
- ovirt-ansible-infra
Install on the "first node of the cluster":
- ovirt-engine-appliance
Now each node is stuck with the same package conflict error (and this blocks "upgrades" from the GUI):
[root@medusa ~]# yum update
Last metadata expiration check: 0:55:35 ago on Wed 10 Mar 2021 08:14:22 AM EST.
Error:
Problem 1: package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
- package cockpit-bridge-238.1-1.el8.x86_64 conflicts with cockpit-dashboard < 233 provided by cockpit-dashboard-217-1.el8.noarch
- cannot install the best update candidate for package ovirt-host-4.4.1-4.el8.x86_64
- cannot install the best update candidate for package cockpit-bridge-217-1.el8.x86_64
Problem 2: problem with installed package ovirt-host-4.4.1-4.el8.x86_64
- package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
- package cockpit-system-238.1-1.el8.noarch obsoletes cockpit-dashboard provided by cockpit-dashboard-217-1.el8.noarch
- cannot install the best update candidate for package cockpit-dashboard-217-1.el8.noarch
Problem 3: package ovirt-hosted-engine-setup-2.4.9-1.el8.noarch requires ovirt-host >= 4.4.0, but none of the providers can be installed
- package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
- package ovirt-host-4.4.1-1.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
- package ovirt-host-4.4.1-2.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
- package ovirt-host-4.4.1-3.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
- package cockpit-system-238.1-1.el8.noarch obsoletes cockpit-dashboard provided by cockpit-dashboard-217-1.el8.noarch
- cannot install the best update candidate for package ovirt-hosted-engine-setup-2.4.9-1.el8.noarch
- cannot install the best update candidate for package cockpit-system-217-1.el8.noarch
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
[root@medusa ~]# yum update --allowerasing
Last metadata expiration check: 0:55:56 ago on Wed 10 Mar 2021 08:14:22 AM EST.
Dependencies resolved.
=========================================================================================================================================================================================================================================
Package Architecture Version Repository Size
=========================================================================================================================================================================================================================================
Upgrading:
cockpit-bridge x86_64 238.1-1.el8 baseos 535 k
cockpit-system noarch 238.1-1.el8 baseos 3.4 M
replacing cockpit-dashboard.noarch 217-1.el8
Removing dependent packages:
cockpit-ovirt-dashboard noarch 0.14.17-1.el8 @ovirt-4.4 16 M
ovirt-host x86_64 4.4.1-4.el8 @ovirt-4.4 11 k
ovirt-hosted-engine-setup noarch 2.4.9-1.el8 @ovirt-4.4 1.3 M
Transaction Summary
=========================================================================================================================================================================================================================================
Upgrade 2 Packages
Remove 3 Packages
Initially I assumed this was caused by a non-standard path I was taking, but now I think it is an oVirt vs. CentOS package repo issue. Are there any workarounds, or a root-cause fix for this repo conflict?
3 years, 9 months
How to replace a failed oVirt Hyperconverged Host
by Ramon Sierra
Hi,
We have a three-host hyperconverged oVirt setup. A few weeks ago one of
the hosts failed and we lost a RAID5 array on it. We removed the host
from the cluster and repaired it. Now we are trying to set it up and add
it back to the cluster, but we are not clear on how to proceed. The
cluster uses a replica 2 + 1 arbiter Gluster setup, and I have no idea
how to recreate the LVM partitions and Gluster bricks and then add the
host back to the cluster so that the healing process starts.
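Once the repaired host is reinstalled, has rejoined the trusted storage pool under its original hostname, and has its bricks recreated and mounted at the original paths, Gluster's `reset-brick` operation is one common way to put an emptied brick back into a replica set so that self-heal can refill it. A minimal sketch that only prints the commands for review, assuming the `/gluster_bricks/<volume>/<volume>` layout seen in the brick paths above; the function name and host name are hypothetical, and the LVM and pool-rejoin steps are not shown:

```shell
#!/bin/sh
# Print, but do not execute, the reset-brick commands for every volume
# given after the host name. Assumes bricks live at
# /gluster_bricks/<volume>/<volume> on that host.
print_reset_brick_cmds() {
  host=$1
  shift
  for vol in "$@"; do
    brick="$host:/gluster_bricks/$vol/$vol"
    # "start" takes the old brick offline; "commit force" re-adds the
    # brick at the same path so self-heal can repopulate it.
    printf 'gluster volume reset-brick %s %s start\n' "$vol" "$brick"
    printf 'gluster volume reset-brick %s %s %s commit force\n' "$vol" "$brick" "$brick"
  done
}
```

For example, `print_reset_brick_cmds ov-node2.example.local engine data vmstore` prints two commands per volume. Running them one volume at a time and watching `gluster volume heal <volume> info` until every brick reports zero entries is one cautious way to proceed.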
Any help on how to proceed with this scenario will be very welcome.
Ramon
3 years, 9 months
oVirt 4.3.6 and Security Measures
by scroodj@gmail.com
Hello team,
Due to the security policy at our customer's company, we need to implement some changes on the machines in their oVirt cluster (standalone engine + 2 KVM hosts).
1. The home directories of the sanlock user (/var/run/sanlock) and the gluster user (/run/gluster) have permissions 775. We would like them to be at least 755, if not stricter. Is that possible?
2. The NFS storage mount does not use the ‘nodev’ and ‘nosuid’ options. Is it safe to enable those options for an NFS storage domain?
3. Usually bridged routing is not allowed on managed servers. The security scan asks us to set the following four parameters to 0:
Network Parameter "net.ipv4.conf.all.send_redirects" = 1 (expected: 0)
Network Parameter "net.ipv4.conf.all.secure_redirects" = 1 (expected: 0)
Network Parameter "net.ipv6.conf.all.accept_redirects" = 1 (expected: 0)
Network Parameter "net.ipv4.conf.all.accept_redirects" = 1 (expected: 0)
Would changing them interfere with the ovirtmgmt network?
These findings apply to all three machines in the cluster.
On the engine, however, httpd is installed, and we have some findings there too:
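Before changing anything, the current state behind findings 2 and 3 can be audited read-only. A minimal sketch in POSIX shell, assuming the usual mapping of sysctl keys to files under /proc/sys (dots become slashes) and the /proc/mounts line format; the function names and the `SYSCTL_ROOT` override are hypothetical and exist only for illustration and testing. It does not answer whether the changes are safe for ovirtmgmt:

```shell
#!/bin/sh
# Read-only checks for findings 2 and 3; nothing is changed on the host.

# Finding 3: print every ICMP-redirect sysctl that is not 0.
# Keys map to files under /proc/sys with dots turned into slashes.
# SYSCTL_ROOT exists only so the function can be tested on a fake tree.
check_redirects() {
  root="${SYSCTL_ROOT:-/proc/sys}"
  rc=0
  for key in net.ipv4.conf.all.send_redirects \
             net.ipv4.conf.all.secure_redirects \
             net.ipv6.conf.all.accept_redirects \
             net.ipv4.conf.all.accept_redirects; do
    path="$root/$(printf '%s' "$key" | tr . /)"
    if ! val=$(cat "$path" 2>/dev/null); then
      echo "$key: not found"
      rc=1
    elif [ "$val" != 0 ]; then
      echo "$key = $val (expected: 0)"
      rc=1
    fi
  done
  return "$rc"
}

# Finding 2: read /proc/mounts-style lines on stdin and print each NFS
# mount point that lacks nodev and/or nosuid, plus the missing options.
nfs_missing_opts() {
  awk '$3 == "nfs" || $3 == "nfs4" {
    missing = ""
    if (("," $4 ",") !~ /,nodev,/)  missing = missing " nodev"
    if (("," $4 ",") !~ /,nosuid,/) missing = missing " nosuid"
    if (missing != "") print $2 ":" missing
  }'
}
```

Typical use is `check_redirects` followed by `nfs_missing_opts < /proc/mounts`. If the sysctl values are changed after testing, a drop-in file under /etc/sysctl.d/ loaded with `sysctl --system` is the usual way to make them persistent.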
1. There are modules installed that are on a blacklist. Can they be removed? The modules are:
mod_dav_lock
mod_userdir
mod_include
mod_dav_fs
mod_autoindex
mod_dav
mod_info
2. HTTP TRACE requests should be blocked, so we would set “TraceEnable” to off in the virtual host config. If HTTP TRACE is needed, we would have to limit the allowed verbs.
3. Apache version information should be turned off so as not to tell potential attackers which web server is running. Is that a problem for oVirt?
4. TLSv1.0 and TLSv1.1 are enabled but should be turned off.
5. HSTS should be turned on but is not yet.
6. Can we use the X-Frame-Options header with DENY (or SAMEORIGIN, or at least ALLOW-FROM)?
7. Can we implement the X-Content-Type-Options HTTP header with “nosniff”?
8. Can we implement the X-XSS-Protection header with “1; mode=block”?
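For reference, this is how the directives behind findings 2 to 8 are usually written for httpd 2.4; whether each one is compatible with the oVirt engine UI is exactly the open question above, so treat this as a sketch to test, not a validated configuration. The file name is hypothetical, SAMEORIGIN is used instead of DENY on the assumption that some engine pages may frame same-origin content, and engine-setup may rewrite parts of ssl.conf on upgrade:

```shell
#!/bin/sh
# Sketch only: hypothetical file name; values are common hardening
# choices, not settings validated against the oVirt engine.
cat > /etc/httpd/conf.d/zz-hardening.conf <<'EOF'
# Finding 2: reject HTTP TRACE requests
TraceEnable Off
# Finding 3: hide version details in the Server header and error pages
ServerTokens Prod
ServerSignature Off
# Findings 5-8: security headers (requires mod_headers)
Header always set Strict-Transport-Security "max-age=31536000"
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
EOF
# Finding 4: SSLProtocol is usually set inside the <VirtualHost _default_:443>
# block of ssl.conf, where it overrides a global value; change it there, e.g.
#   SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
apachectl configtest && systemctl reload httpd
```

Browsers ignore HSTS headers received over plain HTTP, so setting it globally mainly matters on the HTTPS virtual host.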
I know this is quite a lot, but maybe you know the answers.
BR
Aleksandr
3 years, 9 months
engine - gluster volume import
by penguin pages
I keep going through cycles trying to get an HCI cluster deployed.
Gluster is working fine (standard layout from the HCI wizard):
Brick thorst.penguinpages.local:/gluster_bricks/vmstore/vmstore 49154 0 Y 40968
Brick odinst.penguinpages.local:/gluster_bricks/vmstore/vmstore 49157 0 Y 3771
Brick medusast.penguinpages.local:/gluster_bricks/vmstore/vmstore 49154 0 Y 5138
Self-heal Daemon on localhost N/A N/A Y 41172
Self-heal Daemon on medusast.penguinpages.local N/A N/A Y 5150
Self-heal Daemon on odinst.penguinpages.local N/A N/A Y 3142
What I think happens when the engine is installed "on existing disk" is that it does not install the Gluster components the engine needs to present those volumes so they can be deployed upon.
What service, package or other means is needed to get the engine to "add" Gluster volumes?
For example, when I click Storage -> (no volumes listed) -> New Volume -> <empty>..., all drop-downs are blank and cannot be populated.
3 years, 9 months
Commvault
by Colin Coe
Hi all
My workplace is considering replacing Arcserve (which has been terrible)
with Commvault.
Our virtualisation is a mix of about 95% RHV and 5% Hyper-V. We have a
handful of physical servers, a 50/50 mix of Windows and RHEL.
We run dual hot data centers and want to back up locally (disk to disk),
then replicate to the other data center. Cloud is off the table.
Anyone have any experience with Commvault? War stories?
Happy to be pointed at other alternatives.
Thanks
3 years, 9 months
Setup Hosts Network Disabled
by Andrei Verovski
Hi !
I ran into a problem that looks like a software bug.
Network -> Networks -> My_Net_Name -> Hosts
The Setup Hosts Network button is disabled (greyed out). I deleted this network, created it again and restarted the hosted engine, with no change.
Is it possible to fix this, for example from the command line?
Thanks in advance.
3 years, 9 months