HCL: 4.3.7: Hosted engine fails
by Christian Reiss
Hey all,
Using a homogeneous, freshly created ovirt-node-ng-4.3.7-0.20191121.0 cluster (installed via the node installer), I am unable to deploy the hosted engine. Everything else worked.
In vdsm.log there is a line, logged just after the attempt to start the engine:
libvirtError: the CPU is incompatible with host CPU: Host CPU does not
provide required features: virt-ssbd
I am using AMD EPYC 7282 16-Core Processors.
I have attached
- vdsm.log (during and failing the start)
- messages (for bootup / libvirt messages)
- dmesg (grub / boot config)
- deploy.log (browser output during deployment)
- virt-capabilities (virsh -r capabilities)
I can't think of (or don't know of) any other log files of interest here, but I am more than happy to provide them.
nodectl check tells me:
Status: OK
Bootloader ... OK
Layer boot entries ... OK
Valid boot entries ... OK
Mount points ... OK
Separate /var ... OK
Discard is used ... OK
Basic storage ... OK
Initialized VG ... OK
Initialized Thin Pool ... OK
Initialized LVs ... OK
Thin storage ... OK
Checking available space in thinpool ... OK
Checking thinpool auto-extend ... OK
vdsmd ... OK
layers:
  ovirt-node-ng-4.3.7-0.20191121.0:
    ovirt-node-ng-4.3.7-0.20191121.0+1
bootloader:
  default: ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64)
  entries:
    ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64):
      index: 0
      title: ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64)
      kernel: /boot/ovirt-node-ng-4.3.7-0.20191121.0+1/vmlinuz-3.10.0-1062.4.3.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=onn_node01/swap rd.lvm.lv=onn_node01/ovirt-node-ng-4.3.7-0.20191121.0+1 rhgb quiet LANG=en_GB.UTF-8 img.bootid=ovirt-node-ng-4.3.7-0.20191121.0+1"
      initrd: /boot/ovirt-node-ng-4.3.7-0.20191121.0+1/initramfs-3.10.0-1062.4.3.el7.x86_64.img
      root: /dev/onn_node01/ovirt-node-ng-4.3.7-0.20191121.0+1
current_layer: ovirt-node-ng-4.3.7-0.20191121.0+1
The odd thing is that the hosted engine VM does get started during the initial configuration and works. Only when the Ansible part is done and the VM is moved over to the HA storage do the CPU quirks start.
So far I have learned that ssbd is a speculative-execution mitigation feature (Speculative Store Bypass Disable), but the flag is not on my CPU. Well, ssbd is there; virt-ssbd is not.
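For reference, a quick way to see what is actually exposed (the grep patterns below are only illustrative; virt-ssbd is a synthetic feature that QEMU can only offer to guests when the host kernel and microcode support SSBD):
# mitigation flags the kernel sees on the physical CPU
grep -o -w -e ssbd -e amd_ssbd -e virt_ssbd /proc/cpuinfo | sort -u
# CPU models and features libvirt/QEMU believe they can provide to guests on this host
virsh -r domcapabilities | grep -i -E 'epyc|ssbd'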
I am just *starting* with oVirt, so I would really, really welcome recommendations that include clues on how to make this happen.
I do RTFM, but I was unable to find anything (or any solution) anywhere, not even after 80 hours of working on this.
Thank you all.
-Chris.
OVirt Engine Server Died - Steps for Rebuilding the Ovirt Engine System
by bob.franzke@mdaemon.com
Full disclosure here: I am not an oVirt expert. I am a network engineer who has been forced to take over sysadmin duties for a departed co-worker. I have little experience with oVirt, so apologies up front for anything I say that comes across as stupid or as an "RTFM" question. Normally I would do just that, but I am in a bind and am trying to figure this out quickly.
We have an oVirt installation that consists of 4 nodes and a server that hosts the ovirt-engine, all running CentOS 7. The server that hosts the engine has a pair of failing hard drives and I need to replace the hardware ASAP, so I need to outline the steps for building a new server to replace the oVirt engine server. I have backed up the entire /etc directory and the backups made nightly by the engine itself. I also backed up the iSCSI info and took a printout of the whole disk arrangement. The disk has gotten so bad at this point that the DB won't back up any longer; I get a "FATAL: backup failed" error when trying to run the oVirt backup tool. Also, the oVirt management site is not rendering and I am not sure why.
Is there anything else I need to make sure I back up in order to migrate the engine from one server to another? Also, until I can get the engine running again, is there any tool available to manage the VMs on the hosts themselves? The VMs on the hosts are running, but I need a way to manage them in case something happens while the engine is being repaired. Any info on this, as well as what to back up and the steps to move the engine from one server to another, would be much appreciated. Sorry, I know this is a real RTFM-type post, but I am in a bind and need a solution rather quickly. Thanks in advance.
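For reference, the engine's own backup/restore tool is engine-backup; a rough sketch of a full backup on the old server and a restore on a freshly installed replacement looks like this (file names are placeholders, and it assumes the same ovirt-engine version is installed on the new server before restoring):
# on the old engine server, while the database is still readable
engine-backup --mode=backup --scope=all --file=engine-full-backup.tar.gz --log=engine-backup.log
# on the replacement server, after installing ovirt-engine but before running engine-setup
engine-backup --mode=restore --file=engine-full-backup.tar.gz --log=engine-restore.log --provision-db --provision-dwh-db --restore-permissions
engine-setup
For managing running VMs while the engine is down, virsh on each host can at least show them read-only without extra credentials, for example "virsh -r list --all".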
Question about HCI gluster_inventory.yml
by John Call
Hi ovirt-users,
I'm trying to automate my HCI deployment, but can't figure out how to
specify multiple network interfaces in gluster_inventory.yml. My servers
have two NICs, one for ovirtmgmt (and everything else), and the other is
just for Gluster. How should I populate the inventory/vars file? Is this
correct?
[root@rhhi1 hc-ansible-deployment]# pwd
/etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
[root@rhhi1 hc-ansible-deployment]# cat gluster_inventory.yml
--lots of stuff omitted--
hc_nodes:
  hosts:
    host1-STORAGE-fqdn:
    host2-STORAGE-fqdn:
    host3-STORAGE-fqdn:
  vars:
    cluster_nodes:
      - host1-ovirtmgmt-fqdn
      - host2-ovirtmgmt-fqdn
      - host3-ovirtmgmt-fqdn
    gluster_features_hci_cluster: "{{ cluster_nodes }}"
gluster:
  host2-STORAGE-fqdn:
  host3-STORAGE-fqdn:
storage_domains:
  [{"name":"data","host":"host1-STORAGE-fqdn","address":"host1-STORAGE-fqdn","path":"/data","mount_options":"backup-volfile-servers=host2-STORAGE-fqdn:host3-STORAGE-fqdn"},{"name":"vmstore","host":"host1-STORAGE-fqdn","address":"host1-STORAGE-fqdn","path":"/vmstore","mount_options":"backup-volfile-servers=host2-STORAGE-fqdn:host3-STORAGE-fqdn"}]
Re: null
by Strahil
I do use automatic migration policy.
The main question you have to solve is:
1. Why did the nodes become 'Non-Operational'? Usually this happens when the management interface (in your case the HostedEngine VM) could not reach the nodes over the management network.
By default, management goes over the ovirtmgmt network. I guess you created the new network, marked it as a management network, and then the switch was powered off, causing the 'Non-Operational' state.
2. Migrating VMs is usually a safe approach, but this behaviour is quite strange. If a node is Non-Operational -> there can be no successful migration.
3. Some of the VMs got paused due to a storage issue. Are you using GlusterFS, NFS or iSCSI? If so, you need to clarify why you lost your storage.
I guess for now you can mark each VM to be migrated only manually (VM -> Edit) and, if they are critical VMs, enable High Availability in each VM's Edit options.
In that case, if a node fails, the VMs will be restarted on another node.
4. Have you set up node fencing? For example APC, iLO, iDRAC and other fencing mechanisms allow the HostedEngine to use another host as a fencing proxy and reset the problematic hypervisor.
P.S.: You can define the following alias in '~/.bashrc' :
alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
Then you can verify your VMs even when a HostedEngine is down:
'virsh list --all'
Best Regards,
Strahil Nikolov
On Dec 27, 2019 08:40, zhouhao(a)vip.friendtimes.net wrote:
>
> I had a crash yesterday in my ovirt cluster, which is made up of 3 nodes.
>
> I just tried to add a new network, but the whole cluster crashed
>
> I added a new network to my cluster, but while I was debugging the new switch, the switch was powered off; the nodes detected the network card status as down and then moved to the Non-Operational state.
>
>
>
> At this point, all 3 nodes had moved to the Non-Operational state.
>
> All virtual machines started automatic migration. When I received the alert email, all virtual machines were suspended.
>
>
>
>
>
> After 15 minutes my new switch was powered up again. The 3 ovirt-nodes became active again, but many virtual machines became unresponsive or suspended due to the forced migration, and only a few virtual machines were brought up again because their migration was cancelled.
>
> After I tried to terminate the migration tasks and restart the ovirt-engine service, I was still unable to restore most of the virtual machines, so I had to restart the 3 ovirt-nodes to restore my virtual machines.
>
> I didn't recover all the virtual machines until an hour later.
>
>
> Then I modified my migration policy to "Do Not Migrate Virtual Machines".
>
> Which migration policy do you recommend?
>
> I'm afraid to use the cluster...
>
> ________________________________
> zhouhao(a)vip.friendtimes.net
ovirt 4.3.7 + Gluster in hyperconverged (production design)
by adrianquintero@gmail.com
Hi,
After playing a bit with oVirt and Gluster in our pre-production environment for the last year, we have decided to move forward with our production design using oVirt 4.3.7 + Gluster in a hyperconverged setup.
For this we are looking to get answers to a few questions that will help with our design and eventually lead to our production deployment phase:
Current HW specs (total servers = 18):
1.- Server type: DL380 GEN 9
2.- Memory: 256GB
3.-Disk QTY per hypervisor:
- 2x600GB SAS (RAID 0) for the OS
- 9x 1.2TB SSD (RAID 0, RAID 6, ...?) for Gluster
4.-Network:
- Bond1: 2x10Gbps
- Bond2: 2x10Gbps (for migration and gluster)
Our plan is to build two 9-node clusters; however, the following questions come up:
1.- Should we build 2 separate environments, each with its own engine, or should we have 1 engine that manages both clusters?
2.- What would be the best Gluster volume layout for #1 above with regard to RAID configuration:
- JBOD or RAID 6 or ...?
- What is the benefit or downside of using JBOD vs RAID 6 for this particular scenario?
3.- Would you recommend an Ansible-based deployment (if supported)? If yes, where would I find the documentation for it, or should we just deploy using the UI? (A command sketch follows after question 4.)
- I have reviewed the following, and in Chapter 5 it only mentions the Web UI: https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infr...
- I also looked at https://github.com/gluster/gluster-ansible/tree/master/playbooks/hc-ansib... but could not get it to work properly.
4.- What is the recommended maximum server quantity in a hyperconverged setup with Gluster: 12, 15, 18...?
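Regarding question 3, a rough sketch of how the gluster-ansible HCI deployment is usually driven from that directory is shown below; the playbook and inventory file names are assumptions based on the hc-ansible-deployment layout referenced above, so check the README shipped in that directory for the exact names:
cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
# fill in gluster_inventory.yml with your hosts, bricks and storage domains first, then:
ansible-playbook -i gluster_inventory.yml hc_deployment.yml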
Thanks,
Adrian
Re: Issue deploying self hosted engine on new install
by Sang Un Ahn
Hi,
I have figured out that the root cause of the deployment failure is a timeout while the hosted engine was trying to connect to the host via SSH, as shown in engine.log (located in /var/log/ovirt-hosted-engine-setup/engine-logs-2019-12-31T06:34:38Z/ovirt-engine):
2019-12-31 15:43:06,082+09 ERROR [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Failed to establish session with host 'alice-ovirt-01.sdfarm.kr': SSH connection timed out connecting to 'root(a)alice-ovirt-01.sdfarm.kr'
2019-12-31 15:43:06,085+09 WARN [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Validation of action 'AddVds' failed for user admin@internal-authz. Reasons: VAR__ACTION__ADD,VAR__TYPE__HOST,$server alice-ovirt-01.sdfarm.kr,VDS_CANNOT_CONNECT_TO_SERVER
2019-12-31 15:43:06,129+09 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.]
The FQDN of the hosted engine (alice-ovirt-engine.sdfarm.kr) resolves, as does that of the host (alice-ovirt-01.sdfarm.kr), and SSH is one of the services allowed by firewalld. I believe the firewalld rules are configured automatically during the deployment to work with the hosted engine and the host. Also, root access is configured to be allowed at the first stage of the deployment.
I was just wondering how I can verify that the hosted engine can access the host at this stage. Once it fails to deploy, the deployment script rolls everything back (I believe it cleans everything up) and the vm-status of hosted-engine is un-deployed.
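One way to check this from the engine side, while the deployment is still paused at the 'add host' step (a sketch; it assumes you can still reach the temporary engine VM, e.g. via its console or the local IP the deploy script prints):
# from the engine VM: does name resolution and TCP/22 to the host work from its point of view?
ssh -v root@alice-ovirt-01.sdfarm.kr hostname
# on the host itself: is sshd reachable through the active firewalld zone?
firewall-cmd --list-services
ss -tlnp | grep ':22 '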
Thank you in advance,
Best regards,
Sang-Un
Re: 4.2.8 to 4.3.7 > Management slow
by Strahil
You can manually change the I/O scheduler of the disks and if that works better for you, put a rule in udev.
Here is mine:
[root@engine rules.d]# cat /etc/udev/rules.d/90-default-io-scheduler.rules
ACTION=="add|change", KERNEL=="sd*[!0-9]", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="vd*[!0-9]", ATTR{queue/scheduler}="none"
[root@engine rules.d]# cat /sys/block/vda/queue/scheduler
[none] mq-deadline kyber
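If you want to test this without rebooting, something along these lines should do it (vda is just the example device from above):
# reload the udev rules and re-trigger block devices so the new rule is applied
udevadm control --reload-rules
udevadm trigger --type=devices --subsystem-match=block
# or try it on a single disk first, before persisting anything
echo none > /sys/block/vda/queue/scheduler
cat /sys/block/vda/queue/scheduler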
Best Regards,
Strahil Nikolov
On Dec 31, 2019 12:30, Demeter Tibor <tdemeter(a)itsmart.hu> wrote:
>
> Dear Users,
>
> I've successfully upgraded my 4 node hyperconverged system from 4.2.8 to 4.3.7.
> After the upgrade everything seems to be working fine, but the whole management system is very slow.
> It takes many seconds when I click on "Virtual Machines" or when I want to edit a virtual machine.
> The speed of vms and the IO is fine.
>
> It is running on GlusterFS (distributed replicate, on 3 nodes, 9 bricks). There are no errors, everything is fine, but it is terribly slow :(
> The engine vm has 0.2-0.3 load.
>
> What can I do?
>
> Thanks in advance and I wish Happy New Year!
>
> Regards,
> Tibor
>
>
Re: Issue deploying self hosted engine on new install
by thomas@hoberg.net
Yes, I have had the same and posted about it here somewhere: I believe it's an incompatible Ansible change.
Here is the critical part of the message below:
"The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts"
and that change was made in the transition from Ansible 2.8 to 2.9, from what I gathered.
I guess I should just make it a bug report if you find the same message in your logs.
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "xdrd1022s.priv.atos.fr", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "priv.atos.fr", "subject": "O=priv.atos.fr,CN=xdrd1022s.priv.atos.fr"}, "cluster": {"href": "/ovirt-engine/api/clusters/c407e776-1c3c-11ea-aeed-00163e56112a", "id": "c407e776-1c3c-11ea-aeed-00163e56112a"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/a5bb73a1-f923-4568-8dda-434e07f7e243", "id": "a5bb73a1-f923-4568-8dda-434e07f7e243", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "xdrd1022s.priv.atos.fr", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false,
"os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:wqpBWq9Kb9+Nb3Jwtw61QJzo+R4gGOP2dLubssU5EPs", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]}
Re: Node not starting | blk_cloned_rq_check_limits: over max size limit
by Strahil
By the way, why do you use multipath for local storage like this EVO NVMe?
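(If that NVMe really is plain local storage, a minimal sketch for keeping it out of multipath is a blacklist entry like the one below; the devnode pattern is just an assumption for typical NVMe naming, and on oVirt Node it is safer to put it in a drop-in under /etc/multipath/conf.d/ than to edit the vdsm-managed /etc/multipath.conf.)
blacklist {
    devnode "^nvme.*"
}
# then flush the stale map and reload multipathd, or simply reboot
multipath -F
systemctl reload multipathd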
Happy New Year !
Best Regards,
Strahil Nikolov
Re: Node not starting | blk_cloned_rq_check_limits: over max size limit
by Strahil
You can check https://access.redhat.com/solutions/2437991 & https://access.redhat.com/solutions/3014361
You have 2 options:
1. Set a udev rule like this one (replace NETAPP with your storage)
ACTION!="add|change", GOTO="rule_end"
ENV{ID_VENDOR}=="NETAPP*", RUN+="/bin/sh -c 'echo 4096 > /sys%p/queue/max_sectors_kb'"
LABEL="rule_end"
2. Set max_sectors_kb in the devices section of multipath.conf (a minimal sketch follows below).
You will need to stop LVM and then flush the device map for the new option to take effect (rebooting is faster).
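A minimal sketch of that devices section (the vendor/product strings are placeholders, match them to what 'multipath -ll' reports for your device):
devices {
    device {
        vendor          "NVME"
        product         ".*"
        max_sectors_kb  4096
    }
}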
Good Luck & Happy New Year.
Best Regards,
Strahil Nikolov
On Dec 31, 2019 17:53, Stefan Wolf <shb256(a)gmail.com> wrote:
>
> hi all,
>
> I have 4 nodes running the current oVirt.
> I only have a problem on one host, even after a fresh installation.
> I have installed the latest image.
> Then I added the node to the cluster.
> Everything was working fine.
> After this I configured the network.
> BUT, after a restart the host does not come up again.
> I got this error: blk_cloned_rq_check_limits: over max size limit
> every 5 seconds
>
> I can continue with Ctrl-D,
> or I can log in with the root password to fix the problem, but I don't know what the problem is or where it comes from.
>
> I have also changed the SAS disks to NVMe storage, but I changed this on every host, and this problem exists only on this one host.
>
> I found this: https://lists.centos.org/pipermail/centos/2017-December/167727.html
> The output is:
> [root@kvm380 ~]# ./test.sh
> Sys Block Node : Device max_sectors_kb max_hw_sectors_kb
> /sys/block/dm-0 : onn_kvm380-pool00_tmeta 256 4096
> /sys/block/dm-1 : onn_kvm380-pool00_tdata 256 4096
> /sys/block/dm-10 : onn_kvm380-var 256 4096
> /sys/block/dm-11 : onn_kvm380-tmp 256 4096
> /sys/block/dm-12 : onn_kvm380-home 256 4096
> /sys/block/dm-13 : onn_kvm380-var_crash 256 4096
> /sys/block/dm-2 : onn_kvm380-pool00-tpool 256 4096
> /sys/block/dm-3 : onn_kvm380-ovirt--node--ng--4.3.7--0.20191121.0+1 256 4096
> /sys/block/dm-4 : onn_kvm380-swap 256 4096
> /sys/block/dm-5 : eui.0025385991b1e27a 512 2048
> /sys/block/dm-6 : eui.0025385991b1e27a1 512 2048
> /sys/block/dm-7 : onn_kvm380-pool00 256 4096
> /sys/block/dm-8 : onn_kvm380-var_log_audit 256 4096
> /sys/block/dm-9 : onn_kvm380-var_log 256 4096
> cat: /sys/block/nvme0n1/device/vendor: No such file or directory
> /sys/block/nvme0n1: Samsung SSD 970 EVO 1TB 512 2048
> /sys/block/sda : HP LOGICAL VOLUME 256 4096
>
> Is the NVMe not starting correctly?
> [root@kvm380 ~]# systemctl status multipathd
> ● multipathd.service - Device-Mapper Multipath Device Controller
> Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
> Active: active (running) since Di 2019-12-31 16:16:32 CET; 31min ago
> Process: 1919 ExecStart=/sbin/multipathd (code=exited, status=0/SUCCESS)
> Process: 1916 ExecStartPre=/sbin/multipath -A (code=exited, status=0/SUCCESS)
> Process: 1911 ExecStartPre=/sbin/modprobe dm-multipath (code=exited, status=0/SUCCESS)
> Main PID: 1921 (multipathd)
> Tasks: 7
> CGroup: /system.slice/multipathd.service
> └─1921 /sbin/multipathd
>
> Dez 31 16:47:58 kvm380.durchhalten.intern multipathd[1921]: nvme0n1: mark as failed
> Dez 31 16:47:58 kvm380.durchhalten.intern multipathd[1921]: eui.0025385991b1e27a: Entering recovery mode: max_retries=4
> Dez 31 16:47:58 kvm380.durchhalten.intern multipathd[1921]: eui.0025385991b1e27a: remaining active paths: 0
> Dez 31 16:48:02 kvm380.durchhalten.intern multipathd[1921]: 259:0: reinstated
> Dez 31 16:48:02 kvm380.durchhalten.intern multipathd[1921]: eui.0025385991b1e27a: queue_if_no_path enabled
> Dez 31 16:48:02 kvm380.durchhalten.intern multipathd[1921]: eui.0025385991b1e27a: Recovered to normal mode
> Dez 31 16:48:02 kvm380.durchhalten.intern multipathd[1921]: eui.0025385991b1e27a: remaining active paths: 1
> Dez 31 16:48:03 kvm380.durchhalten.intern multipathd[1921]: nvme0n1: mark as failed
> Dez 31 16:48:03 kvm380.durchhalten.intern multipathd[1921]: eui.0025385991b1e27a: Entering recovery mode: max_retries=4
> Dez 31 16:48:03 kvm380.durchhalten.intern multipathd[1921]: eui.0025385991b1e27a: remaining active paths: 0
>
> Why is it marked as failed?
>
> If I create a new volume with Cockpit and use it for Gluster bricks, everything is fine, until a reboot.
>
>
> Maybe someone can point me in the right direction.