Ceph cluster on oVirt Guests
by Anantha Raghava
Hi,
We are trying to install a Ceph cluster on CentOS 7 guests on oVirt. We
are receiving many errors and are unable to create the master or the nodes.
Has anyone tried to deploy a Ceph cluster on CentOS 7 guests in oVirt?
--
Thanks & regards,
Anantha Raghava
Do not print this e-mail unless required. Save Paper & trees.
improve file operation speed in HCI
by Jayme
I'm wondering if anyone has any tips to improve file/directory operations
in an HCI replica 3 (no arbiter) configuration with SSDs and a 10GbE storage
network.
I am currently running the stock "optimize for virt store" volume settings
and am wondering what improvements, if any, I can make for VM write speed,
and more specifically whether there is anything I can tune to increase the
performance of small-file operations such as copying, untarring, npm installs, etc.
For some context, I'm seeing ~50 MB/s write speeds inside a VM with: dd
if=/dev/zero of=./test bs=512k count=2048 oflag=direct -- I am not sure how
this compares to other HCI setups, but I feel it should be higher with SSD-backed
storage. The same command run directly against the gluster mount gives over 400 MB/s.
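A sequential 512k dd mostly measures streaming throughput; small-file workloads are better approximated with something like fio. A rough sketch only -- the test directory, sizes and job counts below are illustrative, not from the original post:

# random 4k writes with O_DIRECT, run inside the VM
mkdir -p /var/tmp/fiotest
fio --name=smallfile --directory=/var/tmp/fiotest --rw=randwrite \
    --ioengine=libaio --direct=1 --bs=4k --size=256M \
    --numjobs=4 --iodepth=16 --group_reporting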
I've read some things about metadata caching, read-ahead and other
options (see the sketch after the volume options below). There are so many
that I'm not sure where to start, and I'm also not sure which could
potentially have a negative impact on VM stability/reliability.
Here are options for one of my volumes:
Volume Name: prod_b
Type: Replicate
Volume ID: c3e7447e-8514-4e4a-9ff5-a648fe6aa537
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster0.example.com:/gluster_bricks/prod_b/prod_b
Brick2: gluster1.example.com:/gluster_bricks/prod_b/prod_b
Brick3: gluster2.example.com:/gluster_bricks/prod_b/prod_b
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
server.allow-insecure: on
cluster.choose-local: off
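On the metadata-caching question above, these are the options commonly suggested for small-file workloads. This is a hedged sketch only, not from the original post: verify the option names against the Gluster docs for your version and test on a non-production volume first, since the virt profile disables several caches on purpose for VM-image safety.

gluster volume set prod_b features.cache-invalidation on
gluster volume set prod_b features.cache-invalidation-timeout 600
gluster volume set prod_b performance.cache-invalidation on
gluster volume set prod_b performance.md-cache-timeout 600
gluster volume set prod_b network.inode-lru-limit 200000
gluster volume set prod_b performance.readdir-ahead on
gluster volume set prod_b performance.parallel-readdir on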
Re: [oVirt HC] Gluster traffic still flows on mgmt even after choosing a different Gluster nw
by Strahil
For example my gluster network IP is 10.10.10.1/24 and the /etc/hosts entry is:
10.10.10.1 gluster1.localdomain gluster1
Then I did 'gluster volume replace-brick data ovirt1:/gluster_bricks/data/data gluster1:/gluster_bricks/data/data commit force'
So you use a hostname that is resolved either by DNS or /etc/hosts to your desired IP.
In my case, the change was done after initial deployment.
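A minimal sketch of the checks around that change, assuming the volume is named "data" as the brick paths above suggest (run per volume, one brick at a time):

# nothing should be pending before a brick is touched
gluster volume heal data info
# after the replace-brick, confirm the brick and peers now use the gluster1 name
gluster volume info data | grep -i brick
gluster peer status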
Best Regards,
Strahil Nikolov
On Oct 17, 2019 16:51, Jayme <jaymef(a)gmail.com> wrote:
>
> Thanks for the info, but where does it get the new hostname from? Do I need to change the actual server hostnames of my nodes? If I were to do that then the hosts would not be accessible due to the fact that the gluster storage subnet is isolated.
>
> I guess I'm confused about what gdeploy does during a new HCI deployment. I understand that you are now supposed to use hostnames that resolve to the storage network subnet in the first step and then specify FQDNs for management in the next step. Where do the FQDNs actually get used?
>
> Can someone confirm whether the hostnames of the oVirt host nodes, as shown by the "hostname" command, should resolve to IPs on the gluster storage network?
>
> On Thu, Oct 17, 2019 at 10:40 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
>>
>> The reset-brick and replace-brick commands affect only one brick and notify the gluster cluster that a new hostname:/brick_path is being used.
>>
>> Of course, you need a hostname that resolves to the IP that is on the storage network.
>>
>> WARNING: Ensure that no heals are pending, as the commands wipe the brick and the data on it is lost.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Oct 17, 2019 15:28, Jayme <jaymef(a)gmail.com> wrote:
>>>
>>> What does the reset brick option do, and is it safe to do this on a live system, or do all VMs need to be brought down first? How does resetting the brick fix the issue with gluster peers using the server hostnames which are attached to IPs on the ovirtmgmt (management) network?
>>>
>>> On Thu, Oct 17, 2019 at 4:16 AM Sahina Bose <sabose(a)redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Oct 16, 2019 at 8:38 PM Jayme <jaymef(a)gmail.com> wrote:
>>>>>
>>>>> Is there a way to fix this on an HCI deployment which is already in operation? I do have a separate gluster network which is selected for migration and gluster traffic, but when I originally deployed I used just one set of hostnames, which resolve to the management network subnet.
>>>>
>>>>
>>>> You will need to change the interface that's used by the bricks. You can do this by using the "Reset brick" option, once the gluster management network is set correctly on the storage interface from ovirt-engine
>>>>>
>>>>>
>>>>>
>>>>> I appear to have a situation where gluster traffic may be going through both networks; I'm seeing what looks like gluster traffic on both the gluster interface and ovirt management.
>>>>>
>>>>> On Wed, Oct 16, 2019 at 11:34 AM Stefano Stagnaro <stefanos(a)prismatelecomtesting.com> wrote:
>>>>>>
>>>>>> Thank you Simone for the clarifications.
>>>>>>
>>>>>> I've redeployed with both management and storage FQDNs; now everything seems to be in its place.
>>>>>>
>>>>>> I only have a couple of questions:
>>>>>>
>>>>>> 1) In the Gluster deployment Wizard, sections 1 (Hosts) and 2 (Additional Hosts) are misleading; they should be renamed to something like "Host Configuration: Storage side" / "Host Configuration: Management side".
>>>>>>
>>>>>> 2) What is the real function of the "Gluster Network" cluster traffic type? What does it actually do?
>>>>>>
>>>>>> Thanks,
>>>>>> Stefano.
Re: [oVirt HC] Gluster traffic still flows on mgmt even after choosing a different Gluster nw
by Strahil
The reset-brick and replace-brick commands affect only one brick and notify the gluster cluster that a new hostname:/brick_path is being used.
Of course, you need a hostname that resolves to the IP that is on the storage network.
WARNING: Ensure that no heals are pending, as the commands wipe the brick and the data on it is lost.
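A hedged sketch of that reset-brick flow, assuming a volume named "data" and the gluster1 storage-network hostname used earlier in this thread:

# make sure no heals are pending first
gluster volume heal data info
# take the brick offline, then bring it back under the storage-network hostname
gluster volume reset-brick data ovirt1:/gluster_bricks/data/data start
gluster volume reset-brick data ovirt1:/gluster_bricks/data/data \
    gluster1:/gluster_bricks/data/data commit force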
Best Regards,
Strahil Nikolov
On Oct 17, 2019 15:28, Jayme <jaymef(a)gmail.com> wrote:
>
> What does the reset brick option do, and is it safe to do this on a live system, or do all VMs need to be brought down first? How does resetting the brick fix the issue with gluster peers using the server hostnames which are attached to IPs on the ovirtmgmt (management) network?
>
> On Thu, Oct 17, 2019 at 4:16 AM Sahina Bose <sabose(a)redhat.com> wrote:
>>
>>
>>
>> On Wed, Oct 16, 2019 at 8:38 PM Jayme <jaymef(a)gmail.com> wrote:
>>>
>>> Is there a way to fix this on an HCI deployment which is already in operation? I do have a separate gluster network which is selected for migration and gluster traffic, but when I originally deployed I used just one set of hostnames, which resolve to the management network subnet.
>>
>>
>> You will need to change the interface that's used by the bricks. You can do this by using the "Reset brick" option, once the gluster management network is set correctly on the storage interface from ovirt-engine
>>>
>>>
>>>
>>> I appear to have a situation where gluster traffic may be going through both networks; I'm seeing what looks like gluster traffic on both the gluster interface and ovirt management.
>>>
>>> On Wed, Oct 16, 2019 at 11:34 AM Stefano Stagnaro <stefanos(a)prismatelecomtesting.com> wrote:
>>>>
>>>> Thank you Simone for the clarifications.
>>>>
>>>> I've redeployed with both management and storage FQDNs; now everything seems to be in its place.
>>>>
>>>> I only have a couple of questions:
>>>>
>>>> 1) In the Gluster deployment Wizard, sections 1 (Hosts) and 2 (Additional Hosts) are misleading; they should be renamed to something like "Host Configuration: Storage side" / "Host Configuration: Management side".
>>>>
>>>> 2) What is the real function of the "Gluster Network" cluster traffic type? What does it actually do?
>>>>
>>>> Thanks,
>>>> Stefano.
oVirt Self-Hosted Engine - VLAN error
by ccesario@blueit.com.br
Hi folks,
We have been trying to deploy the oVirt Self-Hosted Engine for a few days, but it seems something is wrong.
The deployment process fails at this point:
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : set_fact]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Collect error events from the Engine]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Generate the error message from the engine events]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fail with error description]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, deployment errors: code 505: Host srv-virt7.cloud.blueit installation failed. Failed to configure management network on the host., code 9000: Failed to verify Power Management configuration for Host srv-virt7.cloud.blueit., fix accordingly and re-deploy."}
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Create destination directory]
[ INFO ] changed: [localhost]
and this:
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
But we did not find anything relevant in the logs.
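For reference, these are the places usually worth checking on the deployment host; the paths below are common defaults and may differ on your setup:

less /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-*.log
less /var/log/vdsm/vdsm.log
less /var/log/vdsm/supervdsm.log
# logs fetched from the engine VM during the failed run, if present
ls /var/log/ovirt-hosted-engine-setup/engine-logs-*/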
Our environment is:
OS Version: RHEL - 7 - 7.1908.0.el7.centos
OS Description: oVirt Node 4.3.6
Kernel Version: 3.10.0 - 1062.1.1.el7.x86_64
KVM Version: 2.12.0 - 33.1.el7
LIBVIRT Version: libvirt-4.5.0-23.el7_7.1
VDSM Version: vdsm-4.30.33-1.el7
The ovirtmgmt bridge has a tagged VLAN (10):
[root@srv-virt7 ~]# brctl show
bridge name bridge id STP enabled interfaces
ovirtmgmt 8000.6cae8b284832 no eno2.10
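A common cause of "Failed to configure management network on the host" is a VLAN tag mismatch between the ovirtmgmt logical network defined in the engine and the tagged interface on the host. A rough way to sanity-check the VLAN from the host (the gateway address below is a placeholder, not from the original post):

ip -d link show eno2.10         # should report "vlan protocol 802.1Q id 10"
ip addr show ovirtmgmt
ping -I ovirtmgmt 192.168.10.1  # replace with your real gateway on VLAN 10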
Does anyone have any idea or tips about this?
Regards
Carlos
[oVirt HC] Gluster traffic still flows on mgmt even after choosing a different Gluster nw
by Stefano Stagnaro
Hi,
I've deployed oVirt HC starting with the latest oVirt Node 4.3.6; this is my simple network plan (the FQDNs only resolve the front-end addresses):
                front-end        back-end
engine.ovirt    192.168.110.10
node1.ovirt     192.168.110.11   192.168.210.11
node2.ovirt     192.168.110.12   192.168.210.12
node3.ovirt     192.168.110.13   192.168.210.13
In the end I followed the RHHI-V 1.6 Deployment Guide, which in chapter 9 [1] suggests creating a logical network for Gluster traffic. Now I can indeed see the back-end addresses added to the address pool:
[root@node1 ~]# gluster peer status
Number of Peers: 2
Hostname: node3.ovirt
Uuid: 3fe33e8b-d073-4d7a-8bda-441c42317c92
State: Peer in Cluster (Connected)
Other names:
192.168.210.13
Hostname: node2.ovirt
Uuid: a95a9233-203d-4280-92b9-04217fa338d8
State: Peer in Cluster (Connected)
Other names:
192.168.210.12
The problem is that the Gluster traffic still seems to flow over the management interface:
[root@node1 ~]# tcpdump -i ovirtmgmt portrange 49152-49664
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ovirtmgmt, link-type EN10MB (Ethernet), capture size 262144 bytes
14:04:58.746574 IP node2.ovirt.49129 > node1.ovirt.49153: Flags [.], ack 484303246, win 18338, options [nop,nop,TS val 6760049 ecr 6760932], length 0
14:04:58.753050 IP node2.ovirt.49131 > node1.ovirt.49152: Flags [P.], seq 2507489191:2507489347, ack 2889633200, win 20874, options [nop,nop,TS val 6760055 ecr 6757892], length 156
14:04:58.753131 IP node2.ovirt.49131 > node1.ovirt.49152: Flags [P.], seq 156:312, ack 1, win 20874, options [nop,nop,TS val 6760055 ecr 6757892], length 156
14:04:58.753142 IP node2.ovirt.49131 > node1.ovirt.49152: Flags [P.], seq 312:468, ack 1, win 20874, options [nop,nop,TS val 6760055 ecr 6757892], length 156
14:04:58.753148 IP node2.ovirt.49131 > node1.ovirt.49152: Flags [P.], seq 468:624, ack 1, win 20874, options [nop,nop,TS val 6760055 ecr 6757892], length 156
14:04:58.753203 IP node2.ovirt.49131 > node1.ovirt.49152: Flags [P.], seq 624:780, ack 1, win 20874, options [nop,nop,TS val 6760055 ecr 6757892], length 156
14:04:58.753216 IP node2.ovirt.49131 > node1.ovirt.49152: Flags [P.], seq 780:936, ack 1, win 20874, options [nop,nop,TS val 6760055 ecr 6757892], length 156
14:04:58.753231 IP node1.ovirt.49152 > node2.ovirt.49131: Flags [.], ack 936, win 15566, options [nop,nop,TS val 6760978 ecr 6760055], length 0
...
and not yet over the eth1 interface I dedicated to Gluster:
[root@node1 ~]# tcpdump -i eth1 portrange 49152-49664
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
What am I missing here? What can I do to force the Gluster traffic to really flow over the dedicated Gluster network?
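One thing worth checking: the extra addresses under "Other names" only affect peer-level addressing; each brick keeps using the hostname it was created with, and brick traffic follows that name. A quick way to confirm (the volume names here are typical HC defaults, not from the original post -- substitute your own):

gluster volume info engine | grep -i brick
gluster volume status engine

If the bricks show the front-end FQDNs (node1.ovirt, ...), the data path will stay on the management network until the bricks are moved to the back-end hostnames with reset-brick/replace-brick, as discussed elsewhere in this thread.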
Thank you,
Stefano.
[1] https://red.ht/2MiZ4Ge
Re: oVirt 4.3.5/6 HC: Reinstall fails from WEB UI
by Strahil
Check why the sanlock.service reports no pid.
Also check the logs of the broker and agent located at /var/log/ovirt-hosted-engine-ha
You might have to increase the verbosity of the broker and agent.
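A rough sketch of what raising the verbosity could look like; the file names are the usual defaults shipped with ovirt-hosted-engine-ha and may differ on your version:

sed -i 's/level=INFO/level=DEBUG/' /etc/ovirt-hosted-engine-ha/agent-log.conf
sed -i 's/level=INFO/level=DEBUG/' /etc/ovirt-hosted-engine-ha/broker-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent
tail -f /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log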
Best Regards,
Strahil Nikolov
On Oct 17, 2019 08:00, adrianquintero(a)gmail.com wrote:
>
> Strahil,
> This is what I see for each service.
> All services are active and running except for ovirt-ha-agent, which says "activating". Even though the rest of the services are active/running, they still show a few errors and warnings.
>
> -------------------------------------------------------------------------------------------------------
> ● sanlock.service - Shared Storage Lease Manager
> Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
> Active: active (running) since Thu 2019-10-17 00:47:20 EDT; 2min 1s ago
> Process: 16495 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
> Main PID: 2023
> Tasks: 10
> CGroup: /system.slice/sanlock.service
> └─2023 /usr/sbin/sanlock daemon
>
> Oct 17 00:47:20 host1.example.com systemd[1]: Starting Shared Storage Lease Manager...
> Oct 17 00:47:20 host1.example.com systemd[1]: Started Shared Storage Lease Manager.
> Oct 17 00:47:20 host1.example.com sanlock[16496]: 2019-10-17 00:47:20 33920 [16496]: lockfile setlk error /var/run/sanlock/sanlock.pid: Resource temporarily unavailable
> ● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
> Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
> Active: active (running) since Thu 2019-10-17 00:43:06 EDT; 6min ago
> Main PID: 15277 (supervdsmd)
> Tasks: 5
> CGroup: /system.slice/supervdsmd.service
> └─15277 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
>
> Oct 17 00:43:06 host1.example.com systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
> Oct 17 00:43:07 host1.example.com supervdsmd[15277]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
> ● vdsmd.service - Virtual Desktop Server Manager
> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
> Active: active (running) since Thu 2019-10-17 00:47:27 EDT; 1min 54s ago
> Process: 16402 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
> Process: 16499 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
> Main PID: 16572 (vdsmd)
> Tasks: 38
> CGroup: /system.slice/vdsmd.service
> └─16572 /usr/bin/python2 /usr/share/vdsm/vdsmd
>
> Oct 17 00:47:28 host1.example.com vdsm[16572]: WARN MOM not available.
> Oct 17 00:47:28 host1.example.com vdsm[16572]: WARN MOM not available, KSM stats will be missing.
> Oct 17 00:47:28 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:47:43 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:47:58 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:48:13 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:48:28 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:48:43 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:48:58 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:49:13 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
> Active: active (running) since Thu 2019-10-17 00:44:11 EDT; 5min ago
> Main PID: 16379 (ovirt-ha-broker)
> Tasks: 2
> CGroup: /system.slice/ovirt-ha-broker.service
> └─16379 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>
> Oct 17 00:44:11 host1.example.com systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
> Active: activating (auto-restart) (Result: exit-code) since Thu 2019-10-17 00:49:13 EDT; 8s ago
> Process: 16925 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
> Main PID: 16925 (code=exited, status=157)
>
> Oct 17 00:49:13 host1.example.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.
> Oct 17 00:49:13 host1.example.com systemd[1]: ovirt-ha-agent.service failed.
> [root@host1]# systemctl status sanlock;echo ---------;systemctl status supervdsmd;echo -------------;systemctl status vdsmd;echo ---------;systemctl status ovirt-ha-broker;echo ----------;systemctl status ovirt-ha-agent
> ● sanlock.service - Shared Storage Lease Manager
> Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
> Active: active (running) since Thu 2019-10-17 00:47:20 EDT; 3min 14s ago
> Process: 16495 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
> Main PID: 2023
> Tasks: 10
> CGroup: /system.slice/sanlock.service
> └─2023 /usr/sbin/sanlock daemon
>
> Oct 17 00:47:20 host1.example.com systemd[1]: Starting Shared Storage Lease Manager...
> Oct 17 00:47:20 host1.example.com systemd[1]: Started Shared Storage Lease Manager.
> Oct 17 00:47:20 host1.example.com sanlock[16496]: 2019-10-17 00:47:20 33920 [16496]: lockfile setlk error /var/run/sanlock/sanlock.pid: Resource temporarily unavailable
> ---------
> ● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
> Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
> Active: active (running) since Thu 2019-10-17 00:43:06 EDT; 7min ago
> Main PID: 15277 (supervdsmd)
> Tasks: 5
> CGroup: /system.slice/supervdsmd.service
> └─15277 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
>
> Oct 17 00:43:06 host1.example.com systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
> Oct 17 00:43:07 host1.example.com supervdsmd[15277]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
> -------------
> ● vdsmd.service - Virtual Desktop Server Manager
> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
> Active: active (running) since Thu 2019-10-17 00:47:27 EDT; 3min 7s ago
> Process: 16402 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
> Process: 16499 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
> Main PID: 16572 (vdsmd)
> Tasks: 38
> CGroup: /system.slice/vdsmd.service
> └─16572 /usr/bin/python2 /usr/share/vdsm/vdsmd
>
> Oct 17 00:48:13 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:48:28 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:48:43 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:48:58 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:49:13 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:49:28 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:49:43 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:49:58 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:50:13 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> Oct 17 00:50:28 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
> ---------
> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
> Active: active (running) since Thu 2019-10-17 00:44:11 EDT; 6min ago
> Main PID: 16379 (ovirt-ha-broker)
> Tasks: 2
> CGroup: /system.slice/ovirt-ha-broker.service
> └─16379 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>
> Oct 17 00:44:11 host1.example.com systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
> ----------
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
> Active: activating (auto-restart) (Result: exit-code) since Thu 2019-10-17 00:50:27 EDT; 7s ago
> Process: 17107 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
> Main PID: 17107 (code=exited, status=157)
>
> Oct 17 00:50:27 host1.example.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.
> Oct 17 00:50:27 host1.example.com systemd[1]: ovirt-ha-agent.service failed.
>
> -------------------------------------------------------------------------------------------------------
Re: Delete snapshots task hung
by Strahil
Have you checked this thread :
https://lists.ovirt.org/pipermail/users/2016-April/039277.html
You can switch to the postgres user, then 'source /opt/rhn/postgresql10/enable' and then 'psql engine'.
As per the thread you can find illegal snapshots via 'select image_group_id,imagestatus from images where imagestatus =4;'
And then update them via 'update images set imagestatus =1 where imagestatus = 4 and <other criteria>; commit'
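Pulled together as one hedged sequence (take an engine-backup first; editing the engine DB directly is at your own risk, and the SCL path shown is the usual one on a 4.3 engine -- adjust if yours differs):

su - postgres
source /opt/rh/rh-postgresql10/enable   # usual SCL path on oVirt 4.3
psql engine
-- inside psql:
select image_group_id, imagestatus from images where imagestatus = 4;
-- narrow the update with your own criteria, e.g. image_group_id = '<disk id>'
update images set imagestatus = 1 where imagestatus = 4 and image_group_id = '<disk id>';
commit;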
Best Regards,
Strahil Nikolov
On Oct 13, 2019 15:45, Leo David <leoalex(a)gmail.com> wrote:
>
> Hi Everyone,
> I'm still not able to start the VMs... Could anyone give me advice on sorting this out?
> I'm still getting the "Bad volume specification" error, although the disk is present on the storage.
> This issue would force me to reinstall a 10-node OpenShift cluster from scratch, which would not be much fun.
> Thanks,
>
> Leo.
>
> On Fri, Oct 11, 2019 at 7:12 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
>>
>> Nah...
>> It's done directly on the DB and I wouldn't recommend such an action on a production cluster.
>> I've done it only once and it was based on some old mailing lists.
>>
>> Maybe someone from the dev can assist?
>>
>> On Oct 10, 2019 13:31, Leo David <leoalex(a)gmail.com> wrote:
>>>
>>> Thank you Strahil,
>>> Could you tell me what you mean by changing the status? Is this something to be done in the UI?
>>>
>>> Thanks,
>>>
>>> Leo
>>>
>>> On Thu, Oct 10, 2019, 09:55 Strahil <hunter86_bg(a)yahoo.com> wrote:
>>>>
>>>> Maybe you can change the status of the VM so that the engine knows it has to blockcommit the snapshots.
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On Oct 9, 2019 09:02, Leo David <leoalex(a)gmail.com> wrote:
>>>>>
>>>>> Hi Everyone,
>>>>> Please let me know if you have any thoughts or recommendations that could help me solve this issue.
>>>>> The real bad luck in this outage is that these 5 VMs are part of an OpenShift deployment, and now we are not able to start it up...
>>>>> Before trying to sort this out at the OCP platform level by replacing the failed nodes with new VMs, I would prefer to do it at the oVirt level and get the VMs starting, since the disks are still present on gluster.
>>>>> Thank you so much !
>>>>>
>>>>>
>>>>> Leo
>
>
>
> --
> Best regards, Leo David
oVirt 4.3.5/6 HC: Reinstall fails from WEB UI
by adrianquintero@gmail.com
Hi,
I am trying to re-install a host from the web UI in oVirt 4.3.5, but it always fails and goes to "Setting Host state to Non-Operational"
From the engine.log I see the following WARN/ERROR:
2019-10-16 16:32:57,263-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [491c8bd9] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default-DC1. Setting Host state to Non-Operational.
2019-10-16 16:32:57,271-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [491c8bd9] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host host1.example.com.There is no other host in the data center that can be used to test the power management settings.
2019-10-16 16:32:57,276-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [491c8bd9] EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host host1.example.com to Storage Pool Default-DC1
2019-10-16 16:35:06,151-04 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-137245) [] Could not connect host 'host1.example.com' to pool 'Default-DC1': Error storage pool connection: (u"spUUID=7d3fb14c-ebf0-11e9-9ee5-00163e05e135, msdUUID=4b87a5de-c976-4982-8b62-7cffef4a22d8, masterVersion=1, hostID=1, domainsMap={u'8c2df9c6-b505-4499-abb9-0d15db80f33e': u'active', u'4b87a5de-c976-4982-8b62-7cffef4a22d8': u'active', u'5d9f7d05-1fcc-4f99-9470-4e57cd15f128': u'active', u'fe24d88e-6acf-42d7-a857-eaf1f8deb24a': u'active'}",)
2019-10-16 16:35:06,248-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [690baf86] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default-DC1. Setting Host state to Non-Operational.
2019-10-16 16:35:06,256-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [690baf86] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host host1.example.com.There is no other host in the data center that can be used to test the power management settings.
2019-10-16 16:35:06,261-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [690baf86] EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host host1.example.com to Storage Pool Default-DC1
2019-10-16 16:37:46,011-04 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host 'host1.example.com', last response arrived 1501 ms ago.
2019-10-16 16:41:57,095-04 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-137527) [17f3aadd] Could not connect host 'host1.example.com' to pool 'Default-DC1': Error storage pool connection: (u"spUUID=7d3fb14c-ebf0-11e9-9ee5-00163e05e135, msdUUID=4b87a5de-c976-4982-8b62-7cffef4a22d8, masterVersion=1, hostID=1, domainsMap={u'8c2df9c6-b505-4499-abb9-0d15db80f33e': u'active', u'4b87a5de-c976-4982-8b62-7cffef4a22d8': u'active', u'5d9f7d05-1fcc-4f99-9470-4e57cd15f128': u'active', u'fe24d88e-6acf-42d7-a857-eaf1f8deb24a': u'active'}",)
2019-10-16 16:41:57,199-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [508ddb44] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default-DC1. Setting Host state to Non-Operational.
2019-10-16 16:41:57,211-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [508ddb44] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host host1.example.com.There is no other host in the data center that can be used to test the power management settings.
2019-10-16 16:41:57,216-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [508ddb44] EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host host1.example.com to Storage Pool Default-DC1
Any ideas why this might be happening?
I have researched this, but I have not been able to find a solution.
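Some hedged, generic checks on the non-operational host that might narrow this down:

df -h | grep -E 'rhev|glusterfs'    # are the storage domains actually mounted?
gluster volume status               # if this is a gluster/HC setup
grep -iE 'connectStorageServer|connectStoragePool' /var/log/vdsm/vdsm.log | tail -n 50
sanlock client status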
thanks,
Adrian