Gluster rebuild: request suggestions (poor IO performance)
by Jim Kusznir
Hi:
I've been having one heck of a time with disk IO performance for nearly the
entire time I've been running oVirt. I've tried a variety of things and
posted to this list for help several times, and it sounds like in most
cases the problems are due to design decisions and such.
My cluster has been devolving into nearly unusable performance, and I
believe it's mostly disk IO related. I'm currently using FreeNAS as my
primary VM storage (via NFS), but now it too is performing slowly (it
started out reasonable but slowly degraded for unknown reasons).
I'm ready to switch back to gluster if I can get specific recommendations
as to what I need to do to make it work. I feel like I've been trying
random things, and sinking money into this to try and make it work, but
nothing has really fixed the problem.
I have 3 Dell R610 servers with 750GB SSDs as their primary drive. I had
used some Seagate SSHDs, but the internal Dell DRAC RAID controller (which
had been configured to pass them through as single-disk volumes, though
that still wasn't really JBOD) started silently failing them and causing
major issues for gluster. I think the DRAC just doesn't like those drives.
I can put some real spinning disks in; perhaps a RAID-1 pair of 2TB? These
servers only take 2.5" HDDs, so that greatly limits my options.
I'm sure others out there are using Dell R610 servers...what do you use
for storage? How does it perform? What do I need to do to get this
cluster actually usable again? Are PERC-6i storage controllers usable?
I'm not even sure where to go troubleshooting now...everything is so
sloooowwwww.
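If it helps to suggest next steps, this is the kind of baseline test I can run from one of the hosts (the mount point below is just a placeholder for a storage domain path):
fio --name=randwrite --filename=/rhev/data-center/mnt/<storage-domain>/fio.test \
    --rw=randwrite --bs=4k --size=1G --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting
iostat -xm 5    # watch %util and await on the backing disks while the test runs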
BTW: I had a small data volume on the SSDs, and the gluster performance on
those was pretty poor. The hosted engine's performance is still pretty poor
as well, and it is still on the SSDs.
Re: Scale out ovirt 4.3 (from 3 to 6 or 9 nodes) with hyperconverged setup and Gluster
by Strahil
> EUREKA: After doing the above I was able to get past the filter issues, however I am still concerned if during a reboot the disks might come up differently. For example /dev/sdb might come up as /dev/sdx...
Even if they change, you don't have to worry, as each PV contains LVM metadata (including the VG configuration) which is read by LVM on boot (actually, everything that is not excluded by the LVM filter gets scanned like that).
Once all PVs are available, the VG is activated and then the LVs are activated as well.
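You can see that mapping for yourself; something like this shows which VG each PV belongs to regardless of how the kernel enumerated it (the device name below is just an example):
pvs -o pv_name,pv_uuid,vg_name
blkid /dev/sdb    # a PV should report TYPE="LVM2_member"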
> I am trying to make sure this setup is always the same, as we want to move this to production; however, it seems I still don't have the full hang of it and the RHV 4.1 course is way too old :)
>
> Thanks again for helping out with this.
At its core it's plain KVM with a management layer on top.
Just a hint:
Get your HostedEngine's configuration XML from the vdsm log (for emergencies), plus another copy with the boot order reversed so the DVD boots first.
Also get the XML for the ovirtmgmt network.
It has helped me many times when I wanted to recover my HostedEngine.
I'm too lazy to rebuild it.
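For example, something like this works on a host (read-only libvirt connection, so no SASL auth is needed; the libvirt network name may differ on your setup):
virsh -r dumpxml HostedEngine > /root/HostedEngine.xml
virsh -r net-dumpxml vdsm-ovirtmgmt > /root/ovirtmgmt-net.xml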
Hint2:
Vdsm logs contain each VM's configuration xml when the VMs are powered on.
Hint3:
Get regular backups of the HostedEngine and patch it from time to time.
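For example (paths are just placeholders), inside the engine VM:
engine-backup --mode=backup --scope=all --file=/root/engine-backup-$(date +%F).tar.gz --log=/root/engine-backup.log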
I would go to prod as follows:
Let's say you are on 4.2.8 - the next step would be to go to 4.3.latest and then to 4.4.latest.
A test cluster (even in VMs) is also beneficial.
Despite the hiccups I have stumbled upon, I think that the project is great.
Best Regards,
Strahil Nikolov
Migrating self-HostedEngine from NFS to iSCSI
by Miha Verlic
Hello,
I have a few questions regarding migration of the HostedEngine. Currently I
have a cluster of 3 oVirt 4.3.3 nodes; all three of them are capable of
running HE and I can freely migrate the HostedEngine and regular VMs between
them. However, I deployed the HostedEngine storage on a rather edgy NFS
server and I would like to migrate it to iSCSI-based storage with
multipathing. Quite a few VMs are already running on the cluster and are
using iSCSI data storage.
The documentation is rather chaotic and fragmented, but from what I gathered
the migration path is something like:
- place one host (#1), the "failover" host, into maintenance mode prior
to backup
- export configuration with engine-backup
- set global maintenance mode on all hosts
- install oVirt engine on that host (#1) (already installed, since this is
an HE-capable host)
- restore engine configuration using engine-backup
- run engine-setup with new parameters regarding storage
- after engine-setup, log into admin portal and remove old host (#1)
- redeploy hosts #2 and #3
The last two steps are a bit confusing, as I'm not sure how removing the old
failover host on which the new HE is running would work. I also don't
understand the part where hosts 2 and 3 are described as
unrecoverable (but with running VMs, which I'd have to live migrate to
other hosts - how, if they're not operational?).
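For reference, my understanding is that the backup/restore side of those steps boils down to roughly this (file names are placeholders; in 4.3 the restore seems to be driven through hosted-engine rather than a bare engine-setup):
engine-backup --mode=backup --scope=all --file=engine.tar.gz --log=backup.log    # on the current engine VM
hosted-engine --deploy --restore-from-file=engine.tar.gz    # on host #1, pointing the new HE storage at the iSCSI target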
Few other things:
- Should I first remove & re-add host #1 without HE already deployed on the
host?
- Should I set global maintenance mode on all hosts before migration?
I'm guessing this is required if I want to prevent HE being started on a
random host during the transition...
- Which host should be selected as SPM during the transition phase?
- How can I configure iSCSI multipathing? The self-hosted engine
documentation mentions the Multipath Helper tool, however I cannot find any
info about it. Is this tool freely available, or is it only part of a RHEL
subscription?
- Can I configure an existing iSCSI domain which already hosts some VMs as
HE storage? Or do I have to assign an extra LUN/target exclusively for HE?
Cheers
--
Miha
Re: Scale out ovirt 4.3 (from 3 to 6 or 9 nodes) with hyperconverged setup and Gluster
by Strahil
Thanks for the clarification.
It seems that my NVMe (used by VDO) is not locked.
I will check again before opening a bug.
Best Regards,
Strahil Nikolov
On May 21, 2019 09:52, Sahina Bose <sabose(a)redhat.com> wrote:
>
>
>
> On Tue, May 21, 2019 at 2:36 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>>
>> Hey Sahina,
>>
>> it seems that almost all of my devices are locked - just like Fred's.
>> What exactly does it mean - I don't have any issues with my bricks/storage domains.
>
>
>
> If the devices show up as locked - it means the disk cannot be used to create a brick. This is when the disk either already has a filesystem or is in use.
> But if the device is a clean device and it still shows up as locked - this could be a bug in how python-blivet/vdsm reads this.
>
> The code to check is implemented as:
>
> def _canCreateBrick(device):
>     if not device or device.kids > 0 or device.format.type or \
>             hasattr(device.format, 'mountpoint') or \
>             device.type in ['cdrom', 'lvmvg', 'lvmthinpool', 'lvmlv', 'lvmthinlv']:
>         return False
>     return True
>
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Monday, May 20, 2019, 14:56:11 GMT+3, Sahina Bose <sabose(a)redhat.com> wrote:
>>
>>
>> To scale existing volumes you need to add bricks and run rebalance on the gluster volume so that data is correctly redistributed, as Alex mentioned.
>> We do support expanding existing volumes, as the bug https://bugzilla.redhat.com/show_bug.cgi?id=1471031 has been fixed.
>>
>> As to procedure to expand volumes:
>> 1. Create bricks from the UI - select Host -> Storage Devices, select the storage device, and click "Create Brick".
>> If the device is shown as locked, make sure there is no signature on the device. If multipath entries have been created for local devices, you can blacklist those devices in multipath.conf and restart multipath (see the sketch after these steps).
>> (If you still see the device as locked even after you do this - please report back.)
>> 2. Expand the volume using Volume -> Bricks -> Add Bricks, and select the 3 bricks created in the previous step
>> 3. Run Rebalance on the volume. Volume -> Rebalance.
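>> For illustration (WWID and volume/brick names below are placeholders): to blacklist a local device, drop a file under /etc/multipath/conf.d/ - or add "# VDSM PRIVATE" as the second line of /etc/multipath.conf so vdsm does not overwrite your edits - and restart multipathd:
>> # /etc/multipath/conf.d/local.conf
>> blacklist {
>>     wwid "3600508b1001c1234567890abcdef00000"
>> }
>> # systemctl restart multipathd
>> The CLI equivalent of steps 2-3 would be roughly:
>> # gluster volume add-brick VOLNAME replica 3 host1:/gluster_bricks/vol/brick host2:/gluster_bricks/vol/brick host3:/gluster_bricks/vol/brick
>> # gluster volume rebalance VOLNAME start
>> # gluster volume rebalance VOLNAME status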
>>
>>
>> On Thu, May 16, 2019 at 2:48 PM Fred Rolland <frolland(a)redhat.com> wrote:
>>>
>>> Sahina,
>>> Can someone from your team review the steps done by Adrian?
>>> Thanks,
>>> Freddy
>>>
VM Windows on 4.2
by gpesoli@it.iliad.com
Hi all,
I still cannot install or migrate Windows virtual machines (64-bit, UEFI BIOS) on my oVirt platform (4.2.8.2-1); is it possible?
On migration from ESXi, after converting them, the VMs don't want to start at all.
Furthermore, I noticed that on a new installation I can't install the 64-bit version of Windows, because it seems the 64-bit disk drivers are not recognised.
Wrong disk size in UI after expanding iscsi direct LUN
by Bernhard Dick
Hi,
I've extended the size of one of my direct iSCSI LUNs. The VM is seeing
the new size, but the web interface still reports the old size.
Is there a way to update this information? I already took a look at the
list archives, but there are only reports regarding updating the size the
VM sees.
Best regards
Bernhard
hosts becomes NonResponsive
by Jiří Sléžka
Hi,
From time to time one of our four oVirt hosts becomes NonResponsive.
From the engine's point of view it looks like this (engine.log):
2019-05-21 13:10:30,261+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-95) [] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt03.net.slu.cz command Get
Host Capabilities failed: Message timeout which can be caused by
communication issues
2019-05-21 13:10:30,261+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-95) [] Unable to
RefreshCapabilities: VDSNetworkException: VDSGenericException:
VDSNetworkException: Message timeout which can be caused by
communication issues
From the host (which is reachable) it looks like this (vdsm.log):
2019-05-21 13:10:27,154+0200 INFO (vmrecovery) [vdsm.api] START
getConnectedStoragePoolsList(options=None) from=internal,
task_id=a1bebf2f-7070-4344-90b7-1d709ba94b5c (api:48)
2019-05-21 13:10:27,154+0200 INFO (vmrecovery) [vdsm.api] FINISH
getConnectedStoragePoolsList return={'poollist': []} from=internal,
task_id=a1bebf2f-7070-4344-90b7-1d709ba94b5c (api:54)
2019-05-21 13:10:27,155+0200 INFO (vmrecovery) [vds] recovery: waiting
for storage pool to go up (clientIF:709)
2019-05-21 13:10:31,245+0200 INFO (jsonrpc/4) [api.host] START
getAllVmStats() from=::1,39144 (api:48)
2019-05-21 13:10:31,247+0200 INFO (jsonrpc/4) [api.host] FINISH
getAllVmStats return={'status': {'message': 'Done', 'code': 0},
'statsList': (suppressed)} from=::1,39144 (api:54)
2019-05-21 13:10:31,249+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer]
RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
The hosts are latest CentOS 7 (but old AMD Opteron HW); oVirt is 4.3.3.7-1.el7.
I cannot track it down to the network layer. We have 4 other RHV hosts on
the same infrastructure and they work well. Any clues as to what is happening?
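For completeness, these are the checks I can run next (54321 is the vdsm port; the host name is from our setup):
systemctl status vdsmd supervdsmd    # on the affected host
vdsm-client Host getCapabilities > /dev/null && echo "vdsm answers locally"
nc -zv ovirt03.net.slu.cz 54321    # from the engine machine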
Thanks in advance,
Jiri Slezka
RHEL 8 Template Seal failed
by Vinícius Ferrão
Hello,
I’m trying to seal a RHEL8 template but the operation is failing.
Here’s the relevant information from engine.log:
2019-05-17 01:30:31,153-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-58) [91e1acd6-efc5-411b-8c76-970def4ebbbe] FINISH, GetHostJobsVDSCommand, return: {b80c0bbd-25b8-4007-9b91-376cb0a18e30=HostJobInfo:{id='b80c0bbd-25b8-4007-9b91-376cb0a18e30', type='virt', description='seal_vm', status='failed', progress='null', error='VDSError:{code='GeneralException', message='General Exception: ('Command [\'/usr/bin/virt-sysprep\', \'-a\', u\'/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_vm/d19456e4-0051-456e-b33c-57348a78c2e0/images/1ecdfbfc-1c22-452f-9a53-2159701549c8/f9de3eae-f475-451b-b587-f6a1405036e8\'] failed with rc=1 out=\'[ 0.0] Examining the guest ...\\nvirt-sysprep: warning: mount_options: mount exited with status 32: mount: \\nwrong fs type, bad option, bad superblock on /dev/mapper/rhel_rhel8-root,\\n missing codepage or helper program, or other error\\n\\n In some cases useful info is found in syslog - try\\n dmesg | tail or so. (ignored)\\nvirt-sysprep: warning: mount_options: mount: /boot: mount point is not a \\ndirectory (ignored)\\nvirt-sysprep: warning: mount_options: mount: /boot/efi: mount point is not \\na directory (ignored)\\n[ 17.9] Performing "abrt-data" ...\\n\' err="virt-sysprep: error: libguestfs error: glob_expand: glob_expand_stub: you \\nmust call \'mount\' first to mount the root filesystem\\n\\nIf reporting bugs, run virt-sysprep with debugging enabled and include the \\ncomplete output:\\n\\n virt-sysprep -v -x [...]\\n"',)'}'}}, log id: 1bbb34bf
I’m not shure what’s wrong or missing. The VM image is using UEFI with Secure Boot, so standard UEFI partition is in place.
I've found something on Bugzilla, but it does not seem to be related:
https://bugzilla.redhat.com/show_bug.cgi?id=1671895
Thanks,
VM pools broken in 4.3
by Rik Theys
Hi,
It seems VM pools are completely broken since our upgrade to 4.3. Is
anybody else also experiencing this issue?
Only a single instance from a pool can be used. Afterwards the pool
becomes unusable due to a lock not being released. Once ovirt-engine is
restarted, another (single) VM from a pool can be used.
I've added my findings to bug 1462236, but I'm no longer sure the issue
is the same as the one initially reported.
When the first VM of a pool is started:
2019-05-14 13:26:46,058+02 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 2fb4f7f5
2019-05-14 13:26:46,058+02 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 2fb4f7f5
2019-05-14 13:26:46,208+02 INFO [org.ovirt.engine.core.bll.VmPoolHandler] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock Acquired to object 'EngineLock:{exclusiveLocks='[d8a99676-d520-425e-9974-1b1efe6da8a5=VM]', sharedLocks=''}'
-> it has acquired a lock (lock1)
2019-05-14 13:26:46,247+02 INFO [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
-> it has acquired another lock (lock2)
2019-05-14 13:26:46,352+02 INFO [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command: AttachUserToVmFromPoolAndRunCommand internal: false. Entities affected : ID: 4c622213-e5f4-4032-8639-643174b698cc Type: VmPoolAction group VM_POOL_BASIC_OPERATIONS with role type USER
2019-05-14 13:26:46,393+02 INFO [org.ovirt.engine.core.bll.AddPermissionCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command: AddPermissionCommand internal: true. Entities affected : ID: d8a99676-d520-425e-9974-1b1efe6da8a5 Type: VMAction group MANIPULATE_PERMISSIONS with role type USER
2019-05-14 13:26:46,433+02 INFO [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Succeeded giving user 'a5bed59c-d2fe-4fe4-bff7-52efe089ebd6' permission to Vm 'd8a99676-d520-425e-9974-1b1efe6da8a5'
2019-05-14 13:26:46,608+02 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 67acc561
2019-05-14 13:26:46,608+02 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 67acc561
2019-05-14 13:26:46,719+02 INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command:RunVmCommand internal: true. Entities affected : ID: d8a99676-d520-425e-9974-1b1efe6da8a5 Type: VMAction group RUN_VM with role type USER
2019-05-14 13:26:46,791+02 INFO [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, UpdateVmDynamicDataVDSCommand( UpdateVmDynamicDataVDSCommandParameters:{hostId='null', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vmDynamic='org.ovirt.engine.core.common.businessentities.VmDynamic@6db8c94d'}), log id: 2c110e4
2019-05-14 13:26:46,795+02 INFO [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, UpdateVmDynamicDataVDSCommand, return: , log id: 2c110e4
2019-05-14 13:26:46,804+02 INFO [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START,CreateVDSCommand( CreateVDSCommandParameters:{hostId='eec7ec2b-cae1-4bb9-b933-4dff47a70bdb', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vm='VM [stud-c7-1]'}), log id: 71d599f2
2019-05-14 13:26:46,809+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, CreateBrokerVDSCommand(HostName = studvirt1, CreateVDSCommandParameters:{hostId='eec7ec2b-cae1-4bb9-b933-4dff47a70bdb', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vm='VM [stud-c7-1]'}), log id: 3aa6b5ff
2019-05-14 13:26:46,836+02 INFO [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Kernel FIPS - Guid: eec7ec2b-cae1-4bb9-b933-4dff47a70bdb fips: false
2019-05-14 13:26:46,903+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
[domain xml stripped]
2019-05-14 13:26:46,928+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, CreateBrokerVDSCommand, return: , log id: 3aa6b5ff
2019-05-14 13:26:46,932+02 INFO [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, CreateVDSCommand, return: WaitForLaunch, log id: 71d599f2
2019-05-14 13:26:46,932+02 INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
-> it has released lock2
2019-05-14 13:26:47,004+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM stud-c7-1 on Host studvirt1
2019-05-14 13:26:47,094+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: USER_ATTACH_USER_TO_VM_FROM_POOL(316), Attaching User u0045469 to VM stud-c7-1 in VM Pool stud-c7-? was initiated by u0045469(a)esat.kuleuven.be-authz.
2019-05-14 13:26:47,098+02 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Trying to release exclusive lock which does not exist, lock key: 'a5bed59c-d2fe-4fe4-bff7-52efe089ebd6USER_VM_POOL'
-> it's trying to release the same lock2 as above and failing
2019-05-14 13:26:47,098+02 INFO [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
-> now it's indicating that it released/freed the lock (lock2)
2019-05-14 13:26:48,518+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-3) [e3c5745c-e593-4aed-ba67-b173808140e8] Command 'AttachUserToVmFromPoolAndRun' id: '0148c91d-b053-4dc9-960c-f10bf0d3f343' child commands '[0470802d-09fa-4579-b95b-3d9603feff7b, 47dbfc58-3bae-4229-96eb-d1fc08911237]' executions were completed, status 'SUCCEEDED'
2019-05-14 13:26:49,584+02 INFO [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [e3c5745c-e593-4aed-ba67-b173808140e8] Ending command 'org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand' successfully.
2019-05-14 13:26:49,650+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: USER_ATTACH_USER_TO_VM_FROM_POOL_FINISHED_SUCCESS(318), User u0045469 successfully attached to VM stud-c7-1 in VM Pool stud-c7-?.
2019-05-14 13:26:50,584+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_CONSOLE_DISCONNECTED(168), User <UNKNOWN> got disconnected from VM stud-c7-1.
2019-05-14 13:26:50,585+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM 'd8a99676-d520-425e-9974-1b1efe6da8a5'(stud-c7-1) moved from 'WaitForLaunch' --> 'PoweringUp'
The first lock (the one acquired by VmPoolHandler on the VM) is never released. It seems this lock should also be released? Maybe it's this lock that should be released the second time (instead of the second lock being released twice)?
When we try to launch another instance from the pool, it fails:
2019-05-14 13:49:32,656+02 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 7db2f4fc
2019-05-14 13:49:32,656+02 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 7db2f4fc
2019-05-14 13:49:32,688+02 INFO [org.ovirt.engine.core.bll.VmPoolHandler] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[d8a99676-d520-425e-9974-1b1efe6da8a5=VM]', sharedLocks=''}'
2019-05-14 13:49:32,700+02 INFO [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
2019-05-14 13:49:32,700+02 WARN [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Validation of action 'AttachUserToVmFromPoolAndRun' failed for user u0045469(a)esat.kuleuven.be-authz. Reasons: VAR__ACTION__ALLOCATE_AND_RUN,VAR__TYPE__VM_FROM_VM_POOL,ACTION_TYPE_FAILED_NO_AVAILABLE_POOL_VMS
2019-05-14 13:49:32,700+02 INFO [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
2019-05-14 13:49:32,706+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-11) [] Operation Failed: [Cannot allocate and run VM from VM-Pool. There are no available VMs in the VM-Pool.]
Regards,
Rik
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>
Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available
by Strahil
On this one https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3...
We should have the following options:
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
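To check what a volume currently has, or to apply the whole profile in one go (the volume name is just an example):
gluster volume get data_fast4 all | grep -E 'remote-dio|strict-o-direct|choose-local|shard'
gluster volume set data_fast4 group virt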
By the way, the 'virt' gluster group disables 'cluster.choose-local', and I think it wasn't always like that.
Any reason behind that? I use it to speed up my reads, as local storage is faster than the network.
Best Regards,
Strahil Nikolov
On May 19, 2019 09:36, Strahil <hunter86_bg(a)yahoo.com> wrote:
>
> OK,
>
> Can we summarize it:
> 1. VDO must 'emulate512=true'
> 2. 'network.remote-dio' should be off ?
>
> As per this: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/h...
>
> We should have these:
>
> quick-read=off
> read-ahead=off
> io-cache=off
> stat-prefetch=off
> eager-lock=enable
> remote-dio=on
> quorum-type=auto
> server-quorum-type=server
>
> I'm a little bit confused here.
>
> Best Regards,
> Strahil Nikolov
>
> On May 19, 2019 07:44, Sahina Bose <sabose(a)redhat.com> wrote:
>>
>>
>>
>> On Sun, 19 May 2019 at 12:21 AM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>
>>> On Fri, May 17, 2019 at 7:54 AM Gobinda Das <godas(a)redhat.com> wrote:
>>>>
>>>> From RHHI side default we are setting below volume options:
>>>>
>>>> { group: 'virt',
>>>> storage.owner-uid: '36',
>>>> storage.owner-gid: '36',
>>>> network.ping-timeout: '30',
>>>> performance.strict-o-direct: 'on',
>>>> network.remote-dio: 'off'
>>>
>>>
>>> According to the user reports, this configuration is not compatible with oVirt.
>>>
>>> Was this tested?
>>
>>
>> Yes, this is set by default in all test configurations. We're checking on the bug, but the error is likely seen when the underlying device does not support 512b writes.
>> With network.remote-dio off, gluster will ensure O_DIRECT writes.
>>>
>>>
>>>> }
>>>>
>>>>
>>>> On Fri, May 17, 2019 at 2:31 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>>>>>
>>>>> Ok, setting 'gluster volume set data_fast4 network.remote-dio on' allowed me to create the storage domain without any issues.
>>>>> I set it on all 4 new gluster volumes and the storage domains were successfully created.
>>>>>
>>>>> I have created bug for that:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1711060
>>>>>
>>>>> If someone else already opened - please ping me to mark this one as duplicate.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>>
>>>>> On Thursday, May 16, 2019, 22:27:01 GMT+3, Darrell Budic <budic(a)onholyground.com> wrote:
>>>>>
>>>>>
>>>>> On May 16, 2019, at 1:41 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>>
>>>>>>
>>>>>> On Thu, May 16, 2019 at 8:38 PM Darrell Budic <budic(a)onholyground.com> wrote:
>>>>>>>
>>>>>>> I tried adding a new storage domain on my hyper converged test cluster running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster volume fine, but it’s not able to add the gluster storage domain (as either a managed gluster volume or directly entering values). The created gluster volume mounts and looks fine from the CLI. Errors in VDSM log:
>>>>>>>
>>>>>> ...
>>>>>>>
>>>>>>> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying file system doesn't supportdirect IO (fileSD:110)
>>>>>>> 2019-05-16 10:25:09,584-0500 INFO (jsonrpc/5) [vdsm.api] FINISH createStorageDomain error=Storage Domain target is unsupported: () from=::ffff:10.100.90.5,44732, flow_id=31d993dd, task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:52)
>>>>>>
>>>>>>
>>>>>> The direct I/O check has failed.
>>>>>>
>>>>>>
>>>>>> So something is wrong in the file system.
>>>>>>
>>>>>> To confirm, you can try to do:
>>>>>>
>>>>>> dd if=/dev/zero of=/path/to/mountoint/test bs=4096 count=1 oflag=direct
>>>>>>
>>>>>> This will probably fail with:
>>>>>> dd: failed to open '/path/to/mountoint/test': Invalid argument
>>>>>>
>>>>>> If it succeeds, but oVirt fail to connect to this domain, file a bug and we will investigate.
>>>>>>
>>>>>> Nir
>>>>>
>>>>>
>>>>> Yep, it fails as expected. Just to check, it is working on pre-existing volumes, so I poked around at gluster settings for the new volume. It has network.remote-dio=off set on the new volume, but enabled on old volumes. After enabling it, I’m able to run the dd test:
>>>>>
>>>>> [root@boneyard mnt]# gluster vol set test network.remote-dio enable
>>>>> volume set: success
>>>>> [root@boneyard mnt]# dd if=/dev/zero of=testfile bs=4096 count=1 oflag=direct
>>>>> 1+0 records in
>>>>> 1+0 records out
>>>>> 4096 bytes (4.1 kB) copied, 0.0018285 s, 2.2 MB/s
>>>>>
>>>>> I’m also able to add the storage domain in ovirt now.
>>>>>
>>>>> I see network.remote-dio=enable is part of the gluster virt group, so apparently it's not getting set by oVirt during the volume creation/optimize for storage?
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Thanks,
>>>> Gobinda