[Users] SD Disk's Logical Volume not visible/activated on some nodes

Hello,

I have ovirt 3.3 installed on two FC 19 hosts with vdsm 4.13.3-2.fc19. One of the hosts (host1) is engine + node + SPM and the other host2 is just a node. I have an iSCSI storage domain configured and accessible from both nodes.

When creating a new disk in the SD, the underlying logical volume gets properly created (seen in vgdisplay output on host1), but doesn't seem to be automatically picked by host2. Consequently, when creating/booting a VM with the said disk attached, the VM fails to start on host2, because host2 can't see the LV. Similarly, if the VM is started on host1, it fails to migrate to host2. Extract from host2 log is in the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.

As far as I could track quickly the vdsm code, there is only call to lvs and not to lvscan or lvchange so the host2 LVM doesn't fully refresh. The only workaround so far has been to restart VDSM on host2, which makes it refresh all LVM data properly.

When is host2 supposed to pick up any newly created LVs in the SD VG? Any suggestions where the problem might be?

Thanks!

Thread-998::DEBUG::2014-02-18 14:49:15,399::BindingXMLRPC::965::vds::(wrapper) client [10.10.0.10]::call vmCreate with ({'acpiEnable': 'true', 'emulatedMachine': 'pc-1.0', 'tabletEnable': 'true', 'vmId': '4669e4ad-7b76-4531-a16b-2b85345593a3', 'memGuaranteedSize': 1024, 'spiceSslCipherSuite': 'DEFAULT', 'timeOffset': '0', 'cpuType': 'Penryn', 'custom': {}, 'smp': '1', 'vmType': 'kvm', 'memSize': 1024, 'smpCoresPerSocket': '1', 'vmName': 'testvm', 'nice': '0', 'smartcardEnable': 'false', 'keyboardLayout': 'en-us', 'kvmEnable': 'true', 'pitReinjection': 'false', 'transparentHugePages': 'true', 'devices': [{'device': 'cirrus', 'specParams': {'vram': '32768', 'heads': '1'}, 'type': 'video', 'deviceId': '9b769f44-db37-4e42-a343-408222e1422f'}, {'index': '2', 'iface': 'ide', 'specParams': {'path': ''}, 'readonly': 'true', 'deviceId': 'a61b291c-94de-4fd4-922e-1d51f4d2760d', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'}, {'index': 0, 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'volumeID': '6b35673e-7062-4716-a6c8-d5bf72fe3280', 'imageID': '3738d400-a62e-4ded-b97f-1b4028e5f45b', 'specParams': {}, 'readonly': 'false', 'domainID': '3307f6fa-dd58-43db-ab23-b1fb299006c7', 'optional': 'false', 'deviceId': '3738d400-a62e-4ded-b97f-1b4028e5f45b', 'poolID': '61f15cc0-8bba-482d-8a81-cd636a581b58', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'memballoon', 'specParams': {'model': 'virtio'}, 'type': 'balloon', 'deviceId': '4e05c1d1-2ac3-4885-8c1d-92ccd1388f0d'}, {'device': 'scsi', 'specParams': {}, 'model': 'virtio-scsi', 'type': 'controller', 'deviceId': 'fa3c223a-0cfc-4adf-90bc-cb6073a4b212'}], 'spiceSecureChannels': 'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', 'display': 'vnc'},) {} flowID [20468793]
Thread-998::INFO::2014-02-18 14:49:15,406::API::642::vds::(_getNetworkIp) network None: using 0
Thread-998::INFO::2014-02-18 14:49:15,406::clientIF::394::vds::(createVm) vmContainerLock acquired by vm 4669e4ad-7b76-4531-a16b-2b85345593a3
Thread-999::DEBUG::2014-02-18 14:49:15,409::vm::2091::vm.Vm::(_startUnderlyingVm) vmId=`4669e4ad-7b76-4531-a16b-2b85345593a3`::Start
Thread-998::DEBUG::2014-02-18 14:49:15,409::clientIF::407::vds::(createVm) Total desktops after creation of 4669e4ad-7b76-4531-a16b-2b85345593a3 is 1
Thread-999::DEBUG::2014-02-18 14:49:15,409::vm::2095::vm.Vm::(_startUnderlyingVm) vmId=`4669e4ad-7b76-4531-a16b-2b85345593a3`::_ongoingCreations acquired
Thread-998::DEBUG::2014-02-18 14:49:15,410::BindingXMLRPC::972::vds::(wrapper) return vmCreate with {'status': {'message': 'Done', 'code': 0}, 'vmList': {'status': 'WaitForLaunch', 'acpiEnable': 'true', 'emulatedMachine': 'pc-1.0', 'tabletEnable': 'true', 'pid': '0', 'memGuaranteedSize': 1024, 'timeOffset': '0', 'keyboardLayout': 'en-us', 'displayPort': '-1', 'displaySecurePort': '-1', 'spiceSslCipherSuite': 'DEFAULT', 'cpuType': 'Penryn', 'smp': '1', 'clientIp': '', 'nicModel': 'rtl8139,pv', 'smartcardEnable': 'false', 'kvmEnable': 'true', 'pitReinjection': 'false', 'vmId': '4669e4ad-7b76-4531-a16b-2b85345593a3', 'transparentHugePages': 'true', 'devices': [{'device': 'cirrus', 'specParams': {'vram': '32768', 'heads': '1'}, 'type': 'video', 'deviceId': '9b769f44-db37-4e42-a343-408222e1422f'}, {'index': '2', 'iface': 'ide', 'specParams': {'path': ''}, 'readonly': 'true', 'deviceId': 'a61b291c-94de-4fd4-922e-1d51f4d2760d', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'}, {'index': 0, 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'volumeID': '6b35673e-7062-4716-a6c8-d5bf72fe3280', 'imageID': '3738d400-a62e-4ded-b97f-1b4028e5f45b', 'specParams': {}, 'readonly': 'false', 'domainID': '3307f6fa-dd58-43db-ab23-b1fb299006c7', 'optional': 'false', 'deviceId': '3738d400-a62e-4ded-b97f-1b4028e5f45b', 'poolID': '61f15cc0-8bba-482d-8a81-cd636a581b58', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'memballoon', 'specParams': {'model': 'virtio'}, 'type': 'balloon', 'deviceId': '4e05c1d1-2ac3-4885-8c1d-92ccd1388f0d'}, {'device': 'scsi', 'specParams': {}, 'model': 'virtio-scsi', 'type': 'controller', 'deviceId': 'fa3c223a-0cfc-4adf-90bc-cb6073a4b212'}], 'custom': {}, 'vmType': 'kvm', 'memSize': 1024, 'displayIp': '0', 'spiceSecureChannels': 'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', 'smpCoresPerSocket': '1', 'vmName': 'testvm', 'display': 'vnc', 'nice': '0'}}
Thread-999::INFO::2014-02-18 14:49:15,410::vm::2926::vm.Vm::(_run) vmId=`4669e4ad-7b76-4531-a16b-2b85345593a3`::VM wrapper has started
Thread-999::DEBUG::2014-02-18 14:49:15,412::task::579::TaskManager.Task::(_updateState) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::moving from state init -> state preparing
Thread-999::INFO::2014-02-18 14:49:15,413::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='3307f6fa-dd58-43db-ab23-b1fb299006c7', spUUID='61f15cc0-8bba-482d-8a81-cd636a581b58', imgUUID='3738d400-a62e-4ded-b97f-1b4028e5f45b', volUUID='6b35673e-7062-4716-a6c8-d5bf72fe3280', options=None)
Thread-999::DEBUG::2014-02-18 14:49:15,413::lvm::440::OperationMutex::(_reloadlvs) Operation 'lvm reload operation' got the operation mutex
Thread-999::DEBUG::2014-02-18 14:49:15,413::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/36090a098103b1821532d057a5a0120d4|/dev/mapper/36090a09810bb99fefb2da57b94332027|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 3307f6fa-dd58-43db-ab23-b1fb299006c7' (cwd None)
Thread-999::DEBUG::2014-02-18 14:49:15,466::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
Thread-999::DEBUG::2014-02-18 14:49:15,481::lvm::475::Storage.LVM::(_reloadlvs) lvs reloaded
Thread-999::DEBUG::2014-02-18 14:49:15,481::lvm::475::OperationMutex::(_reloadlvs) Operation 'lvm reload operation' released the operation mutex
Thread-999::WARNING::2014-02-18 14:49:15,482::lvm::621::Storage.LVM::(getLv) lv: 6b35673e-7062-4716-a6c8-d5bf72fe3280 not found in lvs vg: 3307f6fa-dd58-43db-ab23-b1fb299006c7 response
Thread-999::ERROR::2014-02-18 14:49:15,482::task::850::TaskManager.Task::(_setError) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3059, in getVolumeSize
    apparentsize = str(volClass.getVSize(dom, imgUUID, volUUID, bs=1))
  File "/usr/share/vdsm/storage/blockVolume.py", line 111, in getVSize
    size = int(int(lvm.getLV(sdobj.sdUUID, volUUID).size) / bs)
  File "/usr/share/vdsm/storage/lvm.py", line 914, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: ('3307f6fa-dd58-43db-ab23-b1fb299006c7/6b35673e-7062-4716-a6c8-d5bf72fe3280',)
Thread-999::DEBUG::2014-02-18 14:49:15,485::task::869::TaskManager.Task::(_run) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::Task._run: 07adf25e-b9fe-44d0-adf4-9159ac0f1f4d ('3307f6fa-dd58-43db-ab23-b1fb299006c7', '61f15cc0-8bba-482d-8a81-cd636a581b58', '3738d400-a62e-4ded-b97f-1b4028e5f45b', '6b35673e-7062-4716-a6c8-d5bf72fe3280') {} failed - stopping task
Thread-999::DEBUG::2014-02-18 14:49:15,485::task::1194::TaskManager.Task::(stop) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::stopping in state preparing (force False)
Thread-999::DEBUG::2014-02-18 14:49:15,486::task::974::TaskManager.Task::(_decref) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::ref 1 aborting True
Thread-999::INFO::2014-02-18 14:49:15,486::task::1151::TaskManager.Task::(prepare) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::aborting: Task is aborted: 'Logical volume does not exist' - code 610
Thread-999::DEBUG::2014-02-18 14:49:15,486::task::1156::TaskManager.Task::(prepare) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::Prepare: aborted: Logical volume does not exist
Thread-999::DEBUG::2014-02-18 14:49:15,486::task::974::TaskManager.Task::(_decref) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::ref 0 aborting True
Thread-999::DEBUG::2014-02-18 14:49:15,486::task::909::TaskManager.Task::(_doAbort) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::Task._doAbort: force False
Thread-999::DEBUG::2014-02-18 14:49:15,487::resourceManager::976::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-999::DEBUG::2014-02-18 14:49:15,487::task::579::TaskManager.Task::(_updateState) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::moving from state preparing -> state aborting
Thread-999::DEBUG::2014-02-18 14:49:15,487::task::534::TaskManager.Task::(__state_aborting) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::_aborting: recover policy none
Thread-999::DEBUG::2014-02-18 14:49:15,487::task::579::TaskManager.Task::(_updateState) Task=`07adf25e-b9fe-44d0-adf4-9159ac0f1f4d`::moving from state aborting -> state failed
Thread-999::DEBUG::2014-02-18 14:49:15,487::resourceManager::939::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-999::DEBUG::2014-02-18 14:49:15,488::resourceManager::976::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-999::ERROR::2014-02-18 14:49:15,488::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Logical volume does not exist: ('3307f6fa-dd58-43db-ab23-b1fb299006c7/6b35673e-7062-4716-a6c8-d5bf72fe3280',)", 'code': 610}}
Thread-999::ERROR::2014-02-18 14:49:15,488::vm::1826::vm.Vm::(_normalizeVdsmImg) vmId=`4669e4ad-7b76-4531-a16b-2b85345593a3`::Unable to get volume size for 6b35673e-7062-4716-a6c8-d5bf72fe3280
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 1822, in _normalizeVdsmImg
    drv['truesize'] = res['truesize']
KeyError: 'truesize'
Thread-999::DEBUG::2014-02-18 14:49:15,489::vm::2112::vm.Vm::(_startUnderlyingVm) vmId=`4669e4ad-7b76-4531-a16b-2b85345593a3`::_ongoingCreations released
Thread-999::ERROR::2014-02-18 14:49:15,489::vm::2138::vm.Vm::(_startUnderlyingVm) vmId=`4669e4ad-7b76-4531-a16b-2b85345593a3`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2098, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2930, in _run
    devices = self.buildConfDevices()
  File "/usr/share/vdsm/vm.py", line 1935, in buildConfDevices
    self._normalizeVdsmImg(drv)
  File "/usr/share/vdsm/vm.py", line 1828, in _normalizeVdsmImg
    drv['volumeID'])
RuntimeError: Volume 6b35673e-7062-4716-a6c8-d5bf72fe3280 is corrupted or missing

Best regards,
Boyan Tabakov

----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: users@ovirt.org Sent: Tuesday, February 18, 2014 3:34:49 PM Subject: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hello,
I have ovirt 3.3 installed on two FC 19 hosts with vdsm 4.13.3-2.fc19.
Which version of ovirt 3.3 is this? (3.3.2? 3.3.3?)
One of the hosts (host1) is engine + node + SPM and the other host2 is just a node. I have an iSCSI storage domain configured and accessible from both nodes.
When creating a new disk in the SD, the underlying logical volume gets properly created (seen in vgdisplay output on host1), but doesn't seem to be automatically picked by host2.
How do you know it is not seen on host2?
Consequently, when creating/booting a VM with the said disk attached, the VM fails to start on host2, because host2 can't see the LV. Similarly, if the VM is started on host1, it fails to migrate to host2. Extract from host2 log is in the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
As far as I could track quickly the vdsm code, there is only call to lvs and not to lvscan or lvchange so the host2 LVM doesn't fully refresh. The only workaround so far has been to restart VDSM on host2, which makes it refresh all LVM data properly.
When is host2 supposed to pick up any newly created LVs in the SD VG? Any suggestions where the problem might be?
When you create a new lv on the shared storage, the new lv should be visible on the other host. Let's start by verifying that you do see the new lv after a disk was created.

Try this:

1. Create a new disk, and check the disk uuid in the engine ui
2. On another machine, run this command:

lvs -o vg_name,lv_name,tags

You can identify the new lv using tags, which should contain the new disk uuid.

If you don't see the new lv from the other host, please provide /var/log/messages and /var/log/sanlock.log.

Thanks,
Nir
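P.S. A concrete example of step 2, run from the non-SPM host (the uuid below is the image uuid from your log extract - substitute the uuid of the newly created disk):

# the lv created for a disk carries a tag containing the disk uuid
lvs -o vg_name,lv_name,tags --noheadings | grep 3738d400-a62e-4ded-b97f-1b4028e5f45b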

Hello,

On 19.2.2014, 17:09, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: users@ovirt.org Sent: Tuesday, February 18, 2014 3:34:49 PM Subject: [Users] SD Disk's Logical Volume not visible/activated on som= e nodes
Hello,
I have ovirt 3.3 installed on two FC 19 hosts with vdsm 4.13.3-2.fc19.

Which version of ovirt 3.3 is this? (3.3.2? 3.3.3?)
ovirt-engine is 3.3.2-1.fc19
One of the hosts (host1) is engine + node + SPM and the other host2 is just a node. I have an iSCSI storage domain configured and accessible from both nodes.

When creating a new disk in the SD, the underlying logical volume gets properly created (seen in vgdisplay output on host1), but doesn't seem to be automatically picked by host2.

How do you know it is not seen on host2?
It's not present in the output of vgdisplay -v nor vgs.
Consequently, when creating/booting a VM with the said disk attached, the VM fails to start on host2, because host2 can't see the LV. Similarly, if the VM is started on host1, it fails to migrate to host2. Extract from host2 log is in the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
As far as I could track quickly the vdsm code, there is only call to lvs and not to lvscan or lvchange so the host2 LVM doesn't fully refresh. The only workaround so far has been to restart VDSM on host2, which makes it refresh all LVM data properly.
When is host2 supposed to pick up any newly created LVs in the SD VG? Any suggestions where the problem might be?

When you create a new lv on the shared storage, the new lv should be visible on the other host. Let's start by verifying that you do see the new lv after a disk was created.

Try this:

1. Create a new disk, and check the disk uuid in the engine ui
2. On another machine, run this command:

lvs -o vg_name,lv_name,tags

You can identify the new lv using tags, which should contain the new disk uuid.

If you don't see the new lv from the other host, please provide /var/log/messages and /var/log/sanlock.log.
Just tried that. The disk is not visible on the non-SPM node.

On the SPM node (where the LV is visible in vgs output):

Feb 19 19:10:43 host1 vdsm root WARNING File: /rhev/data-center/61f15cc0-8bba-482d-8a81-cd636a581b58/3307f6fa-dd58-43db-ab23-b1fb299006c7/images/4d15543c-4c45-4c23-bbe3-f10b9084472a/3e0ce8cb-3740-49d7-908e-d025875ac9a2 already removed
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:45 host1 multipathd: dm-65: devmap not registered, can't remove
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:54 host1 kernel: [1652684.864746] dd: sending ioctl 80306d02 to a partition!
Feb 19 19:10:54 host1 kernel: [1652684.963931] dd: sending ioctl 80306d02 to a partition!

No recent entries in sanlock.log on the SPM node.

On the non-SPM node (the one that doesn't show the LV in vgs output), there are no relevant entries in /var/log/messages. Here's the full sanlock.log for that host:

2014-01-30 16:28:09+0200 1324 [2335]: sanlock daemon started 2.8 host 18bd0a27-c280-4007-98f2-d2e7e73cd8b5.xenon.futu
2014-01-30 16:59:51+0200 5 [609]: sanlock daemon started 2.8 host 4a7627e2-296a-4e48-a7e2-f6bcecac07ab.xenon.futu
2014-01-31 09:51:43+0200 60717 [614]: s1 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-01-31 16:03:51+0200 83045 [613]: s1:r1 resource 3307f6fa-dd58-43db-ab23-b1fb299006c7:SDM:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/leases:1048576 for 8,16,30268
2014-01-31 16:18:01+0200 83896 [614]: s1:r2 resource 3307f6fa-dd58-43db-ab23-b1fb299006c7:SDM:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/leases:1048576 for 8,16,30268
2014-02-06 05:24:10+0200 563065 [31453]: 3307f6fa aio timeout 0 0x7fc37c0008c0:0x7fc37c0008d0:0x7fc391f5f000 ioto 10 to_count 1
2014-02-06 05:24:10+0200 563065 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:10+0200 563065 [31453]: s1 renewal error -202 delta_length 10 last_success 563034
2014-02-06 05:24:21+0200 563076 [31453]: 3307f6fa aio timeout 0 0x7fc37c000910:0x7fc37c000920:0x7fc391d5c000 ioto 10 to_count 2
2014-02-06 05:24:21+0200 563076 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:21+0200 563076 [31453]: s1 renewal error -202 delta_length 11 last_success 563034
2014-02-06 05:24:32+0200 563087 [31453]: 3307f6fa aio timeout 0 0x7fc37c000960:0x7fc37c000970:0x7fc391c5a000 ioto 10 to_count 3
2014-02-06 05:24:32+0200 563087 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:32+0200 563087 [31453]: s1 renewal error -202 delta_length 11 last_success 563034
2014-02-06 05:24:40+0200 563094 [609]: s1 check_our_lease warning 60 last_success 563034
2014-02-06 05:24:41+0200 563095 [609]: s1 check_our_lease warning 61 last_success 563034
2014-02-06 05:24:42+0200 563096 [609]: s1 check_our_lease warning 62 last_success 563034
2014-02-06 05:24:43+0200 563097 [609]: s1 check_our_lease warning 63 last_success 563034
2014-02-06 05:24:43+0200 563098 [31453]: 3307f6fa aio timeout 0 0x7fc37c0009b0:0x7fc37c0009c0:0x7fc391b58000 ioto 10 to_count 4
2014-02-06 05:24:43+0200 563098 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:43+0200 563098 [31453]: s1 renewal error -202 delta_length 11 last_success 563034
2014-02-06 05:24:44+0200 563098 [609]: s1 check_our_lease warning 64 last_success 563034
2014-02-06 05:24:45+0200 563099 [609]: s1 check_our_lease warning 65 last_success 563034
2014-02-06 05:24:46+0200 563100 [609]: s1 check_our_lease warning 66 last_success 563034
2014-02-06 05:24:47+0200 563101 [609]: s1 check_our_lease warning 67 last_success 563034
2014-02-06 05:24:48+0200 563102 [609]: s1 check_our_lease warning 68 last_success 563034
2014-02-06 05:24:49+0200 563103 [609]: s1 check_our_lease warning 69 last_success 563034
2014-02-06 05:24:50+0200 563104 [609]: s1 check_our_lease warning 70 last_success 563034
2014-02-06 05:24:51+0200 563105 [609]: s1 check_our_lease warning 71 last_success 563034
2014-02-06 05:24:52+0200 563106 [609]: s1 check_our_lease warning 72 last_success 563034
2014-02-06 05:24:53+0200 563107 [609]: s1 check_our_lease warning 73 last_success 563034
2014-02-06 05:24:54+0200 563108 [609]: s1 check_our_lease warning 74 last_success 563034
2014-02-06 05:24:54+0200 563109 [31453]: s1 delta_renew read rv -2 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:54+0200 563109 [31453]: s1 renewal error -2 delta_length 11 last_success 563034
2014-02-06 05:24:55+0200 563109 [609]: s1 check_our_lease warning 75 last_success 563034
2014-02-06 05:24:56+0200 563110 [609]: s1 check_our_lease warning 76 last_success 563034
2014-02-06 05:24:57+0200 563111 [609]: s1 check_our_lease warning 77 last_success 563034
2014-02-06 05:24:58+0200 563112 [609]: s1 check_our_lease warning 78 last_success 563034
2014-02-06 05:24:59+0200 563113 [609]: s1 check_our_lease warning 79 last_success 563034
2014-02-06 05:25:00+0200 563114 [609]: s1 check_our_lease failed 80
2014-02-06 05:25:00+0200 563114 [609]: s1 all pids clear
2014-02-06 05:25:05+0200 563119 [31453]: s1 delta_renew read rv -2 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:25:05+0200 563119 [31453]: s1 renewal error -2 delta_length 10 last_success 563034
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 0 0x7fc37c0008c0 busy
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 1 0x7fc37c000910 busy
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 2 0x7fc37c000960 busy
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 3 0x7fc37c0009b0 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 0 0x7fc37c0008c0 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 1 0x7fc37c000910 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 2 0x7fc37c000960 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 3 0x7fc37c0009b0 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 0 0x7fc37c0008c0 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 1 0x7fc37c000910 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 2 0x7fc37c000960 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 3 0x7fc37c0009b0 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 0 0x7fc37c0008c0 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 1 0x7fc37c000910 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 2 0x7fc37c000960 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 3 0x7fc37c0009b0 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 0 0x7fc37c0008c0 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 1 0x7fc37c000910 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 2 0x7fc37c000960 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 3 0x7fc37c0009b0 busy
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect 0x7fc37c0008c0:0x7fc37c0008d0:0x7fc391f5f000 result 1048576:0 close free
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect 0x7fc37c000910:0x7fc37c000920:0x7fc391d5c000 result 1048576:0 close free
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect 0x7fc37c000960:0x7fc37c000970:0x7fc391c5a000 result 1048576:0 close free
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect 0x7fc37c0009b0:0x7fc37c0009c0:0x7fc391b58000 result 1048576:0 close free
2014-02-06 05:26:09+0200 563183 [614]: s2 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 14:22:16+0200 1632150 [614]: s3 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 14:38:16+0200 1633111 [614]: s4 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 16:13:07+0200 1638801 [613]: s5 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 19:28:09+0200 1650504 [614]: s6 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0

Last entry is from yesterday, while I just created a new disk.

Let me know if you need more info!

Thanks,
Boyan

----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Wednesday, February 19, 2014 7:18:36 PM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hello,
On 19.2.2014, 17:09, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: users@ovirt.org Sent: Tuesday, February 18, 2014 3:34:49 PM Subject: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hello,
I have ovirt 3.3 installed on two FC 19 hosts with vdsm 4.13.3-2.fc19.
Which version of ovirt 3.3 is this? (3.3.2? 3.3.3?)
ovirt-engine is 3.3.2-1.fc19
One of the hosts (host1) is engine + node + SPM and the other host2 is just a node. I have an iSCSI storage domain configured and accessible from both nodes.
When creating a new disk in the SD, the underlying logical volume gets properly created (seen in vgdisplay output on host1), but doesn't seem to be automatically picked by host2.
How do you know it is not seen on host2?
It's not present in the output of vgdisplay -v nor vgs.
Consequently, when creating/booting a VM with the said disk attached, the VM fails to start on host2, because host2 can't see the LV. Similarly, if the VM is started on host1, it fails to migrate to host2. Extract from host2 log is in the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
As far as I could track quickly the vdsm code, there is only call to lvs and not to lvscan or lvchange so the host2 LVM doesn't fully refresh. The only workaround so far has been to restart VDSM on host2, which makes it refresh all LVM data properly.
When is host2 supposed to pick up any newly created LVs in the SD VG? Any suggestions where the problem might be?
When you create a new lv on the shared storage, the new lv should be visible on the other host. Let's start by verifying that you do see the new lv after a disk was created.
Try this:
1. Create a new disk, and check the disk uuid in the engine ui
2. On another machine, run this command:
lvs -o vg_name,lv_name,tags
You can identify the new lv using tags, which should contain the new disk uuid.
If you don't see the new lv from the other host, please provide /var/log/messages and /var/log/sanlock.log.
Just tried that. The disk is not visible on the non-SPM node.
This means that storage is not accessible from this host.
On the SPM node (where the LV is visible in vgs output):
Feb 19 19:10:43 host1 vdsm root WARNING File: /rhev/data-center/61f15cc0-8bba-482d-8a81-cd636a581b58/3307f6fa-dd58-43db-ab23-b1fb299006c7/images/4d15543c-4c45-4c23-bbe3-f10b9084472a/3e0ce8cb-3740-49d7-908e-d025875ac9a2 already removed
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:45 host1 multipathd: dm-65: devmap not registered, can't remove
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:54 host1 kernel: [1652684.864746] dd: sending ioctl 80306d02 to a partition!
Feb 19 19:10:54 host1 kernel: [1652684.963931] dd: sending ioctl 80306d02 to a partition!
No recent entries in sanlock.log on the SPM node.
On the non-SPM node (the one that doesn't show the LV in vgs output), there are no relevant entries in /var/log/messages.
Strange - sanlock errors are logged to /var/log/messages. It would be helpful if you attach this log - we may find something in it.
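Something as simple as this should show them, if there are any:

grep sanlock /var/log/messages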
Here's the full sanlock.log for that host:
2014-01-30 16:28:09+0200 1324 [2335]: sanlock daemon started 2.8 host 18bd0a27-c280-4007-98f2-d2e7e73cd8b5.xenon.futu
...
2014-02-06 05:24:10+0200 563065 [31453]: 3307f6fa aio timeout 0 0x7fc37c0008c0:0x7fc37c0008d0:0x7fc391f5f000 ioto 10 to_count 1
2014-02-06 05:24:10+0200 563065 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
Sanlock cannot write to the ids lockspace
2014-02-06 05:24:10+0200 563065 [31453]: s1 renewal error -202 delta_length 10 last_success 563034
...
2014-02-06 05:25:00+0200 563114 [609]: s1 check_our_lease failed 80
2014-02-06 05:25:00+0200 563114 [609]: s1 all pids clear
...
2014-02-06 05:26:09+0200 563183 [614]: s2 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
Sanlock can access the storage again - did you change something around this time?
2014-02-18 14:22:16+0200 1632150 [614]: s3 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 14:38:16+0200 1633111 [614]: s4 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 16:13:07+0200 1638801 [613]: s5 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 19:28:09+0200 1650504 [614]: s6 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
Last entry is from yesterday, while I just created a new disk.
What was the status of this host in the engine from 2014-02-06 05:24:10+0200 to 2014-02-18 14:22:16?

vdsm.log and engine.log for this time frame will make it more clear.

Nir

Hello,

On 22.2.2014, 22:19, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Wednesday, February 19, 2014 7:18:36 PM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on= some nodes
Hello,
On 19.2.2014, 17:09, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: users@ovirt.org Sent: Tuesday, February 18, 2014 3:34:49 PM Subject: [Users] SD Disk's Logical Volume not visible/activated on s= ome nodes
Hello,
I have ovirt 3.3 installed on two FC 19 hosts with vdsm 4.13.3-2.fc19.
Which version of ovirt 3.3 is this? (3.3.2? 3.3.3?)
ovirt-engine is 3.3.2-1.fc19
One of the hosts (host1) is engine + node + SPM and the other host2 is just a node. I have an iSCSI storage domain configured and accessible from both nodes.

When creating a new disk in the SD, the underlying logical volume gets properly created (seen in vgdisplay output on host1), but doesn't seem to be automatically picked by host2.
How do you know it is not seen on host2?
It's not present in the output of vgdisplay -v nor vgs.
Consequently, when creating/booting a VM with the said disk attached, the VM fails to start on host2, because host2 can't see the LV. Similarly, if the VM is started on host1, it fails to migrate to host2. Extract from host2 log is in the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
As far as I could track quickly the vdsm code, there is only call to lvs and not to lvscan or lvchange so the host2 LVM doesn't fully refresh. The only workaround so far has been to restart VDSM on host2, which makes it refresh all LVM data properly.

When is host2 supposed to pick up any newly created LVs in the SD VG? Any suggestions where the problem might be?
When you create a new lv on the shared storage, the new lv should be visible on the other host. Let's start by verifying that you do see the new lv after a disk was created.
Try this:
1. Create a new disk, and check the disk uuid in the engine ui
2. On another machine, run this command:
lvs -o vg_name,lv_name,tags
You can identify the new lv using tags, which should contain the new disk uuid.

If you don't see the new lv from the other host, please provide /var/log/messages and /var/log/sanlock.log.

Just tried that. The disk is not visible on the non-SPM node.

This means that storage is not accessible from this host.
Generally, the storage seems accessible ok. For example, if I restart the vdsmd, all volumes get picked up correctly (become visible in lvs output and VMs can be started with them).
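(By restarting I mean just restarting the service, i.e. something like:

systemctl restart vdsmd

after which the new LVs show up in lvs output without any further action.)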
On the SPM node (where the LV is visible in vgs output):
Feb 19 19:10:43 host1 vdsm root WARNING File: /rhev/data-center/61f15cc0-8bba-482d-8a81-cd636a581b58/3307f6fa-dd58-43db-ab23-b1fb299006c7/images/4d15543c-4c45-4c23-bbe3-f10b9084472a/3e0ce8cb-3740-49d7-908e-d025875ac9a2 already removed
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:45 host1 multipathd: dm-65: devmap not registered, can't remove
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:54 host1 kernel: [1652684.864746] dd: sending ioctl 80306d02 to a partition!
Feb 19 19:10:54 host1 kernel: [1652684.963931] dd: sending ioctl 80306d02 to a partition!
No recent entries in sanlock.log on the SPM node.
On the non-SPM node (the one that doesn't show the LV in vgs output), there are no relevant entries in /var/log/messages.

Strange - sanlock errors are logged to /var/log/messages. It would be helpful if you attach this log - we may find something in it.
No entries appear in /var/log/messages, other than the ones quoted above (sorry, I didn't clarify that those were from /var/log/messages on host1).
Here's the full sanlock.log for that host:
2014-01-30 16:28:09+0200 1324 [2335]: sanlock daemon started 2.8 host 18bd0a27-c280-4007-98f2-d2e7e73cd8b5.xenon.futu
...
2014-02-06 05:24:10+0200 563065 [31453]: 3307f6fa aio timeout 0 0x7fc37c0008c0:0x7fc37c0008d0:0x7fc391f5f000 ioto 10 to_count 1
2014-02-06 05:24:10+0200 563065 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids

Sanlock cannot write to the ids lockspace

Which line shows that sanlock can't write? The messages are not very "human readable".

...
2014-02-06 05:26:09+0200 563183 [614]: s2 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0

Sanlock can access the storage again - did you change something around this time?

Unfortunately, I can't recall. The node was generally in use between the 6th and the 18th, but all running VMs used directly attached iSCSI disks, instead of disks from the Storage Domain. I don't think any configuration changes were made to either the host or the storage at that time.
2014-02-18 14:22:16+0200 1632150 [614]: s3 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 14:38:16+0200 1633111 [614]: s4 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 16:13:07+0200 1638801 [613]: s5 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 19:28:09+0200 1650504 [614]: s6 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0

Last entry is from yesterday, while I just created a new disk.

What was the status of this host in the engine from 2014-02-06 05:24:10+0200 to 2014-02-18 14:22:16?

vdsm.log and engine.log for this time frame will make it more clear.

Nir

Host was up and running. The vdsm and engine logs are quite large, as we were running some VM migrations between the hosts. Any pointers at what to look for? For example, I noticed many entries in engine.log like this:

2014-02-24 17:02:26,635 INFO [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-46) OnVdsDuringFailureTimer of vds host2 entered. Attempts after -580

The "Attempts after" number continues to decrease by one every half hour.

There was probably some issue on 02.06; I can see this in the engine log:

2014-02-06 05:25:03,078 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-48) domain 3307f6fa-dd58-43db-ab23-b1fb299006c7:OtaverkkoDefault in problem. vds: host2
2014-02-06 05:26:34,126 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-48) Domain 3307f6fa-dd58-43db-ab23-b1fb299006c7:OtaverkkoDefault recovered from problem. vds: host2
2014-02-06 05:26:34,135 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-48) Domain 3307f6fa-dd58-43db-ab23-b1fb299006c7:OtaverkkoDefault has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer.

However, that was probably temporary. At the moment, if I force host2 to become SPM, the issue repeats - this time host1 doesn't pick up newly created disks (and again, no messages in /var/log/messages or sanlock). So, I don't think there is an issue with the connection to storage on one host.

What exactly is the mechanism by which the non-SPM nodes should pick up newly created disks? Is the engine sending them a command to refresh, or are they supposed to monitor the metadata of the Storage Domain volume group?

One warning that I keep seeing in vdsm logs on both nodes is this:

Thread-1617881::WARNING::2014-02-24 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded critical size: mdasize=134217728 mdafree=0

This was happening even when the storage domain was newly created and there were only one or two disks on it! I can't imagine how that could be "too many".

Thanks,
Boyan

----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Tuesday, February 25, 2014 11:53:45 AM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hello,
On 22.2.2014, 22:19, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Wednesday, February 19, 2014 7:18:36 PM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hello,
On 19.2.2014, 17:09, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: users@ovirt.org Sent: Tuesday, February 18, 2014 3:34:49 PM Subject: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Consequently, when creating/booting a VM with the said disk attached, the VM fails to start on host2, because host2 can't see the LV. Similarly, if the VM is started on host1, it fails to migrate to host2. Extract from host2 log is in the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
As far as I could track quickly the vdsm code, there is only call to lvs and not to lvscan or lvchange so the host2 LVM doesn't fully refresh.
lvs should see any change on the shared storage.
The only workaround so far has been to restart VDSM on host2, which makes it refresh all LVM data properly.
When vdsm starts, it calls multipath -r, which ensures that we see all physical volumes.
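You can do the same refresh by hand, to check whether it is the multipath layer or the lvm metadata that is stale - a rough sketch, using your SD's vg name:

# rescan multipath maps so new paths/devices become visible
multipath -r
# then query lvm again - this re-reads the vg metadata from disk
lvs -o vg_name,lv_name,tags 3307f6fa-dd58-43db-ab23-b1fb299006c7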
When is host2 supposed to pick up any newly created LVs in the SD VG? Any suggestions where the problem might be?
When you create a new lv on the shared storage, the new lv should be visible on the other host. Let's start by verifying that you do see the new lv after a disk was created.
Try this:
1. Create a new disk, and check the disk uuid in the engine ui
2. On another machine, run this command:
lvs -o vg_name,lv_name,tags
You can identify the new lv using tags, which should contain the new disk uuid.
If you don't see the new lv from the other host, please provide /var/log/messages and /var/log/sanlock.log.
Just tried that. The disk is not visible on the non-SPM node.
This means that storage is not accessible from this host.
Generally, the storage seems accessible ok. For example, if I restart the vdsmd, all volumes get picked up correctly (become visible in lvs output and VMs can be started with them).
Let's repeat this test, but now, if you do not see the new lv, please run:

multipath -r

And report the results.
Here's the full sanlock.log for that host:
...
0x7fc37c0008c0:0x7fc37c0008d0:0x7fc391f5f000 ioto 10 to_count 1
2014-02-06 05:24:10+0200 563065 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
Sanlock cannot write to the ids lockspace
Which line shows that sanlock can't write? The messages are not very "human readable".
The one above my comment, at 2014-02-06 05:24:10+0200.

I suggest setting the sanlock debug level on the sanlock log to get more detailed output. Edit /etc/sysconfig/sanlock and add:

# -L 7: use debug level logging to sanlock log file
SANLOCKOPTS="$SANLOCKOPTS -L 7"
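sanlock reads these options at startup, so the service needs a restart for them to take effect, e.g.:

systemctl restart sanlock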
Last entry is from yesterday, while I just created a new disk.
What was the status of this host in the engine from 2014-02-06 05:24:10+0200 to 2014-02-18 14:22:16?
vdsm.log and engine.log for this time frame will make it more clear.
Host was up and running. The vdsm and engine logs are quite large, as we were running some VM migrations between the hosts. Any pointers at what to look for? For example, I noticed many entries in engine.log like this:
It will be hard to make any progress without the logs.
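If the full files are too big, a slice around the interesting window would already help - for example something like:

sed -n '/2014-02-06 05:2/,/2014-02-06 05:3/p' /var/log/vdsm/vdsm.log

(adjust the timestamp patterns as needed; the path is the default vdsm log location).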
One warning that I keep seeing in vdsm logs on both nodes is this:
Thread-1617881::WARNING::2014-02-24 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded critical size: mdasize=134217728 mdafree=0
Can you share the output of the command below?

lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name

I suggest that you open a bug and attach engine.log, /var/log/messages, vdsm.log and sanlock.log there. Please also give detailed info on the host OS, vdsm version etc.

Nir
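P.S. For watching just the metadata usage from the warning you quoted, a shorter form should also work:

vgs -o vg_name,vg_mda_size,vg_mda_free 3307f6fa-dd58-43db-ab23-b1fb299006c7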

On 28.2.2014, 20:05, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Tuesday, February 25, 2014 11:53:45 AM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on= some nodes
Hello,
On 22.2.2014, 22:19, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Wednesday, February 19, 2014 7:18:36 PM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated = on some nodes
Hello,
On 19.2.2014, 17:09, Nir Soffer wrote:
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: users@ovirt.org Sent: Tuesday, February 18, 2014 3:34:49 PM Subject: [Users] SD Disk's Logical Volume not visible/activated on= some nodes
Consequently, when creating/booting a VM with the said disk attached, the VM fails to start on host2, because host2 can't see the LV. Similarly, if the VM is started on=
host1, it fails to migrate to host2. Extract from host2 log is in = the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
As far as I could track quickly the vdsm code, there is only call = to lvs and not to lvscan or lvchange so the host2 LVM doesn't fully refre= sh. =20 lvs should see any change on the shared storage. =20 The only workaround so far has been to restart VDSM on host2, whic= h makes it refresh all LVM data properly. =20 When vdsm starts, it calls multipath -r, which ensure that we see all p= hysical volumes. =20
When is host2 supposed to pick up any newly created LVs in the SD = VG? Any suggestions where the problem might be?
When you create a new lv on the shared storage, the new lv should b= e visible on the other host. Lets start by verifying that you do see the new lv after a disk was created.
Try this:
1. Create a new disk, and check the disk uuid in the engine ui 2. On another machine, run this command:
lvs -o vg_name,lv_name,tags
You can identify the new lv using tags, which should contain the ne= w disk uuid.
If you don't see the new lv from the other host, please provide /var/log/messages and /var/log/sanlock.log.
Just tried that. The disk is not visible on the non-SPM node.
This means that storage is not accessible from this host.
Generally, the storage seems accessible ok. For example, if I restart the vdsmd, all volumes get picked up correctly (become visible in lvs output and VMs can be started with them). =20 Lests repeat this test, but now, if you do not see the new lv, please=20 run: =20 multipath -r =20 And report the results. =20
Running multipath -r helped and the disk was properly picked up by the second host.

Is running multipath -r safe while the host is not in maintenance mode? If yes, as a temporary workaround I can patch vdsmd to run multipath -r when e.g. monitoring the storage domain.
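As an illustration only - this is my own sketch, not something vdsm does - the crude version of that workaround could even live in cron until a real fix lands:

    # /etc/cron.d/multipath-rescan (hypothetical): rescan multipath maps every 5 minutes
    */5 * * * * root /sbin/multipath -r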
> Can you share the output of the command below?
>
>     lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
Here's the output for both hosts.

host1:

    [root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
      LV UUID                                LV                                   Attr      LSize   VFree   Ext     #Ext Free LV Tags                                                                               VMdaSize VMdaFree #LV #PV
      jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596 917 IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m 0 13 2
      ltJOs7-T7yE-faQd-MkD8-PS5F-wGXU-nmDQrC 3e147e83-0b0c-4cb8-b46f-67c8dc62756e -wi------ 7.00g 114.62g 128.00m 1596 917 IU_bdf2c305-194c-45b5-bd62-1bf01b05e286,MD_7,PU_00000000-0000-0000-0000-000000000000 128.00m 0 13 2
      YxR9Df-YfvQ-0W5s-5CnC-JGN6-gQug-vi5R4n 57668a89-c5ca-4231-b18a-dd0607e459a5 -wi------ 10.00g 114.62g 128.00m 1596 917 PU_00000000-0000-0000-0000-000000000000,MD_10,IU_90f2b1e5-59f7-4bed-97a9-f02d683f07f8 128.00m 0 13 2
      FeqoWr-aYs1-N4M6-Miss-lMec-afKk-7XJKpf 637bdaf7-61cb-4feb-838d-24355367dcec -wi------ 7.00g 114.62g 128.00m 1596 917 PU_00000000-0000-0000-0000-000000000000,IU_e2dafaea-71f5-4f60-abc1-0c7bf2feb223,MD_9 128.00m 0 13 2
      9xWhSv-g5ps-utzj-eXfz-G8Zz-BdkN-hN8wGE 65a711a2-d08a-496a-93de-e47aec19cfb1 -wi------ 12.00g 114.62g 128.00m 1596 917 IU_91d741e3-19b7-40bd-8b96-de48caa161f1,PU_00000000-0000-0000-0000-000000000000,MD_8 128.00m 0 13 2
      Isn11q-cokl-op59-vQcl-l4Rc-Uaeg-1lcGqd f4231952-76c5-4764-9c8b-ac73492ac465 -wi-ao--- 3.00g 114.62g 128.00m 1596 917 MD_4,PU_00000000-0000-0000-0000-000000000000,IU_0227da98-34b2-4b0c-b083-d42e7b760036 128.00m 0 13 2
      u80jcl-PPVX-nRp6-KDqH-ZcRI-IAuU-JMrqK0 f8f7d099-3e54-45c0-8f4b-3ee25901522e -wi-ao--- 40.00g 114.62g 128.00m 1596 917 MD_6,PU_00000000-0000-0000-0000-000000000000,IU_d5a37063-66bb-4bf1-9269-eae7f136d7e1 128.00m 0 13 2
      h9SB9I-J9Dm-mTIL-lT6T-ZxvF-8mHj-eAs59A ids -wi-ao--- 128.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      rrycOM-WkBu-1VLg-RYrd-z7bO-8nS9-CkLpqc inbox -wi-a---- 128.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      IDjyMT-Quxu-NxG8-dKfz-CSlc-i1EQ-o8rlSb leases -wi-a---- 2.00g 114.62g 128.00m 1596 917 128.00m 0 13 2
      sTxbVK-5YDT-H7nw-53b9-Av7J-YIzP-res2b1 master -wi-ao--- 1.00g 114.62g 128.00m 1596 917 128.00m 0 13 2
      2lwYGv-cZop-7Jj9-vYFI-RDEZ-1hKy-oXNbei metadata -wi-a---- 512.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      NznWRR-kLhp-HEMw-0c7P-9XIj-YiXb-oYFovC outbox -wi-a---- 128.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      JTDXAM-vMQI-UY0o-Axy9-UQKv-7Hl7-M9XSwQ mysql -wi-a---- 40.00g 0 4.00m 10241 0 188.00k 0 1 1
      nQHcrF-KWhE-s8pY-90bc-jPSN-3qZR-pxHe0O iso-domain -wi-ao--- 10.00g 97.80g 4.00m 34596 25036 1020.00k 0 3 2
      qElr6v-lPNE-Kvc5-0g9W-I41J-dsBD-7QfZLv root -wi-ao--- 19.53g 97.80g 4.00m 34596 25036 1020.00k 0 3 2
      FSaRPB-WiZo-Ddqo-hPsB-jUe9-bf4v-eJBTJW swap -wi-ao--- 7.81g 97.80g 4.00m 34596 25036 1020.00k 0 3 2

host2:

    [root@host2 log]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
      LV UUID                                LV                                   Attr      LSize   VFree   Ext     #Ext Free LV Tags                                                                               VMdaSize VMdaFree #LV #PV
      jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-a---- 2.00g 114.62g 128.00m 1596 917 IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m 0 13 2
      ltJOs7-T7yE-faQd-MkD8-PS5F-wGXU-nmDQrC 3e147e83-0b0c-4cb8-b46f-67c8dc62756e -wi-ao--- 7.00g 114.62g 128.00m 1596 917 IU_bdf2c305-194c-45b5-bd62-1bf01b05e286,MD_7,PU_00000000-0000-0000-0000-000000000000 128.00m 0 13 2
      YxR9Df-YfvQ-0W5s-5CnC-JGN6-gQug-vi5R4n 57668a89-c5ca-4231-b18a-dd0607e459a5 -wi-a---- 10.00g 114.62g 128.00m 1596 917 PU_00000000-0000-0000-0000-000000000000,MD_10,IU_90f2b1e5-59f7-4bed-97a9-f02d683f07f8 128.00m 0 13 2
      FeqoWr-aYs1-N4M6-Miss-lMec-afKk-7XJKpf 637bdaf7-61cb-4feb-838d-24355367dcec -wi-ao--- 7.00g 114.62g 128.00m 1596 917 PU_00000000-0000-0000-0000-000000000000,IU_e2dafaea-71f5-4f60-abc1-0c7bf2feb223,MD_9 128.00m 0 13 2
      9xWhSv-g5ps-utzj-eXfz-G8Zz-BdkN-hN8wGE 65a711a2-d08a-496a-93de-e47aec19cfb1 -wi-ao--- 12.00g 114.62g 128.00m 1596 917 IU_91d741e3-19b7-40bd-8b96-de48caa161f1,PU_00000000-0000-0000-0000-000000000000,MD_8 128.00m 0 13 2
      Isn11q-cokl-op59-vQcl-l4Rc-Uaeg-1lcGqd f4231952-76c5-4764-9c8b-ac73492ac465 -wi-a---- 3.00g 114.62g 128.00m 1596 917 MD_4,PU_00000000-0000-0000-0000-000000000000,IU_0227da98-34b2-4b0c-b083-d42e7b760036 128.00m 0 13 2
      u80jcl-PPVX-nRp6-KDqH-ZcRI-IAuU-JMrqK0 f8f7d099-3e54-45c0-8f4b-3ee25901522e -wi-a---- 40.00g 114.62g 128.00m 1596 917 MD_6,PU_00000000-0000-0000-0000-000000000000,IU_d5a37063-66bb-4bf1-9269-eae7f136d7e1 128.00m 0 13 2
      h9SB9I-J9Dm-mTIL-lT6T-ZxvF-8mHj-eAs59A ids -wi-ao--- 128.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      rrycOM-WkBu-1VLg-RYrd-z7bO-8nS9-CkLpqc inbox -wi-a---- 128.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      IDjyMT-Quxu-NxG8-dKfz-CSlc-i1EQ-o8rlSb leases -wi-a---- 2.00g 114.62g 128.00m 1596 917 128.00m 0 13 2
      sTxbVK-5YDT-H7nw-53b9-Av7J-YIzP-res2b1 master -wi-a---- 1.00g 114.62g 128.00m 1596 917 128.00m 0 13 2
      2lwYGv-cZop-7Jj9-vYFI-RDEZ-1hKy-oXNbei metadata -wi-a---- 512.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      NznWRR-kLhp-HEMw-0c7P-9XIj-YiXb-oYFovC outbox -wi-a---- 128.00m 114.62g 128.00m 1596 917 128.00m 0 13 2
      G2zjlF-QO3X-tznc-BZE8-mY63-8I9n-gZQ7k5 root -wi-ao--- 19.53g 4.00m 4.00m 7001 1 1020.00k 0 2 1
      vtuPP3-itfy-DtFX-j2gJ-wZBI-3RpF-fsWBJc swap -wi-ao--- 7.81g 4.00m 4.00m 7001 1 1020.00k 0 2 1
> I suggest that you open a bug and attach engine.log, /var/log/messages, vdsm.log and sanlock.log there.
>
> Please also give detailed info on the host OS, vdsm version etc.
I'll gather the logs and create a bug report. Thanks!

Boyan

Hi Zdenek, can you look into this strange incident?

When a user creates a disk on one host (creating a new lv), the lv is not seen on another host in the cluster.

Calling multipath -r causes the new lv to appear on the other host.

Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but unusual.

----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: users@ovirt.org
Sent: Monday, March 3, 2014 9:51:05 AM
Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
> Running multipath -r helped and the disk was properly picked up by the second host.
>
> Is running multipath -r safe while the host is not in maintenance mode?
It should be safe; vdsm uses it in some cases.
> If yes, as a temporary workaround I can patch vdsmd to run multipath -r when e.g. monitoring the storage domain.
I suggested running multipath as a debugging aid; normally this is not needed. You should see the lv on the shared storage without running multipath.

Zdenek, can you explain this?
> One warning that I keep seeing in vdsm logs on both nodes is this:
>
>     Thread-1617881::WARNING::2014-02-24 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded critical size: mdasize=134217728 mdafree=0
>
> Here's the output for both hosts. [...]
>
>     jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596 917 IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m 0 13 2
This looks wrong - your vg_mda_free is zero - as vdsm complains.

Zdenek, how can we debug this further?
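As a quick check on the metadata area itself - a sketch using stock lvm reporting fields, nothing vdsm-specific - the per-VG mda numbers can be pulled directly:

    # report metadata area count/size/free for the storage domain VG
    vgs -o vg_name,vg_mda_count,vg_mda_size,vg_mda_free 3307f6fa-dd58-43db-ab23-b1fb299006c7

Nir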

----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com>
To: "Boyan Tabakov" <blade@alslayer.net>
Cc: users@ovirt.org, "Zdenek Kabelac" <zkabelac@redhat.com>
Sent: Monday, March 3, 2014 9:39:47 PM
Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes

> This looks wrong - your vg_mda_free is zero - as vdsm complains.
>
> Zdenek, how can we debug this further?
I see the same issue on Fedora 19. Can you share with us the output of:

    cat /etc/redhat-release
    uname -a
    lvm version

Nir

On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
> I see the same issue on Fedora 19. Can you share with us the output of:
>
>     cat /etc/redhat-release
>     uname -a
>     lvm version
$ cat /etc/redhat-release
Fedora release 19 (Schrödinger's Cat)

$ uname -a
Linux blizzard.mgmt.futurice.com 3.12.6-200.fc19.x86_64.debug #1 SMP Mon Dec 23 16:24:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ lvm version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.02.77 (2012-10-15)
  Driver version:  4.26.0

----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: users@ovirt.org
Sent: Tuesday, March 4, 2014 3:53:24 PM
Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes

> $ lvm version
>   LVM version:     2.02.98(2) (2012-10-15)
>   Library version: 1.02.77 (2012-10-15)
>   Driver version:  4.26.0
Patch http://gerrit.ovirt.org/25408 should solve this issue.

It may also solve the other issue with the missing lv - I could not reproduce it yet.

Can you try to apply this patch and report the results?
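If it helps, a change can usually be pulled straight from gerrit onto a host - a sketch only; the patchset number in the ref is an assumption, so check the change page for the exact one:

    # in a vdsm source tree: fetch patchset 1 of change 25408 and apply it
    git fetch http://gerrit.ovirt.org/vdsm refs/changes/08/25408/1
    git cherry-pick FETCH_HEAD

Thanks,
Nir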

Hello Nir,

On Wed Mar 5 14:37:17 2014, Nir Soffer wrote:
> Patch http://gerrit.ovirt.org/25408 should solve this issue.
>
> It may also solve the other issue with the missing lv - I could not reproduce it yet.
>
> Can you try to apply this patch and report the results?
This patch helped, indeed! I tried it on the non-SPM node (as that's the node that I can currently easily put in maintenance) and the node started picking up newly created volumes correctly. I also set use_lvmetad to 0 in the main lvm.conf, because without it manually running e.g. lvs was still using the metadata daemon.

I can't confirm yet that this helps with the metadata volume warning, as that warning appears only on the SPM. I'll be able to put the SPM node in maintenance soon and will report later.

This issue on Fedora makes me think - is Fedora still a fully supported platform?
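For reference, the lvm.conf change I mean is just this (the section placement is how I understand lvm 2.02.98's config; double-check against your own lvm.conf):

    # /etc/lvm/lvm.conf - keep lvm tools from using the lvmetad cache
    global {
        use_lvmetad = 0
    }

and, assuming Fedora's stock unit names, the daemon itself can then be stopped:

    systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service

Best regards,
Boyan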

----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: users@ovirt.org
Sent: Wednesday, March 5, 2014 3:38:25 PM
Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
> This patch helped, indeed! [...]
>
> This issue on Fedora makes me think - is Fedora still a fully supported platform?
It is supported, but probably not tested properly.

Nir

On 5.3.2014, 16:01, Nir Soffer wrote:
> It is supported, but probably not tested properly.
Alright! Thanks a lot for the help!

BR,
Boyan
participants (2)

- Boyan Tabakov
- Nir Soffer