On Sat, Sep 28, 2019 at 11:04 PM Rik Theys <Rik.Theys@esat.kuleuven.be> wrote:

Hi Nir,

Thank you for your time.

On 9/27/19 4:27 PM, Nir Soffer wrote:


On Fri, Sep 27, 2019, 12:37 Rik Theys <Rik.Theys@esat.kuleuven.be> wrote:
Hi,

After upgrading to 4.3.6, my storage domain can no longer be activated, rendering my data center useless.

My storage domain is local storage on a filesystem backed by VDO/LVM. It seems 4.3.6 has added support for 4k storage.
My VDO does not have the 'emulate512' flag set.

This configuration was not supported before 4.3.6. Various operations may fail when
reading from or writing to such storage.
I was not aware of this when I set it up as I did not expect this to influence a setup where oVirt uses local storage (a file system location).

4.3.6 detects the storage block size, creates compatible storage domain metadata, and
considers the block size when accessing storage.
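
To illustrate what the detection means in practice, here is a minimal sketch (not vdsm's or
ioprocess' actual code; the probe-by-EINVAL approach and the function name are mine): an
O_DIRECT read with a 512-byte buffer fails with EINVAL on 4k-only storage such as VDO without
emulate512, while a 4096-byte read succeeds.

    import mmap
    import os

    def probe_block_size(path):
        """Guess the logical block size backing `path` by attempting
        O_DIRECT reads of increasing sizes (sketch only)."""
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            for size in (512, 4096):
                buf = mmap.mmap(-1, size)  # anonymous mmap is page-aligned
                try:
                    os.preadv(fd, [buf], 0)
                    return size
                except OSError:
                    continue  # alignment rejected, try the next size
            raise RuntimeError("could not detect the block size")
        finally:
            os.close(fd)

    # e.g. probe_block_size("/data/images/<sd_uuid>/dom_md/metadata")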
 
I've tried downgrading all packages on the host to the previous versions (with ioprocess 1.2), but this does not seem to make any difference.

Downgrading should solve your issue, but without any logs we only guess.

I was able to work around my issue by downgrading to ioprocess 1.1 (and vdsm-4.30.24). Downgrading only to ioprocess 1.2 did not solve my issue. With ioprocess downgraded to 1.1, I did not have to downgrade the engine (still on 4.3.6).

ioprocess 1.1 is not recommended; you really want to use 1.3.0.

I think I now have a better understanding of what triggered this.

During a nightly yum-cron run, the packages on the host were upgraded to ioprocess 1.3 and vdsm 4.30.33. At this point, the engine log started to report:

2019-09-27 03:40:27,472+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engine-Thread-384418) [695f38cc] Executing with domain map: {6bdf1a0d-274b-4195-8ff5-a5c002ea1a77=active}
2019-09-27 03:40:27,646+02 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engine-Thread-384418) [695f38cc] Unexpected return value: Status [code=348, message=Block size does not match storage block size: 'block_size=512, storage_block_size=4096']

This means that when activating the storage domain, vdsm detected that the storage block size
is 4k, but the domain metadata reports a block size of 512.

This combination may partly work for localfs domains since we don't use sanlock with local storage,
vdsm does not use direct I/O when writing to storage, and it always uses a 4k block size when
reading metadata from storage.

Note that with older ovirt-imageio (< 1.5.2), image uploads and downloads may fail when using 4k storage.
In recent ovirt-imageio we detect and use the correct block size.

2019-09-27 03:40:27,646+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engine-Thread-384418) [695f38cc] FINISH, ConnectStoragePoolVDSCommand, return: , log id: 483c7a17

I did not notice at first that this was a storage-related issue and assumed it might get resolved by also upgrading the engine. So in the morning I upgraded the engine to 4.3.6, but this did not resolve my issue.

I then found the above error in the engine log. In the release notes of 4.3.6 I read about the 4k support.

I then downgraded ioprocess (and vdsm) to ioprocess 1.2, but that did not solve my issue either. This is when I contacted the list with my question.

Afterwards I found in the ioprocess rpm changelog that (partial?) 4k support was also present in 1.2. I kept downgrading until I reached ioprocess 1.1 (without 4k support), and at that point I could re-attach my storage domain.

You mention above that 4.3.6 will detect the block size and configure the metadata on the storage domain? I've checked the dom_md/metadata file and it shows:

ALIGNMENT=1048576
BLOCK_SIZE=512
CLASS=Data
DESCRIPTION=studvirt1-Local
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=studvirt1-Local
POOL_DOMAINS=6bdf1a0d-274b-4195-8ff5-a5c002ea1a77:Active
POOL_SPM_ID=-1
POOL_SPM_LVER=-1
POOL_UUID=085f02e8-c3b4-4cef-a35c-e357a86eec0c
REMOTE_PATH=/data/images
ROLE=Master
SDUUID=6bdf1a0d-274b-4195-8ff5-a5c002ea1a77
TYPE=LOCALFS
VERSION=5
_SHA_CKSUM=9dde06bbc9f2316efc141565738ff32037b1ff66

So you have a v5 localfs storage domain - because we don't use leases, this domain should work
with 4.3.6 if you change this line in the domain metadata to:

    BLOCK_SIZE=4096

To modify the line, you have to delete the checksum:

    _SHA_CKSUM=9dde06bbc9f2316efc141565738ff32037b1ff66

I assume that at this point it works because ioprocess 1.1 does not report the block size to the engine (as it doesn't support this option)?

I think it works because ioprocess 1.1 has a bug where it does not use direct I/O when writing
files. This fooled vdsm into believing you have a block size of 512 bytes.
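
To make the difference concrete, a small demo sketch (not ioprocess code; the test path is a
placeholder): a buffered 512-byte write succeeds even on 4k-only storage because the page cache
absorbs the unaligned I/O, while the same write with O_DIRECT fails with EINVAL.

    import mmap
    import os

    path = "/data/images/block_size_test"  # placeholder file on the 4k storage

    # Buffered write: 512 bytes succeed, the page cache hides the 4k requirement.
    with open(path, "wb") as f:
        f.write(b"x" * 512)

    # Direct write: the same 512 bytes are rejected with EINVAL on 4k storage,
    # which is how the real block size becomes visible.
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
    try:
        buf = mmap.mmap(-1, 512)  # page-aligned buffer
        buf.write(b"x" * 512)
        os.pwritev(fd, [buf], 0)  # raises OSError(EINVAL) here
    finally:
        os.close(fd)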

Can I update the storage domain metadata manually to report 4096 instead?

I also noticed that the storage_domain_static table has the block_size stored. Should I update this field at the same time as I update the metadata file?

Yes, I think it should work.
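
If you end up checking or fixing the database as well, something along these lines could be used.
This is only a sketch under assumptions: the column names (storage_domain_static.block_size, with
the domain UUID in the id column) are taken from what you describe plus my guess at the key, and
the credentials placeholder has to be replaced with the real ones (typically found under
/etc/ovirt-engine/engine.conf.d/ on the engine host). Take a database backup before any update.

    import psycopg2

    SD_UUID = "6bdf1a0d-274b-4195-8ff5-a5c002ea1a77"

    conn = psycopg2.connect(dbname="engine", user="engine",
                            password="<engine-db-password>", host="localhost")
    with conn, conn.cursor() as cur:
        # Check what the engine currently has for this domain.
        cur.execute("SELECT block_size FROM storage_domain_static WHERE id = %s",
                    (SD_UUID,))
        print(cur.fetchone())
        # After a DB backup, the update itself would look like:
        # cur.execute("UPDATE storage_domain_static SET block_size = 4096 "
        #             "WHERE id = %s", (SD_UUID,))
    conn.close()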

If the engine log and database dump is still needed to better understand the issue, I will send it on Monday.

The engine stores the block size reported by vdsm. Once we get the system up with your 4k storage domain,
we can check that the engine reports the right value and update it if needed.
 
I think what you should do is:

1. Back up the storage domain metadata:
/path/to/domain/domain-uuid/dom_md

2. Deactivate the storage domain (from engine)

3. Edit the metadata file (a rough sketch of this edit is included below, after the steps):
- change BLOCK_SIZE to 4096
- delete the checksum line (_SHA_CKSUM=9dde06bbc9f2316efc141565738ff32037b1ff66)

4. Activate the domain

With vdsm < 4.3.6, the domain should become active, since older versions ignore the block size.

5. Upgrade back to 4.3.6

The system should detect the block size and work normally.

6. File ovirt bug for this issue

At a minimum, we need to document how to fix the storage domain manually.

We should also consider checking storage domain metadata during upgrades. I think it would be a
better experience if the upgrade failed and you were left with a working system on the older version.
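
As a rough illustration of steps 1 and 3 (a sketch only, meant to run after step 2 with the
domain deactivated; the path is assumed from REMOTE_PATH in your metadata dump, so adjust it to
where the domain actually lives and keep the backup around):

    import os
    import shutil

    SD_UUID = "6bdf1a0d-274b-4195-8ff5-a5c002ea1a77"
    dom_md = "/data/images/%s/dom_md" % SD_UUID  # assumed location, verify first
    metadata = os.path.join(dom_md, "metadata")

    # Step 1: keep a backup of the whole dom_md directory.
    shutil.copytree(dom_md, dom_md + ".bak")

    # Step 3: change BLOCK_SIZE and drop the checksum line.
    with open(metadata) as f:
        lines = f.read().splitlines()

    fixed = []
    for line in lines:
        if line.startswith("BLOCK_SIZE="):
            fixed.append("BLOCK_SIZE=4096")
        elif line.startswith("_SHA_CKSUM="):
            continue  # delete the checksum line, as described above
        else:
            fixed.append(line)

    with open(metadata, "w") as f:
        f.write("\n".join(fixed) + "\n")
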
Should I also downgrade the engine to 4.3.5 to get this to work again? I expected the downgrade of the host to be sufficient.

As an alternative I guess I could enable the emulate512 flag on VDO, but I cannot find how to do this on an existing VDO volume. Is this possible?

Please share more data so we can understand the failure:

- complete vdsm log showing the failure to activate the domain
  - with 4.3.6
  - with 4.3.5 (after you downgraded)
- contents of /rhev/data-center/mnt/_<domaindir>/domain-uuid/dom_md/metadata
  (assuming your local domain mount is /domaindir)  
- engine db dump

Nir
 

Regards,
Rik


On 9/26/19 4:58 PM, Sandro Bonazzola wrote:

The oVirt Project is pleased to announce the general availability of oVirt 4.3.6 as of September 26th, 2019.

 

This update is the sixth in a series of stabilization updates to the 4.3 series.

 

This release is available now on x86_64 architecture for:

* Red Hat Enterprise Linux 7.7 or later (but < 8)

* CentOS Linux (or similar) 7.7 or later (but < 8)

 

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures for:

* Red Hat Enterprise Linux 7.7 or later (but < 8)

* CentOS Linux (or similar) 7.7 or later (but < 8)

* oVirt Node 4.3 (available for x86_64 only)

 

Due to Fedora 28 now being at end of life, this release is missing the experimental tech preview for the x86_64 and s390x architectures on Fedora 28.

We are working on Fedora 29 and 30 support and may re-introduce experimental support for Fedora in the next release.

 

See the release notes [1] for installation / upgrade instructions and a list of new features and bugs fixed.

 

Notes:

- oVirt Appliance is already available

- oVirt Node is already available[2]


oVirt Node and Appliance have been updated including:

- oVirt 4.3.6: http://www.ovirt.org/release/4.3.6/

- Wildfly 17.0.1: https://wildfly.org/news/2019/07/07/WildFly-1701-Released/

- Latest CentOS 7.7 updates including:

- latest CentOS Virt and Storage SIG updates


 

Given the amount of security fixes provided by this release, upgrade is recommended as soon as practical.


Additional Resources:

* Read more about the oVirt 4.3.6 release highlights: http://www.ovirt.org/release/4.3.6/

* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt

* Check out the latest project news on the oVirt blog: http://www.ovirt.org/blog/

 

[1] http://www.ovirt.org/release/4.3.6/

[2] http://resources.ovirt.org/pub/ovirt-4.3/iso/


--

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AY66CEQHHYOVBWAQQYYSPEG5DXEIUAAT/


-- 
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440  - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JPIYWV2OUNLNHUY6EU7YZI2RYFW2SW5L/