Why filetransaction needs to encode the content to utf-8?

Hi Nir and all, In [1] you added line 151, to encode the contents to utf-8. Do you remember why you needed that? What happens if I remove this line? I am working on [2]. It fails on that line, because the current content, if organization name is unicode, has a UTF-8 encoded string already, but is a python str (not unicode). Tried patching otopi [3], did a few attempts (some of them also pushed there, check the different patchsets), but none worked. So I am going to patch postinstall file generation instead [4], but I don't like this. Any hints are welcome. Thanks and best regards, [1] https://gerrit.ovirt.org/#/c/92435/1/src/otopi/filetransaction.py [2] https://bugzilla.redhat.com/show_bug.cgi?id=1729511 [3] https://gerrit.ovirt.org/102085 [4] https://gerrit.ovirt.org/102089 -- Didi

Not sure if its helpful, but didn't see any other reply, so anyway: Change in [1] assumes Unicode Sandwich [5] where the given content is assumed to be already encoded into utf-8 string (unicode object in py2, str object in py3, six.text_type for both py versions) and then encoded back into utf-8 bytes when its time to write it back. In your case the certificate contents was read as plain str in python 2 which is by default assumed to have ASCII encoding, so the top bread slice of the sandwich was missing and it got the jam spilling (exception). Line of [1] is mostly for sake of python 3, where we cannot treat bytes and strs the same any more since default encoding for py3 strs is unicode and not ascii like py2. So if you remove [1] you'll probably make some problems for py3 (like comparing bytes with strings, TypeErrors, etc). [5] https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-a... On Tue, Jul 23, 2019 at 1:21 PM Yedidyah Bar David <didi@redhat.com> wrote:
Hi Nir and all,
In [1] you added line 151, to encode the contents to utf-8. Do you remember why you needed that? What happens if I remove this line?
I am working on [2]. It fails on that line, because the current content, if organization name is unicode, has a UTF-8 encoded string already, but is a python str (not unicode). Tried patching otopi [3], did a few attempts (some of them also pushed there, check the different patchsets), but none worked. So I am going to patch postinstall file generation instead [4], but I don't like this.
Any hints are welcome. Thanks and best regards,
[1] https://gerrit.ovirt.org/#/c/92435/1/src/otopi/filetransaction.py
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1729511
[3] https://gerrit.ovirt.org/102085
[4] https://gerrit.ovirt.org/102089 -- Didi _______________________________________________ Devel mailing list -- devel@ovirt.org To unsubscribe send an email to devel-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/3JZCK5POIBNEDI...

On Tue, Jul 23, 2019 at 1:11 PM Yedidyah Bar David <didi@redhat.com> wrote:
Hi Nir and all,
In [1] you added line 151, to encode the contents to utf-8. Do you remember why you needed that? What happens if I remove this line?
I added it because without it installation on Fedora 28 was broken. I don't remember what was the error but probably the content was unicode instead of bytes. I am working on [2]. It fails on that line, because the current
content, if organization name is unicode, has a UTF-8 encoded string already, but is a python str (not unicode).
If the content is not unicode, why it goes to the unicode branch? if binary: self._content = content else: if isinstance(content, list) or isinstance(content, tuple): self._content = '\n'.join([common.toStr(i) for i in content]) if content: self._content += '\n' else: self._content = common.toStr(content) if not self._content.endswith('\n'): self._content += '\n' self._content = self._content.encode("utf-8") It looks like the binary flag value is wrong, or common.toStr() is encoding the values to bytes. The common error in python 2 is:
text = u"\u05d0" encoded = text.encode("utf-8") encoded.encode("utf-8") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal not in range(128)
But:
text = u"ascii" encoded = text.encode("utf-8") encoded.encode("utf-8") 'ascii'
Which seems to be the case in [2]. So if common.toStr() is encoding the values, there is no need to encode the values again in line 151. Tried patching otopi [3],
did a few attempts (some of them also pushed there, check the different patchsets), but none worked. So I am going to patch postinstall file generation instead [4], but I don't like this.
Any hints are welcome. Thanks and best regards,
[1] https://gerrit.ovirt.org/#/c/92435/1/src/otopi/filetransaction.py
I don't understand what are you trying to do there, however codecs.open() is not needed to read utf-8 data. with open(path, "rb") as f: text = f.read().decode("utf-8")
I agree that this is not the way. Nir
participants (3)
-
Amit Bawer
-
Nir Soffer
-
Yedidyah Bar David