[ovirt-devel] Re: Why filetransaction needs to encode the content to utf-8?

23 Jul 2019

      On Tue, Jul 23, 2019 at 1:11 PM Yedidyah Bar David <didi@redhat.com> wrote:
...
Hi Nir and all,
In [1] you added line 151, to encode the contents to utf-8. Do you
remember why you needed that? What happens if I remove this line?
I added it because without it installation on Fedora 28 was broken. I don't
remember
what was the error but probably the content was unicode instead of bytes.

I am working on [2]. It fails on that line, because the current
...
content, if organization name is unicode, has a UTF-8 encoded string
already, but is a python str (not unicode).
If the content is not unicode, why it goes to the unicode branch?

        if binary:
            self._content = content
        else:
            if isinstance(content, list) or isinstance(content, tuple):
                self._content = '\n'.join([common.toStr(i) for i in
content])
                if content:
                    self._content += '\n'
            else:
                self._content = common.toStr(content)
                if not self._content.endswith('\n'):
                    self._content += '\n'
            self._content = self._content.encode("utf-8")

It looks like the binary flag value is wrong, or common.toStr() is encoding
the values to bytes.

The common error in python 2 is:
...
...
...
text = u"\u05d0"
encoded = text.encode("utf-8")
encoded.encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0:
ordinal not in range(128)
But:
...
...
...
text = u"ascii"
encoded = text.encode("utf-8")
encoded.encode("utf-8")
'ascii'
Which seems to be the case in [2].

So if common.toStr() is encoding the values, there is no need to encode the
values
again in line 151.

Tried patching otopi [3],
...
did a few attempts (some of them also pushed there, check the
different patchsets), but none worked. So I am going to patch
postinstall file generation instead [4], but I don't like this.
Any hints are welcome. Thanks and best regards,
[1] https://gerrit.ovirt.org/#/c/92435/1/src/otopi/filetransaction.py
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1729511
[3] https://gerrit.ovirt.org/102085
I don't understand what are you trying to do there, however codecs.open()
is not needed
to read utf-8 data.

    with open(path, "rb") as f:
        text = f.read().decode("utf-8")
...
[4] https://gerrit.ovirt.org/102089
I agree that this is not the way.

Nir

[ovirt-devel] Re: Why filetransaction needs to encode the content to utf-8?

Nir Soffer