[ovirt-devel] Re: unicode_literals vs "u''" vs six.text_type

1 Sep 2019


      On Sun, Sep 1, 2019 at 2:34 PM Yedidyah Bar David <didi@redhat.com> wrote:
...
On Sun, Sep 1, 2019 at 1:20 PM Amit Bawer <abawer@redhat.com> wrote:
...
On Sun, Sep 1, 2019 at 10:28 AM Yedidyah Bar David <didi@redhat.com>
wrote:
...
...
Hi all,
That's a "sub-thread" of "unicode sandwich in otopi/engine-setup".
I was advised to use 'six.text_type()' over "u''". I did read [1],
and eventually decided that my own preference is to just add "u"
prefix. Reasoning is inside [1].
Do people have different preferences/reasoning they want to share?
Do people think we should have project-wide policy re this?
Since our code is currently transitioning from py2 to py2/py3, and not
from py3 to py3/py2, it would be fair to assume that most
already existing string literals in it contain only ascii symbols, unless
explicitly stated otherwise;
so IMO it would only make sense to enforce 'u' on newly added literals
which involve non-ascii symbols, as long as py2 is still alive.
Not exactly.
Suppose (mostly correctly) that the code didn't employ the "unicode
sandwich" technique so far. Meaning, much was handled as python2 str
objects containing utf-8-encoded strings, and converted to unicode
objects mainly as needed/noted/considered. Suppose that x is a
variable that used to contain such an str, usually ascii-only, but
sometimes perhaps utf-8. Now, this:
'x: {}'.format(x)
would work, and replace {} with the contents of x, and return a
python2 str, utf-8-encoded if x is utf-8. But if now x contains a
unicode object (because we decided to follow the sandwich approach,
and decode all utf-8 during input), it would fail, if x is not
ascii-only. Adding u to 'x: {}' solves this.
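A minimal sketch of the case described above, written so that it runs on both py2 and py3. The function names are hypothetical. Since py3 never performs the implicit conversion, the second function only emulates the step py2 takes when a str template meets a unicode value, namely encoding that value with the ascii codec.

```python
def format_with_text_template(x):
    """Format with a u'' template; the result is text on py2 and py3."""
    return u'x: {}'.format(x)


def implicit_ascii_encode(x):
    """Emulate the implicit step py2 takes for 'x: {}'.format(x).

    With a str template and a unicode x, py2 first has to turn x into a
    byte string, and it does so with the ascii codec.
    """
    return x.encode('ascii')


ascii_value = u'abc'
non_ascii_value = u'\u05d3'  # Hebrew letter dalet, as an example

# The u'' template accepts both values.
ok_ascii = format_with_text_template(ascii_value)
ok_non_ascii = format_with_text_template(non_ascii_value)

# The emulated implicit encode only survives ascii-only input.
try:
    implicit_ascii_encode(non_ascii_value)
    implicit_encode_failed = False
except UnicodeEncodeError:
    implicit_encode_failed = True
```

On py2, the un-prefixed 'x: {}'.format(non_ascii_value) raises the same UnicodeEncodeError; on py3 every literal is already text, so the prefix changes nothing there.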
utf-8 is an ascii extension, meaning that the first 128 ordinals agree for both
encodings, so unicode sandwich has no negative effect on your example.
It would only be a problem if the input for x originally had a non-ascii
character in it, but that should have been an issue for py2 in the first
place, regardless of py3 sandwiches.
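The claim that utf-8 extends ascii can be checked directly. The sketch below is only an illustration with made-up variable names: it decodes all 128 ascii byte values with both codecs, and then shows that one non-ascii character (a Hebrew letter picked as an example) takes two bytes in utf-8, which the ascii codec cannot decode.

```python
# All 128 ascii byte values, built in a way that works on py2 and py3.
ascii_bytes = bytes(bytearray(range(128)))

# For these bytes the two codecs produce the same text.
same_for_ascii = ascii_bytes.decode('ascii') == ascii_bytes.decode('utf-8')

# A non-ascii character needs more than one byte in utf-8.
dalet = u'\u05d3'
dalet_utf8 = dalet.encode('utf-8')

# The ascii codec rejects those bytes.
try:
    dalet_utf8.decode('ascii')
    ascii_can_decode = True
except UnicodeDecodeError:
    ascii_can_decode = False
```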
...
So I have to handle also all existing such literals, at least those
that would now require handling unicode vars.
...
...
Personally, I do not see the big advantage of adding "six.text_type()"
(15 chars) instead of a single "u". I do see where it can be useful,
but not as a very long replacement, IMO, for "u", or for
unicode_literals.
Once py2 is officially terminated, probably neither option
mentioned above would be meaningful, as unicode is py3's default string
type; however IMO for literals it seems that an explicit 'u' is a more native
approach, and provides clarity about the intentions of the programmer
compared to a global switch button in the form of import unicode_literals.
Using six.text_type() is probably a good solution nowadays for variables and
not literals, and would probably have to die off some day after py2 does the
same.
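The split suggested here, 'u' for literals and a text type for variables, might look like the sketch below. To stay self-contained it does not import six; text_type is a local stand-in for six.text_type (unicode on py2, str on py3), and to_text is a hypothetical helper, not something taken from otopi or engine-setup.

```python
# Stand-in for six.text_type: unicode on py2, str on py3.
try:
    text_type = unicode  # only defined on py2
except NameError:
    text_type = str  # py3

# Literal: the explicit prefix marks this one literal as text.
# (The module-wide alternative is "from __future__ import unicode_literals",
# which must be the first statement of the module.)
TEMPLATE = u'x: {}'


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: convert a variable to text at the input edge."""
    if isinstance(value, bytes):
        # Byte strings (py2 str, py3 bytes) are decoded, assuming utf-8.
        return value.decode(encoding)
    # Anything else goes through the text type.
    return text_type(value)


label = TEMPLATE.format(to_text(b'\xd7\x93'))  # utf-8 bytes of U+05D3
count = TEMPLATE.format(to_text(42))
```

Once py2 support is dropped, text_type is simply str and the prefix is redundant, which matches the expectation that both would eventually die off.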
...
Thanks and best regards,
[1] http://python-future.org/unicode_literals.html
--
Didi