Lately, I’ve been porting BitPacket to Python 3.0. I wanted to keep backwards compatibility with Python 2.6 (which is the 2.x version I have on my Debian system) and, thankfully, I only had to fix three minor issues:
- Unicode strings
- Dictionary keys
- Bytes vs. strings
StringIO and unicode strings
If you have ever used the StringIO module you should be familiar with this:
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
In Py3k, StringIO lives in the io package, so you should replace the above with:
from io import StringIO
which is also compatible with Python 2.6.
Once I made the change, my code only worked in Py3k. Python 2.6 complained when I called the write method with a plain string:
>>> from io import StringIO
>>> stream = StringIO()
>>> stream.write("test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/io.py", line 1515, in write
    s.__class__.__name__)
TypeError: can't write str to text stream
You should note that in Py3k all the strings are unicode strings by default. This is not true in Python 2.6, so my first approach was the following:
>>> stream.write(u"test")
4
Unfortunately, this only worked in Python 2.6. Py3k does not recognize the unicode prefix "u" and gives you this error:
>>> stream.write(u"test")
  File "<stdin>", line 1
    stream.write(u"test")
                       ^
SyntaxError: invalid syntax
I googled a bit and found a post, Making code compatible with Python 2 and 3 (by the guy who finished all the SICP exercises), that explained some similar issues, so I came up with this solution:
import sys

def u_str(string):
    # Py3k strings are already unicode; convert only on Python 2.x
    if sys.hexversion >= 0x03000000:
        return string
    else:
        return unicode(string)
>>> stream.write(u_str("test"))
4
In Py3k, unicode does not exist, but since that line is never executed there we don’t get any error.
Even though that worked well, I was not very happy with it. It was too slow and I had to use the custom u_str function everywhere. So, I googled a bit more and found a nice PyCon 2009 talk about Python 3.0 compatibility. Finally, I arrived at what I think is the best solution (for both speed and clarity):
try:
    # This will raise a NameError in Py3k, as unicode doesn't exist
    str = unicode
except NameError:
    pass
So, instead of defining a new u_str function, the str type is re-defined as unicode for Python 2.6. Then, I only had to update all the strings in the code to use str:
>>> stream.write(str("test"))
4
Note: I put this code in a compatibility.py file and import it everywhere I need it.
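As a rough sketch, such a compatibility.py could look like the following. It is a variant, not the original module: it assumes the alias gets its own name (text_type and write_text are hypothetical names), so the built-in str is left untouched in the importing code.

```python
# compatibility.py (sketch): text_type and write_text are hypothetical names
from io import StringIO

try:
    text_type = unicode  # Python 2.x: the unicode type exists
except NameError:
    text_type = str      # Py3k: str is already unicode


def write_text(stream, value):
    """Convert value to unicode text and write it to a text stream."""
    return stream.write(text_type(value))


stream = StringIO()
write_text(stream, "test")
print(stream.getvalue())
```

Keeping a separate name costs a little more typing than rebinding str, but it makes it obvious at each call site that a conversion is intended.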
Dictionary keys
The next problem was reported by the 2to3 tool that comes with Py3k.
- for k in field.keys():
+ for k in list(field.keys()):
Basically, it told me that the dictionary keys() method returns a view in Py3k, not a list, so it needs to be converted to a list, as explained here:
dict methods dict.keys(), dict.items() and dict.values() return “views” instead of lists. For example, this no longer works: k = d.keys(); k.sort(). Use k = sorted(d) instead (this works in Python 2.5 too and is just as efficient).
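A small illustration of both forms, which should behave the same way in Python 2.6 and Py3k. The dictionary contents are made up for the example; only the name field comes from the diff above.

```python
# Made-up sample data; "field" stands in for any dictionary
field = {"length": 2, "id": 1, "data": 3}

# Take a snapshot of the keys so the dictionary can be modified while looping
for k in list(field.keys()):
    if field[k] > 2:
        del field[k]

# sorted() accepts the dictionary directly and returns a new list of keys
ordered = sorted(field)
print(ordered)  # ['id', 'length']
```

The list() call only matters when the loop changes the dictionary or when a real list is needed; for plain read-only iteration, looping over the dictionary itself works in both versions.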
Bytes vs. strings
Finally, the last issue was about the difference between strings and bytes in Python 2.x and 3.0. In Python 2.6, bytes is just an alias for str:
>>> bytes
<type 'str'>
In Py3k, bytes and str are different classes and behave differently, see below:
>>> s = "AB"
>>> s[0]
'A'
>>> s[1]
'B'
>>> b = b"AB"
>>> b[0]
65
>>> b[1]
66
This means you need to be careful with functions that return bytes (e.g. struct.pack) and with the operations performed on the returned data. In my case it was a call to the ord function, which failed with the typical error message:
TypeError: ord() expected string of length 1, but int found
So, following the approaches mentioned above I added the following function to my compatibility.py:
import sys

def u_ord(c):
    # Indexing bytes in Py3k already yields an int, so ord() is not needed
    if sys.hexversion >= 0x03000000:
        return c
    else:
        return ord(c)
which I used instead of the built-in ord in the struct.pack case.
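For comparison, here is a sketch of two alternatives that avoid the version check altogether. This is not what BitPacket does; it only relies on bytearray (available since Python 2.6) and on slicing, and the packed value is an arbitrary example.

```python
import struct

# Arbitrary example value: packs to the two bytes 0x41 ('A') and 0x42 ('B')
packed = struct.pack(">H", 0x4142)

# bytearray yields integers when iterated or indexed in both versions
values = list(bytearray(packed))
print(values)  # [65, 66]

# A one-element slice is still a bytes/str object in both versions,
# so the built-in ord() can be applied to it
first = ord(packed[0:1])
print(first)  # 65
```

The bytearray form copies the data, so for large buffers in tight loops a helper such as u_ord may still be the cheaper choice.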
Hope this helps someone.
Happy hacking!