Engineering

Porting Tag Inspector from Python 2 to Python 3


Recently, our engineering team completed a migration of our proprietary tag auditing and management platform from Python 2 to Python 3. This upgrade brings a host of changes to Tag Inspector that are intended to improve performance and reduce bugs. Most notably, switching to Python 3 keeps our application on a supported runtime, which protects its security and stability going forward.

The Python Software Foundation has announced that Python 2 will reach end-of-life on Jan. 1, 2020. After that date, Python 2 will no longer receive updates of any kind, including security fixes, from its developers.

Porting Challenges

Porting from Python 2 to Python 3 is no easy task, and that difficulty is what has historically kept Python 2 alive. Python is well known for its vast array of third-party libraries, and every library an application depends on must also work under Python 3 before the application itself can move. This usually presents one of two challenges.

On one hand, if the team behind a library hasn’t upgraded it to Python 3, the burden of that task falls onto the shoulders of our developers. On the other hand, teams who do port to Python 3 will often take the opportunity to change their library’s interface in ways that break existing code that calls it. Existing programs must then be rewritten to match the new interface. Even a single added character in a function name can, and will, grind an application to a halt. Our engineers were able to tackle and overcome both of these challenges.

Unicode Enhancements

Python 3 also brings with it code-wide Unicode enhancements. Text in Python 3 is Unicode by default, whereas the default string type in Python 2 held raw bytes and assumed ASCII whenever a conversion was needed. As a web scraping application, Tag Inspector encounters a lot of non-English characters that cannot be encoded to ASCII. Under Python 2, converting such text between encodings would often throw errors, because you won’t always know which character set your application is reading. This is particularly true for web applications.
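To make the encoding problem concrete, here is a minimal sketch of how a scraper might decode a page body when the character set is uncertain. The function name decode_page, its declared_charset parameter, and the fallback order are all assumptions made for illustration; none of them are taken from Tag Inspector.

```python
from typing import Optional


def decode_page(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Decode scraped bytes to text (hypothetical helper, not Tag Inspector code)."""
    # Try the charset the page declared first, then fall back to UTF-8.
    for charset in (declared_charset, "utf-8"):
        if charset is None:
            continue
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this charset.
            # LookupError: Python has no codec registered under this name.
            continue
    # Last resort: keep what can be decoded and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Non-English text cannot be encoded to ASCII at all.
try:
    "caf\u00e9".encode("ascii")  # "café"
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

A caller would pass the raw response body plus whatever charset the page or its HTTP headers claimed, and always get a str back, even when that claim turns out to be wrong.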

This is quite possibly the best new feature for our purposes, yet it did come with its own challenges. As mentioned earlier, a change of this magnitude requires an extensive rewrite of much of the code. While Python 3 treats text as Unicode, it also provides a separate datatype (simply called bytes, or bytestring). This datatype holds fully encoded text: the actual bytes that make up the string once an encoding has been applied. Python 2 was more or less the opposite: normal strings were byte strings, and Unicode had its own special string datatype. As you can guess, this presented an issue with the migration.
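The split between the two Python 3 datatypes can be seen in a few lines. This is an illustrative sketch only; to_text is a hypothetical porting helper, not a function from Tag Inspector or from the standard library.

```python
def to_text(value, encoding="utf-8"):
    """Return str, decoding bytes if necessary (hypothetical porting helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


text = "caf\u00e9"            # str: a sequence of Unicode code points ("café")
data = text.encode("utf-8")   # bytes: the encoded form of that text

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5 (the accented letter needs two bytes)
print(to_text(data) == text)            # True: decoding recovers the original text
```

Normalizing values at the boundaries of the program, where data is read from or written to the network, is one common way to keep the rest of the code working with str only.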

How can you easily tell which datatype any given string is in the old codebase? Luckily, Python 3 ships with a fantastic debugging tool: the -b command line option. Passing this flag when running your script makes the interpreter issue a warning wherever strings are compared to bytes, and passing it twice (-bb) turns that warning into an error. This makes it simple to hunt down which line in your script is expecting a different datatype than it is being given. The -b flag gives developers an obvious leg up when navigating the new string datatypes.
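The effect of the flag can be checked with a small experiment. The sketch below runs the same one-line snippet three times in a child interpreter: with no flag, with -b, and with -bb. The snippet and the run_snippet helper are made up for illustration.

```python
import subprocess
import sys

# A deliberately buggy comparison between bytes and str.
SNIPPET = "a = b'tag'; b = 'tag'; print(a == b)"


def run_snippet(*flags):
    """Run SNIPPET in a child interpreter with the given command line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


plain = run_snippet()        # comparison is silently False
warned = run_snippet("-b")   # same result, plus a BytesWarning on stderr
strict = run_snippet("-bb")  # the warning becomes an error and the run fails

print("no flag:", plain.stdout.strip(), "| warned:", "BytesWarning" in plain.stderr)
print("-b     :", warned.stdout.strip(), "| warned:", "BytesWarning" in warned.stderr)
print("-bb    : exit code", strict.returncode)
```

Without the flag, the mismatched comparison simply evaluates to False, which is exactly the kind of silent bug that is hard to find during a port. With -b, the warning on stderr includes the file and line number of the comparison.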

Set Up for Success

In the end, although migrating to Python 3 is no easy task, our engineering team was able to successfully complete the project and deliver a wonderful and exciting new implementation. This is the same Tag Inspector that you are used to but faster, less buggy, and ready for anything the future may bring.

Originally Published On October 31, 2019

Andy Bengel

October 8, 2020
