
Django i18n without Unicode issues

There are hundreds of good tutorials written about how to add i18n to a Django application. Some of them are straightforward to follow and get the translations working quickly.

I have read tutorials for adding German, Portuguese, Chinese, and Spanish. In all of them, the special characters are used directly in the translation files without any issues. In my case, that didn’t work at first.

I couldn’t use the special Spanish characters directly in the translation files. I kept getting an error similar to “‘ascii’ codec can’t decode byte 0xc3 in position 18: ordinal not in range(128)”.

Are you having issues with accented or special characters in your translation files? Can’t you display characters like ß, ü, or “你好”? Let me help you with the problem.


My Project Setup

I have a Django project with multiple applications, and each application needed its own translation files. Because of the numerous templates, views, and APIs, I wanted to translate one application at a time: test the translations slowly for every feature, then move on to the next one.

When I followed the tutorials, I didn’t generate the .po files or the folder structure automatically. I tried, but the generation gave me some issues, and because of my one-application-at-a-time approach, I decided to create the folder structure and the files by hand.

If you look at the tutorials carefully, they usually create the files using makemessages. This utility creates the folder structure in all the applications and looks for the keys that need to be added depending on your application.

Usage is pretty simple: run the command below with the language you would like to add, in this case German.

django-admin makemessages -l de -i venv

And it will create the folder structure for all the applications, together with all the keys that need to be translated. The result will look like this:

Folders for i18n in Django

English and Spanish are present, and their translation files are compiled in this case. After the execution of the above command, the structure for the German language was created. The new file and folders are the ones within the highlighted yellow square.
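For readers who cannot see the screenshot, here is a small sketch of where Django looks for the catalog files of one application. The app name "home" is only a placeholder, and catalog_paths is a helper made up for this illustration. The .po file is the text file you edit, and the .mo file only appears once the messages are compiled.

```python
from pathlib import PurePosixPath


def catalog_paths(app_dir, languages):
    """Return the expected .po/.mo locations for each language code."""
    paths = []
    for lang in languages:
        base = PurePosixPath(app_dir) / "locale" / lang / "LC_MESSAGES"
        paths.append(str(base / "django.po"))  # editable source catalog
        paths.append(str(base / "django.mo"))  # compiled catalog read at runtime
    return paths


# "home" is a hypothetical application folder.
german_files = catalog_paths("home", ["de"])
```

If you create these folders by hand, as I did, the names locale, LC_MESSAGES, and django.po have to match exactly.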

My step-by-step technique

As I said before, my project has multiple applications, eight to be precise. It is a side project I use for myself, and it is not yet publicly available to other users. I set up the internationalization and started with the home page. I added the files only for the main application, translated the templates to Spanish, and then jumped to the views.

In the templates, I use trans, so it is pretty simple to get it working. Just load the i18n library at the top of the template and start creating your translations:

{% extends 'base.html' %}

{% load i18n %}

...

<h1 class="display-5 fw-bold">{% trans "bukios_home_welcome_title" %}</h1>
...

Don’t forget to add load i18n to every template where you use the translation functions. It is not inherited from other templates, even if you extend a base template or include them.

After I added a few keys, I tested the feature in isolation, without looking at texts from other applications or features: one feature and one application at a time. To get the translations working in your templates, so that you no longer see the raw keys, you need this command:

django-admin compilemessages -i venv

This allows you to have both languages working, but how? I moved the English texts to the translation file, checked that it worked, and then added the Spanish translation. Two steps: first confirming that everything still worked in English, then checking that everything appeared in Spanish.
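Why do the raw keys show up before compiling? A minimal sketch with Python's standard gettext module, which Django's translation machinery builds on: when no catalog entry exists for a message, the lookup simply returns the message id unchanged. NullTranslations stands in here for "no compiled catalog available"; the key is the one from my template.

```python
import gettext

# A translations object with an empty catalog, i.e. nothing compiled yet.
no_catalog = gettext.NullTranslations()

key = "bukios_home_welcome_title"

# With no entry to look up, gettext falls back to the key itself.
shown_on_page = no_catalog.gettext(key)
```

That is also why moving the English texts into a translation file was necessary in my setup: with keys instead of sentences as message ids, English needs a catalog too.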

First Issue: the templates

In the templates, I found my first problem. Spanish special characters such as accented vowels did not work when I used them directly. I looked through the different tutorials, and although everybody was using them directly in the .po files, it didn’t work for me. Each time I added an accented vowel or another Spanish special character, the compilation worked, but when I opened the page in the browser to see the changes I got an error similar to this one:

Unicode error in Django

The hint in the above error:

Unicode issue with Spanish characters

a Unicode error with some text, probably something like “…etición in…”

After looking a bit, I found that “ó” was not working but “&oacute;” was. So I decided to encode all the special characters as HTML entities. It was not a big deal for Spanish, but it may be impossible for Chinese.

Second issue: the messages

After I finished with the templates, I continued with views.py. Views don’t use the same template tag; there you use gettext(…) or a similar function.

But it took me some time to figure out that gettext could not turn &aacute; into á: it does not decode the HTML entities in the translation files.
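A small sketch of why the entity trick only helps where something interprets HTML. The word is a sample, and the dictionary is just a stand-in for a translation catalog, not Django's actual lookup. The point is that a catalog lookup hands the stored string back untouched, and only an HTML-aware step (a browser, or html.unescape from the standard library in this sketch) turns the entity into a character.

```python
import html

# Stand-in for a catalog entry written with my entity workaround.
catalog = {"request_label": "petici&oacute;n"}

# A lookup returns the stored text as is; nothing decodes the entity.
text_in_view = catalog["request_label"]

# Decoding only happens when something treats the text as HTML.
text_after_html_decoding = html.unescape(text_in_view)
```

So any text that reaches the user without passing through an HTML renderer keeps the literal &oacute; in it.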

The solution, or how to set the charset

The problem is clear, but let me briefly summarize it. In Spanish, German, or other languages, the special characters are not shown as expected in the views. Encoding them as HTML entities is not a good solution: the gettext function does not decode HTML entities at all, so they only partially solve the problem. Why partially? Because, for example, I tried something like ‘你好’ (which means ‘hello’) and it didn’t show up either.

The problem does not happen at compile time; it happens at runtime. In the browser, you will see something similar to the screenshot below.

Unicode error in Django, complete stacktrace

The deeper analysis gave me some insights: the Unicode handling was wrong at some point. I tried fixes at the application level, at the HTML level, and at the translation file level, and failed at every level with all my changes. My setup was correct, the same as in every tutorial, with no differences. Well, almost no differences.

How did I find the error? After reading, testing, and failing so many times with Unicode and the encoding and decoding done by the gettext function, I decided to debug the library to find out how it worked, where it was failing, and what the code was doing at that level.

The clue

I put a breakpoint at line 458 in the gettext.py file, stopped the application there, and ran some tests with a key that had some accented vowels.

/usr/local/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gettext.py
Debugging gettext library

The key was the charset value. While debugging, I found out that it was ‘ascii’, not UTF-8 as I was expecting and as I had been setting in a lot of places. The value that I had added after reading about similar cases on Stack Overflow was not reaching this point. I tried it with UTF-8 instead.
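The effect of that charset value can be reproduced in a few lines of plain Python, without Django or gettext. This is only an illustration with a sample word: the bytes of a UTF-8 file are decoded once as ASCII and once as UTF-8.

```python
# The bytes as they would be stored in a UTF-8 encoded translation file.
raw = "petición".encode("utf-8")

try:
    raw.decode("ascii")  # what happens when the charset is 'ascii'
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# The same bytes decode cleanly once the charset is UTF-8.
decoded = raw.decode("utf-8")
```

Here error_message has the same shape as the error I kept seeing: the ‘ascii’ codec stumbling over byte 0xc3, the first byte of “ó” in UTF-8.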

Eureka! It worked!

At this point, I had some success, but still no solution.

The solution

I added UTF-8 in multiple ways and in multiple files, but nothing seemed to work. I analyzed the tutorials one by one, reviewing again the setup, the views, the imports, using ugettext (deprecated in Django 3.0 and removed in Django 4.0), u’your text’, …

Nothing worked.

I took the official documentation, and read it from the beginning, carefully, but I wasn’t able to find how to fix it.

At this point, I had created the English and Spanish files for the complete application. All the translations worked fine in the templates, but not the translations coming from the views using gettext. I wasn’t translating the JavaScript files yet, but those use gettext too. They could wait until I fixed the issue.

And then I found a post on Medium. Giorgi was doing the same as the others, using English, Russian, and Georgian. But there was one difference: he showed a .po file with something like a header, similar to the code below.

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2020-01-28 14:45+0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"

Look at the Content-Type line: the charset there is set to UTF-8.

I tried it, and the magic happened. The accented characters, the opening exclamation mark, our famous ñ, … all of them started to show up.

The key is the line where the charset is set. I tested removing the other lines, and the translations still worked.
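To watch that single header line make the difference outside of Django, here is a sketch that uses only Python's standard library. It assumes the standard GNU .mo binary layout. build_mo is a helper written just for this illustration (it is not part of Django or gettext), and the key and text are made up. The same catalog is loaded twice, once without a header and once with only the Content-Type line.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Illustrative helper: pack {msgid: msgstr} into a little-endian .mo file."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    count = len(items)
    keys_index = 7 * 4                     # index tables follow the 7-word header
    values_index = keys_index + 8 * count
    offset = values_index + 8 * count      # first byte of the string data
    key_table = value_table = key_data = value_data = b""
    # All msgids are stored first, then all msgstrs, each NUL-terminated.
    for msgid, _ in items:
        key_table += struct.pack("<II", len(msgid), offset)
        key_data += msgid + b"\0"
        offset += len(msgid) + 1
    for _, msgstr in items:
        value_table += struct.pack("<II", len(msgstr), offset)
        value_data += msgstr + b"\0"
        offset += len(msgstr) + 1
    header = struct.pack("<7I", 0x950412DE, 0, count, keys_index, values_index, 0, 0)
    return header + key_table + value_table + key_data + value_data


translation = {"request_sent": "Petición enviada"}  # made-up key and text

# No header entry: gettext falls back to ASCII and cannot load the catalog.
try:
    gettext.GNUTranslations(io.BytesIO(build_mo(translation)))
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# The header is the entry whose msgid is the empty string.
with_header = dict(translation)
with_header[""] = "Content-Type: text/plain; charset=UTF-8\n"
catalog = gettext.GNUTranslations(io.BytesIO(build_mo(with_header)))
translated = catalog.gettext("request_sent")
```

In this sketch the first load fails with a UnicodeDecodeError from the ‘ascii’ codec, and the second returns the accented text, which matches what I saw in my project: the header entry with the charset is what the runtime needs.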

Conclusion

Look for a tutorial that suits you and start setting up i18n in Django. Keep the documentation close and read it. My recommendation is to use Django’s own utilities at the beginning, until you learn all the details; that will save you problems. In this post, I have described how to set the charset to UTF-8 in your translation files and how to dig into the libraries to understand what is going on.

In my case, the issue was not using the makemessages utility at the beginning and so not letting it set those values automatically. It is even easier to miss when you consider that all the tutorials I was reading clean up the files and remove these header values completely. In some tutorials you can see some dots (‘…’), but it is difficult to figure out what they mean.

Remember to add the charset in the way that I have described. It was not easy to find out how to add it to the translation files.
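If you also write your .po files by hand, a quick check like the following can catch a missing header before the browser does. It is only a sketch: the function names are my own, and the regular expression assumes the header line is written the usual way, as a quoted "Content-Type: …; charset=…\n" line.

```python
import re
from pathlib import Path

# Matches the header line  "Content-Type: text/plain; charset=UTF-8\n"
CHARSET_LINE = re.compile(
    r'^"Content-Type:[^"]*charset=([\w.-]+)', re.IGNORECASE | re.MULTILINE
)


def po_charset(po_text):
    """Return the charset declared in a .po header, or None if there is none."""
    match = CHARSET_LINE.search(po_text)
    return match.group(1) if match else None


def po_files_without_utf8(root):
    """List the .po files under root whose header does not declare UTF-8."""
    missing = []
    for path in sorted(Path(root).rglob("*.po")):
        charset = po_charset(path.read_text(encoding="utf-8"))
        if charset is None or charset.lower() != "utf-8":
            missing.append(path)
    return missing
```

Calling po_files_without_utf8 on a single application folder keeps the result short; pointing it at the whole project would also walk into folders such as venv.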

There are more values that I have kept for now. I will fine-tune them later, once I see the impact that some of them have on the different libraries. For the time being, they serve as documentation of what is possible to set up in the translation files.