Statistics regarding localized text

Here are some figures on localized text that I have gathered over time. I think you may find them useful.

These figures are for English application strings (not documentation):

  • Average words/string: 3.6 words
  • Average characters/string: 22 characters
  • Maximum characters/string: <1000

Text size expansion when translated

These figures were published by IBM in its National Language Design Guide, Volume 1:

Characters in English    Average expansion
<10                      200-300%
11-20                    180-200%
21-30                    160-180%
31-50                    140-160%
51-70                    130-140%
Over 70                  150%

These figures are especially useful for UI strings: you can pseudo-localize your application with strings lengthened by the expected expansion and evaluate how your interface behaves.
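
As a rough illustration of length-expanding pseudo-localization, here is a minimal Python sketch. The function names, the "~" padding character and the bracket markers are my own choices for illustration. It also assumes the lower bound of each expansion range from the table and counts a string of exactly 10 characters in the first bucket; adjust both to taste.

```python
# Expansion percentages keyed by the upper bound of each English length bucket.
# Assumption: the lower bound of each published range is used, and a string
# of exactly 10 characters is counted in the first bucket.
EXPANSION_BUCKETS = [
    (10, 200),
    (20, 180),
    (30, 160),
    (50, 140),
    (70, 130),
]
OVER_70_PERCENT = 150


def expansion_percent(length):
    """Return the assumed expansion percentage for an English string length."""
    for max_length, percent in EXPANSION_BUCKETS:
        if length <= max_length:
            return percent
    return OVER_70_PERCENT


def pseudo_localize(text, pad_char="~"):
    """Pad text to its expected translated length and wrap it in brackets."""
    if not text:
        return text
    # Integer ceiling division avoids floating-point rounding surprises.
    target = (len(text) * expansion_percent(len(text)) + 99) // 100
    # The two brackets count toward the target length.
    padding = pad_char * max(0, target - len(text) - 2)
    return "[" + text + padding + "]"


print(pseudo_localize("Save"))  # [Save~~]
```

The brackets make truncation easy to spot: if the closing "]" is missing on screen, the control is too small for the expanded string.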

In my experience, the biggest expansions occur in Russian, followed by French. If you rely on word wrap, remember that some languages have very long words, German being one of them.

Translation metrics

  • IT/documentation – ~2000-2500 words/day (assuming an 8-hour workday)
  • Law/licensing – about 50% slower
  • On easy text productivity can rise considerably, reaching as much as 5000 words/day.
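
To turn these rates into a schedule estimate, here is a small Python sketch. The function names are hypothetical, and it assumes that "50% slower" means the daily word rate is halved.

```python
import math


def estimate_days(word_count, words_per_day, slowdown=0.0):
    """Return whole working days needed at the given rate.

    slowdown is the fraction by which the rate drops (0.5 = half speed).
    """
    effective_rate = words_per_day * (1.0 - slowdown)
    return math.ceil(word_count / effective_rate)


def estimate_day_range(word_count, slowdown=0.0):
    """Return (best, worst) days using the 2000-2500 words/day range."""
    best = estimate_days(word_count, 2500, slowdown)
    worst = estimate_days(word_count, 2000, slowdown)
    return best, worst


print(estimate_day_range(10000))       # (4, 5)
print(estimate_day_range(10000, 0.5))  # (8, 10)
```

So a 10,000-word IT manual would take roughly 4 to 5 working days, and a legal text of the same size about 8 to 10.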

Resources

Newbie guide to Unicode

I compiled a small list of things that any developer should know about Unicode. (more…)

Basic Unicode readiness testing for your application

Unicode is complex and always evolving, but that doesn’t mean you shouldn’t do some basic testing to uncover hidden bugs. (more…)

Examples of internationalizing keyboard shortcuts

This article complements a previous article on internationalizing keyboard shortcuts. I decided to analyze a few cases to better explain how we should define keyboard shortcuts so that they keep working on most international keyboards. (more…)

Using the proper language codes

How to choose the proper language codes when localizing?

If you localize only for the macrolanguage, use the macrolanguage code. If you have only one English localization, use just the “en” code; if you have more than one, “en” will mean “en-US” and you will have to add more specific codes such as “en-CA”. The Unicode website gives more details in its article on picking the right language code. (more…)
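
One way to see how such codes interact at runtime is a truncation-based lookup: try the most specific code first, then drop subtags until something matches. The Python sketch below is a simplified illustration of that idea, not a full language tag matcher; the function name and the default of "en" are assumptions.

```python
def pick_localization(requested, available, default="en"):
    """Pick the most specific available code for a requested locale.

    Tries the full tag first, then drops trailing subtags one at a time
    ("en-CA" -> "en"), and finally falls back to the default.
    """
    # Accept both "en_CA" and "en-CA" spellings; compare case-insensitively.
    parts = requested.replace("_", "-").split("-")
    by_lowercase = {code.lower(): code for code in available}
    while parts:
        candidate = "-".join(parts).lower()
        if candidate in by_lowercase:
            return by_lowercase[candidate]
        parts.pop()
    return default


available = ["en", "en-CA", "fr"]
print(pick_localization("en-CA", available))  # en-CA
print(pick_localization("en-GB", available))  # en
print(pick_localization("de-DE", available))  # en
```

With this scheme the plain “en” localization serves every English variant that has no dedicated code of its own.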

Using Unicode console output with Python

On Windows, the console and Unicode are not quite friends. Here is some code that I use to ensure that the output of my Python scripts is consistent on all platforms and supports Unicode encoded as UTF-8. (more…)

Optimal Windows keyboard settings for Romanian programmers

Here is how you should configure your keyboard layouts to be able to write code, English and Romanian text as easily as possible. (more…)

Google Groups translations not passing even the simplest spell check test

I’m a strong admirer of Google’s business model, but I keep wondering how they managed to end up with such a huge number of typos in their Romanian translation. I don’t have inside information about the approach they took to localize Google Groups, but I seriously doubt there was any kind of quality check. (more…)

Tip #1 – Altered English dictionary

I decided to start a series of Tips&Tricks posts that anyone can use for fast i18n bug finding in their application. As you know, speed is really important when it comes to solving bugs.

What is an altered English dictionary? It’s a virtual translation of your translatable English strings into English, but using characters from other languages that resemble the English ones. In short, it’s a character-by-character find-and-replace. (more…)
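
The character-by-character replacement described above could look something like this minimal Python sketch. The mapping and the function name are illustrative assumptions, not the list from the full post; any set of look-alike accented letters would do.

```python
# Hypothetical look-alike mapping: each ASCII letter is replaced by an
# accented letter that is still readable as the original.
LOOKALIKES = {
    "a": "å", "c": "ç", "e": "é", "i": "î", "n": "ñ",
    "o": "ö", "s": "š", "u": "ü", "y": "ý", "z": "ž",
    "A": "Å", "C": "Ç", "E": "É", "I": "Î", "N": "Ñ",
    "O": "Ö", "S": "Š", "U": "Ü", "Y": "Ý", "Z": "Ž",
}
ALTER_TABLE = str.maketrans(LOOKALIKES)


def alter(text):
    """Replace ASCII letters with look-alike non-ASCII letters."""
    return text.translate(ALTER_TABLE)


print(alter("Open file"))  # Öpéñ fîlé
```

Any string that still shows up in plain ASCII was either never sent through the translation layer or lost its non-ASCII characters somewhere along the way. Note that this sketch also alters format placeholders such as %s, so a real tool would need to skip them.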

Using msdev resource editor to create Unicode dialogs that will not display well

I was surprised to discover that creating Unicode UI elements in the resource editor of Microsoft Visual Studio 2008 (SP1) does work, BUT once you compile your application they fail to display properly. (more…)