Posted by sorin on 2010-08-13
Here are some data that I gathered over time regarding localized text. I think you may find them useful.
These figures are for English and refer to application strings (not documentation):
- Average words/string: 3.6 words
- Average characters/string: 22 characters
- Maximum characters/string: <1000
Text size expansion when translated
These data were published by IBM, in their National Language Design Guide Volume 1:
| Characters in English | Average expansion |
|---|---|
| <10 | 200-300% |
| 11-20 | 180-200% |
| 21-30 | 160-280% |
| 31-50 | 140-160% |
| 51-70 | 130-140% |
| Over 70 | 150% |
This is very useful for UI strings: it lets you do pseudo-localization that expands each string by the expected amount, so you can evaluate how your interface will behave before any real translation exists.
In my experience the biggest expansions occur in Russian, followed by French. If you rely on word-wrap, remember that some languages have very long words, German being one of them.
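As a rough illustration of length-expanding pseudo-localization, here is a minimal Python sketch. It is not taken from the IBM guide: the names `EXPANSION_PERCENT`, `expansion_percent` and `pseudo_expand` are hypothetical, and using the lower bound of each range in the table (and treating a 10-character string as part of the first row) is an assumption made only for this example.

```python
# Lower bound of each expansion range from the table above (an assumption;
# you may prefer the upper bound for a more pessimistic layout test).
# Each entry is (maximum English length covered, expansion in percent).
EXPANSION_PERCENT = [
    (10, 200),
    (20, 180),
    (30, 160),
    (50, 140),
    (70, 130),
]
OVER_70_PERCENT = 150  # the "Over 70" row


def expansion_percent(length):
    """Return the expansion percentage for an English string of this length."""
    for max_length, percent in EXPANSION_PERCENT:
        if length <= max_length:
            return percent
    return OVER_70_PERCENT


def pseudo_expand(text, pad_char="~"):
    """Pad text to its expected translated length.

    The result is wrapped in brackets so that truncation in the UI is easy
    to spot; the two brackets count toward the target length.
    """
    percent = expansion_percent(len(text))
    target = (len(text) * percent + 99) // 100  # integer ceiling
    padding = max(0, target - len(text) - 2)
    return "[" + text + pad_char * padding + "]"


print(pseudo_expand("Save"))  # prints [Save~~]
```

Running every UI string through such a function before a build gives a quick view of which controls clip or overflow.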
Translation metrics
- IT/documentation – ~2000-2500 words/day (assuming an 8-hour workday)
- Law/licensing – about 50% slower
- Easy text – productivity can rise considerably, reaching as much as 5000 words/day.
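The figures above can be turned into a quick effort estimate. The sketch below is only back-of-the-envelope arithmetic; the function name `estimate_days` and its parameters are hypothetical, and it assumes that a "50% speed decrease" means the daily throughput is halved.

```python
def estimate_days(word_count, words_per_day=(2000, 2500), slowdown=0.0):
    """Return (best_case_days, worst_case_days) for translating word_count words.

    words_per_day is the (low, high) daily throughput; slowdown is the
    fraction by which throughput drops (0.5 for law/licensing text,
    under the assumption stated above).
    """
    low, high = words_per_day
    best = word_count / (high * (1 - slowdown))
    worst = word_count / (low * (1 - slowdown))
    return best, worst


print(estimate_days(10000))                # (4.0, 5.0)
print(estimate_days(10000, slowdown=0.5))  # (8.0, 10.0)
```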
Posted by sorin on 2010-08-12
I compiled a small list of things that any developer should know about Unicode. (more…)
Posted by sorin on 2010-04-29
Unicode is complex and constantly evolving, but that doesn’t mean you shouldn’t do some basic testing to uncover hidden bugs. (more…)
Posted by sorin on 2010-04-16
This article complements a previous article about the internationalization of keyboard shortcuts. I decided to analyze a few cases in order to better explain how we should define keyboard shortcuts so that they keep working on most international keyboards. (more…)
Posted by sorin on 2010-04-09
How to choose the proper language codes when localizing?
If you localize only for the macrolanguage, use the macrolanguage code. If you ship only one English variant, use plain “en”; if you ship more than one, “en” will mean “en-US” and you will have to add more specific codes such as “en-CA”. The Unicode website gives more details in its article on picking the right language code. (more…)
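To show how a generic code like “en” and a specific one like “en-CA” can coexist at runtime, here is a minimal Python sketch of a fallback lookup. The function `resolve_locale` and its parameters are hypothetical; the approach (dropping the last subtag until something matches) is similar in spirit to the lookup scheme described in RFC 4647, but it is a simplification and not a full implementation.

```python
def resolve_locale(requested, available, default="en"):
    """Pick the most specific available locale for a requested language tag.

    Tries the full tag first, then drops trailing subtags one at a time
    ("en-CA" -> "en"), and finally falls back to the default.
    """
    parts = requested.replace("_", "-").split("-")
    while parts:
        candidate = "-".join(parts).lower()
        for tag in available:
            if tag.lower() == candidate:
                return tag
        parts.pop()  # drop the most specific subtag and retry
    return default


shipped = ["en", "en-CA", "fr"]
print(resolve_locale("en-CA", shipped))  # en-CA
print(resolve_locale("en-GB", shipped))  # en
```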
Posted by sorin on 2010-01-06
On Windows, the console and Unicode are not quite friends. Here is some code that I use to ensure that the output of my Python scripts is consistent on all platforms and supports Unicode encoded as UTF-8. (more…)
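The code from the full post is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of one way to get UTF-8 output on current Python versions; it assumes Python 3.7 or later, where text streams provide `reconfigure()`, and the helper name `force_utf8` is hypothetical. It is not the approach from the original post.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 output if it supports reconfigure().

    Streams without reconfigure() (for example an io.StringIO used to
    capture output) are returned unchanged.
    """
    if hasattr(stream, "reconfigure"):
        # errors="replace" avoids a crash on characters that cannot be encoded.
        stream.reconfigure(encoding="utf-8", errors="replace")
    return stream


if __name__ == "__main__":
    force_utf8(sys.stdout)
    force_utf8(sys.stderr)
    print("Unicode test: ăîșț €")
```

Setting the `PYTHONIOENCODING=utf-8` environment variable before starting the interpreter is another option when the script itself cannot be changed.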
Posted by sorin on 2009-08-14
Here is how you should configure your keyboard layouts in order to be able to write code, English and Romanian text as easily as possible. (more…)
Posted by sorin on 2009-05-22
I’m a strong admirer of Google’s business model, but I keep wondering how they managed to get such a huge number of typos into their Romanian translation. I don’t have inside information about the approach they took to localize Google Groups, but I seriously doubt there was any kind of quality check. (more…)
Posted by sorin on 2009-04-22
I decided to start a series of Tips&Tricks posts that anyone can use to quickly find i18n bugs in their application. As you may know, speed is really important when it comes to solving bugs.
What is an altered English dictionary? It’s a virtual translation of your English translatable strings into English, but using characters from other languages that look similar to the English ones. In short, it’s a character-by-character find-and-replace. (more…)
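A character-by-character replacement of this kind can be sketched in a few lines of Python. The mapping below is only an illustrative assumption (the full post may use a different character set), and the names `LOOKALIKES` and `alter` are hypothetical.

```python
# Map plain English letters to similar-looking accented letters.
# The choice of characters is an assumption made for this example.
LOOKALIKES = str.maketrans({
    "a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
    "c": "ç", "n": "ñ", "y": "ý",
    "A": "Á", "E": "É", "I": "Í", "O": "Ó", "U": "Ú",
    "C": "Ç", "N": "Ñ", "Y": "Ý",
})


def alter(text):
    """Return text with English letters replaced by look-alike characters."""
    return text.translate(LOOKALIKES)


print(alter("Open file"))  # Ópéñ fílé
```

The altered string stays readable to an English speaker, so any string that still shows plain ASCII in the UI is probably hard-coded, and any that shows garbage points to an encoding bug. Note that this sketch also rewrites format placeholders such as `%c` or `{name}`, so a real tool would need to skip those.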
Posted by sorin on 2009-04-21
I was surprised to discover that creating Unicode UI elements in the resource editor of Microsoft Visual Studio 2008 (SP1) does work, BUT when you compile your application they fail to display correctly. (more…)