
Internationalisation

Internationalisation is often referred to as "i18n", where 18 is the number of letters between the "i" and the "n". It is the process of designing and developing a system so that it can be easily adapted to a variety of languages, cultures and customs. Internationalisation includes:

  • Enabling the code to use Unicode.
  • Enabling the code to support local, regional, language or culturally related preferences, which typically include date and time formats, local calendars, number formats and numeral systems, sorting and presentation of lists, and handling of personal names and forms of address.
  • Handling string concatenation properly and avoiding hard-coded UI string values or code that depends on them.
  • Adding markup in your DTD to support bidirectional text or to identify language, or adding CSS support for vertical text or other non-Latin typographic features.
  • Separating localisable elements from source code or content, so that the appropriate localised version can be selected based on the user's preferences.
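The points about string concatenation and separating localisable elements can be illustrated with a minimal sketch. It assumes a hypothetical in-memory message catalogue; the names `CATALOGUE` and `translate`, the message id and the Hindi text are illustrative only, and a real application would normally load catalogues from resource files. Whole sentences with named placeholders are stored per locale, so translators can reorder words freely instead of being bound to the order of concatenated fragments.

```python
# Hypothetical message catalogue: UI strings live outside the program logic,
# keyed by locale and message id. All ids and texts here are illustrative.
CATALOGUE = {
    "en": {"greeting": "Welcome, {name}! You have {count} new messages."},
    "hi": {"greeting": "{name}, आपका स्वागत है! आपके पास {count} नए संदेश हैं।"},
}

DEFAULT_LOCALE = "en"


def translate(message_id: str, locale: str, **params: object) -> str:
    """Look up a whole-sentence template and fill its named placeholders."""
    # Fall back to the default locale when the locale or the message is missing.
    messages = CATALOGUE.get(locale, CATALOGUE[DEFAULT_LOCALE])
    template = messages.get(message_id, CATALOGUE[DEFAULT_LOCALE][message_id])
    return template.format(**params)


if __name__ == "__main__":
    print(translate("greeting", "en", name="Asha", count=3))
    print(translate("greeting", "hi", name="Asha", count=3))
```

Note that the placeholder for the name comes first in the Hindi template and second in the English one, which concatenation in code could not express. This sketch deliberately ignores plural forms, which need separate handling per language.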

Localisation -- or "L10N"

Localisation is sometimes referred to as "l10n", where 10 is the number of letters between the "l" and the "n". Localisation refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific region (a "locale"). Although localisation is often treated as synonymous with translation of the UI and documentation, it is more complex and also covers:

  • Numeric, date and time formats
  • Use of currency
  • Keyboard usage
  • Collation and sorting
  • Symbols, icons and colors

Localisation also covers text and graphics containing references to objects, actions or ideas which, in a given culture, may be subject to misinterpretation. Such cultural issues are the more serious ones.

Localisation is often treated as nothing more than "high-tech translation", but this view does not capture its importance, its complexity, or what actually takes place during localisation. It also hides the fact that localisation must be integrated with other business processes if it is to be effective. Localisation is an integral part of globalisation, and without it, globalisation efforts are ineffective. So what exactly is localisation if it isn't simply translation? Localisation is the process of modifying products or services to account for differences in distinct markets.

Localisation commonly addresses the following issues:

  • Linguistics Issues
  • Physical Issues
  • Business & Cultural Issues
  • Technical Issues

Localisation is crucial

Earlier, software was designed to support only one language. With global markets opening up, there is now a requirement to support multiple languages and diverse cultures, and each country or region needed its own version of the software. In such a scenario there were only two options: re-develop the entire software or product for each language (with the resulting multiple versions, versioning, tracking, issues and global changes), or translate the software into the desired languages. Both have drawbacks. The first repeats the coding work. The second requires considerable effort, because all the menus, help files, dialog boxes and labels were hard-coded. Moreover, since the architecture was not originally meant to support multiple languages, further issues arose that required architectural changes.

Language is a primary factor when it comes to building a relationship with clients and users. If you want to be in business, speak the language the client or user is comfortable with. End users feel at home and secure when information is made available in their own language. Localisation also opens up a huge business opportunity.

  • Localisation will catalyze the process of e-Governance.
  • It is crucial in the Indian context, since there are 22 scheduled languages (with alternate scripts).
  • The same information needs to be available in various languages so that users feel at home.
  • It enables simultaneous (SIM) release of information in various languages.
  • The MMPs will have greater reach if they are localized.

Challenges in Localisation

  • Volume, last-minute localisation and simultaneous (SIM) release
  • Non-availability of skilled manpower, especially in the field of language translation
  • Scalable, standardized tools and technologies for the entire workflow
  • Standard term banks, especially for e-Governance applications and cross-language terms
  • Non-availability of an automated "Localisation Projects Management Framework"

Workflow software can greatly streamline the localisation process by managing the sequence of review/edit tasks, providing the status of tasks and processes, and notifying participants of changes in state, new work, or other information.

To provide uniform access to data in native languages, the following standards are mandatory in application development:

Unicode

The Unicode Consortium is a non-profit organisation devoted to developing, maintaining and promoting software internationalisation standards and data, particularly the Unicode Standard, which specifies the representation of text in all modern software products and standards. The Consortium actively develops standards in the area of internationalisation, including defining the behaviour of and relationships between Unicode characters. It works closely with W3C and ISO, in particular with ISO/IEC JTC 1/SC 2/WG 2, which is responsible for maintaining ISO/IEC 10646, the International Standard synchronized with the Unicode Standard.

The latest electronic version of the Unicode Standard is Version 6.2. This is a consolidated version of the standard, incorporating all changes into the full text. See the Unicode 6.2 documentation.

The Unicode Character Standard primarily encodes scripts rather than languages. That is, where more than one language shares a set of symbols that have a historically related derivation, the union of the set of symbols of each such language is unified into a single collection identified as a single script. These collections of symbols (i.e. scripts) then serve as inventories of symbols which are drawn upon to write particular languages. In many cases, a single script may serve to write tens or even hundreds of languages.
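As a small illustration of one script serving several languages, the sketch below inspects a Hindi word and a Marathi word with Python's standard `unicodedata` module. The module does not expose the Unicode Script property, so the helper `script_of` (a hypothetical name) simply takes the first word of each character's official name. This is a heuristic that works for the letters used here, not a general script detector.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Heuristic: the first word of a character's Unicode name, e.g. 'DEVANAGARI'."""
    return unicodedata.name(ch).split(" ")[0]


def scripts_used(text: str) -> set:
    """Collect the script prefixes of every character in the text."""
    return {script_of(ch) for ch in text}


if __name__ == "__main__":
    # Words from two different languages that are written in the same script.
    samples = {"Hindi": "भाषा", "Marathi": "मराठी"}
    for language, word in samples.items():
        print(language, word, scripts_used(word))
        for ch in word:
            # Each character has one code point, whichever language uses it.
            print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Both words draw on the same Devanagari block of code points, which is what "encoding scripts rather than languages" means in practice.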

The Unicode Consortium's publications also include the Unicode Standard with its Annexes, the character code charts, and the Unicode Technical Standards and Reports.

For more information, please refer to the Unicode website.

W3C and WCAG

The World Wide Web Consortium (W3C) is an international community that develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding. W3C was created to ensure compatibility and agreement among industry members in the adoption of new standards.

The W3C India Office is a full voting member of W3C and is an Advisory Committee Representative. It proposes to work in close collaboration with all stakeholders from academia, government, industry and industry associations.

For more information, please refer to the W3C India Office website.

CLDR

The Common Locale Data Repository (CLDR), which provides key building blocks for software to support the world's languages, is maintained by the Unicode Consortium. http://www.unicode.org/cldr

CLDR is the largest and most extensive standard repository of locale data. Its goal is to gather basic linguistic information for various “locales”, essentially combinations of a language and a location, so that this data can be used for software internationalisation and localisation. Software uses it to adapt to the conventions of different languages in common tasks such as formatting dates, times, time zones, numbers and currency values; sorting text; choosing languages or countries by name; and many others. The basic lists that CLDR gathers are:

1. Date formats
2. Time Zones
3. Number formats
4. Currency formats
5. Measurement System
6. Collation Specification

  • Sorting
  • Searching
  • Matching

7. Translation of names for language, territory, script, time zones, currencies
8. Script and characters used by a language.
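To make the idea of locale data concrete, here is a minimal sketch of locale-dependent number formatting. The table `LOCALE_DATA` and the function `format_number` are hypothetical, and the three entries are hand-written samples rather than values read from the CLDR files; a real application would obtain this data through a CLDR-backed library. The sketch shows how the same value is grouped differently, including the Indian convention of a three-digit group followed by two-digit groups.

```python
# Hand-written sample locale data, for illustration only. A real application
# would take these values from CLDR instead of hard-coding them.
LOCALE_DATA = {
    "en-US": {"group": ",", "decimal": ".", "primary": 3, "secondary": 3},
    "de-DE": {"group": ".", "decimal": ",", "primary": 3, "secondary": 3},
    "en-IN": {"group": ",", "decimal": ".", "primary": 3, "secondary": 2},
}


def format_number(value: float, locale: str, fraction_digits: int = 2) -> str:
    """Format a number using the grouping and separators of the given locale."""
    data = LOCALE_DATA[locale]
    sign = "-" if value < 0 else ""
    fixed = f"{abs(value):.{fraction_digits}f}"
    integer_part, _, fraction_part = fixed.partition(".")

    # The rightmost group uses the primary size; the remaining digits are cut
    # into secondary-size groups, working from right to left.
    groups = [integer_part[-data["primary"]:]]
    rest = integer_part[:-data["primary"]]
    while rest:
        groups.insert(0, rest[-data["secondary"]:])
        rest = rest[:-data["secondary"]]

    result = sign + data["group"].join(groups)
    if fraction_part:
        result += data["decimal"] + fraction_part
    return result


if __name__ == "__main__":
    for locale in LOCALE_DATA:
        print(locale, format_number(1234567.891, locale))
```

Keeping the conventions in a data table rather than in code is the design point: supporting another locale means adding data, not changing the formatting logic.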

Advantages of using CLDR

1. Standardized formats for date, time, currency, measurement and so on avoid misinterpretation of data.
2. CLDR makes information sharing among various industries effortless.
3. CLDR provides the critical advantage of consistency.
4. Since CLDR provides its data for free, it significantly reduces the cost of a project.
5. It makes resources available to anyone, at no cost.
6. CLDR has initiated the development of unique data representation of technologies with respect to the business requirements of the user communities.

FUEL

Frequently Used Entries for Localisation (FUEL) is an open source project promoted by C-DAC, GIST and Red Hat. It is essentially a list of words frequently used in various open source software such as LibreOffice, Mozilla Firefox, Thunderbird and others.

FUEL aims to solve the problem of inconsistency and lack of standardisation in software translation across platforms. It works to provide a standardized and consistent experience of computer interfaces to language communities by creating linguistic and technical resources such as standardized terminologies, computer translation style and convention guides, and assessment methodologies. The FUEL initiative is unique. It is a set of steps that any content-generating organisation, or any team involved in creating localized content, can adopt to ensure consistently high quality. Although it is a place for linguistic resources, FUEL's approach to creating them is no different from software development: it has a version control system that tracks the evolution of the resources, a bug tracker, a ticketing system and a mailing list. It is modular in nature and concentrates on base registers. This makes FUEL citizen-centric, so it has great potential to be an ideal solution even for e-Governance applications. Collaborative innovation is the most important aspect of FUEL.