Translation – Frequently Asked Questions

Martin Post

About this FAQ

If you consider the process of translation as a “black box”, it is pretty much what you’d expect:

So far, so good.

But “under the hood”, more interesting things are happening in the modern translation workflow. “Interesting” here meaning: elements that will allow you, the client, to…

In this FAQ, we’ll take a closer look at several aspects of this modern translation workflow.

Terminology

The translation industry has its own terminology. Let’s make sure we talk about the same things. 🙂

Language Service Provider

A person or company providing language services such as writing, proofreading, and translation.

Translation Buyer

A person or company buying translation services.

Translator

A person translating content from one language to another.

(Translation) Reviewer / Proofreader

A person reviewing (and probably editing) a translation created by a translator.

Translation Memory

A translation memory (TM) is a database that stores “segments”, which can be sentences, paragraphs or sentence-like units (headings, titles or elements in a list) that have previously been translated.

A translation memory stores the source text and its corresponding translation in language pairs called “translation units”. Translation units are created and stored during the translation process and can then be reused later for other translation projects.

Individual words and their translations are not within the domain of TM; they are handled by terminology bases (see below).

Software programs that are used to maintain and edit translation memories are sometimes known as translation memory managers (TMM). Translation memories are typically used in conjunction with dedicated computer-assisted translation (CAT) software.

Fuzzy Match

During translation, translation software will check the currently translated sentence / segment against the translation memory. If the translation memory contains a sentence not identical, but similar to that sentence, it will be presented to the translator as a “fuzzy” match. Usually, such a “fuzzy” match is highlighted in some way to point out to the translator that the suggested translation probably has to be adjusted before using it for another sentence.
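
The lookup described above can be sketched in a few lines. This is only an illustration: the sample memory, the 0.75 threshold and the function names are assumptions, and the similarity score comes from Python's standard `difflib` module rather than from the scoring method of any particular translation tool.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
TRANSLATION_MEMORY = {
    "Press the power button to switch on the device.":
        "Drücken Sie die Ein/Aus-Taste, um das Gerät einzuschalten.",
    "Remove the battery before cleaning.":
        "Entfernen Sie vor der Reinigung den Akku.",
}


def best_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the most similar stored segment,
    or None if no stored segment reaches the (assumed) threshold."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is None or best[0] < threshold:
        return None
    return best


def classify(score):
    """Label a match as exact (identical text) or fuzzy (similar text)."""
    return "exact" if score == 1.0 else "fuzzy"
```

Looking up “Press the power button to switch off the device.” returns the stored “switch on” sentence as a fuzzy match. The suggested translation still has to be adjusted by the translator, which is why such matches are highlighted.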

Termbase

A termbase (a contraction of “terminology” and “database”) is a database consisting of concept-oriented terminological entries (or “concepts”) and related information, usually in multilingual format. Entries may include any of the following additional information:

  • a definition
  • source or context of the term
  • subject area, domain, or industry
  • grammatical information (verb, noun, etc.)
  • notes
  • usage label (figurative, American English, formal, etc.)
  • author / editor metadata (“created by / at”, “modified by / at”)
  • verification status (“verified” or “approved”)
  • an ID

TMX (File Format)

Translation Memory eXchange is an XML specification for the exchange of translation memory data between computer-aided translation and localization tools with little or no loss of critical data. TMX forms part of the Open Architecture for XML Authoring and Localization (OAXAL) reference architecture.
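
To give an impression of what such a file looks like, here is a minimal sketch that reads translation units from a small TMX 1.4 sample using Python's standard library. The sample content and the function name `read_units` are made up for illustration. Real TMX files usually carry more metadata and may contain inline formatting elements inside segments.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative sample, not taken from a real project.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save your changes.</seg></tuv>
      <tuv xml:lang="de"><seg>Speichern Sie Ihre Änderungen.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text):
    """Return one dict per translation unit, mapping language code to text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline elements.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units
```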

XLIFF (File Format)

XLIFF (XML Localization Interchange File Format) is an XML-based format created to standardize the way localizable data is passed between tools during a localization process, and to serve as a common format for CAT tool files. XLIFF was standardized by OASIS in 2002. The specification is aimed at the localization industry. It specifies elements and attributes to store content extracted from various original file formats together with its corresponding translation. The goal was to separate localization skills from the engineering skills related to specific formats such as HTML.
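
As a rough illustration, the following sketch parses a small XLIFF 1.2 sample and lists the units that have no translation yet. The sample file and the function name `untranslated_ids` are assumptions made for this example. Newer XLIFF versions use a different element structure.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Illustrative sample, not taken from a real project.
SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="manual.html" source-language="en" target-language="de" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Welcome</source>
        <target>Willkommen</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Settings</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def untranslated_ids(xliff_text):
    """Return the ids of trans-units whose target is missing or empty."""
    root = ET.fromstring(xliff_text)
    missing = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None or not "".join(target.itertext()).strip():
            missing.append(unit.get("id"))
    return missing
```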

MXLIFF (File Format)

MXLIFF is a native format of the Phrase translation platform used for translation projects. It is multilingual and contains lots of Phrase-specific metadata that allow tracking a translation unit, its origin and modification dates, the changes from a pre-inserted machine translation or translation memory match etc.

TBX (File Format)

TermBase eXchange is an international standard (ISO 30042:2008) for the representation of structured concept-oriented terminological data. TBX is copublished by ISO and the Localization Industry Standards Association (LISA). It is currently available as an ISO standard and as an open, industry standard, available at no charge. TBX defines an XML format for the exchange of terminology data and can be considered “an industry standard for terminology exchange”.

Basics: Translation Concepts and Workflow(s)

What is Computer-Assisted Translation?

Some content in this section of the FAQ was lifted and adapted from the Wikipedia article on computer-assisted translation.

Computer-assisted translation or CAT is a form of language translation in which a human translator uses computer hardware and software to support and facilitate the translation process. It is not to be confused with machine translation.

The automatic machine translation systems available today are not able to produce high-quality translations unaided: Their output must be edited by a human to correct errors and improve the quality of translation. Computer-assisted translation (CAT) incorporates that manual editing stage into the software, making translation an interactive process between human and computer.

What Tools Are Used in Computer-Assisted Translation?

Computer-assisted translation is a broad term covering a range of tools. These can include:

What Are The Differences Between The Modern And The “Traditional” Translation Workflow?

What is the Traditional Translation Workflow?

The “traditional”, digital* translation workflow is as follows:

* We are not considering pre-computer, paper-based translation here.

The translator will…

Advantages of the Traditional Translation Workflow

Disadvantages of the Traditional Translation Workflow

What is the Modern Translation Workflow?

In summary, the modern translation workflow works as follows:

Documents are uploaded (from a local computer system) or “sideloaded” (from a document or content management system) into the translation environment. Here, translatable content is extracted and made available for translators, who connect to this central translation environment using either a web browser or a dedicated translation editor. All translation units are stored in the central translation memory and are made available immediately for other translators, so they can be reused in other documents, be proofread, quality-checked etc. When a translation is complete, the translated document is re-assembled on the server. The translator and/or client then can either download it or reimport it into their document or content management system.

In the modern translation workflow, the translation coordinator / project lead will…

  1. upload the document(s) to a translation platform, where the translatable content is extracted from each document
  2. run an analysis to determine if parts of the documents have been translated before
  3. assign the translation to one or multiple translators, who will be notified.

The translator will…

  1. log into the translation system / platform
  2. access the translatable content from the documents that were assigned to them
  3. translate the content, often using external resources such as one or more translation memories (TM), termbases and reference documents
  4. mark the translation project as completed.

Optionally, translations can be reviewed by one or more persons (reviewers). These could be other translators, product specialists or subject matter experts. In the review step, the reviewer will…

  1. log into the translation system / platform
  2. access the translated content
  3. edit translations, answer questions and add remarks.

In the final step, the translation coordinator / project lead will…

  1. run a quality check to ensure that all translations are complete and structurally correct
  2. download the translated documents.

Advantages of the Modern Translation Workflow

Disadvantages of the Modern Translation Workflow

Translator Productivity

How Many Words Can A Translator Typically Translate Per Day?

Of course, the number of words that a translator can translate per hour or day depends on several factors: project scope, topic, language complexity, topic complexity, terminology, general quality of the source text…

In general, a professional translator should be able to translate around 300 words per hour and 2000 words per day at good quality if they are familiar with the topic and terminology. Using high-quality machine translation engines and translation memories may speed up the process, but not indefinitely.
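
For rough planning, the daily figure can be turned into a simple estimate. The sketch below assumes a single translator and the 2000 words per day mentioned above. The function name is made up, and the result ignores review time, matches from a translation memory and all the other factors listed above.

```python
import math

WORDS_PER_DAY = 2000  # rule-of-thumb figure from the text above


def estimated_days(word_count, words_per_day=WORDS_PER_DAY):
    """Return the number of working days one translator needs, rounded up."""
    return math.ceil(word_count / words_per_day)
```

A 9000-word manual would therefore take about five working days, since `estimated_days(9000)` rounds 4.5 up to 5.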

Machine Translation

What is Machine Translation?

Machine translation (MT) uses software to translate text or speech from one language to another. For more information, see the extensive Wikipedia article on machine translation.

The most commonly known machine translation engines are Google Translate, Microsoft Translator and DeepL.

What is Neural Machine Translation?

Neural machine translation (NMT) is an approach to machine translation that uses a large neural network. It departs from phrase-based statistical translation approaches. Google, Microsoft translation services and DeepL now use NMT. An open source neural machine translation system, OpenNMT, has been released by the Harvard NLP group.

NMT models use deep learning and representation learning.

Does Machine Translation Replace Human Translators?

The quality of machine translation has improved considerably in recent years. In certain text categories, modern machine translation engines will provide results that are indistinguishable from a translator’s work.

However, companies demanding correct, high-quality translations shouldn’t rely on unedited machine translations. The risk of publishing poorly worded content containing factual errors and embarrassing word choices usually outweighs the benefits (speed and savings).

In the modern translation workflow, translators will use machine translation as a tool: A segment (for example a sentence) will be sent to a machine translation engine. The machine translation of that segment will then be presented to the translator as a suggestion, which they can…

In this scenario, machine translation is a tool speeding up the translation process, not a replacement for the translator.

When a Language Service Provider Uses Machine Translation, Will the Translated Content Become Public?

Most commercial machine translation engines are available in two “flavors”:

in a free, publicly available version over a web interface, where content can be entered or pasted, and the translation is available over the same web interface. This includes Google’s public Translate page, the free DeepL Translator and the Bing Microsoft Translator.

Translatable text entered into publicly available machine translation engines usually becomes part of the training data corpus. In other words: Text entered here may not be made publicly available as such, but it becomes part of the vast data structures that are used to improve the quality of machine translation engines. If a person using the engine edits the translation result, these edits will also be used to improve translation quality.

The commercial, API-based versions of machine translation engines will usually keep the translated data private. For example, the DeepL Privacy page states:

“When using DeepL Pro, the texts you submit and their translations are never stored, and are used only insofar as it is necessary to create the translation.”

Pricing

Please note that while the previous sections of this FAQ describe common practices in the translation industry, the following section describes quoting and pricing as handled by / offered at 9to5 Media Services. These are still common practices, but other language service providers may have their own approach.

How Are Prices Determined?

Translation prices are based on the number of words in the source text.

Counting the number of words in the target text is not a common practice. It would encourage translators to create more verbose translations than necessary to increase the number of billable words.
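
The calculation itself is simple. The sketch below is an assumption-laden illustration: it counts words by splitting on whitespace, whereas translation platforms apply their own counting rules, and the rate of 0.15 per word used in the example is purely hypothetical.

```python
from decimal import Decimal


def count_words(text):
    """Count whitespace-separated words in the source text."""
    return len(text.split())


def quote(source_text, rate_per_word):
    """Return the price for a source text at a given per-word rate."""
    # Decimal avoids floating-point rounding surprises with money.
    return count_words(source_text) * Decimal(rate_per_word)
```

For example, `quote("Press the red button twice.", "0.15")` prices five source words and yields 0.75, no matter how many words the translation ends up having.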

Why Aren’t Prices Based On The Number Of Pages, Paragraph, Or Lines?

In traditional translation (especially for literature or news), criteria such as pages and lines are perfectly valid. Most pages in a book (for example a novel) will usually contain an amount of text close to the full document’s average (a standard page of about 1800 characters is a common convention). And in printed news, there is a typical average number of words or characters per line.

However, these measurements make little sense in most modern document formats and online systems, where the number of characters and words can fluctuate. For example, modern web pages are usually built using responsive design principles / technologies, and the number of words can be 10 or 120 per “line”, based on screen size and user settings.

What Services Are Included In The Base Translation Word Rate?

When Is The Base Translation Word Rate Used?

The Base Translation Word Rate is normally used for the following document categories:

What Services Are Included In The Premium Translation Word Rate?

When Is The Premium Translation Word Rate Used?

The Premium Translation Word Rate is normally used for the following document categories:

How Does Quality Assurance Work In Computer-Assisted Translation?

Quality assurance modules of a computer-assisted translation system will check for various errors in the translation, such as:

Quality Assurance Criteria

Spelling

Spelling errors (terms not present in the user dictionary or termbase) are identified. Translators are encouraged to add terms both to their personal dictionaries and to the (project / client) termbase to reduce the number of reported spelling errors.

Empty Target

The target segment of a translation unit should never be empty, because this would lead to “holes” not only in the translated text, but also in other documents based on it.

Punctuation Variations

A translator might forget to put a period at the end of a sentence, or add one where it isn’t required, for example in lists.
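
A very simple version of this check compares the final punctuation mark of source and target. The set of punctuation marks and the function names below are assumptions. The sketch also ignores the fact that punctuation conventions differ between languages.

```python
END_PUNCTUATION = ".!?:;…"  # assumed set of sentence-final marks


def final_punctuation(text):
    """Return the trailing punctuation mark of a segment, or '' if none."""
    last = text.rstrip()[-1:]
    return last if last and last in END_PUNCTUATION else ""


def punctuation_mismatch(source, target):
    """True if source and target end with different punctuation."""
    return final_punctuation(source) != final_punctuation(target)
```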

Inconsistent Translations: Identical Source, Different Target

If a segment appears more than once in a document, and the second and subsequent instances are translated differently, this will result in an error. For most technical and marketing text categories, there should be a 1:1 relation between source segments and their translations. In other words: there should always be one translation for a given sentence / segment. Of course, there may be exceptions where a given context requires a different translation.
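
As a sketch of how such a check might work, the following function groups translations by source segment and reports every source that has more than one distinct target. The data layout (a list of source/target pairs) and the function name are assumptions made for this example.

```python
from collections import defaultdict


def inconsistent_sources(units):
    """units: iterable of (source, target) pairs.
    Return a dict of sources that have more than one distinct target."""
    targets = defaultdict(set)
    for source, target in units:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}
```

Calling the same function with the pairs swapped, as (target, source), finds the opposite case: identical translations that are based on different source segments.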

Inconsistent Translations: Identical Target, Different Source

If there are multiple identical sentences / segments in a translation based on different source segments, this will result in an error. There are obviously cases (for example small errors or variations in source segments) where having the same translation is acceptable. In other cases, it may indicate that a translator has ignored minor, but important differences between two similar source segments.

Missing Numbers

If there are numbers in a source segment, but not in the target segment, this will trigger an error. There may be language-specific conventions for writing out “small numbers” (e.g. up to 12), but in other cases (such as technical specifications), these errors could have severe consequences.
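
A minimal sketch of this check extracts all digit sequences from both segments and reports those that appear in the source but not in the target. The regular expression and the function name are assumptions. Numbers are compared as plain strings, so a decimal written as “1.5” in the source and “1,5” in the target would be reported even though it is correct.

```python
import re
from collections import Counter

# Digits, optionally grouped with "." or "," (e.g. 25, 1.5, 10,000).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def missing_numbers(source, target):
    """Return numbers that occur in the source more often than in the target."""
    in_source = Counter(NUMBER.findall(source))
    in_target = Counter(NUMBER.findall(target))
    return sorted((in_source - in_target).elements())
```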

Repeated Words

Repeated words (such as “the the”) usually constitute an error.
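
This check can be approximated with a regular expression that looks for a word followed directly by itself. The pattern and the function name are assumptions. Legitimate repetitions, which exist in some languages, would still have to be ignored manually.

```python
import re

# A word, whitespace, then the same word again (case-insensitive).
REPEATED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def repeated_words(text):
    """Return the words that are immediately repeated in the text."""
    return [match.group(1) for match in REPEATED.finditer(text)]
```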

Multiple Spaces

In traditional word processing and layout applications and document formats, multiple spaces will be visible as such in the resulting document and should usually be avoided (the old convention of having two spaces after a period was US-specific to start with and has largely been abandoned in the world of word processing and high-quality, variable width fonts). In other formats (XML, HTML), multiple spaces will be ignored / displayed as one space.

Leading And Trailing Spaces

In traditional word processing and layout applications / document formats, leading and trailing spaces will become part of the translated document and be shown as such. In other formats (XML, HTML), leading and trailing spaces will be ignored. In general, leading and trailing spaces should not be part of translation units, as they could lead to unwanted indentation or content reflow.
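
The space-related checks from this and the previous section can be sketched together. The function name and the wording of the reported issues are assumptions.

```python
def whitespace_issues(target):
    """Return a list of whitespace problems found in a target segment."""
    issues = []
    if target != target.lstrip():
        issues.append("leading space")
    if target != target.rstrip():
        issues.append("trailing space")
    # Look for double spaces inside the text only, not at its edges.
    if "  " in target.strip():
        issues.append("multiple spaces")
    return issues
```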

Tags And Formatting

One of the main advantages of computer-assisted translation is the ability to extract translatable content (in the form of segments / translation units) and present segment-internal formatting and metadata such as bold type, italics, hyperlinks, index entries etc. in the form of abstract “tags”. In most translation applications, these are presented as colored boxes that can be moved or deleted, but not edited. The translator can place these tags in the target segment, for example to highlight words or sections that were also highlighted in the source segment.

If there are inconsistencies (such as too few or too many tags in the target segment), this will trigger an error. Tag errors can lead to fatal issues when the target document is created, as missing tags might break the internal structure or at least the aesthetic of the document. For example, even one missing “end bold formatting here” tag in a segment could lead to the rest of a hundred page document being formatted in bold type. Therefore, all tag and formatting errors need to be resolved.

Inconsistent Tag Content

If a tag (as described in the previous section) is placed, but the tag content is different from the tag in the source, this will trigger an error. However, different tag content may be perfectly valid – for example, when a hyperlink in the French translation should point to a different target than the hyperlink in the English source.

Terminology Errors: Required Terms Not Used

If the termbase for a project has an entry for a word, and the translation specified in this entry is not being used, this will trigger an error. This quality assurance mechanism helps to improve translation consistency (for example, by ensuring that “knob” is always translated as “Drehregler” and not “Regler”). On the other hand, many terms have multiple acceptable translations, so translators may have to ignore this type of error.

Terminology Errors: Forbidden Terms Used

If the termbase for a project has an entry for a word, and a translation is marked as forbidden, an error will be reported when this term is being used. This might also be used to highlight undesired variations of a company or product name.
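
Both terminology checks can be sketched with a small, hypothetical termbase. The entry layout, the function names and the error labels are assumptions. The sketch matches whole words only and does not handle inflected forms (plurals, cases), which real termbase checks have to deal with.

```python
import re

# Hypothetical termbase with one entry, based on the example above.
TERMBASE = [
    {"source": "knob", "preferred": ["Drehregler"], "forbidden": ["Regler"]},
]


def contains_term(text, term):
    """True if the term occurs in the text as a whole word."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def terminology_errors(source, target, termbase):
    """Return (error type, term) tuples for one source/target pair."""
    errors = []
    for entry in termbase:
        if not contains_term(source, entry["source"]):
            continue  # the entry does not apply to this segment
        if not any(contains_term(target, term) for term in entry["preferred"]):
            errors.append(("required term not used", entry["source"]))
        for term in entry["forbidden"]:
            if contains_term(target, term):
                errors.append(("forbidden term used", term))
    return errors
```

Matching whole words matters here: the forbidden “Regler” must not be reported just because it is part of the approved “Drehregler”.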

Unresolved Comments

If a translator has entered a comment (usually a question) during translation, and the reviewer has not marked this comment as resolved, this will trigger an error.

Target Segment Length

For certain document types and user interface translations, it may be required to restrict the length of the translation. In that case, translations exceeding that length will trigger an error.

Fuzzy Translation Memory Matches With No Post-Editing

When a translator accepts a fuzzy match from the translation memory and confirms it without editing it, this will trigger an error. In some cases, using such a fuzzy match (a translation of a similar sentence) may be perfectly acceptable. In others, it may introduce errors.
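
To detect this, a tool has to remember where a confirmed translation came from. The sketch below assumes a simple record per confirmed segment. The class, its field names and the function name are hypothetical and do not describe the data model of any specific translation platform.

```python
from dataclasses import dataclass


@dataclass
class ConfirmedSegment:
    suggestion: str     # text proposed from the translation memory
    match_score: float  # 1.0 = exact match, below 1.0 = fuzzy match
    target: str         # text the translator confirmed


def unedited_fuzzy(segments):
    """Return segments where a fuzzy suggestion was confirmed unchanged."""
    return [
        s for s in segments
        if s.match_score < 1.0 and s.target == s.suggestion
    ]
```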

↻ 2024-07-21