Translation – Frequently Asked Questions

Martin Post

About this FAQ

If you consider the process of translation as a “black box”, it is pretty much what you’d expect:

If you request the translation of a document to a target language, you’ll receive a translated document. Done.
If you order translations of multiple documents and/or translation to multiple target languages, you’ll receive more documents.

So far, so good.

But “under the hood”, more interesting things are happening in the modern translation workflow. “Interesting” here meaning: elements that will allow you, the client, to…

save time,
save money,
get better results and
reuse these results more effectively.

In this FAQ, we’ll take a closer look at several aspects of this modern translation workflow.

Terminology

The translation industry has its own terminology. Let’s make sure we talk about the same things. 🙂

Language Service Provider

A person or company providing language services such as writing, proofreading, and translation.

Translation Buyer

A person or company buying translation services.

Translator

A person translating content from one language to another.

(Translation) Reviewer / Proofreader

A person reviewing (and probably editing) a translation created by a translator.

Translation Memory

A translation memory (TM) is a database that stores “segments”, which can be sentences, paragraphs or sentence-like units (headings, titles or elements in a list) that have previously been translated.

A translation memory stores the source text and its corresponding translation in language pairs called “translation units”. Translation units are created and stored during the translation process and can then be reused later for other translation projects.

Individual words and their translations are not within the domain of TM; they are handled by terminology bases (see below).

Software programs that are used to maintain and edit translation memories are sometimes known as translation memory managers (TMM). Translation memories are typically used in conjunction with dedicated computer-assisted translation (CAT) software.

Fuzzy Match

During translation, translation software will check the currently translated sentence / segment against the translation memory. If the translation memory contains a sentence not identical, but similar to that sentence, it will be presented to the translator as a “fuzzy” match. Usually, such a “fuzzy” match is highlighted in some way to point out to the translator that the suggested translation probably has to be adjusted before using it for another sentence.
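To make the idea of a “fuzzy” match more concrete, here is a minimal sketch using Python’s standard difflib module. The function name, the sample memory and the 75 % threshold are assumptions chosen for illustration; commercial translation tools use their own (usually more sophisticated) similarity scoring.

```python
from difflib import SequenceMatcher

def best_fuzzy_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the most similar TM entry, or None.

    `memory` maps source segments to their stored translations.
    The threshold is an illustrative value, not an industry constant.
    """
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

# Hypothetical translation memory with two translation units
tm = {
    "Press the power button.": "Drücken Sie die Ein/Aus-Taste.",
    "Close the battery cover.": "Schließen Sie die Batterieabdeckung.",
}

match = best_fuzzy_match("Press the red power button.", tm)
if match:
    score, source, target = match
    print(f"{score:.0%} match: {source!r} -> {target!r}")
```

A score of 1.0 corresponds to an exact match. Anything between the threshold and 1.0 would be shown to the translator as a suggestion that probably needs editing.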

Termbase

A termbase (a contraction of “terminology” and “database”) is a database consisting of concept-oriented terminological entries (or “concepts”) and related information, usually in multilingual format. Entries may include any of the following additional information:

a definition
source or context of the term
subject area, domain, or industry
grammatical information (verb, noun, etc.)
notes
usage label (figurative, American English, formal, etc.)
author / editor metadata (“created by / at”, “modified by / at”)
verification status (“verified” or “approved”)
an ID

TMX (File Format)

Translation Memory eXchange is an XML specification for the exchange of translation memory data between computer-aided translation and localization tools with little or no loss of critical data. TMX forms part of the Open Architecture for XML Authoring and Localization (OAXAL) reference architecture.
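As an illustration, the following sketch reads translation units from a small, hand-written TMX sample using Python’s standard XML parser. The sample content and the header values are placeholders; real TMX files exported from a translation tool will contain more metadata.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample; all header values are placeholders
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save your changes.</seg></tuv>
      <tuv xml:lang="de"><seg>Speichern Sie Ihre Änderungen.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def read_units(tmx_text):
    """Return one dict per translation unit, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        units.append({tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")})
    return units

for unit in read_units(SAMPLE_TMX):
    print(unit)
```

Each tu element is one translation unit, and each tuv element holds the segment in one language.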

XLIFF (File Format)

XLIFF (XML Localization Interchange File Format) is an XML-based format created to standardize the way localizable data are passed between tools during a localization process. It also serves as a common format for CAT tool files. XLIFF was standardized by OASIS in 2002. The specification is aimed at the localization industry. It specifies elements and attributes to store content extracted from various original file formats and its corresponding translation. The goal was to abstract the localization skills from the engineering skills related to specific formats such as HTML.

MXLIFF (File Format)

MXLIFF is a native format of the Phrase translation platform used for translation projects. It is multilingual and contains lots of Phrase-specific metadata that allow tracking a translation unit, its origin and modification dates, the changes from a pre-inserted machine translation or translation memory match etc.

TBX (File Format)

TermBase eXchange is an international standard (ISO 30042:2008) for the representation of structured concept-oriented terminological data. TBX is copublished by ISO and the Localization Industry Standards Association (LISA). It is currently available as an ISO standard and as an open, industry standard, available at no charge. TBX defines an XML format for the exchange of terminology data and can be considered “an industry standard for terminology exchange”.

Basics: Translation Concepts and Workflow(s)

What is Computer-Assisted Translation?

Some content in this section of the FAQ was lifted and adapted from the Wikipedia article on computer-assisted translation.

Computer-assisted translation or CAT is a form of language translation in which a human translator uses computer hardware and software to support and facilitate the translation process. It is not to be confused with machine translation.

The automatic machine translation systems available today are not able to produce high-quality translations unaided: Their output must be edited by a human to correct errors and improve the quality of translation. Computer-assisted translation (CAT) incorporates that manual editing stage into the software, making translation an interactive process between human and computer.

What Tools Are Used in Computer-Assisted Translation?

Computer-assisted translation is a broad term covering a range of tools. These can include:

Translation software: a catch-all term for binary software applications or online tools / platforms used for translation that typically cover some or all of the following features / modules.
Well-known “traditional” (binary) translation software products: SDL Trados Studio, MemoQ and the open-source application OmegaT. Well-known web-based or “hybrid” translation software applications: Across Language Server, Phrase and XTM Cloud.
Electronic dictionaries and Spell checkers, either built into word processing software, or available as add-on programs. While termbases and glossaries as described below are usually client-, domain- or project-specific, a spell checker usually is a “catch-all” system for finding spelling errors, usually based on a default dictionary and a user dictionary.
Grammar checkers
Translation memory tools (TM tools), consisting of a database of text segments in a source language and their translations in one or more target languages.
Terminology databases (also known as termbases), either on the host computer or accessible through the Internet.
Concordancers, which are programs that retrieve instances of a word or an expression and their respective context in a monolingual, bilingual or multilingual corpus, such as a bitext or a translation memory
Bitext aligners: tools that align a source text and its translation which can then be analyzed using a full-text search tool or a concordancer
Project management software that allows linguists to structure complex translation projects as a chain of tasks (often called “workflow”), assign the various tasks to different people, and track the progress of each of these tasks

What Are The Differences Between The Modern And The “Traditional” Translation Workflow?

What is the Traditional Translation Workflow?

The “traditional”, digital* translation workflow is as follows:

* We are not considering pre-computer, paper-based translation here.

The translator will…

receive a document on a storage medium or via e-mail,
open this document on his computer and
translate its content, usually by overwriting sentence by sentence, headline by headline with the translated content, and
return the translated document to the client.

Advantages of the Traditional Translation Workflow

Simple and straightforward: If both sides use the same application / format (e.g. Microsoft Word documents), no other technology is required.

Disadvantages of the Traditional Translation Workflow

Both client and translator need to have the software used for creating the respective documents. For example, to translate an Adobe InDesign document, the translator needs to have InDesign – preferably the same version as the client, or file conversion may be required.
The result of the translation process is a translated document, and nothing else. Accordingly, every sentence exists only in the context of the translated document. This has another consequence:
There is no “live connection” between the content of the original document and the translation. To reuse a part of the translation, the responsible party would have to understand both the source and the target language and/or align the documents manually. This also means that translating 20 documents – no matter how similar they are – constitutes 20 separate (mini) projects where “reuse” requires remembering where to find previously translated content, pasting it into the next document and ensuring that these copies are consistent. This becomes extremely inefficient as more documents, document formats and languages are added.
For most file formats / document types, the translator will be able to edit not only the content, but also the structure of the document. This can lead to unintended changes, deletions or formatting variations.

What is the Modern Translation Workflow?

In summary, the modern translation workflow works as follows:

Documents are uploaded (from a local computer system) or “sideloaded” (from a document or content management system) into the translation environment. Here, translatable content is extracted and made available for translators, who connect to this central translation environment using either a web browser or a dedicated translation editor. All translation units are stored in the central translation memory and are made available immediately for other translators, so they can be reused in other documents, be proofread, quality-checked etc. When a translation is complete, the translated document is re-assembled on the server. The translator and/or client then can either download it or reimport it into their document or content management system.

In the modern translation workflow, the translation coordinator / project lead will…

upload the document(s) to a translation platform, where the translatable content is extracted from each document
run an analysis to determine if parts of the documents have been translated before
assign the translation to one or multiple translators, who will be notified.

The translator will…

log into the translation system / platform
access the translatable content from the documents that were assigned to him
translate the content, often using external resources such as one or more translation memories (TM), termbases and reference documents
mark the translation project as completed.

Optionally, translations can be reviewed by one or more persons (reviewers). These could be other translators, product specialists or subject matter experts. In the review step, the reviewer will…

log into the translation system / platform
access the translated content
edit translations, answer questions and add remarks.

In the final step, the translation coordinator / project lead will…

run a quality check to ensure that all translations are complete and structurally correct
download the translated documents.

Advantages of the Modern Translation Workflow

The translator only needs to have and understand his translation software, a dedicated tool for the task. This allows him to translate documents in fairly exotic formats or for expensive software without making any extra effort or investment. In other words: The translator can focus on translation.
As a result, a translation buyer can choose the right translators based on their qualifications, not the applications they own or know.
Translations are added to the translation memory. This means that translated content becomes reusable for other documents and projects, saving time and money. For example, content from a product manual could easily be reused in a knowledge base.
In many modern translation tools, reuse occurs automatically: Once a segment has been translated and confirmed, it is automatically “propagated” (inserted for all following instances of the segment in question).
Working with translation memories and termbases ensures consistency. Variations that could lead to ambiguities and misunderstandings are avoided.
The translator only gets to see and edit translatable content. He cannot delete sections, restructure documents or change their formatting.
All relevant resources (documents, translation memories, termbases) are available in one central location. When something is changed (a term definition, a product claim translation), it becomes available to everyone (project manager, translators, reviewers…) immediately.
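The automatic “propagation” mentioned in the list above can be sketched in a few lines. This is a simplified model that assumes segments are plain dictionaries with a source and a target; real translation tools track additional state such as confirmation status and match origin.

```python
def propagate(segments, index, translation):
    """Confirm the segment at `index` and copy its translation to all
    later segments with an identical source and a still-empty target.

    Returns the number of segments that were filled automatically.
    """
    source = segments[index]["source"]
    segments[index]["target"] = translation
    filled = 0
    for seg in segments[index + 1:]:
        if seg["source"] == source and not seg["target"]:
            seg["target"] = translation
            filled += 1
    return filled

# Hypothetical document with one repeated segment
document = [
    {"source": "Click OK.", "target": ""},
    {"source": "Enter your name.", "target": ""},
    {"source": "Click OK.", "target": ""},
]

count = propagate(document, 0, "Klicken Sie auf OK.")
print(count, "segment(s) filled automatically")
```

In this model, segments that already have a target are left alone, so an earlier deliberate variation is not overwritten.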

Disadvantages of the Modern Translation Workflow

If the translation workflow writes to and reads from an online translation memory, a permanent Internet connection is required. Therefore, some translation platforms – such as Phrase – offer an offline client that will allow translators to work offline and upload their work later.
In the same way that the correct translation of a segment allows reuse and ensures consistency, an error written to the translation memory will easily be propagated within the document and to new documents. In other words: mistakes will multiply. Accordingly, quality assurance is an important step.

Translator Productivity

How Many Words Can A Translator Typically Translate Per Day?

Of course, the number of words that a translator can translate per hour or day depends on several factors: project scope, topic, language complexity, topic complexity, terminology, general quality of the source text…

In general, a professional translator should be able to translate around 300 words per hour and 2000 words per day at good quality if he is familiar with the topic and terminology. Using high-quality machine translation engines and translation memories may speed up the process, but not indefinitely.
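As a rough planning aid, the daily figure above can be turned into a duration estimate. The function name is illustrative, and the default of 2000 words per day is only the ballpark value from this FAQ; actual throughput depends on the factors listed earlier.

```python
import math

def estimate_working_days(word_count, words_per_day=2000):
    """Rough number of translator working days for a given source word count."""
    if words_per_day <= 0:
        raise ValueError("words_per_day must be positive")
    return math.ceil(word_count / words_per_day)

print(estimate_working_days(9500))  # 5
```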

Machine Translation

What is Machine Translation?

Machine translation (MT) uses software to translate text or speech from one language to another. For more information, see the extensive Wikipedia article on machine translation.

The most commonly known machine translation engines are Google Translate, Microsoft Translator and DeepL.

What is Neural Machine Translation?

Neural machine translation (NMT) is an approach to machine translation that uses a large neural network. It departs from phrase-based statistical translation approaches. Google, Microsoft translation services and DeepL now use NMT. An open source neural machine translation system, OpenNMT, has been released by the Harvard NLP group.

NMT models use deep learning and representation learning.

Does Machine Translation Replace Human Translators?

The quality of machine translation has improved considerably in recent years. In certain text categories, modern machine translation engines will provide results that are indistinguishable from a translator’s work.

However, companies demanding correct, high-quality translations shouldn’t rely on unedited machine translations. The risk of publishing poorly worded content containing factual errors and embarrassing word choices usually outweighs the benefits (speed and savings).

In the modern translation workflow, translators will use machine translation as a tool: A segment (for example a sentence) will be sent to a machine translation engine. The machine translation of that segment will then be presented to the translator as a suggestion, which he can…

use as it is (without editing)
modify (sentence structure, terminology)
discard and create his own translation instead.

In this scenario, machine translation is a tool speeding up the translation process, not a replacement for the translator.

When a Language Service Provider Uses Machine Translation, Will the Translated Content Become Public?

Most commercial machine translation engines are available in two “flavors”:

A free, publicly available version with a web interface, where content can be entered or pasted and the translation appears in the same interface. This includes Google’s public Translate page, the free DeepL Translator and the Bing Microsoft Translator. Translatable text entered into these public engines usually becomes part of the training data corpus. In other words: Text entered here may not be made publicly available as such, but it becomes part of the vast data structures that are used to improve the quality of machine translation engines. If a person using the engine edits the translation result, these edits will also be used to improve translation quality.
A commercial / API-based version, which will usually keep the translated data private. For example, the DeepL Privacy page states:

“When using DeepL Pro, the texts you submit and their translations are never stored, and are used only insofar as it is necessary to create the translation.”

Pricing

Please note that while the previous sections of this FAQ describe common practices in the translation industry, the following section describes quoting and pricing as handled by / offered at 9to5 Media Services. These are still common practices, but other language service providers may have their own approach.

How Are Prices Determined?

Translation prices are based on the number of words in the source text.

Counting the number of words in the target text is not a common practice. It would encourage translators to create more verbose translations than necessary to increase the number of billable words.
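The basic calculation can be sketched as follows. The per-word rate is a made-up figure, and splitting on whitespace is a simplification: translation management systems apply their own word-counting rules and usually take translation memory matches into account.

```python
from decimal import Decimal

def count_source_words(text):
    """Very simple word count: whitespace-separated tokens."""
    return len(text.split())

def quote(source_text, rate_per_word):
    """Price = number of source words x word rate."""
    return count_source_words(source_text) * rate_per_word

# Hypothetical rate of 0.15 per source word
source = "Press the power button to start."
print(count_source_words(source), "words, price:", quote(source, Decimal("0.15")))
```

Decimal is used instead of float here to avoid rounding surprises in currency amounts.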

Why Aren’t Prices Based On The Number Of Pages, Paragraphs, Or Lines?

In traditional translation (especially for literature or news), criteria such as pages and lines are perfectly valid. Most pages in a book (for example a novel) will usually contain a certain amount of text (words) close to the full document’s average (1800 words is a usual number). And in printed news, there is a typical average number of words or characters per line.

However, these measurements make little sense in most modern document formats and online systems, where the number of characters and words can fluctuate. For example, modern web pages are usually built using responsive design principles / technologies, and the number of words can be 10 or 120 per “line”, based on screen size and user settings.

What Services Are Included In The Base Translation Word Rate?

Communication between the translation buyer and language service provider (via e-mail, phone, Skype or similar service), unless very complex explanations / clarifications are required – these might be billed after prior notice.
Project setup (creation and configuration of a project in the translation management system, translation memory and termbase setup)
Preparation of source material for translation, unless extensive conversion and editing is required – in this case, the Premium Translation Word Rate will be used.
Upload of translatable content to the translation management system
Analysis of source documents in the translation management system (taking existing translation memories into account)
Quotes based on analysis of the source document(s). Quotes are sent via e-mail.
Translation
Quality assurance in the translation management system
Error correction based on quality assurance
Download of translated content from the translation management system
Delivery of translated, quality-checked content to the client in digital form.

When Is The Base Translation Word Rate Used?

The Base Translation Word Rate is normally used for the following document categories:

Press releases
Long-form documents that have no or only a few images and very little in-document formatting
Short product manuals typeset in Microsoft Word or other basic word processing application (if they are properly formatted)
Knowledge base content
Plain text documents

What Services Are Included In The Premium Translation Word Rate?

All services included in the base translation word rate.
Preparation of complex source material for translation, including removal of elements that would slow down translation (in-sentence line breaks, hard hyphens etc.), conversion from legacy formats to a format supported by modern translation tools etc.
Post-editing (after translation) of complex documents. This typically includes regeneration of metadata such as table of contents or index, adjusting page breaks in long documents etc.
Creation of reference PDFs (if applicable and requested)

When Is The Premium Translation Word Rate Used?

The Premium Translation Word Rate is normally used for the following document categories:

Long-form documents (especially manuals) with many images and/or independent text frames and/or a lot of local (in-sentence) formatting

How Does Quality Assurance Work In Computer-Assisted Translation?

Quality assurance modules of a computer-assisted translation system will check for various errors in the translation, such as:

Quality Assurance Criteria

Spelling

Spelling errors (terms not present in the user dictionary or termbase) are identified. Translators are encouraged to add terms both to their personal dictionaries and to the (Project / client) termbase to reduce the number of reported spelling errors.

Empty Target

The target segment of a translation unit should never be empty, because this would lead to “holes” not only in the translated text, but also in other documents based on it.

Punctuation Variations

A translator might forget to set a period at the end of a sentence, or add one where it isn’t required, for example in lists.

Inconsistent Translations: Identical Source, Different Target

If a segment appears more than once in a document, and the second and subsequent instances are translated differently, this will result in an error. For most technical and marketing text categories, there should be a 1:1 relation between source segments and their translations. In other words: there should always be one translation for a given sentence / segment. Of course, there may be exceptions where a given context requires a different translation.
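A check of this kind can be sketched by grouping targets by their source segment. This is an illustrative model that treats translation units as simple (source, target) pairs; it is not the implementation used by any particular tool.

```python
from collections import defaultdict

def find_inconsistent_sources(units):
    """Return every source segment that has more than one distinct target,
    mapped to the sorted list of those targets."""
    targets_by_source = defaultdict(set)
    for source, target in units:
        targets_by_source[source].add(target)
    return {s: sorted(t) for s, t in targets_by_source.items() if len(t) > 1}

# Hypothetical translation units from one document
units = [
    ("Click OK.", "Klicken Sie auf OK."),
    ("Enter your name.", "Geben Sie Ihren Namen ein."),
    ("Click OK.", "Bestätigen Sie mit OK."),
]

for source, targets in find_inconsistent_sources(units).items():
    print(f"{source!r} has {len(targets)} different translations")
```

Swapping source and target in each pair gives the reverse check (identical target, different source).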

Inconsistent Translations: Identical Target, Different Source

If there are multiple identical sentences / segments in a translation based on different source segments, this will result in an error. There are obviously cases (for example small errors or variations in source segments) where having the same translation is acceptable. In other cases, it may indicate that a translator has ignored minor, but important differences between two similar source segments.

Missing Numbers

If there are numbers in a source segment, but not in the target segment, this will trigger an error. There may be language-specific conventions for writing out “small numbers” (e.g. up to 12), but in other cases (such as technical specifications), these errors could have severe consequences.
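A minimal sketch of such a check, assuming that numbers are compared as literal strings. That is a deliberate simplification: it would also flag locale-specific rewrites such as “1.5” becoming “1,5”, which real quality assurance modules typically handle.

```python
import re
from collections import Counter

# Digits, optionally with internal "." or "," separators (e.g. 1.5 or 1,000)
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def missing_numbers(source, target):
    """Return numbers that occur in the source more often than in the target."""
    missing = Counter(NUMBER.findall(source)) - Counter(NUMBER.findall(target))
    return sorted(missing.elements())

print(missing_numbers(
    "Tighten 4 screws to 12 Nm.",
    "Ziehen Sie die Schrauben mit 12 Nm an.",
))  # ['4']
```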

Repeated words

Repeated words (such as “the the”) usually constitute an error.
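This kind of check is commonly done with a regular expression and a backreference. The following is a minimal sketch; note that it would also report legitimate repetitions (such as “that that”), which is why the result is only “usually” an error.

```python
import re

# A word, whitespace, then the same word again (case-insensitive)
REPEATED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def repeated_words(text):
    """Return the first word of each immediately repeated pair."""
    return [m.group(1) for m in REPEATED.finditer(text)]

print(repeated_words("Open the the cover."))  # ['the']
```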

Multiple Spaces

In traditional word processing and layout applications and document formats, multiple spaces will be visible as such in the resulting document and should usually be avoided (the old convention of having two spaces after a period was US-specific to start with and has largely been abandoned in the world of word processing and high-quality, variable width fonts). In other formats (XML, HTML), multiple spaces will be ignored / displayed as one space.

Leading And Trailing Spaces

In traditional word processing and layout applications / document formats, leading and trailing spaces will become part of the translated document and be shown as such. In other formats (XML, HTML), leading and trailing spaces will be ignored. In general, leading and trailing spaces should not be part of translation units, as they could lead to unwanted indentation or content reflow.
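Both whitespace checks (multiple spaces as well as leading and trailing spaces) are easy to express in code. The function name and the issue labels below are illustrative.

```python
import re

def whitespace_issues(target):
    """Return a list of whitespace problems found in a target segment."""
    issues = []
    if target != target.lstrip():
        issues.append("leading space")
    if target != target.rstrip():
        issues.append("trailing space")
    # Look for runs of two or more spaces inside the segment
    if re.search(r" {2,}", target.strip()):
        issues.append("multiple spaces")
    return issues

print(whitespace_issues(" Two  spaces "))
```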

Tags And Formatting

One of the main advantages of computer-assisted translation is the ability to extract translatable content (in the form of segments / translation units) and present segment-internal formatting and metadata such as bold type, italics, hyperlinks, index entries etc. in the form of abstract “tags”. In most translation applications, these are presented as colored boxes that can be moved or deleted, but not edited. The translator can place these tags in the target segment, for example to highlight words or sections that were also highlighted in the source segment.

If there are inconsistencies (such as too few or too many tags in the target segment), this will trigger an error. Tag errors can lead to fatal issues when the target document is created, as missing tags might break the internal structure or at least the aesthetic of the document. For example, even one missing “end bold formatting here” tag in a segment could lead to the rest of a hundred page document being formatted in bold type. Therefore, all tag and formatting errors need to be resolved.
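To illustrate the principle, the sketch below compares the tags of a source and a target segment. The numbered notation (<1> and </1>) is an assumption made for this example; every translation tool has its own internal tag representation.

```python
import re
from collections import Counter

# Hypothetical placeholder notation: <1> opens tag 1, </1> closes it
TAG = re.compile(r"</?\d+>")

def tag_mismatches(source, target):
    """Return tags missing from the target and extra tags not in the source."""
    src = Counter(TAG.findall(source))
    tgt = Counter(TAG.findall(target))
    return {
        "missing": sorted((src - tgt).elements()),
        "extra": sorted((tgt - src).elements()),
    }

print(tag_mismatches(
    "Press <1>Start</1> now.",
    "Drücken Sie jetzt <1>Start.",
))
```

Here the closing tag is reported as missing, which is exactly the “end bold formatting” situation described above.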

Inconsistent Tag Content

If a tag (as described in the previous section) is placed, but the tag content is different from the tag in the source, this will trigger an error. However, different tag content may be perfectly valid – for example, when a hyperlink in the French translation should point to a different target than the hyperlink in the English source.

Terminology Errors: Required Terms Not Used

If the termbase for a project has an entry for a word, and the translation specified in this entry is not being used, this will trigger an error. This quality assurance mechanism helps to improve translation consistency (for example, by ensuring that “knob” is always translated as “Drehregler” and not “Regler”). On the other hand, many terms have multiple acceptable translations, so translators may have to ignore this type of error.

Terminology Errors: Forbidden Terms Used

If the termbase for a project has an entry for a word, and a translation is marked as forbidden, an error will be reported when this term is being used. This might also be used to highlight undesired variations of a company or product name.
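A forbidden-term check can be sketched as a whole-word search. The list of forbidden terms is a made-up example, and this simple version does not recognize inflected forms, which real terminology checks often do.

```python
import re

def forbidden_terms_used(target, forbidden):
    """Return the forbidden terms that appear as whole words in the target."""
    found = []
    for term in forbidden:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, target, re.IGNORECASE):
            found.append(term)
    return found

# Hypothetical termbase entry: "Regler" is marked as forbidden
forbidden = ["Regler"]
print(forbidden_terms_used("Drehen Sie den Regler nach rechts.", forbidden))
print(forbidden_terms_used("Drehen Sie den Drehregler nach rechts.", forbidden))
```

Because of the word boundaries, the approved compound “Drehregler” does not trigger the check for “Regler”.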

Unresolved Comments

If a translator has entered a comment (usually a question) during translation, and the reviewer has not marked this comment as resolved, this will trigger an error.

Target Segment Length

For certain document types and user interface translations, it may be required to restrict the length of the translation. In that case, translations exceeding that length will trigger an error.

Fuzzy Translation Memory Matches With No Post-Editing

When a translator accepts a fuzzy match from the translation memory and confirms it without editing it, this will trigger an error. In some cases, using such a fuzzy match (a translation of a similar sentence) may be perfectly acceptable. In others, it may introduce errors.

↻ 2024-07-21