#9 Semantics, spelling and vocabulary: creating a new kind of ambiguity?
The study of language in use is a truly rewarding experience. Human discourse is a treasure trove of variation and creativity, and languages like English and German come in a number of social and regional varieties used for a multitude of interaction purposes. Often, their most distinguishing characteristic is their lexicon, their vocabularies. Let’s look at what free online translation tools know about special purpose vocabularies. Here’s an example from the language of science and engineering:
For the purposes of this part of ISO 9117, a single coating or multi-coat system of paint or varnish or related material is considered to be through-dry when a specified gauze under specified pressure, torsion and time does not damage the film.
Für die Zwecke dieses Teils von ISO 9117 wird eine einzelne Beschichtung oder ein Mehrschichtsystem aus Farbe oder Lack oder einem verwandten Material als durchtränkt angesehen, wenn eine spezifische Gaze unter spezifischem Druck, Torsion und Zeit den Film nicht beschädigt.
Even if you don’t know what “through-dry” means, I think you would be able to hazard a guess, wouldn’t you? It’s something to do with being dry. The German translation provided by the machine, however, means “saturated, impregnated” (i.e. quite the opposite of being dry)! The correct term is “durchgetrocknet” (or “durchgehärtet”), a documented and standardised special purpose term (in chemistry and related fields). So, what went wrong here?
Machines need databases of word lists to tap into when converting sentences from one language to another. Glossaries of special purpose vocabularies and annotated lists of technical and scientific terms, for example, are fed into these databases to provide the raw material for the computer to access at high speed during the conversion process.
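To make the idea of a glossary a little more concrete, here is a minimal Python sketch. It is not how any particular translation tool works; the one-entry glossary and the function names are my own assumptions, purely for illustration. It simply checks whether a machine translation uses the term a glossary prescribes, using the ISO 9117 example from above:

```python
# Hypothetical one-entry glossary (an assumption for illustration);
# a real term base would hold thousands of vetted entries.
GLOSSARY_EN_DE = {
    "through-dry": "durchgetrocknet",
}


def find_glossary_terms(sentence, glossary):
    """Return (source term, target term) pairs for entries found in the sentence."""
    lowered = sentence.lower()
    return [(src, tgt) for src, tgt in glossary.items() if src in lowered]


def missing_terms(source, translation, glossary):
    """Return glossary pairs whose prescribed target term is absent from the translation."""
    translated = translation.lower()
    return [
        (src, tgt)
        for src, tgt in find_glossary_terms(source, glossary)
        if tgt.lower() not in translated
    ]


source = (
    "a single coating or multi-coat system of paint or varnish or related "
    "material is considered to be through-dry"
)
machine_output = (
    "wird eine einzelne Beschichtung oder ein Mehrschichtsystem aus Farbe oder "
    "Lack oder einem verwandten Material als durchtränkt angesehen"
)

# Each pair printed here is a term the translation failed to render as prescribed.
for src, tgt in missing_terms(source, machine_output, GLOSSARY_EN_DE):
    print(f"'{src}' should be rendered as '{tgt}'")
```

A check like this only flags a missing term; it cannot tell us why the machine chose “durchtränkt” in the first place.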
Let’s look at an example of social variation. We had one in an earlier post – the “dings” used by DIY enthusiasts. Here’s another example from a different field, illustrating the power of creativity:
One beef with the BMI is that burgers cannot easily be traded across borders.
BMI in this sentence stands for the “Big Mac Index” produced by The Economist, and the author uses the phrase “to have a beef with someone” (have a complaint) partly for comic effect, as “beef” belongs to the same lexical field as “burger”. Unfortunately, the machine doesn’t understand the context (or the joke) and gives us the meat (“Rindfleisch”) instead:
Ein Rindfleisch mit dem BMI ist, dass Burger nicht einfach über Grenzen hinweg gehandelt werden können.
Well, at least we still have something to laugh about.
Similarly, the slang term “binge”, used here in a sentence starting with “You can binge on…”, presents a problem:
Unfortunately, the machine is way off track with “auf die Beine stellen”, which means “to organise/mount something” (e.g. an exhibition). If you have a clue as to why this is, please write to me. Maybe the machine is short-sighted and needs glasses, which is the only explanation I could find for the following example:
[Er] nutzt die Daten unserer Newsletterempfänger jedoch nicht, um diese selbst anzuschreiben oder an Dritte weiterzugeben.
However, [he] does not use the data of our newsletter recipients to write them down or pass them on to third parties.
The verb “anschreiben” has various meanings (both in standard and colloquial usage, as well as in combination with the verb “lassen”), but is here clearly used to mean “to write to somebody”. The context makes this clear. But do you see how the pronoun “diese” – “them” is misinterpreted, missing the coreference with the newsletter recipients? That is the reason for this semantic error.
One last example:
They stand by him when an attempt is made on his life by a gang of rockers who were paid to kill him.
Sie stehen ihm bei, als eine Rockerbande, die dafür bezahlt wurden, ihn zu töten, versucht, sein Leben zu retten.
The phrase “attempt is made on his life” has a relatively low frequency in the English language (fewer than 0.01 occurrences per 1 million words, cf. Sketch Engine), but it is not ambiguous in any way. It means to try to kill somebody, not to save somebody’s life, as the machine would have us believe. It seems that machines still have a lot of learning to do; they are “inexperienced”.
Would you like to share some examples or discuss the topic? Write to me below or on Facebook. I look forward to hearing from you!