Marion Marking, writing for Slator:

"A new study examines the basis of such claims that (as the researchers put it) “machine translation has increased […] to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations.” And it does so by taking a closer look at the human assessments that led to such claims.

"The new study, published in the peer-reviewed Journal of Artificial Intelligence Research, shows that recent findings of human parity in machine translation were due to “weaknesses” in the way humans evaluated MT output."

Turns out, human evaluation of MT depends on a number of factors, all of which boil down to three highly subjective choices:

“The choice of raters, the availability of linguistic context, and the creation of reference translations”

Read more on Slator.