Graphs
Quotes
- "Any MT evaluation measure is less reliable on shorter translations. But, reliability on shorter texts, as short as one sentence or even one phrase, is highly desirable because a reliable MT evaluation measure can greatly accelerate exploratory data analysis." (Turian et al. 2003)
- "We claim that WordNets can be used to weight the lexical adequacy. This weight may be computed by measuring the conceptual distance between the node which represents the expected lexical unit and the one which represents the translation obtained." (Marrafa and Ribeiro 2001)
- "One problem with conducting correlation experiments with human assessment scores at the sentence level is that the human scores are noisy — that is, the levels of agreement between human judges on the actual sentence level assessment scores is not extremely high." (Banerjee and Lavie 2005)
- "... the noise in the human assessments hurts the correlations between automatic scores and human assessments." (Banerjee and Lavie 2005)
- "... want to know if these predictions will hold across a range of target languages and text types." (Babych et al. 2005)
- "Low inter-judge correlation in the present experiment underscores how little the community understands about the MT evaluation problem. If the MT research community is serious about designing reliable automatic MT evaluation measures, then we must obtain human judgement data through more reliable means" (Turian et al. 2003)
- "... in wider domains, BLEU and NIST may need more than one reference translation or a large test set in order to produce results that correlate reliably with human assessments." (Coughlin 2003)
- "... in highly technical domains like ours, a single reference translation is sufficient to produce high-quality results." (Coughlin 2003)
- "... a metric that exhibits high levels of correlation with human judgements at the sentence level would be highly desirable." (Lavie 2004)
- "It is well-known that using more sentences and more references increases the reliability of MT evaluation." (Turian et al. 2003)
References
- Turian, J. P. and Shen, L. and Melamed, I. D. (2003) "Evaluation of Machine Translation and its Evaluation" in MT Summit IX, New Orleans, USA, 23-27 September 2003. pp. 386-393
- Marrafa, P. and Ribeiro, A. (2001) "Quantitative Evaluation of Machine Translation Systems: Sentence Level" in MT Summit VIII, Santiago de Compostela, Spain, 18-22 September 2001.
- Banerjee, S. and Lavie, A. (2005) "METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments" in Proceedings of Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization at the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), Ann Arbor, Michigan, June 2005.
- Babych, B., Hartley, A. and Elliott, D. (2005) "Estimating the predictive power of n-gram MT evaluation metrics across languages and text types" in
References to find
- White, J. S., O'Connell, T. A. & O'Mara, F. E. 1994. The ARPA MT evaluation methodologies: evolution, lessons, and future approaches. In Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas, 5-8 October, Columbia, Maryland, USA. Washington, DC: AMTA, pp. 193-205.
- Balkan, L. 1991. Quality Criteria for MT. Technical report, University of Essex, Colchester, England.
- Arnold, D., Humphreys, R. L. & Sadler, L. (eds). 1993. Special Issue on Evaluation of MT Systems. Machine Translation vol. 8, Nos. 1-2, 1993.
- Flanagan, M. 1994. Error Classification for MT Evaluation. In Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas, Columbia, MD.
- JEIDA 1992. JEIDA Methodology and Criteria on Machine Translation Evaluation (JEIDA Report). H. Nomura (editor). Japan Electronic Industry Development Association.
- Orr, D. & Small, V. 1967. Comprehensibility of Machine-Aided Translations of Russian Scientific Documents. Mechanical Translation and Computational Linguistics, 10, 1-10.
- Pankowicz, Z. L. 1978. Facts of Life in Assessment of Machine Translation, CEC, Luxembourg.
- Sager, J. 1978. Criteria for Machine Translation Evaluation. Proceedings of Workshop on Evaluation Problems in Machine Translation. Luxembourg. February, 1978.
- Somers, H. and Wild, E. 2000. Evaluating Machine Translation: the Cloze procedure revisited. Translating and the Computer 22, London, November 2000.
- Van Slype, G. 1979. Critical Methods for Evaluating the Quality of Machine Translation. Prepared for the European Commission Directorate General Scientific and Technical Information and Information Management. Report BR-19142. Bureau Marcel van Dijk (PDF available).
- Vanni, M. & Reeder, F., 2000. How Are You Doing? A Look at MT Evaluation. In White J.S. (Ed.): Envisioning Machine Translation in the Information Future, 4th Conference of the Association for Machine Translation in the Americas, AMTA 2000, Cuernavaca, Mexico, October 10-14, 2000. LNCS 1934, Springer, p.109-116.



