Tadeusz Piotrowski, Łukasz Grabowski (editors)
The Translator and the Computer
Proceedings of a Conference held in Wrocław, April 20–21, 2012, organized by the Philological School of Higher Education and C&M Localization Centre
WYDAWNICTWO WYŻSZEJ SZKOŁY FILOLOGICZNEJ WE WROCŁAWIU
Copyright by Wyższa Szkoła Filologiczna we Wrocławiu, Wrocław 2013
Copy editing: Irena Szymaniec
Proofreading: Barbara Woldan
Layout and typesetting: Sylwia Rudzińska
Cover design: Konstancja Górny
ISBN
WYDAWNICTWO WYŻSZEJ SZKOŁY FILOLOGICZNEJ WE WROCŁAWIU
Wrocław, ul. Sienkiewicza 32, tel. (+48 71) fax (+48 71) , First edition.
Contents

Introduction

PART I. PRACTICAL APPLICATIONS

Iwona Sikora, Polish translators' workstation: On the usage and adoption of computer-assisted translation tools with some implications for translators' training
Łukasz Bogucki, Translation software in audiovisual contexts
Michał Tyszkowski, Bartłomiej Dymek, Zarządzanie projektem tłumaczeniowym w nowoczesnej firmie jako zaawansowany proces biznesowy
Wojciech Figiel, Komputerowy system notacji w tłumaczeniu konsekutywnym dostępny dla tłumaczy z dysfunkcją wzroku
Ksenia Gałuskina, Korpusy w pracy tłumacza
Katarzyna Marszałek-Kowalewska, Piotr Wierzchoń, Computer-aided translation of technical texts: A corpus-based study of multiword unit extraction in the modern Persian language
Ewa Rudnicka, Maciej Piasecki, Polish-English wordnet – a new resource and its potential for translators
Marcin Walczyński, Editorial errors in translation: Translators' computer skills and the implications for translators' training
Grzegorz Wojarnik, Uwarunkowania prawne i technologiczne elektronicznego repertorium tłumacza przysięgłego w modelu cloud computing

PART II. RESEARCH ON TRANSLATION

Maciej Eder, Computational stylistics and Biblical translation: How reliable can a dendrogram be?
Łukasz Grabowski, Computational stylometry in search of translation universals: The case of English original and translated science fiction novels
Jan Rybicki, Stylometric translator attribution: Do translators leave lexical traces?
Bogusław Solecki, Quantitative and qualitative analysis of sample translations produced by Google Translate and human translators
Introduction

On April 20–21, 2012, the Philological School of Higher Education in Wrocław, together with C&M Localization Centre, organized the conference The Translator and the Computer. The two-day meeting brought together a diverse group of participants, who shared their experiences and insights and suggested how the translator's work could be improved. The conference also provided an opportunity to reflect upon the future of the translator's work and of translation studies in the context of the rapid development of information and computer technologies.

More than 30 papers were presented at the conference, with plenary lectures delivered by Łukasz Bogucki (University of Łódź) and Krzysztof Jassem (Adam Mickiewicz University in Poznań). A number of practical workshops and special sessions dedicated to the presentation of commercial software, such as ApSIC Xbench, Déjà Vu X2, Google Translator Toolkit, SDL Trados Studio, memoQ, XTM and many others, were also organized. The participants included linguists, lexicographers, translators, academic teachers, computer scientists, product specialists and many other scholars and practitioners interested in using a variety of computer programs and tools (e.g. computer-assisted translation software, translation memories, online dictionaries, language corpora, etc.) in the translator's work.

The present volume contains a selection of papers presented at the conference, which fall into two broad topical areas: practical applications of computer technologies in the translator's work, and empirical research on translation completed with the help of computer programs.

Practical applications

The first part of this volume contains nine articles. In the opening paper, Iwona Sikora presents the results of a survey on the use of computer-assisted translation (CAT) tools among Polish translators.
The results of the study, conducted in 2011, showed that despite the widespread use of CAT tools and high awareness of the benefits they provide, there is still a large group of professionals who are either unfamiliar with or skeptical about this software. Consequently, the author formulates a number of suggestions concerning translators' training in such tools.

The paper by Łukasz Bogucki presents an overview of state-of-the-art developments and of computer and digital applications used in the dynamically developing field of audiovisual translation, including subtitling, surtitling, fansubbing,
dubbing, voice-over and audiodescription. The author notes that the practice of audiovisual translation has become increasingly dependent on computer technologies, including solutions normally used in other areas of science, such as eye-tracking, machine translation, speech recognition and text compression.

The paper by Michał Tyszkowski and Bartłomiej Dymek focuses on translation project management seen as a complex business process. Their paper describes in detail the stages of the translation project life cycle, from file preprocessing to final verification of the completed translation. Apart from the technical aspects of translation project management, the authors discuss typical problems and potential areas for improvement, with the aim of raising awareness of the complexity of this business process.

Wojciech Figiel addresses the specific problem of note-taking in consecutive interpreting as encountered by blind or visually impaired interpreters. Since traditional note-taking systems that require the use of sight cannot be used in such circumstances, the author presents in detail a custom-designed note-taking system consisting of a laptop computer equipped with a speech synthesizer, a screen reader and dedicated software used by blind or low-sighted people. He also presents selected examples of the practical application of this system, as well as the rules and principles governing its use.

Ksenia Gałuskina explores the possibilities of using different types of language corpora (monolingual, comparable and parallel) in the translator's work and discusses the concept of the Web as corpus. The author also presents a number of corpus-building and corpus-processing tools, such as BootCaT, AntConc, WordSmith Tools, Unitex and NooJ, and outlines their most important functionalities that can be used as translation aids.
The article by Katarzyna Marszałek-Kowalewska and Piotr Wierzchoń investigates various methods of extracting multiword units from a custom-designed corpus of Persian (Farsi), with a view to facilitating the translation of Persian technical texts into English. Grounded in corpus linguistics, the methods provide data in the form of inventories of fixed multiword units that can also be used in the translation of specialist texts in language pairs other than Persian-English.

Ewa Rudnicka and Maciej Piasecki present some possibilities of using the Polish and English wordnets, i.e. plWordNet and Princeton WordNet, two lexical and semantic databases freely available online, in translation practice. The authors describe a set of hierarchically ordered inter-lingual relations and the mapping procedure used for linking the two wordnets, and discuss the role of these inter-lingual relations in finding translation equivalents.

Marcin Walczyński discusses the problem of editorial errors encountered in translation. Drawing on a number of examples, he proposes a classification of such errors into five major categories: format-setting, typographic, spelling, punctuation and convention-related errors. Finally, he explores the causes of these errors and puts forward some proposals for remedial measures aimed at raising awareness of this problem among translators.

In the last paper in this part, Grzegorz Wojarnik addresses a number of legal and technological issues relevant to building an online repertory system for sworn translators. The author shows that the requirements imposed by legal regulations, in particular those on personal data protection, pose a challenge in developing such an online system. He sees cloud computing technology as a framework for developing a repertory system which can to some extent free translators from duties imposed by legislators.

Empirical research on translation

The second part of this volume contains four papers dealing with computer-assisted empirical studies on translation. In the first, Maciej Eder applies a selection of methods typical of computational stylometry to a comparison of two versions of the New Testament, i.e. the Greek original and its Latin translation known as the Vulgate. Although the study focuses primarily on stylistic differentiation between particular books, the author touches upon the problem of the reliability of data in computational stylometry and presents a simple way of improving the reliability of cluster analysis plots by resampling the input data.

The paper by Łukasz Grabowski presents the results of an empirical study of translation universals (core patterns of lexical use and leveling-out) in English translations of contemporary Russian science fiction novels. Using a combination of selected corpus linguistics and computational stylometry methods, the author shows that the style of the English translations differs from that of science fiction novels originally written in English.
Jan Rybicki examines the application of selected quantitative methods typically used in authorship attribution to the exploration of the translator's visibility or invisibility in translated texts. Comparing a number of authorial and translatorial signals in selected English and Polish originals and translations, he is convinced that the results can help one reexamine some of the current views on language, literature, literary translation, authorship and style.

The volume ends with a paper by Bogusław Solecki, who presents the results of an experiment aimed at measuring the quality of English-to-Polish and Polish-to-English translations of randomly chosen text fragments produced with the Google Translate machine translation tool. The results show that the quality of the translations produced is rather inconsistent, in particular in terms of syntax. The author recommends experimenting with the program using different text types, as Google Translate occasionally produces high-quality results.
Acknowledgements

The present volume would not have appeared without the support and hard work of many people engaged both in its preparation and in the organization of The Translator and the Computer conference. We would like to thank Mr Ryszard Opala, the Chancellor of the Philological School of Higher Education, and Mr Michał Tyszkowski from C&M Localization Centre in Wrocław for making this conference possible. Many thanks are also due to the members of the organizing committee, including Magdalena Nowak, Anna Zasłona, Anna Gamracy, Bogumił Ucherek and Grzegorz Ziemkiewicz, and in particular to Ms Monika Szela, who has given so freely of her time and effort.

Tadeusz Piotrowski and Łukasz Grabowski
Part I. Practical applications
Polish translators' workstation: On the usage and adoption of computer-assisted translation tools with some implications for translators' training

Iwona Sikora
School of Higher Vocational Education in Nysa, Poland
Częstochowa University of Technology, Poland

"Technology extends human capacities" (Biau Gil & Pym 2006)

Abstract. This article presents a full discussion of the results of a survey conducted among Polish translators in July and September 2011 and is a revised version of an earlier article by the author (Sikora 2012). The study concerns the adoption and usage of computer-assisted translation tools, especially those used at the document and translation production level, with special attention paid to translation memory tools. The results of the survey suggest that computer-assisted tools are in widespread use in translation and that translators are aware of the benefits of using such technologies. However, the study also shows that there is still a group of professionals who either have not had a chance to become familiar with these tools or, for various reasons, are convinced that the time, effort and costs devoted to learning how to operate such programs exceed the benefits of their usage. On this basis, certain conclusions concerning translators' training in translation technologies are formulated. Moreover, translators' preferences concerning the features and composition of computer-assisted translation systems are examined with a view to providing some hints for further research and development in this field.

Keywords. Machine translation, translation memory, computer-assisted translation, translator's workstation, information communication technology.

1. Introduction

The aim of this paper is to present and discuss the results of a survey carried out among Polish translators in July and September 2011 concerning the range of tools, considered to form the so-called translator's workstation, used by professional translators in the process of document translation and production.
A modern translator's workstation¹ is a rather complex system built of several components. It comprises a variety of tools, from word counting applications to Translation Memory (TM) technologies. It is not a single program containing all necessary tools and applications but rather a suite of programs which can be adapted to each translator's individual needs. Translation memory and terminology management tools, the latter often referred to as Terminology Management Systems (TMS), are its integral parts, yet all the other components are equally important, since they facilitate and speed up the translation services provided and improve their quality.

2. The survey

2.1. Goals of the survey

The aim of the survey was to examine the adoption and usage of information and communications technologies, in particular Computer-Assisted (also called Computer-Aided) Translation (CAT) tools. Similar surveys examining translators' attitudes towards CAT tools, Translation Memory (TM) systems or Machine Translation (MT) modules were carried out mainly in the UK (cf. Trad Online 2010/2011; Fulford & Granell-Zafra 2005: 4; Lagoudaki 2006; Dillon & Fraser 2006). No such studies, however, have so far been published with reference to Polish translators. This study is an attempt to fill this gap and to provide some preliminary insights into Polish translators' practices, attitudes and opinions concerning the usage of selected CAT tools.
More specifically, the aims of the study are as follows:
(1) to study translators' attitudes towards CAT tools, and TM systems in particular;
(2) to examine the range of tools used by translators in the translation process (at the stage of translation creation, document preparation, information search, and terminology retrieval);
(3) to analyze the reasons for which translators use or do not use TM systems;
(4) to analyze the features of TM packages considered most useful in the translator's work;
(5) to examine terminology search techniques;
(6) to examine the level of adoption of CAT tools among Polish translators.

¹ A good introduction to the origin and usage of the term translator's workstation is the article by John Hutchins (1998).
2.2. The survey's design and structure

The classification of translators' activities proposed by Heather Fulford and Joaquin Granell-Zafra (2005: 5; 2004: 54–55) was partially used in the preparation of this survey. The survey by Elina Lagoudaki (2006) also served as guidance in designing the questionnaire. The questions in this study refer especially to software and applications used at the level of document production (e.g., word processors, OCR (optical character recognition) readers, file converters, etc.) and translation creation (translation memory and terminology management systems), as well as in one of the most essential phases of the translation process: terminology and information search (various tools and techniques used for finding relevant terminology and necessary information).

The survey was conducted online in July and September 2011 and consisted of four parts with 45 questions. It was aimed at Polish translators only and was promoted via translators' forums such as Proz.com (Polish section), MLingua.pl, GoldenLine.pl, BFT (Branżowe Forum Tłumaczy) and Textum.pl; it was also sent via e-mail to freelancers, translation agencies and certified translators. In total, 285 questionnaires were collected, of which 159 (56%) were completed in full and eligible for the analysis.

The results will be presented in the following sections:
(1) Translator's profile (demographic data, education, special qualifications and work profile information);
(2) Familiarity with Information Communication Technology (ICT) and its usage for translation (general-purpose and translation-specific software);
(3) Terminology search and retrieval strategies (technologies used, strategies preferred, terminology management);
(4) Perception of CAT tools, especially TM (attitudes, requirements, reasons for adoption and use, reasons for rejection).

3. Survey results and analysis

3.1. Translator's profile

The majority of the respondents were female (69%), and the largest age group was 30 to 39 years (46%). The distribution of the other age groups was as follows: years 24%, years 16%, years 8%, and 60 years and over 6%. The largest group of respondents (36%) come from a city with the population over , 26% from to residents, 19% up to residents, 10% live in rural areas, and 9% reside in towns with the population from to
More than 90% of the respondents hold a higher education diploma, and 41% of these completed postgraduate studies. Only 7% of the translators have higher vocational education, and only four (3%) hold a secondary education certificate (Figure 1).

Figure 1. Education level of the respondents

As for their major, 125 respondents (78%) completed language studies. The second largest group (11%) took their degree in economics and business studies, followed by 8% majoring in technical subjects. The total exceeds 100% because the respondents were allowed to choose more than one answer to this question, which also means that some translators majored in two or more disciplines. The results show that the great majority hold a diploma in language studies; only 34 translators have no linguistic qualifications for this profession and hold a higher education diploma in other fields (Figure 2).

language studies                                   78%
economics/business studies                         11%
technical/mathematical studies                      8%
natural studies                                     6%
legal studies                                       5%
social studies                                      5%
liberal arts other than language studies            4%
medical studies                                     3%
other                                               3%

Figure 2. Studies major
The results also show that only 19% of all translators completed more than one field of study, which may seem a rather low number. Only eight respondents completed law studies, which seems a little puzzling given the high number of translators who translate legal texts (Figure 4). As for the other disciplines, 18 translators reported completing economics/business studies, 12 declared majoring in disciplines related to technical subjects, 10 in natural studies, and 8 in the social domain.

The respondents were also asked whether they possess special professional qualifications (apart from linguistic ones) for the translator's job, acquired during any type of studies with a strictly translation-oriented major or specialization.

Figure 3. Special professional qualifications (categories: no, only general education; yes, BA studies with major in translation; yes, MA studies with major in translation; yes, postgraduate studies for translators; other; shares shown in the chart: 31%, 26%, 27%, 13%, 3%)

A great majority (69%), apart from general higher education, also possess specific qualifications in the field of translation. Those with general linguistic education only constitute 31% (see Figure 3). These results show a tendency among Polish translators to improve their qualifications either in postgraduate programs (27%) or in BA/MA programs with a major in translation studies (29%). These numbers suggest that Polish translators increasingly choose this profession consciously. Some translators also mentioned acquiring a doctoral degree or completing special courses for translators abroad.

For this question the respondents were allowed to choose more than one answer. The results demonstrate that the translators do not limit themselves to one type of service and also provide simultaneous (21%) and consecutive (43%) interpreting services. The results also show that written translation is the basic form of their professional activity, accompanied by interpreting services.
The fact that almost half of the respondents specialize in both translation and interpreting shows the flexible and versatile character of the Polish translational community.
With regard to the main subject areas of translated texts, the most common were legal (62%), economic and financial (60%), and technical (59%) texts (Figure 4).

Figure 4. Main subject areas of translated texts

As for professional experience, the respondents declared having from 1 to 48 years of experience (Figure 5). The biggest group includes translators with 6 to 10 years of professional experience (38%), followed by those with 11 to 20 years (25%). The third biggest group consists of young translators with up to 5 years of professional experience (23%).

Figure 5. Years of professional experience

Asked about the volume of pages ( characters or ca. 350 words) translated weekly, the largest groups of respondents reported that they translate (15%) pages or (15%) pages a week, which would make an average of 10 pages
per working day. These numbers seem feasible, taking into account the widespread usage of CAT tools (Figure 6).

Figure 6. Number of pages translated weekly

As for computer literacy, 51% of the respondents describe their computer skills as very good, 37% as good, and 11% as average, whereas only one translator considers their computer skills too low and insufficient (cf. Table 1). These answers correspond to the results obtained for the question concerning the usage of CAT tools. A comparison of these results shows that the higher the respondents' computer skills, the greater their usage of CAT tools (TM systems in particular). The one respondent who rated their computer skills as too low also reported not using CAT tools. Among the respondents who rated their computer skills as average, a majority (61%) do not use CAT tools. For the respondents who described their computer skills as good or very good, this tendency is reversed: the majority (61% for good and 75% for very good) acknowledge using CAT tools. These results point to a strong correlation between the level of computer skills and the actual usage of CAT tools.

Table 1. Level of computer skills and translation memory usage
Level of computer skills in % (numbers in brackets show the number of respondents)

Do you use CAT tools with TM?   not sufficient   average    good       very good
yes                             (0) 0%           (7) 39%    (36) 61%   (61) 75%
no                              (1) 100%         (11) 61%   (23) 39%   (18) 22%
never heard of such tools       (0) 0%           (0) 0%     (0) 0%     (2) 3%
Total                           (1) 1%           (18) 11%   (59) 37%   (81) 51%

With regard to file formats, translators work most frequently with standard file formats such as Microsoft Word, Excel, PowerPoint and plain text, which are typically supported by CAT tools. The other file formats used are PDF
(Portable Document Format) files (over 50%). Other computer-processable file formats were HTML (HyperText Markup Language) files (almost 20%), DTP (Desktop Publishing) files (3%), and other graphic file formats (8%). Moreover, 26% of the translated documents are in paper form (Figure 7).

Figure 7. File formats translated most frequently

These data show that the great majority of the translation volume is in a computer-processable form, which should be a clear indication for CAT tool designers and producers to widen the range of file formats supported by such applications.

The respondents were also asked to specify the purposes for which they use the Internet in their work. More than one answer to this question was possible. The answers are distributed evenly, and the results indicate that the translators use the Internet for three main purposes: communication with clients and other translators (e-mailing, processing orders, discussions on forums), and as a source of both specialist and linguistic knowledge, where the Internet serves as a tool for finding the information and terminology required for the translation project at hand. Fewer translators use the Internet for marketing purposes (Figure 8).

communication with clients and other translators, processing orders                     96%
as a source of linguistic knowledge (searching for and verifying terminology)            95%
as a source of specialist knowledge (searching for specialist information, source texts) 91%
for marketing purposes                                                                   62%
other                                                                                    10%

Figure 8. Uses of the Internet
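
The percentages reported in Table 1 are shares within each skill level, so they can be recomputed from the respondent counts given in brackets. The following is a minimal Python sketch of that arithmetic; it assumes the bracketed figures are the number of respondents per cell, and the names TABLE_1_COUNTS, column_shares and level_shares are illustrative only and do not come from the original study.

```python
# Respondent counts from Table 1, keyed by self-assessed computer skill level.
# Each inner dict maps the answer to "Do you use CAT tools with TM?" to a count.
TABLE_1_COUNTS = {
    "not sufficient": {"yes": 0, "no": 1, "never heard": 0},
    "average": {"yes": 7, "no": 11, "never heard": 0},
    "good": {"yes": 36, "no": 23, "never heard": 0},
    "very good": {"yes": 61, "no": 18, "never heard": 2},
}


def column_shares(counts):
    """For each skill level, return the percentage share of every answer."""
    shares = {}
    for level, answers in counts.items():
        total = sum(answers.values())
        shares[level] = {
            answer: 100 * n / total for answer, n in answers.items()
        }
    return shares


def level_shares(counts):
    """Return each skill level's percentage share of the whole sample."""
    grand_total = sum(sum(answers.values()) for answers in counts.values())
    return {
        level: 100 * sum(answers.values()) / grand_total
        for level, answers in counts.items()
    }


if __name__ == "__main__":
    # Print the within-level shares with one decimal place.
    for level, answers in column_shares(TABLE_1_COUNTS).items():
        print(level, {k: f"{v:.1f}%" for k, v in answers.items()})
```

Reading the shares within each skill level, rather than as shares of the whole sample, is what allows groups of very different sizes (1, 18, 59 and 81 respondents) to be compared.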
3.2. Familiarity with Information Communication Technology and its usage for translation

In this section, the results for the questions concerning the level of adoption and usage of various ICT tools are discussed. Table 2 presents the results for the usage of computer tools at the document production level. The results reveal a widespread use of word processing applications, with MS Word being the most common (used by 98% of the respondents). Another indispensable tool in the translator's workstation is a word counting application. Most word processors have built-in applications of this type, and 78% of the respondents use them; 10% declare using an independent word counting tool (e.g., PractiCount, AnyCount, FineCount), 7% report that they do not use such tools at all, and 5% have never heard of such applications. Positive answers thus total 88%, whereas negative answers amount to 12%. This number is a bit surprising, since 99% of the respondents declared translating written texts, and the official and widely accepted procedure in settlements with clients is charging per standard page, which requires using a word counting application.

Table 2. Usage of computer tools at the document production level

Tool                          Yes    No    Never heard
Speech recognition software   1%     92%   7%
DTP                           15%    83%   2%
Graphics software             42%    59%   0%
OCR software                  52%    45%   3%
File converters               62%    34%   4%
Presentation tools            66%    33%   1%
Spellcheckers                 85%    11%   4%
Word counting tools           88%    7%    5%
Word editors                  100%   0%    0%

As for spellcheckers, 85% of the respondents use a spellchecker built into a word processor, and only two translators use other tools. The negative answers total 15%: 4% of the respondents have never heard of such tools and 11% declare not using spellcheckers at all. Presentation software (such as MS PowerPoint) is used by 66% of the group, and 62% use file converters in their work.
The negative answers constitute 34% for presentation programs and 38% for file converters. Within these two groups only 1% and 4%, respectively, have never heard of such applications. The responses to these two questions are still at a high level, but a downward trend can already be noticed. This is corroborated by the results for the next question, concerning the use of OCR (Optical Character Recognition) technology: 52% use an OCR application in their work and 48% do not, including the 3% who have never heard of such technology. With regard to image and graphics editors (e.g., CorelDraw, Adobe Photoshop), 42% declare using these tools and 59% give negative answers. The percentage of positive answers is also lower for DTP software, with only 15% needing this application in their work. The results are somewhat different for Speech Recognition (SR) software: only two respondents (1%) use this tool in their work, whereas negative answers were given by the overwhelming majority of 92%. Here the share of respondents who have never heard of such software (7%) is slightly higher than in the same category for the previous question.

The results presented in Table 2 reveal a downward trend in the usage of standard office applications used by translators at the document production level. Most of these applications (such as word counting tools, OCR applications or file converters) are practically indispensable in the translator's workstation, yet there are still a few respondents who are not even aware of their existence. All in all, general software used in document production has high levels of adoption and usage, while the numbers are lower for more specialized software such as DTP, OCR or SR tools.

This downward trend is even more noticeable as the level of specialization of translations and technologies increases (Table 3). Thus, 84% of translators do not use software for web page localization, about 11% declare using it, and 5% have never heard of such tools. The higher percentage of negative answers to this question is, of course, also conditioned by the lower number of translators who perform this specific type of translation.
The same observation applies to software localization programs, which are used by only 21% of the sample. The remaining 73% do not use them, whereas around 6% have never heard of such tools. The last question in this category concerned subtitling tools. Here the numbers are similar: 11% yes, 80% no, and 9% never heard of such software, which is the highest number so far for all tools enumerated in the questionnaire (apart from TM and MT systems).

Table 3. Usage of special translation tools

Tool                    Yes   No    Never heard
Localization software   21%   73%   6%
Webpage translation     11%   84%   5%
Subtitling software     11%   80%   9%
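
The "negative answers" quoted in the discussion of Table 2 (for example 48% for OCR software or 38% for file converters) combine the "No" and "Never heard" columns, and the downward trend described above becomes visible once the tools from Tables 2 and 3 are ordered by their "Yes" share. The short Python sketch below illustrates both steps under the assumption that the percentages are taken exactly as printed in the two tables; the names USAGE, negative_answers and rank_by_adoption are hypothetical helpers introduced here for illustration.

```python
# (yes, no, never heard) percentages as printed in Tables 2 and 3.
USAGE = {
    "Word editors": (100, 0, 0),
    "Word counting tools": (88, 7, 5),
    "Spellcheckers": (85, 11, 4),
    "Presentation tools": (66, 33, 1),
    "File converters": (62, 34, 4),
    "OCR software": (52, 45, 3),
    "Graphics software": (42, 59, 0),
    "DTP": (15, 83, 2),
    "Speech recognition software": (1, 92, 7),
    "Localization software": (21, 73, 6),
    "Webpage translation": (11, 84, 5),
    "Subtitling software": (11, 80, 9),
}


def negative_answers(usage):
    """Combine 'No' and 'Never heard' into one negative share per tool."""
    return {tool: no + never for tool, (_yes, no, never) in usage.items()}


def rank_by_adoption(usage):
    """Return tool names ordered from the highest to the lowest 'Yes' share."""
    return sorted(usage, key=lambda tool: usage[tool][0], reverse=True)


if __name__ == "__main__":
    negatives = negative_answers(USAGE)
    for tool in rank_by_adoption(USAGE):
        yes = USAGE[tool][0]
        print(f"{tool}: {yes}% yes, {negatives[tool]}% negative")
```

Because the published percentages are rounded, the positive and negative shares of a row need not add up to exactly 100% (Graphics software, for instance, gives 42% and 59%).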