Competence Matrix

This section describes the DataLitMT Competence Matrix, which was developed based on the interface between the Professional MT Literacy Framework and the DataLitMT Framework and which guided the design of the DataLitMT learning resources. The descriptive categories of the competence matrix (Data Context, Data Planning, etc., together with their individual subcategories) were derived from the (sub)dimensions of the DataLitMT Framework. The individual competence descriptors refer to various MT-specific application contexts provided by the Professional MT Literacy Framework. The DataLitMT Competence Matrix is presented in concise form here. An exhaustive discussion of the matrix can be found in Krüger/Hackenbuchner (submitted for publication).

Data Context

The data context sets the stage for any data project. It comprises general knowledge and a critical awareness of how to use and apply data and of the potential ethical implications of working with data, as well as the ability to identify and specify individual tasks within a workflow that could be supported or optimised with the help of data. There is, for example, an immediate link between the subdimensions of critical thinking and data ethics and the individual subdimensions of societal MT literacy, which is concerned with the wider ethical and societal impact of MT.

Basic Level
1.1 Conceptual framework: Can demonstrate general knowledge and understanding of translation data (e.g. MT training data, machine-translated target data) and its uses and applications in translation scenarios in general.
1.2 Critical thinking: Can understand and describe potential general problems, risks and implications of translation data collection/production, evaluation and use practices, and can reflect on the implications of these practices.
1.3 Data ethics: Can understand and describe legal and ethical issues associated with collecting/producing, evaluating and using open-source or commercially available translation data.
1.4 Data culture: Can describe general areas of application in a given translation scenario which could be supported or optimised using translation data, showing a critical awareness and taking into account ethical considerations.

Advanced Level
1.1 Conceptual framework: Can apply general knowledge and understanding of translation data (e.g. MT training data, machine-translated target data) and its uses and applications when analysing specific translation scenarios.
1.2 Critical thinking: Can analyse specific translation scenarios with regard to particular problems, risks and implications of translation data collection/production, evaluation and use practices, and can reflect on the implications of these practices.
1.3 Data ethics: Can identify and analyse legal and ethical issues associated with collecting/producing, evaluating and using open-source or commercially available translation data in specific translation scenarios, and can describe how to comply with legal and ethical requirements when handling translation data in such scenarios.
1.4 Data culture: Can identify and analyse specific areas of application in different translation scenarios which could be supported or optimised using translation data, showing a critical awareness and illustrating how ethical requirements can be complied with.

Data Planning

Data planning links the more theoretical data context with the more practical sections of the competence matrix. It involves performing a data requirement analysis in order to identify which specific data is required to support/optimise individual tasks, developing a data strategy which guides the acquisition of this data, practical aspects of data curation and protection, and identifying and evaluating potential data sources. Data planning is related, for example, to the MT training pipeline and MT domain adaptation subdimensions of technical MT literacy and to the subdimension of linguistic MT literacy that covers linguistic quality requirements for MT, since parameters such as volume, domain, language combination and quality of MT training data have to be established during data planning and will in turn guide individual planning steps such as identifying suitable data sources.

Basic Level
2.1 Data requirement analysis: Can understand and describe in a general way how MT training data have to be chosen according to specific MT tasks in a given MT-assisted translation scenario.
2.2 Data strategy: Can choose a suitable data strategy from a range of options to determine how MT training data requirements can be satisfied in a given MT-assisted translation scenario.
2.3 Data curation/protection: Can understand and describe data security risks and their potential business impact in a given MT-assisted translation scenario, and is aware of general data curation requirements.
2.4 Identifying/evaluating data sources: Can choose, from a range of options, those MT training data sources which are best suited for a given MT-assisted translation scenario in terms of accessibility, relevance and usability.

Advanced Level
2.1 Data requirement analysis: Can identify and critically evaluate which MT training data is suitable to solve specific MT tasks in different MT-assisted translation scenarios.
2.2 Data strategy: Can develop a specific strategy to determine how MT training data requirements can be satisfied in different MT-assisted translation scenarios.
2.3 Data curation/protection: Can identify and assess data security risks and their potential business impact in different MT-assisted translation scenarios, can propose suitable mitigation measures, and can comply with both general and specific data curation requirements.
2.4 Identifying/evaluating data sources: Can identify suitable MT training data sources for different MT-assisted translation scenarios, and can critically evaluate and assess their accessibility, relevance and usability.
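
As a minimal sketch of how the planning parameters named in this section (volume, domain, language combination) could guide the identification of data sources, the following Python snippet models a data requirement and filters candidate sources against it. All class names, field names and example sources are hypothetical and serve illustration only; an actual data planning step would also weigh quality, accessibility and data protection in more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirement:
    """Planning parameters for MT training data (hypothetical structure)."""
    source_lang: str
    target_lang: str
    domain: str
    min_segments: int


@dataclass(frozen=True)
class DataSource:
    """A candidate MT training data source (hypothetical structure)."""
    name: str
    source_lang: str
    target_lang: str
    domain: str
    segments: int
    licence_allows_use: bool


def suitable_sources(req, candidates):
    """Keep candidates that match the language pair and domain,
    offer enough segments and may legally be used."""
    return [
        c for c in candidates
        if (c.source_lang, c.target_lang) == (req.source_lang, req.target_lang)
        and c.domain == req.domain
        and c.segments >= req.min_segments
        and c.licence_allows_use
    ]


# Invented example data, not real corpora.
requirement = DataRequirement("de", "en", "legal", min_segments=50_000)
candidates = [
    DataSource("corpus_a", "de", "en", "legal", 120_000, True),
    DataSource("corpus_b", "de", "en", "medical", 300_000, True),
    DataSource("corpus_c", "de", "en", "legal", 80_000, False),
]
selected = suitable_sources(requirement, candidates)
print([s.name for s in selected])
```

Keeping the requirement separate from the candidate sources mirrors the order of the planning steps: the requirement analysis comes first and then constrains which sources are worth evaluating.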

Data Collection and Production

Data collection and production describes the process of collecting relevant data as identified in the data planning step, applying tools to work with this data and using this data to create new data. Therefore, data collection/production is directly linked to the MT training pipeline subdimension of technical MT literacy, with data acquisition, organisation, preparation and processing describing the central steps of such an MT training pipeline.

Basic Level
3.1 Data verification: Can follow instructions to check MT training data quality for a given MT-assisted translation scenario in accordance with a range of pre-selected criteria.
3.2 Data acquisition: Can follow instructions to collect MT training data for a given MT-assisted translation scenario.
3.3 Data organisation/management: Can understand basic methods and tools for MT training data organisation, and can follow instructions to implement these methods and to create and use basic metadata. Can also implement these basic methods for organising additional data produced at later stages of a given MT-assisted translation scenario.
3.4 Data preparation: Can understand different MT-specific data types and methods for converting and cleaning MT training data, and can follow instructions to implement these methods in a given MT-assisted translation scenario.
3.5 Data processing: Can understand the basic methodology for using MT training data in the training process of an MT system, and can follow instructions to feed previously prepared training data into the MT system in order to create a trained MT model which could be employed in a given MT-assisted translation scenario.
3.6 Data creation: Can follow instructions to apply a previously trained MT model to new source data to create new machine-translated target data, and can also follow instructions to save and organise MT output data produced in this data creation step, drawing on previously acquired data organisation/management skills.

Advanced Level
3.1 Data verification: Can critically evaluate MT training data quality for different MT-assisted translation scenarios, developing suitable assessment criteria and taking into account data-strategic considerations.
3.2 Data acquisition: Can identify and perform the steps required to collect MT training data for different MT-assisted translation scenarios, taking into account data-strategic considerations.
3.3 Data organisation/management: Can assess data organisation requirements pertaining to different MT-assisted translation scenarios, can implement suitable methods and tools for MT training data organisation, and can create and use relevant metadata. Can also implement these methods for organising additional data produced at later stages of such MT-assisted translation scenarios.
3.4 Data preparation: Can critically evaluate and implement suitable methods for converting and cleaning MT training data in different MT-assisted translation scenarios, and can also identify and remove outliers or anomalies in the data.
3.5 Data processing: Can assess and, if necessary, adjust the methodology for using MT training data in the training process of an MT system, and can feed previously prepared training data into the MT system in order to create trained MT models suitable for the requirements of different MT-assisted translation scenarios.
3.6 Data creation: Can independently apply previously trained MT models to new source data to create new machine-translated target data, and can apply previously acquired data organisation/management skills to save and organise MT output data produced in this data creation step.
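
The data preparation descriptors (3.4) mention cleaning MT training data and removing outliers or anomalies. The following Python sketch illustrates what such a cleaning step might look like for a small list of source/target segment pairs. The function name, the whitespace tokenisation and the length-ratio threshold of 3.0 are assumptions made for illustration; they are not prescribed by the competence matrix, and real pipelines typically use more refined filters.

```python
def clean_parallel_data(pairs, max_length_ratio=3.0):
    """Clean (source, target) segment pairs.

    Drops pairs with an empty side, exact duplicates and pairs whose
    token-count ratio exceeds max_length_ratio (a crude outlier check).
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Normalise whitespace so that trivially different copies match.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            continue  # one side is empty
        if (source, target) in seen:
            continue  # exact duplicate
        src_len = len(source.split())
        tgt_len = len(target.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue  # suspicious length mismatch
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


# Invented example segments.
raw_pairs = [
    ("Das ist ein Test.", "This is a test."),
    ("Das  ist ein Test. ", "This is a test."),  # duplicate after normalisation
    ("Guten Morgen.", ""),                       # empty target
    ("Ja.", "Yes, of course, I would be very happy to do that."),  # length outlier
    ("Vielen Dank.", "Thank you very much."),
]
cleaned_pairs = clean_parallel_data(raw_pairs)
print(len(raw_pairs), "->", len(cleaned_pairs))
```

A length-ratio filter is only one possible heuristic for spotting anomalies; whether a given threshold is appropriate depends on the language combination and the translation scenario.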

Data Evaluation

Data evaluation focuses on working with the data collected and/or produced in the previous step of a data project. It is concerned with applying methods and tools for data analysis and evaluation, creating graphical or textual representations of data analysis results and deriving key insights from these results. Again, there are points of contact between data evaluation and technical MT literacy (particularly automatic MT quality evaluation/estimation) and linguistic MT literacy (particularly manual MT quality evaluation), since data evaluation in an MT context will usually focus on automatically or manually evaluating new translation data produced by a previously trained MT engine.

Basic Level
4.1 Data analysis: Can understand basic methods and tools for manually or automatically analysing machine-translated target data produced in a given MT-assisted translation scenario, and can follow instructions to conduct such analyses.
4.2 Data visualisation: Can follow instructions to create basic tables or graphical representations of MT data analysis results, and can evaluate the general effectiveness and accuracy of such tables/representations.
4.3 Data verbalisation: Can verbalise the results of MT data analyses in various text forms in a factual manner.
4.4 Data interpretation: Can read and understand tables or graphical representations of the results of MT data analyses, and can identify basic insights as well as potential discrepancies in these tables/graphical representations.

Advanced Level
4.1 Data analysis: Can develop a detailed plan to manually or automatically analyse machine-translated target data produced in different MT-assisted translation scenarios, and can use this plan to subsequently conduct such analyses.
4.2 Data visualisation: Can create sophisticated tables and graphical representations of prior MT data analysis results, ensure their accuracy, and critically evaluate their effectiveness with regard to specific contexts of use.
4.3 Data verbalisation: Can verbalise the results of MT data analyses in various text forms in a factual manner, and can critically evaluate the adequacy of these verbalisations with regard to specific contexts of use.
4.4 Data interpretation: Can read and interpret tables or graphical representations of the results of MT data analyses, identify key insights and integrate these with other relevant data/information, and can identify potential discrepancies in these tables/graphical representations. Can also recognise, assess and interpret the (statistical) methods used during data analysis, and can recognise the various transformation steps performed on the data from analysis to interpretation.
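
To give a rough idea of what automatically analysing machine-translated target data (4.1) can involve, the Python sketch below computes a word-level edit distance between an MT output and a reference translation and normalises it by the reference length. This is a simplified word error rate chosen for illustration: it is not the TER metric (which additionally allows block shifts), and the function names and example sentences are assumptions rather than part of the DataLitMT resources.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            current.append(min(previous[j] + 1,      # drop a hypothesis word
                               current[j - 1] + 1,   # add a reference word
                               substitution))        # replace or keep a word
        previous = current
    return previous[-1]


def error_rate(hypothesis, reference):
    """Edit distance divided by the number of reference words."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hyp_tokens, ref_tokens) / len(ref_tokens)


# Invented example: the MT output is missing one word.
mt_output = "the cat sat on mat"
reference = "the cat sat on the mat"
score = error_rate(mt_output, reference)
print(f"{score:.3f}")
```

Scores of this kind are the sort of analysis result that the visualisation, verbalisation and interpretation descriptors then build on; a lower value indicates that fewer word-level edits separate the MT output from the reference.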

Data Use

The steps covered by the data use section complete a typical data project. They focus on communicating results of prior data analyses to relevant stakeholders within an organisation, making data-driven decisions based on these results, critically evaluating the impact of data-driven decisions and the overall data project, and taking practical measures such as preserving data and sharing them for future reuse. Data use is related primarily to economic MT literacy, which covers the management/business side of MT-assisted translation projects and is concerned with steps such as effort estimation/measurement in machine translation post-editing (MTPE), price calculation in MTPE, setting up or optimising business processes with a view to MT integration, etc. Ideally, such management/business decisions are data-driven and informed by the results of the corresponding data analyses.

Basic Level
5.1 Data communication: Can communicate, in speech and in writing, factual verbalisations and visualisations of previously analysed MT data to relevant stakeholders who were not part of the data evaluation phase in order to achieve a pre-defined communicative goal in a given MT-assisted translation scenario.
5.2 Data-driven decision making: Can identify basic insights gathered from MT data analyses to be converted into actionable information, and can weigh the advantages and disadvantages as well as the overall impact of corresponding data-driven decisions in a given MT-assisted translation scenario.
5.3 Critical analysis of DDDM/data project: Can identify and evaluate the general effectiveness and impact of data-driven decisions based on MT data analyses in a given MT-assisted translation scenario, and can reflect on the overall effectiveness/impact of a given MT data project.
5.4 Data preservation: Can understand and describe the general requirements of MT data preservation, can evaluate the suitability of pre-defined methods and tools for such data preservation, and can follow instructions to preserve MT data in a given MT-assisted translation scenario.
5.5 Data sharing/reuse: Can understand and describe the advantages and disadvantages of various pre-defined methods and selected platforms for MT data sharing, and can follow instructions to share such data in a legally correct and ethically adequate manner.

Advanced Level
5.1 Data communication: Can effectively communicate, in speech and in writing, factual and purpose-oriented verbalisations and visualisations of previously analysed MT data to relevant stakeholders who were not part of the data evaluation phase in order to achieve various audience-specific communicative goals in different MT-assisted translation scenarios.
5.2 Data-driven decision making: Can critically assess and prioritise key insights gathered from MT data analyses to be converted into actionable information, can weigh the advantages and disadvantages as well as the overall impact of corresponding data-driven decisions in different MT-assisted translation scenarios, and can implement such decisions in various contexts of use.
5.3 Critical analysis of DDDM/data project: Can critically evaluate the specific effectiveness and impact of data-driven decisions based on MT data analyses in different MT-assisted translation scenarios, can critically reflect on the overall effectiveness/impact of a particular MT data project, and can identify specific areas for future improvement.
5.4 Data preservation: Can identify and critically evaluate specific requirements, methods and tools for MT data preservation in different MT-assisted translation scenarios, and can implement specific actions required for preserving such data.
5.5 Data sharing/reuse: Can identify and critically assess suitable methods and platforms for MT data sharing in different MT-assisted translation scenarios, and can share such data in a legally correct and ethically adequate manner.