ECTS credits: 5
ECTS hours: Tutorials: 5; Expository classes: 15; Interactive classes: 20; Total: 40
Languages of instruction: German, English
Type: Ordinary Master's degree subject (RD 1393/2007 - 822/2021)
Departments: Galician Philology, External department linked to the degrees, Philosophy and Anthropology
Areas: Galician and Portuguese Philology, External area of the Erasmus Mundus Master's Degree in Lexicography (2nd ed.), Logic and Philosophy of Science
Centre: Faculty of Philology
Term: First semester
Teaching: With teaching
Enrolment: Enrollable
- Training students to work with computer tools for linguistic data processing.
- Giving students skills to design and implement basic tools to automatically extract lexicographic information from texts.
This course introduces basic programming methods in scripting languages (e.g. R and Python) for creating lexicographic resources. More specifically, it focuses on the automatic extraction of collocations and lexical relations.
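As an illustration of what automatic collocation extraction involves, here is a minimal Python sketch (the course itself may work in R) that ranks adjacent word pairs by pointwise mutual information (PMI). PMI is only one of many association measures discussed in the collocation literature (see Evert 2008); the toy corpus, the `min_freq` threshold and the filter that ignores punctuation are illustrative assumptions, not course material.

```python
import math
from collections import Counter

def pmi_collocations(tokens, min_freq=2):
    """Rank adjacent word pairs (bigrams) by pointwise mutual information.

    PMI(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2))), with probabilities
    estimated as relative frequencies. `min_freq` is an illustrative
    threshold: PMI overrates pairs that occur only once.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        # Skip rare pairs and pairs containing punctuation (an assumption).
        if f12 < min_freq or not (w1.isalpha() and w2.isalpha()):
            continue
        p12 = f12 / n_bigrams
        p1 = unigrams[w1] / n_tokens
        p2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p12 / (p1 * p2))
    # Strongest association first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy, pre-tokenized corpus invented for illustration.
text = ("she drank strong tea . he drank strong tea . "
        "they drank black coffee . strong winds blew .")
tokens = text.split()
for pair, score in pmi_collocations(tokens):
    print(pair, round(score, 2))
```

On this toy input, "strong tea" scores higher than "drank strong". With real corpora the frequency threshold matters, because PMI strongly favours rare pairs; measures such as log-likelihood are often preferred for that reason.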
1. Introduction to natural language processing with R
1.1. Basic tasks: tokenization and sentence splitting
1.2. Lemmatization and part-of-speech tagging
1.3. Named Entity Recognition
2. Corpus linguistics for lexical and grammatical analysis
2.1. Introduction to corpus linguistics
2.2. The corpus study: hypotheses and variables
2.3. Corpus data: retrieval and annotation
2.4. Statistical analysis and visualization of results
3. Collaborative lexicography
3.1. Basics of collaborative work
3.2. Crowdsourced collaborative lexicography: the Wiktionary project
3.3. Some tools for collaborative lexicography
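Item 1.1 above can be illustrated with a deliberately naive Python sketch of sentence splitting and tokenization based on regular expressions. The two patterns and the example text are assumptions made for illustration; the course works with R, and dedicated NLP tools handle abbreviations, quotation marks and other edge cases that these rules get wrong.

```python
import re

# Naive sentence boundary: whitespace preceded by ., ! or ? and followed
# by an uppercase letter. It mis-splits abbreviations such as "Dr. Smith".
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Naive token: a run of word characters, optionally joined by an internal
# apostrophe or hyphen ("Don't", "well-known"), or any single symbol.
TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")

def split_sentences(text):
    """Split raw text into sentences using the boundary pattern."""
    return [s for s in SENT_BOUNDARY.split(text.strip()) if s]

def tokenize(sentence):
    """Return the list of word and punctuation tokens in a sentence."""
    return TOKEN.findall(sentence)

text = "Lexicographers study words. Corpora help! Don't they?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Each sentence is printed as a list of tokens; the last one, for instance, yields Don't, they and ?. A text such as "Dr. Smith arrived." would be wrongly split after "Dr.", which is why abbreviation lists or trained models are used in practice.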
Basic bibliography
Jockers, Matthew L. & Thalken, Rosamond (2020). Text analysis with R: For students of literature. Cham: Springer.
Evert, Stefan (2008). “Corpora and collocations”. In A. Lüdeling & M. Kytö (eds.), Corpus Linguistics: An International Handbook, article 58, pp. 1212–1248. Berlin: Mouton de Gruyter.
Stefanowitsch, Anatol (2020). Corpus linguistics: A guide to the methodology. Berlin: Language Science Press. https://doi.org/10.5281/zenodo.3735822
Wu, Winston & Yarowsky, David (2020). “Wiktionary normalization of translations and morphological information”. In Donia Scott, Nuria Bel & Chengqing Zong (eds.), Proceedings of the 28th International Conference on Computational Linguistics, pp. 4683–4692. Barcelona: International Committee on Computational Linguistics.
Complementary bibliography
Abel, Andrea & Meyer, Christian M. (2013). “The dynamics outside the paper: user contributions to online dictionaries”. In Iztok Kosem, Jelena Kallas, Polona Gantar, Simon Krek, Margit Langemets & Maria Tuulik (eds.), Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, 17–19 October 2013, Tallinn, Estonia, pp. 179–194. Ljubljana / Tallinn: Institute for Applied Slovene Studies / Institute of the Estonian Language. Available at: http://eki.ee/elex2013/proceedings/eLex2013_13_Abel+Meyer.pdf
Arnold, Taylor & Tilton, Lauren (2015). Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text (1st ed.). Springer International Publishing.
Grefenstette, Gregory (1994). Explorations in Automatic Thesaurus Discovery. Norwell, MA: Kluwer Academic Publishers.
Gries, Stefan Th. (2021). Statistics for linguistics with R: A practical introduction (3rd ed.). Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110718256
Mel’chuk, Igor (1998). “Collocations and Lexical Functions”. In A. P. Cowie (ed.), Phraseology: Theory, Analysis, and Applications, pp. 23–53. Oxford: Clarendon Press.
Meyer, Christian M. & Gurevych, Iryna (2012). “Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography”. In Sylviane Granger & Magali Paquot (eds.), Electronic Lexicography, pp. 259–291. Oxford: Oxford University Press.
Müller-Spitzer, Carolin, Wolfer, Sascha & Koplenig, Alexander (2015). “Observing online dictionary users: studies using Wiktionary log files”. International Journal of Lexicography, 28(1): 1–26.
Padó, Sebastian & Lapata, Mirella (2007). “Dependency-based construction of semantic space models”. Computational Linguistics, 33(2): 161–199.
Sahlgren, Magnus (2008). “The Distributional Hypothesis”. Rivista di Linguistica, 20(1): 33–53.
Sweigart, Al (2015). Automate the Boring Stuff with Python: Practical Programming for Total Beginners. No Starch Press.
Wolfer, Sascha & Müller-Spitzer, Carolin (2016). “How Many People Constitute a Crowd and What Do They Do? Quantitative Analyses of Revisions in the English and German Wiktionary Editions”. Lexikos, 26: 347–371.
(Additional references may be suggested during the course.)
Knowledge or contents: Con03, Con05, Con06
Skills or abilities: H/D05, H/D06, H/D07, H/D09
Competencies: Comp02, Comp03, Comp08
- Lectures given by the lecturers, conveying knowledge to students and open to discussion.
- Lab sessions in and out of the classroom, following a collaborative methodology.
- Tasks previously assigned as individual work outside the classroom will be analysed and discussed in class.
1. First opportunity: completion and submission of the tasks for each module, together with active participation: 100%.
2. Second opportunity: students who fail the course or have not submitted their assignments must agree a new submission deadline with the corresponding lecturer in order to pass the course at the second opportunity.
Class attendance and the submission of all assignments set during the course are mandatory in order to pass. At the end of the course, and before the final exam, a list of course grades will be published.
Students whom the Faculty authorities have exempted from regular attendance must write a final paper, which will account for 100% of the final grade.
Academic misconduct (cheating, plagiarism in exercises or tests) will be penalized in accordance with the University regulations on student assessment (“Normativa de avaliación do rendemento académico dos estudantes e de revisión de cualificacións”).
The course comprises 35 hours of in-person attendance, plus the students' individual work.
- Students are advised to build on the basic skills acquired in Introduction to Computer Science and Natural Language Processing.
- Students are expected to prepare before and after class sessions.
- In this subject, students will apply methodologies studied in Resources and tools with lexicographic application: use and design I.
- In cases of fraudulent completion of exercises or tests, the provisions of the Regulations on the assessment of student academic performance and review of qualifications shall apply.
- Gender perspective: It is recommended that non-sexist language be used, both in daily classroom work and in academic assignments, in accordance with the recommendations of the USC.
- Institutional technological tools: the use of the rai.usc email account is mandatory. This account will be necessary to access any of the services provided by the USC (Virtual Campus, Teams, Virtual Secretariat, etc.). No communication made from an email account outside the USC will be answered.
- Mobile phones, computers, tablets and similar devices may not be used except as work tools in accordance with the teacher's instructions; students are responsible for any legal and academic consequences arising from improper use.
- Compliance with data protection regulations is mandatory.
- The materials produced by the teacher are protected by intellectual property and copyright regulations and may not be disclosed or made accessible without the author's permission.
- Students with specific educational support needs and/or disabilities should contact the University Participation and Inclusion Service (SEPIU) and submit their request for adaptations using the form available on the SEPIU website or in the virtual student office. For further information, please contact sepiu.santiago [at] usc.gal or call 881 812 859 / 881 812 858.
Carlos Valcarcel Riveiro
- Department
- External department linked to the degrees
- Area
- External area, Erasmus Mundus European Master's Degree in Lexicography
- carlos.valcarcel [at] rai.usc.es
- Category
- External area professor
Martin Pereira Fariña
Coordinator
- Department
- Philosophy and Anthropology
- Area
- Logic and Philosophy of Science
- Phone
- 881812525
- martin.pereira [at] usc.es
- Category
- Professor: Temporary PhD professor
Vitor Miguez Rego
- Department
- Galician Philology
- Area
- Galician and Portuguese Philology
- vitor.miguez [at] usc.gal
- Category
- Professor: LOU (Organic Law for Universities) PhD Assistant Professor
Tuesday
| Time | Group | Language | Classroom |
|---|---|---|---|
| 16:00-18:00 | Group /CLE_01 | English | B06 |

| Date | Time | Group | Classroom |
|---|---|---|---|
| 12.22.2025 | 09:30-12:00 | Group /CLIS_01 | B06 |
| 12.22.2025 | 09:30-12:00 | Group /CLE_01 | B06 |
| 01.15.2026 | 09:30-12:00 | Group /CLIS_01 | B06 |
| 01.15.2026 | 09:30-12:00 | Group /CLE_01 | B06 |