Resources

Credentials to access the slides:

username: nlp2019 password: 2019nlp

Notes taken by Luizo (@LuigiBrosNin on Telegram)

Won’t be taking lessons this year, but I might add something if I take the exam B)

1.1 Intro

Summary of the general paradigm of Computational Linguistics

Machine learning dominates current NLP research

If we embrace an empiricist view, then to analyze a specific phenomenon X in a language variety Y (either manually or with computational tools):

  1. we have to collect a good set of textual data (a corpus) composed of texts from the language variety Y;
  2. we will collect all the occurrences of the phenomenon X from the corpus;
  3. we will analyse the occurrences of the phenomenon X to find regularities that will allow us to completely describe the phenomenon inside the corpus;
  4. we will generalize our conclusions, derived from corpus analysis, to the entire variety.
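
The four steps above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the three-sentence corpus is invented, the phenomenon X is taken to be "plural nouns ending in -s", and the regular expression is a crude stand-in for a real morphological analysis (it would also match words like "is" or "was"). The names `corpus`, `PLURAL` and `find_occurrences` are hypothetical and do not come from the course.

```python
import re
from collections import Counter

# Step 1: a toy corpus standing in for texts from language variety Y
# (invented data, for illustration only).
corpus = [
    "The cats sat on the mat.",
    "A cat chased two dogs.",
    "Dogs and cats rarely agree.",
]

# Step 2: collect all occurrences of phenomenon X.
# Assumption: X = plural nouns ending in -s, approximated by a crude regex.
PLURAL = re.compile(r"\b[a-z]+s\b")


def find_occurrences(texts):
    """Return every regex match, in corpus order, after lowercasing."""
    hits = []
    for text in texts:
        hits.extend(PLURAL.findall(text.lower()))
    return hits


occurrences = find_occurrences(corpus)

# Step 3: analyse the occurrences to look for regularities
# (here, simply how often each form appears).
counts = Counter(occurrences)

# Step 4: any generalization to the whole variety Y is only as good as
# the corpus; three sentences are obviously not representative.
for form, n in counts.most_common():
    print(form, n)
```

With a real corpus, step 2 is usually the hard part: the quality of the conclusions in step 4 depends on how well the extraction actually captures the phenomenon.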

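Zipf's law (1949): a word's frequency is roughly inversely proportional to its frequency rank, so a small set of very frequent words makes up a large share of any text (roughly 40% of each sentence).

Linguistic levels: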

  • Phonetics/Phonology: how language sounds (or how it is signed, for sign languages)
  • Morphology: the various forms words can take
    • Inflection (e.g. cat/cats), derivation, composition
  • Syntax: parts of speech, dependencies, parsing, etc.
  • Semantics: lexical semantics (synonymy, antonymy, etc.)
  • Discourse analysis / Pragmatics
    • Anaphora: Mary is a woman. She loves John. ("She" refers back to "Mary".)
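
Going back to the Zipf's law remark earlier in this section, the idea that a few very frequent words cover a large share of a text can be checked with a simple rank-frequency count. The sketch below is a minimal illustration, not a proper experiment: the text is invented and far too small to show a real Zipfian curve, and the helper `coverage` is a hypothetical name introduced here.

```python
from collections import Counter

# Invented toy text; a real check would use a large corpus.
text = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat saw the dog and the dog saw the cat"
)

tokens = text.split()
freq = Counter(tokens)

# Word types sorted from most to least frequent: rank 1 is the first entry.
ranked = freq.most_common()


def coverage(ranked_counts, k):
    """Share of all tokens accounted for by the k most frequent word types."""
    total = sum(count for _, count in ranked_counts)
    top = sum(count for _, count in ranked_counts[:k])
    return top / total


# Under Zipf's law, rank * frequency stays roughly constant on large corpora.
for rank, (word, count) in enumerate(ranked, start=1):
    print(rank, word, count, rank * count)

print("share covered by the top 3 types:", round(coverage(ranked, 3), 2))
```

On a large corpus the same script, run on properly tokenized text, would show function words such as articles and prepositions at the top ranks.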