Deep learning for natural language processing: Lab Exercises

Submission: Lab exercises must be submitted on the following ecampus page:

ecampus page for submission

You must submit only two files: your report as a PDF file and your code as an ipynb file. Please submit only once per group, and put all group members' names at the beginning of each document.

Deadlines (2022):

  • Lab exercise 1: November 22, 23:59

  • Lab exercise 2: December 6, 23:59

  • Lab exercise 3: December 23, 23:59

Deadlines are hard: the submission website automatically closes after them, so I strongly advise you not to submit during the last few minutes, in order to avoid problems.

Groups: There should be 2-4 students per group (please ask early if you want a single-student group, otherwise there will be a penalty). I expect stronger work from groups of 4 students. You cannot change groups between lab exercises.

Lab exercise 1: Sentence classification

Guidelines:

  • Keep the report short and concise.

  • Do not use convolution modules from PyTorch; implement the convolutions yourself using simple operations (e.g. the Linear module and other basic operations). A sketch of one possible approach is given after this list.

  • The neural networks should be described formally in the report (that is, not using images copied and pasted from elsewhere).

  • You need to explain your code in the report: how you implemented each component, and why.

  • If you use minibatches of size greater than one, you need to mask your inputs: explain how masking works and its implications for the model. You need to use implicit broadcasting for this (see the masking sketch after this list).

  • Report experiments comparing the two models.

  • (Extra) data analysis: are there specific sentences on which one of the models always fails? On which both fail?
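
As an illustration of the convolution guideline above, here is a minimal sketch of a width-3 convolution built from sliding windows and a Linear module. It is only one possible approach, and the names and sizes (emb_dim, hidden_dim, window) are illustrative assumptions, not requirements:

import torch
import torch.nn as nn

emb_dim, hidden_dim, window = 100, 50, 3
linear = nn.Linear(window * emb_dim, hidden_dim)  # plays the role of the convolution filter bank

x = torch.randn(2, 7, emb_dim)                    # (batch, seq_len, emb_dim)
# collect all width-3 windows: (batch, seq_len - window + 1, emb_dim, window)
windows = x.unfold(1, window, 1)
# flatten each window into one vector: (batch, n_windows, window * emb_dim)
windows = windows.transpose(2, 3).reshape(x.size(0), -1, window * emb_dim)
h = torch.relu(linear(windows))                   # (batch, n_windows, hidden_dim)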
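
For the masking guideline, here is a minimal sketch of masking padded positions with implicit broadcasting; the lengths, sizes, and the max-pooling at the end are illustrative assumptions:

import torch

lengths = torch.tensor([7, 4])                    # true sentence lengths in the batch
max_len = int(lengths.max())
# mask[b, t] is True for real tokens and False for padding: (batch, max_len)
mask = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)

h = torch.randn(2, max_len, 50)                   # padded hidden states
# implicit broadcasting: (batch, max_len, 1) is expanded over the feature dimension
h = h * mask.unsqueeze(2)
# masked max-pooling: padded positions must not win the max
pooled = h.masked_fill(~mask.unsqueeze(2), float('-inf')).max(dim=1).values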

Lab exercise 2: Language modeling

Guidelines:

  • Keep the report short and concise.

  • The neural networks should be described formally in the report (that is, not using images copied and pasted from elsewhere); there is no need to detail the internal structure of the LSTM.

  • You need to explain your code in the report: how you implemented each component, and why.

  • Explain how you batch the computation, and the difference between the two models with respect to batching (a sketch of batched computation is given after this list).

  • Report experiments comparing the two models.
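
As an illustration of batched computation for the language model, here is a minimal sketch that pads sentences to a common length and excludes padding from the loss. The PAD id, vocabulary size, and dimensions are illustrative assumptions:

import torch
import torch.nn as nn

PAD = 0
batch = [torch.tensor([5, 8, 2, 9]), torch.tensor([5, 3])]    # token ids of two sentences
padded = nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=PAD)

inputs, targets = padded[:, :-1], padded[:, 1:]               # predict the next token
emb = nn.Embedding(100, 32, padding_idx=PAD)
lstm = nn.LSTM(32, 64, batch_first=True)
proj = nn.Linear(64, 100)

hidden, _ = lstm(emb(inputs))
logits = proj(hidden)                                         # (batch, seq_len - 1, vocab)
# ignore_index makes padded targets contribute nothing to the loss
loss = nn.functional.cross_entropy(logits.reshape(-1, 100), targets.reshape(-1), ignore_index=PAD)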

Lab exercise 3: Part-of-speech tagging

# To read the embeddings:
import io

with io.open("./data_pos/fasttext_fr", "r", encoding='utf-8') as istream:
    for line in istream:
        # each line contains a word followed by its 300-dimensional vector
        fields = line.rstrip().split(' ')
        word = fields[0]
        embeddings = [float(f) for f in fields[1:]]
        assert len(embeddings) == 300

        # TODO

Extra work:

  • Build a multi-stack BiLSTM. You will need to use the ModuleList module for this (a sketch is given below).
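
A minimal sketch of such a multi-stack BiLSTM built with nn.ModuleList; the number of stacks and the dimensions are illustrative assumptions:

import torch
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_stacks):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_stacks):
            # each stack after the first consumes the previous bidirectional output
            in_dim = input_dim if i == 0 else 2 * hidden_dim
            self.layers.append(nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True))

    def forward(self, x):
        for lstm in self.layers:
            x, _ = lstm(x)
        return x                                  # (batch, seq_len, 2 * hidden_dim)

model = StackedBiLSTM(300, 128, num_stacks=2)
out = model(torch.randn(4, 10, 300))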

Guidelines:

  • TODO