Deep learning for natural language processing: Lab Exercises

Submission: Lab exercises must be submitted on the following ecampus page:

ecampus page for submission

You must submit only two files: your report as a PDF file and your code as an ipynb file. Please submit only once per group, and put all group members' names at the beginning of each document.

Deadlines (2022):

  • Lab exercise 1: November 22, 23:59

  • Lab exercise 2: December 6, 23:59

  • Lab exercise 3: December 23, 23:59

Deadlines are hard: the submission website automatically closes after them, so I strongly advise you not to submit during the last few minutes, in order to avoid problems.

Groups: There should be 2-4 students per group (please ask early if you want a single-student group, otherwise there will be a penalty). I expect stronger work from groups of 4 students. You cannot change groups between lab exercises.

Lab exercise 1: Sentence classification

Guidelines:

  • Keep the report short and concise.

  • Do not use convolution modules from PyTorch; implement the convolutions yourself using simple operations (e.g. the Linear module and other basic operations). A sketch of one possible approach is given after this list.

  • The neural networks should be described formally in the report (that is, not using images copied and pasted from elsewhere).

  • You need to explain your code in the report: how you implemented each component, and why.

  • If you use minibatches of size greater than one, you need to mask your inputs: explain how masking works and its implications for the model. You need to use implicit broadcasting for this (see the masking sketch after this list).

  • Report experiments comparing the two models.

  • (Extra) data analysis: are there specific sentences on which one of the models always fails? On which both fail?
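
As an illustration of the convolution guideline above, here is a minimal sketch of a width-3 convolution built from sliding windows and a Linear module. It is only one possible approach, and the names and sizes (emb_dim, hidden_dim, window) are illustrative assumptions, not requirements:

import torch
import torch.nn as nn

emb_dim, hidden_dim, window = 100, 50, 3
linear = nn.Linear(window * emb_dim, hidden_dim)  # plays the role of the convolution filter bank

x = torch.randn(2, 7, emb_dim)                    # (batch, seq_len, emb_dim)
# collect all width-3 windows: (batch, seq_len - window + 1, emb_dim, window)
windows = x.unfold(1, window, 1)
# flatten each window into one vector: (batch, n_windows, window * emb_dim)
windows = windows.transpose(2, 3).reshape(x.size(0), -1, window * emb_dim)
h = torch.relu(linear(windows))                   # (batch, n_windows, hidden_dim)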
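
For the masking guideline, here is a minimal sketch of masking padded positions with implicit broadcasting; the lengths, sizes, and the max-pooling at the end are illustrative assumptions:

import torch

lengths = torch.tensor([7, 4])                    # true sentence lengths in the batch
max_len = int(lengths.max())
# mask[b, t] is True for real tokens and False for padding: (batch, max_len)
mask = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)

h = torch.randn(2, max_len, 50)                   # padded hidden states
# implicit broadcasting: (batch, max_len, 1) is expanded over the feature dimension
h = h * mask.unsqueeze(2)
# masked max-pooling: padded positions must not win the max
pooled = h.masked_fill(~mask.unsqueeze(2), float('-inf')).max(dim=1).values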

Lab exercise 2: Language modeling

Guidelines:

  • Keep the report short and concise.

  • The neural networks should be described formally in the report (that is, not using images copied and pasted from elsewhere); there is no need to detail the internal structure of the LSTM.

  • You need to explain your code in the report: how you implemented each component, and why.

  • Explain how you batch the computation, and the difference between the two models with respect to batching (a sketch of batched computation is given after this list).

  • Report experiments comparing the two models.
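
As an illustration of batched computation for the language model, here is a minimal sketch that pads sentences to a common length and excludes padding from the loss. The PAD id, vocabulary size, and dimensions are illustrative assumptions:

import torch
import torch.nn as nn

PAD = 0
batch = [torch.tensor([5, 8, 2, 9]), torch.tensor([5, 3])]    # token ids of two sentences
padded = nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=PAD)

inputs, targets = padded[:, :-1], padded[:, 1:]               # predict the next token
emb = nn.Embedding(100, 32, padding_idx=PAD)
lstm = nn.LSTM(32, 64, batch_first=True)
proj = nn.Linear(64, 100)

hidden, _ = lstm(emb(inputs))
logits = proj(hidden)                                         # (batch, seq_len - 1, vocab)
# ignore_index makes padded targets contribute nothing to the loss
loss = nn.functional.cross_entropy(logits.reshape(-1, 100), targets.reshape(-1), ignore_index=PAD)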

Lab exercise 3: Part-of-speech tagging

# To read the embeddings:
import io

with io.open("./data_pos/fasttext_fr", "r", encoding='utf-8') as istream:
    for line in istream:
        # each line contains a word followed by its 300-dimensional vector
        fields = line.rstrip().split(' ')
        word = fields[0]
        embeddings = [float(f) for f in fields[1:]]
        assert len(embeddings) == 300

        # TODO

Extra work:

  • Build a multi-stack BiLSTM. You will need to use the ModuleList module for this (a sketch is given below).
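
A minimal sketch of such a multi-stack BiLSTM built with nn.ModuleList; the number of stacks and the dimensions are illustrative assumptions:

import torch
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_stacks):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_stacks):
            # each stack after the first consumes the previous bidirectional output
            in_dim = input_dim if i == 0 else 2 * hidden_dim
            self.layers.append(nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True))

    def forward(self, x):
        for lstm in self.layers:
            x, _ = lstm(x)
        return x                                  # (batch, seq_len, 2 * hidden_dim)

model = StackedBiLSTM(300, 128, num_stacks=2)
out = model(torch.randn(4, 10, 300))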

Guidelines:

  • TODO