import torch
import torch.nn as nn
import torch.nn.functional as F
import random
The goal of this lab exercise is to build two neural language models: an n-gram model and an autoregressive model.
Although the n-gram model is straightforward to code, there are a few "tricks" that you need to implement for the autoregressive model:
The idea of variational dropout is to apply the same dropout mask at every position of a given sentence (if a minibatch contains several sentences, use a different mask for each of them).
See Figure 1 of this paper: https://proceedings.neurips.cc/paper/2016/file/076a0c97d09cf1a0ec3e19c7f2529f2b-Paper.pdf
To implement this, you need to build a custom module that applies dropout only when the network is in training mode.
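For instance, a minimal sketch of such a module could look like the following (the module name, the dropout rate, and the assumption that inputs have shape (batch, seq_len, dim) are illustrative choices, not part of the lab statement):
class VariationalDropout(nn.Module):
    # Illustrative sketch: drop the same units at every position of a sentence.
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # x: (batch, seq_len, dim); apply dropout only in training mode
        if not self.training or self.p == 0.0:
            return x
        # one mask per sentence, broadcast over the sequence dimension
        mask = x.new_empty((x.size(0), 1, x.size(2))).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)  # inverted dropout scaling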
You first need to download the Penn Treebank as pre-processed by Tomas Mikolov. It is available here: https://github.com/townie/PTB-dataset-from-Tomas-Mikolov-s-webpage/tree/master/data We will use the following files: ptb.train.txt, ptb.valid.txt, and ptb.test.txt.
Manually inspect the data.
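If you are working in a notebook, you can also fetch the files programmatically. The raw-file URL below is an assumption derived from the repository layout and may need to be adjusted:
import urllib.request

# Hypothetical raw-file location for the repository above; adjust the path if needed.
base = "https://raw.githubusercontent.com/townie/PTB-dataset-from-Tomas-Mikolov-s-webpage/master/data/"
for name in ["ptb.train.txt", "ptb.valid.txt", "ptb.test.txt"]:
    urllib.request.urlretrieve(base + name, "./" + name)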
Todo:
def read_file(path):
    # Read one sentence per line; skip empty lines and store each sentence as a list of tokens.
    data = list()
    with open(path) as inf:
        for line in inf:
            line = line.strip()
            if len(line) == 0:
                continue
            data.append({"text": line.split()})
    return data
train_data = read_file("./ptb.train.txt")
dev_data = read_file("./ptb.valid.txt")
test_data = read_file("./ptb.test.txt")
print(len(train_data), len(dev_data), len(test_data))
print("\n\n".join(" ".join(s["text"]) for s in train_data[:5]))
class WordDict:
    # constructor, words must be a set containing all words
    def __init__(self, words):
        assert type(words) == set
        # TODO

    # return the integer associated with a word
    def word_to_id(self, word):
        assert type(word) == str
        # TODO

    # return the word associated with an integer
    def id_to_word(self, idx):
        assert type(idx) == int
        # TODO

    # number of words in the dictionary
    def __len__(self):
        # TODO
train_words = set()
for sentence in train_data:
    train_words.update(sentence["text"])
train_words.update(["<bos>", "<eos>"])
word_dict = WordDict(train_words)
len(word_dict) # should be 10001
For evaluation, you must compute the perplexity of the test dataset (i.e., treat the whole dataset as one very long sentence); see: https://lena-voita.github.io/nlp_course/language_modeling.html#evaluation
Note that you do not need to explicitly compute the root: you can work with log probabilities and the properties of the log function instead. Since sentences are processed one after the other during evaluation, you can write a small class that keeps track of the log probabilities of words and computes the global perplexity at the end.
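Concretely, if the test data contains N words in total and log p(w_i) is the log probability the model assigns to the i-th word given its context, then:

perplexity = exp(-(1/N) * sum_i log p(w_i))

so it suffices to accumulate the sum of log probabilities and the word count.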
class Perplexity:
    def __init__(self):
        # TODO

    def reset(self):
        # TODO

    def add_sentence(self, log_probs):
        # log_probs: vector of log probabilities of words in a sentence
        # TODO

    def compute_perplexity(self):
        # TODO
The model must be similar to the one presented in the course notes. Note that for training and testing, you can transform the data into a set of multiclass classification problems: at each position, the model reads the prefix of the sentence and must predict the next word over the whole vocabulary.
Todo:
This model should rely on an LSTM.
Warning: you need to use the option batch_first=True for the LSTM.
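As an illustration only, a minimal sketch of such a model could look like the following (the layer sizes, the module name, and the use of the dropout module sketched earlier are assumptions, not the expected solution):
class LSTMLanguageModel(nn.Module):
    # Illustrative sketch: embeddings -> LSTM -> linear projection to the vocabulary.
    def __init__(self, vocab_size, embedding_dim=128, hidden_dim=256, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.dropout = VariationalDropout(dropout)  # e.g. the variational dropout module sketched above
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, inputs):
        # inputs: (batch, seq_len) word ids, e.g. <bos> w_1 ... w_{n-1}
        emb = self.dropout(self.embedding(inputs))
        hidden, _ = self.lstm(emb)  # hidden: (batch, seq_len, hidden_dim)
        return self.output(self.dropout(hidden))  # logits over the vocabulary at each position
The logits at each position can then be compared to the shifted targets (w_1 ... w_n <eos>) with a cross-entropy loss, which is exactly the multiclass classification view mentioned above.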