Caio Filippo Corro

Deep learning

This is the website for the Deep Learning course of Master 1 AI.

Grading Scheme

Contact

You can contact me at caio.corro@u-psud.fr, either in French or English, with a subject starting with [OPT11]. Please do not worry about typos or about not being formal enough (just treat your instructors and colleagues with the same respect you would like to receive).

Lectures

Lab exercises

There should be 2 or 3 students per group. Deadlines are hard and you cannot change groups.

Lab exercises comprise two parts:

So the question is: what do you need to write in the report? There are no specific instructions! You must think about the report as an essay: its objective is to convince us that you understand the theoretical foundation of the model and how to implement it in practice. Use your own words and notations, try to process the course and the lab exercise and explain them to us. Do not write hand-wavy explanations. You should probably use LaTeX for this.

Content:

Length: 3-6 pages if double columns, a little bit longer otherwise, but these are not hard limits - you can do less, you can do a little more. Just don’t write too much, go to the essentials. Use formal notations with the minimum amount of text needed to be understandable. Just convince me that you know what you are talking about. :)

Scoring: as long as you do the work seriously, comment the code you wrote and submit a nice report, we will give you a good grade. Do not worry if you did not manage to do everything or if you didn’t understand something. Explain in the report what you did not manage to do so we can see that you made an effort.

Frequently asked questions?

Lab exercises 1 and 2

You only need to submit the code of the second lab exercise, but to succeed in the second one you must first succeed in the first one…

Hard deadline: April 7
Send me the notebook and the report (PDF) by email, one email per group with names written explicitly in the email, the notebook and the report (-3 points if you don’t do that) and with title “Deep learning - lab exercise 2” (-2 points if you don’t do that): caio.corro [at] u-psud.fr

Lab exercise 3

This lab is not graded and must not be submitted. However, you need to work through it carefully if you want to succeed in the project.

Lab exercise 4: Data generation with a Variational Auto-encoder

Hard deadline: April 29

You need to submit one report (PDF format, 3-6 pages) and one notebook. No zip/rar or other compressed files. The grade will be mainly based on the quality of the report. Expectations are:

The VAE that we will develop is based on the following generative story:

  1. z ~ p(z)
  2. x ~ p(x | z ; \theta)

where the latent representations z take values in R^n. The prior distribution p(z) is a multivariate Gaussian where each coordinate is independent. We fix the mean and variance of each coordinate to 0 and 1, respectively. The conditional distribution p(x | z ; \theta) is parameterized by a neural network: it is the decoder! The generated pixels x are independent Gaussians with a fixed variance.

Note: this kind of VAE will be quite bad at generating MNIST pictures. Therefore, when you do your experiments, you should both generate pictures and display the mean parameters of the output distributions. This is a well-known problem of VAEs; you can try to play with the network architecture and the parameters to improve generation.
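To make the difference between "generating a picture" and "displaying the mean parameters" concrete, here is a minimal sketch of the generative story. It is only an illustration: the small untrained `decoder` and the value of `fixed_sigma` are hypothetical stand-ins, not the model or the variance you are expected to use.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim_latent, dim_input = 2, 28 * 28

# Hypothetical stand-in decoder (untrained), used only to get tensors of the right shape.
decoder = nn.Sequential(
    nn.Linear(dim_latent, 100), nn.ReLU(), nn.Linear(100, dim_input)
)
fixed_sigma = 0.1  # assumed value for the fixed standard deviation of the pixels

with torch.no_grad():
    # 1. z ~ p(z): standard Gaussian prior, independent coordinates
    z = torch.randn(10, dim_latent)
    # The decoder outputs the mean parameters of p(x | z ; theta)
    mean = decoder(z)
    # 2. x ~ p(x | z ; theta): independent Gaussian pixels around the mean
    x = mean + fixed_sigma * torch.randn_like(mean)
```

Plotting `mean` shows the mean parameters, while plotting `x` shows an actual sample, which adds independent noise to every pixel.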

Although the decoder is similar to the auto-encoder decoder, the encoder is different: it must return two tensors, the tensor of means and the tensor of variances. As the variance of a Gaussian distribution is constrained to be strictly positive, it is usual to return the log-variance instead (that is, the log of the squared standard deviation), which is unconstrained. Exponentiating the log-variance gives the variance, which is strictly positive because the exponential function only returns positive values.
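As a small illustration of this point (a sketch with made-up values; only the name `log_sigma_squared` comes from the lab skeleton):

```python
import torch

# Unconstrained values, as the log-variance output of an encoder might look
# (illustrative numbers only).
log_sigma_squared = torch.tensor([-3.0, 0.0, 2.5])

# Exponentiating maps any real number to a strictly positive variance.
sigma_squared = torch.exp(log_sigma_squared)

# Standard deviation: sqrt(exp(v)) is the same as exp(v / 2).
sigma = torch.exp(0.5 * log_sigma_squared)
```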

Similarly to the auto-encoder, there are several hyperparameters you can try to tune. However, for the VAE I strongly advise you to:

class VAEEncoder(nn.Module):
    def __init__(self, dim_input, dim_latent):
        super().__init__()
        # TODO

    def forward(self, inputs):
        # TODO

        # mu = mean
        # log_sigma_squared = log variance
        # The idea is that you use two different output projection:
        # one for the mean, one for the log_sigma_squared
        # but all other layers are shared
        return mu, log_sigma_squared
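
One possible way to fill in this skeleton is sketched below. It is not the reference solution: the class name `VAEEncoderSketch`, the single hidden layer, the `dim_hidden=200` default and the ReLU activation are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class VAEEncoderSketch(nn.Module):
    """A possible encoder: one shared hidden layer and two output projections."""

    def __init__(self, dim_input, dim_latent, dim_hidden=200):
        super().__init__()
        # Shared layer (dim_hidden=200 is an arbitrary, hypothetical choice).
        self.hidden = nn.Linear(dim_input, dim_hidden)
        # Two separate output projections on top of the shared representation.
        self.proj_mu = nn.Linear(dim_hidden, dim_latent)
        self.proj_log_sigma_squared = nn.Linear(dim_hidden, dim_latent)

    def forward(self, inputs):
        h = torch.relu(self.hidden(inputs))
        mu = self.proj_mu(h)
        log_sigma_squared = self.proj_log_sigma_squared(h)
        return mu, log_sigma_squared


# Usage on a fake batch of 4 flattened 28x28 images.
encoder = VAEEncoderSketch(dim_input=28 * 28, dim_latent=2)
mu, log_sigma_squared = encoder(torch.rand(4, 28 * 28))
```

Both outputs have shape (batch size, latent dimension); no activation is applied to them because the mean and the log-variance are unconstrained.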

Training loss

To compute the training loss, you must compute two terms:

For the reconstruction loss, you can use the mean square error loss.

To sample values, you can use the reparameterization trick as follows:

e = torch.normal(0, 1., mu.shape)
z = mu + e * torch.sqrt(torch.exp(log_sigma_squared))
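
The same idea can be wrapped in a small helper, sketched here under the assumption that `mu` and `log_sigma_squared` have the same shape. The function name `sample_latent` is hypothetical. The point of the trick is that the sample stays differentiable with respect to the encoder outputs, which the last lines check.

```python
import torch


def sample_latent(mu, log_sigma_squared):
    """Draw z ~ N(mu, sigma^2) while keeping gradients w.r.t. both inputs."""
    # Noise from a standard Gaussian, with the same shape, dtype and device as mu.
    e = torch.randn_like(mu)
    # exp(0.5 * log sigma^2) is the standard deviation sigma.
    return mu + e * torch.exp(0.5 * log_sigma_squared)


mu = torch.zeros(3, 2, requires_grad=True)
log_sigma_squared = torch.zeros(3, 2, requires_grad=True)
z = sample_latent(mu, log_sigma_squared)
# Gradients reach mu and log_sigma_squared through the sample.
z.sum().backward()
```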

The formula of the KL divergence with the prior is as follows:

-0.5 * torch.sum(1 + log_sigma_squared - mu.pow(2) - log_sigma_squared.exp())
See Appendix B of the original paper.
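
If you want to convince yourself that this closed form is right, one option is to compare it with the KL divergence computed by `torch.distributions`. The sketch below uses random values for `mu` and `log_sigma_squared`, which are stand-ins for real encoder outputs.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mu = torch.randn(5, 2)
log_sigma_squared = torch.randn(5, 2)

# Closed form from the text, summed over every element.
kl_closed_form = -0.5 * torch.sum(
    1 + log_sigma_squared - mu.pow(2) - log_sigma_squared.exp()
)

# Same quantity computed by PyTorch: KL(q || p) with a standard Gaussian prior p.
q = Normal(mu, torch.exp(0.5 * log_sigma_squared))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kl_reference = kl_divergence(q, p).sum()
```

The two values should agree up to floating point error.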

WARNING: you must carefully check that you average over the elements of your batches correctly, so that both loss terms have the correct “proportion”. You may need to do something like torch.sum(…, dim=1) before calling torch.mean(…) for the KL divergence: think about it and explain in your report.
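
One way to keep both terms on the same scale is sketched below. It assumes that each term is first summed within an example (over pixels for the reconstruction, over latent coordinates for the KL divergence) and only then averaged over the batch. The tensors are random stand-ins, and using a per-example sum of squared errors for the reconstruction is a choice made for this illustration. Whether it is the right proportion for your model is exactly what you should argue in the report.

```python
import torch

torch.manual_seed(0)
batch_size, dim_input, dim_latent = 8, 784, 2

# Stand-ins for real tensors (random values, only the shapes matter here).
x = torch.rand(batch_size, dim_input)
reconstruction = torch.rand(batch_size, dim_input)
mu = torch.randn(batch_size, dim_latent)
log_sigma_squared = torch.randn(batch_size, dim_latent)

# Reconstruction term: sum over pixels, one value per example.
reconstruction_per_example = torch.sum((reconstruction - x).pow(2), dim=1)

# KL term: sum over latent coordinates, one value per example.
kl_per_example = -0.5 * torch.sum(
    1 + log_sigma_squared - mu.pow(2) - log_sigma_squared.exp(), dim=1
)

# Average over the batch only once both terms are per-example quantities.
loss = torch.mean(reconstruction_per_example + kl_per_example)
```

Note that a mean squared error averaged over every element of the batch divides the reconstruction term by the number of pixels as well, which changes its weight relative to the KL term.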

Generate new images

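import matplotlib.pyplot as plt
import torch

# Sample 10 latent points from the prior; the 2 must match your latent dimension.
e = torch.normal(0, 1., (10, 2))
# The decoder returns the mean parameters of the output distributions.
images = decoder(e)

for i in range(10):
    picture = images[i].clone().detach().numpy()
    plt.imshow(picture.reshape(28,28), cmap='Greys')
    plt.show()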

Latent space visualization

It is quite useful to visualize the latent space both for the auto-encoder and the variational auto-encoder. You can visualize it either for the training data or the dev data. Note that if you want to visualize a latent space whose dimension is greater than two (useful for the first part!), you could project it to 2 dimensions using PCA (it's already implemented in scikit-learn!).
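
A minimal sketch of the PCA projection with scikit-learn, assuming the latent points are already stored in a NumPy array of shape (number of images, latent dimension). The random array and the dimension 16 are stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for the latent points of 100 images in a 16-dimensional latent space.
points = rng.normal(size=(100, 16))

# Project onto the two directions of largest variance.
points_2d = PCA(n_components=2).fit_transform(points)
```

The resulting `points_2d` array has two columns and can be passed to a scatter plot in the same way as a 2-dimensional latent space.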

import matplotlib.pyplot as plt
import numpy as np
import torch

# only the encoder is used here, so switch it to evaluation mode
encoder.eval()

# this code assumes that:
# data is a tensor of shape (number of images, input dim)
# and data_labels a tensor of shape (number of images)
# adapt it to your code!

# array that will contain all latent points
points = np.empty((data.shape[0], dim_latent))
with torch.no_grad():
    for i in range(0, data.shape[0], batch_dim):
        batch = data[i:i+batch_dim]
        # auto-encoder
        mu = encoder(batch)
        # VAE, keep only the mean
        #mu, _ = encoder(batch)
        points[i:i+batch_dim] = mu.numpy()

plt.scatter(
    points[:,0], points[:, 1],
    # colormap is between 0 and 1, and we have 10 classes
    # so we just divide by 10 :)
    # https://matplotlib.org/3.1.1/tutorials/colors/colormaps.html
    c=plt.get_cmap("tab10")(data_labels.numpy() / 10.)
)
plt.show()