Gaussian Mixture Model via Gradient Descent

In this exercise, we will see how to train a Gaussian Mixture Model (GMM) with gradient descent.

Your goal is to study this code, play with the parameters, and understand what is going on!
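
For reference, a common way to set up this training problem (an assumption here, since the loss is not spelled out in this text) is to minimize the average negative log-likelihood of the N data points under a mixture of K Gaussians. The mixture weights pi_k are obtained from unconstrained logits alpha_k through a softmax, so they stay positive and sum to one:

```latex
\pi_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}, \qquad
\mathcal{L}(\theta) = -\frac{1}{N} \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right)
```

Gradient descent then moves the parameters theta = {alpha_k, mu_k, Sigma_k} (or unconstrained versions of them) along the negative gradient of this loss.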

1. Simple preliminary example

In this section, we illustrate how to use PyTorch to optimize the parameters of a probability distribution.
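
A minimal sketch of the idea, assuming synthetic 1-D data drawn from N(2, 0.5^2) (the true values, learning rate and number of steps are arbitrary choices, not the notebook's): the mean and the log of the standard deviation are learnable tensors, and the negative log-likelihood of the whole dataset is minimized with `torch.optim.Adam`. Learning log(sigma) instead of sigma keeps the standard deviation positive without any explicit constraint.

```python
import torch

torch.manual_seed(0)

# Assumed synthetic data: 1,000 samples from N(2.0, 0.5^2)
true_mu, true_sigma = 2.0, 0.5
x = true_mu + true_sigma * torch.randn(1000)

# Learnable parameters; sigma = exp(log_sigma) is always positive
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)

optimizer = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(1000):
    optimizer.zero_grad()
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    # Full-batch loss: mean negative log-likelihood over all samples
    loss = -dist.log_prob(x).mean()
    loss.backward()
    optimizer.step()

print(mu.item(), log_sigma.exp().item())
```

At convergence the estimates should be close to the sample mean and the biased sample standard deviation, which are the maximum-likelihood solution for this model.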

Next, we solve the same problem with stochastic gradient descent, i.e., each update uses only a mini-batch of the data instead of the full dataset.
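
The mini-batch version of the same sketch (again with arbitrary, assumed hyperparameters) only changes the loop: the data are shuffled at the start of each epoch, and every optimizer step sees a batch of `batch_size` points. Plain `torch.optim.SGD` is used here to make the stochastic-gradient aspect explicit.

```python
import torch

torch.manual_seed(0)

# Same assumed synthetic data as in the full-batch sketch: N(2.0, 0.5^2)
x = 2.0 + 0.5 * torch.randn(1000)

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([mu, log_sigma], lr=0.05)

batch_size = 64  # hypothetical value; try changing it
for epoch in range(50):
    # Reshuffle once per epoch, then sweep over consecutive mini-batches
    perm = torch.randperm(x.shape[0])
    for start in range(0, x.shape[0], batch_size):
        batch = x[perm[start:start + batch_size]]
        optimizer.zero_grad()
        dist = torch.distributions.Normal(mu, log_sigma.exp())
        # Loss estimated on the current mini-batch only
        loss = -dist.log_prob(batch).mean()
        loss.backward()
        optimizer.step()

print(mu.item(), log_sigma.exp().item())
```

Because each gradient is computed on a random batch, the parameters keep fluctuating around the optimum; a smaller learning rate or larger batches reduce that noise.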

2. Data

We now generate the data we will use. The points are drawn from three different bivariate Gaussians, and only one of them has a non-zero correlation between its two coordinates.
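
A sketch of how such a dataset can be generated. The means, standard deviations, correlation value and sample counts below are hypothetical placeholders, not the values used in the notebook. Each covariance matrix is assembled from the two standard deviations s1, s2 and the correlation coefficient rho as [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]].

```python
import torch

torch.manual_seed(0)

# Hypothetical cluster parameters (placeholders, not the notebook's values)
means = torch.tensor([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
stds = torch.tensor([[0.5, 1.0], [1.0, 0.5], [1.0, 1.0]])
rhos = torch.tensor([0.0, 0.0, 0.8])  # only the third Gaussian is correlated
n_per_cluster = 500

samples = []
for mean, std, rho in zip(means, stds, rhos):
    # Diagonal holds the variances, off-diagonal the covariance rho * s1 * s2
    cov = torch.diag(std ** 2)
    cov[0, 1] = cov[1, 0] = rho * std[0] * std[1]
    dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
    samples.append(dist.sample((n_per_cluster,)))

X = torch.cat(samples)  # all points, shape (1500, 2)
labels = torch.arange(3).repeat_interleave(n_per_cluster)  # true component of each point
```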

GMM with diagonal covariance matrices

IMPORTANT: Make sure you understand exactly what is going on in the forward function!
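
The notebook's actual forward function is not reproduced in this text, so the following is only a sketch of what such a function typically computes, using a hypothetical `DiagonalGMM` class. With a diagonal covariance, each component's log density is a sum of 1-D Gaussian log densities; the mixture log-likelihood is then obtained with `torch.logsumexp`, which avoids underflow when exponentiating very negative log densities. Weights come from `log_softmax` of unconstrained logits and standard deviations from `exp` of unconstrained log standard deviations.

```python
import math

import torch
import torch.nn as nn


class DiagonalGMM(nn.Module):
    """Hypothetical GMM with K components and diagonal covariances (a sketch)."""

    def __init__(self, n_components: int, dim: int):
        super().__init__()
        # Unconstrained parameters: softmax gives weights, exp gives std devs
        self.logits = nn.Parameter(torch.zeros(n_components))
        self.means = nn.Parameter(torch.randn(n_components, dim))
        self.log_stds = nn.Parameter(torch.zeros(n_components, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Return log p(x) for each row of x, shape (N,)."""
        log_weights = torch.log_softmax(self.logits, dim=0)  # (K,)
        diff = x.unsqueeze(1) - self.means.unsqueeze(0)       # (N, K, D)
        stds = self.log_stds.exp()                            # (K, D)
        # Per-dimension 1-D Gaussian log densities, summed over D
        log_comp = (
            -0.5 * (diff / stds) ** 2
            - self.log_stds
            - 0.5 * math.log(2 * math.pi)
        ).sum(dim=-1)                                         # (N, K)
        # log sum_k pi_k N(x | mu_k, diag(sigma_k^2)), computed stably
        return torch.logsumexp(log_weights + log_comp, dim=1)


# Usage sketch: maximize the average log-likelihood of the data
torch.manual_seed(0)
model = DiagonalGMM(n_components=3, dim=2)
x = torch.randn(200, 2)  # placeholder data; use the generated dataset instead
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    loss = -model(x).mean()
    loss.backward()
    optimizer.step()
```

The output can be cross-checked against `torch.distributions.MixtureSameFamily` built from a `Categorical` and an `Independent(Normal(...), 1)`, which computes the same quantity.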

GMM with full covariance matrices

We now return to the same example, but with a model that can learn non-zero correlations between coordinates.

The trick is to parameterize each covariance matrix through the standard deviations of the two coordinates and their correlation coefficient, as explained here:

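The correlation coefficient must lie in [-1, 1], and strictly inside (-1, 1) for the covariance matrix to be invertible. Rather than constraining it directly, we learn an unconstrained parameter and map it into (-1, 1) with the hyperbolic tangent function.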
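
A sketch of the tanh trick for 2-D data, using a hypothetical `FullCovGMM2D` class (names and initialization are assumptions, not the notebook's code). Each component stores unconstrained log standard deviations and an unconstrained raw correlation; `torch.tanh` maps the latter into (-1, 1), so every covariance matrix built from them is positive definite. The forward pass uses the closed-form bivariate normal log density, log N = -log(2 pi s1 s2 sqrt(1 - rho^2)) - (z1^2 - 2 rho z1 z2 + z2^2) / (2 (1 - rho^2)), with z_i = (x_i - mu_i) / s_i.

```python
import math

import torch
import torch.nn as nn


class FullCovGMM2D(nn.Module):
    """Hypothetical 2-D GMM whose covariances are built from (s1, s2, rho)."""

    def __init__(self, n_components: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_components))
        self.means = nn.Parameter(torch.randn(n_components, 2))
        self.log_stds = nn.Parameter(torch.zeros(n_components, 2))
        # Unconstrained; rho = tanh(raw_rho) always lies in (-1, 1)
        self.raw_rho = nn.Parameter(torch.zeros(n_components))

    def covariances(self) -> torch.Tensor:
        """Assemble the (K, 2, 2) covariance matrices, e.g. for inspection."""
        s = self.log_stds.exp()                      # (K, 2)
        rho = torch.tanh(self.raw_rho)               # (K,)
        off = rho * s[:, 0] * s[:, 1]                # off-diagonal covariance
        row0 = torch.stack([s[:, 0] ** 2, off], dim=-1)
        row1 = torch.stack([off, s[:, 1] ** 2], dim=-1)
        return torch.stack([row0, row1], dim=-2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Return log p(x) for each row of x, shape (N,)."""
        log_weights = torch.log_softmax(self.logits, dim=0)      # (K,)
        stds = self.log_stds.exp()                                 # (K, 2)
        rho = torch.tanh(self.raw_rho)                             # (K,)
        z = (x.unsqueeze(1) - self.means) / stds                   # (N, K, 2)
        one_minus_rho2 = 1 - rho ** 2                              # (K,)
        # Quadratic form of the bivariate normal in standardized coordinates
        quad = (
            z[..., 0] ** 2 - 2 * rho * z[..., 0] * z[..., 1] + z[..., 1] ** 2
        ) / one_minus_rho2                                         # (N, K)
        log_comp = (
            -math.log(2 * math.pi)
            - self.log_stds.sum(dim=-1)
            - 0.5 * torch.log(one_minus_rho2)
            - 0.5 * quad
        )                                                          # (N, K)
        return torch.logsumexp(log_weights + log_comp, dim=1)
```

One caveat of this design: in float32, tanh of a large raw value rounds to exactly 1, which makes 1 - rho^2 zero. This is rarely reached in practice, but clamping rho slightly inside (-1, 1) is a common safeguard.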