# Gaussian Mixture Model via Gradient Descent

In this exercise, we will see how we can train a Gaussian Mixture Model (GMM) via gradient descent:

• first, we will consider independent bivariate Gaussians, i.e. the correlation coefficient of each component is zero, so each covariance matrix is diagonal; we can therefore simply store the diagonal and model each coordinate independently. This part is already done: you just need to study the code to understand it.
• second, we will consider bivariate Gaussians with (possibly) nonzero correlation coefficients.

Your goal is to study this code, play with the parameters, and understand what is going on!

## 1. Simple preliminary example

In this section, we illustrate how to use PyTorch to optimize the parameters of a probability distribution.
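As a concrete reference, here is a minimal sketch of the idea (not the notebook's exact code; the names `mu`, `log_sigma`, the learning rate, and the step count are illustrative assumptions): we fit the mean and standard deviation of a 1-D Gaussian by gradient descent on the negative log-likelihood, learning `log(sigma)` so the standard deviation stays positive.

```python
import torch

torch.manual_seed(0)
data = torch.randn(1000) * 2.0 + 3.0            # samples from N(3, 2^2)

mu = torch.zeros(1, requires_grad=True)         # learnable mean
log_sigma = torch.zeros(1, requires_grad=True)  # learn log(sigma) so sigma > 0

optimizer = torch.optim.Adam([mu, log_sigma], lr=0.05)
for step in range(500):
    optimizer.zero_grad()
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    loss = -dist.log_prob(data).mean()          # negative log-likelihood
    loss.backward()
    optimizer.step()

print(mu.item(), log_sigma.exp().item())        # should approach 3.0 and 2.0
```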

In the following, we solve the same problem but using stochastic gradient descent, i.e. at each update we use only a minibatch of the data.
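A sketch of the stochastic variant, under the same illustrative assumptions as above (the `batch_size` and step count are arbitrary choices, not the notebook's): the only change is that each gradient step is computed on a random minibatch rather than the full dataset.

```python
import torch

torch.manual_seed(0)
data = torch.randn(1000) * 2.0 + 3.0

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([mu, log_sigma], lr=0.05)

batch_size = 64
for step in range(2000):
    idx = torch.randint(0, data.shape[0], (batch_size,))  # random minibatch
    batch = data[idx]
    optimizer.zero_grad()
    loss = -torch.distributions.Normal(mu, log_sigma.exp()).log_prob(batch).mean()
    loss.backward()
    optimizer.step()
```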

## 2. Data

We now generate the data we will use. The samples come from 3 different bivariate Gaussians, only one of which has a nonzero correlation.
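One possible way to generate such data (the exact means, covariances, and sample counts in the notebook may differ; these values are assumptions for illustration): three bivariate Gaussians, where only the third has a nonzero correlation between its two coordinates.

```python
import torch
from torch.distributions import MultivariateNormal

torch.manual_seed(0)
means = torch.tensor([[-4.0, -4.0], [4.0, 4.0], [0.0, 0.0]])
covs = torch.stack([
    torch.eye(2),                            # rho = 0
    torch.eye(2) * 0.5,                      # rho = 0
    torch.tensor([[1.0, 0.8],
                  [0.8, 1.0]]),              # rho = 0.8
])
# 300 samples per component, concatenated into a (900, 2) dataset
X = torch.cat([MultivariateNormal(means[k], covs[k]).sample((300,))
               for k in range(3)])
```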

## 3. GMM with diagonal covariance matrices

IMPORTANT: You need to understand exactly what is going on in the forward function!
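To make the forward function concrete, here is a minimal sketch of what a diagonal-covariance GMM can look like; the class name, parameter names, and shapes are assumptions for illustration, not the notebook's actual code. The key step is combining the per-component log-densities with the mixture weights through a log-sum-exp.

```python
import torch
import torch.nn as nn

class DiagGMM(nn.Module):
    def __init__(self, n_components=3, dim=2):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(n_components, dim))         # component means
        self.log_sigma = nn.Parameter(torch.zeros(n_components, dim))  # log std per coordinate
        self.logit_pi = nn.Parameter(torch.zeros(n_components))        # unnormalized mixture weights

    def forward(self, x):                                   # x: (batch, dim)
        log_pi = torch.log_softmax(self.logit_pi, dim=0)    # (K,) log mixture weights
        # Independent coordinates: the joint log-density of a component is
        # the sum of per-coordinate Gaussian log-densities.
        comp = torch.distributions.Normal(self.mu, self.log_sigma.exp())
        log_p = comp.log_prob(x.unsqueeze(1)).sum(-1)       # (batch, K)
        # Mixture log-likelihood: log sum_k pi_k N(x | mu_k, sigma_k),
        # computed stably with logsumexp; return the mean NLL as the loss.
        return -torch.logsumexp(log_pi + log_p, dim=-1).mean()
```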

## 4. GMM with full covariance matrices

We will now move to the same example, but using a model that can learn components with nonzero correlation between their coordinates.

The trick we use here is the explicit parameterization of the bivariate case (means, standard deviations, and a correlation coefficient ρ), as explained here: https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Bivariate_case

The correlation coefficient must lie in [-1, 1]. Rather than enforcing this constraint directly, we learn an unconstrained parameter and map it into (-1, 1) with the hyperbolic tangent.
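Here is a minimal sketch of that reparameterization (all names, including the hypothetical `FullGMM2D` and `rho_raw`, are illustrative assumptions, not the notebook's code): each component carries an unconstrained scalar `rho_raw`, and `tanh(rho_raw)` yields a correlation strictly inside (-1, 1). The log-density is the bivariate formula from the Wikipedia page linked above.

```python
import math
import torch
import torch.nn as nn

class FullGMM2D(nn.Module):
    def __init__(self, n_components=3):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(n_components, 2))
        self.log_sigma = nn.Parameter(torch.zeros(n_components, 2))
        self.rho_raw = nn.Parameter(torch.zeros(n_components))   # unconstrained
        self.logit_pi = nn.Parameter(torch.zeros(n_components))

    def forward(self, x):                            # x: (batch, 2)
        rho = torch.tanh(self.rho_raw)               # correlation, strictly in (-1, 1)
        sigma = self.log_sigma.exp()                 # (K, 2) standard deviations
        dx = (x.unsqueeze(1) - self.mu) / sigma      # standardized residuals, (batch, K, 2)
        one_m_rho2 = 1.0 - rho ** 2                  # (K,)
        # Quadratic form of the bivariate normal density.
        quad = (dx[..., 0] ** 2 + dx[..., 1] ** 2
                - 2.0 * rho * dx[..., 0] * dx[..., 1]) / one_m_rho2
        # Log normalization constant: log(2*pi*sigma_x*sigma_y*sqrt(1-rho^2)).
        log_norm = (math.log(2 * math.pi) + self.log_sigma.sum(-1)
                    + 0.5 * torch.log(one_m_rho2))   # (K,)
        log_p = -0.5 * quad - log_norm               # (batch, K) component log-densities
        log_pi = torch.log_softmax(self.logit_pi, dim=0)
        return -torch.logsumexp(log_pi + log_p, dim=-1).mean()   # mean NLL
```

Because `tanh` never actually reaches ±1, the covariance matrix stays positive definite throughout training, which is the point of the reparameterization.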