Advice for ML students

Work in progress: I will try to regularly update this page with more information. Please send me an e-mail if you feel like something is missing.

This page contains my personal advice to (future) students in our Artificial Intelligence Master at Paris-Saclay, but I hope it will also be useful for other students. It is obviously highly biased toward the sub-domains that I know: for example, I will cover Natural Language Processing but not Computer Vision. This guide is not intended to be exhaustive; instead, it focuses on the smallest possible list of recommendations so that you don't feel overwhelmed.
Importantly, and I cannot stress it enough: do not pay for online courses. Do not pay for online courses. Do not pay for online courses. Do not pay for online courses. Do not pay for online courses. Do not pay for online courses. Do not pay for online courses. Do not pay for online courses. DO NOT PAY FOR ONLINE COURSES. If you are a (future) student in our master and you fail to find one of the books or articles listed here, please contact me by e-mail.

Background

Formalisation

It is really important that you learn to formalize your ideas using math and pseudo-code. Pictures are good for the intuition but:
Of course there are many counterexamples, e.g. probabilistic graphical models, which have a well-defined “graphical semantics”. Let's take a concrete example: how to present a multilayer perceptron (MLP)? (Do not worry if you don't know what an MLP is, you will learn that in the deep learning course.) This simple and common deep learning architecture is often presented using this kind of figure:
Unfortunately, this picture doesn't say much and it is not precise: it doesn't fully specify the computation done by the model. A good way to present it would be:

Let \(x \in \mathbb R^n\) be an input vector. A multilayer perceptron with a single hidden layer computes an output logit vector \(w \in \mathbb R^m\) as follows:

\[ \begin{aligned} z &= \tanh(A^{(1)} x + b^{(1)}) \\ w &= A^{(2)} z + b^{(2)} \end{aligned} \]

where:
If you are not familiar with matrix multiplication, it can be helpful to draw the matrices to keep track of dimensions. Do it on paper when you code so that you don't get lost!
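The MLP above can also be sketched in NumPy, with the shape of every intermediate result written as a comment. This is only a minimal illustration: the hidden size `h`, the function name `mlp_forward` and the random parameters are assumptions made for the example, not part of the definition above.

```python
import numpy as np

def mlp_forward(x, A1, b1, A2, b2):
    """Single-hidden-layer MLP: z = tanh(A1 x + b1), w = A2 z + b2.

    x is either one input of shape (n,) or a batch of shape (batch, n).
    """
    # x: (batch, n), A1: (h, n)  ->  x @ A1.T: (batch, h)
    # b1: (h,) is implicitly broadcast over the batch dimension
    z = np.tanh(x @ A1.T + b1)
    # z: (batch, h), A2: (m, h)  ->  z @ A2.T: (batch, m)
    # b2: (m,) is broadcast in the same way
    w = z @ A2.T + b2
    return w

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n, h, m, batch = 4, 3, 2, 5
A1, b1 = rng.normal(size=(h, n)), rng.normal(size=h)
A2, b2 = rng.normal(size=(m, h)), rng.normal(size=m)

x = rng.normal(size=n)            # a single input vector
X = rng.normal(size=(batch, n))   # a batch of input vectors

print(mlp_forward(x, A1, b1, A2, b2).shape)  # (2,)
print(mlp_forward(X, A1, b1, A2, b2).shape)  # (5, 2)
```

The shape comments play the same role as drawing the matrices on paper: each line can be checked by matching the inner dimensions of the product.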
This kind of drawing may become more important when you need to rely on implicit broadcasting.

Programming

The main language you will need during this master is Python, which is really popular in machine learning and data science in general. There are many other popular languages, including C, Rust and Julia, but you will have plenty of time to learn them when you need them. For example, C is still used to design specific components in machine learning architectures that require speed and cannot be parallelized easily on GPU. However, in general and in the context of this master, focus on Python: learn how to write programs in Python, how it differs from languages you already know (e.g. loop definition in Python is unusual!) and play with it. It's a computer science master, therefore strong programming skills are a requirement. I strongly advise you NOT to spend time learning how to use fancy new languages or libraries. Use the basic libraries that work well and are well documented. There are two important libraries that you will need to use:
Play with them: train yourself to write programs that manipulate vectors/matrices and display graphs. During the master you will probably use PyTorch and Pandas, but start with these two first; once you are proficient with them, it will be easy for you to use the others. A commonly used tool to prototype Python scripts is Jupyter. Install it and use it: most of your lab exercises will be based on Jupyter notebooks.

Linear algebra

Linear algebra is a recurring topic in machine learning and data science: it will appear everywhere. Yes, everywhere. Having a good understanding of linear algebra will help you understand many algorithms and analysis techniques really quickly. You will save a lot of time right now and later. If you did not study it as an undergrad or if you need to refresh your memory, I can only advise you to watch and study Gilbert Strang's course, which is available online for free:
My advice is to study the topics covered up to (and including) lecture 21. However, it is long and you may not have time to study everything. Don't despair if you can't do it all: watching only a few of the videos will already do you a big favor (but watch them in order!).

Probabilities/statistics

Alex Tsun uploaded a set of 5-minute videos on probability that can be useful to understand the main concepts:

LaTeX

In computer science, we formalize and present our work using mathematics and pseudo-code. You must learn to write reports that contain math and (pseudo-)code nicely; do not display screenshots from other people's reports. The common tool for this is LaTeX. There are many software tools available; I recommend:

The Overleaf website contains a nice LaTeX tutorial and an introduction on writing math in LaTeX. Write your tex files like you write code:
An important feature of LaTeX is that it takes care of the layout for you. Do not use \\ to break lines or create a new paragraph: just leave one empty line for a “short paragraph break” and use \paragraph{} for a “long paragraph break”. A few tips:
It is important to write clearly. Here are a few useful links:
Helpful resources

Here is a list of books you can use as a resource when you are searching for a specific topic related to machine learning:
We usually accept reports in both French and English (but check with the course teacher first!). However, if you are in a group with non-French speakers, you don't really have a choice. The following websites can help you to improve your English writing:
Searching and applying for an academic internship or PhD

Open positions are usually advertised on mailing lists. You should subscribe to them. Note, however, that you will receive many emails (use Thunderbird to deal with your emails instead of a web client).
A few websites you can also check to find PhD positions:

Attach a CV when contacting a researcher. A few tips:
It is also best to have a public website with the same information. You can build one quickly with a static website generator like Hugo, Pelican or Jemdoc. If you don't want to self-host your website, you can also use free services like Cygale or GitHub Pages.

Overview of some ML for NLP research topics

This section is intended for students who aim for a PhD after the master. Scientific research is exciting and fascinating, especially in machine learning, which is a popular field these days. However, it is difficult to get a clear view of which research problems are interesting and open. I will try to list a very limited number of them, highly biased toward what I am personally interested in. It is important to understand that “solving task X with machine learning” is vague, and this kind of objective may limit the area in which you will contribute as a computer science researcher. This is especially true in natural language processing, where obtaining good results for well-studied languages is straightforward with large pretrained models like BERT and GPT. But first, the most important advice is the following: I recommend reading the How to Find Research Problems page by Jason Eisner and references therein.

Structured prediction

The prediction problems you will study during the master will mainly be either (binary) classification or regression. However, there are many other problems, including structured prediction problems where:
One of the most basic examples in natural language processing is syntactic dependency parsing. Assume you have an input sentence of \(n\) words. The goal is to predict the bilexical dependencies between words, which can be represented as a spanning arborescence in a graph.
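To make the “spanning arborescence” representation concrete, here is a minimal Python sketch. It assumes a common convention that is not stated above: words are numbered from 1 to \(n\), node 0 is an artificial root, and the tree is stored as an array giving the head of each word. The function name `is_arborescence` and the example sentence are hypothetical.

```python
def is_arborescence(heads):
    """Check that `heads` encodes a spanning arborescence rooted at node 0.

    heads[i - 1] is the head of word i (words are numbered 1..n);
    0 denotes the artificial root node. Every word has exactly one
    incoming arc by construction, so we only need to check that
    following the head pointers from any word reaches the root.
    """
    n = len(heads)
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            # A repeated node means a cycle; an out-of-range head is invalid.
            if node in seen or not 0 <= heads[node - 1] <= n:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

# "She reads books": "reads" (word 2) is attached to the root,
# "She" (word 1) and "books" (word 3) both depend on "reads".
print(is_arborescence([2, 0, 2]))  # True
# Words 1 and 2 point at each other: this is a cycle, not a tree.
print(is_arborescence([2, 1, 2]))  # False
```

This check is quadratic in the worst case, which is fine for illustration. It does not enforce additional constraints that some parsers use, such as allowing only one word to be attached to the root.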
A graph-based parser for this problem works as follows:
Other problems include summing over all possible spanning arborescences, which is required for training via a maximum likelihood loss. As such, structured prediction is a research field that involves many computer science topics, including graph theory, formal grammars and (combinatorial) optimization:
If you are interested in structured prediction, Jason Eisner's research summary gives a good view of the field.

Designing neural architectures

In order to train and use a neural network, one must first design its architecture, which can be motivated in many ways. For example, LSTMs have been designed to prevent the vanishing gradient problem and Transformers for their ability to be efficiently parallelized on GPU. However, architectures can also be designed to inject prior knowledge about the task so that they are more data efficient or interpretable:
Expressivity of neural architectures

Computer scientists have developed formal tools to classify the expressivity of formal languages and the hardness of problems: see, for example, the Wikipedia pages on the Chomsky hierarchy and mildly context-sensitive grammar formalisms for the case of formal grammars, and on complexity classes for algorithms. Although the universal approximation theorem states that any function can be approximated by a neural network under certain conditions, researchers have been interested in better characterizations of architectures used in practice (e.g. with fixed width) and in using other theoretical tools:
Efficient computation

Computational power is expensive. A good GPU for machine learning costs at least 5,000 euros, but a reasonable one for today's neural architectures would be closer to 10,000 euros (e.g. NVIDIA V100 with 32 GB of RAM). State-of-the-art models are usually trained on very large datasets with several GPUs or even TPUs. This is not sustainable and prevents the use of neural networks on small embedded devices. It is therefore important to develop models that are more efficient both in terms of memory and computation time. If you are interested in this subject, check Kenneth Heafield's webpage. Neural networks are usually implemented with autograd libraries like PyTorch that take care of the computation of the gradient automatically. In theory, the backpropagation algorithm has a time complexity of the same order as the forward pass that was used to build the computational graph (more precisely: at most 5 times the cost of evaluating the forward function). However, in some specific cases this can be done more efficiently, for example:
Other methods to improve computation time include, among others:
Learning algorithms beyond gradient descent

First-order optimization methods are popular in machine learning as they are scalable.
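As a reminder of what a first-order method looks like, here is a minimal sketch of plain gradient descent on a small least-squares problem. Each step only requires one gradient evaluation, which is what makes this family of methods scale. The function names, the synthetic data and the hyperparameters (`step_size`, `n_steps`) are assumptions chosen for illustration.

```python
import numpy as np

def gradient_descent(grad, theta0, step_size, n_steps):
    """Plain gradient descent: theta <- theta - step_size * grad(theta)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - step_size * grad(theta)
    return theta

# Synthetic, noiseless least-squares problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta

def least_squares_grad(theta):
    # Gradient of the objective (1 / 2n) * ||X theta - y||^2.
    return X.T @ (X @ theta - y) / len(y)

theta = gradient_descent(least_squares_grad, np.zeros(3),
                         step_size=0.1, n_steps=500)
print(np.round(theta, 3))
```

Only first derivatives are used here: no Hessian is formed or inverted, so the cost of a step stays linear in the size of the data.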
Natural Language Processing for low resource languages

Although many modern neural methods seem to work reasonably well, most of the ~6000-7000 languages spoken today lack resources for training. Therefore, there is a growing interest in natural language processing for low resource languages. As it is too difficult to give an overview of the field in a few lines, I advise you to check the following reviews:

Text generation

Text generation is an important problem in natural language processing that appears in many tasks: machine translation, dialogue systems, paraphrase generation, text simplification, etc. The most common approach is based on autoregressive models, where a sentence is generated one word after the other in sequential order. Unfortunately, computing the most probable output in this configuration is intractable, leading to the use of inexact greedy methods like beam search. A few research directions: