Papers

Preprints

SaulLM-7B: A pioneering large language model for law

Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, Michael Desa
paper

CroissantLLM: A Truly Bilingual French-English Language Model

Manuel Faysse, Patrick Fernandes, Nuno Guerreiro, António Loison, Duarte Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro Martins, Antoni Bigata Casademunt, François Yvon, André Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo
paper

Discrete latent structure in neural networks

Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins
paper

Publications & Technical Reports

2025

Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain

Benno Uthayasooriyar, Antoine Ly, Franck Vermet, Caio Corro
Coling 2025 Joint Workshop on Financial Technology and Natural Language Processing (FinNLP), Financial Narrative Processing (FNP), and Large Language Models for Finance and Legal (LLMFinLegal)
paper - code

Few-shot domain adaptation for named-entity recognition via joint constrained k-means and subspace selection

Ayoub Hammal, Benno Uthayasooriyar, Caio Corro
Coling 2025 - International Conference on Computational Linguistics
paper - code

2024

A fast and sound tagging method for discontinuous named-entity recognition

Caio Corro
EMNLP 2024 - Conference on Empirical Methods in Natural Language Processing
paper - code

Building quantitative contrastive grammars from syntactic treebanks

Santiago Herrera, Ioana-Madalina Silai, Caio Corro, Bruno Guillaume, Sylvain Kahane
LLcD 2024 - Rencontre annuelle Langues & Langage à la croisée des Disciplines
abstract

Sparse logistic regression with high-order features for automatic grammar rule extraction from treebanks

Santiago Herrera, Caio Corro, Sylvain Kahane
LREC-Coling 2024 - Joint International Conference on Computational Linguistics, Language Resources and Evaluation
paper - code - extracted rules

Régression logistique parcimonieuse pour l'extraction automatique de règles de grammaire

Santiago Herrera, Caio Corro, Sylvain Kahane
TALN 2024 - Conférence sur le Traitement Automatique des Langues Naturelles
paper

Actes de la journée d'étude sur le traitement automatique des langues frugal et la recherche d'information frugale

Caio Corro, Gaël Lejeune
proceedings

2023

Structural generalization in COGS: Supertagging is (almost) all you need

Alban Petit, Caio Corro, François Yvon
EMNLP 2023 - Conference on Empirical Methods in Natural Language Processing
paper

A dynamic programming algorithm for span-based nested named-entity recognition in \(\mathcal O(n^2)\)

Caio Corro
ACL 2023 - Annual Meeting of the Association for Computational Linguistics
paper

On graph-based reentrancy-free semantic parsing

Alban Petit, Caio Corro
TACL 2023 - Transactions of the Association for Computational Linguistics
paper

On the inconsistency of separable losses for structured prediction

Caio Corro
EACL 2023 - Conference of the European Chapter of the Association for Computational Linguistics
paper

2022

Actes de la journée d'étude sur la robustesse des systèmes de TAL (Robustal 2022)

Caio Corro, Gaël Lejeune
proceedings

Un algorithme d'analyse sémantique fondée sur les graphes via le problème de l'arborescence généralisée couvrante

Alban Petit, Caio Corro
TALN 2022 - Conférence sur le Traitement Automatique des Langues Naturelles
paper

Ré-ordonnancement via programmation dynamique pour l'adaptation cross-lingue d'un analyseur en dépendances

Nicolas Devatine, Caio Corro, François Yvon
TALN 2022 - Conférence sur le Traitement Automatique des Langues Naturelles
paper - slides

GPU-Accelerated Forward-Backward algorithm with Application to Lattice-Free MMI

Lucas Ondel, Léa-Marie Lam-Yee-Mui, Martin Kocour, Caio Filippo Corro, Lukáš Burget
ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing
paper

2021

Preventing posterior collapse in variational autoencoders for text generation via decoder regularization

Alban Petit, Caio Corro
NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications
paper

Auto-encodeurs variationnels : contrecarrer le problème de posterior collapse grâce à la régularisation du décodeur

Alban Petit, Caio Corro
TALN 2021 - Conférence sur le Traitement Automatique des Langues Naturelles
paper

2020

Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

Caio Corro
EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing
paper

Sur l'impact des contraintes structurelles pour l'analyse en dépendances profondes fondée sur les graphes

Caio Corro
TALN 2020 - Conférence sur le Traitement Automatique des Langues Naturelles
paper - code

2019

Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming

Caio Corro, Ivan Titov
ACL 2019 - Annual Meeting of the Association for Computational Linguistics
paper - poster (landscape) - poster (portrait)

Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder

Caio Corro, Ivan Titov
ICLR 2019 - Seventh International Conference on Learning Representations
paper - poster

2018

Lagrangian Based Approaches for Lexicalized Tree Adjoining Grammar Parsing

Caio Corro
PhD thesis
pdf - slides

2017

Efficient Discontinuous Phrase-Structure Parsing via the Generalized Maximum Spanning Arborescence

Caio Corro, Joseph Le Roux, Mathieu Lacroix
EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing
paper - poster

Transforming Dependency Structures to LTAG Derivation Trees

Caio Corro, Joseph Le Roux
TAG+ 2017 - 13th International Workshop on Tree-Adjoining Grammar and Related Formalisms
paper - slides

2016

Dependency Parsing with Bounded Block Degree and Well-nestedness via Lagrangian Relaxation and Branch-and-Bound

Caio Corro, Joseph Le Roux, Mathieu Lacroix, Antoine Rozenknop, Roberto Wolfler Calvo
ACL 2016 - Annual Meeting of the Association for Computational Linguistics
paper - slides

Méthode lagrangienne pour les arborescences couvrantes avec application en traitement automatique des langues

Caio Corro, Joseph Le Roux, Mathieu Lacroix, Antoine Rozenknop, Roberto Wolfler Calvo
ROADEF 2016
link