Papers
Preprints
- SaulLM-7B: A pioneering large language model for law
Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, Michael Desa paper
- CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse, Patrick Fernandes, Nuno Guerreiro, António Loison, Duarte Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro Martins, Antoni Bigata Casademunt, François Yvon, André Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo paper
- Discrete latent structure in neural networks
Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins paper
Publications & Technical Reports
2025
- Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain
Benno Uthayasooriyar, Antoine Ly, Franck Vermet, Caio Corro Coling 2025 Workshop on Financial Technology and Natural Language Processing (FinNLP), Financial Narrative Processing (FNP), and on Large Language Models for Finance and Legal (LLMFinLegal) paper - code
- Few-shot domain adaptation for named-entity recognition via joint constrained k-means and subspace selection
Ayoub Hammal, Benno Uthayasooriyar, Caio Corro Coling 2025 - International Conference on Computational Linguistics paper - code
2024
- A fast and sound tagging method for discontinuous named-entity recognition
Caio Corro EMNLP 2024 - Conference on Empirical Methods in Natural Language Processing paper - code
- Building quantitative contrastive grammars from syntactic treebanks
Santiago Herrera, Ioana-Madalina Silai, Caio Corro, Bruno Guillaume, Sylvain Kahane LLcD 2024 - Rencontre annuelle Langues & Langage à la croisée des Disciplines abstract
- Sparse logistic regression with high-order features for automatic grammar rule extraction from treebanks
Santiago Herrera, Caio Corro, Sylvain Kahane LREC-Coling 2024 - Joint International Conference on Computational Linguistics, Language Resources and Evaluation paper - code - extracted rules
- Régression logistique parcimonieuse pour l'extraction automatique de règles de grammaire
Santiago Herrera, Caio Corro, Sylvain Kahane TALN 2024 - Conférence sur le Traitement Automatique des Langues Naturelles paper
- Actes de la journée d’étude sur le traitement automatique des langues frugal et la recherche d'information frugale
Caio Corro, Gaël Lejeune proceedings
2023
- Structural generalization in COGS: Supertagging is (almost) all you need
Alban Petit, Caio Corro, François Yvon EMNLP 2023 - Conference on Empirical Methods in Natural Language Processing paper
- A dynamic programming algorithm for span-based nested named-entity recognition in \(\mathcal O(n^2)\)
Caio Corro ACL 2023 - Annual Meeting of the Association for Computational Linguistics paper
- On graph-based reentrancy-free semantic parsing
Alban Petit, Caio Corro TACL 2023 - Transactions of the Association for Computational Linguistics paper
- On the inconsistency of separable losses for structured prediction
Caio Corro EACL 2023 - European Chapter of the Association for Computational Linguistics paper
2022
- Actes de la journée d’étude sur la robustesse des systèmes de TAL (Robustal 2022)
Caio Corro, Gaël Lejeune proceedings
- Un algorithme d'analyse sémantique fondée sur les graphes via le problème de l'arborescence généralisée couvrante
Alban Petit, Caio Corro TALN 2022 - Conférence sur le Traitement Automatique des Langues Naturelles paper
- Ré-ordonnancement via programmation dynamique pour l'adaptation cross-lingue d'un analyseur en dépendances
Nicolas Devatine, Caio Corro, François Yvon TALN 2022 - Conférence sur le Traitement Automatique des Langues Naturelles paper - slides
- GPU-Accelerated Forward-Backward algorithm with Application to Lattice-Free MMI
Lucas Ondel, Léa-Marie Lam-Yee-Mui, Martin Kocour, Caio Filippo Corro, Lukáš Burget ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing paper
2021
- Preventing posterior collapse in variational autoencoders for text generation via decoder regularization
Alban Petit, Caio Corro NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications paper
- Auto-encodeurs variationnels : contrecarrer le problème de posterior collapse grâce à la régularisation du décodeur
Alban Petit, Caio Corro TALN 2021 - Conférence sur le Traitement Automatique des Langues Naturelles paper
2020
- Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)
Caio Corro EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing paper
- Sur l'impact des contraintes structurelles pour l'analyse en dépendances profondes fondée sur les graphes
Caio Corro TALN 2020 - Conférence sur le Traitement Automatique des Langues Naturelles paper - code
2019
- Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming
Caio Corro, Ivan Titov ACL 2019 - Annual Meeting of the Association for Computational Linguistics paper - poster (landscape) - poster (portrait)
- Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder
Caio Corro, Ivan Titov ICLR 2019 - Seventh International Conference on Learning Representations paper - poster
2018
- Lagrangian Based Approaches for Lexicalized Tree Adjoining Grammar Parsing
Caio Corro PhD thesis pdf - slides
2017
- Efficient Discontinuous Phrase-Structure Parsing via the Generalized Maximum Spanning Arborescence
Caio Corro, Joseph Le Roux, Mathieu Lacroix EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing paper - poster
- Transforming Dependency Structures to LTAG Derivation Trees
Caio Corro, Joseph Le Roux TAG+ 2017 - 13th International Workshop on Tree-Adjoining Grammar and Related Formalisms paper - slides
2016
- Dependency Parsing with Bounded Block Degree and Well-nestedness via Lagrangian Relaxation and Branch-and-Bound
Caio Corro, Joseph Le Roux, Mathieu Lacroix, Antoine Rozenknop, Roberto Wolfler Calvo ACL 2016 - Annual Meeting of the Association for Computational Linguistics paper - slides
- Méthode lagrangienne pour les arborescences couvrantes avec application en traitement automatique des langues
Caio Corro, Joseph Le Roux, Mathieu Lacroix, Antoine Rozenknop, Roberto Wolfler Calvo ROADEF 2016 link