Baptiste Pras

I am a Master's student in Artificial Intelligence at University Paris-Saclay. I plan to continue in this field, with the ultimate goal of pursuing a PhD in Natural Language Processing or Machine Learning. I am looking for a 3-to-4-month internship starting in May.

baptiste.pras[at]universite-paris-saclay[dot]fr

https://github.com/baptistepras

CV_English CV_French

Education

After completing two years in the Dual Bachelor’s program in Mathematics and Computer Science at University Paris-Saclay, I was admitted to the highly selective Magistère d’Informatique program, where I focused on Artificial Intelligence and Distributed Algorithms. The dual degree gave me a strong foundation in algebra, probability, data structures, and algorithms, as well as advanced problem-solving and computational thinking skills. I am now pursuing a Master's degree in Artificial Intelligence, where I study machine learning, optimization, deep learning, natural language processing (NLP), probabilistic methods, reinforcement learning, and the theoretical foundations of AI, with a strong focus on research and advanced applications.

Following my passion for language and cultural immersion, I spent a year studying English in New York, where I honed my linguistic skills to achieve a near-native level of fluency. This transformative experience not only strengthened my proficiency in English but also enhanced my adaptability, cross-cultural communication, and independence.

Professional Experience

During my second research internship, I worked on the evaluation and development of models for Biomedical Entity Linking (BEL), a subfield of Natural Language Processing (NLP). My work involved performing a comparative analysis of state-of-the-art models (including Transformer-based and contrastive learning approaches) and addressing the challenge of fair performance comparison across heterogeneous training environments. I conducted detailed evaluations on public benchmarks (BELB), analyzing performance on zero-shot entities, homonyms, and other challenging cases. Additionally, I developed advanced Python scripts to automate model evaluation and I proposed new mention-level evaluation metrics based on linguistic and lexical features such as mention length, ambiguity, synonymy, and surface variation. These metrics provided new insights into model weaknesses. This work led to a research paper, which has been submitted to a venue in the ACL conference family.
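As a rough illustration of what a mention-level breakdown can look like, here is a minimal Python sketch that reports accuracy per mention-length bucket. It is not the code or the metric definition from the paper: the function name, the whitespace tokenization, the bucket boundaries, and the toy identifiers are all assumptions made for this example.

```python
from collections import defaultdict


def accuracy_by_mention_length(examples):
    """Compute linking accuracy per mention-length bucket.

    `examples` is an iterable of (mention, gold_id, predicted_id) tuples.
    Length is counted in whitespace-separated tokens (an assumption).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for mention, gold_id, predicted_id in examples:
        n_tokens = len(mention.split())
        # Hypothetical buckets: one token, two tokens, three or more.
        if n_tokens == 1:
            bucket = "1"
        elif n_tokens == 2:
            bucket = "2"
        else:
            bucket = "3+"
        totals[bucket] += 1
        correct[bucket] += int(gold_id == predicted_id)
    return {bucket: correct[bucket] / totals[bucket] for bucket in totals}


# Toy data with made-up identifiers, for illustration only.
toy_examples = [
    ("aspirin", "ID_1", "ID_1"),
    ("heart attack", "ID_2", "ID_3"),
    ("type 2 diabetes mellitus", "ID_4", "ID_4"),
    ("fever", "ID_5", "ID_6"),
]
print(accuracy_by_mention_length(toy_examples))
```

The same pattern extends to other mention-level features (ambiguity, synonymy, surface variation) by swapping the bucketing rule.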

During my first research internship, I studied how class imbalance in the training set affects classification performance, aiming to identify an optimal imbalance ratio different from 0.5. I conducted experiments in Python, using scikit-learn and NumPy, and visualized results with Matplotlib. My work involved analyzing various class imbalance scenarios using a custom-built Spherical Teacher-Student Model with different loss functions and learning methods, such as gradient-based training and Langevin dynamics, as well as generating and processing different types of data, primarily Gaussian distributions.

As a Generative AI Trainer at Outlier, I played a key role in improving the performance and reliability of generative AI models. My responsibilities included designing and refining prompts to optimize model outputs and reviewing AI-generated content to ensure its accuracy and quality.

Some of my projects

DualSudoku AI Agent:
Design and implementation of an AI agent for DualSudoku, a competitive variant of Sudoku, developed as part of the Artificial Intelligence course at Université Paris-Saclay. The project was implemented in Java, with a strong focus on search algorithms, optimization, and game strategy design.

I developed a custom combination of heuristic search and dynamic evaluation of game states, allowing the agent to adapt its strategy in real time based on both the current grid state and the opponent’s moves.

The AI achieved excellent results and won the final competition of the course, outperforming all other agents submitted by the class. You can explore the complete implementation and source code on my GitHub.

Spherical Teacher-Student Perceptron:
Implementation from scratch of a Spherical Teacher-Student Perceptron in Python, using NumPy (and a bit of scikit-learn) for computation and Matplotlib for plotting. This project was carried out as part of a research internship at the LISN, supervised by François LANDES. It focuses on studying class imbalance in classification tasks using a spherical Teacher-Student Model. This approach helps analyze how imbalanced training data affects the performance of classification models. You can explore the complete implementation and source code on my GitHub.
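To give an idea of the setting, here is a minimal sketch of a teacher generating an imbalanced training set. It is not the code from the repository: it uses only the Python standard library, and the function names, the scaling by the square root of the dimension, and the use of a teacher bias to control the imbalance are assumptions made for this illustration.

```python
import math
import random


def random_spherical_vector(dim, rng):
    """Sample a weight vector uniformly on the sphere of radius sqrt(dim)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    scale = math.sqrt(dim) / norm
    return [x * scale for x in v]


def make_dataset(teacher, n_samples, bias, rng):
    """Draw Gaussian inputs and label them with the teacher.

    The label is the sign of (teacher . x) / sqrt(dim) + bias.
    A positive `bias` makes the +1 class more frequent (assumed mechanism).
    """
    dim = len(teacher)
    data = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        field = sum(w * xi for w, xi in zip(teacher, x)) / math.sqrt(dim) + bias
        data.append((x, 1 if field > 0 else -1))
    return data


def positive_ratio(data):
    """Fraction of samples labeled +1, i.e. the imbalance ratio."""
    return sum(1 for _, y in data if y == 1) / len(data)


rng = random.Random(0)
teacher = random_spherical_vector(50, rng)
for bias in (0.0, 0.5, 1.0):
    dataset = make_dataset(teacher, 1000, bias, rng)
    print(f"bias={bias}: positive ratio = {positive_ratio(dataset):.2f}")
```

A student perceptron constrained to the same sphere would then be trained on such datasets, and its test performance compared across imbalance ratios.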

Traffic Sign Recognition using Machine Learning:
I developed a machine learning model to recognize specific traffic signs using supervised learning algorithms. The project involved extracting and pre-processing data from images, followed by training and evaluating multiple models to identify the most effective approach. I used NumPy, scikit-learn, Matplotlib, and Pandas for data manipulation, model training, and visualization of results. The final model achieved over 95% accuracy. This project showcases my expertise in data preparation, model optimization, and practical applications of machine learning. You can explore the complete implementation and source code on my GitHub.

Java-like Interpreter:
Developed a custom interpreter for a Java-inspired language, featuring support for various instructions, basic arithmetic operations, and object-oriented programming concepts such as classes and methods. Designed the language with a strong type system similar to Java’s, so that ill-typed programs are rejected before execution. The interpreter was implemented in OCaml using ocamllex and Menhir for lexical and syntactic analysis, applying programming language theory and compiler construction techniques. You can try a simplified version here. The complete source code is available on my GitHub.

Code Execution

// Here is an example of correctly written and typed Kawa code
var int x;
var bool b;
var paire p;
var triple t;

class paire {
  attribute int x, y;

  method void constructor(int x, int y) {
    this.x = x;
    this.y = y;
  }

  method int test(int n) {
    while n > 0 {
      print(n%2==0);
      n = n - 1;
    }
    return n;
  }
}

class triple extends paire {
  attribute int z;

  method void constructor(int x, int y, int z) {
    this.x = x;
    this.y = y;
    this.z = z;
  }
}

main {
  x = 42;
  b = true;
  p = new paire(1, 2); // new initializes the attributes of p
  t = new triple(1, 2, 3); // newc calls the method constructor on t
  if b {
    print(p.x);
    print(p.y);
    print(t.x);
    print(t.y);
    print(t.z);
    x = p.test(2);
  } else {
    print(x);
  }
}


Air Hockey Game:
An Air Hockey game that combines arcade gameplay with a simple AI opponent. Developed using Python and Pygame, it offers a dynamic player-vs-AI experience with collision physics, paddle control, and smooth gameplay. You can play the game by downloading the source code on my GitHub.