Applied AI, Evaluation Engineer job opportunity at Mistral AI.



Date: 2026-01-21
Mistral AI Applied AI, Evaluation Engineer
Experience: 3+ years
Pattern: Full-time

Evaluation Engineer

Degree: OND
Location: Paris, France

About Mistral

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.

We are a dynamic, collaborative team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation. Our teams are distributed between France, USA, UK, Germany and Singapore. We are creative, low-ego and team-spirited.

Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact. See more about our culture on https://mistral.ai/careers.

About The Job

The Applied AI team is Mistral's customer-facing technical organization. We work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact. Our team combines deep ML expertise with strong customer engagement skills, operating like startup CTOs who own end-to-end project execution.

However, the AI graveyard is full of great ideas nobody could measure and prototypes that never made it to production. As our first Evaluation Engineer, you'll design the methodology, build the infrastructure, and define what "ready for production" means across verticals and use cases.

You will design and implement evaluation systems that help our customers understand model performance across their specific use cases, build robust evaluation infrastructure, and work closely with both research and customer-facing teams.

Research builds evals for frontier capabilities, but customers don't care about MMLU scores. In Applied AI, we need evals and frameworks for customer reality: domain-specific, risk-aware, production-grade. The kind that tell you whether your medical summarization model will hallucinate drug interactions, or whether your legal assistant will invent case citations.

This role sits at the intersection of research, engineering, and solutions. You will play a critical cross-functional role in measuring, understanding, and improving the capabilities of our models for our enterprise customers.

What you will do

- Design and implement comprehensive evaluation frameworks to measure LLM capabilities across diverse customer use cases, including text generation, reasoning, code, and domain-specific applications
- Build scalable evaluation infrastructure and pipelines that enable rapid, reproducible assessment of model performance
- Develop novel evaluation methodologies to assess emerging capabilities or verticalized use cases (cybersecurity, finance, healthcare, etc.) and enable the Solutions teams (Deployment Strategist and Applied AI) on these topics
- Create custom evaluation suites tailored to enterprise customers' specific needs, working closely with them to understand their requirements and success criteria
- Collaborate with research teams to translate evaluation insights into model improvements and training decisions
- Partner with product teams to continuously improve our evaluation tooling based on customer feedback

How We Work in Applied AI

- We care about people and outputs.
- What matters is what you ship, not the time you spend on it.
- Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to. The best idea wins, whether it comes from a principal engineer or someone in their first week.
- Always ask why. The best solutions come from deep understanding, not from copying what worked before.
- We say what we mean. Feedback is direct, timely, and given because we care.
- No politics. Low ego, high standards.
- We embrace an unstructured environment and find joy in it.

About you

- You are fluent in English
- You have 3+ years of experience in ML evaluation and benchmarking for LLM or agentic systems
- You have proven experience in AI or machine learning product implementation with APIs and back-end systems
- You have a deep understanding of the concepts and algorithms underlying machine learning and LLMs
- You have strong technical coding skills in Python
- You have strong communication skills, with an ability to explain complex technical concepts in simple terms to technical and non-technical audiences

Ideally you have:

- Contributions to open-source evaluation frameworks (e.g., LM Eval Harness, OpenAI Evals) or published research on LLM evaluation
- Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect or Technical Product Manager
- Experience with ML frameworks (PyTorch, HuggingFace Transformers)

Benefits

🏝️ PTO: The CDI contract will be a "Forfait 218 jours", corresponding to 25 days of holidays and on average 8 to 10 RTT days, with complete autonomy on working hours
⚕️ Health: Full health insurance coverage for you and your family
🚗 Transportation: We offer a €600 annual mobility allowance. This package covers 50% of your public transportation costs and includes the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling.
🥕 Food: Swile meal vouchers worth €10.83 per worked day, 60% of which is paid by the company
🏀 Sport: Gymlib, with Mistral sponsoring a significant part of the monthly fee (depending on the program you choose)
🐤 Parental policy: 4 additional weeks for parents on top of what is offered by the French state.

By applying, you agree to our Applicant Privacy Policy (https://legal.mistral.ai/terms/applicant-privacy-policy).
