REMOTE :: Senior Software Engineer LLM Evaluation :: AI-generated @ US, Western Europe

Jobflarely · Anywhere

About the Role

Title: Senior Software Engineer LLM Evaluation
Duration: Long term (depends on candidate's performance)
Work Type: Remote (hybrid or onsite depending on candidate's location)
Multiple openings
Key skills: Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go

Project Overview: As a Software Engineering evaluator, you will create cutting-edge datasets for training, benchmarking, and advancing large language models, collaborating closely with researchers. This includes curating code examples, providing precise solutions, and making corrections in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go; evaluating and refining AI-generated code for efficiency, scalability, and reliability; and working with cross-functional teams to enhance enterprise-level AI-driven coding solutions.

What Does a Typical Day Look Like?
• Work on AI model training initiatives by curating code examples, building solutions, and correcting code in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go.
• Evaluate and refine AI-generated code to ensure that it is efficient, scalable, and reliable.
• Collaborate with cross-functional teams to enhance AI-driven coding solutions against industry performance benchmarks.
• Build agents that can verify the quality of the code and identify error patterns.
• Hypothesize on steps in the software engineering cycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operational maintenance) and evaluate model capabilities on them.
• Design verification mechanisms that can automatically verify a solution to a software engineering task.

Required Skills:
• 5+ years of software engineering experience, including 2+ years of continuous full-time experience at a top-tier product company (e.g., Google, Stripe, Amazon, Apple, Meta, Netflix, Microsoft, Datadog, Dropbox, Shopify, PayPal, IBM Research).
• Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
• Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
• Excellent oral and written communication skills for clear, structured evaluation rationales.

Jobflarely has 2 open positions on Remote Vibe Coding Jobs.
