
Swift Engineer (5+ YOE) – AI / LLM Code Evaluation (Remote, Contract)

Mercor · Anywhere


About the Role

Company: Mercor
Type: Contract (full-time or part-time)
Location: Remote (worldwide)
Language: Professional English required

Compensation:
- USD $30–$90/hour (depending on experience and evaluation performance).
- Weekly payments via Stripe or Wise.
- Flexible workload (project-based, scalable hours).

Mission:
- Work directly with leading AI teams to improve how large language models reason about code, systems design, and technical problem-solving.
- Evaluate and refine AI-generated responses, making them more accurate, reliable, and aligned with real-world engineering standards.

Responsibilities:
- Evaluate AI-generated answers to coding and system design problems.
- Execute and validate code outputs.
- Identify bugs, inefficiencies, and incorrect reasoning.
- Assess code quality & readability.
- Assess algorithmic correctness.
- Assess system design logic.
- Annotate responses with structured, actionable feedback.
- Follow defined evaluation frameworks and quality benchmarks.

Required Skills:

Core:
- Swift (expert level).
- Software engineering (5+ years).
- Data structures & algorithms.
- Systems design.
- Debugging & code review.
- Problem solving (medium–hard level).

Technical:
- Code execution & testing.
- API design & backend logic.
- Performance optimization.
- Version control (Git).

AI / Evaluation Context:
- Experience using LLMs in development workflows.
- Ability to evaluate reasoning, not just outputs.

Nice-to-Have Skills:
- RLHF / AI model evaluation.
- Competitive programming.
- Open-source contributions (merged PRs).
- Multi-language experience (Python, JS, etc.).
- Technical writing / explaining complex concepts.

Ideal Candidate:
- Degree in Computer Science or a related field (BS/MS/PhD).
- Strong real-world engineering background.
- Detail-oriented and highly analytical.
- Comfortable identifying subtle logic flaws and edge cases.
- Able to work independently in async environments.

What You Will Achieve:
- Improve the quality and reasoning of AI-generated code.
- Influence how AI systems assist developers globally.
- Deliver high-quality evaluation outputs that directly impact model performance.

Skills required for this job:
• AI model evaluation • API design • Algorithm development • Code review • Data science • Data structures • Debugging • Git • JavaScript • Large language models (LLMs) • Performance optimization • Python • RLHF • Software engineering • Swift • Systems design • Technical writing • Testing
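Hypothetical example of the kind of subtle edge-case flaw an evaluator in this role might flag in an AI-generated Swift answer; the function names and annotation wording below are illustrative, not part of the listing:

    // AI-generated draft: traps at runtime on an empty array,
    // because max() returns nil and the force unwrap crashes.
    func highestScore(_ values: [Int]) -> Int {
        return values.max()!
    }

    // Correction an evaluation might suggest: surface the
    // empty-input edge case in the type instead of crashing.
    func highestScoreOrNil(_ values: [Int]) -> Int? {
        return values.max()
    }

    // Sample structured annotation:
    // Correctness: fails on the empty-input edge case (force unwrap of nil).
    // Fix: return an Optional or a documented default value.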

Mercor is an American artificial intelligence (AI) hiring startup that provides experts to train AI models and chatbots. The company's three founders became the youngest self-made billionaires in 2025. The company is headquartered in San Francisco.

