Anthropic

Research Engineer, Search and Knowledge Post-Training

Anthropic · Remote-Friendly (Travel Required) | San Francisco, CA | Seattle, WA | New York City, NY

Full-time · Staff+ · Python


About the Role

About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

We want future AI systems to have superhuman epistemics: the ability to parse evidence at enormous scale and draw rigorous conclusions for both themselves and the user. Search is the capability that determines whether a model can pick a signal out of noise, weigh conflicting evidence, and know what it doesn't know. Every higher-order capability we care about depends on search being trustworthy. If we want Claude to be a trustworthy collaborator on real knowledge work, it has to be a trustworthy searcher.

We're hiring a Research Engineer to advance the science and engineering that go into making Claude this trustworthy searcher. This is a research role for someone who is unusually rigorous: you'll define hypotheses about what makes a model an epistemically sound searcher, design the experiments that test them, and turn search post-training from a craft into a measurable science. You'll be the person who insists on cleanly isolated variables, calibrated metrics, and reproducible signal, while also having the engineering skill to build the infrastructure necessary to get them.

This work sits at the intersection of reinforcement learning, retrieval, and evaluation, and it directly shapes how Claude behaves in any setting where evidence matters: research, analysis, agentic workflows, and beyond.

What you'll do

• Own a research direction for a class of search post-training problems end to end: form hypotheses about latent capabilities, design experiments that isolate them, run training, and decide what to try next.
• Build the instrumentation that turns environment design into a controlled experiment, so we can study how each environment factor contributes to the capabilities we care about rather than overfitting to any one regime.
• Design frontier-discriminating evaluations that distinguish genuine reasoning over evidence from plausible pattern matching, and that hold up as models improve.
• Drive optimization rigor across the stack: efficient experiment design, ablations, training-run economics, and the discipline to know when a result is real.
• Collaborate deeply with researchers across post-training, RL infrastructure, and product to translate model behavior in the wild into concrete training signals and back again.
• Set the bar for the team's experimental standards: what we measure, how we measure it, and how we know a result is real.

Minimum (must-have)

• You have an unusually rigorous, quantitative mindset.
• You are an outstanding software engineer in Python, comfortable across the stack from data pipelines to RL training to evaluation infrastructure.
• You have shipped real ML research repeatedly, with taste for which experiments are worth running.
• You instinctively reach for ablations, controls, and confidence intervals to understand why a result holds.
• You operate well with high autonomy and ambiguity, and can identify the most impactful problem to work on next without being told.
• You want to set research direction, advocate for experimental rigor, and raise the bar for the people around you.
• You communicate research clearly in writing and in person; you can defend a design choice and update on evidence.

Preferred (nice-to-have)

• Hands-on experience with RL on large language models: environments, reward design, training stability, scaling behavior.
• Background in search, retrieval, RAG, or agents that reason over external information sources.
• Experience building evaluations for open-ended or knowledge-intensive LLM behavior.
• Prior work in a research-heavy environment (frontier AI lab, quant research firm, or similarly demanding empirical setting) where rigor is the default.
• Published research on LLMs, RL, retrieval, calibration, or related topics.
• Experience with distributed training systems and large-scale experimentation infrastructure.

Representative projects

• Designing a controlled-noise search environment where you can dial up failure rates, conflicting sources, and adversarial content independently, and using it to characterize how each factor shapes the policy a model learns (a minimal sketch of such an environment follows this list).
• Building an evaluation suite that distinguishes calibrated source judgment from confident-sounding guesswork, and that stays discriminating as models get stronger (one candidate metric is sketched below).
The annual compensation range for this role is listed below. For sales roles, the range provided is the role's On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and the annual base salary for the role.

Annual Salary: $500,000–$850,000 USD

Logistics

Minimum education: Bachelor's degree or an equivalent combination of education, training, and/or experience
AI Safety / LLM · 500-1000 employees · San Francisco, CA · Founded 2021 · 💰 Series E

Anthropic PBC is an American artificial intelligence (AI) company headquartered in San Francisco. It has developed a family of large language models (LLMs) named Claude. Anthropic operates as a public benefit corporation, which researches and develops AI to "study their safety properties at the technological frontier" and use this research to deploy safe models for the public.

Python · PyTorch · JAX · TypeScript · React
Competitive salary · Equity · Health/dental/vision
