T

LLM Evaluation Engineer

ThirdLaw Molecular · Anywhere

Full-timePythonPyTorchTensorFlowLangChainOpenAI

🔥10 people viewed this job

About the Role

Job Description: • Build the evaluation layer in the ThirdLaw platform for LLM prompts and responses • Design and tune guardrails, classifiers, and semantic judgment systems in real-time • Implement evaluation strategies with semantic similarity, foundation model scoring, and rule-based systems • Integrate model outputs with downstream enforcement actions (e.g. redaction, escalation, blocking) • Prototype, tune, and productize small language models for classification, labeling, or scoring • Collaborate with data infrastructure engineers to connect evaluation logic with ingestion and storage • Build tools to observe, debug, and improve evaluator performance across data distributions • Define abstractions for reusable evaluation components that can scale across use cases Requirements: • 7+ years of experience in ML systems or AI engineering roles • At least 1–2 years working directly with LLMs, NLP pipelines, or semantic search • Deep understanding of foundation models (e.g. OpenAI, Claude, Mistral, Llama) and APIs • Hands-on experience with vector search (e.g. FAISS, Qdrant, Weaviate) and embeddings pipelines • Proven ability to implement real-time or near-real-time evaluation logic using semantic similarity, classifier scoring, or structured rules • Strong in Python, with familiarity using libraries like Hugging Face Transformers, LangChain, and PyTorch or TensorFlow • Ability to reason about model behavior, test prompt configurations, and debug complex decision logic in production Benefits: • Generous benefits • Market cash compensation • Above-market equity • Well-designed benefits

💬 Developer Questions

Ask the team a question — answers show up here

🎯

What does the interview process look like?

🤖

What AI/vibe coding tools does the team use daily?

👥

How big is the engineering team?

Is the team fully async or are there required meetings?

🚀

What does onboarding look like for remote hires?

🔧

Can you share more about the tech stack and architecture?

📈

What does career growth look like in this role?

📅

What does a typical day look like?

💰

Is there a salary range you can share?

📊

Is equity or stock options part of the package?

🌍

Are there timezone requirements or preferences?

🛂

Do you sponsor work visas?

🏢 Is this your listing? Claim it to answer questions

Similar Jobs

Helpful resources

Hiring for a similar role? Post your job here — it's free →