About the Role
AI Prompt & Agent Engineer | Prompt Engineering / LLMs / Evals / Voice AI | Must have startup experience
Location: San Francisco, CA (Onsite 5 days p/w; open to relocation)
Package: $90,000 – $160,000 + meaningful equity
Eligibility: Open to candidates with existing work authorization; US Residents Only (no visa sponsorship now or in the future after your current visa expires!)
🚨 Please only apply if you have hands-on, commercial experience with ALL of the following 🚨
• Prompt engineering for production LLM systems (not just experimentation)
• Owning agent behaviour in a deployed, real-world product
• Building evals / eval harnesses to ship changes safely
• Working autonomously across prototype → production
• Startup / scale-up experience
• Reading Python fluently and TypeScript comfortably (you don't need to be a senior engineer)
Join a fast-moving AI company building real-time, human-quality voice agents at enterprise scale, on a fully proprietary stack. Custom orchestration layer, fine-tuned speech models, real-time voice infrastructure, all built in-house rather than wrapping a third-party API. This is one of the rare environments where you work at the genuine frontier of applied AI, owning the models and behaviour end-to-end.
Demand is outpacing engineering capacity. Join now and you'll own a distinct slice of agent behaviour that touches every customer call, with a direct, measurable line between your work and the company's growth. Early team, high ownership, real momentum.
The Role
You'll own behavioural slice(s) of live voice agents: writing the prompts that run in production, designing subagent architectures, and building the evals that let the team ship multiple times a day without breaking anything. You'll live in the data: every win is small, measurable, and shipped fast.
What You'll Be Doing
Prompt & Agent Ownership
• Write and maintain production prompts: intent classification, information extraction, negotiation flows, closing phrases, verification flows, objection handling, edge-case recovery
• Design subagent architectures and own behaviour shared across every deployment
Evals & Quality
• Build offline eval sets and evaluation harnesses
• Run automated prompt optimisation and establish test suites that protect live deployments
• Simulate real-world scenarios, monitor production performance, and catch drift early
Onboarding & Iteration
• Ship against real call data daily — review failures, deploy fixes, refine
• Work with internal AI agents that translate customer configs into live agent setups
• Design and improve evaluation metrics for new customers as they come online
Must-Haves
• Proven prompt engineering at an AI-native company OR strong linguistics / philosophy / logic-heavy training (e.g. law)
• Evidence you've shipped prompts that broke in production — and learned the difference between dev and real users
• Relentlessly data-driven: you justify changes with numbers ("moved booking rate from 78% to 81% on 400+ calls"), not opinions
• Meticulous and organised under pressure — comfortable with five things in flight
• Strong writing sensibility: you notice register, rhythm, and word choice
• At least 1 year of full-time work experience
Bonus Experience
• Prior work with voice AI, TTS, ASR, or telephony platforms
• Experience analysing large eval datasets for extended periods
• Customer-facing or operations background
• Strong analytical / quantitative track record
ℹ️ Very Important Notes
• This is not a pure software / AI engineering role — heavily eng-focused profiles aren't the right fit
• This is not for recent grads with no formal work experience
• We're prioritising permanent, load-bearing experience over contract / contingent history
• Onsite in SF, 5 days p/w — please only apply if that works for you
If you've got the prompt-engineering chops, the eval discipline, and the startup mindset to own agent behaviour at the frontier — get in touch for a fast response. Let's go. 🚀