About the Role
Full-Time Role & On-Site Interview: AI Prompt Engineer / ML Engineer, San Francisco, CA
Openings: 4 Engineers
Face-to-Face Interview
Location: San Francisco (222 Columbus Ave, San Francisco, CA 94133)
In-office: minimum 5 days a week
Relocation assistance: Yes
TECH STACK: Python, TypeScript, React
ACCEPTABLE TECH: Python, AI, ML
IMPORTANT: Must Answer Pre-screen Questions for Submission
• Have you worked on an AI system used by 1,000+ people? (Has the candidate experienced the pain and urgency of understanding and adapting prompts in response to real user feedback?)
• Projects in prompt/context engineering, data curation, evals in production
• Vibe coding experience: public URL?
Must Have:
• 2+ years in prompt engineering, data curation, evals in production
• Strong analytical and problem-solving mindset; comfort with ambiguity
Must NOT Have:
• Heavy ML focus rather than prompt engineering experience
Strongly Preferred (Positives):
• If non-native speaker: experience as a translator or a linguistics background, and familiarity with American cultural norms (e.g., date formats: month/day/year)
• Can vibe code: Python chops beyond reading (APIs, data pipelines, testing frameworks)
• Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
• Contact center, SaaS, or customer-facing tech background
• Healthcare or medical operations experience: you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
• Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
• Fine-tuning experience
• Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field
Unlikely to Hire (Negatives):
• Foreign English speakers who lack the required American English dialect familiarity and sufficient US-based work experience
INSIDER SCOOP
2026-05-18
• Expect 60+ hours/week for now
• SF Office: 222 Columbus Ave, San Francisco, CA 94133
• In office in SF 5 days/week
• Only US Citizens or Green Card holders
Tech Skills: Prompt or Context Engineering, Data Curation, Evals (evaluation)
• Prompt engineering with ownership of prompt quality, impact, and outcome
• Customer-facing AI interaction responsibility preferred
• If non-native speaker, experience as a translator or a linguistics background is helpful
• Strong English language skills required
JOB DESCRIPTION
We're hiring an AI Prompt & Agent Developer to own behavioral slice(s) of our voice agents. That behavior splits into two categories: behavior shared across every deployment, and behavior specific to a subset of deployments. You're someone who enjoys looking at the data because the data informs everything else. You'll write prompts, design subagent architectures, build evals, and push automation rates up one small, measurable win at a time.
Responsibilities
• Write and maintain the prompts that run in production. Intent classification, information extraction, availability negotiation, closing phrases, insurance verification flows, objection handling, edge-case recovery. You own behavior that touches every customer call.
• Ship iteratively against real call data. Every morning, you'll listen to failed calls from yesterday. Every afternoon, you'll deploy a fix. You'll be using and helping to develop dashboards, call review tooling, and automated agents to accelerate the work.
• Build evaluation harnesses. You'll develop offline eval sets, run automated prompt optimization (we use GEPA-style approaches), and establish the test suites that let us ship changes without breaking live deployments.
• Human-in-the-loop onboarding. New customers come online constantly. You'll work with and iterate on our internal AI agents that translate a practice's intake form, their scheduling rules, and their quirks into an agent configuration. Every week, you'll be designing new evaluation metrics for these customers and helping to improve existing ones.
• QA and continuous improvement. You'll simulate real-world customer scenarios, measure outcomes, and monitor production agent performance so you can catch drift early and fix it fast.
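To make the evaluation-harness work above concrete, here is a minimal sketch of an offline eval set with a regression gate. It is illustrative only and not our production tooling: the names EvalCase, keyword_classifier, run_eval, and passes_gate are hypothetical, the intent labels are invented, and the keyword matcher is a stand-in for a prompt-backed classifier so the example runs without any model or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    """One labeled example in an offline eval set (hypothetical schema)."""
    utterance: str
    expected_intent: str


def keyword_classifier(utterance: str) -> str:
    """Stand-in for a prompt-backed intent classifier (assumption:
    a real system would call a model here instead of matching keywords)."""
    text = utterance.lower()
    if "cancel" in text:
        return "cancel_appointment"
    if "reschedule" in text or "move my" in text:
        return "reschedule_appointment"
    if "appointment" in text or "book" in text:
        return "book_appointment"
    return "other"


def run_eval(
    classify: Callable[[str], str], cases: List[EvalCase]
) -> Tuple[float, List[EvalCase]]:
    """Return accuracy and the list of failing cases for manual review."""
    failures = [c for c in cases if classify(c.utterance) != c.expected_intent]
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


def passes_gate(candidate_acc: float, baseline_acc: float,
                tolerance: float = 0.0) -> bool:
    """Block a change if it scores below the baseline by more than tolerance."""
    return candidate_acc >= baseline_acc - tolerance


# Invented examples; a real eval set would be curated from reviewed calls.
EVAL_CASES = [
    EvalCase("I need to book a cleaning", "book_appointment"),
    EvalCase("Can I cancel my appointment on Friday?", "cancel_appointment"),
    EvalCase("I'd like to move my appointment to next week",
             "reschedule_appointment"),
    EvalCase("Do you take Delta Dental insurance?", "insurance_question"),
]

if __name__ == "__main__":
    accuracy, failures = run_eval(keyword_classifier, EVAL_CASES)
    print(f"accuracy={accuracy:.2f}")
    for case in failures:
        print(f"FAILED: {case.utterance!r} (expected {case.expected_intent})")
```

The failing insurance case is deliberate: the harness returns the misses alongside the score, so the failure list can feed the same review loop as failed production calls, while passes_gate lets a test suite reject a change that drops below the current baseline.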
What we're looking for
• You've shipped prompts that broke production. Doesn't matter if it was at OpenAI, a chatbot startup, a research lab, or your own project. What matters is that you've felt the specific pain of a prompt that worked beautifully in dev and broke the second it hit real users.
• You're meticulous and careful. Looking at data for long stretches energizes you, as long as there's a signal. You stay organized when five things are in flight. We deploy multiple times a day, and we also run healthcare workflows where a bad change costs real money for real practices. You know the difference between moving fast and breaking things.
• Writing sensibility. The best prompt engineers are good writers. You notice register, rhythm, and word choice. You can tell why "Hello, Cornerside Dental? This is Ava, how can I help you out today?" sounds warmer than "Hello, Cornerside Dental, this is Ava. H