About the Role
Job Title: Lead / Senior QA Engineer – Agentic AI Systems (Langfuse, Temporal)
100% Remote
Interview Mode: 2 video rounds
Contract Duration: 6–12 months
We are looking for a highly skilled QA professional to build and scale a next-generation Agentic AI Quality Engineering function. This role goes beyond traditional QA: it focuses on validating autonomous AI systems, designing evaluation frameworks, and ensuring high-quality outputs across multiple AI-driven products.
You will play a critical role in shaping how quality is defined, measured, and improved for agentic systems that operate with minimal human intervention.
Key Responsibilities
1. Agentic QA Strategy & Scaling
- Design and scale an agentic QA model for autonomous AI systems
- Move QA from human-driven validation to AI-led evaluation and continuous quality monitoring
- Establish best practices for testing AI agents across lifecycle stages

2. Product Quality Ownership
- Own QA for 3 core AI products:
  - AI Contact Center solutions
  - AI Chat & Form-based interaction systems
  - AI Assistants (autonomous / semi-autonomous agents)
- Define quality benchmarks, SLAs, and success metrics for each product
- Proactively identify quality gaps ahead of customer impact

3. Metrics, Observability & Evaluation
- Define and track performance metrics for agentic systems (accuracy, latency, resolution quality, hallucination rate, etc.)
- Build frameworks for:
  - Evals & graders (LLM evaluation pipelines)
  - Output scoring and benchmarking
  - Continuous feedback loops
- Leverage tools like Langfuse for LLM observability and tracing, prompt monitoring and performance analysis, and debugging agent behavior in production (see the sketch after this list)
- Analyze downstream issues, production tickets, and failure patterns
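To make the evals-and-observability loop concrete, here is a minimal sketch of a grading pass that logs scores to Langfuse traces. It assumes the v2-style Langfuse Python SDK (pip install langfuse) and the standard LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST environment variables; run_agent and grade_answer are hypothetical placeholders for the product's agent entry point and a real grader (e.g. an LLM judge):

```python
# Minimal eval-to-observability sketch: run a golden set through the
# agent, grade each answer, and attach the score to a Langfuse trace.
from langfuse import Langfuse

langfuse = Langfuse()  # credentials read from LANGFUSE_* env vars

GOLDEN_SET = [
    {"question": "What are your support hours?", "expected": "24/7"},
    # ... more curated question/expected-answer pairs
]

def run_agent(question: str) -> str:
    # Hypothetical stand-in: call the AI product under test here.
    return "We offer 24/7 support."

def grade_answer(answer: str, expected: str) -> float:
    # Toy grader: substring match. A real pipeline would use an
    # LLM-as-judge or rubric-based scoring.
    return 1.0 if expected.lower() in answer.lower() else 0.0

for case in GOLDEN_SET:
    trace = langfuse.trace(name="eval-run", input=case["question"])
    answer = run_agent(case["question"])
    trace.update(output=answer)
    # Scores attached to traces power dashboards, regression baselines,
    # and alerts on quality drift.
    trace.score(name="answer-quality", value=grade_answer(answer, case["expected"]))

langfuse.flush()  # ensure events are sent before the process exits
```

Because the scores live on the same traces used for production debugging, the continuous feedback loop and the observability tooling share one data model.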
4. Automation & Testing Frameworks
- Build and scale automation across:
  - Regression testing
  - Smoke testing
  - End-to-end agent workflows
- Develop and maintain Playwright-based automation scripts (see the sketch after this list)
- Integrate QA into CI/CD pipelines for continuous validation
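As an illustration of the kind of Playwright automation involved, here is a minimal smoke-test sketch using the Python bindings with pytest-playwright; the URL, roles, and the .agent-reply selector are hypothetical placeholders for the actual chat product under test:

```python
# Smoke test for an AI chat surface: verify a user message produces a
# rendered agent reply. Run with: pytest (pytest-playwright installed).
from playwright.sync_api import Page, expect

def test_chat_agent_smoke(page: Page):
    page.goto("https://app.example.com/chat")  # hypothetical app URL
    page.get_by_role("textbox").fill("What are your support hours?")
    page.get_by_role("button", name="Send").click()
    # Agent output is slow and non-deterministic, so the smoke layer
    # asserts only that *a* reply renders within a generous timeout;
    # semantic correctness belongs to the eval pipeline above.
    expect(page.locator(".agent-reply").first).to_be_visible(timeout=30_000)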
5. Agentic Testing & Validation
- Design testing approaches for:
  - Multi-step agent workflows
  - Context retention and reasoning
  - Tool usage by agents
- Work with orchestration frameworks like Temporal to (see the sketch after this list):
  - Validate long-running workflows
  - Test retries, state transitions, and failure handling in agent pipelines
- Account for non-deterministic behavior in AI systems
- Invest additional effort in agentic validation, recognizing its higher complexity versus traditional QA
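As a sketch of what retry and failure-handling validation can look like, the test below exercises a hypothetical FlakyAgentWorkflow against the Temporal Python SDK's time-skipping test environment (pip install temporalio; the environment downloads a local test server on first use). The flaky activity and retry policy are illustrative, not the product's real pipeline:

```python
# Validate that a Temporal workflow survives transient activity failures:
# the activity fails twice, then succeeds on the third retry attempt.
import asyncio
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker

attempts = 0

@activity.defn
async def call_llm_step(prompt: str) -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise RuntimeError("transient LLM failure")  # simulate flakiness
    return f"answer to: {prompt}"

@workflow.defn
class FlakyAgentWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> str:
        return await workflow.execute_activity(
            call_llm_step,
            prompt,
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=RetryPolicy(maximum_attempts=5),
        )

async def main():
    # The time-skipping environment fast-forwards timers, so retry
    # backoff in long-running workflows is tested in milliseconds.
    async with await WorkflowEnvironment.start_time_skipping() as env:
        async with Worker(
            env.client,
            task_queue="qa-tests",
            workflows=[FlakyAgentWorkflow],
            activities=[call_llm_step],
        ):
            result = await env.client.execute_workflow(
                FlakyAgentWorkflow.run,
                "What are your support hours?",
                id="wf-retry-test",
                task_queue="qa-tests",
            )
            assert attempts == 3  # failed twice, succeeded on the third try
            assert result == "answer to: What are your support hours?"

if __name__ == "__main__":
    asyncio.run(main())
```

The same pattern extends to asserting state transitions and terminal failure paths, which is where most of the extra effort in agentic validation goes.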
6. Continuous Improvement & Innovation
- Define frameworks to predict and prevent failures before customer exposure
- Continuously improve QA processes using AI and automation
- Partner with Product, Engineering, and AI teams to improve system quality

Required Skills & Experience
- 5–10+ years in QA / Quality Engineering, with strong automation experience
- Hands-on experience with:
  - Test automation tools (Playwright preferred)
  - API and system testing
- Strong understanding of:
  - AI/ML systems (LLMs, conversational AI preferred)
  - Evaluation frameworks and benchmarking
- Experience with:
  - Temporal (workflow orchestration, stateful systems testing)
  - Langfuse (LLM observability, tracing, and evaluation)
- Experience in:
  - Building QA frameworks from scratch
  - Working with production data, logs, and issue triage

Good to Have
- Experience with LLM eval frameworks, prompt testing, or AI red-teaming
- Familiarity with agentic architectures / autonomous systems
- Exposure to observability and analytics platforms

Working Model
- EST time zone overlap preferred
- Ability to work closely with global product and engineering teams

What Success Looks Like
- A scalable, automated QA system for agentic products
- Measurable improvement in AI output quality and reliability
- Reduced production issues and faster detection of failures
- QA evolving from reactive testing to proactive quality intelligence