About us
Ruby Labs is a leading tech company that creates and operates innovative consumer products. We offer a diverse range of opportunities across the health, education, and entertainment industries. Our teams are driving the future of consumer-led products, and we're always looking for passionate individuals to join us. Learn more about our story at: https://rubylabs.com/about-us/
About the role
At Ruby Labs, we are looking for a Senior AI Engineer to own and drive the quality, reliability, and evolution of our AI systems in production.
This is a high-ownership role. You will be responsible for the end-to-end delivery of major AI features, the production stability of AI systems, and data-driven experimentation using tools like Langfuse, Mixpanel, and OpenRouter. You'll work in a modern stack built on Next.js, TypeScript, Node.js, and Redis, collaborating closely with product, growth, data, and billing teams. Increasingly, this includes building agentic, tool-using AI systems: defining clean tool contracts (including MCP-based tools) and orchestrating how AI interacts with internal services and business systems.
Our engineering organization uses a squad-based structure. You will operate within an AI engineering squad, contributing as a senior technical voice and driving engineering quality within your area of the product.
Key Responsibilities
• Take complete ownership of major AI engineering features and deliver them within agreed timelines.
• Own AI output quality, structure, and predictability across all user-facing AI interactions.
• Design, implement, and maintain output-type-based AI systems, including segmentation, routing, and enforcement.
• Ensure consistent output structure and formatting across different LLMs for the same request type.
• Integrate and orchestrate multiple LLM providers via OpenRouter, managing model selection, fallback strategies, and cost optimisations.
• Design and orchestrate tool-using and agentic AI workflows, defining clean tool contracts (including MCP-based tools), function-calling interfaces, and reliable AI-to-system integrations.
• Build and maintain complex, multi-step LLM workflows, including those built with orchestration frameworks such as LangChain or LlamaIndex, for advanced reasoning, context reuse, and retrieval.
• Design and manage production prompt systems with dynamic prompting, context injection, and conditional logic.
• Own the deployment and release of LLM experiments, prompt management, and Langfuse-based evaluation pipelines.
• Run A/B tests across models, analyse results, and present data-driven impact assessments of AI features and experiments.
• Monitor AI system metrics, quality signals, latency, and release health using Langfuse and other observability tools.
• Deep-debug complex LLM chains using Langfuse traces, identifying bottlenecks and optimising for cost, latency, and context-window usage, and build output-scoring systems to root-cause hallucinations and logic errors.
• Write clean, scalable, and maintainable TypeScript code across the Next.js and Node.js stack.
• Build reliable backend logic for AI systems, with strong error handling, request validation, fallback flows, and predictable behaviour in production, including reliable tool execution and AI-to-service integrations.
• Ensure high code quality through testing, code reviews, and clear engineering standards.
• Monitor, troubleshoot, and improve production performance, reliability, and system health.
• Drive maintainability and technical quality through solid architecture, refactoring, and disciplined release practices.
Qualifications
• 6+ years of backend/full-stack software engineering experience, including production-grade TypeScript/Node.js. Experience with Next.js and/or Python is a plus.
• 2+ years of experience building AI/LLM systems in production. Less experience may be considered for exceptional candidates.
• Deep hands-on experience working with LLM APIs (OpenAI, Anthropic, or similar) in production environments.
• Experience with agentic AI, multi-agent orchestration, tool-based workflows (function calling/tool execution), and/or RAG pipelines, including indexing, retrieval, and re-ranking.
• Experience with LLM observability tools such as Langfuse, LangSmith, or similar platforms.
• Experience with AI gateways and model routing solutions, such as OpenRouter or equivalent technologies.
• Solid understanding of Redis and relational databases, such as PostgreSQL.
• Exceptional ownership mindset and personal responsibility for engineering quality and delivery.
Nice to have
• Experience with AI-centered development tools such as Cursor, Claude Code, Windsurf, or similar platforms.
• Familiarity with evaluation frameworks, including LLM-as-a-judge, RAGAS, or similar approaches.
• Experience working in high-pressure startup environments with rapid product iteration cycles.
• Experience with MCP (Model Context Protocol), including b