AI-DNA, SVP, SaaS Operations

Khoros · United States

Full-time · Lead · AWS

About the Role

By the time a human engineer reaches the incident, the AI agents on the team have already validated hypotheses against years of prior incidents and RCAs, parsed the logs and code paths, identified the failure pattern, and proposed (or applied) a remediation inside policy guardrails. That is the operating model you will lead: agentic SRE at the scale the world's largest brands run their customer communities and social engagement on, where reliability translates directly into retention, revenue, and customer trust. The team is small and senior (top 1% engineers whose product is the AI agent surface, with no token or tooling limits), and you will lead it the way the field demands right now: treating every customer's uptime as if it were your own business, holding the bar when an agent tries to skip a step, defining the playbook the rest of the industry is still circling.

What You Will Be Doing

• Owning platform reliability and customer-experience outcomes for an AI-native community and social engagement platform: uptime trending up, MTTR trending down week over week, customer satisfaction trending up. As the owner of the outcome, you will be the senior leader involved in critical situations (CritSits) with customers and the face of those incidents and escalations.

• Designing, governing, and continuously extending the AI agent system that does the operations work: pre-triage, alarm authoring, blocking change-validation gates, permanent-fix lifecycle chase, customer-facing RCA drafting, auto-healing on whitelisted operations. The harness is the product; the team's output is the agent surface.

• Leading a small, senior, fully remote team of SRE / SaaS / DevOps engineers across multiple time zones, with no L1/L2 tier; every engineer ships agents, writes runbooks, and owns the incident loop end to end.

• Staying hands-on yourself: driving outage bridges when the blast radius warrants it, writing RCAs, shipping agent code. A substantial share of your week goes to personal agent build and maintenance.

• Representing operations directly to enterprise customer leadership and to the CEO, translating reliability investment into retention, revenue, and customer-experience outcomes.

What You Will NOT Be Doing

• Growing the team to solve problems that AI should be solving. If you find yourself adding headcount to compensate for agent gaps, you are off-strategy. Fix the AI, not the org chart.

• Running a deck-and-roadmap executive function. You will be at the keyboard, on the bridge, and in the agent code regularly. If you have not personally written or shipped production code in the last 12 months, the role will be uncomfortable.

• Owning product engineering or customer support. Your scope is the operating layer, where reliability and the AI agent surface live.

• Drowning in tickets. The point of this role is to remove ticket-throughput as the primary operating metric, not optimize it.

• Sitting on every bridge call. You will drive the bridge personally when a top-tier customer is down or the blast radius is material. The rest of the time, the operator tier handles incident execution and the AI surface absorbs the routine.

• Treating this role as a credential or a résumé line. The bar is ownership and obsession, not titles.

Responsibilities

• Deliver the reliability outcomes, and own them like a founder accountable for them. Platform uptime trending up, MTTR trending down week over week, customer satisfaction trending up. When something is down, that is your problem; you do not sleep peacefully through a customer outage.

• Own enterprise-customer escalations. Be the executive face of operations when a customer at the largest tier needs one, and engineer the operating system so escalations keep falling. Customer trust is the metric, measured in retention and contract expansion.

• Set and enforce AI agent quality and governance. Every agent in production has a defined scope, a measured acceptance rate, an escape hatch, and a guardrail anchored to documented failure modes. You hold the bar, including against agents that propose to skip a step.

• Recruit, develop, and retain a top-1% senior-only team. No L1, no L2; every engineer is a pioneer in their craft. Recruiting looks more like courtship than triage.

• Shape the operating model and the playbook. Where AI does more, where humans must stay, what the agent surface looks like 6 and 12 months from now. The industry playbook for agentic SRE is being written right now, and you write a meaningful part of it.

• Partner with product engineering and customer success peers. The operating layer is the hinge between them; reliability outcomes depend on those interfaces working.

Requirements

• Extreme ownership. You run operations as if it were your own business: your money, your reputation, your customers. Expect to bring that same intensity here.

• 10+ years operating SaaS at meaningful scale, with at least 3 years in an SVP, VP, or Head-of role managing a senior-only engineer
