About the Role
Start Date: ASAP
Role Type: Full-Time, Salaried
Background: Software development
Location: Remote, Flexible (USA based)
Salary: $170,000-$200,000 per year, plus benefits
Who We Are:
The Modern Classrooms Project (MCP) is a 501(c)(3) nonprofit organization that empowers educators to build classrooms that respond to every student's needs. Founded by two award-winning teachers, we lead a movement of educators in implementing a self-paced, mastery-based instructional model that leverages technology to foster human connection, authentic learning, and social-emotional growth.
To date, we have reached over 100,000 teachers in 150+ countries through our free online course. We are an ambitious, idealistic team led by former classroom teachers, and we are passionate about what we do.
Job Description - Why we need you!
Effective instructional videos make high-quality instruction accessible to all learners, regardless of experience or background. Every day, in classrooms around the world, Modern Classroom educators replace live lectures with instructional videos so that students can learn at their own pace, in school and/or at home. Good videos enhance learning, but they are time-consuming to produce. A single high-quality lesson video can take hours to plan, record, edit, and caption.
We need an experienced, hands-on, AI-native engineer to build a brand-new, state-of-the-art generative pipeline that turns specifications into high-quality instructional videos, complete with animations, synchronized AI narration, captions, and automated ground-truth quality verification. You will own the video render path end to end, from the canonical specification to the final rendered output, creating intuitive, powerful tools that will directly support educators and students every day.
Key Responsibilities
As our AI Video Generation Architect, you will be a senior individual contributor on our Engineering Team, reporting to the Head of Engineering and collaborating closely with the Chief Innovation Officer to ship features that make a real difference for students and educators.
You'll be joining a small and growing team of talented software engineers working together to solve the problems teachers and students face every day. We're building a world where every student can succeed, and we need you to help us make that happen.
You will:
• Architect the video generation pipeline end to end. Design the gen-AI pipeline that transforms lesson specifications into storyboards, scene graphs, scripts, and production plans. Every stage emits deterministic lessons-as-code and structured intermediate artifacts — scene specs, asset manifests, timing maps — that can be inspected, versioned, cached, diffed, and selectively re-rendered.
• Ship multiple substantial features per week. This is a minimum velocity bar, not an exaggeration. You will leverage AI and agentic coding to build incredible software, very, very quickly.
• Build the multi-agent production workflow. Develop agentic orchestration (LangGraph or equivalent) in which an orchestrator delegates to specialist agents: pedagogy analyst, instructional scriptwriter, slide designer, animator, narrator, and a panel of graders, evaluators, and LLM judges, with structured outputs and human-in-the-loop labeling and fine-tuning.
• Engineer the video generation pipeline. Build brand-consistent, design-system-driven video generation from structured content: layout engines and templates, LaTeX/KaTeX mathematical typesetting, programmatic diagrams and charts, and text-faithful image generation for illustrations with automated readability checks. Design programmatic motion to support worked examples with narration: kinetic typography, transitions, animated number lines and area models. Run parallelized rendering with generative video models (e.g. Veo / Kling / Seedance). Produce narration with TTS (e.g. ElevenLabs v3 / Gemini-TTS): audio tags and SSML, pronunciation lexicons for mathematical vocabulary, consistent voice identities across a course, multi-voice dialogue, and multilingual narration, plus open-license music embeds.
• Build the ground-truth quality system. Construct golden datasets of spec-to-video pairs annotated by educators. Implement rubric-based scoring with calibrated LLM- and VLM-as-judge evaluators: frame-level visual fidelity, verification of on-screen mathematics, A/V sync validation, pedagogical fidelity checks against the source spec's learning objectives, reading-level analysis, and K-12 content safety screens. Symbolically verify every worked example with a computable ground-truth verification system: if the video teaches 3/4 + 1/8, a machine learning model should independently confirm the answer before any student sees it.
• Architect resilient, high-scale media infrastructure. Design and scale the distributed backend across Python and TypeScript that carries the pipeline: render queues and job orchestration, transcoding and streaming (HLS), and provenance-aware metadata for AI-ge