About the Role
Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. Grafana Cloud, our fully managed observability platform, is flexible and built for scale. With Grafana Cloud's actually useful AI, organizations can see, understand, and act on all their disparate data to move at the speed of their ambitions. Today, more than 35 million users and 7,000+ customers – including Anthropic, Bloomberg, NVIDIA, Microsoft, and Salesforce – trust Grafana Labs to ensure reliability of their applications and systems, resolve incidents quickly, and optimize their telemetry to reduce noise and cost. We are a 100% remote company with 1,600+ team members across 40+ countries, and we're backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P. Morgan, CapitalG, and Lead Edge Capital. Learn more at grafana.com and follow us on LinkedIn and X.
We're scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.
You may not meet every requirement, and that's okay. If this role excites you, we'd love you to raise your hand for what could be a truly career-defining opportunity.
This is a remote position. We are looking for candidates in the Eastern (ET) or Central (CT) time zones in the United States or Canada.
The Opportunity:
The Session Replay squad is building a new Grafana Cloud product that helps our customers understand what users actually experienced when something went wrong.
Session Replay connects frontend signals (errors, performance, synthetic checks) to concrete session-level evidence, enabling faster and more confident investigation of production issues.
This team works at the intersection of:
• frontend observability,
• backend data processing and storage at scale,
• debugging workflows across products,
• privacy and access control,
• performance and cost constraints at scale.
Session Replay is still early, which means you'll help shape both what we build and how it fits into Grafana Cloud. A key part of our next phase is evolving the backend architecture for capturing sessions, including a migration toward a columnar/analytical solution as the primary storage and query engine for high-volume session data.
As a company, we are remote-first and global. We embrace people of different experiences and backgrounds to build diverse teams where every person brings a new perspective to the software. We are looking for Staff Software Engineers who are passionate about working with data and providing seamless experiences for our customers to join our growing team! Our stack is Go, TypeScript, and React, but we build tools for people using many other stacks.
What you'll be doing:
As a Staff Software Engineer, you will operate as a technical leader and systems thinker, driving both product direction and architectural evolution. Specifically, you will:
• Own end-to-end technical direction for Session Replay, spanning frontend, backend, and data systems
• Drive the evolution of our backend architecture, including:
  • Designing systems around columnar/analytical data storage for large-scale session data
  • Defining data models, ingestion pipelines, and query patterns
• Lead the design of investigation workflows, connecting replay with logs, metrics, traces and other telemetry across Grafana Cloud
• Make high-leverage architectural decisions that impact multiple teams and products
• Partner with teams across Grafana (Frontend Observability, Synthetic Monitoring, Core Grafana) to build cohesive cross-product experiences
• Engage directly with customers: joining calls, gathering structured feedback, and helping customers instrument and adopt Session Replay successfully
• Improve engineering standards, patterns, and operational practices within the team
• Mentor engineers and help grow technical leadership within the team
Technologies you'll work with:
• Go (backend services and APIs)
• Columnar/analytical data storage (core data storage and querying)
• Object storage (S3, GCS, Azure Blob Storage) and MySQL
• TypeScript / React (user-facing workflows)
• Grafana ecosystem (Mimir, Loki, Tempo, etc.)
What Makes You a Great Fit:
• You are comfortable working in a remote-first company; communication is key.
• For us, working together means being collaborative, friendly, kind, and respectful. We operate by consensus. You can contribute to a discussion, disagree constructively, and commit to the team's decision. You are able to communicate design decisions clearly in written and spoken English.
• You can reason about data-intensive systems (ingestion, storage, querying, cost trade-offs).
• You are comfortable owning features in ambiguous problem spaces. We are a small team, working remot