
Data Engineer (Healthcare)

Prime Health Technologies · United States

Full-time · Senior · Python · Kubernetes


About the Role

Prime Health Technologies is redefining healthcare from reactive to proactive by building the world's first AI-driven Precision Health Operating System for governments and insurers. Our platform integrates biometric data, behavioral patterns, and intelligent reasoning to deliver personalized, preventative care at national scale. We partner with governments and insurers to improve population health outcomes, reduce healthcare costs, and empower every citizen with a personal AI health companion that adapts daily, guiding movement, nutrition, sleep, and lifestyle.

Your screening answers, especially years of experience, will be verified during the interview. Generic or AI-generated responses tend to become apparent under follow-up questions; we prefer honest, specific answers over polished generic ones. Please read the entire job description.

Role Summary

As a Data Engineer, you will design, build, and operate the platform's data substrate, meaning the pipelines, storage, governance, and analytics layers that everything else stands on. You will own the data plane end-to-end, from ingestion through observability, with the architectural discipline required for regulated, sovereign-deployment work.

Your primary mandate is the substrate: governed, reliable, reproducible data flows and curated datasets that downstream engineering, product, clinical, and analytics workloads rely on. As the platform matures toward pilots, the role extends naturally into MLOps (model registry, training orchestration, serving infrastructure, inference logging), applying the same architectural discipline to the model layer.

This role is hands-on and production-oriented. Security and compliance are non-negotiable. Your work must align with our information security and AI governance posture (ISO 27001, ISO 42001) and support privacy obligations such as HIPAA and GDPR, as applicable in each deployment jurisdiction.

Key Responsibilities

• Build & maintain reliable batch and (where appropriate) streaming pipelines for clinical, operational, product, and third-party data sources, including healthcare & consumer-health integrations: HL7/FHIR, REST APIs, Apple HealthKit, Android Health Connect, and governed adapters for external clinical or wellness sources.
• Design data models, transformations, and storage patterns supporting analytics, reporting, AI workloads, and product features, with reproducibility as a first-class requirement (any curated dataset must rebuild deterministically from raw inputs & transformation code).
• Design & operate core stores in the in-country PHI data plane (operational database, time-series store, object storage, audit logs) with encryption, access control, and lifecycle management.
• Build curated, de-identified-by-default analytics datasets powering operational, regulatory, and client dashboards.
• Implement & maintain PHI/PII de-identification & tokenization pipelines; support tightly controlled re-identification workflows when explicitly authorized.
• Establish data quality, integrity, and observability controls (validation, reconciliation, idempotency, late-arriving data handling, lineage, monitoring, alerting) and publish quality metrics.
• Deliver a discoverable metadata layer so teams can self-serve and trust datasets.
• Support sovereign / regional data-residency models, keeping PHI within an approved deployment boundary while enabling derived & aggregate views in out-of-country planes.
• Own pipeline observability (logging, metrics, tracing, alerting, cost & performance tuning) across the stack.
• Contribute to CI/CD for data components & participate in incident response and postmortems.
• Partner with engineering, product, clinical, and business stakeholders, translating data needs into scalable technical solutions.

As the platform approaches pilot deployments, this role extends into MLOps: building the model & serving infrastructure that a future Data Scientist will rely on. This is a natural trajectory for senior data engineers (the same architectural muscle applied to model artifacts) and is described here so candidates understand the path. The actual data science work (feature design, model development, evaluation) belongs to a separate, later hire who will plug into the substrate & scaffolding you build.

MLOps scope includes:

• Training-job orchestration & reproducible dataset versioning
• Model registry & artifact storage
• Containerized model serving, routing, and shadow-deployment infrastructure
• Inference logging back into the warehouse for downstream evaluation
• CI/CD for model artifacts (schema validation, contract tests, automated rollouts)

Required Qualifications

• 7+ years in data engineering or backend engineering with significant data-pipeline ownership; substantial seniority is expected given the regulated, national-scale, and sovereign-deployment context. Prior work in healthcare, wellness, insurance, or other regulated domains.
• Strong SQL & Python; proven track record building reliable ETL/ELT pipelines in production.
• E