About the Role
AI Prompt Engineer (United States - Remote)
Rex.zone is hiring for remote, full-time AI Prompt Engineer roles supporting production LLM features. You will design, test, and optimize prompts and evaluation workflows to improve reliability, safety, and measurable task success across NLP, tool use, RAG, and multimodal use cases.
You will own prompt engineering and prompt evaluation workflows, including system prompt versioning, prompt libraries, adversarial test prompts, and A/B prompt experiments. You will partner with engineering to integrate prompts into applications (function/tool calling, RAG) and with data teams to define labeling taxonomies and QA evaluation rubrics. You will help operationalize human feedback loops (RLHF-adjacent), including preference data collection, rubric-based scoring, and prompt regression testing to prevent quality drift.
Key Responsibilities
• Design and iterate system, developer, and evaluation prompts for instruction-following, reasoning, and tool use
• Create prompt test suites for safety, policy, and content quality, including edge cases and adversarial prompts
• Define evaluation criteria and scoring rubrics for prompt evaluation, QA evaluation, and human feedback collection
• Analyze model failure modes (hallucinations, refusal errors, policy gaps) and propose mitigations and guardrails
• Collaborate on RLHF-adjacent workflows: preference ranking, rubric scoring, and dataset iteration with labeling teams
• Integrate prompts into production workflows (RAG grounding, retrieval prompts, tool schemas, guardrails)
• Document prompt standards, versioning practices, and deployment checklists
• Contribute to content safety labeling guidance and escalation procedures for sensitive content
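The prompt regression testing described above can be sketched in miniature. This is a hypothetical example, not Rex.zone's actual tooling: `Case`, `score`, and `regression_suite` are illustrative names, and `fake_model` stands in for a real LLM call. It shows the core idea of a gold set scored against a rubric, with any case falling below a threshold flagged as a regression.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """One gold-set test case with simple rubric criteria."""
    prompt: str
    must_contain: list = field(default_factory=list)      # required content
    must_not_contain: list = field(default_factory=list)  # disallowed content

def score(output: str, case: Case) -> float:
    """Fraction of rubric criteria the output satisfies (0.0 to 1.0)."""
    text = output.lower()
    checks = [s.lower() in text for s in case.must_contain]
    checks += [s.lower() not in text for s in case.must_not_contain]
    return sum(checks) / len(checks) if checks else 1.0

def regression_suite(run_model, cases, threshold=0.9):
    """Run each gold case through the model; return cases below threshold."""
    failures = []
    for case in cases:
        s = score(run_model(case.prompt), case)
        if s < threshold:
            failures.append((case.prompt, s))
    return failures

# Stub model used only so the sketch is runnable end to end.
def fake_model(prompt: str) -> str:
    return "Paris is the capital of France."

cases = [
    Case("What is the capital of France?",
         must_contain=["Paris"],
         must_not_contain=["I cannot help"]),
]

print(regression_suite(fake_model, cases))  # an empty list means no regressions
```

In practice the rubric checks would be richer (structured scoring, model-graded criteria, policy classifiers), but the shape of the workflow, a versioned gold set run on every prompt change, is the same.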
Required Qualifications
• Experience designing prompts for LLM applications and evaluating outputs with structured rubrics
• Strong understanding of NLP concepts, LLM behavior, and common failure modes
• Ability to build evaluation frameworks (gold sets, acceptance tests, regression suites)
• Comfort working cross-functionally with engineering, product, and AI/ML data operations
• Clear writing skills for prompt guidelines, annotation guidance, and QA documentation
Preferred Qualifications
• Hands-on experience with RLHF concepts, preference data, and evaluation datasets
• Experience with RAG, embeddings, retrieval quality, and grounding strategies
• Familiarity with content safety labeling, policy taxonomies, and risk-based evaluation
• Experience with multimodal prompting (text + image) or CV-adjacent evaluation workflows
• Ability to use lightweight scripting or notebooks to analyze evaluation results
Compensation
Competitive hourly rate: $30–$50/hr.
Location: United States (Remote).