Remote | Machine Learning Systems Evaluation Engineer — Up to $90/hour

24-MAG · Anywhere

Full-time · Python · Docker · Kubernetes · PyTorch · TensorFlow · LangChain

About the Role

We are sharing a specialised remote consulting opportunity for experienced machine learning engineers with strong coding agent experience, production ML judgment, and the ability to evaluate complex machine learning and AI engineering implementations across realistic technical scenarios.

This role supports current and upcoming remote consulting opportunities focused on machine learning system evaluation, coding-agent-assisted technical workflows, ML implementation review, inference system assessment, MLOps evaluation, and LLM application analysis. Selected professionals may use tools such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or comparable coding agents to complete, review, and evaluate technical tasks involving model training, deployment infrastructure, inference workflows, AI-powered products, and production machine learning systems.

Key Responsibilities

Professionals in this role may contribute to:

Machine Learning Implementation Review
- Use modern coding agents to complete and evaluate complex machine learning and AI engineering tasks
- Review generated implementations involving model training, inference systems, MLOps workflows, LLM applications, and AI-powered product features
- Assess technical outputs for correctness, quality, maintainability, performance, reliability, and production-readiness
- Apply professional machine learning engineering judgment to realistic technical scenarios

MLOps, Deployment & Inference Evaluation
- Evaluate ML system workflows involving model deployment, inference infrastructure, monitoring, testing, and production integration
- Review implementation choices related to scalability, latency, data flow, model serving, reliability, and system maintainability
- Identify bugs, edge cases, performance issues, failure modes, and weak assumptions in ML engineering outputs
- Provide structured feedback on MLOps design, deployment patterns, and production ML system quality

Coding Agent Output Assessment
- Compare outputs from multiple coding agents and assess their strengths, weaknesses, accuracy, and practical usefulness
- Identify where generated solutions succeed, where they fail, and where additional ML engineering judgment is required
- Evaluate whether generated machine learning implementations reflect real-world engineering standards
- Document technical review findings clearly for project teams and quality evaluation workflows

Technical Documentation & Feedback
- Produce clear, structured evaluations of machine learning engineering tasks and generated outputs
- Explain reasoning around model training, inference systems, deployment infrastructure, LLM applications, performance, and architectural trade-offs
- Support technical assessment workflows by documenting accepted work, improvement areas, and practical engineering conclusions
- Help ensure outputs reflect production-scale machine learning engineering expectations

Ideal Profile

Strong candidates may have:
- 2+ years of professional machine learning engineering experience
- Hands-on experience building production ML systems, model deployment infrastructure, LLM applications, or AI-powered products
- Regular use of AI coding agents such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or comparable tools
- Ability to evaluate generated machine learning implementations and identify technical trade-offs, bugs, edge cases, and performance issues
- Experience deploying ML systems to production (strongly preferred)
- Strong understanding of model training, inference workflows, MLOps, data pipelines, evaluation methods, deployment patterns, and system reliability
- Clear written communication skills and comfort documenting technical reasoning in a remote, project-based environment

Educational Background
- A degree in Computer Science, Machine Learning, Artificial Intelligence, Data Science, Software Engineering, Computer Engineering, Statistics, Mathematics, or a related technical field is helpful
- Equivalent professional experience in machine learning engineering, applied AI, MLOps, LLM applications, or production ML systems is also highly relevant

Nice to Have
- Experience with Python, PyTorch, TensorFlow, scikit-learn, Hugging Face, LangChain, LlamaIndex, MLflow, Ray, or comparable ML tools
- Familiarity with model serving, feature pipelines, vector databases, embeddings, retrieval systems, LLM application architecture, or evaluation frameworks
- Experience with cloud platforms, Docker, Kubernetes, CI/CD pipelines, observability tooling, or production deployment workflows
- Background in technical code review, ML architecture review, model performance evaluation, or large-scale AI product engineering
- Strong comfort working in sprint-based project environments with focused technical assessment windows

Why This Opportunity
- Remote consulting work aligned with machine learning engineering, coding agent, and technical evaluation expertise
- Opportunity to evaluate realistic ML engineering workflows involving model training, inference systems, MLOps, LLM application

