Remote | Machine Learning Systems Evaluation Engineer — Up to $90/hour
Job Description:
We are sharing a specialised remote consulting opportunity for experienced machine learning engineers with strong coding agent experience, production ML judgment, and the ability to evaluate complex machine learning and AI engineering implementations across realistic technical scenarios.
This role supports current and upcoming remote consulting opportunities focused on machine learning system evaluation, coding-agent-assisted technical workflows, ML implementation review, inference system assessment, MLOps evaluation, and LLM application analysis. Selected professionals may use tools such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or comparable coding agents to complete, review, and evaluate technical tasks involving model training, deployment infrastructure, inference workflows, AI-powered products, and production machine learning systems.
Key Responsibilities
Professionals in this role may contribute to:
Machine Learning Implementation Review
- Use modern coding agents to complete and evaluate complex machine learning and AI engineering tasks
- Review generated implementations involving model training, inference systems, MLOps workflows, LLM applications, and AI-powered product features
- Assess technical outputs for correctness, quality, maintainability, performance, reliability, and production-readiness
- Apply professional machine learning engineering judgment to realistic technical scenarios
MLOps, Deployment & Inference Evaluation
- Evaluate ML system workflows involving model deployment, inference infrastructure, monitoring, testing, and production integration
- Review implementation choices related to scalability, latency, data flow, model serving, reliability, and system maintainability
- Identify bugs, edge cases, performance issues, failure modes, and weak assumptions in ML engineering outputs
- Provide structured feedback on MLOps design, deployment patterns, and production ML system quality
Coding Agent Output Assessment
- Compare outputs from multiple coding agents and assess their strengths, weaknesses, accuracy, and practical usefulness
- Identify where generated solutions succeed, where they fail, and where additional ML engineering judgment is required
- Evaluate whether generated machine learning implementations reflect real-world engineering standards
- Document technical review findings clearly for project teams and quality evaluation workflows
Technical Documentation & Feedback
- Produce clear, structured evaluations of machine learning engineering tasks and generated outputs
- Explain reasoning around model training, inference systems, deployment infrastructure, LLM applications, performance, and architectural trade-offs
- Support technical assessment workflows by documenting accepted work, improvement areas, and practical engineering conclusions
- Help ensure outputs reflect production-scale machine learning engineering expectations
Ideal Profile
Strong candidates may have:
- 2+ years of professional machine learning engineering experience
- Hands-on experience building production ML systems, model deployment infrastructure, LLM applications, or AI-powered products
- Regular use of AI coding agents such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or comparable tools
- Ability to evaluate generated machine learning implementations and identify technical trade-offs, bugs, edge cases, and performance issues
- Experience deploying ML systems to production (strongly preferred)
- Strong understanding of model training, inference workflows, MLOps, data pipelines, evaluation methods, deployment patterns, and system reliability
- Clear written communication skills and comfort documenting technical reasoning in a remote, project-based environment
Educational Background
- A degree in Computer Science, Machine Learning, Artificial Intelligence, Data Science, Software Engineering, Computer Engineering, Statistics, Mathematics, or a related technical field is helpful
- Equivalent professional experience in machine learning engineering, applied AI, MLOps, LLM applications, or production ML systems is also highly relevant
Nice to Have
- Experience with Python, PyTorch, TensorFlow, scikit-learn, Hugging Face, LangChain, LlamaIndex, MLflow, Ray, or comparable ML tools
- Familiarity with model serving, feature pipelines, vector databases, embeddings, retrieval systems, LLM application architecture, or evaluation frameworks
- Experience with cloud platforms, Docker, Kubernetes, CI/CD pipelines, observability tooling, or production deployment workflows
- Background in technical code review, ML architecture review, model performance evaluation, or large-scale AI product engineering
- Strong comfort working in sprint-based project environments with focused technical assessment windows
Why This Opportunity
- Remote consulting work aligned with machine learning engineering, coding agent, and technical evaluation expertise
- Opportunity to evaluate realistic ML engineering workflows involving model training, inference systems, MLOps, LLM applications, and production AI systems
- Suitable for engineers who enjoy technical assessment, tool-assisted coding workflows, ML implementation review, and practical system-level problem-solving
- Sprint-based project work that can align with focused availability and remote schedules
Contract Details
- Independent contractor engagement
- Fully remote with flexible scheduling
- Project-based work organised in sprints
- Some project work may run in focused 12–24 hour sprint windows depending on project requirements
- Compensation of up to $90/hour, depending on project scope, experience, and accepted work structure
- Some projects may use accepted-task compensation depending on the specific workflow
- Payments are made weekly via Stripe or Wise based on services rendered
- Projects may be extended, shortened, adjusted, or concluded based on project needs and performance
- Candidates requiring H-1B or STEM OPT sponsorship support are not eligible at this time
- Work must not involve sharing confidential or proprietary information from any employer, client, or institution
About the Platform
This opportunity is available through 24-MAG LLC. We connect experienced professionals with remote consulting opportunities across technical, evaluation, and project-based workstreams.
By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy.