Remote | AI/ML Technical Evaluation Consultant — $60–$90/hour

  • New York, New York, United States
  • Contractor
  • Remote

Job Description:

We are sharing a specialized part-time consulting opportunity for AI, machine learning, data science, data engineering, software engineering, and STEM professionals experienced in technical task design, programming, statistical methods, ML modeling, computational reasoning, agentic workflows, and structured evaluation.

This role supports current and upcoming remote consulting opportunities focused on AI/ML task design, agentic technical evaluation, data science and software workflow review, ground truth solution development, technical feedback, rubric creation, and high-quality project execution. Selected professionals will help design challenging tasks, evaluate AI agent outputs, and improve the rigor of technical evaluation materials across AI, data, and STEM domains.

Key Responsibilities

Professionals in this role may contribute to:

Agentic AI & Technical Task Design

  • Design challenging agentic tasks rooted in real-world machine learning, data science, data engineering, software, and technical workflows
  • Write accurate, well-documented solutions that serve as ground truth for evaluation
  • Surface technical nuances, edge cases, and reasoning gaps that distinguish expert-level work from surface-level responses
  • Create tasks that require strong programming, analytical, statistical, or computational judgment

AI Output Evaluation & Technical Feedback

  • Evaluate AI agent outputs against reference solutions for correctness, efficiency, reasoning quality, and technical rigor
  • Review outputs involving programming, data analysis, ML modeling, statistical methods, software reasoning, or computational methods
  • Identify flawed logic, incomplete solutions, inefficient approaches, weak assumptions, or unsupported technical conclusions
  • Provide detailed written feedback that clearly explains technical issues and improvement areas

Evaluation Frameworks & Review Consistency

  • Develop and refine evaluation frameworks and rubrics for assessing agentic behavior on AI and data science tasks
  • Apply structured review standards across technical domains and task types
  • Collaborate with other subject-matter experts to support consistency and accuracy
  • Maintain high standards for clarity, reproducibility, technical correctness, and written explanation

Ideal Profile

Strong candidates may have:

  • 3+ years of research, academic, or industry experience in machine learning, data science, software engineering, computer science, statistics, engineering, mathematics, physics, chemistry, biology, materials science, or another STEM field
  • Demonstrated technical expertise in at least one of the following: programming, data analysis, ML modeling, statistical methods, or computational methods
  • Ability to design and evaluate complex technical tasks with strong subject-matter judgment
  • Prior experience with data annotation, labeling, evaluation, or human feedback collection (a strong plus)
  • Experience with LLMs, AI systems, or agentic workflows (a plus)
  • Familiarity with agentic frameworks (a plus)
  • Strong written communication skills and ability to explain technical decisions clearly
  • Ability to commit approximately 40 hours per week on weekdays, depending on engagement scope

Educational Background

  • Academic or professional backgrounds in machine learning, data science, computer science, software engineering, statistics, mathematics, engineering, physics, chemistry, biology, materials science, or related STEM fields may be highly relevant
  • Research, industry, or applied technical experience in programming, modeling, data analysis, computational methods, or technical evaluation may be especially valuable
  • Equivalent professional experience may be considered depending on project needs

Nice to Have

  • Experience with Python, R, SQL, data pipelines, ML workflows, software development, notebooks, model evaluation, or data engineering tools
  • Experience developing benchmark tasks, evaluation frameworks, rubrics, or technical review guidelines
  • Familiarity with AI agent behavior, tool use, multi-step reasoning, or agentic task execution
  • Experience reviewing AI-generated technical outputs or human-written technical solutions
  • Comfort working across multiple technical domains and evaluating complex reasoning quality

Why This Opportunity

  • Apply AI/ML, data science, software, and STEM expertise to structured remote consulting work
  • Contribute to high-quality technical task design, agentic evaluation, ground truth solution development, and rubric creation
  • Work on assignments aligned with your machine learning, data science, software engineering, or STEM background
  • Use your technical judgment to improve the rigor and clarity of AI and data evaluation materials
  • Remote structure with competitive hourly compensation

Contract Details

  • Independent contractor role
  • Fully remote with weekday availability expected
  • Eligible professionals should be based in the United States, depending on project needs
  • Expected commitment of approximately 40 hours per week on weekdays, depending on engagement scope
  • Competitive rates of $60–$90 per hour, depending on expertise and project scope
  • Weekly payments via Stripe or Wise
  • Projects may be extended, shortened, or adjusted depending on scope and performance
  • Work will not involve access to confidential or proprietary information from any employer, client, or institution

About the Platform

This opportunity is available through 24-MAG LLC. We connect experienced professionals with remote consulting opportunities across technical, evaluation, and project-based workstreams.

By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy.