Remote | AI/ML Technical Evaluation Consultant — $60–$90/hour

New York, New York, United States
Contractor
Remote

Job Description:

We are sharing a specialized part-time consulting opportunity for AI, machine learning, data science, data engineering, software engineering, and STEM professionals experienced in technical task design, programming, statistical methods, ML modeling, computational reasoning, agentic workflows, and structured evaluation.

This role supports current and upcoming remote consulting opportunities focused on AI/ML task design, agentic technical evaluation, data science and software workflow review, ground truth solution development, technical feedback, rubric creation, and high-quality project execution. Selected professionals will help design challenging tasks, evaluate AI agent outputs, and improve the rigor of technical evaluation materials across AI, data, and STEM domains.

Key Responsibilities

Professionals in this role may contribute to:

Agentic AI & Technical Task Design

Design challenging agentic tasks rooted in real-world machine learning, data science, data engineering, software, and technical workflows
Write accurate, well-documented solutions that serve as ground truth for evaluation
Surface technical nuances, edge cases, and reasoning gaps that distinguish expert-level work from surface-level responses
Create tasks that require strong programming, analytical, statistical, or computational judgment

AI Output Evaluation & Technical Feedback

Evaluate AI agent outputs against reference solutions for correctness, efficiency, reasoning quality, and technical rigor
Review outputs involving programming, data analysis, ML modeling, statistical methods, software reasoning, or computational methods
Identify flawed logic, incomplete solutions, inefficient approaches, weak assumptions, or unsupported technical conclusions
Provide detailed written feedback that clearly explains technical issues and improvement areas

Evaluation Frameworks & Review Consistency

Develop and refine evaluation frameworks and rubrics for assessing agentic behavior on AI and data science tasks
Apply structured review standards across technical domains and task types
Collaborate with other subject-matter experts to support consistency and accuracy
Maintain high standards for clarity, reproducibility, technical correctness, and written explanation

Ideal Profile

Strong candidates may have:

3+ years of research, academic, or industry experience in machine learning, data science, software engineering, computer science, statistics, engineering, mathematics, physics, chemistry, biology, materials science, or another STEM field
Demonstrated technical expertise in at least one of the following: programming, data analysis, ML modeling, statistical methods, or computational methods
Ability to design and evaluate complex technical tasks with strong subject-matter judgment
Prior experience with data annotation, labeling, evaluation, or human feedback collection is a strong plus
Experience with LLMs, AI systems, or agentic workflows is a plus
Familiarity with agentic frameworks is a plus
Strong written communication skills and ability to explain technical decisions clearly
Ability to commit approximately 40 hours per week on weekdays, depending on engagement scope

Educational Background

Academic or professional backgrounds in machine learning, data science, computer science, software engineering, statistics, mathematics, engineering, physics, chemistry, biology, materials science, or related STEM fields may be highly relevant
Research, industry, or applied technical experience in programming, modeling, data analysis, computational methods, or technical evaluation may be especially valuable
Equivalent professional experience may be considered depending on project needs

Nice to Have

Experience with Python, R, SQL, data pipelines, ML workflows, software development, notebooks, model evaluation, or data engineering tools
Experience developing benchmark tasks, evaluation frameworks, rubrics, or technical review guidelines
Familiarity with AI agent behavior, tool use, multi-step reasoning, or agentic task execution
Experience reviewing AI-generated technical outputs or human-written technical solutions
Comfort working across multiple technical domains and evaluating complex reasoning quality

Why This Opportunity

Apply AI/ML, data science, software, and STEM expertise to structured remote consulting work
Contribute to high-quality technical task design, agentic evaluation, ground truth solution development, and rubric creation
Work on assignments aligned with your machine learning, data science, software engineering, or STEM background
Use your technical judgment to improve the rigor and clarity of AI and data evaluation materials
Remote structure with competitive hourly compensation

Contract Details

Independent contractor role
Fully remote with weekday availability expected
Eligible professionals should be based in the United States, depending on project needs
Expected commitment of approximately 40 hours per week on weekdays, depending on engagement scope
Competitive rates of $60–$90 per hour, depending on expertise and project scope
Weekly payments via Stripe or Wise
Projects may be extended, shortened, or adjusted depending on scope and performance
Work will not involve access to confidential or proprietary information from any employer, client, or institution

About the Platform

This opportunity is available through 24-MAG LLC. We connect experienced professionals with remote consulting opportunities across technical, evaluation, and project-based workstreams.

By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy.