Remote | Mathematics Model Prompt Evaluator — $25–$60/hour

  • San Francisco, California, United States
  • Remote

Job Description:

We are sharing a specialised part-time consulting opportunity for expert mathematicians with strong backgrounds in mathematical reasoning, proof writing, formal analysis, and high-quality technical question design.

This role supports an exciting collaboration with a leading frontier AI research laboratory focused on improving mathematical reasoning and model evaluation through rigorous, high-quality prompt authoring and verification workflows.

Selected professionals will author and verify open-ended mathematical problems across core subdomains such as probability, statistics, algebra, differential equations, geometry, graph theory, and number theory. The goal is to help advanced AI systems produce higher-quality reasoning in complex mathematical contexts by building challenging, unambiguous evaluation tasks and applying expert judgment to assess prompt quality, scope, and difficulty.

Key Responsibilities

Professionals in this role may contribute to:

Prompt Authoring
  • Create original, open-ended prompts within an assigned mathematical subdomain across varying difficulty levels, including undergraduate, advanced undergraduate, and graduate or professional levels
  • Design prompts that require human judgment to evaluate the quality of the AI's response, including tasks involving proof construction, formal reasoning, or multi-step mathematical analysis
  • Ensure prompts are clear, well-scoped, and sufficiently challenging for meaningful model evaluation

Prompt Verification & Quality Review
  • Review authored prompts for clarity, uniqueness, scope alignment, and difficulty accuracy
  • Edit prompts and difficulty assignments where standards are not met
  • Ensure that prompts within each task are sufficiently distinct from one another and aligned with project expectations

Mathematical Reasoning Evaluation Support
  • Apply expert judgment to assess the depth and quality of mathematical reasoning required by each prompt
  • Help establish rigorous evaluation standards for frontier language models operating in mathematical domains
  • Support high-quality task design across a broad set of mathematical subfields

Ideal Profile

Strong candidates may have:
  • A Master's degree or higher in Mathematics, Applied Mathematics, Statistics, or a closely related field
  • 2–6 years of professional or research experience in a quantitative field
  • Strong command of graduate-level mathematical concepts, including proof writing, analysis, and formal reasoning
  • Excellent written English and the ability to craft precise, well-scoped technical questions
  • Comfort working across structured evaluation tasks requiring depth, clarity, and mathematical judgment

Preferred Qualifications

  • Experience in academic research, mathematical competition design, or quantitative industry roles
  • Experience in one or more of the following areas: probability and statistics; algebra, including linear algebra; ordinary or partial differential equations and dynamical systems; geometry; graph theory; or number theory
  • Ability to design open-ended mathematical questions that require nuanced reasoning rather than simple factual recall
  • Strong editorial judgment when reviewing scope, clarity, and difficulty calibration

Why This Opportunity

  • Contribute specialised mathematics expertise to a cutting-edge AI collaboration
  • Help improve how advanced AI systems reason through complex mathematical problems and formal analytical tasks
  • Work on high-impact evaluation workflows that shape mathematical model benchmarking standards
  • Flexible remote work with structured expectations and competitive hourly compensation

Contract Details

  • Independent contractor role
  • Fully remote with flexible scheduling
  • Hourly compensation of $25–$60 per hour
  • Expected commitment of 10+ hours per week
  • Work is fully asynchronous
  • Projects may be extended, shortened, or concluded early depending on project needs and performance
  • Weekly payments via Stripe or Wise
  • Work will not involve access to confidential or proprietary information from any employer, client, or institution
  • Please note: We are unable to support H-1B or STEM OPT candidates at this time
  • Start date: Immediate

About the Platform

This opportunity is available through a leading AI-driven work platform that connects domain experts with frontier AI research projects.

Experts contribute to improving advanced AI systems by providing specialised expertise across real-world workflows, structured evaluation, model training support, and domain-specific content validation.

By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy