Remote | Software Engineering, Data Science, and Design Experts — $60–$100/hour
Job Description:
We are offering a specialised part-time consulting opportunity for experienced software engineering, data science, and systems design professionals with strong technical depth, real-world engineering experience, and the ability to evaluate AI-generated code and technical reasoning at a high level.
This role supports an exciting collaboration with leading AI teams focused on improving the quality, usefulness, and reliability of general-purpose conversational AI systems across coding, software engineering, and technical problem-solving contexts.
Selected professionals will evaluate model-generated responses to coding and engineering queries, validate technical accuracy through fact-checking and code execution, identify conceptual or logical issues, and help improve how advanced AI systems reason about code, generate solutions, and explain technical concepts across a variety of tasks and complexity levels.
Key Responsibilities
Professionals in this role may contribute to:
Technical Evaluation & Response Review
Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness
Assess model responses across programming, data science, and systems design tasks of varying complexity
Ensure model outputs align with expected conversational behavior and system guidelines
Code Validation & Fact-Checking
Conduct fact-checking using trusted public sources and authoritative references
Execute code and validate outputs using appropriate tools to test correctness and reliability
Assess code quality, readability, algorithmic soundness, and explanation quality
Annotation, Feedback & Quality Improvement
Annotate model responses by identifying strengths, weaknesses, and factual or conceptual inaccuracies
Identify subtle bugs, logical flaws, inefficiencies, edge cases, and misleading explanations
Apply consistent evaluation standards using defined taxonomies, benchmarks, and detailed evaluation guidelines
Produce reproducible evaluation artifacts that help improve model performance and reliability
Ideal Profile
Strong candidates may have:
A BS, MS, or PhD in Computer Science or a closely related field
5+ years of real-world experience in software engineering, data science, systems design, or related technical roles
Expertise in at least two relevant programming languages such as Python, Java, C++, C, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, or HTML/CSS
The ability to independently solve HackerRank or LeetCode medium- and hard-level problems
Experience contributing to well-known open-source projects, including merged pull requests
Significant experience using LLMs while coding and a strong understanding of their strengths and failure modes
Strong attention to detail and comfort evaluating complex technical reasoning and subtle implementation flaws
Fluent English language skills
Preferred Qualifications
Prior experience with RLHF, model evaluation, or data annotation work
Track record in competitive programming
Experience reviewing code in production environments
Familiarity with multiple programming paradigms or technical ecosystems
Ability to explain complex technical concepts clearly to non-expert audiences
Why This Opportunity
Contribute specialised technical expertise to a high-impact AI collaboration
Help improve how advanced AI systems reason about code, software engineering, and technical problem-solving
Work on evaluation and model improvement tasks that directly shape AI systems used by developers worldwide
Flexible remote work with strong hourly compensation
Contract Details
Independent contractor role
Fully remote with flexible scheduling
Open to both US-based and non-US-based professionals
Full-time or part-time contract work options available
Hourly compensation of $60–$100
Fluent English language skills required
Projects may be extended, shortened, or concluded early depending on project needs and performance
Weekly payments via Stripe or Wise
Work will not involve access to confidential or proprietary information from any employer, client, or institution
Please note: We are unable to support H-1B or STEM OPT candidates at this time
Start date: Immediate
About the Platform
This opportunity is available through a leading AI-driven work platform that connects domain experts with frontier AI research projects.
Experts contribute to improving advanced AI systems by providing specialised expertise across real-world workflows, structured evaluation, model training support, and domain-specific technical reasoning.
By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy