Remote | LLM Personal Assistant Evaluation Specialist — $70–$180/hour

New York, New York, United States
Contractor
Remote

Job Description:

We are sharing a specialized part-time consulting opportunity for advanced LLM power users experienced in personalized AI workflows, rubric-based evaluation, real-world task assessment, personal productivity systems, and high-context decision support.

This role supports current and upcoming remote consulting opportunities focused on evaluating how AI systems handle personalized, real-world life tasks across food, health, productivity, career, learning, research, planning, and personal workflow scenarios. Selected professionals will create realistic prompts, complete complex AI-assisted tasks, record workflow execution, design or apply detailed rubrics, and evaluate whether AI outputs are useful, personalized, practical, safe, and successful in real-life contexts.

Key Responsibilities

Professionals in this role may contribute to:

Personalized AI Task Evaluation

Create written responses, prompts, and explanations for complex personal-life tasks
Evaluate whether AI outputs are practical, well-reasoned, personalized, realistic, and successful
Identify where outputs succeed, miss context, overreach, provide generic advice, or fail to account for real constraints
Use hands-on LLM experience to assess real-world usefulness across high-context personal workflows

Rubric Design & Quality Assessment

Apply structured rubrics and quality criteria to evaluate AI system performance
Create detailed evaluation rubrics for complex personal tasks and multi-step workflows
Judge outputs against criteria involving usefulness, personalization, reasoning quality, safety, completeness, and success conditions
Write clear, specific, and well-supported feedback explaining evaluation decisions

Real-World Workflow Execution

Execute AI-assisted tasks while recording your screen as specified in project instructions
Review task performance across tools, prompts, reasoning steps, outputs, and final recommendations
Complete research-intensive personal workflows end-to-end within expected turnaround times
Maintain careful documentation of task setup, execution, rubric design, and evaluation results

Ideal Profile

Strong candidates may have:

Heavy personal usage of LLM products and AI tools
Experience using AI for multi-step tasks, planning, research, decision-making, personal workflows, or life administration
Familiarity with tools such as ChatGPT, Claude, Gemini, Perplexity, Cursor, Windsurf, Codex, or other AI agents
Strong ability to explain what makes an AI output useful, incomplete, unsafe, unrealistic, generic, or poorly personalized
Extensive rubric experience, including prior rubric design, evaluation, and quality assessment work
Strong written judgment, attention to detail, and ability to evaluate against structured criteria
Ability to complete tasks within 24 hours when project timing requires it

Educational Background

Formal degree requirements may vary based on project needs
Practical experience using LLMs for complex personal workflows, rubric-based evaluation, research, writing, QA, product testing, or AI assessment is highly relevant
Experience in education, research, operations, productivity systems, coaching, writing, product evaluation, user research, or AI workflow design may be especially valuable

Nice to Have

100+ hours of prior rubric-related work involving rubric design, evaluation, model assessment, quality review, or structured judgment
Experience evaluating AI tools across personal productivity, career planning, food recommendations, learning workflows, health-adjacent reasoning, or personal research tasks
Strong familiarity with personal AI workflows involving calendars, reminders, errands, job applications, LinkedIn, resumes, study plans, restaurant selection, or decision support
Ability to record screen-based workflows clearly and follow detailed task instructions
Access to a desktop or laptop computer suitable for project work and screen recording

Why This Opportunity

Apply advanced LLM power-user experience to structured remote project work
Contribute to high-quality evaluation of personalized AI assistant workflows
Work on flexible assignments involving practical, real-world personal tasks across multiple domains
Use your judgment to help assess whether AI systems are truly useful, personalized, realistic, and successful
Remote structure with competitive hourly compensation

Contract Details

Independent contractor role
Fully remote with flexible scheduling
Depending on project needs, eligible professionals may need to be based in the United States
Expected commitment of approximately 15–40 hours per week depending on project availability and scope
Participants may be asked to complete a paid work trial as part of onboarding
Work trial compensation may be approximately $30, paid upon completion, depending on project requirements
Tasks may require 24-hour turnaround depending on assignment timing
Desktop or laptop computer required for project work and screen recording
Competitive rates between $70 and $180 per hour, depending on expertise, project scope, and task type
Weekly payments via Stripe or Wise
Projects may be extended, shortened, or adjusted depending on scope and performance
Work will not involve access to confidential or proprietary information from any employer, client, or institution

About the Platform

This opportunity is available through 24-MAG LLC. We connect experienced professionals with remote consulting opportunities across technical, evaluation, and project-based workstreams.

By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy.