Remote | Urdu-English AI Safety Red Team Evaluator — $20–$30/hour

New York, New York, United States
Contractor
Remote

Job Description:

We are sharing a specialized part-time consulting opportunity for Urdu-English bilingual professionals experienced in AI safety evaluation, red team testing, adversarial review, vulnerability classification, and structured feedback on sensitive text-based AI outputs.

This role supports current and upcoming remote consulting opportunities focused on AI safety evaluation, bilingual red team testing, conversational model assessment, misuse-risk review, vulnerability annotation, and high-quality project execution. Selected professionals will test AI systems using structured adversarial scenarios, identify safety weaknesses, classify risks, and produce clear English-language evaluation artifacts across English and Urdu contexts.

Key Responsibilities

Professionals in this role may contribute to:

Bilingual AI Safety & Red Team Testing

Review English and Urdu AI outputs for safety, reliability, bias, misinformation, and harmful-behavior risks
Stress-test conversational AI models and agents using structured adversarial scenarios
Evaluate model behavior across multi-turn conversations, sensitive topics, and edge-case prompts
Identify vulnerabilities that require stronger safety controls, clearer refusals, or improved response quality

Vulnerability Classification & Risk Review

Annotate failures, classify vulnerabilities, and flag recurring safety patterns
Apply taxonomies, benchmarks, and project-specific playbooks to keep testing consistent
Assess misuse cases, bias exploitation, prompt-injection scenarios, and socio-technical risk patterns at a high level
Generate high-quality human evaluation data through careful review and structured judgment

Reproducible Documentation & Evaluation Artifacts

Produce clear reports, datasets, test cases, and written summaries that support model improvement
Document findings reproducibly so results can be reviewed, compared, and acted upon
Explain risks clearly for both technical and non-technical audiences
Maintain accuracy, consistency, and strong attention to detail across submitted evaluations

Ideal Profile

Strong candidates may have:

Native-level fluency in both English and Urdu
Prior experience in AI red teaming, adversarial testing, cybersecurity, trust and safety, socio-technical risk review, or conversational AI evaluation
Ability to think adversarially while staying structured, careful, and methodical
Experience using frameworks, benchmarks, or rubrics rather than unstructured testing alone
Strong written communication skills and ability to explain safety findings clearly
Comfort reviewing text-based content involving sensitive topics under clear guidelines
Adaptability across project types, safety categories, and evaluation workflows

Educational Background

Formal degree requirements may vary based on project needs
Backgrounds in AI safety, cybersecurity, linguistics, policy, trust and safety, social science, psychology, writing, data evaluation, or technical analysis may be highly relevant
Practical experience in red team testing, model evaluation, content risk analysis, or structured review work may also be valuable

Nice to Have

Experience with adversarial ML concepts, jailbreak datasets, prompt injection, RLHF/DPO attack patterns, or model behavior testing
Cybersecurity experience such as penetration testing, exploit analysis, reverse engineering, or security assessment
Socio-technical risk experience involving harassment, misinformation, abuse analysis, bias testing, or conversational AI safety
Creative probing background, including psychology, acting, writing, role-play design, or unconventional adversarial thinking
Experience producing reproducible reports, labeled datasets, structured risk notes, or benchmark-style evaluation artifacts

Why This Opportunity

Apply Urdu-English bilingual expertise to structured AI safety and red team evaluation work
Contribute to stronger, safer, and more reliable AI systems through careful adversarial testing
Work on flexible assignments aligned with language skills, safety judgment, and structured analysis
Build experience in AI safety evaluation driven by human data and in bilingual risk review
Remote structure with competitive hourly compensation

Contract Details

Independent contractor role
Fully remote with flexible scheduling
Eligible professionals may be based in approved project locations depending on project needs
Native-level fluency in both English and Urdu is required for project work
Work is text-based and may involve sensitive topics such as bias, misinformation, harassment, or harmful-behavior risks
Topic areas will be communicated before exposure to content, and participation in higher-sensitivity projects may depend on candidate comfort and project fit
Part-time commitment depending on project availability
Competitive rates between $20 and $30 per hour, depending on expertise and project scope
Weekly payments via Stripe or Wise
Projects may be extended, shortened, or adjusted depending on scope and performance
Work will not involve access to confidential or proprietary information from any employer, client, or institution

About the Platform

This opportunity is available through 24-MAG LLC. We connect experienced professionals with remote consulting opportunities across technical, evaluation, and project-based workstreams.

By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy.