AI Coding Evaluator | $15/hr Remote

May 9, 2026

Position: Agentic Coding Annotator (Online/Offline Tasks)

Type: Short-Term Contract (5 weeks)

Commitment: 8 hours per day, with 4 hours of overlap with PST working hours

Responsibilities

Perform online evaluations by interacting with models on predefined coding tasks and grading outputs
Conduct offline evaluations by designing realistic coding tasks and defining evaluation criteria
Review and analyze model-generated code by reading, debugging, and validating outputs
Run tests, scripts, and terminal commands to verify correctness of solutions
Write clear, evidence-based rationales for trajectory rankings and assessments
Design task-specific rubrics and ensure consistent evaluation across runs
Identify issues in outputs, environments, or instructions and escalate with supporting evidence
Work with agentic coding tools and evaluation frameworks to assess model performance

Requirements

Several years of experience in software engineering, QA, or similar code-heavy roles
Strong proficiency in at least 1-2 programming languages (Python, JavaScript, Java, C/C++, etc.)
Experience working with Linux/terminal, Git, and development tools
Familiarity with coding-agent tools (e.g., Cursor, Claude Code, OpenCode, or similar)
Ability to read unfamiliar codebases, debug issues, and evaluate correctness
Strong attention to detail and ability to follow structured evaluation processes
Comfortable with repetitive, high-precision evaluation work
Experience with Docker or reproducible environments is a plus
Ability to work independently in a remote environment

Application Process

Apply (or Easy Apply) and check your email for the application form
Fill out the Google form
Complete the assessment via the link sent after shortlisting (candidates can choose between two options and must finish within 24 hours)
