Data Curator and Annotator
2+ Years
Data Curation, Dataset Labeling, Annotation Tools (Label Studio, Prodigy), Quality Control, NLP/Search Data, Privacy & Bias Mitigation
Full Time
Remote
Job Description
The Data Curator and Annotator will curate, label, and maintain high-quality datasets that support ML training, retrieval-augmented generation (RAG) pipelines, and model evaluation. This role requires precise, consistent annotation and the ability to establish reliable guidelines and workflows. The ideal candidate will collaborate closely with engineers and data scientists to ensure datasets are accurate, secure, and aligned with business and research needs.
Responsibilities
- Curate and label datasets for ML training and evaluation.
- Define annotation guidelines and quality control processes.
- Develop efficient labeling workflows with quality gates.
- Ensure datasets meet privacy and security requirements, and mitigate bias in them.
- Collaborate with engineers and data scientists to improve data utility.
- Build reliable evaluation datasets for search ranking and RAG tasks.
Requirements
- Experience labeling or curating datasets for NLP or search.
- Familiarity with annotation tools such as Label Studio or Prodigy.
- Strong attention to detail and commitment to labeling consistency.
- Comfort working with domain-specific enterprise data.
- Experience with quality assurance (QA) processes for annotation.
- Strong written communication skills for writing clear annotation guidelines.
- Respect for privacy, security, and ethical data principles.
Nice to Haves
- Domain knowledge in BFSI (banking, financial services, and insurance), retail, or healthcare.
- Experience creating evaluation datasets for LLMs.
- Multilingual annotation experience.
- Comfort with basic Python scripting.