DESCRIPTION
Job summary
The AI Data Team in Amazon Web Services (AWS) is looking for a Senior Language Engineer ready to dive deep and think big about developing solutions for natural language data collections. This position is an opportunity to apply your expertise in a challenging but supportive environment. The position may be remote, or located in Santa Clara, Seattle, or New York City.
The mission of the AI Data Team is to engineer the datasets critical to the success of AWS’s machine learning services. From chatbots to subtitles to search results and beyond, these products support dozens of languages and impact millions of people every day. We are a group of language engineers, linguists, data scientists, data engineers, and program managers, and we partner closely with the science, engineering, and product teams. We are customer obsessed and committed to delivering results with the highest quality and integrity.
As a Senior Language Engineer, you will start by learning the full context of critical projects for Transcribe spoken language understanding (SLU) services. You will consult with stakeholders in science, engineering, and product teams to understand the role data plays in developing models that meet customer needs. You will lead the data collection and annotation strategy for multiple key projects, with a focus on iterative analysis and course correction. You will use your hands-on data analytics skills and up-to-date knowledge of machine learning (ML) techniques when collaborating with scientists to ensure that datasets are optimized for model performance. You will raise issues regarding data availability and the level of effort required for data collection, and propose solutions to overcome potential obstacles.
You will then expand your scope to plan and implement high-impact initiatives. You will gain an understanding of the language engineering needs across the various programs supported by the AI Data Team and coordinate with your colleagues to identify opportunities for innovation. You will use data-driven reasoning to quantify the benefits to AI/ML programs and to secure the necessary resources.
You will also function as a technical expert in data-centric AI, staying up to date on developments in the field and sharing your knowledge with colleagues across AWS. You will experiment with new techniques in data collection and annotation. You will collaborate with science teams to develop more effective workflows and influence the roadmap for tooling to support data collection processes.
Key job responsibilities
- Design strategy and lead projects to engineer the natural language datasets needed to train and test machine learning models.
- Innovate on data collection methodologies, guidelines, and quality metrics to support new requests.
- Contribute linguistic expertise to cross-functional teams in designing new solutions.
- Write proposals and drive initiatives to advance the AI Data Team’s ability to deliver high-quality data efficiently.
- Provide thought leadership on state-of-the-art data collection techniques through experimentation and knowledge sharing.
BASIC QUALIFICATIONS
- PhD in Computational Linguistics or Linguistics with a strong quantitative component (or similar field), or equivalent experience
- 5+ years of industry experience developing natural language processing products
- Proficiency in scripting and analytics tools such as Python or R
- Experience leading large-scale and innovative data collection projects, including annotation workflows and data quality assessments
- Practical knowledge of speech processing techniques for segmenting, labelling, and analyzing speech
- Understanding of the ML model development process
- Ability to perform exploratory data analysis (EDA) and data quality assessments
- Strong written and verbal communication skills, with an ability to present complex technical information in a clear and concise manner to a variety of audiences
PREFERRED QUALIFICATIONS
- Experience with programmatic approaches to annotation, including weak supervision and active learning
- Experience designing collection and annotation interfaces to improve accuracy and efficiency
- Experience working with heterogeneous language data such as multilingual or multimodal data
The pay range for this position in Colorado is $93,400.00 to $160,000.00 per year; however, base pay offered may vary depending on job-related knowledge, skills, and experience. A sign-on bonus and restricted stock units may be provided as part of the compensation package, in addition to a full range of medical, financial, and/or other benefits, dependent on the position offered. This information is provided per the Colorado Equal Pay Act. Base pay information is based on market location. Applicants should apply via Amazon's internal or external careers site.
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.
Pursuant to the Los Angeles Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Workers in New York City who perform in-person work or interact with the public in the course of business must show proof they have been fully vaccinated against COVID or request and receive approval for a reasonable accommodation, including medical or religious accommodation.