Data Engineer for Speech/Audio Datasets (Internship)

Here at Hugging Face, we're on a journey to advance and democratize good ML for everyone. Along the way, we contribute to the development of technology that is informed by human values and sociotechnical context.

About the Role

As a data engineer for speech/audio datasets, you will work on a 3- to 6-month project to catalyze progress in speech recognition for the open-source and research community.

The project will deal with:

  • analyzing publicly available audio datasets,
  • providing better access to selected datasets within the 🤗 Datasets library,
  • improving audio data pre- and post-processing features within the 🤗 Datasets library,
  • evaluating state-of-the-art speech recognition systems on a variety of speech datasets.

During the project, you will work closely with the audio and speech community. The goal is to catalyze speech recognition research in three ways: by making audio preprocessing as easy as possible for as many datasets as possible, by providing reproducible baselines for state-of-the-art speech recognition systems, and by empowering the speech and audio community to improve current dataset documentation practices.

About you

You'll love this internship if you are passionate about current trends in speech recognition and view sharing your work with the research community as a necessity.

You should be well-versed in Python, have some experience with audio/speech preprocessing, and not be (too) afraid to process multiple terabytes of audio data on a daily basis. Experience with tabular data libraries such as Apache Arrow, a record of open-source contributions, and the ability to communicate feature requests to a diverse open-source community are a plus! Being comfortable working remotely is an advantage, as most of our collaborations take place remotely.

We encourage university students (Ph.D., Master's, or Bachelor's), data scientists, and ML/data engineers looking for new opportunities to apply for this internship.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where you feel respected and supported—regardless of who you are or where you come from. We believe this is foundational to building a great company and community, as well as the future of machine learning more broadly. Hugging Face is an equal opportunity employer, and we do not discriminate based on race, ethnicity, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or ability status.

We value development. You will work with some of the smartest people in our industry. We are an organization that has a bias for impact and is always challenging ourselves to grow continuously. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options as well as unlimited PTO. We offer health, dental, and vision benefits for employees and their dependents. We also offer 12 weeks of parental leave (20 for birthing mothers).

We support our employees wherever they are. While we have office spaces in NYC and Paris, we're very distributed, and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We want our teammates to be shareholders. All employees have company equity as part of their compensation package. If we succeed in becoming a category-defining platform in machine learning and artificial intelligence, everyone enjoys the upside.
