ML Research Engineer Internship - Text-To-Speech Reproduction

Here at Hugging Face, we’re on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better.

We have built the fastest-growing open-source library of pre-trained models in the world. The library has 100M+ installs and 65K+ stars on GitHub, and over 10 thousand companies are using HF technology in production, including leading AI organizations such as Google, Elastic, Salesforce, Algolia, and Grammarly.

About the Role

Text-to-speech (TTS) is an area of research that is receiving increasing attention. Breakthroughs in model architectures and training paradigms have led to a surge of next-generation TTS systems [1]. Powerful TTS models enable applications such as voice assistants, personalized voice synthesis, and audiobook narration. However, TTS is largely unexplored in the open-source setting, with open-source systems lagging far behind their proprietary counterparts.

As a Research Engineer for Text-To-Speech Reproduction, you will work on a 3-6 month research project investigating how TTS models can be made more accessible to the open-source community.

You will leverage state-of-the-art techniques to reproduce a performant open-source TTS system. You will be involved in all aspects of reproducing this system, including preparing a large-scale dataset, implementing the model, and training the weights. By sharing your findings, you will help to foster a culture of reproducibility and transparency in the ML community. The model you train and the findings you share will serve as a platform for the future of TTS research.

You will work in the science team, where you'll get to foster one of the most active machine learning communities. You'll interact with Researchers, ML practitioners and Data Scientists on a daily basis, discussing issues related to your research and the wider ML community.

[1] Wang, C., Chen, S., Wu, Y., Zhang, Z., Zhou, L., Liu, S., Chen, Z., Liu, Y., Wang, H., Li, J., He, L., Zhao, S., & Wei, F. (2023). Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. ArXiv, abs/2301.02111.

About you

If you love open-source, are passionate about making machine learning technology more accessible, and view sharing your work with the research community as a necessity, then we can't wait to see your application!

You should be well-versed in PyTorch or an equivalent major deep learning framework and have some experience with open-source ML libraries. Experience working in audio is not required and we encourage candidates from all fields of ML to apply, but you should have experience running research experiments.

We encourage students enrolled in university (Master's, Ph.D.) and ML/Research Engineers looking for new opportunities to apply for this internship.

If you're interested in joining us but don't tick every box above, we still encourage you to apply! We're building a diverse team whose skills, experiences, and backgrounds complement one another. We're happy to consider where you might be able to make the biggest impact.

Preferred Location

Ideally, you are based in Paris, but we are open to remote work for the right candidate.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where people feel respected and supported, regardless of who they are or where they come from. We believe this is foundational to building a great company and community. Hugging Face is an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

We value development. You will work with some of the smartest people in our industry. We are an organization that has a bias for impact and is always challenging itself to grow. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options. We support our employees wherever they are. While we have office spaces around the world, especially in the US, Canada, and Europe, we're very distributed and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We support the community. We believe significant scientific advancements are the result of collaboration across the field. Join a community supporting the ML/AI community.
