Data Engineer for Vision/Image Datasets (Internship)

This job is no longer open

Here at Hugging Face, we’re on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better.

We have built the fastest-growing open-source library of pre-trained models in the world. It has 100M+ installs and 65K+ stars on GitHub, and more than 10,000 companies use Hugging Face technology in production, including leading AI organizations such as Google, Elastic, Salesforce, Algolia, and Grammarly.

About the Role

As a data engineer for vision datasets, you will work on a 3-6 month project to catalyze progress in computer vision for the open-source and research community.

The project will deal with:

  • analyzing publicly available vision datasets,
  • providing better access to selected datasets within the 🤗 Datasets library,
  • improving vision data pre- and post-processing features within the 🤗 Datasets library,
  • evaluating state-of-the-art computer vision systems on a variety of vision/image datasets.

During your project, you will work closely with the vision community. The goal is to catalyze research in computer vision by making image preprocessing as easy as possible for as many datasets as possible, providing reproducible baselines for state-of-the-art computer vision systems, and empowering the vision community to improve current dataset documentation practices.

About you

You'll love this internship if you are passionate about current trends in computer vision and view sharing your work with the research community as a necessity.

You should be well-versed in Python, have some experience in image preprocessing, and not be (too) afraid to process multiple terabytes of image data on a daily basis. Experience with some tabular data libraries, e.g. Apache Arrow, as well as open-source contributions and the ability to communicate feature requests to a diverse open-source community are a plus! It is advantageous if you are comfortable working remotely as most of our collaborations are conducted in a remote setting.

We encourage students enrolled in university (Ph.D., Master's, or Bachelor's), data scientists, and ML/Data engineers looking for new opportunities to apply for this internship.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where you feel respected and supported—regardless of who you are or where you come from. We believe this is foundational to building a great company and community, as well as the future of machine learning more broadly. Hugging Face is an equal opportunity employer, and we do not discriminate based on race, ethnicity, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or ability status.

We value development. You will work with some of the smartest people in our industry. We are an organization with a bias for impact, and we continuously challenge ourselves to grow. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options. We offer health, dental, and vision benefits for employees and their dependents. We also offer parental leave and unlimited paid time off.

We support our employees wherever they are. While we have office spaces in NYC and Paris, we're very distributed, and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We want our teammates to be shareholders. All employees have company equity as part of their compensation package. If we succeed in becoming a category-defining platform in machine learning and artificial intelligence, everyone enjoys the upside.


Outer Join is the premier job board for remote jobs in data science, analytics, and engineering.