Descript

San Francisco

51-200 employees

Record, transcribe, edit, mix, collaborate, and master your audio and video with Descript.

Research Engineer

ML Engineering

This job is no longer open

Our vision is to build the next-generation platform for fast and easy creation of audio and video content. In October 2020, we took a huge leap forward by launching a cloud-based collaborative video editor and a screen recorder. We have built and shipped key technologies such as voice cloning and one-click speech enhancement to help us realize our vision. We're used by some of the world's top podcasters and influencers, as well as businesses such as BBC, ESPN, HubSpot, Shopify, and The Washington Post, for communicating via video. We've raised $50M from some of the world's best investors, such as Andreessen Horowitz, Redpoint Ventures, and Spark Capital.

We need great people to help us build these cutting-edge technologies and guide their development. In particular, we’re always looking to hire smart research engineers. You will join a team of around a dozen researchers specializing in generative models and deep learning.

Some of our research publications:

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.
Char2Wav: End-to-end Speech Synthesis.
ObamaNet: Photo-realistic lip-sync from text.
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis.
Chunked Autoregressive GAN for Conditional Waveform Synthesis.
Wav2CLIP: Learning Robust Audio Representations From CLIP.

Responsibilities

Read, understand, and replicate recent AI research papers.
Work closely with research scientists to go from problem definition all the way to running and analyzing concrete research experiments.
Collaborate and communicate clearly and efficiently with the rest of the team about the status, results, and challenges of your current tasks.

Challenges

As a member of our research team, you'll play an integral role working through challenges such as:

Using deep learning (including but not limited to NLP, speech processing, computer vision, etc.) to solve problems for media creation and editing.
Creating realistic voice doubles using only a few minutes of audio.
Creating tools to synthesize photo-realistic videos that match our Overdub (personalized speech synthesis) feature.
Designing and developing new algorithms for media synthesis, anomaly detection, speech recognition, speech enhancement, filler word detection, audio and video tagging, etc.
Coming up with new research directions to improve our product.

Requirements

Proven experience in implementing and iterating on deep learning algorithms.
BSc/BEng degree in computer science, mathematics, physics, electrical engineering, machine learning or equivalent (MSc or PhD preferable).
Good programming skills and experience with deep learning frameworks (preferably PyTorch).
Strong knowledge of and experience with Python.
Experience with deep learning is a requirement. Domain-specific knowledge in NLP, speech processing, computer vision, etc. is not a requirement.
Great debugging skills for training novel deep learning models, e.g. resolving NaN issues when training neural networks, and excitement about figuring out their causes.
Curiosity about research papers: how well might a method work out of the box?
Excitement to partner with people to solve novel problems with deep learning.

At least one of the following must be true:

You have implemented a research paper from scratch, e.g. written data loaders, implemented the neural network architecture, and trained the model on some datasets.
You have taken an existing deep learning work, modified it in a novel way, and trained the model on a different dataset.
