Sr. ML Infrastructure Engineer

This job is no longer open

About the role 🎉

We’re looking for a Senior ML Infrastructure Engineer to help us scale the infrastructure and tooling behind the development, testing, and deployment of our machine-learning-based products. The ideal candidate has experience provisioning large compute clusters for machine learning workflows, has experience helping teams establish best practices for reliability and scalability, and thrives in a fast-paced, high-ownership environment.

A peek at our technical stack 🔍

The rich UI of our video editing and collaboration tools is powered by TypeScript and React/Redux, while the real-time compositing and graphics engine behind our interactive preview runs on WebGL2 and WebAssembly. Our video streaming backend components are written in Python, rely heavily on FFmpeg/libav and HLS for on-the-fly transcoding, use PyTorch and TorchScript for ML inference, and are deployed as containerized services on Kubernetes. Our API endpoints for real-time collaboration and media asset management are written in TypeScript on Node.js and are deployed as serverless functions on AWS Lambda.

What you’ll do 🎨

  • Manage large compute clusters for ML training, inference, and development
  • Create tooling and infrastructure that abstract compute and storage in ML development workflows
  • Build automation and CI/CD pipelines for developing and deploying new machine learning models

What you’ll need 💻

  • 3+ years of experience in a DevOps or Infrastructure Engineer role building machine learning infrastructure and working with large GPU clusters
  • Knowledge of cloud providers such as AWS, GCP, or Azure; infrastructure-as-code frameworks such as Terraform; and observability tools such as Grafana
  • Interest in, and experience with, supporting engineering teams in creating robust processes for automation, reliability, and instrumentation
  • Strong communication, collaboration, and documentation skills