Stripe

San Francisco

1,001-5,000 employees

Help increase the GDP of the internet.

Machine Learning Infrastructure Engineer, ML Platform

ML Engineering

This job is no longer open

Stripe’s mission is to increase the GDP of the internet. To do this, we need to fight fraud at scale and build great software products, which means assembling strong machine learning teams and equipping them with the technologies they need to be effective. Our mission on the Machine Learning Platform team is to make these teams more impactful by providing reliable and flexible infrastructure that enables machine learning at scale.

The Machine Learning Platform team does this by designing and engineering the underlying infrastructure that powers experimentation, training, and serving for Stripe’s key machine learning systems. Our flagship products include Railyard and Diorama. Railyard provides an expressive and powerful interface for model training at scale. Diorama enables model serving in real time with strong reliability and latency guarantees. We work closely with ML engineers, data scientists, and platform infrastructure teams to build powerful, flexible, and user-friendly systems that substantially increase ML velocity across the company.

You will work on:

Building powerful, flexible, and user-friendly infrastructure that powers all of ML at Stripe
Designing and building fast, reliable services for ML model training and serving, and distributing that infrastructure across multiple regions
Creating services and libraries that enable ML engineers at Stripe to seamlessly transition from experimentation to production across Stripe’s systems
Pairing with product teams and ML modeling engineers to develop easy-to-use infrastructure for production ML models

We are looking for:

A strong engineering background and experience with data infrastructure and/or distributed systems
Experience optimizing the end-to-end performance of distributed systems
Experience developing and maintaining distributed systems built with open source tools
Experience with or strong interest in developing ML models

Nice to haves:

Experience with Scala and Python
Experience with Kubernetes
Experience with creating developer tools
Experience with model training and serving in production and at scale
Experience writing and debugging ETL jobs using a distributed data framework (such as Spark, Kafka, or Flink)

It’s not expected that you’ll have deep expertise in every dimension above, but you should be interested in learning about any areas that are less familiar to you.
