Data Infrastructure Engineer, Feature Computation

Stripe has a fantastic set of data, and machine learning is critical for making use of it at scale. The Merchant Intelligence group uses this data to build a deep understanding of the businesses that use Stripe, both to protect Stripe and to optimize our products. Removing barriers to online commerce is at the heart of Stripe’s mission, and doing so requires effectively and efficiently protecting, at scale, the businesses that trust Stripe.

With all this data, we’re looking for infrastructure and data engineers who can help us develop and grow our capabilities in machine learning. You will build the platform, tooling, and pipelines for deep learning, as well as new products and applications powered by ML. Machine learning infrastructure engineers in Merchant Intelligence are responsible for the mission-critical work that allows Stripe to unlock access to economic infrastructure for a huge variety of businesses across the globe.

You will work on:

  • Building our production scoring stack for deep learning models.
  • Creating libraries that enable ML engineers at Stripe to seamlessly transition from experimentation to production across Stripe’s data systems.
  • Owning, augmenting, and evolving central datasets to enable new products powered by ML.
  • Pairing with product teams and ML modeling engineers to develop easy-to-use infrastructure for production ML models.
  • Becoming an expert in TensorFlow, Kubernetes, Spark, and other technologies that make up our production ML stack.

We are looking for:

  • A strong engineering background and experience in machine learning or data infrastructure. You’ll be writing production Scala and Python code.
  • At least 5 years of software engineering experience.
  • Experience with model training and inference in production and at scale.
  • Experience optimizing the end-to-end performance of distributed systems.
  • Experience developing and maintaining distributed systems built with open source tools.
  • Experience writing and debugging ETL jobs using a distributed data framework (Spark, Kafka, Flink).

Nice to have:

  • Experience with Scala and Python
  • Experience with Spark or an equivalent framework
  • Experience with TensorFlow or PyTorch

We don’t expect you to have deep expertise in every dimension above, but you should be interested in learning the areas that are less familiar.

Outer Join is the premier job board for remote jobs in data science, analytics, and engineering.