Netflix: Sr. Software Engineer - Spark Compute Infrastructure, ML Platform

Would you like to manage our Spark compute infrastructure and optimize the ML Spark pipelines that power Netflix recommendations? We think of the Netflix service as hundreds of millions of different products serving uniquely personalized experiences to each of our 200+ million members. One of the teams powering this effort is the ML Platform Data & Feature Infra team, which is responsible for building the scalable and efficient compute infrastructure used to train our personalization ML models.

The Opportunity

In this role, you will have the opportunity to manage the Spark compute infrastructure that is used to train the ML algorithms that power Netflix personalization. You will drive operational excellence through tooling and automation, and you will work closely with ML researchers and engineers to scale their ad hoc explorations and manage production ML pipelines. This role will allow you to gain intimate knowledge of Netflix personalization while working for a unique and pioneering company that is redefining how video content is consumed globally.

Here are some examples of the types of things you would work on:

Optimize the ML Spark pipelines for both resource and latency efficiency and help with capacity planning for our compute infrastructure
Increase research productivity by quickly troubleshooting Spark performance issues and any roadblocks in adoption of our compute infrastructure
Build tools and automation to make the infrastructure more robust and to report on cluster cost, utilization and efficiency
Manage a large-scale Spark cluster (several thousand EC2 instances) that powers the ML production pipelines fueling innovation for Recommendations research
Collaborate with our Big Data Platform teams to build, deploy and upgrade our compute infrastructure using the latest and greatest open source libraries

To learn more, here are some talks/blog posts from the team:

Minimum Qualifications

4+ years of relevant experience managing large-scale distributed data systems
Strong automation mindset and a passion for root cause analysis and strategies to mitigate issues
Experience with big data technologies like Spark, Mesos/YARN/Kubernetes, HDFS or Elasticsearch
Experience with performance tuning and debugging scalability issues of Spark applications
Excellent communication and people engagement skills
Expertise in scripting languages
Experience with Cloud Computing platforms like Amazon AWS

Preferred Qualifications

Exposure to functional languages like Scala
Experience working on Notebooks such as Jupyter or Polynote
Experience working on container (Docker) platforms

Netflix is an equal opportunity employer and strives to build diverse teams from all walks of life. We offer a unique culture of freedom and responsibility with a clear long-term view. We recommend reading through these to understand what working at Netflix is like.

