Data Engineers on the Disney Streaming Machine Learning and Innovation team develop and maintain the systems and datasets used for content recommendation and personalization on Disney Streaming’s suite of streaming video apps, notably Disney+ and Hulu. In this role you will partner with Applied Machine Learning Engineers and Data Scientists to manage and scale the processes that create algorithm input features and datasets. As a member of this team you will collaborate across Engineering and Data teams to identify internal and external data sources and to design and implement ETL strategies, automation frameworks, and scalable data pipelines.
Responsibilities:
Partner with technical and non-technical colleagues to understand algorithm feature and data requirements
Work with Engineering teams to collect required data from internal and external systems
Develop and maintain ETL routines using orchestration tools such as Airflow and Jenkins
Collaborate with machine learning practitioners to design and build data-forward solutions
Deploy scalable streaming and batch data pipelines to support petabyte-scale datasets
Enforce common data design patterns to increase code maintainability
Create ETL architecture designs and conduct reviews
Perform ad hoc data analysis as necessary
Partner with team leads to identify, design, and implement internal process improvements
Drive and maintain a culture of quality, innovation, and experimentation
Work in an Agile environment that focuses on collaboration and teamwork
Basic Qualifications:
3+ years of data engineering experience
Bachelor’s degree in computer science, engineering, math, or statistics, or equivalent relevant experience
Deep knowledge of the Python data ecosystem
Experience in building large datasets and scalable services
Experience deploying and running services in AWS and in engineering big-data solutions using technologies like Databricks, EMR, S3, and Spark
Experience loading and querying cloud-hosted databases such as Redshift and Snowflake
Experience designing and developing backend microservices for large-scale distributed systems using gRPC or REST
Experience with large-scale distributed data processing systems and cloud infrastructure such as AWS or GCP and container systems such as Docker or Kubernetes
Excellent communication and people engagement skills
Preferred Qualifications:
Knowledge of the Scala and Java data ecosystems
Experience building streaming pipelines using Kafka, Spark, Flink, or Samza
Experience mentoring colleagues on best practices and technical concepts for building large-scale solutions
Additional Information:
Location: New York, NY; San Francisco, CA; or Seattle, WA preferred, but also open to US Remote for the right candidate
#DisneyTech