Impact the moment
Would you like to work in a collaborative environment where products are developed through pair programming, mentoring and knowledge transfer sessions? Our engineering teams are working in an ecosystem where you can grow your career in a way that fits into your life while developing new skills every day.
We are looking for a Data Engineer who knows how to fully exploit the potential of our Spark cluster. You will clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data to our feature developers and business analysts. This involves both ad-hoc requests and data pipelines that are embedded in our production environment.
What can you expect from the position?
· Creating Python/Spark jobs for data transformation and aggregation
· Producing unit tests for Spark transformations and helper methods
· Writing pydoc-style documentation for all code
· Designing data processing pipelines
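To give a flavor of the first three responsibilities (a transformation/aggregation job, a unit test for it, and pydoc-style documentation), here is a minimal sketch. The function name and data shape are invented for illustration; a production job would express the same aggregation with PySpark's `groupBy`/`agg` against the cluster rather than plain Python.

```python
from collections import defaultdict

def total_by_key(rows):
    """Aggregate (key, amount) pairs into per-key totals.

    Pydoc-style documentation, as the posting calls for.

    Arguments:
    rows -- an iterable of (key, amount) tuples

    Returns a dict mapping each key to the sum of its amounts.
    The equivalent PySpark transformation would be roughly:
        df.groupBy("key").agg(F.sum("amount").alias("total"))
    """
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)

# A unit test for the transformation, per the second bullet.
def test_total_by_key():
    rows = [("a", 1.0), ("b", 2.5), ("a", 3.0)]
    assert total_by_key(rows) == {"a": 4.0, "b": 2.5}
```

Keeping the core logic in a pure function like this is a common pattern for Spark codebases: it lets the transformation be unit tested without spinning up a cluster.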
What you’ll need to be successful:
· Proficiency with Apache Spark 2.x and a deep understanding of distributed systems (e.g., partitioning and replication)
· Experience with the MySQL database platform and programming in Spark SQL and PySpark
· Hands-on experience with SQL, Spark query tuning, and performance optimization
· Experience with Python, writing Apache Airflow DAGs, AWS services, data warehouse technologies, Docker, and Kubernetes preferred
· Familiarity with Agile development methodologies
· PHP, Java, and Scala desirable but not required
As an education innovation company, we’re proud to play our part by inspiring learners around the world. If you bring your curiosity, we’ll help you grow in a collaborative environment where everyone shares a passion for success.
Are you ready for a new challenge? Apply for a career at McGraw Hill and together, we’ll impact the world.