Senior Data Engineer

This job is no longer open
At Scribd (pronounced “scribbed”), we believe reading is more important than ever. Join our cast of characters as we build the world’s largest and most fascinating digital library: giving subscribers access to a growing collection of ebooks, audiobooks, magazines, documents, and more. In addition to works from major publishers and top authors, we also create our own original content exclusively for Scribd users. Our community includes over 1M subscribers in more than 190 countries. Join us in turning screen time into quality time!

What you'll do

Data quality and integrity are the two main areas of focus for your work in our existing, organically grown data infrastructure. You will be in charge of building tools and technology to ensure that downstream customers can have faith in the data they're consuming. Depending on the project, this might involve cross-functional work with the Data Science and Content Engineering teams to repartition or optimize business-critical Hive tables, or working with Core Platform to implement better processing jobs for scaling our consumption of streaming data sets. Almost everything you work on will aim to increase "customer satisfaction" for internal customers of Scribd data.

Required Skills

    • Strong written and verbal communication skills (we're remote!)
    • You have 5+ years of experience in data engineering
    • You have engineered scalable software using big data technologies (e.g. Hadoop, Spark, Hive, Flink, Samza, Storm, Elasticsearch, Druid, Cassandra, etc.)
    • You have experience building data pipelines (real-time or batch) on large complex datasets
    • Fluency with at least one dialect of SQL (MySQL and Hive preferred)
    • Expertise in Scala, Java, or Python

Desired Skills

    • You have worked on and have knowledge of streaming platforms, typically based around Kafka
    • Strong grasp of AWS data platform services and their strengths/weaknesses
    • Strong experience using Jira, Slack, JetBrains IDEs, Git, GitLab, GitHub, Docker, Jenkins, and Terraform
    • Experience using Databricks
