Scribd

San Francisco

201-500 employees

The best audiobooks, books, and more, available anywhere. Join us on our mission to change the way the world reads.

Senior Data Engineer

Data Engineering

This job is no longer open

At Scribd (pronounced “scribbed”), we believe reading is more important than ever. Join our cast of characters as we build the world’s largest and most fascinating digital library: giving subscribers access to a growing collection of ebooks, audiobooks, magazines, documents, and more. In addition to works from major publishers and top authors, we also create our own original content exclusively for Scribd users. Our community includes over 1M subscribers in more than 190 countries. Join us in turning screen time into quality time!

What you'll do

Data quality and integrity are the two main focus areas of your work within our existing, organically grown data infrastructure. You will be in charge of building tools and technology that ensure downstream customers can trust the data they consume. Depending on the project, this might involve cross-functional work with the Data Science and Content Engineering teams to repartition or optimize business-critical Hive tables, or working with Core Platform to implement better processing jobs that scale our consumption of streaming data sets. Almost everything you work on will aim to increase "customer satisfaction" for internal consumers of Scribd data.

Required Skills

Strong written and verbal communication skills (we're remote!)
You have 5+ years of experience in data engineering
You have engineered scalable software using big data technologies (e.g. Hadoop, Spark, Hive, Flink, Samza, Storm, Elasticsearch, Druid, Cassandra, etc.)
You have experience building data pipelines (real-time or batch) on large, complex datasets
Fluency with at least one dialect of SQL (MySQL and Hive preferred)
Expertise in Scala, Java, or Python

Desired Skills

You have worked with streaming platforms, typically built around Kafka.
Strong grasp of AWS data platform services and their strengths/weaknesses.
Strong experience using Jira, Slack, JetBrains IDEs, Git, GitLab, GitHub, Docker, Jenkins, Terraform.
Experience using Databricks
