Senior Data Engineer

PostHog is an open-source product analytics platform. We provide product-led teams with everything they need to understand user behaviour, including funnels, session recordings, user paths, multivariate testing and more. PostHog can be deployed to the cloud, or self-hosted on existing infrastructure, removing the need to send data externally.

We started PostHog as part of Y Combinator's W20 cohort and had the most successful B2B software launch on Hacker News since 2012 - with a product that was just 4 weeks old. Since then, we've raised $27m from some of the world's top investors, grown the team to over 30, and shown strong product-led growth.

We’re now looking for a Senior Data Engineer to join our Ingestion team. We have a community of over 20,000 developers using PostHog, mostly on the open-source product, plus a 1,000+ member Slack community and over 7,000 GitHub stars. And all of these numbers are going up, fast.

We hire globally, but are currently restricted to time zones between GMT-5 and GMT+2.

What you’ll be doing:

We are looking for someone to take our ingestion pipeline to the next level. You will be working with our super talented Ingestion small team to iteratively build out and shore up the functionality of our ingestion pipeline. A good chunk of this work will focus on our Plugins service, the core of our data ingestion pipeline. It is responsible for transforming, augmenting, routing, and backfilling data to many different final destinations, including ClickHouse, the warehouse that powers PostHog.

If in your spare time you love reading _Designing Data-Intensive Applications_ and dream about producing and consuming large amounts of data from Kafka, then this is the spot for you!
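
To make the shape of that pipeline concrete, here is a minimal, purely illustrative sketch of the consume → transform → load loop described above. It is not PostHog's actual plugin server: the topic, table, and broker names are hypothetical, and it assumes the kafka-python and clickhouse-driver packages.

```python
# Illustrative sketch only - NOT PostHog's plugin server.
# Assumes kafka-python and clickhouse-driver; names below are hypothetical.
import json

from clickhouse_driver import Client
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events_ingestion",                       # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    group_id="ingestion-sketch",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
clickhouse = Client(host="localhost")

for message in consumer:
    event = message.value
    # "Transform and augment": normalise the event name and stamp on metadata
    # that downstream queries might need.
    event["event"] = event.get("event", "").strip().lower()
    event["ingested_via"] = "sketch-consumer"
    # "Route": load the enriched event into the warehouse powering queries.
    clickhouse.execute(
        "INSERT INTO events (uuid, event, properties) VALUES",
        [(event["uuid"], event["event"], json.dumps(event.get("properties", {})))],
    )
```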

If you'd like to see for yourself exactly the kind of projects you would be working on, check these out:

  • Reworking our events schema in order to reduce joins at query time (see the sketch after this list)
  • Performing migrations on TBs of data (in someone else's datacenter) with zero downtime!
  • You can read more about that last one (Async Migrations) here
  • Plugin Server source code (light reading 🍿 )
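
On the first item above (reducing query-time joins): a common approach is to denormalise person data onto each event row at ingestion time, so analytical queries read a single wide table instead of joining events against persons. The sketch below is purely illustrative and is not PostHog's actual schema; the table, columns, and queries are hypothetical and assume a local ClickHouse instance with the clickhouse-driver package.

```python
# Illustrative only: a denormalised events table that avoids a query-time join.
# Not PostHog's real schema; names and types are hypothetical.
from clickhouse_driver import Client

client = Client(host="localhost")

# Person data is copied onto each event row at ingestion time, trading some
# storage and write-time work for much cheaper reads.
client.execute("""
    CREATE TABLE IF NOT EXISTS events_denormalized (
        uuid UUID,
        team_id UInt64,
        event String,
        timestamp DateTime64(6),
        properties String,          -- event properties as JSON
        person_id UUID,             -- resolved during ingestion
        person_properties String    -- person properties copied onto the event
    )
    ENGINE = MergeTree()
    ORDER BY (team_id, toDate(timestamp), event)
""")

# "Weekly pageviews from users on the Pro plan" now scans one table instead of
# joining events against a persons table.
rows = client.execute("""
    SELECT toStartOfWeek(timestamp) AS week, count() AS pageviews
    FROM events_denormalized
    WHERE event = '$pageview'
      AND JSONExtractString(person_properties, 'plan') = 'pro'
    GROUP BY week
    ORDER BY week
""")
print(rows)
```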

What we value:

  • We are open source - building a huge community around a free-for-life product is key to our strategy.
  • We aim to become _the_ most transparent company, ever. In order to enable teams to make great decisions, we share as much information as we can. In our public handbook _everyone_ can read about our roadmap, how we pay people, what our strategy is, and who we have raised money from. We also have regular team-wide feedback sessions, where we share honest feedback with each other.
  • We’re an all-remote company and writing things down is hugely important to us. We use asynchronous communication to reduce time spent in meetings. We are structured for speed and autonomy - we are all about acting fast, innovating and iterating.
  • We are a global remote working company, which allows us to hire amazing people from all over the world and foster an inclusive culture.

What we're looking for:

  • 7+ years of experience designing or operating large-scale real-time or near-real-time data pipelines
  • Operational knowledge and experience with Kafka at scale
  • Solid backend engineering skills
  • Experience working with relational databases and ability to write SQL
  • Knowledge about distributed systems and event streaming
  • Ability to write Python and/or JavaScript

Nice-to-haves (if you don't have any of these, you should still apply!):

  • Experience deploying real-time or near-real-time data pipelines to Kubernetes (K8s) environments
  • Knowledge of stateful streaming computation engines like Apache Flink and/or Samza
  • Experience working with and operating a Data Lake / Lake House / Delta Lake at scale
  • Experience being a hands-on user of product analytics is always a huge advantage
  • Experience operating or being a user of ClickHouse or another OLAP database / data warehouse
  • You enjoy geeking out about serialization formats and their trade-offs
  • Experience writing highly performant production data pipelines
  • Node.js experience

What we offer in return:

  • 💰 Generous, transparent compensation and employee-friendly equity in PostHog
  • 🌴 Unlimited time off with a 25-day minimum (in 2021 the team took an average of 32 days off)
  • 🏥 Private medical insurance, including dental and vision (US and UK only)
  • 👵 👴 Pension/401k contributions (4% matching)
  • 🍼 Generous parental, bereavement and child loss leave
  • 📕 Training budget and free books
  • ☕ $200/month budget towards co-working or café working and $250/month for team socials
  • 🧠 Spill mental health chat
  • 🤝 $100/month budget to provide support to open-source projects
  • 💸 We'll be your first investor
  • 🛫 Regular team off-sites (we went to Iceland in March), with carbon offsetting for work travel via Project Wren

We believe people from diverse backgrounds, with different identities and experiences, make our product and our company better. That's why we've dedicated a page in our handbook to diversity and inclusion. No matter your background, we'd love to hear from you!

Also, if you have a disability, please let us know if there's any way we can make the interview process better for you - we're happy to accommodate!
