About the role:
Samsara is seeking an experienced senior data engineer to join our Data Engineering team.
Samsara has hundreds of thousands of devices deployed throughout the world and over 20,000 customers using our cloud-based products. This results in a vast amount of data in our central data lake / warehouse.
Our overall goal on the Data Engineering team is to make sure the rest of the company has the correct data sets it needs to do analysis, train models, and build dashboards from our product data efficiently and accurately.
The team is responsible for building data pipelines, primarily in SparkSQL and PySpark, that exist in our data lake. Our data lake is primarily Delta/Parquet tables on S3, which we process through Databricks. This team has access to all of the “raw” data collected throughout our products. Given that we are an IoT company, that’s a lot of data, and it can be hard for the rest of the company to make sense of. This team becomes deeply familiar with the product and our data in order to build the right tables for the rest of the company to use.
The team works closely with the following teams:
- Data Analytics: to build golden data sets that are ready for dashboarding and analytics
- Data Engineers across the company (e.g., Marketing, Sales): on how to best build pipelines and dashboards from our product data
- Data Scientists: on which data sets to use for training and their workflows
Note that there are other data engineering teams throughout Samsara. This team is our product Data Engineering team within R&D, focused on data collected in Samsara products.
In this role, you will:
- Build highly reliable computed tables (including unstructured data like video and audio) combining and transforming data across multiple sources, including Samsara sensor data and customer metadata
- Use Python to access, manipulate, and join external datasets to internal data (e.g., via REST APIs, PySpark)
- Work closely with stakeholders across the company, including product engineers, data scientists, customer support, finance, and more, to build data pipelines that solve business needs
- Champion, role model, and embed Samsara’s cultural principles (Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team) as we scale globally and across new offices
Minimum requirements for this role:
- BA / MS degree in Computer Science, Statistics, or related discipline
- 4+ years of experience on a data engineering-focused team
- Experience standing up ETL pipelines to handle massive volumes of data
- Experience working with Spark-based data platforms
- Strong proficiency in SQL, Python, and working with REST APIs
- Knowledge of software engineering fundamentals; high level of comfort reading and understanding full-stack / backend development code (e.g., our Go code base)
- Familiarity with managing code via Git/GitHub or another version control tool
An ideal candidate also has:
- Some experience with time series data, including late-arriving data
- Experience with product / first-party data
- Familiarity with Databricks and running jobs/notebooks there
- 6+ years of experience on a data engineering-focused team