Data Engineer

This job is no longer open

At Overstory, we process gigabytes of geospatial data every day and we are looking for a Data Engineer to help us with that. We have built a robust cloud-based data-processing system and now we need to scale it up as our business grows.

You will help us develop our ETL pipelines and services. As part of our Data Platform team, you will work on systems that process satellite imagery and other data and make it available to our data science team. You will develop systems that download, process, store, and analyze geographical data. You will work with our current data sources and explore new ones. You will support our ongoing efforts to standardize and version our datasets and models. You’ll build prototypes to try things out quickly, and robust data pipelines when it’s time to deploy to production.

About this role

You are creative and curious. Your work is thorough, and you pay attention to both the small details and the larger picture. You can work flexibly, supporting the team in building prototypes quickly and evolving them into scalable, production-quality systems. Teamwork is at your core, and you like to help others grow and succeed. You feel the urgency of solving our climate crisis.

About you

  • At least 4 years of experience scoping, designing, and implementing production-grade data pipelines and processes to meet commercial requirements.
  • Extensive working knowledge of Python, SQL, PostgreSQL (preferably with PostGIS), and Git.
  • Experience working in a collaborative software development environment with the following tools and practices: version control (Git), code reviews, and release life-cycle management.
  • Experience working with satellite imagery or another remote sensing domain.
  • Experience working with cloud computing services (GCP, AWS, or Azure) and with deploying and monitoring large-scale parallel and distributed computing workloads.
  • Excellent oral and written communication skills.

Nice-to-haves

  • Familiarity with the Pangeo stack (JupyterHub, Dask, Xarray, and the Zarr and HDF5 file formats) and/or other Python libraries for satellite imagery manipulation.
  • Experience with GDAL, rasterio, and raster operations.
  • Working knowledge of a GIS software package such as QGIS or ArcGIS.
  • Experience setting up and maintaining a data lake or data warehouse for multiple dataset types.
  • Knowledge of geospatial data indexing and storage standards such as STAC (SpatioTemporal Asset Catalog).
