Please visit here for more information about our Hiring Process.
About the role:
As a data-driven organization, DataCamp runs a data lake on GCP BigQuery, with reports created in Metabase, Power BI, and custom Shiny applications. These reports help the company’s teams and leadership take action on this data. DataCamp’s Airflow cluster runs a thousand tasks each day, from data ingestion pipeline tasks to data processing tasks that provide the data-sets and data-models used by DataCamp’s team of data scientists.
To facilitate data processing, we have a highly automated pipeline built with Terraform and Ansible that provisions the infrastructure for all data engineering tooling. This allows DataCamp’s data scientists and customers to be provided with the latest data-sets, refreshed on a daily basis. Through good documentation and continuous improvement, we want to continue to enhance the data engineering capability at DataCamp.
As part of the cross-functional Infrastructure and Data team, your role will be to work directly with the senior data engineer and the data science team on all data engineering initiatives from the business. You will learn how to create and maintain data pipelines, manage the company-wide shared data resources that support our data architecture, and build upon those internal processes, with the creative freedom to shape the processes and roadmap for data engineering at DataCamp.
The team has a strong bias towards providing self-serve systems for deployment and infrastructure provisioning. Its aim is to support other teams using these services and to make sure the services are available and functional, rather than to be a central bottleneck in the company. Under the guidance of our senior data engineer, you will play a key part in planning future improvements and will own your day-to-day work.
Besides bringing data engineering skills to DataCamp, you will be equally adept at writing Python, understand how to author data models, and have a passion for data science and for data management, governance, and security on the platform (Python, R, SQL, ..). An evolution towards regional deployment models is envisioned and will be pivotal for the growth of DataCamp and its data engineering capability.
The ideal candidate:
It's a plus if: