Data Platform Engineer

This job is no longer open

About us

GetinData is a Big Data solutions provider that helps organizations process and analyze large amounts of data. The company was founded in 2014 and has assembled a group of experienced and passionate Big Data experts with proven track records. Our team currently consists of more than 80 Big Data experts, and we are still growing!

We mainly work with customers from Sweden, Switzerland and Poland. So far, we have helped 30+ companies, ranging from fast-growing start-ups to large corporations in the banking, pharmacy, telco, FMCG and media sectors. Besides projects, we also deliver practical Big Data trainings.

We also contribute significantly to the Big Data community in Poland by co-organizing the largest technical Big Data conference in Warsaw and the Warsaw Data Tech Talks meetups.

Currently, we are looking for a Data Platform Engineer to join our project.


Customer

Our customer is one of the first banking institutions to process client-focused data on an open-source-based Big Data platform. Thanks to data democratization, 50% of employees have controlled and secure access to information that lets them make data-driven decisions, while reducing data discovery time by 30%!

Due to its structure, the Data Analytics Platform is adaptable to the highly regulated markets of over 40 individual countries.

Every one of us is a client of financial institutions, and we put a lot of trust in them. That's why all platforms deployed by such organizations must not only be effective, but also provide the highest level of data security.

Project

As a Data Platform Engineer, you will play a pivotal role in providing and developing a data ingestion framework and a data discovery service for a Data Analytics Platform based on open-source technologies. You will also prepare requirements and implement Data Management solutions.

  • Service stack migration from on-prem Kubernetes to GCP and its managed services

  • R&D of a data ingestion framework supporting both batch and streaming

  • R&D of a data discovery service, currently based on Amundsen


Responsibilities

  • Development of an advanced analytical platform. We use open-source technologies (including Kafka, Spark, Presto, Airflow, Jupyter), and the platform is built in a portable way so that it can also run on the public cloud (the project includes Kubernetes, Docker, Ceph and the S3 API)

  • Implementation of processes for collecting, analyzing and loading data into the platform

  • Implementation of the pioneering solutions needed to help others work with data and metadata management, e.g. data discovery, data lineage and data profiling. This involves Apache Atlas (ultimately to be replaced) and newer products such as Amundsen from Lyft and Deequ from Amazon, as well as building a platform for running ML experiments with tools such as MLflow or Kubeflow

  • Contributions to open-source projects, and promoting the solution being built at conferences and on blogs

Key technologies and programming languages

Software Engineering:

  • Python - most of the codebase

  • Java/Scala - 10-15% of the codebase

Data Engineering:

  • Airflow
  • Python
  • Spark
  • Amundsen
  • Cassandra
  • Kafka

DevOps:

  • Kubernetes + Helm
  • Azure DevOps

Outer Join is the premier job board for remote jobs in data science, analytics, and engineering.