Data Engineer

This job is no longer open
We are looking for a Data Engineer to join our team! As the first dedicated Data Engineer in our growing data science team, you will support the development of cutting-edge machine learning models by transforming data and creating ingress pipelines as well as developing, constructing, and testing cloud-based data architectures. 

Collaborating with both data scientists and application-side data engineers, you will optimize the databases you work with using your knowledge of data warehouse solutions, data modeling, and ETL. Although not a data scientist, you’ll be fully integrated into our data science team and the modeling process, serving as an internal resource for all things data and advising on data quality issues, feature creation, and deployment best practices.

Responsibilities:

    • Build and maintain data systems and pipelines to transform legacy data structures distributed across dozens of cloud databases, on-prem databases, and a data warehouse into clean, well-organized, and documented data structures and data features ready for use by data scientists.
    • Develop and maintain the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources, primarily using SQL Server, Databricks, and Azure technologies.
    • Use Python and SQL to assist with data cleaning, data profiling, and data exploration.

Qualifications:

    • 3+ years in a data engineer role supporting a data science team, preferably in a Software as a Service (SaaS) company or other environments with web-scale data (billions of rows) and cloud model deployments.
    • Prior programming experience in Python and SQL.
    • Experience with Apache Spark, Hadoop, or other parallel processing technologies preferred.
    • Experience with data visualization libraries or tools preferred.
    • Strong understanding of data transformation techniques and cloud data platforms (preferably Azure Data Factory and Azure ML).
    • High-level familiarity with machine learning techniques and data science libraries (Pandas, Scikit-learn, Keras, TensorFlow, etc.).
    • Ability to thrive in a fast-paced remote-first startup environment.
    • Proactive, self-directed work ethic; you excel at identifying opportunities to add value and contribute to the team's overall success.

Outer Join is the premier job board for remote jobs in data science, analytics, and engineering.