Data Engineer

This job is no longer open
Labelbox’s mission is to build the best products for humans to advance artificial intelligence. Real breakthroughs in AI rely on the quality of the training data. Our training data platform enables organizations to improve their machine learning models far more quickly and accurately. We are determined to build software that is more open, easier to use, and singularly focused on getting our customers to performant ML faster.

Current Labelbox customers are transforming industries within insurance, retail, manufacturing/robotics, healthcare, and beyond. Our platform is used by Fortune 500 enterprises including Allstate, Black + Decker, Bayer, Warner Brothers and leading AI-focused companies including FLIR Systems and Caption Health. We are backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures (Google's AI-focused fund), Databricks Ventures, Snowpoint Ventures and Kleiner Perkins.

About the Role

Labelbox is hiring a Data Engineer to build new data pipelines and scale existing ones. As our company grows, this person will build data infrastructure that brings together tech, product, and operational functions and informs strategic decision-making at the executive level. You will be responsible for transforming raw data in the data warehouse into clean, reliable, organized data models that allow our organization to make informed, data-driven decisions. Our tech stack currently consists of BigQuery, dbt, and Looker, along with other tools to replicate all of our data to our data warehouse.
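
To give a flavor of the day-to-day work, here is a minimal sketch of the kind of dbt model this role would own, assuming a raw application events table already declared as a dbt source; the model, source, and column names are purely illustrative.

    -- models/staging/stg_events.sql (illustrative sketch; names are hypothetical)
    -- Cleans and types a raw events table into a staging model.
    {{ config(materialized='view') }}

    select
        cast(event_id as string)      as event_id,
        cast(user_id as string)       as user_id,
        lower(trim(event_type))       as event_type,
        cast(created_at as timestamp) as created_at
    from {{ source('raw_app', 'events') }}
    where event_id is not null

Looker explores and downstream marts would then build on models like this rather than on raw replicated tables.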


What You'll Do

    • Develop and optimize large-scale batch and real-time data pipelines that ingest structured and unstructured data from a variety of sources using a combination of dbt, Fivetran, and other tools
    • Build, rebuild, and performance-tune data transformation tasks within the central data store
    • Take over and scale our dbt and Looker setup
    • Manage incoming data requests and prioritize the highest value projects in an organized fashion
    • Communicate data-backed findings to a diverse constituency of internal and external stakeholders
    • Help create best practices and standards for data modeling, documentation, and testing
    • Design and implement operationally excellent data interfaces with a high degree of autonomy
    • Rigorously design data warehouse schemas to allow for performant access to digestible datasets
    • Become the analytics infrastructure and tooling expert, supporting business-focused pipelines and data interfaces
    • Own data modeling, data warehouse management, and data orchestration

About You

    • Expert-level SQL skills
    • Experience designing and developing data warehouse and analytics solutions, using techniques such as clustering and partitioning on tables with over 1B rows (see the sketch after this list)
    • Understanding of data architecture design, data modeling, and physical database design and tuning
    • Hands-on experience implementing cloud data warehouses using BigQuery, Postgres, and MySQL
    • Experience using dbt
    • Knowledge of data visualization tools such as Looker
    • Hands-on coding experience in Python
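
To make the clustering and partitioning point above concrete, the sketch below shows one way a large BigQuery fact table might be laid out; the dataset, table, and column names are hypothetical.

    -- Illustrative BigQuery DDL; dataset, table, and column names are hypothetical.
    -- Date partitioning plus clustering limits queries on billion-row tables to
    -- the partitions and blocks they actually need to scan.
    create table if not exists analytics.fct_events
    partition by date(created_at)
    cluster by customer_id, event_type
    as
    select * from analytics.stg_events;

In a dbt-managed warehouse, the same layout would typically be expressed through the model's partition_by and cluster_by configuration rather than hand-written DDL.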

Technology You’ll Use

    • BigQuery/GCS, MySQL, Postgres
    • dbt
    • Fivetran
    • Looker
    • GitHub
    • Jira

Do great work. From anywhere.

We hire great people regardless of where they live. Work wherever you’d like, as reliable internet access is our only requirement. We communicate asynchronously, work autonomously, and take ownership of our work.

#LI-Remote