Data Engineer II

This job is no longer open

New Visions works to make the public education system in New York City a place where students from every background can graduate high school and successfully transition into their post-secondary future. The School Systems and Data Analytics Department aims to accomplish this mission by creating a comprehensive data management ecosystem to support school and district users in making decisions that maximize their students’ likelihood of graduating and succeeding beyond high school. We currently support over 900,000 students in nearly 1,600 schools, working with teachers and administrators to help students progress towards graduation and beyond into post-secondary success.

The School Systems and Data Analytics department is at the core of supporting New Visions staff and schools in translating data into action. Several times a day, we process data from internal and external source systems into New Visions databases and live tools, providing staff with actionable, timely, and accessible information to make data-informed decisions.

The Data Engineer II plays a crucial role within the unit and the organization, working closely with the data team and portal team to develop and maintain a robust data model, monitor and troubleshoot core data processing pipelines, and manage New Visions’ data tools ecosystem to deliver the right data quickly to key stakeholders. The Data Engineer II is primarily responsible for building and maintaining the infrastructure used to operate and scale the data platforms that the organization supports.

Who You Are

You are excited about public service and the prospect of solving problems that are challenging and affect urban schools everywhere.

You are detail-oriented and enjoy organizing data in a way that will facilitate the work of team members.

You care about creating a data model that is optimized for performance and quality.

You enjoy working with analysts, designers and product managers to determine the best way to grow our data model to accommodate new features and tools.

You love working in teams to solve complex challenges. You thrive in a fast-paced, highly collaborative environment.

What You’ll Do

  • Develop, monitor, and improve the NV data model and its pipelines
  • Use a combination of R, Python, SQL and other tools to manipulate and transfer data
  • Create cross-sectional and longitudinal data sets from raw data files
  • Collaborate with software engineers and product managers to create schemas in MongoDB and deliver data that fulfills feature requests
  • Create systems for assuring data quality and accuracy
  • Create alerts and process-monitoring tools to understand the flow of data
  • Ensure consistency between data analyses, Google-based tools, Tableau dashboards, and the NV portal
  • Manage infrastructure for processing large data sets for use in NV data tools and the NV data portal
  • Maintain repository of R scripts for automated overnight data processing and incorporate new data streams as needed
  • Research, test, and integrate methods to streamline the NV data model
  • Support the integration of additional data sources into New Visions data warehouse and data tools
  • Support the operationalizing of robust data quality assurance within data infrastructure
  • Monitor and evaluate core data processing efforts to identify areas for improvement as well as troubleshoot technical issues
  • Collaborate with internal colleagues to support current and forthcoming features for NV tools and the NV data portal
  • Collaborate with product managers, designers, analysts and software engineers to meet product specifications
  • Communicate best practices in data engineering to the respective teams
  • Deliver data that meets product specifications to the relevant teams
  • Provide data-processing metrics that highlight areas for growth in core data processing efforts
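In miniature, the quality-assurance responsibilities above (assuring data quality, alerting when a load looks wrong) might resemble the following Python sketch. The names here (`QualityReport`, `run_quality_checks`, the column names) are illustrative assumptions, not New Visions code:

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    """Outcome of a batch of checks on one dataset (illustrative)."""
    row_count: int
    failures: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures

def run_quality_checks(rows, required_columns, max_null_rate=0.05):
    """Flag missing columns and columns whose null/blank rate exceeds a threshold."""
    report = QualityReport(row_count=len(rows))
    if not rows:
        report.failures.append("dataset is empty")
        return report
    for col in required_columns:
        if col not in rows[0]:
            report.failures.append(f"missing column: {col}")
            continue
        nulls = sum(1 for r in rows if r.get(col) in (None, ""))
        rate = nulls / len(rows)
        if rate > max_null_rate:
            report.failures.append(
                f"{col}: null rate {rate:.0%} exceeds {max_null_rate:.0%}"
            )
    return report
```

A report whose `passed` flag is false could then feed the alerting and process-monitoring tools mentioned above, rather than letting a bad extract reach dashboards or the portal.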

Required Knowledge and Skills

  • A minimum of 4 years of experience in data engineering, software engineering, or advanced data analytics
  • Proficiency in R and/or Python required
  • Proficiency with SQL databases (Redshift, Postgres, etc.)
  • Proficiency with NoSQL databases (MongoDB, Cassandra, etc.)
  • Proficiency with ETL development
  • Expertise managing data pipelines to support continuing increases in data volume and complexity
  • Demonstrated ability to manage Airflow instances and develop best practices for scheduling jobs
  • Demonstrated ability in SQL, as well as common practices in schema design and data storage
  • Exceptional strategic, analytical, and critical thinking skills
  • Strong project management and organizational skills
  • Excellent written and oral communication skills
  • Close attention to detail
  • Demonstrated ability to prioritize, multi-task, work under pressure and meet deadlines
  • Demonstrated persistence and independence in learning technical subject matter and in solving technical problems
  • Demonstrated ability to identify problems and suggest solutions for discussion
  • Demonstrated ability to identify problems and lead improvements to existing codebases and processes
  • Knowledge of public education data in New York State

Desired Knowledge and Skills

  • Expertise in Git and GitHub - including branching, merging, diffs, and hotfixes
  • Expertise in Python, R and SQL
  • Experience building scalable solutions with AWS Redshift, MongoDB, and PostgreSQL
  • Experience using big data technologies (Hadoop, Spark, etc.)
  • Experience building data pipeline tools to manage data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it

Our Technology Stack

  • Data/Database Layer 
    • AWS (Redshift, S3, RDS), MongoDB 
  • Code 
    • Linux, R and Python
  • Orchestration Layer
    • Apache Airflow, Docker, AWS ECS
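The orchestration layer above could be sketched, assuming an Airflow 2.x deployment, as a minimal DAG for the overnight processing the role describes. The `dag_id`, task names, and the `extract`/`load` callables are hypothetical placeholders, not New Visions’ actual pipeline:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # placeholder: e.g., pull nightly source files into S3

def load():
    ...  # placeholder: e.g., load the extracted files into Redshift

with DAG(
    dag_id="nightly_student_data",       # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",          # overnight batch cadence
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task            # load runs only after extract succeeds
```

In practice each task would run in Docker containers on AWS ECS, per the stack above, with retries and alerting configured on the DAG.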