Data Engineer


Harvard Medical School


Position Description

The Center for Computational Biomedicine (CCB) is a new center within the Blavatnik Institute at Harvard Medical School. Our mission is to provide cutting-edge computational capabilities, data analysis, and data integration technologies to support medical and biological research within the Medical School. Based at the Harvard Medical School Longwood Campus, we are part of a vibrant community of scientists, physicians, and engineers whose goal is to advance the boundaries of knowledge and improve patient care. The working environment combines the best features of a startup (fast pace, flexibility, flat hierarchies) with those of one of the leading medical schools (excellent benefits, outstanding opportunities for learning, great resources, name recognition).

CCB is looking for an individual to join the Data and Analytic Platforms Group, a group of engineers and scientists developing data warehousing and analytic solutions in support of epidemiology, healthcare economics, machine learning, and basic science research.

The Group works to reduce the burden on faculty by developing centrally managed and shareable data solutions to be used across research silos. We curate very large public and private healthcare utilization (insurance claims, electronic health record), multi-omics, environmental exposure, and social determinants data sets, provision access to those curated data sets, and develop analytic frameworks to accelerate reproducible academic research on top of them. Collectively these data sets contain information relating to hundreds of millions of patients.

This position reports to the Director of the CCB Data and Analytic Platforms Group. Primary responsibilities will include designing and implementing relational database architecture (schema, indexing, stored procedures, ETL processes, etc.) to warehouse multi-terabyte data sets in Microsoft SQL Server. This will include periodically evaluating various query performance metrics to ensure real-time availability to the research community and recommending modifications to the underlying database platform to resolve any identified issues. The bulk of this design work will be left up to the candidate, while a small portion will involve refactoring (or strategically deciding to abandon) existing ETL / indexing strategies. The data sets will be staged into a combination of proprietary schemas and the open-source i2b2 data model.

Additional opportunities will be available for the candidate to interact with individual scientific research teams to help improve their workflows.

Basic Qualifications

  • Minimum of seven years’ post-secondary education or relevant work experience

Additional Qualifications and Skills

  • Bachelor’s Degree in Computer Science or related degree preferred. At least 5 years’ experience as a software systems architect, including experience developing solutions with both relational database systems and at least one of the following languages: Java, Python, R.
  • Master’s Degree in a related field (Computer Science / Electrical Engineering, Bioinformatics, Statistics, Data Science, etc.) preferred.
  • Excellent communication skills, both written and oral
  • Experience with Microsoft SQL Server or cloud-based data warehousing technologies
  • Experience designing and maintaining multi-terabyte analytic relational databases, including index and query optimization
  • Experience orchestrating and optimizing Extract-Transform-Load (ETL) processes for multi-terabyte data warehouses
  • Comfort doing basic system administration in a Linux environment
  • Comfort doing basic system administration in a Windows environment
  • Experience with relational database index optimization
  • Experience with containerized (Docker or Singularity) workflows/paradigms
  • Experience with non-relational database systems (graph, key/value, document, array data stores)
  • Experience with the R statistical computing platform
  • Experience with Java
  • Experience with Python
  • Experience with high-performance computing
  • Comfort independently exploring distributed computing and database technologies and generating executive reports
  • Experience with public cloud platforms (AWS, Azure, Google Cloud)

Additional Information

This is a 12-month term appointment with the possibility of renewal contingent on funding.

The health of our workforce is a priority for Harvard University. With that in mind, we strongly encourage all employees to be up-to-date on CDC-recommended vaccines.

Please note that we are currently conducting a majority of interviews and onboarding remotely and virtually. We appreciate your understanding.

Harvard University offers an outstanding benefits package including:

  • Time Off: 3 - 4 weeks paid vacation, paid holiday break, 12 paid sick days, 12.5 paid holidays, and 3 paid personal days per year.
  • Medical/Dental/Vision: We offer a variety of excellent medical, dental, and vision plans; all coverage begins on your start date.
  • Retirement: University-funded retirement plan with full vesting after 3 years of service.
  • Tuition Assistance Program: Competitive tuition assistance program, incredibly affordable classes directly at the Harvard Extension School, and discounted options through participating Harvard grad schools.
  • Transportation: Harvard offers a 50% discounted MBTA pass as well as additional options to assist employees in their daily commute.
  • Wellness options: Harvard offers programs and classes at little or no cost, including stress management, massages, nutrition, meditation, and complementary health services.
  • Harvard access to athletic facilities, libraries, campus events, and many discounts throughout metro Boston.

The Harvard Medical School is not able to provide visa sponsorship for this position.

Job Function

Information Technology, Research

Department Office Location

USA - MA - Boston

Job Code

I1359P IT Data Architect Prof V

Work Format




Salary Grade



Center for Computational Biomedicine


00 - Non Union, Exempt or Temporary

Time Status


Pre-Employment Screening

Criminal, Identity


35 hrs. per week | Monday - Friday | 9:00 am - 5:00 pm

Commitment to Equity, Diversity, Inclusion, and Belonging

We are committed to cultivating an inclusive workplace culture of faculty, staff, and students with diverse backgrounds, styles, abilities, and motivations. We appreciate and leverage the capabilities, insights, and ideas of all individuals.

EEO Statement

We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, gender identity, sexual orientation, pregnancy and pregnancy-related conditions, or any other characteristic protected by law.
