Lark

Mountain View, CA
51-200 employees
Lark is the nation's fastest-scaling AI healthcare provider, partnering with some of the largest payers to deliver integrated, accessible, and effective care.

Data Engineer - ETL/Data Pipeline

This job is no longer open

ABOUT US

Lark is the world's largest A.I. healthcare provider, having provided care to more than a million patients suffering from, or at risk of, chronic diseases like diabetes and heart disease. We’re on a mission to improve people’s health and happiness through our digital health coaching platform and smart connected devices. We are the only A.I. platform ever to be medically reimbursed as a 100% replacement for a live healthcare professional, providing deeply scalable care to create real change. Since launch, Lark has continued to receive awards and accolades for both our product and our leadership, including:

  • Apple's Top 10 Apps in the World
  • Business Insider's "Most Innovative Companies in the World" along with Uber and Airbnb
  • CEO named #1 in Inc. Magazine's "Top 10 Women in Tech to Watch"
  • Most Promising Digital Health Companies in the World by CB Insights
  • The 15 Most Promising Companies in Healthcare by FierceHealthcare

ABOUT THE ROLE

What You'll Do:

Duties & Responsibilities

  • Join a small, agile team fairly early in its data-engineering journey, with a tremendous opportunity to make a big impact.
  • Design and build data infrastructure with efficiency, reliability, and consistency to meet rapidly growing data needs.
  • Design data pipelines and data integrations to collect, clean, and store large datasets (streaming and batch).
  • Maintain the privacy of our users and partners by helping ensure best practices in security and data handling continue as we grow.
  • Help establish and maintain a high level of operational excellence in data engineering.
  • Collaborate with teams across the company to help develop data products that drive company success.
  • Evaluate, integrate, and build tools to accelerate Data Engineering, Data Science, Business Intelligence, Reporting, and Analytics as needed.
  • Drive data literacy across business functions.

What You’ll Need:

Knowledge & Skills

  • Expertise in Spark (or Storm/Flink/MapReduce/Impala/Hive)
  • In-depth knowledge of AWS (including EMR, DMS, Athena, RDS, Aurora, Lambda, Redshift, etc.)
  • Expertise in stream data processing (e.g., DMS, Flink, Spark, Kinesis, Kafka)
  • Advanced SQL skills
  • Strong proficiency in two of the following programming languages: Python, Scala, and Java
  • Demonstrated expertise in object-oriented and/or functional programming, including a solid grasp of common design patterns and idioms
  • Fluency in data structures, algorithms, distributed computing, storage systems, and assorted consistency models
  • Familiarity with pandas, SciPy, scikit-learn, seaborn, SparkML
  • A love of data and a desire to help people use data effectively in a startup environment
  • Knowledge of multiple database technologies, their tradeoffs, and how to make the best use of each
  • Willingness to learn and mentor in a collaborative team environment
  • Humility with an intrinsic positive drive
  • Passion for developing a world-class engineering culture
  • Value, respect, and an enthusiasm for diversity, inclusion, and alternative perspectives
  • Goal-oriented, with a desire to create an environment of psychological safety
  • Willingness to ask questions and the ability to break down a problem to its smallest parts
  • Effective communication (oral and written!) across a range of audiences
  • Ability to thrive in an environment promoting and enabling collaboration

Credentials & Experience:

  • 4+ years data engineering or equivalent knowledge and ability
  • 7+ years software engineering or equivalent knowledge and ability
  • Experience designing and maintaining at least one type of database (object, columnar, in-memory, relational)
  • Experience with relational, object, tabular, key-value, triple-store, tuple-store, and related database types
  • Experience with data warehouse modernization, building data marts, star/snowflake schema designs, infrastructure components, ETL/ELT pipelines, and BI/reporting/analytics tools
  • Experience building production-grade data backup/restore strategies and disaster recovery solutions
  • BS or MS in Computer Science, Mathematics, Computer Engineering or related field, or equivalent experience, knowledge, and ability

Bonus points for familiarity with the following key technologies:

  • SparkML (or scikit-learn/TensorFlow)
  • GraphX (or TitanDB/Neo4j/range++/graph engine/OrientDB)
  • Delta Lake
  • Airflow (or luigi/oozie/azkaban/pinball/chronos)
  • Snowflake
  • Hadoop Ecosystem (MapReduce/Yarn/HDFS/Pig/Hive)
  • A BI/visualization tool such as Periscope (Sisense), Tableau, Domo, Looker, or Superset (any is fine)

JOIN US

Lark offers the option to work remotely when both the employee and the job are suited to such an arrangement. This option is only applicable to US employees due to current regulations. Lark is primarily based in California; as such, the Company’s core hours are 10:00 am to 6:00 pm Pacific Time. Please be aware that employees must adhere to these hours, within reason, when working remotely or in the office.

Our team works with cutting edge tools and technology related to Artificial Intelligence and Machine Learning. We are using NLP to process millions of meals, and accelerometer data to compute activity and sleep amounts from users' phones. Our chat AI is the most sophisticated digital health engagement tool in the world. Join us and make it even better!

Lark is an Equal Opportunity Employer. Lark does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need.


Outer Join is the premier job board for remote jobs in data science, analytics, and engineering.