Lead Data Engineer (India 2023)
Location: Remote - India
phData works exclusively in the realm of data and machine learning. By partnering with the biggest cloud data platforms (Snowflake, Azure, AWS, Cloudera), we’re able to help the world’s largest companies solve their toughest challenges.
Our work is challenging and our standards are high, but we invest heavily in our employees, starting with a 2-4 week bootcamp to ensure you’ll be successful. Plus, you’ll get to work with the brightest minds in the industry and the latest data platforms on the market. And, because the data and ML industry is changing rapidly, you will always have the opportunity to learn - whether that’s a new technology, diving deeper into your preferred stack, or picking up an entirely new skill set.
Even though we're growing extremely fast, we maintain a remote-first, exciting work environment. We hire top performers and allow them the autonomy to deliver results. Our award-winning workplace fosters learning, creativity, and teamwork. Most importantly, our team has the option to work from the convenience of their home or from our state-of-the-art Bangalore office located in the heart of the city.
- 2022 Snowflake Partner of the Year
- 2022 Best Places to Work
- Best Places to Work (2017, 2018, 2019, 2020, 2021)
- Inc. 5000 Fastest Growing US Companies (2019, 2020, 2021)
- Minneapolis/St. Paul Business Journal’s Fast 50 (2021)
- Snowflake Elite, Snowpark Accelerated & Snowflake Partner of the Year (2020 & 2021)
Core Competencies
Must-Have Skills:
- Strong working technical knowledge of building end-to-end data pipelines for small- and large-scale datasets from a variety of sources (structured, semi-structured, and/or unstructured) into a data platform (Hadoop or a cloud-native platform) through ingestion, cleansing, and transformation processes.
- Thorough understanding of the technical aspects of data engineering work (design, development, validation, deployment, monitoring, and optimization).
- Hands-on experience with Apache Spark (on-premises Cloudera or a similar Hadoop distribution), Databricks (AWS or Azure), or cloud-native data services from AWS, Azure, GCP, or Snowflake, including deploying production-grade data solutions.
- Understanding of cloud storage solutions (AWS S3, Azure ADLS Gen2, or GCP buckets) and how to work with small and large datasets in different data formats.
- Understanding of data transformation patterns for small and large scales, and of the different approaches along with their pros and cons.
- Very good understanding and working knowledge of SQL (standard, analytical, and advanced) alongside traditional data warehousing design patterns.
- Solid understanding of data validation processes for small and large datasets, whether through utilities or manual checks.
- Good working knowledge of Bash or Python scripting for automation.
- Hands-on experience troubleshooting, optimizing, and enhancing data pipelines.
- Well versed in version control and continuous integration and deployment procedures (e.g., GitHub, GitLab, Bitbucket, CodeBuild, or Jenkins).
- Ability to participate in and contribute to business data requirements, data source integration requirements, SLA requirements, and security expectations, and to translate them into system requirement specifications.
- Working experience with at least one data engineering orchestration tool (such as Apache Airflow).
Nice-to-Have Skills (In the Past 2 Years)
- End-to-end data migration experience from legacy platforms (Oracle, SQL Server, DB2, Netezza, etc.) to Snowflake
- Data transformation tools like dbt (dbt CLI or dbt Cloud)
- Cloud data integration tools like Fivetran
- Low-code/no-code ETL tools like Matillion
Behavioral Requirements
- Must be curious and hungry to learn.
- Be ready to learn quickly (in a structured and methodical manner) and adapt to new technologies or types of tools as and when required.
- Demonstrated ability to work independently as well as with teams and customer/client stakeholders.
- Good communication skills (verbal and written) - among the most important skills at phData, a consulting and services organization.
- Able to create quality technical documentation, and know how to organize different types of technical documents, their purpose, and their basic structure.
- Good, collaborative team player, both within the in-house team and in hybrid teams with members from the client, vendors, and in-house resources.
- Given the project execution dimensions and deliverables, a strong sense of time management is required (for example, managing schedule variance, effort variance, cost variance, and so on).
- The ability to guide and drive a project team through unforeseen circumstances, or when a risk becomes an issue and the team is racing against deadlines.
- Keen attention to detail, whether for requirement documentation, code review, architectural review, or any other task, regardless of its impact on project deliverables.
- An understanding of how a team works, the art of delegation, and how to get a team to deliver value against project or organizational goals is essential.
Qualification Requirements
- BE/BTech in Computer Science, MCA, or an equivalent degree, with sound industry experience (8-10 years)
- A minimum of 5 years of experience developing production/enterprise-grade big data solutions (e.g., Cloudera, Hortonworks, HDInsight, or Hadoop/Spark clusters) with elementary working knowledge of cloud-native data engineering solutions (AWS, Azure, or GCP data/storage services), or 2 years of working experience with Snowflake or Databricks technologies.
- Good programming or scripting language experience (Python, Java, or Scala). Must have developed small or mid-sized applications or data products through a complete SDLC at some point in their career.
- Good awareness of how cloud-based systems work (whether AWS, Azure, or GCP), including basic and common features like storage, security, and data services.