Solutions Architect - Data Engineering

phData works exclusively in the realm of data and machine learning. By partnering with the biggest cloud data platforms (Snowflake, Azure, AWS, Cloudera), we’re able to help the world’s largest companies solve their toughest challenges.

Our work is challenging and our standards are high, but we invest heavily in our employees, starting with a 2-4 week bootcamp to ensure you’ll be successful. Plus, you’ll get to work with the brightest minds in the industry and the latest data platforms on the market. And, because the data and ML industry is changing rapidly, you will always have the opportunity to learn - whether that’s a new technology, diving deeper into your preferred stack, or picking up an entirely new skill set.

Even though we're growing extremely fast, we maintain a remote-first, exciting work environment. We hire top performers and allow them the autonomy to deliver results. Our award-winning workplace fosters learning, creativity, and teamwork. Most importantly, our team has the option to work from the convenience of their homes or from our state-of-the-art Bangalore office, located in the heart of the city.

  • 2022 Snowflake Partner of the Year
  • 2022 Best Places to Work
  • Best Places to Work (2017, 2018, 2019, 2020, 2021)
  • Inc. 5000 Fastest Growing US Companies (2019, 2020, 2021)
  • Minneapolis/St. Paul Business Journal’s Fast 50 (2021)
  • Snowflake Elite, Snowpark Accelerated & Snowflake Partner of the Year (2020 & 2021)

Core Competencies

The role of a data architect at phData involves working with clients to understand their business goals and data needs, and then designing and implementing data architecture solutions that align with those goals.

Must Have Technical Delivery Skills

  1. Strong working experience with Hadoop & Apache Spark (on-prem or cloud versions), including designing, developing, maintaining, and optimizing production-grade applications.
  2. Deep technical knowledge of end-to-end data pipelines that move small and large data sets from a variety of sources (structured, semi-structured, and/or unstructured) to the data platform (such as a Hadoop platform or cloud-native data platforms) using ingestion, cleansing, transformation, and validation processes.
  3. Good working knowledge of how to architect small- and large-scale complex data requirements and translate that architecture into a working data solution.
  4. Working exposure to defining data governance policies and procedures, including data security and access controls.
  5. Understanding of data integration and data transformation patterns (one-time load, history load, delta load, etc.) for different scales of data sets, including their pros and cons, and of how to apply the right solution for a given business case based on customer-provided or other technical constraints.
  6. Good to have: some understanding of cloud storage services (AWS S3, Azure ADLS Gen2, or GCP buckets) and know-how in handling small and large datasets in different data formats.
  7. Very strong understanding and working knowledge of SQL (standard, analytical, and advanced) alongside conventional data warehousing design patterns, plus good knowledge of SQL best practices and how to enforce them in an enterprise environment.
  8. Solid understanding of the data validation process using utilities or automation.
  9. Good working knowledge of Bash or Python scripting to enable automation on Unix platforms.
  10. Hands-on experience troubleshooting, optimizing, and enhancing data pipelines and bringing improvements in the production environment.
  11. Strong knowledge of and working experience with one of the version control systems (e.g., GitHub, GitLab, Bitbucket, or Code Build) and with continuous integration and deployment patterns in the data engineering space.
  12. Good ability to produce architectural and design documents, best-practice documents, data integration diagrams, and artifacts related to data design.
  13. Must have working experience with one of the data engineering orchestration tools (like Apache Airflow or Apache Oozie or any other commercial tool).
  14. Staying up-to-date with industry trends and technologies related to data management and architecture. 

Nice To Have Skills (In Past 2 Years)

  1. End-to-end data migration experience from legacy systems (Oracle, SQL Server, DB2, Netezza, etc.) to Snowflake
  2. Data transformation tools like dbt (dbt CLI or dbt Cloud)
  3. Cloud data integration tools like Fivetran
  4. Low-code/no-code ETL tools like Matillion

Behavioral Requirements

  1. Must be curious and hungry to learn.
  2. Be ready to learn quickly (in a very structured and methodical manner) and adapt to new technologies or new types of tools as and when required.
  3. Demonstrated ability to work independently as well as with the team and customer/client stakeholders.
  4. Good communication skills (verbal and written) - one of the most important skills when working with phData as a consulting and service organization.
  5. Good, collaborative team player, both within the in-house team and in hybrid teams with members from client, vendor, and in-house resources.
  6. Given the project execution dimension and deliverables, a strong sense of time management is required (for example, tracking schedule variance, effort variance, cost variance, and so on).
  7. The ability to guide and drive a project team during unforeseen circumstances or when risk becomes an issue and the team is racing against deadlines.
  8. A keen attention to detail is required, whether it is for requirement documentation, code review, architectural review, or any other task that may or may not have an impact on project deliverables.
  9. An understanding of how a team works, of the art of delegation, and of how to get a team to deliver value based on project or organizational goals is essential.

Team Management Skills

  1. Leading and motivating a team of data engineers to achieve project and organizational goals.
  2. Providing guidance and support to immediate team members to help them develop their skills and careers.
  3. Setting performance expectations and conducting performance evaluations for team members.
  4. Identifying and addressing conflicts or problems within the team, and facilitating resolution.
  5. Supporting and promoting a positive and inclusive team culture.
  6. Ensuring that team members are learning and staying up-to-date with new technologies, and that all of them are aligned with the organization's larger objectives.

Qualifications & Other Requirements

  1. BE/BTech in computer science, MCA, or an equivalent degree with sound industry experience (10 to 15 years).
  2. A minimum of 5 years of experience in developing production/enterprise-grade big data solutions (like Cloudera, Hortonworks, HDInsight, Hadoop/Spark Cluster) and elementary working knowledge of cloud-native data engineering solutions (AWS Data/Storage Services, Azure Data/Storage Services, GCP Data/Storage Services), or 2 years of working experience with Snowflake or Databricks technologies.
  3. Good programming or scripting language experience (Python, Java, or Scala). Must have developed a small or mid-size application or data product through a complete SDLC cycle.
  4. Good awareness of how a cloud-based system works (be it AWS, Azure, or GCP), including basic and common features like storage, security, and data services.

Perks and Benefits:

  • Medical Insurance for Self & Family
  • Medical Insurance for Parents
  • Term Life & Personal Accident
  • Wellness Allowance
  • Broadband Reimbursement
  • Professional Development Allowance
  • Reimbursement of Skill Upgrade Certifications
  • Certification Bonus