Data Engineer

This job is no longer open
Bixal will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. If you require any accommodation as part of our recruitment process, please contact us at Talent@bixal.com. You can expect a response from a team member within 24 hours during the regular work week, or on the next business day if you reach out over a weekend or holiday.


Location
This role can be performed remotely from anywhere in the USA. You must be legally authorized to work in the US; Bixal does not provide visa sponsorship.

What will you do?
We are seeking a skilled and motivated Data Engineer to join our dynamic team as we continue to build and optimize data pipelines using PySpark, Databricks, AWS, and other cutting-edge technologies. The successful candidate will have experience working with large volumes of data in an Agile environment, employing best practices for version control using Git, and designing and implementing data ingestion jobs using Terraform or similar tools.
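
A minimal sketch of the kind of pipeline this role builds, assuming a hypothetical orders dataset (the bucket names, paths, and columns below are illustrative only, not Bixal's actual code or data):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-ingest").getOrCreate()

    # Read raw JSON events from S3 (hypothetical bucket and path).
    raw = spark.read.json("s3://example-raw-bucket/orders/")

    # Basic cleanup: drop rows missing the key and normalize timestamps.
    clean = (
        raw.dropna(subset=["order_id"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Write partitioned Parquet for downstream analytics.
    clean.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-curated-bucket/orders/"
    )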

All roles at Bixal will:

    • Participate in client events and forums to positively represent the organization and to develop and maintain relationships within the industry.
    • As appropriate, play an active role in new client acquisition by supporting business development initiatives to enhance and grow Bixal's business in each of the areas in which it works.

Responsibilities:

    • Collaborate with data scientists, analysts, and other engineers to design and develop complex data pipelines using PySpark, Databricks, and AWS services such as S3, EC2, EMR, and RDS, along with IAM roles and policies.
    • Write clean, efficient, and well-documented code that can be easily maintained and extended by the team.
    • Implement new data ingestion jobs utilizing Terraform or other infrastructure as code tools to automate workflows and improve overall data processing efficiency.
    • Optimize existing data pipelines for performance, scalability, and reliability using best practices in distributed computing (a sketch of two common techniques follows this list).
    • Contribute to the continuous integration and delivery of high-quality software by collaborating with team members on Agile methodologies.
    • Document and maintain technical documentation related to data engineering processes, tools, and infrastructure.
    • Collaborate with DevOps engineers to ensure that CI/CD pipelines function effectively and efficiently and that deployment processes are well defined.
    • Provide support for critical production systems as needed.
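
As a sketch of the optimization work mentioned above, assuming a large fact table and a small dimension table (all table, column, and bucket names here are hypothetical), a broadcast join and a keyed repartition are two common PySpark techniques:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("pipeline-optimization").getOrCreate()

    orders = spark.read.parquet("s3://example-curated-bucket/orders/")  # large fact table
    stores = spark.read.parquet("s3://example-curated-bucket/stores/")  # small dimension table

    # Broadcasting the small table avoids shuffling the large one during the join.
    enriched = orders.join(broadcast(stores), on="store_id", how="left")

    # Repartitioning by the write key before a partitioned write reduces
    # small-file fragmentation in the output.
    (enriched
        .repartition("order_date")
        .write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-bucket/orders_enriched/"))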

Qualifications:

    • Bachelor’s degree in a technology field and 4+ years of experience in a Data Engineer role.
    • 4+ years of experience working with PySpark or other distributed computing frameworks (Python preferred).
    • Proficiency in Python and Scala programming languages for data engineering tasks.
    • Experience with Databricks (notebooks, workflows, Unity Catalog).
    • Experience with AWS services such as S3, EC2, EMR, and RDS, along with IAM roles and policies.
    • Strong knowledge of data pipeline design and implementation, including data transformation techniques, data storage optimization, and data security best practices.
    • Proficiency with version control systems like Git for managing code repositories and collaborating with team members on Agile projects.
    • Familiarity with Terraform or other infrastructure as code tools for automating infrastructure deployment and configuration management.
    • Experience working in Linux environments for data engineering projects, including accessing containers remotely, installing packages, and managing files, services, and processes.
    • Advanced working knowledge of SQL, including query authoring, plus experience with relational databases and working familiarity with a variety of database systems (a representative query follows this list).
    • Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
    • Strong problem-solving skills, ability to work independently and as part of a team, and excellent verbal and written communication skills.
    • Comfortable working in a highly collaborative environment with strong attention to detail and a commitment to delivering high-quality software.
    • Must be eligible for a public trust security clearance.
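
As a sketch of the SQL fluency described above (the table and column names are hypothetical), a window function that keeps the latest row per key is a representative query, shown here via Spark SQL:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-example").getOrCreate()

    # Register a hypothetical orders table for SQL access.
    spark.read.parquet("s3://example-curated-bucket/orders/") \
         .createOrReplaceTempView("orders")

    # Keep only the most recent row per order_id.
    latest = spark.sql("""
        SELECT * FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY order_id ORDER BY order_ts DESC
                   ) AS rn
            FROM orders
        ) t
        WHERE rn = 1
    """)
    latest.show()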

Nice to haves:

    • Familiarity with cloud computing concepts, particularly as they apply to data engineering.
    • Experience with other data frameworks such as Apache Hive, Apache Hadoop, or Apache Spark.
    • Experience with Alation or other data governance tools.
    • Federal consulting experience.
$100,000 - $125,000 a year
Perks & benefits
Competitive base salary
Flex hours
Work from home flexibility
401(k) with matching incentive
Parental leave
Medical/dental/vision benefits
Flex spending account
Company provided short-term disability
Company provided life insurance
Commuter benefits
Generous PTO
11 paid holidays
Professional development opportunities
Business development incentive bonuses

Please note that candidates selected may undergo a background investigation and, if applicable, meet eligibility requirements for suitability.

Bixal is an equal opportunity and affirmative action employer. It ensures equal employment opportunity without discrimination or harassment based on race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity or expression, age, disability, national origin, marital or domestic/civil partnership status, genetic information, citizenship status, veteran status, or any other characteristic protected by law. We are dedicated to promoting diversity, equity, and inclusion within our organization and beyond.