Data Engineer

Edge & Node stands as the revolutionary vanguard of web3, a vision of a world powered by individual autonomy, shared self-sovereignty and limitless collaboration. Established by trailblazers behind The Graph, we’re on a mission to make The Graph the internet’s unbreakable foundation of open data. Edge & Node invented and standardized subgraphs across the industry, solidifying The Graph as the definitive way to organize and access blockchain data. Drawing on deep expertise in open-source software, tooling, and protocols, we empower builders and entrepreneurs to bring unstoppable applications to life with revolutionary digital infrastructure.

Edge & Node acts on a set of unwavering principles that guide our journey in shaping the future. We champion a decentralized internet—free from concentrated power—where collective consensus aligns what is accepted as truth, rather than authoritative dictation. Our commitment to censorship resistance reinforces our vision of an unyielding information age free from the grasp of a single entity. By building for open-source, we challenge the stagnant landscape of web2, recognizing that true innovation thrives in transparency and collaboration. We imagine a permissionless future where the shackles imposed by central gatekeepers are not only removed, but relegated to the dustbin of a bygone era. And at the foundation of it all, our trust shifts from malevolent middlemen to trustless systems, leveraging smart contracts to eliminate the age-old vulnerabilities of misplaced trust.

The Data Science team works closely with teams across Edge & Node to deliver high-quality data for product research and development, go-to-market, and business analytics. We work across the data lifecycle, from infrastructure to data analytics.

We are looking for an early-career Data Engineer to focus on developing and maintaining data science pipelines. Ideally, you have experience with the tools the team currently uses, which include, but are not limited to, Redpanda, Materialize, and GCP. In this role, you will monitor and maintain the reliability of the Redpanda cluster, streaming database, DBT jobs, QoS oracle, and other data engineering systems. You’ll be expected to learn Materialize and help migrate BigQuery models to reduce costs. In addition, you will help establish and maintain good standards for documentation and internal educational tools, and respond to data engineering/devops requests in our incident management process.

What You’ll Be Doing

  • Learning our infrastructure and data engineering toolset
  • Partnering closely with our Data Science and SRE teams to perform various data warehouse jobs and periodic Redpanda/streaming database devops tasks
  • Managing historical data models in BigQuery/DBT
  • Developing pipelines and performing devops tasks to support dashboards

What We Expect

  • Experience with one or more of the following: BigQuery, ETL automation/workflow tools (DBT), BI/dashboarding tools (Apache Superset/Metabase), streaming data platforms (Apache Kafka, Redpanda, or Confluent), or other data engineering and data warehouse toolsets/environments 
  • Some experience or knowledge of container orchestration tools such as Kubernetes and Kustomize preferred
  • Some experience or knowledge of monitoring and alerting (Grafana dashboards) preferred
  • Some experience or knowledge of SQL, including the ability to create and manage tables within a SQL database
  • Proficiency in one or more programming languages, such as Python, R, or Rust 
  • Must be able to serve on-call shifts and support devops needs
  • Ability to create documentation and communicate with a variety of audiences
  • Clear communication skills (written and verbal) to document processes and architectures
  • Ability to work well within a multinational team environment
  • Preference for candidates physically located in the Americas, though the team is open to candidates in European time zones or other locations

About The Graph

The Graph is the indexing and query layer of web3. The Graph Network’s self-service experience for developers launched in July 2021. Developers build and publish open APIs, called subgraphs, that applications can query using GraphQL. The Graph supports indexing data from multiple networks, including Ethereum, NEAR, Arbitrum, Optimism, Polygon, Avalanche, Celo, Fantom, Moonbeam, IPFS, and PoA, with more networks coming soon. To date, tens of thousands of subgraphs have been deployed on the hosted service, and subgraphs can now be deployed directly on the network. Over 28,000 developers have built subgraphs for applications such as Uniswap, Synthetix, KnownOrigin, Art Blocks, Balancer, Livepeer, DAOstack, Audius, Decentraland, and many others.

If you are a developer building a web3 application, you can use subgraphs to index and query data from blockchains. The Graph allows applications to present data efficiently in a UI and allows other developers to use your subgraph too! You can deploy a subgraph to the network using the newly launched Subgraph Studio or query existing subgraphs in the Graph Explorer. The Graph would love to welcome you as an Indexer, Curator, and/or Delegator on The Graph’s mainnet. Join The Graph community by introducing yourself in The Graph Discord for technical discussions, joining The Graph’s Telegram chat, and following The Graph on Twitter, LinkedIn, Instagram, Facebook, Reddit, and Medium! The Graph’s developers and members of the community are always eager to chat with you, and The Graph ecosystem has a growing community of developers who support each other.

The Graph Foundation oversees The Graph Network. The Graph Foundation is overseen by the Technical Council. Edge & Node, StreamingFast, Messari, Semiotic, and The Guild are five of the many organizations within The Graph ecosystem.

Outer Join is the premier job board for remote jobs in data science, analytics, and engineering.