Responsibilities
Create and implement scalable, reliable ETL/ELT pipelines and processes to ingest data from different data sources
Assist DevOps personnel in maintaining blockchain nodes
Assist in the implementation of best-in-class CI/CD frameworks
Facilitate near real-time data collection
Own technical solutions for the Data Lake Infrastructure
Collaborate with other team members to fulfill the team's data needs
Requirements
5+ years of Python/Scala/Java development experience
Experience working with REST APIs and JSON-RPC
Experience with big data processing frameworks such as Hadoop, Apache Spark, or Apache Flink
Experience building data pipelines using workflow management engines such as Airflow, Luigi, Prefect, Google Cloud Composer, AWS Step Functions, Azure Data Factory, UC4, or Control-M
3+ years of experience working on cloud or on-prem big data/MPP platforms (e.g., AWS EMR, Azure HDInsight, GCP Dataflow/Dataproc, AWS Redshift, Azure Synapse, or BigQuery)
GCP experience strongly preferred
Elasticsearch experience preferred
Experience with modern query engines such as Presto or Apache Impala
Desired Qualifications
Excitement for blockchain, Web 3.0, and similar decentralized technologies.
Experience with GitHub Actions, and with self-hosted runners in particular.
Experience working remotely in a distributed team.
A strong desire to grow and challenge yourself. While this role is mainly focused on maintenance, we would expect you to constantly find ways to improve and automate services under your purview.