Responsibilities
Support planning and execution, hire and mentor staff, and contribute hands-on development within the role
Develop and maintain developer tooling to support PySpark data pipelines
Implement processes and systems to ensure data is accurate and available for key stakeholders and business processes
Develop and maintain data platform components, including: Kafka data producers and consumers, pipeline architecture, the data lake, the data warehouse, and Business Intelligence tooling
Collaborate closely with fellow data team members, as well as with tech and product teams and company leadership
Support continuing increases in data velocity, volume, and complexity
Write unit and integration tests and document your work
Perform the data analysis required to troubleshoot data-related issues and assist in their resolution
Requirements
Experience with or knowledge of Agile Software Development methodologies
Excellent problem-solving and troubleshooting skills
Strong SQL and Python development experience
Proven experience with building systems using a microservice architecture
Proven experience with schema design and dimensional data modeling
Proven experience developing pipelines that use event-streaming data, and familiarity with Apache Kafka
Practical experience with SQL and NoSQL databases
Practical experience supporting Business Intelligence tooling and third-party systems
Experience designing, building, and maintaining data processing systems
Experience working with MapReduce and Spark clusters
Experience detecting and reporting data quality issues
Familiarity with Docker, CI/CD tooling (such as Jenkins or CircleCI), and AWS