Travel Requirement: 2-3 days of travel to the client office each month
Responsibilities:
Design and Development:
- Develop and implement scalable data pipelines using Spark-SQL and PySpark in Azure Databricks (an illustrative sketch follows this list).
- Design and build ETL pipelines leveraging Azure Data Factory (ADF).
- Architect and maintain a Lakehouse architecture in Azure Data Lake Storage (ADLS) and Databricks.
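For illustration, a pipeline of this kind might look like the minimal sketch below: ingest raw data from ADLS, transform it with Spark-SQL, and persist a Delta table following the Lakehouse pattern. This is an assumption-laden example, not a prescribed implementation: the storage account, container, paths, schema, and table names are all hypothetical, and it assumes a Databricks workspace with Delta Lake and ADLS Gen2 access.

```python
# Minimal sketch of a Databricks pipeline: ingest raw JSON from ADLS,
# transform with Spark-SQL, and persist a Delta table for the Lakehouse.
# All names, paths, and columns below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # supplied by the Databricks runtime

# Bronze: read raw events from a hypothetical ADLS Gen2 landing zone.
raw = spark.read.json("abfss://landing@examplestore.dfs.core.windows.net/orders/")
raw.createOrReplaceTempView("orders_raw")

# Silver: enforce types and aggregate with Spark-SQL.
daily_totals = spark.sql("""
    SELECT CAST(order_date AS DATE)            AS order_date,
           customer_id,
           SUM(CAST(amount AS DECIMAL(18, 2))) AS total_amount
    FROM orders_raw
    GROUP BY CAST(order_date AS DATE), customer_id
""")

# Persist as a partitioned Delta table so downstream consumers can query it.
(daily_totals.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("silver.daily_order_totals"))
```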
Data Preparation and Maintenance:
- Execute data preparation tasks such as data cleaning, normalization, deduplication, and type conversion (see the sketch after this list).
- Monitor and control data processes, identifying and resolving errors promptly.
- Apply corrective actions to safeguard data integrity, and perform root-cause analysis to put long-term fixes in place.
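As a concrete sketch of these preparation tasks, the PySpark snippet below applies cleaning, normalization, type conversion, and deduplication in a single pass. The source path, column names, and rules are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical data-preparation step: cleaning, normalization,
# type conversion, and deduplication on an example customers dataset.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("abfss://silver@examplestore.dfs.core.windows.net/customers/")

prepared = (
    df
    # Cleaning: drop rows that lack the business key.
    .dropna(subset=["customer_id"])
    # Normalization: trim and lowercase free-text identifiers.
    .withColumn("email", F.lower(F.trim(F.col("email"))))
    # Type conversion: enforce the expected date and decimal types.
    .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
    .withColumn("lifetime_value", F.col("lifetime_value").cast("decimal(18,2)"))
    # Deduplication: keep a single row per customer_id.
    .dropDuplicates(["customer_id"])
)
```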
Collaboration and Leadership:
- Collaborate with global Data Science and Business Intelligence teams to share best practices, insights, and innovative solutions.
- Lead projects involving cross-functional teams and contribute actively to initiatives led by others.
Change Management:
- Apply change management practices, including training, communication, and documentation, to support system upgrades, data migrations, and process changes.
Production Deployment:
- Work closely with the DevOps team to ensure smooth deployment of data solutions in production environments.
Must-Have Skills:
- Proficiency in PySpark and Spark-SQL.
- Strong expertise in the Azure ecosystem, including Azure Data Factory (ADF), Databricks, and ADLS.
- Solid experience in designing and implementing ETL pipelines.
- Advanced SQL skills for querying and data transformation.
Good-to-Have Skills:
- Familiarity with DevOps practices and tools for deploying data solutions.
- Experience with change management tools and processes.
Desired Attributes:
- Strong problem-solving and analytical skills.
- Effective communication and documentation abilities.
- Ability to work in a collaborative global team environment.
- Leadership experience managing data engineering projects.