The Lead Technical Consultant - Data Engineering is expected to have strong development and programming skills in Spark, with a focus on Scala/Java, along with other ETL development experience in the big data space and on cloud platforms. The role also requires fluency in agile development and agile tools, code repositories, and agile SDLC/DevOps frameworks.
- Work with the data engineering team to define and develop data ingestion, validation, transformation, and data engineering code.
- Develop open-source platform components using Hadoop, Spark, Scala, Java, Oozie, Hive, and related tools.
- Deliver on cloud platforms and integrate with services such as Azure Data Factory, ADLS, Azure DevOps, Azure Functions, Synapse or AWS Glue, Redshift, Lambda, and S3
- Document code artifacts and participate in developing user documentation and run books
- Troubleshoot deployments to various environments and provide test support.
- Participate in design sessions, demos, prototype sessions, and testing and training workshops with business users and other IT associates.
- 3+ years of experience developing large-scale data processing, data storage, and data distribution systems, preferably on Databricks.
- 3+ years of experience working on large Hadoop projects using Spark and Python, including the Spark DataFrame and Dataset APIs, Spark SQL, RDDs, and Scala function literals and closures.
- Hands-on experience with Hadoop, Hive, Sqoop, Oozie, and HDFS. Strong SQL skills.
- Experience with ELT/ETL development, patterns and tooling is recommended
- Experience with AWS and/or Azure cloud environments
- Experience with SQL on RDBMS platforms, including Postgres and MySQL.
- Experience with Linux environments (RHEL or CentOS preferred).
- Experience with various IDEs and code repositories, as well as unit testing frameworks.
- Experience with code build tools such as Maven.
- Fundamental knowledge of distributed data processing systems and storage mechanisms.
- Ability to produce high-quality work products under pressure and within deadlines, with specific references
- Strong communication and collaborative skills
- 5+ years of experience working in large multi-vendor environments with multiple teams and people as part of the project
- 5+ years of experience working in a complex big data environment
- 5+ years of experience with JIRA/GitHub/Git and other code management toolsets
Preferred Skills and Education:
- Bachelor’s degree in Computer Science or a related field
- Certification in Spark, Databricks, AWS, Azure, or another cloud platform