Big Data Engineer - Fully Remote (EST)

Contract Type: Contract
Location: United States
Start Date: ASAP
Contact Name: Tyler Gerhardt
Contact Email: tyler.gerhardt@twentyrecruitment.com
Job Published: June 16, 2021 17:13

Job Description

The Big Data Engineer is responsible for the design, development and maintenance of the big data platform and its solutions, including analytical solutions that provide visibility and decision support using big data technologies.

The role involves administering a Hadoop cluster, developing data integration solutions and working with data scientists, system administrators and data architects to ensure the platform meets business demands.

Minimum Qualifications

  • Bachelor’s degree in computer science, information technology, or a related field, or equivalent experience

Preferred Qualifications

  • 2 years of experience with big data/Hadoop distribution and ecosystem tools, such as Hive, HBase, Spark, Kafka, NiFi and Oozie
  • 2 years of experience developing batch and streaming ETL processes
  • 2 years of experience working with relational and NoSQL databases, including modeling and writing complex queries
  • Master’s degree in computer science, information technology, or a related field, or equivalent experience
  • Experience with programming languages, such as Python, Java or C#
  • Experience with Linux system administration, Linux scripting and basic network skills
  • Experience coding against and developing REST APIs

Responsibilities

  • Develop ELT processes from various data repositories and APIs across the enterprise and ensure data quality and process efficiency
  • Develop data processing scripts using Spark
  • Develop relational and NoSQL data models in Hive and HBase that conform data to users’ needs
  • Integrate platform into existing EDW and various operational systems
  • Develop administration processes to monitor cluster performance and resource usage and to manage backup and mirroring, ensuring a highly available platform
  • Address performance and scalability issues in a large-scale data lake environment