Location: North America (North American time zone coverage required)
**Remote/Work From Home Opportunity**
Linux/Hadoop Ecosystem Specialist
Do you thrive on solving tough problems under pressure? Are you motivated by fast-paced environments with continuous learning opportunities? Do you enjoy collaborating with a team of peers who push you to constantly up your game?
At Pythian, we are building a Site Reliability Engineering team that is focused on Hadoop service operations and open source, cloud-enabled infrastructure architecture. We need motivated and talented individuals on our teams, and we want you!
You’ll act as a technology leader and advisor for our clients, as well as a mentor for other team members. Projects include Hadoop deployments and upgrades, disaster planning, system and ecosystem tuning, infrastructure architecture, performance analysis, deployment automation, and intelligent monitoring.
You will work with amazing clients from small, high-velocity startups to large enterprises with complex, hybrid infrastructures and large data processing requirements.
- Flexible environment: Work remotely from home
- Outstanding people: Collaborate with the industry’s top minds.
- Generous vacation: Start with a minimum of three weeks’ vacation.
- Personalized training allowance: Hone your skills or learn new ones; experiment and explore using our in-house sandbox; participate in professional development days.
- Fun, fun, fun: Blog during work hours; take a day off and volunteer for your favorite charity.
- Deploy, operate, maintain, secure and administer solutions that contribute to the operational efficiency, availability, performance and visibility of our customers’ infrastructure and Hadoop platform services, across multiple vendors (e.g. Cloudera, Hortonworks, MapR).
- Gather information and provide performance analysis, root-cause analysis and remediation planning for faults, errors, configuration warnings and bottlenecks within our customers’ infrastructure, applications and Hadoop ecosystems.
- Deliver well-constructed, explanatory technical documentation for the architectures we develop, and plan service integration, deployment automation and configuration management against business requirements within the infrastructure and Hadoop ecosystem.
- Understand distributed Java container applications and their tuning, monitoring and management, such as logging configuration, garbage-collection and heap-size tuning, JMX metric collection and general parameter-based JVM tuning (a metrics sketch follows this list).
- Observe and provide feedback on the current state of the client’s infrastructure, and identify opportunities to improve resiliency, reduce the occurrence of incidents and automate repetitive administrative and operational tasks.
- Contribute heavily to the development of deployment automation artifacts, such as images, recipes, playbooks, templates, configuration scripts and other open source tooling.
- Be conversant with cloud architecture, service integrations, and operational visibility on common cloud platforms (AWS, Azure, Google). An understanding of ecosystem deployment options and how to automate them via API calls is a huge asset (see the provisioning sketch after this list).
- Understand the end-to-end operations of complex Hadoop-based ecosystems, and manage and configure core technologies such as HDFS, MapReduce, YARN, HBase, ZooKeeper and Kafka.
- Understand the dependencies and interactions between these core components, alternative configurations (e.g. MRv2 vs. Spark, scheduling in YARN), availability characteristics and service recovery scenarios.
- Identify workflow and job pipeline characteristics and tune the ecosystem to support high performance and scalability, from the infrastructure platform through to the application layers in the ecosystem.
- Understand and enable metric collection at all layers of a complex infrastructure, ensuring good visibility for engineering and troubleshooting tasks, and ensure end-to-end monitoring of critical ecosystem components and workflows.
- Understand the Hadoop toolset: how to manage and copy data within and between Hadoop clusters, integrate with other ecosystems (for instance, cloud storage), configure replication, and plan backup and resiliency strategies for data on the cluster (see the WebHDFS sketch after this list).
- Comprehensive systems hardware and network troubleshooting experience in physical, virtual and cloud platform environments, including the operation and administration of virtual and cloud infrastructure provider frameworks. Experience with at least one virtualization and one cloud provider (for instance, VMware, AWS) is required.
- Experience with the design, development and deployment of at least one major configuration management framework (e.g. Puppet, Ansible, Chef) and one major infrastructure automation framework (e.g. Terraform, Spinnaker, CloudFormation). Knowledge of DevOps tools, processes and culture (e.g. Git, continuous integration, test-driven development, Scrum).
- Ability to pick up new technologies and ecosystem components quickly, and establish their relevance, architecture and integration with existing systems.
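To make the monitoring and JMX bullets concrete, here is a minimal Python sketch of the kind of metric collection we mean: it scrapes the JSON /jmx endpoint that Hadoop daemons expose over HTTP. The hostname is a placeholder, and the port assumes a Hadoop 3.x NameNode.

```python
import json
import urllib.request

# Placeholder host; 9870 is the default NameNode web UI port on Hadoop 3.x
# (50070 on 2.x). Every Hadoop daemon serves JSON metrics at /jmx.
NAMENODE_JMX = "http://namenode.example.com:9870/jmx"

def fetch_beans(query: str) -> list:
    """Return the MBeans matching `query` from the daemon's /jmx endpoint."""
    with urllib.request.urlopen(f"{NAMENODE_JMX}?qry={query}") as resp:
        return json.load(resp)["beans"]

# JVM-level view: heap usage from the standard java.lang:type=Memory MBean.
for bean in fetch_beans("java.lang:type=Memory"):
    heap = bean["HeapMemoryUsage"]
    print(f"NameNode heap: {100 * heap['used'] / heap['max']:.1f}% of {heap['max']} bytes")

# Service-level view: HDFS health from the FSNamesystem metrics.
for bean in fetch_beans("Hadoop:service=NameNode,name=FSNamesystem"):
    print("Under-replicated blocks:", bean["UnderReplicatedBlocks"])
```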
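The provisioning sketch promised above: a hedged example of automating deployment via provider API calls, using the AWS SDK for Python (boto3) with placeholder AMI, region and instance values. Any of the major clouds offers an equivalent API.

```python
import boto3  # AWS SDK for Python; AWS stands in for any provider with an API

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder AMI, instance type and tags for a hypothetical worker profile.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m5.2xlarge",
    MinCount=3,
    MaxCount=3,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "role", "Value": "hadoop-worker"}],
    }],
)
instance_ids = [i["InstanceId"] for i in response["Instances"]]

# Block until the new nodes are running before configuration management
# (Ansible, Chef, Puppet) takes over and joins them to the cluster.
ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)
print("Provisioned Hadoop workers:", instance_ids)
```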
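And the WebHDFS sketch: a small example (hypothetical host, path and simple user.name auth) of managing data on a cluster through Hadoop's WebHDFS REST API, here auditing a dataset and raising the replication factor of critical files.

```python
import json
import urllib.request

# Placeholder endpoint and path; `user.name` implies simple (non-Kerberos) auth.
WEBHDFS = "http://namenode.example.com:9870/webhdfs/v1"
USER = "hdfs"

def list_status(path: str) -> list:
    """List a directory through the WebHDFS REST API (op=LISTSTATUS)."""
    url = f"{WEBHDFS}{path}?op=LISTSTATUS&user.name={USER}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]

def set_replication(path: str, factor: int) -> bool:
    """Change a file's replication factor (op=SETREPLICATION, an HTTP PUT)."""
    url = f"{WEBHDFS}{path}?op=SETREPLICATION&replication={factor}&user.name={USER}"
    req = urllib.request.Request(url, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["boolean"]

# Audit a critical dataset and raise any file sitting below 3 replicas.
for entry in list_status("/data/critical"):
    if entry["type"] == "FILE" and entry["replication"] < 3:
        set_replication(f"/data/critical/{entry['pathSuffix']}", 3)
```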
Want to know more? See what puzzles we’ve been working on: read posts on the Pythian Blog!
Intrigued to see what it’s like to work at Pythian? Check us out @Pythian and #pythianlife.
Stay connected with us! Follow @PythianJobs on Twitter and @loveyourdata on Instagram!
- An equivalent combination of education and experience that demonstrates the ability to apply these skills will also be considered.
- Pythian is an equal opportunity employer.
- All applicants will need to fulfill the requirements necessary to obtain a background check.
- Pythian will not sponsor, or file petitions of any kind on behalf of, a foreign worker.