Hadoop Site Reliability Engineer

Engineering Team | San Francisco, CA

Use of Hadoop at Dropbox is growing; come join us. With over 1,000 nodes in production, we’re looking for additional members for a tiny team that’s shaping Hadoop at Dropbox into a cohesive, centrally managed data platform. Today, Dropbox primarily uses Hadoop for HDFS, HBase, MRv1, Hive, and Presto.

Responsibilities

Get involved in every part of our Hadoop stack—from the earliest stage of system design and development to deployment, troubleshooting, and performance analysis
Design and build tools to manage a rapidly growing number of services
Work with various teams including Analytics, Data Infrastructure, System Engineering, and Capacity Planning
Help build tooling for testing, monitoring, capacity planning, and hardware acceptance
Have the freedom to open source your contributions to the Hadoop ecosystem
Participate in a periodic on-call rotation

Requirements

4+ years of SRE experience, including 2+ years of Hadoop operations experience. Experience with HBase and other Hadoop components is a bonus.
Extensive experience managing large-scale systems.
Expert-level Linux system administration skills. Ubuntu Linux is a plus.
Shell scripting and high-level language expertise. We like Python a lot. We like Go, too. Experience in JVM performance tuning is a plus.
Fanaticism about automation—make the computers do the work for you.