Site Reliability Engineering Manager

Engineering Team | San Francisco, CA

Site Reliability Engineering (SRE) is a hybrid software/systems group that works with traditional software engineering, capacity engineering, and infrastructure teams to ensure that Dropbox runs smoothly. Managing an SRE team requires a high degree of technical mastery, the ability to brutally prioritize and execute, and a focus on growing teams through both recruiting and mentorship.

Responsibilities

  • Manage engineers working with infrastructure and product engineering teams. Example services may include our metadata storage infrastructure running on MySQL, Go-based processes serving as RPC systems for frontend components, or frontend components themselves, such as the Photos tab.
  • Understand technical architectures, failure domains, tooling/automation, product launch plans, disaster recovery/business continuity plans, and other issues. You will be asked to create plans for prioritizing technical and resourcing challenges within the infrastructure organization.
  • Partner with product management, network engineering, product engineering, and other related groups.
  • Help engineers develop their careers, assigning them to projects tailored to their skill levels, long-term skill development, personalities, and work styles.
  • Work closely with a dedicated recruiting staff to drive recruiting. This will include sourcing candidates, interviewing candidates, organizing Dropbox participation in conferences/events, and onboarding new employees.
  • Balance the need to "keep things running" with allocating time to long-term, high-impact projects.
  • Assess employee performance frequently by providing ongoing feedback, addressing under-performance, and recognizing excellent performance.

Requirements

  • BS or MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience
  • At least three years of direct management experience in a technology company
  • Previous experience with hiring and performance management, including working with under-performers
  • Sound knowledge of Linux and TCP/IP networks
  • Ability to code well in at least one language
  • Above-average knowledge of basic large-scale internet service architectures (such as load balancing, LAMP, CDNs)
  • Good understanding of how to think about data durability (think backups, max time to recovery, and generally how to avoid losing data at all costs)
  • Good communication skills
  • Lastly, a very healthy understanding of what “We not I” means

Other open positions for the Engineering Team

Site Reliability Engineer | San Francisco, CA
Software Engineer | San Francisco, CA
Software Engineer - iOS | San Francisco, CA
Software Engineer | New York, NY
Product Software Engineer | San Francisco, CA
Web Developer | San Francisco, CA
UI Software Engineer | San Francisco, CA
Software Engineer - OS X | San Francisco, CA
Build Engineer | San Francisco, CA
Technical Writer | San Francisco, CA