Senior HPC Cluster Systems Administrator

📁 Information Technology
💼 IC-Information Technology
📅 Requisition #: 105547

Berkeley Lab’s (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join its ScienceIT Team!


In this exciting role, you will support the Berkeley Lab research community by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters. This role provides extensive expertise in High Performance Computing infrastructure and delivers advanced Linux solutions to further scientific endeavors at Berkeley Lab. The mission of Scientific Computing under ScienceIT is to facilitate groundbreaking fundamental research globally by providing essential computing tools, networks, and expertise to enable pioneering science. 

 

This position has an anticipated start date of January 5, 2026.

 

We’re here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

Why join Berkeley Lab?

We invest in our employees by offering a total rewards package you can count on:

  • Exceptional health and retirement benefits, including pension or 401K-style plans
  • Opportunities to grow in your career - check out our Tuition Assistance Program
  • A culture where you’ll belong - we are invested in our teams! 
  • In addition to accruing vacation and sick time, we also have an annual Winter Holiday Shutdown
  • Parental bonding leave (for both mothers and fathers)
  • Pet insurance

 

What You Will Do:

  • Perform Linux system and HPC cluster maintenance and installations, operating system upgrades, system security hardening and intrusion detection, storage and file system management, system hardware, customization of user group working environment, troubleshooting, network monitoring, and crash recovery.
  • Design, deploy, and manage scalable applications using Kubernetes, ensuring the availability, performance, and readiness of the Kubernetes infrastructure.
  • Automate deployment, scaling, and management of containerized applications, and collaborate with DevOps and development teams to streamline CI/CD pipelines.
  • Design, deploy, and manage the global storage platform to ensure high performance, massive scalability, reliability, and future-proof solutions.
  • Support storage technologies such as Lustre and VAST, along with the associated networks.
  • Resolve I/O issues related to business applications, including diagnosing and resolving complex storage, Linux, and networking challenges in a fast-paced environment.
  • Research new storage management technologies and techniques, and provide recommendations.
  • Participate in developing system administration, security, and network policies, documentation, and tools oriented towards efficient systems management.
  • Participate in cluster support to staff and researchers, including initial installation, integration, and ongoing maintenance of Linux High-Performance Computing cluster systems. This includes travel to remote sites as needed.
  • Co-lead technical efforts with other senior system administrators in areas of HPC technologies such as job schedulers, high-performance interconnects, parallel file systems, cybersecurity, cluster management, container orchestration, VM infrastructure, networking, performance tuning, or data center planning.
  • Co-lead group projects of small to medium size and complexity to implement and deploy new computing technologies and associated services to the research community.

 

What We Are Looking For:

  • A Bachelor’s Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related discipline, and a minimum of 12 years of relevant experience in Linux system administration within a large distributed computing environment, including experience providing systems and end-user support for multiple scientific or computational research groups, or an equivalent combination of education and experience.
  • Demonstrated ability to manage large-scale, performance-critical environments, including capacity planning, scaling, and optimization.
  • Significant experience deploying, scaling, and managing Kubernetes clusters, with a strong understanding of its architecture (pods, deployments, services, ingress) and container orchestration. Proven proficiency with CI/CD tools like Jenkins or GitLab CI.
  • Proven experience with Red Hat derivatives (CentOS, Scientific Linux, Rocky Linux), Debian, Ubuntu, and large-scale system and configuration management tools (Kickstart, Ansible, Puppet, Chef, Warewulf). Expertise in supporting standard services (NFS, LDAP, SMB, MySQL, Apache/Nginx HTTPD).
  • Strong HPC expertise, including Linux, job schedulers, high-performance interconnects, parallel file systems, cybersecurity, container orchestration, cluster management, VM infrastructure, networking, performance tuning, scientific application support, and data center planning.
  • Proficiency in Python and Bash for building, optimizing, and debugging scientific codes (C, C++, Fortran, Java), including experience with compilers (GCC, Intel), debuggers, Makefiles, and version control (Git, Subversion).
  • Expertise in storage system design and optimization (Lustre, S3, VAST, Weka, Ceph, DDN), including a deep understanding of the storage stack (kernel to user space, including file systems, block storage, I/O schedulers, VFS), storage benchmarking, and performance tuning (throughput, latency, IOPS, workload-specific optimizations).
  • Excellent oral and written communication skills, including experience organizing and presenting customer-focused technical data, reports, and projects to audiences with varying degrees of technical expertise.
  • Strong interpersonal skills, including experience with research facilitation and project management in a multidisciplinary team environment.

 

Desired Qualifications:

  • An Advanced Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related discipline.
  • Experience with software engineering and/or software development.       
  • Familiarity with Kubernetes-related tools like Helm, Istio, and Prometheus.
  • Demonstrated experience supporting research at a National Lab and/or in an academic or research environment. 

 

Additional Information:

  • Application Deadline: For full consideration, please apply with a resume and a cover letter describing your interest by November 30, 2025.
  • Appointment type: This is a full-time, career appointment, exempt (monthly paid) from overtime pay.
  • Salary Information: This position is expected to pay $178,644 - $218,364 annually, which fits within the full salary range of $158,808 - $267,996 annually for job code C70.4. It is not typical for an individual to be offered a salary at or near the top of the range for a position. Salary for this position will be commensurate with the final candidate’s qualifications and experience, including skills, knowledge, relevant education, and certifications, and will be aligned with the internal peer group.
  • Background Check: This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
  • Work Modality: This position is eligible for a hybrid work schedule - a combination of teleworking and performing work on site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA 94720. Work schedules are dependent on business needs. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Starting May 7, a REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.
  • Relocation: This position is eligible for relocation assistance.
  • Work Authorization: Applicants must be legally authorized to work in the United States. Berkeley Lab does not provide visa sponsorship for this position.

 

Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.

Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

 
