Northrop Grumman Principal Site Reliability Engineer in Annapolis Junction, Maryland
Principal Site Reliability Engineer
Requisition ID: 18003127
Location(s): United States-Maryland-Annapolis Junction
US Citizenship Required for this Position: Yes
Relocation Assistance: Relocation assistance may be available
Are you looking to work in a fast-paced, high-energy environment with the latest and greatest, the crème de la crème of technology? If so, Northrop Grumman has the perfect opportunity for you. We are currently looking to hire Principal-level Site Reliability Engineers for a highly visible program working to secure the Mission.
Selected candidates will support an open-source system for automating the deployment, scaling, and management of containerized applications. You will contribute to a system that builds upon 15 years of experience running production workloads at Google, combined with best-of-breed ideas and practices from the community. If you are familiar with, or have worked in an environment that uses, the following: Virtualized Infrastructure, Containerization, Container Orchestration, Docker, Kubernetes, Self-Healing Application, Googlesk, then we are asking you to join our team.
The Site Reliability Engineer will augment the existing Operations team for a large analytic cloud repository. Candidates for this position must have experience as a system administrator for large Hadoop- and Accumulo-based clusters, and a strong background in troubleshooting operational system issues as they arise. This position focuses on building and implementing monitoring tools in a DevOps-type environment to increase the stability of the hardware instances associated with the customer's analytic hosting platforms.
At a minimum, the candidate should be skilled in Java. A successful candidate for this position has experience working with large Hadoop- and Accumulo-based clusters and is familiar with computer networking. Additional knowledge of Python would benefit the candidate.
Roles and Responsibilities:
The ideal candidate will have experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution. The candidate will also need experience in performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems, including monitoring the technical health of a system, improving organizational processes, and implementing postmortem (failure) analysis and incident management.
• 15 years of overall software development experience.
• 11 years of experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution.
• 7 years of experience in system engineering/architecture.
• 7 years of experience working with products that support highly distributed, massively parallel computation needs, such as HBase, Hadoop, CloudBase/Accumulo, Bigtable, Cassandra, Scality, etc.
• 7 years of experience writing software scripts using scripting languages such as Perl, Python, or Ruby for software automation.
• 3 years of experience managing and monitoring large cloud systems (>200 nodes).
• Cloud Systems Administrator or Developer Certification.
• Experience in performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems, including monitoring the technical health of a system, improving organizational processes, and implementing postmortem (failure) analysis and incident management.
• 7 years of experience in a cleared environment.
• 7 years of demonstrated experience developing software for one of the following: Windows, UNIX, or Linux OS.
• Knowledge and experience with developing distributed storage routing and querying algorithms.
• Experience in developing documentation required to support a program’s technical issues and training situations.
• 7 years of experience developing software systems using object-oriented programming languages (e.g., Java, Python).
• Experience developing solutions integrating and extending COTS products.
• Experience "wrapping" legacy systems or components as Web Services within a SOA framework.
• Demonstrated knowledge of analytical needs and requirements, query syntax, data flows, and traffic manipulation.
• 7 years of experience in developing system performance, availability, scalability, manageability, and security requirements for mid-to-large scale programs.
• Experience designing, developing, testing, evaluating, and integrating information systems into a service-oriented environment.
• Experience optimizing storage, retrieval, backup, and retention strategies across globally distributed, high throughput, text and multimedia storage within clustered or cloud environments.
• A Bachelor's degree in Computer Science or a related technical field is highly desired and will be considered equivalent to 2 years of experience. A Master's degree in a technical field will be considered equivalent to 4 years of experience.
Security Clearance: An active TS/SCI clearance with polygraph is required.
Northrop Grumman is committed to hiring and retaining a diverse workforce. We are proud to be an Equal Opportunity/Affirmative Action Employer, making decisions without regard to race, color, religion, creed, sex, sexual orientation, gender identity, marital status, national origin, age, veteran status, disability, or any other protected class. For our complete EEO/AA and Pay Transparency statement, please visit www.northropgrumman.com/EEO. U.S. Citizenship is required for most positions.