SRE Business Analyst/Scrum Master
Description
The NMCI Service Management Integration and Transport (SMIT) group at Leidos has an opening for a Site Reliability Engineering (SRE) Scrum Master to bridge the gap between business objectives and technical requirements for the SRE teams. Under the SMIT Contract, the Leidos team is responsible for the core backbone of the Navy-Marine Corps Intranet, including cybersecurity services, network operations, network engineering, service desk, seat support services, and data transport.
The SRE Scrum Master will work closely with internal and customer stakeholders across the engineering organization to understand business needs, gather requirements, and translate them into technical specifications that drive system reliability and performance. You will analyze existing processes, identify improvement opportunities, and support the implementation of solutions that align with the organization's goals for reliability and operational excellence. Your work will contribute to the development of robust and scalable services that operate reliably in production.
Primary Responsibilities:
Requirements Gathering and Analysis:
- Collaborate with business stakeholders to identify and understand their needs and expectations regarding system reliability and performance.
- Gather and document functional and non-functional requirements for SRE initiatives, ensuring alignment with business objectives.
- Analyze existing processes and systems to identify gaps and areas for improvement in reliability and operational efficiency.
- Collect, analyze, and interpret data related to system performance, incident management, and user experience.
- Work with our quality team to develop dashboards and reports that provide insight into system reliability, performance metrics, and key performance indicators (KPIs) relevant to SRE efforts.
- Present findings and recommendations to stakeholders, enabling data-driven decision-making for reliability initiatives.
- Work closely with SRE, development, and operations teams to translate business requirements into technical specifications and actionable tasks.
- Facilitate communication between technical teams and business stakeholders to ensure clarity and alignment with designated SRE teams.
- Participate in design reviews and assist in validating that solutions meet business requirements and reliability standards.
- Identify opportunities for process improvements that enhance the reliability and efficiency of systems and workflows.
- Lead initiatives to implement best practices in incident management, change management, and other operational processes to minimize downtime and enhance service quality.
- Collaborate with teams to establish and refine service level objectives (SLOs) and service level indicators (SLIs) that reflect business priorities.
- Create and maintain documentation related to business requirements, process flows, and technical specifications for SRE teams in Jira and/or Azure DevOps.
- Develop training materials and conduct training sessions for stakeholders to promote understanding of SRE practices and tools.
- Stay up to date with industry trends and best practices related to site reliability engineering, data analysis, and process improvement.
- Participate in continuous improvement efforts, contributing to a culture of learning and innovation within the SRE team.
Basic Qualifications:
- BS degree in IT, software development, or a related technical field and 4+ years of experience as a Business Analyst within a technical or software engineering environment.
- Active DoD Secret security clearance.
- Experience with Agile and Scrum methodologies and tools such as Jira, Confluence, Trello, or Azure DevOps.
- Strong understanding of site reliability engineering or DevSecOps principles, practices, and methodologies.
- Familiarity with monitoring and observability tools used in SRE.
- Strong analytical and problem-solving skills, with the ability to synthesize complex information and provide actionable insights.
- Ability to evaluate and prioritize business needs and align them with technical capabilities.
- Skilled at working with geographically distributed teams.
- Excellent communication skills, both written and verbal, with the ability to convey technical concepts to non-technical stakeholders.
- Proven ability to collaborate effectively with cross-functional teams and build strong relationships with stakeholders.
Preferred Qualifications:
- Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge.
- Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.).
- Knowledge of the Risk Management Framework (RMF) and DISA STIGs.
- Scaled Agile Framework (SAFe) Scrum Master certification or a similar Scrum Master certification in a related Agile field.
- Experience in the software development lifecycle (SDLC) and understanding of DevOps practices.
- Knowledge of incident management and service reliability best practices.
Expected Outcomes:
- Improved system reliability, as measured by adherence to Service Level Objectives (SLOs) and reduced Mean Time to Recovery (MTTR).
- Comprehensive and regularly updated automated test coverage for all critical systems and infrastructure components.
- Timely identification and resolution of performance bottlenecks and failure points.
- Increased scalability and performance of systems under high load due to effective performance testing.
Location: Kenya
Remote status: Fully Remote