Software Reliability Engineer
Skyroot
About Us
Job Description
The Software Reliability Engineering (SRE) team ensures that all software developed for our space missions, ground infrastructure (checkout systems and simulations), and supporting tools adheres to world-class standards of design assurance, reliability, compliance, and mission readiness, enabling consistent performance across launch vehicles, satellites, and ground control systems.
As the independent authority for software assurance, the team governs development processes, verifies implementation integrity and validates system behaviour under mission conditions. It also leads the design, qualification, and maintenance of software test setups, ensuring that all verification and validation activities are executed in reliable, traceable, and mission-representative environments.
Working closely with the Software Design Team, SRE builds a unified, mission-ready software ecosystem that is robust, traceable, and launch-certified.
Vision: To establish a world-class software assurance framework that guarantees predictable, verified, and validated software performance — from design to orbit — ensuring that every mission is backed by reliability and confidence.
Roles & Responsibilities
1. Software Process & Compliance
· Develop and evolve software development lifecycle processes.
· Establish and monitor configuration management and version control (branching strategy, release tagging, change tracking).
· Oversee toolchain qualification and validation (compilers, linkers, analyzers, CI/CD pipelines).
· Conduct process audits, ensuring full traceability across requirements, design, code, and test artefacts.
· Lead continuous improvement initiatives to enhance software maturity and development efficiency.
· Automate key stages of the software development lifecycle with AI-assisted tools and scripts, including requirements traceability, design documentation, code generation, static analysis, test case generation, and compliance reporting.
· Implement dashboards for live tracking of process compliance, test coverage, and reliability metrics.
· Integrate model-based development and verification (MBD/MBV) to enhance traceability from system models to code and test.
2. Independent Software Verification & Validation (IV&V)
· Develop, maintain, and execute software planning (test plans, verification strategy, process and test reports) for all mission-critical software.
· Perform independent unit, integration, hardware-in-the-loop (HIL), and system-level testing.
· Design, manage, and qualify software test setups, simulators, and automated test frameworks (regression and continuous integration, hardware/integration test automation, etc.) used for verification and validation of flight and ground mission software (excluding ground checkout system software test setups).
· Ensure independent reviews and verifications of requirements, design, and code before release.
· Own test coverage analysis, traceability matrices, and certification documentation.
· Provide final software readiness certification and sign-off for launch.
3. Software Reliability Engineering
· Develop and maintain software reliability models and define quantifiable reliability metrics (e.g., failure rate, MTBF), supported by analyses such as FMECA.
· Conduct robustness, stress, and fault-injection testing to validate fault tolerance.
· Collaborate with design teams on FDIR (Fault Detection, Isolation, and Recovery) strategy validation.
· Analyse test anomalies and post-flight data to drive continual software reliability improvements.
· Establish feedback loops from testing and operations to refine reliability models.
4. Collaboration with Software Design
· Participate in early design and architecture reviews to ensure compliance and testability.
· Maintain bidirectional traceability between requirements, design, and verification artefacts.
· Collaborate with design teams to define “autonomous verification pipelines” that reduce manual validation cycles and increase confidence in critical codebases.
· Collaborate on root-cause analysis and implement corrective actions.
· Collaborate on post-test data review and performance assessment.
· Define handover and acceptance criteria between design and IV&V stages.
· Promote a culture of “Design for Reliability” across all software development phases.