Overview

L2 Disaster Recovery Engineer Jobs in Johannesburg at AKB Technology

Part-Time (80 hours/month) contract.

L2 Disaster Recovery Engineer

(Process Resilience & DR Operations – Top 100 Business Processes)

Role Overview

Position Summary

We are seeking a highly structured and technically capable Level 2 (L2) Disaster Recovery Engineer to support and maintain the organisation’s Disaster Recovery (DR) readiness across its Top 100 critical business processes.

The role focuses on documenting, mapping, testing, and maintaining recovery procedures for the organisation’s most critical operational and technology processes. The engineer will ensure that DR documentation is accurate, that recovery procedures are validated through testing, and that systems supporting critical services can be recovered within the agreed RTO (Recovery Time Objective) and RPO (Recovery Point Objective) targets.

This role works closely with the Infrastructure, Security, Service Desk, and Applications teams and with Business Process Owners to ensure operational resilience and business continuity.

Core Responsibilities

1. Disaster Recovery Process Mapping (Top 100 Processes)

  • Identify and maintain documentation for the Top 100 critical operational and technical business processes.
  • Map dependencies between applications, infrastructure, databases, networks, and services supporting each process.
  • Document process flows, technical dependencies, recovery steps, and escalation paths.
  • Maintain a Process-to-System Dependency Map.
  • Work with business units to ensure accurate recovery prioritisation.

2. Disaster Recovery Documentation & Runbooks

  • Develop and maintain Disaster Recovery Runbooks for all Tier 1 and Tier 2 systems.
  • Document step-by-step recovery procedures, including:
      • Server recovery
      • Database recovery
      • Network recovery
      • Application recovery
      • User access restoration
  • Maintain technical architecture diagrams supporting recovery plans.
  • Ensure documentation is updated after changes, upgrades, or infrastructure modifications.

3. DR Infrastructure Support

Support and maintain DR capabilities across key platforms including:

  • Backup platforms (e.g., Veeam or equivalent)
  • Replication technologies
  • Virtualisation platforms (VMware / Hyper-V / Nutanix)
  • Storage replication
  • Network failover configurations
  • Cloud-based DR environments

Responsibilities include:

  • Monitoring DR replication status
  • Verifying backup integrity
  • Ensuring DR environments remain operational and ready for activation

4. DR Testing & Simulation

  • Plan and execute regular DR test exercises.
  • Validate recovery procedures for critical systems.
  • Conduct failover and failback testing where applicable.
  • Document test results and improvement actions.
  • Work with service owners to ensure recovery objectives are achievable.

Testing types include:

  • Tabletop exercises
  • Partial DR recovery tests
  • Full DR simulation exercises

5. Incident Support & Recovery Coordination

  • Act as L2 technical support during major incidents requiring DR activation.
  • Assist in failover procedures and service restoration.
  • Coordinate with infrastructure and application teams during recovery.
  • Ensure communication flows correctly during recovery events.

6. Business Continuity Alignment

  • Work closely with Business Continuity and Risk teams.
  • Ensure DR plans align with Business Continuity Plans (BCP).
  • Assist business units in identifying:
      • Critical systems
      • Recovery priorities
      • Operational recovery requirements

7. Compliance, Governance & Audit Support

Ensure DR practices comply with organisational and regulatory requirements including:

  • ISO 27001
  • ITIL Service Continuity Management
  • POPIA data protection requirements

Responsibilities include:

  • Maintaining DR evidence for audits
  • Ensuring documentation standards are followed
  • Supporting internal and external audit requests.

8. Continuous Improvement

  • Identify gaps in recovery capability.
  • Recommend improvements to DR architecture.
  • Work with infrastructure teams to improve resilience and failover capability.
  • Assist with developing automation for recovery processes where possible.

Deliverables

  • Complete Top 100 Business Process DR Mapping Document.
  • Fully documented Disaster Recovery Runbooks.
  • Quarterly DR Testing Report.
  • Updated System Dependency Maps.
  • Recovery readiness reports for management.
  • DR risk and improvement recommendations.

Required Experience & Skills

  • 3–6 years of experience in Infrastructure, DR, or Systems Engineering.
  • Strong understanding of:
      • Virtualisation platforms (VMware / Hyper-V / Nutanix)
      • Backup solutions (Veeam or similar)
      • Enterprise storage platforms
      • Windows and Linux server environments
      • Network fundamentals
  • Experience with Disaster Recovery planning and testing.
  • Ability to document technical recovery procedures clearly.
  • Strong analytical and troubleshooting skills.

Preferred Qualifications

  • ITIL Foundation certification
  • Disaster Recovery or Business Continuity certification
  • VMware / Microsoft infrastructure certifications
  • Experience working in Managed Service Provider (MSP) environments

Key Performance Indicators (KPIs)

  • 100% documentation of Top 100 critical processes.
  • DR runbooks maintained and updated quarterly.
  • Successful DR testing with documented recovery times.
  • Reduction in recovery gaps and risk exposure.
  • Compliance with internal governance and audit requirements.

Expanded Key Performance Indicators (KPIs)

L2 Disaster Recovery Engineer – Top 100 Business Processes

1. Documentation of Top 100 Critical Processes

Objective

Ensure that the organisation has complete visibility of all critical operational and technical processes required to recover the business during a disaster event.

Measurement Criteria

  • 100% of the Top 100 critical business processes identified and documented.
  • Each process must include:
      • Process description
      • Business owner
      • Supporting applications
      • Supporting infrastructure
      • Dependencies (network, storage, identity services)
      • Recovery sequence
      • RTO and RPO targets
      • Escalation contacts

Deliverables

  • Top 100 Process Catalogue
  • Process Dependency Map
  • Recovery Priority Matrix
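To illustrate how a Process Dependency Map can drive the recovery sequence, the sketch below orders systems so that each one is restored only after everything it depends on. It is a minimal example under stated assumptions, not a prescribed tool: the system names and the dictionary layout of the map are hypothetical, and the ordering uses a topological sort from Python’s standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical Process-to-System Dependency Map: each system maps to the set
# of systems that must be recovered before it. Names are illustrative only.
DEPENDENCIES = {
    "core-network": set(),
    "storage": {"core-network"},
    "identity": {"core-network"},
    "database": {"storage", "identity"},
    "erp-app": {"database", "identity"},
    "user-access": {"erp-app", "identity"},
}


def recovery_waves(deps):
    """Return recovery waves: each wave depends only on earlier waves,
    so the systems within one wave could be restored in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError on a circular dependency
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    try:
        for number, wave in enumerate(recovery_waves(DEPENDENCIES), start=1):
            print(f"Wave {number}: {', '.join(wave)}")
    except CycleError as err:
        # The second element of err.args holds the nodes forming the cycle.
        print(f"Circular dependency found: {err.args[1]}")
```

Grouping systems into waves shows which recoveries could run in parallel, and a circular dependency surfaces as an error that can be logged as a recovery gap.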

KPI Target

  • 100% completion within the first 90 days of the role

Reporting

  • Monthly progress report to Operations Management.

2. Disaster Recovery Runbooks Maintained and Updated

Objective

Ensure all critical systems have step-by-step recovery procedures that engineers can follow during a disaster.

Measurement Criteria

Each DR runbook must contain:

  • Recovery steps
  • Required credentials / access levels
  • Infrastructure dependencies
  • Estimated recovery time
  • Validation steps after recovery

Runbooks must be updated whenever:

  • Infrastructure changes occur
  • Software versions change
  • Systems are migrated
  • New dependencies are introduced

KPI Target

  • 100% of Tier 1 systems documented
  • 95% of Tier 2 systems documented
  • Runbooks reviewed quarterly

Deliverables

  • DR Runbook Repository
  • Infrastructure Recovery Procedures
  • Application Recovery Procedures

Reporting

Quarterly report showing:

  • Runbooks reviewed
  • Runbooks updated
  • Systems missing recovery documentation

3. Disaster Recovery Testing & Validation

Objective

Validate that documented recovery procedures actually work and meet defined recovery targets.

Testing Types

  • Tabletop simulation
  • Partial infrastructure recovery
  • Full failover testing

Measurement Criteria

Each DR test must measure:

  • Recovery Time Objective (RTO)
  • Recovery Point Objective (RPO)
  • Service restoration validation

KPI Target

  • Minimum 2 DR test exercises per year for critical systems
  • 100% documentation of DR test results
  • Recovery targets achieved for 90% of tested systems
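As an illustration of how test results could be scored against these targets, the following sketch treats a system as passing only when post-recovery validation succeeded, the measured recovery time is within its RTO, and the measured data loss is within its RPO. The record fields, minute-based units, and sample figures are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DrTestResult:
    """One system's DR test outcome (hypothetical record layout, minutes)."""
    system: str
    rto_minutes: float        # agreed Recovery Time Objective
    rpo_minutes: float        # agreed Recovery Point Objective
    recovery_minutes: float   # measured time until service was restored
    data_loss_minutes: float  # measured gap between last recoverable data and the failure
    service_validated: bool   # post-recovery validation steps passed

    def meets_targets(self) -> bool:
        return (
            self.service_validated
            and self.recovery_minutes <= self.rto_minutes
            and self.data_loss_minutes <= self.rpo_minutes
        )


def target_achievement_rate(results):
    """Percentage of tested systems that met both their RTO and RPO."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if r.meets_targets())
    return 100.0 * passed / len(results)


if __name__ == "__main__":
    # Sample figures are invented for illustration.
    results = [
        DrTestResult("erp-app", 240, 15, 180, 10, True),
        DrTestResult("database", 120, 5, 95, 5, True),
        DrTestResult("file-server", 480, 60, 510, 30, True),
        DrTestResult("email", 240, 30, 60, 0, True),
    ]
    rate = target_achievement_rate(results)
    failed = [r.system for r in results if not r.meets_targets()]
    print(f"Targets met for {rate:.1f}% of tested systems (KPI target: 90%)")
    print(f"Add to Improvement Action Register: {failed}")
```

In this invented sample the file server exceeds its RTO (510 minutes against 480), so three of four systems pass, giving 75%, below the 90% target.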

Deliverables

  • DR Test Plan
  • DR Test Results Report
  • Improvement Action Register

Reporting

Quarterly DR readiness report to management.

4. Reduction in Recovery Gaps & Risk Exposure

Objective

Continuously improve the organisation’s resilience and recovery capability by identifying weaknesses in systems, processes, or infrastructure.

Measurement Criteria

Identify and track:

  • Systems without DR coverage
  • Infrastructure without replication or backup
  • Single points of failure
  • Missing documentation
  • Recovery steps that fail during testing

KPI Target

  • All critical recovery gaps identified and documented within the first 90 days
  • At least a 50% reduction in identified DR risks within the first 12 months

Deliverables

  • DR Risk Register
  • Infrastructure Resilience Improvement Plan
  • DR Gap Analysis Report

Reporting

  • Monthly risk reduction report.

5. Governance, Compliance & Audit Readiness

Objective

Ensure DR capabilities align with regulatory, security, and operational standards, particularly for organisations operating under governance frameworks.

Compliance Areas

  • ISO 27001 (Business Continuity & Disaster Recovery controls)
  • ITIL Service Continuity Management
  • Internal operational governance standards
  • Customer contractual SLAs

Measurement Criteria

  • DR documentation available for audit
  • Evidence of DR testing
  • Recovery objectives defined and approved
  • DR procedures aligned with business continuity plans

KPI Target

  • Zero audit findings related to DR documentation or recovery readiness
  • 100% DR documentation stored in controlled repository

Deliverables

  • DR Compliance Evidence Pack
  • Audit Response Documentation
  • DR Governance Reports

Reporting

Audit readiness report submitted annually or when required.

Optional KPI (Highly Recommended)

6. Disaster Recovery Readiness Score

A maturity metric that measures overall readiness.

Components include:

  • DR documentation coverage
  • Runbook completeness
  • Recovery test success rate
  • Infrastructure redundancy
  • Backup integrity

KPI Target

  • Maintain a DR Readiness Score of ≥ 85%
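The posting lists the score’s components but not how they are combined. One possible approach, sketched here purely as an assumption, expresses each component as a percentage (0–100) and takes a weighted average; equal weights are used only as a placeholder.

```python
# Component names follow the posting; the equal weights and the 0-100 scale
# are assumptions, because the posting does not define how they combine.
WEIGHTS = {
    "documentation_coverage": 1.0,
    "runbook_completeness": 1.0,
    "test_success_rate": 1.0,
    "infrastructure_redundancy": 1.0,
    "backup_integrity": 1.0,
}
TARGET = 85.0  # KPI target: DR Readiness Score >= 85%


def readiness_score(components, weights=WEIGHTS):
    """Weighted average of component percentages (0-100)."""
    missing = sorted(set(weights) - set(components))
    if missing:
        raise ValueError(f"Missing components: {missing}")
    for name in weights:
        if not 0.0 <= components[name] <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100")
    total_weight = sum(weights.values())
    return sum(components[n] * w for n, w in weights.items()) / total_weight


if __name__ == "__main__":
    sample = {  # invented values for illustration
        "documentation_coverage": 100.0,
        "runbook_completeness": 90.0,
        "test_success_rate": 80.0,
        "infrastructure_redundancy": 70.0,
        "backup_integrity": 95.0,
    }
    score = readiness_score(sample)
    status = "meets" if score >= TARGET else "is below"
    print(f"DR Readiness Score: {score:.1f}% ({status} the {TARGET:.0f}% target)")
```

With these sample values the score is 87.0, above the 85% target. Changing the weights would let the organisation emphasise, for example, backup integrity over documentation coverage.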

Work Location: In person
