ISMS Copilot
ISMS documentation

Incident Management and Business Continuity

ISMS Copilot has established incident management and business continuity procedures to ensure rapid detection, containment, and recovery from security incidents or service disruptions. Our approach prioritizes customer data protection and service availability.

Incident response is integrated with our change management process and escalation procedures to ensure coordinated response.

Incident Response Process

Our incident management follows a five-phase approach:

  1. Detection — Monitoring systems, customer reports, or security scans identify potential incidents

  2. Assessment — Incident severity and scope evaluated to determine response level

  3. Containment — Immediate actions taken to limit impact and prevent spread

  4. Recovery — Systems restored to normal operation with fixes deployed

  5. Post-Incident Review — Root cause analysis conducted and preventive measures implemented
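The five phases form a strictly ordered sequence. Below is a minimal Python sketch of that ordering. It assumes a hypothetical `Incident` tracker that only allows forward, one-step transitions. The class and function names are illustrative and are not part of ISMS Copilot's tooling.

```python
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    """The five incident phases, in the order described above."""
    DETECTION = 1
    ASSESSMENT = 2
    CONTAINMENT = 3
    RECOVERY = 4
    POST_INCIDENT_REVIEW = 5


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase after `current`, or None once the review is complete."""
    ordered = list(Phase)  # Enum iteration preserves definition order
    idx = ordered.index(current)
    return ordered[idx + 1] if idx + 1 < len(ordered) else None


class Incident:
    """Hypothetical tracker: starts at Detection and moves forward one phase at a time."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.phase = Phase.DETECTION
        self.history: List[Phase] = [Phase.DETECTION]

    def advance(self) -> Phase:
        nxt = next_phase(self.phase)
        if nxt is None:
            raise ValueError("incident is already closed after post-incident review")
        self.phase = nxt
        self.history.append(nxt)
        return nxt
```

Because phases cannot be skipped, every closed incident carries a full history that ends in a Post-Incident Review entry.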

Roles and Responsibilities

Our incident response team includes defined roles:

  • Incident Commander — CEO leads overall response coordination and stakeholder communication

  • Primary and Secondary On-Call — Technical responders available for rapid assessment and remediation

  • Communication Lead — Manages customer notifications and status updates

For security incidents involving customer data or compliance implications, we escalate to leadership immediately.

Escalation Procedures

Incidents are escalated based on severity and impact:

  • Team coordination via dedicated Slack #incidents channel

  • Leadership notification via email for high-severity incidents

  • Customer communication for service-affecting incidents

  • Regulatory notification where required by GDPR or other applicable regulations and compliance frameworks
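The escalation rules can be expressed as a small routing function. This Python sketch assumes a hypothetical four-level `Severity` scale, because the document does not define severity levels. It treats "high severity" as HIGH or above and also applies the rule that incidents involving customer data go to leadership immediately. Whether a regulator must be notified is an input flag, since that determination (for example under GDPR) is a human judgment that the code does not make. The action strings are illustrative only.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Hypothetical scale; the document does not define severity levels.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class IncidentFacts:
    severity: Severity
    service_affecting: bool
    involves_customer_data: bool
    regulatory_notice_required: bool  # decided by a person, e.g. under GDPR


def escalation_actions(facts: IncidentFacts) -> List[str]:
    """Return the notification channels an incident should reach."""
    actions = ["slack:#incidents"]  # every incident is coordinated in the team channel
    if facts.severity >= Severity.HIGH or facts.involves_customer_data:
        actions.append("email:leadership")
    if facts.service_affecting:
        actions.append("notify:customers")
    if facts.regulatory_notice_required:
        actions.append("notify:regulator")
    return actions
```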

Business Continuity Planning

Beyond incident response, we maintain business continuity procedures including:

  • Backup and disaster recovery capabilities

  • Third-party dependency monitoring and contingency planning

  • Infrastructure redundancy for critical services

  • Data retention and recovery procedures

  • AI provider failover and resilience mechanisms

AI Provider Failover and Resilience

To ensure continuity of AI-powered compliance services during provider outages, ISMS Copilot implements automatic failover mechanisms:

Default Provider Path (Anthropic/OpenAI):

  • Circuit Breaker Monitoring: Real-time health tracking of the primary AI provider (Anthropic Claude), counting 5xx server errors (including 529 "overloaded" responses) and network failures within a sliding window

  • Automatic Failover: When errors within the window exceed a configured threshold, requests are automatically routed to the backup provider (OpenAI) without user intervention

  • Automatic Recovery: System probes primary provider periodically to detect recovery and switch back when healthy

  • User Notification: Persistent banner alerts users during failover events while service continues uninterrupted

  • Provider Selection Bypass: Users who explicitly select specific models (e.g., Gemini, Grok, Mistral) bypass automatic failover—their selection is respected

Automatic failover provides high availability for the majority of users on default provider paths, minimizing disruption during AI provider incidents.
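The circuit breaker pattern described above can be sketched in a few dozen lines of Python. This is a minimal, single-threaded illustration and not ISMS Copilot's implementation. The window length, failure limit, cooldown, and provider identifier strings are all hypothetical values. Failures are counted in a time-based sliding window. Once they exceed the limit, the circuit opens and traffic goes to the backup. After a cooldown, a single probe request is allowed through to the primary, and a successful probe closes the circuit again.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


def is_provider_failure(status: Optional[int]) -> bool:
    """5xx responses (529 "overloaded" included) and network failures (None) count."""
    return status is None or 500 <= status <= 599


class CircuitBreaker:
    """Sliding-window breaker; record only outcomes of calls made to the primary."""

    def __init__(self, window_seconds: float = 60.0, max_failures: int = 5,
                 cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds      # hypothetical default
        self.max_failures = max_failures          # hypothetical default
        self.cooldown_seconds = cooldown_seconds  # hypothetical default
        self.clock = clock
        self.failures: Deque[float] = deque()     # timestamps of recent failures
        self.opened_at: Optional[float] = None    # None means closed (primary in use)

    def is_open(self) -> bool:
        return self.opened_at is not None

    def record_failure(self) -> None:
        now = self.clock()
        if self.is_open():
            self.opened_at = now  # a failed probe restarts the cooldown
            return
        self.failures.append(now)
        # Slide the window: forget failures older than window_seconds.
        while now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if len(self.failures) > self.max_failures:
            self.opened_at = now  # threshold exceeded: fail over to the backup

    def record_success(self) -> None:
        if self.is_open():
            # A successful probe means the primary recovered: close and reset.
            self.opened_at = None
            self.failures.clear()

    def try_acquire_probe(self) -> bool:
        """After the cooldown, let exactly one request test the primary."""
        now = self.clock()
        if self.opened_at is not None and now - self.opened_at >= self.cooldown_seconds:
            self.opened_at = now  # restart cooldown so only one probe per interval
            return True
        return False


def choose_provider(breaker: CircuitBreaker, primary: str = "anthropic",
                    backup: str = "openai") -> str:
    if not breaker.is_open():
        return primary
    if breaker.try_acquire_probe():
        return primary  # recovery probe
    return backup
```

Injecting `clock` lets the breaker be tested with a fake time source. A breaker shared across concurrent requests would also need locking, which this sketch omits.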

Advanced Data Protection Mode (EU-Only via Mistral):

  • No Failover Available: Users with Advanced Data Protection enabled (EU-only processing) use Mistral AI exclusively

  • Single Provider Limitation: Mistral is currently our only EU-based provider with zero-retention agreements, so no EU backup exists

  • Service Impact: Mistral outages may cause service disruption for EU-only users until the provider recovers

  • Trade-off Rationale: EU-only mode prioritizes data sovereignty and zero retention over failover resilience

  • Future Enhancement: We are actively working to add a second EU provider to enable failover for Advanced Data Protection users

Organizations choosing Advanced Data Protection Mode accept this availability trade-off in exchange for strict EU data residency and zero AI provider retention. For critical uptime requirements, evaluate whether default mode (with automatic failover but US processing) is acceptable for your compliance posture.
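The order of the routing checks matters. The EU-only constraint has to take precedence over every other rule, because routing to the default path would move processing out of the EU. This is an assumption drawn from the statement that these users "use Mistral AI exclusively". The Python sketch below uses a hypothetical `route_request` function and illustrative provider identifier strings. It is not ISMS Copilot's actual router.

```python
from typing import Optional

PRIMARY, BACKUP, EU_ONLY = "anthropic", "openai", "mistral"  # illustrative identifiers


def route_request(advanced_data_protection: bool,
                  explicit_provider: Optional[str],
                  primary_healthy: bool) -> str:
    """Pick a provider following the rules above (hypothetical function)."""
    if advanced_data_protection:
        # EU-only mode always uses Mistral. There is deliberately no health
        # check: with no EU backup, falling back would mean leaving the EU.
        return EU_ONLY
    if explicit_provider is not None:
        # An explicitly selected model (e.g. Gemini) bypasses automatic failover.
        return explicit_provider
    # Default path: primary when healthy, otherwise the backup.
    return PRIMARY if primary_healthy else BACKUP
```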

Monitoring and Transparency:

  • Provider health metrics are monitored continuously via circuit breaker instrumentation

  • Failover events are logged and reviewed in post-incident analysis

  • Status page communications inform users of ongoing provider incidents

  • Circuit breaker status is exposed via internal monitoring endpoint for operational visibility
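Logging failover events and exposing breaker status can share one small state object. The Python sketch below is a minimal illustration under stated assumptions. The logger name, the "open"/"closed" state labels, and the JSON field names are all hypothetical. `status_json` returns the kind of body an internal monitoring endpoint could serve, and the HTTP server itself is left out.

```python
import json
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

logger = logging.getLogger("provider_failover")  # hypothetical logger name


@dataclass
class FailoverEvent:
    provider: str
    state: str  # "open" = failed over to backup, "closed" = back on primary
    at: str     # ISO 8601 UTC timestamp


@dataclass
class ProviderHealth:
    provider: str
    state: str = "closed"
    events: List[FailoverEvent] = field(default_factory=list)

    def transition(self, new_state: str) -> None:
        """Record and log a state change; repeating the current state is ignored."""
        if new_state == self.state:
            return
        event = FailoverEvent(self.provider, new_state,
                              datetime.now(timezone.utc).isoformat())
        self.events.append(event)  # retained for post-incident review
        logger.warning("circuit %s for provider %s at %s",
                       new_state, self.provider, event.at)
        self.state = new_state

    def status_json(self) -> str:
        """Body a hypothetical internal monitoring endpoint could return."""
        last = self.events[-1].at if self.events else None
        return json.dumps({"provider": self.provider, "state": self.state,
                           "failover_events": len(self.events),
                           "last_change": last})
```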

Incidents introduced by a deployment trigger our change management rollback procedures, and the incident is still fully documented for review.

Documentation and Learning

Every incident generates documentation including timeline, impact assessment, root cause, and preventive actions. These learnings feed back into our risk register and threat prevention planning.
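Below is a minimal Python sketch of an incident record that holds the four sections listed above. It flags any section left empty and turns preventive actions into risk register entries. The field names, the completeness check, and the one-entry-per-action mapping are illustrative assumptions, not ISMS Copilot's actual record format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class IncidentRecord:
    title: str
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)
    impact_assessment: str = ""
    root_cause: str = ""
    preventive_actions: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Name each required section that is still empty."""
        sections = {
            "timeline": self.timeline,
            "impact_assessment": self.impact_assessment,
            "root_cause": self.root_cause,
            "preventive_actions": self.preventive_actions,
        }
        return [name for name, value in sections.items() if not value]


def to_risk_register_entries(record: IncidentRecord) -> List[Dict[str, str]]:
    """Hypothetical mapping: one register entry per preventive action."""
    return [{"source_incident": record.title,
             "risk": record.root_cause,
             "treatment": action}
            for action in record.preventive_actions]
```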

Our incident management procedures align with our overall ISMS framework and support SOC 2, ISO 27001, and NIST compliance requirements.
