Incident Management and Business Continuity

ISMS Copilot maintains incident management and business continuity procedures to detect, contain, and recover from security incidents and service disruptions quickly. Our approach prioritizes customer data protection and service availability.

Incident response is integrated with our change management process and escalation procedures so that the response stays coordinated.

Incident Response Process

Our incident management follows a five-phase approach:

Detection — Monitoring systems, customer reports, or security scans identify potential incidents
Assessment — Incident severity and scope evaluated to determine response level
Containment — Immediate actions taken to limit impact and prevent spread
Recovery — Systems restored to normal operation with fixes deployed
Post-Incident Review — Root cause analysis conducted and preventive measures implemented
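
The five phases above run in a fixed order. As a minimal illustration of that ordering, the following Python sketch models the phases as an enum. The names `Phase` and `next_phase` are hypothetical and are not part of ISMS Copilot's tooling.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The five incident phases, in the order listed above."""
    DETECTION = 1
    ASSESSMENT = 2
    CONTAINMENT = 3
    RECOVERY = 4
    POST_INCIDENT_REVIEW = 5


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the final review."""
    if current is Phase.POST_INCIDENT_REVIEW:
        return None
    return Phase(current.value + 1)
```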

Roles and Responsibilities

Our incident response team includes defined roles:

Incident Commander — CEO leads overall response coordination and stakeholder communication
Primary and Secondary On-Call — Technical responders available for rapid assessment and remediation
Communication Lead — Manages customer notifications and status updates

For security incidents involving customer data or compliance implications, we escalate to leadership immediately.

Escalation Procedures

Incidents are escalated based on severity and impact, through the following channels:

Team coordination via dedicated Slack #incidents channel
Leadership notification via email for high-severity incidents
Customer communication for service-affecting incidents
Regulatory notification if required by GDPR or other compliance frameworks
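
One way to picture how these channels combine is a small routing function. This is only a sketch under stated assumptions: the `Incident` fields and the rule that every incident uses the Slack channel are illustrative, since the text above names the channels but does not define a schema or severity scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    # All fields are illustrative assumptions, not a documented schema.
    high_severity: bool
    service_affecting: bool
    regulatory_notification_required: bool  # e.g. GDPR applies


def escalation_steps(incident: Incident) -> List[str]:
    """Return the escalation channels that apply, in the order listed above."""
    # Assumption: team coordination in Slack happens for every incident.
    steps = ["Slack #incidents channel"]
    if incident.high_severity:
        steps.append("leadership email")
    if incident.service_affecting:
        steps.append("customer communication")
    if incident.regulatory_notification_required:
        steps.append("regulatory notification")
    return steps
```

For example, a high-severity outage with no regulatory impact would yield the Slack channel, leadership email, and customer communication, in that order.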

Business Continuity Planning

Beyond incident response, we maintain business continuity procedures including:

Backup and disaster recovery capabilities
Third-party dependency monitoring and contingency planning
Infrastructure redundancy for critical services
Data retention and recovery procedures
AI provider failover and resilience mechanisms

AI Provider Failover and Resilience

To ensure continuity of AI-powered compliance services during provider outages, ISMS Copilot implements automatic failover mechanisms:

Default Provider Path (Anthropic/OpenAI):

Circuit Breaker Monitoring: Real-time health tracking of the primary AI provider (Anthropic Claude) counts 5xx errors, 529 overloaded responses, and network failures in a sliding window
Automatic Failover: When errors exceed the threshold, requests automatically route to the backup provider (OpenAI) without user intervention
Automatic Recovery: The system periodically probes the primary provider and switches back once it is healthy
User Notification: A persistent banner alerts users during failover events while service continues uninterrupted
Provider Selection Bypass: Users who explicitly select a specific model (e.g., Gemini, Grok, Mistral) bypass automatic failover; their selection is respected

Automatic failover provides high availability for the majority of users on default provider paths, minimizing disruption during AI provider incidents.
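
The circuit breaker behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not the production implementation: the class name, the threshold, the window length, and the probe interval are all assumed values chosen for the example.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class CircuitBreaker:
    """Sliding-window circuit breaker for a primary provider.

    All thresholds and timings are placeholder assumptions.
    """

    def __init__(
        self,
        error_threshold: int = 5,
        window_seconds: float = 60.0,
        probe_interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.error_threshold = error_threshold
        self.window_seconds = window_seconds
        self.probe_interval = probe_interval
        self._clock = clock
        self._errors: Deque[float] = deque()  # timestamps of recent failures
        self._opened_at: Optional[float] = None  # None means primary is healthy

    def record_failure(self) -> None:
        """Count a failed primary call; open the circuit past the threshold."""
        now = self._clock()
        if self._opened_at is not None:
            # A failed probe restarts the wait before the next probe.
            self._opened_at = now
            return
        self._errors.append(now)
        # Drop failures that have slid out of the window.
        while self._errors and now - self._errors[0] > self.window_seconds:
            self._errors.popleft()
        if len(self._errors) > self.error_threshold:
            self._opened_at = now

    def record_success(self) -> None:
        """A successful probe closes the circuit and resets the window."""
        if self._opened_at is not None:
            self._opened_at = None
            self._errors.clear()

    def use_primary(self) -> bool:
        """True if the next request should go to the primary provider."""
        if self._opened_at is None:
            return True
        # While open, allow a probe once the interval has elapsed.
        # Simplification: every request after the interval acts as a probe
        # until a success or failure is recorded.
        return self._clock() - self._opened_at >= self.probe_interval
```

A caller would check `use_primary()` before each request, send the request to the backup provider when it returns False, and report the outcome of primary calls through `record_success()` or `record_failure()`. Injecting the clock keeps the sketch testable without real waiting.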

Advanced Data Protection Mode (EU-Only via Mistral):

No Failover Available: Users with Advanced Data Protection enabled (EU-only processing) use Mistral AI exclusively
Single Provider Limitation: Mistral is currently our only EU-based provider with zero-retention agreements, so no EU backup exists
Service Impact: Mistral outages may cause service disruption for EU-only users until provider recovers
Trade-off Rationale: EU-only mode prioritizes data sovereignty and zero retention over failover resilience
Future Enhancement: We are actively working to add a second EU provider to enable failover for Advanced Data Protection users

Organizations choosing Advanced Data Protection Mode accept this availability trade-off in exchange for strict EU data residency and zero AI provider retention. For critical uptime requirements, evaluate whether default mode (with automatic failover but US processing) is acceptable for your compliance posture.
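
To make the difference between the two paths concrete, the following sketch shows one possible provider selection rule. It is hypothetical: the function name, the string identifiers, and the assumption that Advanced Data Protection takes precedence over an explicit model selection are illustrative and not confirmed by the text above.

```python
from typing import Optional

PRIMARY = "anthropic"
BACKUP = "openai"
EU_ONLY = "mistral"


def choose_provider(
    advanced_data_protection: bool,
    selected_provider: Optional[str],
    primary_healthy: bool,
) -> str:
    """Pick a provider for one request."""
    if advanced_data_protection:
        # EU-only mode: Mistral exclusively, no failover.
        return EU_ONLY
    if selected_provider is not None:
        # An explicit user selection bypasses automatic failover.
        return selected_provider
    # Default path: fail over to the backup when the primary is unhealthy.
    return PRIMARY if primary_healthy else BACKUP
```

Note that in EU-only mode the health of the primary provider is never consulted, which is the availability trade-off described above.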

Monitoring and Transparency:

Provider health metrics are monitored continuously via circuit breaker instrumentation
Failover events are logged and reviewed in post-incident analysis
Status page communications inform users of ongoing provider incidents
Circuit breaker status is exposed via internal monitoring endpoint for operational visibility

Incidents that follow a deployment trigger our change management rollback procedures, and the incident record is retained for review.

Documentation and Learning

Every incident generates documentation including timeline, impact assessment, root cause, and preventive actions. These learnings feed back into our risk register and threat prevention planning.
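
The four required sections lend themselves to a simple completeness check. The sketch below assumes a hypothetical `IncidentRecord` structure; the field names mirror the sections named above, but the class itself is illustrative and does not describe an actual internal format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentRecord:
    """Hypothetical record holding the four sections every incident documents."""
    timeline: List[str] = field(default_factory=list)
    impact_assessment: str = ""
    root_cause: str = ""
    preventive_actions: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Names of the sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]
```

A review step could refuse to close an incident until `missing_sections()` returns an empty list.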

Our incident management procedures align with our overall ISMS framework and support SOC 2, ISO 27001, and NIST compliance requirements.
