Alert Monitoring
Our monitoring service covers key layers of the technology stack, tracking over a hundred system life points. The platform runs two independent system monitors, one on each compute node, so the monitoring itself is redundant. Monitoring is designed to anticipate key system, KVM cluster, environmental (e.g. power spikes) and other resource issues. When a critical threshold is exceeded, alerts are sent immediately to both the System Administrator and the Alteeve Support Team. Our team then connects to the cluster to quickly determine the cause and follows the issue through to resolution.
The life point data collected and reported by our monitoring service includes:
Compute Node:
- Multiple thermal sensors (indicators of failed HVAC systems)
- Internal cooling systems
- Power rail voltage
- Power draw
- Component fault monitoring
Replicated Node Storage:
- Individual drives (including read errors)
- Member drive environment (indicator of unusual heating patterns)
- Controller, cache and battery/flash
Cluster Stack and Resources:
- Unexpected migrations and server recovery events
- Replication storage events
- Unexpected cluster membership changes
Networking:
- Individual links
- Link speed or duplex changes
- Switches
Power:
- Line distortion
- Over/under voltage events
- Full power failure
- Battery end of life
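
The threshold-based alerting described above can be illustrated with a minimal sketch. It is not the monitoring service's actual implementation: the life point names, the limits and the recipient labels are all hypothetical, chosen only to show a reading being compared against a critical threshold and the resulting alert being addressed to both recipients.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical recipient labels; the text names only the two roles.
RECIPIENTS = ("system-administrator", "alteeve-support")


@dataclass(frozen=True)
class Threshold:
    low: Optional[float] = None   # alert when a reading falls below this
    high: Optional[float] = None  # alert when a reading rises above this


# Illustrative limits only; not taken from any real configuration.
THRESHOLDS = {
    "node.temperature_c": Threshold(high=85.0),
    "power.input_voltage": Threshold(low=108.0, high=132.0),
}


def check_reading(name, value):
    """Return an alert message if the reading breaches its threshold, else None."""
    limit = THRESHOLDS.get(name)
    if limit is None:
        return None  # no threshold defined for this life point
    if limit.high is not None and value > limit.high:
        return f"{name}={value} above {limit.high}"
    if limit.low is not None and value < limit.low:
        return f"{name}={value} below {limit.low}"
    return None


def dispatch_alerts(readings):
    """Return one (recipient, message) pair per recipient for each breached reading."""
    alerts = []
    for name, value in readings.items():
        message = check_reading(name, value)
        if message is not None:
            alerts.extend((recipient, message) for recipient in RECIPIENTS)
    return alerts
```

Calling `dispatch_alerts({"node.temperature_c": 91.0, "power.input_voltage": 120.0})` under these assumed limits yields two alerts for the temperature reading, one per recipient, and none for the voltage reading.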