Alert Monitoring

Our monitoring service covers the key layers of the technology stack, tracking over a hundred system life points. The platform runs two independent system monitors, one on each compute node, so the monitoring itself is redundant. Monitoring anticipates key system, KVM cluster, environmental (e.g. power spikes) and other resource issues. When a critical threshold is exceeded, alerts are sent immediately to both the System Administrator and the Alteeve Support Team. Our team then connects to the cluster to quickly determine the cause and follows the issue through to resolution.

The life point data collected and reported by our monitoring service includes:

Compute Node:

Multiple thermal sensors (indicators of failed HVAC systems)
Internal cooling systems
Power rail voltage
Power draw
Component fault monitoring

Replicated Node Storage:

Individual drives (including read errors)
Member drive environment (indicator of unusual heating patterns)
Controller, cache and battery/flash

Cluster Stack and Resources:

Unexpected migrations and server recovery events
Replication storage events
Unexpected cluster membership changes

Networking:

Individual links
Link speed or duplex changes
Switches

Power:

Line distortion
Over/under voltage events
Full power failure
Battery end of life
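
As a rough illustration of the threshold-based alerting described above, the following Python sketch checks a single reading against a critical range and, if the range is exceeded, produces one alert for each of the two recipients. It is not Alteeve's implementation: the recipient identifiers, life point names, threshold values and the `check_reading` function are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass

# Hypothetical recipient identifiers; the text names the two roles only.
RECIPIENTS = ("system-administrator", "alteeve-support")


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Illustrative limits only; these are assumptions, not real product values.
THRESHOLDS = {
    "node1.temperature_c": Threshold(low=5.0, high=45.0),
    "ups1.input_voltage": Threshold(low=100.0, high=130.0),
}


def check_reading(name, value):
    """Return one (recipient, message) alert per recipient when the
    reading falls outside its critical range, otherwise an empty list."""
    limit = THRESHOLDS.get(name)
    if limit is None or limit.low <= value <= limit.high:
        return []
    message = f"{name}={value} outside critical range [{limit.low}, {limit.high}]"
    return [(recipient, message) for recipient in RECIPIENTS]
```

Calling `check_reading("node1.temperature_c", 60.0)` would yield two alerts, one per recipient, while a reading inside the range yields none. In a redundant setup like the one described, each compute node's monitor would run a check of this kind independently.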