Availability Services (AvS)
Librato Availability Services (AvS) is a comprehensive and robust checkpoint/restart solution for serial jobs and for parallel jobs that use the Message Passing Interface (MPI) for interprocess communication. With AvS, IT managers can maximize job throughput while retaining flexibility and gaining peace of mind. Flexibility comes from the ability to preempt low-priority jobs and migrate them across nodes as needed, without any application or operating system modifications. Simply put, AvS integrates seamlessly into an existing cluster with minimal configuration, performance overhead, and disruption.
Peace of mind comes from knowing that even in the event of a system failure (e.g., a hardware malfunction or software crash), precious time and compute cycles will not be wasted, because AvS takes periodic checkpoints of each job. Restarting a job is simply a matter of rolling back to the most recent checkpoint and either restarting it on the same node (assuming the hardware or software problem has been resolved) or migrating it to another compute node.
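
The checkpoint-and-rollback cycle described above can be illustrated with a small toy model. This is only a conceptual sketch and not how AvS works internally: AvS checkpoints jobs transparently, without application changes, whereas this example saves state from inside the program. The names run_job, store, checkpoint_every, and fail_at are hypothetical and chosen for illustration only.

```python
import pickle


def run_job(total_steps, checkpoint_every, store, fail_at=None):
    """Toy job that sums 0..total_steps-1 and checkpoints periodically.

    `store` is a plain dict standing in for checkpoint storage that
    survives a failure (an assumption made for this illustration).
    """
    # Roll back to the most recent checkpoint if one exists, else start fresh.
    if "ckpt" in store:
        state = pickle.loads(store["ckpt"])
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        # Simulate a system failure part-way through the job.
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")

        state["acc"] += state["step"]
        state["step"] += 1

        # Take a periodic checkpoint of the whole job state.
        if state["step"] % checkpoint_every == 0:
            store["ckpt"] = pickle.dumps(state)

    return state["acc"]


def run_with_restart(total_steps, checkpoint_every, fail_at):
    """Run the job, let it fail once, then restart it from the checkpoint."""
    store = {}
    try:
        return run_job(total_steps, checkpoint_every, store, fail_at=fail_at)
    except RuntimeError:
        # Restart (on the same or another node): only the work done since
        # the last checkpoint is repeated.
        return run_job(total_steps, checkpoint_every, store)
```

In this sketch, a failure costs at most the work done since the last checkpoint, which is the trade-off that the checkpoint interval controls.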

Key Features
- Robust checkpoint/restart solution
- Completely transparent to application and operating system
- Integrates seamlessly with popular queuing systems such as Platform LSF and PBS Pro
- Less than 5% performance overhead
- Supports parallel MPI applications

Key Benefits
- Maximizes job throughput
- Enables job migration
- Enables job priority SLA management
- Provides application fault tolerance from system failures