IT teams face the question of cost in any discussion of application high availability (HA) or disaster recovery (DR). They are concerned about the cost of the redundancy needed to protect their infrastructure: In an Azure HA environment, regardless of the solution you choose, you’ll be paying for virtual machines (VMs) and storage running in at least two availability zones (AZs).
How you choose to implement HA or DR on that infrastructure, though, can have a dramatic effect on cost. Always On Availability Groups (AGs), a feature of SQL Server Enterprise Edition, is a popular choice for HA and DR. Although the feature works with Windows Server Failover Clustering (WSFC) to enable HA and DR at no additional cost, AGs may incur costs you have not anticipated.
The alternative approach involves using the more cost-effective SQL Server Standard Edition and licensing a third-party product to implement HA/DR. This combination provides a more robust level of protection at a dramatically lower overall cost than the Enterprise Edition alternative.
Achieving HA in Azure
Native SQL Server solutions are not ideal
Let’s look at HA options first. In a typical configuration, you run SQL Server on a primary VM in one Azure AZ and cluster it with a VM in a different AZ using WSFC. WSFC monitors the health of the SQL Server VM and, in the event of a failure, moves operations to the secondary server in a process called a failover. The SQL Server AG feature replicates SQL databases from a primary system to a secondary system. After a failover, AG services immediately pick up from the point where the primary stopped.
Because this HA functionality is bundled into SQL Server, the cost to implement an AG would appear to be nil, but that may not be so in practice. The basic AG service available in SQL Server Standard Edition will replicate a single SQL database to a single secondary system. If you have more than one database demanding HA, or if you want to replicate your database(s) to multiple secondary systems, you’ll need to either create multiple individual basic AGs or purchase SQL Server Enterprise Edition and move to full Always On AGs.
Whether you choose to manage multiple individual AGs or upgrade to SQL Server Enterprise Edition, your costs will increase significantly. Either your IT personnel will need to invest more time managing multiple SQL Server Standard Edition AGs, or your licensing costs will climb steeply because SQL Server Enterprise Edition is much more expensive to license.
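The edition rules described above can be expressed as a small decision helper. This is a minimal sketch that encodes only the two limits stated in this article (one database per basic AG, one secondary per basic AG). The function name `basic_ag_plan` and its return strings are hypothetical and are not part of any SQL Server tooling.

```python
def basic_ag_plan(database_count: int, secondary_count: int) -> str:
    """Return the licensing path implied by the basic AG limits described above."""
    if database_count < 1 or secondary_count < 1:
        raise ValueError("need at least one database and one secondary")
    if secondary_count > 1:
        # A basic AG replicates to a single secondary system only.
        return "Enterprise Edition (more than one secondary)"
    if database_count == 1:
        return "Standard Edition, 1 basic AG"
    # One database per basic AG: N databases means N AGs to manage.
    return f"Standard Edition, {database_count} basic AGs, or Enterprise Edition"


if __name__ == "__main__":
    for dbs, secondaries in [(1, 1), (5, 1), (3, 2)]:
        print(dbs, secondaries, "->", basic_ag_plan(dbs, secondaries))
```

Running the helper against your own database and replica counts makes it clear early on whether you are facing a management burden (many basic AGs) or a licensing step-up.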
Third-party HA solutions provide a cost-effective answer
Alternatively, you could license a third-party HA solution to create what is known as a SANless cluster. In this scenario, you would create a SQL Server Failover Cluster Instance (FCI) consisting of VMs with attached storage in two separate but nearby AZs. The SANless clustering software synchronously replicates data between the primary and secondary cluster nodes. If the primary node becomes unresponsive, the cluster automatically fails over to a secondary node, where the fully replicated SQL Server database is ready to take over.
While the SANless clustering approach requires a license for SQL Server Standard Edition as well as for the SANless clustering software, it can still be as much as 80% less expensive than SQL Server Enterprise Edition with AGs. A SANless clustering environment also enables you to replicate all the databases, including system databases, to the secondary server.
The SANless clustering software doesn’t count databases: It simply performs highly efficient synchronization between the target storage volumes. SANless clustering also enables you to replicate data to multiple secondary nodes, again without having to license SQL Server Enterprise Edition. For many organizations, the ability to avoid the cost of SQL Server Enterprise Edition makes licensing a SANless clustering solution a far more attractive way to achieve HA in an Azure environment.
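To check how the licensing comparison plays out for your own environment, the arithmetic can be sketched as follows. All prices here are placeholder units chosen to keep the math easy to follow. They are not Microsoft or vendor list prices, and the per-node pricing model is itself a simplifying assumption. The names `LicensePlan` and `savings_percent` are hypothetical. VM and storage costs are left out because they apply to both approaches.

```python
from dataclasses import dataclass


@dataclass
class LicensePlan:
    name: str
    sql_license_per_node: float      # placeholder price, any currency unit
    cluster_license_per_node: float  # third-party software; 0 for native AGs

    def total(self, nodes: int) -> float:
        """Total license cost for a cluster of the given size."""
        return nodes * (self.sql_license_per_node + self.cluster_license_per_node)


def savings_percent(baseline: float, alternative: float) -> float:
    """Percentage saved by choosing the alternative over the baseline."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline - alternative) * 100 / baseline


# Placeholder figures only; substitute your own quotes.
enterprise = LicensePlan("Enterprise Edition with AGs", 100.0, 0.0)
standard = LicensePlan("Standard Edition with SANless clustering", 25.0, 15.0)

if __name__ == "__main__":
    nodes = 2
    saved = savings_percent(enterprise.total(nodes), standard.total(nodes))
    print(f"{standard.name} saves {saved:.0f}% on licenses for {nodes} nodes")
```

The actual saving depends entirely on the quotes you plug in, so treat the output as a template for your own comparison rather than a benchmark.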
Extending to Disaster Recovery for SQL Server in Azure
It’s easy to envision adding a third cluster VM in an AG- or SANless cluster-based HA configuration to support DR. One simply adds a SQL Server instance in a remote AZ and uses asynchronous replication to replicate data to that instance. Asynchronous replication risks the loss of a minuscule amount of in-flight data if the primary server fails before replicated transactions are written to the remote DR storage. However, the remote system can quickly resume operations.
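A back-of-the-envelope estimate shows why the exposure with asynchronous replication is usually small. The sketch below assumes a steady commit rate and a fixed replication lag, both of which you would measure in your own environment. The function name `transactions_at_risk` and the sample figures are illustrative only.

```python
def transactions_at_risk(commits_per_second: float, replication_lag_seconds: float) -> float:
    """Rough upper bound on commits not yet written at the DR site when the primary fails."""
    if commits_per_second < 0 or replication_lag_seconds < 0:
        raise ValueError("rates and lags must be non-negative")
    return commits_per_second * replication_lag_seconds


if __name__ == "__main__":
    # Illustrative inputs: 200 commits per second, half a second of lag.
    print(transactions_at_risk(200, 0.5))
    # Synchronous replication corresponds to zero lag, so nothing is in flight.
    print(transactions_at_risk(200, 0.0))
```

In practice the lag varies with load and network conditions, so it is worth running the estimate with the worst lag you have observed as well as the typical one.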
Azure Site Recovery Option
Additionally, within the Azure environment, a third option exists for DR: Azure Site Recovery (ASR). With ASR, your entire production VM (including attached storage) is automatically replicated to an AZ in a remote Azure region.
Unlike an AG- or SANless clustering-based approach to DR, though, which requires active infrastructure in the remote region, the VM replicated via ASR exists as an image in the remote region. Should a catastrophe demand that you run your operations from that remote region, you would restore that VM image on suitable infrastructure.
The cost implications of these DR options
What do these DR alternatives mean from a cost perspective? If you’re considering an AG or SANless clustering approach, the same caveats that apply to HA apply to DR. In both scenarios you’ll have to retain infrastructure in the remote AZ.
If you are extending an AG-based HA solution, you’ll need to use the more costly SQL Server Enterprise Edition because you’ll have more than one secondary replica. If you replicate data from a SANless cluster-based HA environment to your remote DR site, you’ll simply need to acquire another license for the third-party SANless clustering software for use in the DR infrastructure. You can still use SQL Server Standard Edition and replicate as many databases as you want.
If you’re going to use ASR for DR, you’ll pay a regular charge to Azure for the service, but you won’t pay for any DR infrastructure on the back end unless you need to restore that VM. When that restore occurs, you’ll begin paying for the infrastructure you provision. Keep in mind, though, that depending on how frequently ASR replicates your production VM, the data in that VM may not be as up to date as the data asynchronously replicated by an AG- or SANless clustering-based solution.
It is not all about infrastructure cost though
Moreover, it will take some amount of time to provision infrastructure and restore the VM so that your operations can be brought online. Between the age of the data on the VM and the amount of time it takes to restore the VM, the SQL database in your DR infrastructure may be far more out of sync than it would be if you were to bring an asynchronously replicated DR infrastructure online from an AG- or SANless clustering-based DR solution.
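The two factors described above, the age of the replicated data and the time needed to restore service, can be combined into a rough cost per incident. This is a sketch under stated assumptions: the minutes and per-minute costs below are invented placeholders, not measurements of ASR or of any clustering product, and `DrOption` and `incident_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    name: str
    data_age_minutes: float  # how far behind production the DR copy may be
    restore_minutes: float   # time needed to bring operations online

    def incident_cost(self, cost_per_data_minute: float, cost_per_down_minute: float) -> float:
        """Cost of lost transactions plus cost of downtime for one incident."""
        lost_data = self.data_age_minutes * cost_per_data_minute
        downtime = self.restore_minutes * cost_per_down_minute
        return lost_data + downtime


# Placeholder figures only; replace them with values from your own DR tests.
image_restore = DrOption("VM image restore", data_age_minutes=30.0, restore_minutes=90.0)
async_replica = DrOption("Asynchronous replica", data_age_minutes=1.0, restore_minutes=5.0)

if __name__ == "__main__":
    for option in (image_restore, async_replica):
        cost = option.incident_cost(cost_per_data_minute=10.0, cost_per_down_minute=20.0)
        print(f"{option.name}: {cost:.0f} per incident")
```

Setting the per-incident figure against the standing cost of keeping DR infrastructure running gives a fuller picture than infrastructure cost alone.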
You’ll need to factor into your cost calculations the value of time and lost transaction data when comparing the cost of these different approaches to DR.
Ultimately, configuring for HA and DR in Azure is readily achievable. Whether you choose to use SQL Server’s native AG approach or a third-party SANless clustering approach depends on your particular application infrastructure. How many databases do you need to replicate? How many remote instances will you need? Those questions can help you determine the cost profile of your solution. But think too about time to recovery in the event of an emergency, particularly if you are considering ASR as a DR solution.
More than 20 years ago, SIOS started helping companies protect their critical applications from downtime and disasters by providing reliable, easy-to-manage clustering software and expert consulting services. Since then, we have stayed at the forefront of HA/DR as IT infrastructures have evolved from on-premises data centers into complex combinations of on-prem, cloud, hybrid cloud, and multi-cloud environments. We have earned a reputation as the industry’s leading provider of high availability and disaster recovery protection for mission-critical applications, ERPs, and databases.
Our unique application recovery kits provide application-specific intelligence and time-saving automation that ensure failovers are efficient, reliable and maintain application best practices.
Today we have more than 80,000 licenses installed globally, protecting applications for companies in a broad range of industries.