
Article · Oct 11, 2022

The Cost of Risk

Ever had one of those moments when you realized you should have paid the fee for the extended coverage? You know, for a small fee you can get a 3-year warranty to fix or replace your purchase. You think, “I haven’t needed it before. It’s just a waste of my money. I could get a caramel swirly mocha-frap, half skim, decaf, boba and kale smoothie instead.”

Well, skipping the fee may be fine for your overnight web order of a Wi-Fi and Bluetooth mini Siri-enabled key fob, but what about the “simple” IT project to move 10 servers to a new location, or the transfer of a high-volume 2-terabyte database server from the old storage array to the new Premium SSD v2 storage in Azure, or even the simple effort to move the development servers to “the cloud”… come on… it’s development, how much risk can that be?

Many providers preach the “cost of free” for the first 30 days, but you may also want to look at the cost of risk. Risk is a fickle thing… it is obvious that too few resources incur risk: without enough staff to complete the tasks in the time allotted, actions are rushed and problems occur.

However, too many resources can cause risk as well. Multiple handoffs and checkpoints, delays caused by simple communication between multiple parties, missed alerts because someone else was supposed to monitor that message queue, a storage solution chosen on price instead of performance needs that turns out to be too slow during the migration. These, and many other human errors, are all links in the risk chain. The more links in the chain, the more risk to success. More risk, more cost to manage the risk.

Automation may seem like a good option to reduce risk. If you create scripts to automate manual tasks, you may feel you have lower resource requirements and therefore reduced risk. However, if the critical tasks are ‘one-off’ actions, the risk transfers to the automation team but does not necessarily diminish. Did the automation account for the multiple scenarios or obstacles that can occur? Is there a process to address automation errors? If something goes wrong somewhere else, is there a way to pause or skip an automation step? What action is needed to stop or restart the automation steps?
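
To make those questions concrete, here is a minimal sketch in Python of what restartable automation can look like: each step records a checkpoint so a failed run stops cleanly, individual steps can be skipped, and a rerun resumes where it left off. Everything here is illustrative; the step names and checkpoint file are assumptions, not part of any vendor tool.

```python
# Minimal sketch of a checkpointed task runner for "one-off" migration steps.
# All names are illustrative, not part of any vendor tool.
import json
from pathlib import Path

CHECKPOINT = Path("migration_checkpoint.json")

def load_done():
    """Return the set of step names completed in a previous run."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(done, name):
    """Record a completed step so a rerun can skip it."""
    done.add(name)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def run_steps(steps, skip=()):
    """Run steps in order, skipping named steps; stop on the first failure
    so the run can be fixed and restarted where it left off."""
    done = load_done()
    for name, action in steps:
        if name in done or name in skip:
            print(f"skipping {name}")
            continue
        try:
            action()
        except Exception as exc:
            print(f"{name} failed: {exc} -- fix the issue and rerun to resume")
            return False
        mark_done(done, name)
    return True

if __name__ == "__main__":
    steps = [
        ("validate-source", lambda: print("checking source servers")),
        ("provision-target", lambda: print("creating target VMs")),
        ("start-replication", lambda: print("starting the initial synch")),
    ]
    # Example: a step being handled manually can simply be skipped.
    run_steps(steps, skip=("provision-target",))
```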

Even if all you do are projects like these, each project, provider, or infrastructure has different challenges, requirements, options, features, release cycles and strategies for completing the necessary tasks. Your automation has to address the shortfalls and limitations of each technique the providers offer. This means your automation becomes a product in its own right. Does your business sell that product, or is it just another cost with its own risk factors to consider?

Then there is the risk of limitations. Most vendors tell you what they can do. To really understand the risk, you need to understand what they cannot do and, more importantly, how you address it with a low-risk option. It is nice that a process can handle 80% of the project, but what do you do with the other 20%, which is probably the most critical part? Is it acceptable to have minimal risk on 80% of the work but very high risk on the critical parts? Is the 80% valid if the 20% fails?

If you find a solution that covers the 20% but has a price, will that reduce the risk or just shift it? Adding another link in the risk chain may not decrease the risk as much as you would like. How will you coordinate the automated tool process with the non-automated efforts? Do you need to change one process to accommodate the other? If one process fails, do you need to revert both? If you need to revert both, can you? Is there a project duration difference between the 80% and the 20%? If so, should you do the 20% first and, after it succeeds, complete the 80%, or the other way around? Will you need to add a third process for the outliers, like clusters, old operating systems, physical servers, or new hardware and storage demands?

Another crucial point about risk is deciding at what point the recovery of the project (the backout options) becomes the risk. If an unrecoverable, unaccounted-for condition occurs, can you get “back to good”? If you have multiple processes deployed, can they get you back to the same point in time? Will it require additional time and tasks, and isn’t that additional risk? There are dozens of ways for an IT professional to complete server migrations for free. How many low-risk solutions provide a simple “back to good” option in all scenarios across all infrastructures?

Just think how “free” the project is if three people at $200 per hour take 8 or more hours to move 10 servers. If all goes well, that is $4,800. If the ‘free’ process requires additional infrastructure to function, that is additional risk as well as additional dollars for setup and management during the migration project; then add the time to validate the environment after the migration. A simple 10-server project can run from $5,400 to $6,500 or more. That is as much as $650 or more per server, and that is a simple and very conservative example. What happens if that is a 50-server migration, and only one of several migration “waves” per month for the next 5 months? Suddenly your “free” migration can cost you over $32K per wave.
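
The back-of-the-envelope math is easy to reproduce. The sketch below uses the article’s figures; splitting the extra $600 to $1,700 out as “setup and validation overhead” is only an assumption used to show where the range comes from.

```python
# Back-of-the-envelope cost of a "free" 10-server migration, using the
# article's figures. The overhead split is an assumed illustration.
people = 3
rate_per_hour = 200   # USD
hours = 8
labor = people * rate_per_hour * hours            # 3 * 200 * 8 = $4,800

overhead_low, overhead_high = 600, 1700           # assumed setup/validation spread
total_low, total_high = labor + overhead_low, labor + overhead_high
servers = 10
print(f"10-server wave: ${total_low:,} to ${total_high:,} "
      f"(${total_high // servers:,} or more per server)")

# Scale the same per-server figure to one 50-server wave.
per_server = 650
print(f"50-server wave: ${50 * per_server:,}")    # about $32,500 per wave
```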

This estimate does not include the time required to set up the infrastructure and new servers in the new environment, to complete backups of the source systems so there is a ‘gold’ recovery point to get “back to good”, or to monitor the systems while they perform the initial synch. What if you encounter a problem during the manual effort? Can you move just a few of the servers now, revert the failed servers, and try again next time? Can you quickly restart the process to complete the move within the outage windows? What happens if you are moving that 2-terabyte database server to Azure Premium SSD v2 storage and, in the last 20 GB of your block-image transfer, the local disk, network, router, or power fails? Can you just pick up where it left off?

Suppose that, during the migration of the development environment to the cloud, a production issue occurs and the development team needs access to their servers to address it. What happens to your migration to the cloud? Can you pause and resynch quickly enough to make the cutover window, or do you need to scrap everything and start over?

After a quick review to assess the risk of your project, you will probably find that “free” is not risk free, and probably not even risk reduced; risk is what costs the most. A paid-for tool that works on the most critical and complex 20% should work even better on the other, “simple” 80%. A standardized process across the project removes links from the risk chain. Removing links from the risk chain reduces the risk and probably shortens the timeline, which optimizes the resources used. Repeatable processes across any infrastructure, with intelligent automation where possible, are where the real value is. Risk reduction equals added value and reduced potential costs. That is what paid-for tools bring: added value to optimize your costs and increase your probability of success.

Carbonite Migrate, driven by the DoubleTake replication engine, has been performing reduced-risk Windows and Linux server migrations for more than 25 years. Working with a qualified technical sales engineer from a qualified Carbonite Migrate partner, or directly with Carbonite, will help build a low-risk project. Utilizing the partner, or the Carbonite Professional Services team, reduces the risk to its lowest point while increasing the value significantly, and most likely shortening the total project timeline. In the event a problem does occur, the services team can help you work through the necessary actions. In the unlikely event of a product issue, a seasoned 24x7 Customer Support professional is available to help.

Performing Windows and Linux server migrations with minimal business and technical impact, using repeatable processes that are identical across different infrastructures, with the ability to quickly restart if issues arise or to get “back to good” in minutes for simple or complex configurations, can drive the cost of migrations to less than free by reducing the risk. With Carbonite Migrate you can increase your human resource productivity by completing more server migrations with fewer people. This gives you the ability to increase the number of servers and shorten the timeline, or reduce resource costs, without adding risk.

Looking at the same 10-server migration scenario using Carbonite, you can see immediate risk and cost reductions. One person does not need to spend 8 hours or more to set up and start replication to a destination environment; in fact, one person can easily complete the setup in under an hour. Once in synch, all 10 servers can be cut over at once, usually in under 30 minutes. Post-cutover work and testing can be optimized with pre- and post-cutover scripting configured through each replication job. So, for less than the price of “free”, you can accomplish the same 10-server migration with fewer resources and a lot less risk. Think about the savings for the 50-server migration wave when compared to the higher risk of ‘free’ tools. When you factor in that simple and complex server environments use the same processes, you can recognize exponential benefits: simplified migrations, reduced risk, optimized resources, standardized revert options and intelligent automation.
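
The shape of that workflow can be sketched as follows. MigrationClient and its methods are hypothetical stand-ins, not Carbonite Migrate’s actual PowerShell or API surface; the point is simply the pattern: one replication job per server, wait until every job is in synch, then cut them all over with pre- and post-cutover hooks.

```python
# Illustrative workflow only: MigrationClient and its methods are
# hypothetical stand-ins, not Carbonite Migrate's actual API.
import time

class MigrationClient:
    """Stub for whatever SDK or API wrapper drives the replication jobs."""
    def create_job(self, source, target):
        print(f"replicating {source} -> {target}")
        return f"job-{source}"
    def job_in_sync(self, job_id):
        return True  # a real client would poll replication status here
    def cutover(self, job_id):
        print(f"cutting over {job_id}")

def migrate_wave(client, pairs, pre_script=None, post_script=None):
    # One replication job per source/target pair; all servers sync in parallel.
    jobs = [client.create_job(src, dst) for src, dst in pairs]

    # Do not start cutover until every job reports it is in sync.
    while not all(client.job_in_sync(j) for j in jobs):
        time.sleep(60)

    for job in jobs:
        if pre_script:
            pre_script(job)   # e.g. stop application services on the source
        client.cutover(job)
        if post_script:
            post_script(job)  # e.g. validate services on the new server

if __name__ == "__main__":
    pairs = [(f"src-{n:02d}", f"dst-{n:02d}") for n in range(1, 11)]
    migrate_wave(MigrationClient(), pairs)
```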

The 2-terabyte problem can use the difference mirror to potentially recover within the defined migration window, with or without the high-speed storage option of Premium SSD v2 in Azure. The development emergency can easily be recovered to a known-good spot for the team to address the production issues; replication continues while they make their changes, and then, when ready, you just start the cutover again.
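
The general idea behind a checksum difference mirror can be illustrated in a few lines: compare source and target in fixed-size blocks and resend only the blocks whose checksums differ, so an interrupted transfer effectively resumes where it stopped. This is a generic sketch of the technique, not Carbonite’s implementation, and the block size is an arbitrary choice.

```python
# Generic sketch of a block-level difference mirror: copy only the blocks
# whose checksums differ between source and target.
import hashlib

BLOCK = 4 * 1024 * 1024  # 4 MiB per block (arbitrary choice for illustration)

def difference_mirror(source_path, target_path):
    """Bring target back in line with source, resending only changed blocks."""
    copied = 0
    with open(source_path, "rb") as src, open(target_path, "r+b") as dst:
        while True:
            offset = src.tell()
            s_block = src.read(BLOCK)
            if not s_block:
                break
            dst.seek(offset)
            d_block = dst.read(len(s_block))
            if hashlib.sha256(s_block).digest() != hashlib.sha256(d_block).digest():
                dst.seek(offset)
                dst.write(s_block)
                copied += len(s_block)
    return copied  # bytes actually resent; matching blocks cost only a read
```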

With the Carbonite replication engine, you can:

  • Replicate a full server, or a set of files and folders, to one or more target destinations. You can select the volumes and mountpoints, and even exclude or omit selected files and folders using replacement masks. This optimizes the data being sent across the network by eliminating unwanted information. You can also choose to send selected files or folders to a different location, or even a different target, to optimize your storage costs for static files such as logs or backups (see the mask-filtering sketch after this list). You can even cross Availability Zones, Regions, and Subscriptions without issue.
  • Auto-provision the target VMs when replicating to VMware ESX or Hyper-V. Auto-provisioning the target VM helps reduce manual activities, shorten timelines, limit risk, and optimize costs. If you are looking to ‘right-size’ the target, you can do this through the job settings for each server replication job, or through the fully functional PowerShell and API scripting options.
  • Pre-seed the target data by establishing the target VM and restoring an existing backup. The replication engine can then perform a checksum difference mirror that only sends data that is not already on the target. This optimizes initial startup time and provides extra risk reduction by essentially picking up replication where it left off in the event of an outage or an action that requires replication to stop.
  • Utilize many new and advanced infrastructure options as they are released, such as Azure Stack HCI, Azure Stack Hub, Azure VMware Solution (AVS), Azure Stack Data Box, Azure Shared Storage, Azure Ultra Disk, Azure Premium SSD v2 storage, AWS Outposts, AWS Snowball, VMware Cloud on AWS (VMC), Google Transfer Appliance, Google Cloud VMware Engine, and much, much more.
  • Upgrade your Microsoft SQL database to a new Infrastructure as a Service (IaaS) instance and version of SQL running on a new version of Windows. Point that SQL migration to Azure Premium SSD v2 or Ultra Disk storage without changing your processes.
  • With assistance from the Professional Services team, migrate, or keep available, your Microsoft clustered services, SQL clusters, or Always On availability groups to any destination that supports the source environment.
  • Replicate servers between instances, zones, or regions, and even replicate disk volumes larger than 8 terabytes or disks with block sizes larger than 512 bytes (Ultra Disk or Premium SSD v2) with no extra requirements or setup.
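
As a rough illustration of the include, exclude, and redirect behavior mentioned in the first bullet above, the sketch below routes paths with simple wildcard masks. The mask syntax and rule names are assumptions made for the example, not the product’s actual replacement-mask format.

```python
# Generic illustration of include/exclude masks for a replicated file set;
# the masks and rules below are examples only, not product defaults.
from fnmatch import fnmatch

EXCLUDE_MASKS = ["*.log", "*/temp/*", "*.bak"]    # example exclusion masks
REDIRECT_RULES = {"*/logs/*": "archive-target"}   # send matching paths elsewhere

def route(path, default_target="primary-target"):
    """Return the target for a path, or None if the path is excluded."""
    if any(fnmatch(path, mask) for mask in EXCLUDE_MASKS):
        return None
    for mask, target in REDIRECT_RULES.items():
        if fnmatch(path, mask):
            return target
    return default_target

for p in ["/data/app/config.ini", "/data/app/debug.log", "/data/logs/old/app1.txt"]:
    print(p, "->", route(p))
```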

The following configurations are not supported with the Carbonite replication engine:

  • Real Application Clusters (RAC)
  • Automatic Storage Management (ASM)
  • Veritas Storage Systems
  • Common Internet File System (CIFS)
  • Server Message Block (SMB)
  • Network File System (NFS)

Requirements for the Carbonite Replication Engine include:

  • File Systems Supported
    • NTFS
    • ext3
    • ext4
    • XFS
    • Btrfs
    • ReFS
  • Operating Systems Supported
    • Windows Server 2008 R2 and above
    • RHEL
    • CentOS
    • Oracle Linux
    • SUSE Linux
    • Ubuntu

Supported Linux versions can be found in the Linux distribution link below.

Older operating systems not listed may be migrated with assistance from the Carbonite Professional Services team.

The following links provide additional requirements for the Carbonite Replication Engine:

Which-Windows-Versions-are-Supported-by-Carbonite-Replication

Which-Linux-Distributions-Versions-are-Supported-by-Carbonite-Replication

Support Knowledge Base

For more information on the Carbonite replication engine, to talk to a Carbonite Sales Engineer about your specific business needs, or to request a demo, contact Carbonite.

Author

David Mee

David Mee is the Principal Consultant for Carbonite’s Availability, Recovery and Migration (ARM) team. He is based in Southern California and works as the technology specialist and liaison between the technical teams and business leaders. David has spent over 25 years designing, implementing, integrating, automating, and managing highly complex, geographically dispersed availability, disaster recovery, and cross-infrastructure server migration projects for some of the largest global companies. David’s career includes more than 11 years at a custom compiler house, finishing as the global lead technology specialist. During his tenure he applied and refined his skills creating solutions with what were, at the time, emerging technologies that many of the advanced functions in the computing field rely on today.
