Cloud adoption is on the rise, with cloud spending expected to grow at more than six times the rate of general IT spending through 2021. Businesses today are tasked with moving huge amounts of data from on-premises systems to cloud storage platforms such as AWS S3. An effective data migration strategy is extremely important, yet it is often overlooked until problems surface during the migration itself.
Given the sheer scale of Big Data, on-premises infrastructure is rather limiting in both scale and capability. Many organizations have found it convenient to default to cloud operations for everything related to Big Data and Artificial Intelligence (AI). A few scenarios where cloud migration can turn out to be the preferred option:
Your client wants a speedy application implementation and deployment
Your project has started witnessing heavy traffic overnight
You are concerned about the effect of a data center going down
It is becoming expensive to administer the growing database needs
Challenges with Migrating Data
A common misconception about cloud migration is that it will be a one-time journey. But the reality is that the process of migrating data infrastructure to the cloud should happen gradually and systematically while minimizing downtime and disruption to users. Moving data is only one part of the puzzle. There are several other challenges associated with cloud migration.
Cost can play a significant role in deciding which approach to take. Underestimating the resources involved in cloud migration can quickly cause costs to spiral out of control and turn the migration into a cash-eating monster. For a sense of scale, Lyft disclosed in March 2019 that it had committed to spending at least USD 300 million on AWS over the following three years.

Data security is the elephant in the room with big data and cloud. Your organization’s sensitive data is put at risk when it moves from on-premises systems to the cloud, and a leak during the process can cause huge economic losses. It is important to remember that the onus to secure your data is on you, not the cloud provider.

Another serious challenge is finding people with the right skill sets to execute a cloud migration plan successfully. Limited knowledge of ever-changing cloud technologies and insufficient skills can lead to slow, ineffective adoption and stand in the way of a seamless migration.
Before starting a migration process, it is crucial to analyze the cloud’s dependencies and constraints, migration patterns, potential applications, and the advantages of infrastructure as a service (IaaS). This will effectively launch you on the path that works best for your company. There are three primary types of cloud migration, based on how different companies want to use cloud to accomplish their goals.
Data Migration Models
When broadly classified, we see three models of data lake migration from on-premise to the cloud:
The first model, forklift migration, moves an on-premises Hadoop cluster to one built from the ground up on basic compute instances in the cloud. This is the simplest migration model, and it leverages existing staff skill sets. It uses only the IaaS aspect of the cloud, with persistent compute instances and typically instance-local storage. Except for infrastructure access, security is entirely the cloud customer’s responsibility, as are the cluster’s creation, configuration, monitoring, and maintenance.
The second model, Hadoop as a service, moves from Hadoop on-premises to a Hadoop service offered by the cloud provider. Much of the heavy lifting around Hadoop cluster setup and configuration, and around ensuring compatibility of Hadoop ecosystem components, is left to the cloud provider. A data lake management application may help create and use transient Hadoop clusters on demand and interface directly with cloud-native persistent storage.
The third model, hybrid migration, involves a gradual transition from Hadoop on-premises to hybrid on-premises/cloud architectures. It uses a variety of cloud-native storage options and services in addition to the Hadoop ecosystem tools, and it adopts cloud service patterns for processing event streams, real-time analytics, and machine learning. This model presupposes a metadata management layer that removes any mismatch between the underlying technologies and provides a seamless data fabric view across all the data, regardless of storage location.
The number of possible migration paths multiplies with the options you choose:
The three migration models described above (forklift, Hadoop as a service, hybrid)
Hadoop distributions (Cloudera, Hortonworks, MapR)
Hadoop ecosystem tool variations
Cloud service providers (AWS, Azure, GCP)
Meaningful comparisons can only be made in the context of specific business and technical requirements.
Developing an Effective Data Migration Strategy
Your migration is unique to your Hadoop environment, so there isn’t really a one-size-fits-all migration plan. Make a plan for your migration that gives you the flexibility to translate each piece to a cloud-computing paradigm.
Knowing your current software architecture, infrastructure, and database schemas helps in defining the timeframe, cost, and effort required to implement your cloud migration. You can begin by evaluating the business use case of the data lake, reviewing security considerations, and prioritizing the apps and data that need to be moved first.
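A rough back-of-the-envelope calculation can make the timeframe part of this assessment more concrete. The sketch below is illustrative only: the function name, the default 70% link utilization, and the sample volume and bandwidth are assumptions made for this example, not figures from a real migration.

```python
def estimate_transfer_days(data_tb: float, link_gbps: float,
                           utilization: float = 0.7) -> float:
    """Estimate the days needed to copy data_tb terabytes over a network link.

    data_tb     -- data volume in decimal terabytes (1 TB = 1e12 bytes)
    link_gbps   -- nominal link speed in gigabits per second
    utilization -- assumed fraction of the link available to the migration
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid inputs")
    total_bits = data_tb * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * utilization
    seconds = total_bits / effective_bits_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical example: 500 TB over a 1 Gbps link at 70% utilization.
    print(f"{estimate_transfer_days(500, 1.0):.1f} days")  # prints "66.1 days"
```

An estimate of roughly two months for this hypothetical case is the kind of result that may push a team toward a phased migration or a different transfer mechanism, which is why it is worth doing before committing to a plan.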
POC on a subset of data
Testing the waters before you go all-in with a new cloud vendor is highly recommended. Develop a proof of concept (POC) to surface network challenges, validate feature parity, and compare performance. In this phase, you need to test your workload effectively and understand cloud storage services, the necessary security controls, and production cluster sizing.
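One possible way to choose the POC subset is to pick datasets repeatably until a size budget is reached. This is a minimal sketch under stated assumptions: the function name, the dataset names, and the sizes are hypothetical, and a real POC would also weigh which workloads are most representative.

```python
import hashlib
from typing import Dict, List


def pick_poc_subset(dataset_sizes_gb: Dict[str, float],
                    budget_gb: float) -> List[str]:
    """Pick datasets for a POC without exceeding budget_gb in total.

    Datasets are visited in order of a hash of their name, so the selection
    is repeatable across runs but not biased toward alphabetical order.
    A dataset that does not fit in the remaining budget is skipped.
    """
    ordered = sorted(
        dataset_sizes_gb,
        key=lambda name: hashlib.sha256(name.encode("utf-8")).hexdigest(),
    )
    chosen: List[str] = []
    total = 0.0
    for name in ordered:
        size = dataset_sizes_gb[name]
        if total + size <= budget_gb:
            chosen.append(name)
            total += size
    return chosen


if __name__ == "__main__":
    # Hypothetical inventory of data lake datasets and their sizes in GB.
    sizes = {"clickstream": 400.0, "orders": 50.0,
             "customers": 5.0, "logs": 900.0}
    print(pick_poc_subset(sizes, budget_gb=100.0))
```

With a 100 GB budget, only the two smaller datasets in this hypothetical inventory can be selected, whatever order the hash produces.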
Once you have verified that the cloud provider and model meet your requirements, you can proceed with the migration and begin moving your data and apps to the cloud. A phased approach consistent with the chosen migration model takes the following into account:
Infrastructure migration decisions — storage and compute, sizing, scaling, networking
Security of data and governance of data access and resource usage in the cloud
Retooling data ingestion so that data currently received by the on-premises platform from different sources is sent to the cloud data lake
Detailed inventory of on-premises data lake and mapping to cloud platform
Data transformation pipelines and corresponding translation to cloud mechanisms
Application migration — forklift vs. rewrite, processes for development, test, and production
Migration options for historical data
Data Lake management applications
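The inventory-and-mapping item in the list above can be sketched as a simple translation from HDFS paths to object-store locations. The example below assumes AWS S3 as the target; the function names, bucket name, prefix, and HDFS paths are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def map_hdfs_to_object_store(hdfs_uri: str, bucket: str,
                             prefix: str = "") -> str:
    """Translate an HDFS URI into an s3:// URI under bucket and prefix."""
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {hdfs_uri}")
    # Drop empty segments so we never emit a double slash in the key.
    parts = [p for p in (prefix.strip("/"), parsed.path.strip("/")) if p]
    return f"s3://{bucket}/" + "/".join(parts)


def build_inventory(hdfs_uris: Iterable[str], bucket: str,
                    prefix: str = "") -> List[Tuple[str, str]]:
    """Return (source, target) pairs for a migration inventory."""
    return [(uri, map_hdfs_to_object_store(uri, bucket, prefix))
            for uri in hdfs_uris]


if __name__ == "__main__":
    sources = [
        "hdfs://namenode:8020/data/raw/events",
        "hdfs://namenode:8020/data/curated/orders",
    ]
    for src, dst in build_inventory(sources, "example-data-lake", "landing"):
        print(f"{src} -> {dst}")
```

Keeping the mapping in one function means the same rule can be reused when retooling ingestion and when translating pipeline paths, so the inventory and the pipelines are less likely to drift apart.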
Once your data and applications are successfully re-hosted, you can focus on automating processes within the new infrastructure and optimizing its performance. It’s best to put automated testing frameworks to use and consider an Infrastructure as Code (IaC) approach to streamline your deployment process. You can also manually double-check some of the most critical aspects of your infrastructure, such as security, compliance, and performance.
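As one example of an automated post-migration check, the sketch below compares a manifest of source files against a manifest of migrated files. It assumes each manifest maps a relative path to a checksum computed with the same algorithm on both sides; the type and function names are hypothetical, and producing the manifests is outside the scope of this example.

```python
from typing import Dict, List, NamedTuple


class ValidationReport(NamedTuple):
    missing: List[str]      # in the source manifest but absent from the target
    mismatched: List[str]   # present in both, but the checksums differ
    unexpected: List[str]   # in the target manifest only

    @property
    def ok(self) -> bool:
        return not (self.missing or self.mismatched or self.unexpected)


def compare_manifests(source: Dict[str, str],
                      target: Dict[str, str]) -> ValidationReport:
    """Compare {relative_path: checksum} manifests from source and target."""
    missing = sorted(p for p in source if p not in target)
    mismatched = sorted(
        p for p in source if p in target and source[p] != target[p]
    )
    unexpected = sorted(p for p in target if p not in source)
    return ValidationReport(missing, mismatched, unexpected)


if __name__ == "__main__":
    # Hypothetical manifests with placeholder checksum strings.
    on_prem = {"events/part-0": "aa11", "events/part-1": "bb22",
               "orders/part-0": "cc33"}
    cloud = {"events/part-0": "aa11", "events/part-1": "ffff",
             "tmp/scratch": "dd44"}
    report = compare_manifests(on_prem, cloud)
    print(report.ok, report.missing, report.mismatched, report.unexpected)
```

A check like this can run after every migration batch, so discrepancies are caught while the on-premises copy is still available.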
As a trusted partner, Clairvoyant offers a broad set of cloud migration capabilities to support a diverse array of technologies, regulatory requirements, operating models and target environments. Our experience managing hybrid environments and knowledge of traditional and next-gen infrastructures, combined with comprehensive services, has enabled us to successfully deliver a range of analytics migration projects across several industry sectors on cloud platforms. Companies today frequently settle for what they can get rather than what they actually want or need. A comprehensive risk evaluation backed by professional cloud expertise can help achieve one’s long-term strategic targets. Vendors, on the other hand, must remain flexible in adapting to the evolving market demands in order to make the most of new technologies.
Cloud computing can offer a variety of organizational benefits: flexibility, efficiency, and strategic value. With a thorough assessment, any organization can create a solid migration plan fitted to its short-term and long-term business objectives. As most successful companies have shown, the time and effort required for cloud migration are more than offset by the resulting gains in the quality, efficiency, and speed to market of technological solutions.
Still running your business on cumbersome and outdated infrastructure? It might be time to consider migrating your processes to the cloud with minimum risk to your business. Learn more at www.clairvoyant.ai