March 28, 2014
Data is critical to most operations. Naturally, more projects involve integration. It is a complicated job. To simplify data migration, plan the plan first. Implementing new systems is part of any IT life cycle. Businesses must continue to increase productivity, and IT systems are a key part of many improvement efforts.
It is typical that the new system must accommodate the old data to make the transition work effectively. This requirement means that many projects have a critical and complex sub-project to migrate the old data to the new system. Any migration requires the use of an ETL process (extract, transform and load), but system upgrades or replacements tend to be the most complicated. They typically must be done in a short period of time and require full migration or there is a high risk of data obsolescence and/or extra effort by existing operations.
Changing systems is an added stress to an operation, so data challenges add to the extra work that is required. Further complicating the process is the need to extract all the rules for the data from the old system and convert the data with the appropriate codes, categories, history, formats and other dependencies. Such conversions require an understanding of the old system, which may not be fully documented or understood by current staff. The old rules may not easily translate to the new system structure. Just because a system no longer meets the needs of an organization does not mean that the old data is no longer needed, and it certainly does not simplify the work required to transfer this information to the new system.
Almost every ETL process needs to ensure scrubbing and validation. Migrations for system upgrades typically require special work to migrate a complete transaction record. Unlike ETL work for data warehouses used for analytics or other accountability purposes, the data cannot be stored as a translated final result. Instead, it is converted in the state of the transaction as used by the new system. The new system must be able to complete any remaining workflow (e.g., accounts receivable) and provide a complete history of all transactional events.
While the job has many challenges, there is a finite set of data to be converted, so a plan can be made and the conversion may be tested prior to the go-live. Additionally, the entire process can be scripted, timed and documented, so very accurate schedules can be made to assist during the high-stress conversion. The following is a summary of items that can be included in such a plan.
We recommend that a data migration plan address business rules, data categorization, quality assurance, processing (the ETL functions of the migration), migration optimization and the iteration process typical in migration work.
Finding the staff who know the rules and getting access to their time is essential to identifying the rules, unless there is already a well-documented set of rules and the data complies with them well. Getting the buy-in of their managers as well as the staff themselves is important. As with any project, setting proper expectations and deadlines helps to avoid unfocused meetings that consume too much time. Of course, complicated and ‘fuzzy’ rules make for situations that can consume a great deal of time. For the sake of your team spirit and the time budget, a good facilitator may be needed to document all issues and divide the list into those that can be addressed individually and those that require the team. Bringing sample data as well as defining the business rules may be necessary to address the most complicated issues.
Identifying and defining the existing rules is a mixture of art and science. The longer the legacy system has been in place, the more challenging rule definitions can be to identify. There is often a series of rules that have been applied in code, databases and procedures. Documentation, of course, may be missing or incomplete. Access to experienced staff is a great place to begin, but a good data extract is sometimes an equally good place to start. Finding unique codes and identifying relationships are smart ways to ensure the experienced team members have a full set of data categories to examine and explain. Getting a complete scope of work makes planning more reliable.
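As a minimal sketch of profiling an extract for unique codes and relationships, the Python below counts the distinct values of one field and lists which values of a second field occur with each. The rows, field names and codes are invented for illustration and do not come from any real system.

```python
from collections import Counter, defaultdict

# Hypothetical rows from a legacy extract; the field names are invented.
extract = [
    {"account": "A1", "service_code": "RES-V", "status": "ACT"},
    {"account": "A2", "service_code": "RES-V", "status": "SUS"},
    {"account": "A3", "service_code": "BUS-V", "status": "ACT"},
    {"account": "A4", "service_code": "res-v", "status": "ACT"},  # stray variant
]


def profile_codes(rows, field):
    """Count how often each distinct value of `field` appears."""
    return Counter(row.get(field) for row in rows)


def related_values(rows, left, right):
    """Map each value of `left` to the sorted `right` values seen with it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row.get(left)].add(row.get(right))
    return {key: sorted(values) for key, values in seen.items()}


code_counts = profile_codes(extract, "service_code")
status_by_code = related_values(extract, "service_code", "status")
```

A rarely used variant such as the lower-case code above is the kind of item worth bringing to the experienced staff for an explanation.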
It is recommended to create validations on the load process to enforce the rules as a final check to reduce bad data inserts. Many commercial systems provide smart load software to enforce data rules and reduce errors.
Once the old rules are identified and the rules established, then the old data must be reviewed to ensure that it follows the rules as defined. The same data processing that will be used for the migrated data may be applied as the old data is processed. The old system may have had special rules established to handle bad data or the system may have just failed. Rules for what data may be safely discarded must be established or a remediation process must be applied. Documentation started early and well organized will save time in later stages.
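The rule checks described above might be sketched as a small set of named validations applied to each row, separating rows that are ready to load from rows that need remediation or a discard decision. The three rules, the field names and the status codes are assumptions made for the example, not rules from any particular system.

```python
# Hypothetical business rules: each maps a rule name to a check on one row.
RULES = {
    "has_account": lambda r: bool(r.get("account")),
    "known_status": lambda r: r.get("status") in {"ACT", "SUS", "CLS"},
    "non_negative_balance": lambda r: r.get("balance", 0) >= 0,
}


def validate_row(row, rules):
    """Return the names of all rules the row fails."""
    return [name for name, check in rules.items() if not check(row)]


def triage(rows, rules):
    """Split rows into clean rows and (row, failed_rules) pairs."""
    clean, rejected = [], []
    for row in rows:
        failures = validate_row(row, rules)
        if failures:
            rejected.append((row, failures))
        else:
            clean.append(row)
    return clean, rejected


rows = [
    {"account": "A1", "status": "ACT", "balance": 10.0},
    {"account": "", "status": "ACT", "balance": 0.0},
    {"account": "A3", "status": "???", "balance": -5.0},
]
clean, rejected = triage(rows, RULES)
```

Keeping the failed rule names with each rejected row makes it easier to decide later which rows can be safely discarded and which need a fix.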
Categories of Data Migration
The more that the rules can be classified or categorized, the easier it will be to communicate with the team and establish processes for the migration. For example, a Telecom migration of billing would have voice, data, video and security as major service areas. Voice subcategories are residential and business. Service lines then fit within these categories. A possible definition may start with an outline and then work the rules. In the example below, older services for a Telecom may have mixed rules as new technologies allow services to be blended together. Identifying old data structures to match services and map to the new structure can help to identify rules that must be defined to migrate the data.
Sample outline of a service category work sheet
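One way to hold such an outline in code is a nested mapping plus a lookup from legacy codes to the new structure. This is an illustrative sketch rather than the original worksheet: the service areas and voice subcategories follow the text, while the service lines and legacy codes are invented placeholders.

```python
# Service areas and voice subcategories follow the article; the service
# lines and legacy codes below are hypothetical.
SERVICE_OUTLINE = {
    "voice": {
        "residential": ["basic line", "voicemail"],
        "business": ["trunk line"],
    },
    "data": {},
    "video": {},
    "security": {},
}

LEGACY_CODE_MAP = {
    "RV01": ("voice", "residential", "basic line"),
    "RV02": ("voice", "residential", "voicemail"),
    "BV01": ("voice", "business", "trunk line"),
}


def map_legacy_code(code):
    """Return (area, subcategory, service line), or None if no rule exists."""
    return LEGACY_CODE_MAP.get(code)


def unmapped_codes(codes):
    """List the distinct legacy codes that still need a mapping rule."""
    return sorted({code for code in codes if code not in LEGACY_CODE_MAP})


missing = unmapped_codes(["RV01", "ZZ99", "BV01", "ZZ99"])
```

The list of unmapped codes is a direct to-do list of rules that still have to be defined.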
Plan the QA
It may seem counter-intuitive, but many teams work on the QA rules before they plan the transformation development. One benefit is that the ETL team fully understands the conversion rules because they know the tests that must be applied. While some argue that a team gets a better result if the final testing criteria are not provided to the development team, others argue that automation of testing is the best approach to success. Either way, make sure the full testing rules are established along with the experts to ensure all testing is accomplished, whether during the migration, afterwards, or both. Here is an outline of items to consider when drafting testing scenarios and tests:
With the business rules identified and the testing scripts and scenarios defined, the migration process is well organized and straightforward to complete. There are many tools available to use for the process. Our toolset includes Pentaho Data Integration and custom programming. Most Big Data companies provide ETL and migration tools. Some companies specialize in migration, and many large software vendors provide loading tools and programming aids to ensure only good data is loaded. Of course, spreadsheets are a tool any experienced data analyst will use when a quick sample data set needs to be sorted and shared.
Data Migration – Plan the Plan
As with any good development project, begin with the end in mind. Build in your validation and exception handlers from the start so that it is easy to verify success and address data exceptions. Assume that you will find more rules as you go and that you will need to account for exceptions easily as they occur. A little extra work at the beginning of a big migration project saves hours, if not weeks, before the job is done. When the job is really complex, we write our load information into a database and create quick Excel files so we can look at the load results. A good list of issues simplifies troubleshooting.
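A minimal sketch of the load-log idea, assuming a SQLite database and a CSV file that a spreadsheet can open. The table name, columns and issue labels are hypothetical, and this is not a description of our actual tooling.

```python
import csv
import io
import sqlite3


def open_load_log(path=":memory:"):
    """Open (or create) a small database that records one row per load attempt."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_log ("
        "record_id TEXT, status TEXT, issue TEXT)"
    )
    return conn


def log_result(conn, record_id, status, issue=""):
    """Record whether a record loaded and, if not, why."""
    conn.execute(
        "INSERT INTO load_log VALUES (?, ?, ?)", (record_id, status, issue)
    )


def export_issues(conn, out):
    """Write issue counts as CSV, most frequent first, for spreadsheet review."""
    writer = csv.writer(out)
    writer.writerow(["issue", "count"])
    writer.writerows(
        conn.execute(
            "SELECT issue, COUNT(*) FROM load_log WHERE status = 'rejected' "
            "GROUP BY issue ORDER BY COUNT(*) DESC, issue"
        )
    )


conn = open_load_log()
log_result(conn, "1", "loaded")
log_result(conn, "2", "rejected", "missing account")
log_result(conn, "3", "rejected", "missing account")
log_result(conn, "4", "rejected", "bad date")
report = io.StringIO()
export_issues(conn, report)
```

Writing to an in-memory buffer keeps the example self-contained; in practice the same function could be given an open file.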
Sometimes a temporary solution or an alternative to a full migration is used so that historical data is not fully migrated. Some teams determine that the new system is so different that migrating the complete history is not justified, since within a few years the new system will have better and more useful history of its own. This approach requires maintaining the legacy system in ‘view’ mode so that users can check the history as needed, or until the full migration can be done.
Conversion Optimization Plan
The faster the data can be converted, the less time you need to shut down an operation during a migration. There are lots of ways to speed up a migration. Most teams focus on the business rule development first, and then work on optimization. Projects with a high number of testing iterations benefit from optimizing QA early.
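Before optimizing, it helps to know where the time goes. The sketch below times each stage of a run with a small context manager so the slowest stage stands out. The stage names and the stand-in workloads are assumptions made for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, log):
    """Add the elapsed seconds of the enclosed block to log[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[stage] = log.get(stage, 0.0) + time.perf_counter() - start


def slowest_stage(log):
    """Return the stage name with the largest accumulated time."""
    return max(log, key=log.get)


timings = {}
with timed("extract", timings):
    rows = list(range(10_000))           # stand-in for reading the old system
with timed("transform", timings):
    rows = [value * 2 for value in rows]  # stand-in for applying the rules
with timed("load", timings):
    total = sum(rows)                     # stand-in for writing to the new system
```

Recording the same timings on every test iteration also gives the schedule for the go-live weekend a factual basis.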
Testing Data Migration – Plan the Plan
Data conversion can end in one of two ways. One, repeat migration work and fixes until the result is perfect. Two, stop once the amount of remaining data is small enough to fix manually. A clear scope of work and timeline keeps the project focused. Motivate the team towards a finished result with a clear path. Each test will yield results that may warrant a partial repetition of the process. That is, results may point to problems with the business rules: they may have been missed, misunderstood, improperly applied or not tested.
It is normal to find exceptions and additional rules during migration, especially as the process moves from the early partial migrations to full migration testing. It is important to categorize and attack the issues holistically. A good spreadsheet with sorting of exceptions can make a massive set of exceptions easy to read. It can quickly show sets of problems that may need a programmatic solution.
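The same sorting can be sketched in a few lines of Python: group the exceptions by category and failed rule, then list the largest groups first. The exception records and rule names are invented for the example.

```python
from collections import Counter

# Hypothetical exception records produced by a test migration.
exceptions = [
    {"record": "1", "category": "voice", "rule": "missing rate code"},
    {"record": "2", "category": "voice", "rule": "missing rate code"},
    {"record": "3", "category": "voice", "rule": "missing rate code"},
    {"record": "4", "category": "data", "rule": "bad start date"},
    {"record": "5", "category": "data", "rule": "bad start date"},
    {"record": "6", "category": "video", "rule": "unknown package"},
]


def summarize_exceptions(items):
    """Count exceptions per (category, rule), largest groups first."""
    counts = Counter((item["category"], item["rule"]) for item in items)
    return counts.most_common()


summary = summarize_exceptions(exceptions)
```

Large groups at the top of the summary are the likely candidates for a programmatic fix, while the long tail may be cheaper to correct by hand.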
For large projects that require a large team to complete the migration, track the status of your key conversions. Break the data into categories and report exceptions, migration times and other key variables. Establish goals for migration execution speed and for exception elimination. Sometimes a basic dashboard can show the team its progress and keep everyone motivated when exception counts feel too high.
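A basic text dashboard along these lines could compare each category's exception rate and throughput with the team's goals. The categories, figures and goal values below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Results of one test migration for one data category."""
    category: str
    records: int
    exceptions: int
    seconds: float

    @property
    def exception_rate(self):
        return self.exceptions / self.records if self.records else 0.0

    @property
    def records_per_second(self):
        return self.records / self.seconds if self.seconds else 0.0


def status_lines(runs, max_exception_rate, min_records_per_second):
    """Format one line per category and flag whether both goals are met."""
    lines = []
    for run in runs:
        on_track = (
            run.exception_rate <= max_exception_rate
            and run.records_per_second >= min_records_per_second
        )
        lines.append(
            f"{run.category:<10} {run.exception_rate:6.1%} "
            f"{run.records_per_second:8.0f}/s  "
            f"{'OK' if on_track else 'BEHIND'}"
        )
    return lines


runs = [
    RunStats("voice", records=1000, exceptions=10, seconds=5.0),
    RunStats("video", records=500, exceptions=50, seconds=10.0),
]
dashboard = status_lines(runs, max_exception_rate=0.02, min_records_per_second=100)
```

Printing these lines after every test run gives the team a simple view of progress over time.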
Data Migration Summary
Establish the parameters for the project. Break the plan into phases and establish goals for migration execution, data exception and insertion rates over time. Document the rules and decisions. Keep them current. Group decisions are easier with clear guidelines. Specify the steps, clarify the goals and publish the results along the way. Informed team members make good decisions and get results.