Information lifecycle management (ILM) is a policy-based approach to the handling of data, from the moment of the data’s creation until the time the organization removes it from all repositories. ILM is an aspect of data governance.
Information Lifecycle Management vs Data Lifecycle Management
Information Lifecycle Management (ILM) and Data Lifecycle Management (DLM) are often used interchangeably, as they are both policies that cover an organization’s data.
However, there is a distinction between the two. ILM is fundamentally about information, which includes both digital data values and information stored physically. For example, a customer’s name and address are items of information. ILM policy determines the handling of such information, including on outgoing letters.
DLM is similar, but it focuses on files containing data. A customer relationship management database would fall under DLM policy. In contrast, the values stored in the database would be an ILM matter.
What are the Stages of the Information Lifecycle?
Data exists in a six-stage lifecycle:
First, an event must create data within the organization’s system. This event can happen in several ways, such as:
- Manual creation: This occurs when a user enters data manually into the system.
- Data ingestion: This is the importation of data from another data source, such as an external database.
- Automatic generation: A device or system generates data automatically, which is then transferred into the organization’s database. Internet of Things sensor readings and network statistics are examples of automatically generated data.
- Metadata: This is data about data. Metadata describes another database or provides information about files stored elsewhere.
- Analytics data: Analytics tools generate substantial data that falls under ILM policy. For example, regression analysis may create new information about a customer’s future purchases.
New data may not be immediately compatible with other systems within the infrastructure. For this reason, data generally passes through an integration process to prepare it for its destination.
Examples of such processes include:
- Data transformation: Data goes through a mapping process from its current schema into a new schema that is compatible with the destination.
- Data cleansing: The system removes corrupt, empty, and duplicate values to ensure data integrity.
- Data integration: New data merges with other data sources to produce more detailed records. This may involve replacing some values with information from other sources.
- Data enrichment: A user or process appends additional data to a record to produce more detailed information. For example, a company might enrich its customer database by adding a phone number to each file.
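As a rough illustration, the cleansing and transformation steps above might be sketched as follows. The field names, schema mapping, and sample records here are all hypothetical:

```python
# Hypothetical raw records awaiting integration.
raw_records = [
    {"Name": "Ada Lovelace", "EMAIL": "ada@example.com"},
    {"Name": "", "EMAIL": "empty@example.com"},            # empty value
    {"Name": "Ada Lovelace", "EMAIL": "ada@example.com"},  # exact duplicate
]

def cleanse(records):
    """Data cleansing: drop records with empty values and remove duplicates."""
    seen = set()
    clean = []
    for record in records:
        key = tuple(sorted(record.items()))
        if all(record.values()) and key not in seen:
            seen.add(key)
            clean.append(record)
    return clean

def transform(record, mapping):
    """Data transformation: map a record into the destination schema."""
    return {new_key: record[old_key] for old_key, new_key in mapping.items()}

# Hypothetical mapping from the source schema to the destination schema.
schema_map = {"Name": "full_name", "EMAIL": "email"}
prepared = [transform(r, schema_map) for r in cleanse(raw_records)]
print(prepared)  # → [{'full_name': 'Ada Lovelace', 'email': 'ada@example.com'}]
```

In practice these steps are usually handled by a dedicated integration platform rather than hand-written code, but the ordering is the same: cleanse first, then map into the destination schema.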
Data may pass through a transformation layer multiple times during its lifecycle. For example, the Extract, Transform, Load (ETL) process transforms data before loading it into storage.
Most data exists for use as part of some process. For example, when a customer places an order, they create order information. This information is then passed to the logistics system for fulfillment and to the payment system for payment processing.
The application stage usually involves fresh data, although a system may sometimes need to recall older data to fulfill an action. ILM policy must ensure that all users and processes receive data that is:
- Accurate: Information should be accurate to the best of the organization’s knowledge.
- Clean: Data should be free of any errors, duplications, or corrupt values.
- Recent: All information should be the most recent version, with timestamps to indicate the data’s recency.
- Complete: Search results should provide all relevant information.
- Available: Users and processes should be able to access any data they need when they require it, with minimal delay.
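Requirements like these can be enforced programmatically at the point of use. A minimal sketch, in which the record fields and the 30-day recency window are assumptions chosen for illustration:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical customer record; field names are assumptions.
record = {
    "customer_id": 42,
    "email": "customer@example.com",
    "updated_at": datetime.now(timezone.utc) - timedelta(days=3),
}

REQUIRED_FIELDS = {"customer_id", "email", "updated_at"}
MAX_AGE = timedelta(days=30)  # assumed recency threshold

def quality_issues(record):
    """Return a list of data-quality problems, or an empty list if none."""
    issues = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        issues.append(f"incomplete: missing {sorted(missing)}")   # Complete
    if any(v in (None, "") for v in record.values()):
        issues.append("unclean: empty values present")            # Clean
    fallback = datetime.min.replace(tzinfo=timezone.utc)
    age = datetime.now(timezone.utc) - record.get("updated_at", fallback)
    if age > MAX_AGE:
        issues.append("stale: record older than recency window")  # Recent
    return issues

print(quality_issues(record))  # → []
```

Accuracy and availability are harder to test mechanically; they depend on source verification and infrastructure rather than on the record itself.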
Many organizations focus on this phase of the lifecycle, as it’s the one that most impacts productivity and service. However, there are three other equally important stages after this.
Data sometimes needs to leave the organization. For example, in the run-up to tax seasons, companies may send their financial data to an accounting firm.
Dissemination, or publication, is a one-way process. Once the data exists elsewhere, the organization no longer has control over it. For this reason, data dissemination must be:
- Secure: External dissemination of data always involves a degree of risk. ILM policy should specify acceptable channels for data publication.
- Standardized: The data must be in a format that’s usable at the destination. Some organizations use Electronic Data Interchange (EDI) protocols for standardized B2B data transfers.
- Traceable: Organizations should document all dissemination so that auditors can see precisely how and when third parties received data.
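The traceability requirement can be met with a simple dissemination log. A minimal sketch, in which the log fields, recipient, and channel names are all hypothetical:

```python
import json
from datetime import datetime, timezone

dissemination_log = []

def record_dissemination(recipient, dataset, channel):
    """Append an audit entry documenting how and when a third party received data."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "recipient": recipient,
        "dataset": dataset,
        "channel": channel,  # the ILM-approved channel used for the transfer
    }
    dissemination_log.append(entry)
    return entry

# Hypothetical example: sending financial data to an accounting firm.
record_dissemination("Example Accounting LLP", "fy2023-financials", "sftp")
print(json.dumps(dissemination_log, indent=2))
```

In a production setting this log would live in an append-only store so auditors can trust that entries were not altered after the fact.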
When dissemination occurs, it creates a whole new information lifecycle at the destination.
When data is not in use, organizations will often move it to a dedicated repository. These repositories can take many forms, but two main structures are used to store large quantities of data:
- Data warehouse: This is a large relational database repository used to store mass quantities of structured data. Information typically passes through a transformation process, such as Extract, Transform, Load (ETL).
- Data lake: A data lake stores structured and unstructured data. Information is not transformed during ingestion. Instead, the lake uses an Extract, Load, Transform (ELT) process, where analysts can transform data if they need to.
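The difference between the two approaches is purely one of ordering. A schematic sketch, in which the extract source, transform step, and record shapes are hypothetical:

```python
def extract():
    # Hypothetical source data, as it arrives from the operational system.
    return [{"amount": "12.50"}, {"amount": "7.25"}]

def transform(records):
    # Normalize values into the destination schema (here: cents as integers).
    return [{"amount_cents": int(float(r["amount"]) * 100)} for r in records]

def load(records, store):
    store.extend(records)
    return store

# ETL: transform first, so the warehouse only ever holds conforming data.
warehouse = load(transform(extract()), [])

# ELT: load raw data as-is; analysts transform later, on demand.
lake = load(extract(), [])
analysis_view = transform(lake)

print(warehouse)  # → [{'amount_cents': 1250}, {'amount_cents': 725}]
print(lake)       # → [{'amount': '12.50'}, {'amount': '7.25'}]
```

The trade-off follows directly: ETL pays the transformation cost up front and guarantees a consistent schema, while ELT preserves the raw data and defers that cost to query time.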
Organizations might need to hold data for a certain amount of time. For example, financial records and customer data often have legally mandated minimum and maximum storage times.
Data removal may occur when the organization confirms that it no longer requires the data for any purpose. It may also occur in response to a removal request made under privacy laws such as GDPR.
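Retention rules like these can be expressed as a simple policy check. A sketch with made-up retention periods; real minimums and maximums are jurisdiction-specific and should come from legal counsel, not code comments:

```python
from datetime import date, timedelta

# Hypothetical policy: records must be kept at least 6 years
# and must be removed after at most 10 years.
MIN_RETENTION = timedelta(days=6 * 365)
MAX_RETENTION = timedelta(days=10 * 365)

def disposal_status(created, today):
    """Classify a record against the retention window."""
    age = today - created
    if age < MIN_RETENTION:
        return "must retain"   # deletion not yet allowed
    if age > MAX_RETENTION:
        return "must delete"   # past the mandated maximum
    return "may delete"        # eligible for removal, e.g. on an erasure request

today = date(2024, 1, 1)
print(disposal_status(date(2023, 6, 1), today))  # → must retain
print(disposal_status(date(2016, 1, 1), today))  # → may delete
print(disposal_status(date(2010, 1, 1), today))  # → must delete
```

Note that a privacy-law erasure request can only be honored inside the "may delete" window; a legally mandated minimum retention period generally overrides it.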
ILM policy will typically include a method for documenting activity that occurs during the data lifecycle. This may be a manual record, or it may exist as metadata, such as data lineage. This record is important in the final step of the lifecycle, as it allows the organization to track any data replication that occurred along the way, as well as any data disseminated to an external database.
These records sometimes exist in the form of manual logs, such as an Excel spreadsheet. For larger data sets, platforms may automatically record data lineage metadata.
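A lineage record can be as simple as a list of events attached to a dataset. A minimal sketch, in which the event names, dataset, and system names are hypothetical:

```python
lineage = []

def record_event(dataset, event, source=None, destination=None):
    """Append one lineage entry so later audits can reconstruct the data's path."""
    lineage.append({
        "dataset": dataset,
        "event": event,  # e.g. "created", "replicated", "disseminated"
        "source": source,
        "destination": destination,
    })

record_event("customers", "created", source="crm")
record_event("customers", "replicated", source="crm", destination="warehouse")
record_event("customers", "disseminated", source="warehouse",
             destination="external-accounting-db")

# At disposal time, the lineage shows every copy that must also be addressed.
copies = [e["destination"] for e in lineage
          if e["event"] in ("replicated", "disseminated")]
print(copies)  # → ['warehouse', 'external-accounting-db']
```

Dedicated data catalogs capture this kind of metadata automatically, but even a spreadsheet following the same structure satisfies the policy goal: knowing where every copy of the data lives before disposal.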