- Published: 03 October 2006
Increase in the Growth of Data
Changes in solid state electronics, communication infrastructure, and the miniaturization of computing devices will dynamically influence the growth of data. In the data management world, there is discussion of structured data (housed in files, databases, etc., where it is organized using an explicit structure) compared to unstructured data, such as email, bitmap images/objects, or text that is not part of a database. Although the common nomenclature is "unstructured," such data actually has a very complex structure. This unstructured data will strongly influence the exponential growth of data over the next century. But let's step back to consider the implications of this growth, especially for our digital world and the data within that world.
By analogy, data is like a book in a library. It’s great when you can go into a library, search the catalog to locate the book, go to the shelf, open the book, and find the information you were looking for. Data in its many forms is like the thousands of books in a library. Like a library book, data needs to be cataloged so it can be properly accessed. This cataloging function produces data about the data, or data resource data (some call it metadata). Without such data (the library card catalog), we won’t easily find our book and its content. We have a similar example in the business environment. We create a spreadsheet that provides information about our products and their prices, and we name it abc.xls on our personal computer. We know when we created it (today), but we provide no additional information about where the data came from (its source), the purpose for which we need it (why), who else needs this information (internally or externally), or how we actually created it (whether calculations or special programs were used to produce the data). This descriptive data has significant meaning, since it is the means by which we search for, access, and explain the data to others. It provides the overall context for the use of abc.xls.
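To make the "card catalog" idea concrete, here is a minimal sketch of a metadata record for abc.xls, assuming the who/what/when/where/why/how questions above are the fields we care about. The class name `MetadataRecord` and its field names are hypothetical, chosen only for illustration; the record reports which pieces of context are still missing.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MetadataRecord:
    """Hypothetical catalog entry describing a data asset such as abc.xls."""
    name: str                 # what: the asset's name
    created: date             # when: the creation date
    source: str = ""          # where the data came from
    purpose: str = ""         # why we need it
    consumers: list[str] = field(default_factory=list)  # who else uses it
    derivation: str = ""      # how it was produced (calculations, programs)

    def missing_fields(self) -> list[str]:
        """Return the descriptive fields left empty: the context a reader lacks."""
        checks = {
            "source": self.source,
            "purpose": self.purpose,
            "consumers": self.consumers,
            "derivation": self.derivation,
        }
        return [label for label, value in checks.items() if not value]


# The abc.xls example: only the name and the creation date (today) are known.
abc = MetadataRecord(name="abc.xls", created=date.today())
print(abc.missing_fields())
```

For abc.xls as described, every descriptive field except the name and date comes back as missing, which is exactly the gap the library analogy points to.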
Within the spreadsheet, we have captured other data. For each column, we have created a column name that describes its content: for example, customer name, customer number, order date, product name, product number, description, quantity sold, and the price the customer paid on that date. We also include the cost of the product so we can calculate the net profit made on the sale. Down the rows, we list each customer who purchased the products.
Now, most of us can relate to this spreadsheet since it is a typical example of business sales information. But it does raise some interesting questions. What is a sale? Is it the day that the customer ordered it? Is it the day that we delivered it? Is it the day that the customer paid for it? So, when is a sale a “sale”?
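The question "when is a sale a sale?" has measurable consequences. The sketch below uses three hypothetical orders (all dates invented for illustration) and counts October 2006 sales under each candidate definition of the sale date. The function `sales_in_month` is a hypothetical helper, not part of any tool mentioned in the text.

```python
from datetime import date

# Hypothetical orders: each has three candidate dates that could be
# called "the date of the sale".
orders = [
    {"order_id": 1, "ordered": date(2006, 9, 28), "delivered": date(2006, 10, 2), "paid": date(2006, 10, 20)},
    {"order_id": 2, "ordered": date(2006, 10, 5), "delivered": date(2006, 10, 9), "paid": date(2006, 11, 3)},
    {"order_id": 3, "ordered": date(2006, 10, 25), "delivered": date(2006, 10, 30), "paid": date(2006, 11, 15)},
]


def sales_in_month(records, date_field, year, month):
    """Count orders whose chosen 'sale date' falls in the given month."""
    return sum(
        1 for r in records
        if r[date_field].year == year and r[date_field].month == month
    )


for definition in ("ordered", "delivered", "paid"):
    print(definition, sales_in_month(orders, definition, 2006, 10))
```

The same three transactions yield 2, 3, or 1 October sales depending on which date defines a sale, so two reports built from identical data can disagree.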
Consider another scenario. What is the net profit associated with this product? How is net profit calculated? Is it sales price minus cost? Might it be sales price minus cost and discount? Or is it sales price minus product cost, variable delivery charges, and discount?
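A minimal sketch of how the three net-profit definitions diverge on a single spreadsheet row. The amounts and column names are hypothetical, and the third rule assumes delivery charges are a cost subtracted alongside product cost and discount; `Decimal` is used so currency arithmetic stays exact.

```python
from decimal import Decimal

# One hypothetical spreadsheet row; the amounts are illustrative only.
row = {
    "sales_price": Decimal("100.00"),
    "product_cost": Decimal("60.00"),
    "discount": Decimal("5.00"),
    "delivery_charge": Decimal("8.00"),
}


def net_profit_simple(r):
    """Net profit = sales price - product cost."""
    return r["sales_price"] - r["product_cost"]


def net_profit_with_discount(r):
    """Net profit = sales price - product cost - discount."""
    return r["sales_price"] - r["product_cost"] - r["discount"]


def net_profit_with_delivery(r):
    """Net profit = sales price - (product cost + delivery charge + discount)."""
    return r["sales_price"] - (r["product_cost"] + r["delivery_charge"] + r["discount"])


for rule in (net_profit_simple, net_profit_with_discount, net_profit_with_delivery):
    print(rule.__name__, rule(row))
```

The same row reports a net profit of 40.00, 35.00, or 27.00 depending on the rule, which is why the definition belongs in the spreadsheet's metadata.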
As this spreadsheet example shows, people interpret the data and draw conclusions from it based on their own understanding of what it represents. If definitions of the data are not available, even commonly understood terms may be misinterpreted by your employees and customers. Your organization now has a data integrity problem, often called "data chaos".
Let’s discuss data and metadata. Data are single or combined facts: the raw data. Data resource data, or metadata, is structured to describe the characteristics of a resource (external) or an asset (internal). Data resource data is about knowledge: it provides the context that turns raw data into information people can use for effective business action. From a business perspective, we locate and catalog enterprise assets to answer questions such as: What assets do we have? What does an asset mean? Where is the asset located? How did the asset get there? How do I gain access to the asset? What is the value of that asset? Metadata answers these questions and provides a cross-reference to those business assets. Therefore, like data, it is a source of enterprise knowledge about relationships between assets, information retrieval and presentation, and knowledge management.
As Thomas A. Stewart wrote in his book Intellectual Capital: The New Wealth of Organizations, “Knowledge assets, like money or equipment, exist and are worth cultivating only in the context of business strategy and architecture.” It is apparent that data and information have replaced physical assets as the primary driver of value, which suggests that information-based assets are critical to the future growth of business. The quality of data and metadata is therefore vital to answering critical business questions. It's all about the quality of data!
Without some framework for data and information quality, it is difficult (if not impossible) to manage and change your business. The following framework defines the stages of development for your data management activities. It evaluates six (6) measurement categories across five (5) stages of maturity.
Each measurement category is described below at each stage: Uncertain, Awakening, Defined, Managed, and Certainty.
Leadership understanding and attitude
- Uncertain: No leadership understanding of the issue
- Awakening: Willing to invest time and money to investigate
- Defined: Becomes knowledgeable and supportive of the effort
- Managed: Takes on a participative role
- Certainty: Information quality becomes a key company strategy
Quality organization status
- Uncertain: Quality is built into software applications and tools
- Awakening: Emphasis on correcting bad data and metadata
- Defined: Formalize data quality organization
- Managed: Participates with the CIO in management
- Certainty: Information and data quality are the foremost concern
Data quality problem handling
- Uncertain: No formal process defined
- Awakening: Short-term teams handle major problems
- Defined: Problems faced openly
- Managed: Proactive problem recognition of data quality issues
- Certainty: Most data quality problems prevented
Cost of information quality
- Uncertain: Unknown
- Awakening: Reporting of some items
- Defined: Open reporting of all items
- Managed: Improved savings drive new opportunities
- Certainty: Significant data quality cost savings achieved
Data quality improvement actions
- Uncertain: No data quality process
- Awakening: Short-term data quality effects observed
- Defined: Data quality developed as a key program/initiative
- Managed: Data quality process becomes effective and efficient
- Certainty: Normal and continued process improvement
Summation of data quality posture
- Uncertain: Don't know why data quality problems are occurring
- Awakening: Some recognition of data quality problem
- Defined: Start to resolve major data quality problems
- Managed: Recognize that data error prevention is a key business operation
- Certainty: Know reasons for data quality problems
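The framework above can be recorded as a simple self-assessment. This is a minimal sketch under stated assumptions: the framework does not prescribe a scoring rule, so the "weakest link" rule here (overall maturity equals the least mature category) is a hypothetical choice, and the labels used for the fifth and sixth categories are assumed names for illustration.

```python
# Five maturity stages, in order, as listed in the framework.
STAGES = ["Uncertain", "Awakening", "Defined", "Managed", "Certainty"]

# Six measurement categories; the last two labels are assumed for illustration.
CATEGORIES = [
    "Leadership understanding and attitude",
    "Quality organization status",
    "Data quality problem handling",
    "Cost of information quality",
    "Data quality improvement actions",
    "Summation of data quality posture",
]


def overall_stage(assessment: dict[str, str]) -> str:
    """Return the least mature stage across all six categories.

    Assumption: an organization is only as mature as its weakest category.
    Raises ValueError if a category is unassessed or a stage is unknown.
    """
    missing = set(CATEGORIES) - set(assessment)
    if missing:
        raise ValueError(f"unassessed categories: {sorted(missing)}")
    # STAGES.index raises ValueError for any stage name not in the framework.
    return min((assessment[c] for c in CATEGORIES), key=STAGES.index)


example = {
    "Leadership understanding and attitude": "Defined",
    "Quality organization status": "Awakening",
    "Data quality problem handling": "Defined",
    "Cost of information quality": "Uncertain",
    "Data quality improvement actions": "Awakening",
    "Summation of data quality posture": "Awakening",
}
print(overall_stage(example))
```

Under this rule, a single category stuck at "Uncertain" (here, cost of information quality) holds the whole assessment at that stage; a weighted average would be an equally defensible alternative.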
Remember that data is the source of enterprise knowledge. Measuring its quality is just as valuable as measuring your business’ financial worth, because data creates value either by design or by default. Creating value by default is not acceptable in today’s marketplace, given that changes in solid state electronics, communication infrastructure, and the miniaturization of computing devices will dynamically influence the exponential growth of data!