In my previous post I wrote about what Data is and when we say it is Big. At the end of that post, I promised to delve into the differences between data and information. Before I do that, here is a recap of the key points of that post for the benefit of those who may not have read it.
Data is a raw form of knowledge about the things, processes or people we encounter on a daily basis. There are many sources of data, and these have evolved over the years: from the point-of-sale (POS) scanner data of the 1980s and the web clicks of the 1990s to the social media, mobile, and sensor or machine-generated data we are experiencing today.
Data may be structured and stored in spreadsheets or relational databases, or it may be unstructured or semi-structured and scattered across a multitude of channels. Regardless of the format, data holds business value.
Data is said to be Big when it is characterized by high volume and high variety and is generated at high velocity: the famous three V's of Big Data. For completeness, other V's (such as value, variability and veracity) have since been added to the mix, but the three V's arguably still convey the essence of Big Data.
Now to today's post: How is Data Different from Information?
A lot has already been written about this topic, for example at https://www.geeksforgeeks.org/difference-between-information-and-data/ and https://www.diffen.com/difference/Data_vs_Information, to cite just two.
Rather than re-listing these differences, I will give practical examples that set data apart from information. This way, readers will be able to recognize the nuanced differences between data and information, however they are presented.
1. Data: The sales records of a retail business, indicating the date, transaction volume, purchase amount and items in each transaction, are raw data.
Information: The result of analyzing these sales records, such as which day of the week or month of the year has the highest average sales by volume and by amount, is information.
2. Data: The history of sensor readings of vehicular axle load on the highways of a country over a 30-year period is data (raw source-of-truth).
Information: Finding that Highway A has recorded higher axle loads, year on year for the past 30 years, than any other highway in the country is important information.
3. Data: The health records of all patients who visit a particular health facility, capturing their demographic details, dates of visit, diagnoses, medication given and so on, are data.
Information: The discovery of an increasing trend in the incidence of diabetes among patients aged between 30 and 40 years is critical information that caregivers must pay attention to.
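To make the first example concrete, here is a minimal Python sketch. It assumes the sales records are available as a list of transactions, each with a date, the items bought and the purchase amount; the records, field names and the function `average_daily_sales_by_weekday` are all hypothetical, made up for illustration. The raw list is the data; the per-weekday averages computed from it are the information.

```python
from collections import defaultdict
from statistics import mean
from datetime import date

# Fixed weekday names (date.weekday() returns 0 for Monday), so the
# output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical raw sales records (the data): one dict per transaction.
sales_records = [
    {"date": date(2024, 1, 1), "items": ["bread", "milk"], "amount": 7.50},
    {"date": date(2024, 1, 1), "items": ["eggs"], "amount": 3.20},
    {"date": date(2024, 1, 2), "items": ["rice", "oil", "salt"], "amount": 12.00},
    {"date": date(2024, 1, 6), "items": ["juice"], "amount": 4.00},
    {"date": date(2024, 1, 6), "items": ["bread", "butter"], "amount": 6.80},
    {"date": date(2024, 1, 6), "items": ["cereal", "milk"], "amount": 9.10},
    {"date": date(2024, 1, 8), "items": ["coffee"], "amount": 5.50},
    {"date": date(2024, 1, 13), "items": ["soap", "bread"], "amount": 5.90},
    {"date": date(2024, 1, 13), "items": ["milk"], "amount": 2.10},
]


def average_daily_sales_by_weekday(records):
    """Turn raw transactions (data) into average daily sales per weekday (information)."""
    # Step 1: aggregate individual transactions into per-day totals.
    daily_count = defaultdict(int)
    daily_amount = defaultdict(float)
    for record in records:
        daily_count[record["date"]] += 1
        daily_amount[record["date"]] += record["amount"]

    # Step 2: group the daily totals by the weekday they fall on.
    counts_by_weekday = defaultdict(list)
    amounts_by_weekday = defaultdict(list)
    for day in daily_count:
        name = WEEKDAYS[day.weekday()]
        counts_by_weekday[name].append(daily_count[day])
        amounts_by_weekday[name].append(daily_amount[day])

    # Step 3: average over the observed days for each weekday.
    return {
        name: {
            "avg_transactions": mean(counts_by_weekday[name]),
            "avg_amount": round(mean(amounts_by_weekday[name]), 2),
        }
        for name in counts_by_weekday
    }


if __name__ == "__main__":
    info = average_daily_sales_by_weekday(sales_records)
    busiest = max(info, key=lambda d: info[d]["avg_amount"])
    print("Highest average daily sales:", busiest)  # prints Saturday for this made-up data
```

Here "average sales" is read as the average daily total for each weekday; averaging per transaction instead would be an equally reasonable reading of the same records, which is a small illustration of how context shapes the information we extract from data.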
I could go on and on with similar examples, but I believe you get the point.
In brief, data is the raw facts and figures (the source of truth, if you will) that can be analyzed, summarized, aggregated and/or interpreted to produce information within a certain context. Data in and of itself is meaningless until it is put in context and properly interpreted to produce intelligible information.
This post was intended to be brief, and I hope I have succeeded in shedding some light on the gray areas between data and information. Thanks for reading.