Practically anyone who spends a minute or two on the internet, reads print media, uses social media or watches television will have come across the buzz around Big Data. From giving the companies that leverage it a competitive advantage to powering machine learning and deep learning models, Big Data seems to have etched itself into our collective consciousness. Whether you are a manufacturing company seeking to bring efficiency to your processes, a high-tech firm hatching that niche disruptive product, or just a regular business seeking to differentiate yourself in your industry, you cannot ignore the alluring promise of Big Data.
But before the Big, there was data, right? In this post, we are going to pin down exactly what is meant by data and the conditions under which it is said to be big. Data, without the big, is an extremely valuable asset for any organization, be it for-profit, not-for-profit, governmental or non-governmental. This is especially so when there is an abundance of technology tools capable of extracting value and actionable insights from it. Before we get ahead of ourselves, let’s get back to the purpose of this post: What is data and when is it said to be big?
Data Defined
Data, according to the Oxford Dictionary, is defined as facts and statistics collected together for reference or analysis. Now, depending on your industry or field of practice, this dictionary definition may or may not fully encapsulate your concept of data. However, we can all agree that before we add bells and whistles to this bare-bones definition, something needs to be collected.
That something can come in the form of text, observations, figures, images, numbers, graphs, or symbols. For example, a physician may be interested in the name, age, sex, height, weight and many other demographic details associated with his or her patient. A person operating a retail business might be interested in the quantities of various products in the shop, the cost and sale prices of those items, and much more. A scientist in the laboratory, in turn, would want to record the temperature of the environment he or she works in, or the distances, areas or volumes of the apparatus in use. I believe you get the drift! All these constitute some form of knowledge about things around us and, when collected, become data. Put another way:
Data is a raw form of knowledge about things, processes or people we encounter on a daily basis.
Taken in isolation, data does not carry much significance or value. Data needs to be interpreted to convey meaning, and that is where its value comes from. But you may be thinking: Isn’t what we just listed above what we call information? Not quite! I will be delving into the difference between data and information in a separate post. For now, let’s just say that information is the result of analyzing and interpreting data: it is what puts the facts and figures into context.
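To make the distinction a little more concrete, here is a minimal sketch in Python. The temperature readings, the `summarize` function and the 25.0 limit are all made up for illustration; the point is only that the list of numbers is data, while the interpreted summary is closer to information.

```python
from statistics import mean

# Hypothetical raw readings from a lab thermometer: on their own, just data.
readings_celsius = [21.5, 22.0, 21.8, 27.4, 22.1]


def summarize(readings, limit):
    """Interpret raw readings: compute the average and flag values above a limit."""
    average = round(mean(readings), 2)
    above_limit = [r for r in readings if r > limit]
    return {"average": average, "above_limit": above_limit}


# The summary puts the numbers into context (the limit of 25.0 is an assumption).
summary = summarize(readings_celsius, limit=25.0)
print(summary)
```

The same five numbers could be summarized in many other ways; which interpretation counts as useful information depends on the question being asked.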
Evolution of Sources of Data
Historically, new sources of data have emerged that held the potential to transform how organizations drive, or derive, business value. As pointed out by Bill Schmarzo in his book Big Data: Understanding How Data Powers Big Business, in the 1980s, point-of-sale (POS) scanner data changed the balance of power between consumer packaged goods (CPG) manufacturers and retailers. The advent of detailed sources of data about product sales, later coupled with customer loyalty data, provided retailers with unique insights about product sales, customer buying patterns, and overall market trends that previously were not available to any player in the CPG-to-retail value chain. The new data sources literally changed the business models of many companies.
The late 1990s saw web clicks become the new knowledge currency, enabling online merchants to gain significant competitive advantage over their brick-and-mortar counterparts.
Today, the craze, for good reason, is all about data-driven business revolution. New sources such as social media, mobile, and sensor or machine-generated data hold the potential to rewire an organization’s value creation processes. Social media data provide insights into customer interests, passions, affiliations, and associations that can be used to optimize customer engagement processes (from customer acquisition, activation, maturation, up-sell/cross-sell, retention, through advocacy development). Aptly put by Bill Schmarzo!
That last point brings us to the final question of this post: When do we say data is big?
Big Data
When we talked earlier about the details we collect about things, processes and people, there was an implied structure to the collection and storage of such details. These are typically stored in databases or spreadsheets for easy manipulation and analysis. Now imagine a company wants to make use of a variety of new sources, such as social media, mobile, and sensor or machine-generated data, call logs, application log files, and images from corporate events, to fully understand its customers and how they engage with its products. Data from such sources has become ubiquitous and could hold a lot of business value, but it is also mostly unstructured or semi-structured. The combination of such structured, semi-structured and unstructured data takes us to the realm of Big Data.
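As a rough illustration of those three kinds of data, here is a small Python sketch using only the standard library. The sample records (a sales row, a social media event and a call-log note) are invented for this example and do not come from any real system.

```python
import csv
import io
import json

# Structured: fixed columns, as in a database table or spreadsheet.
structured_source = "customer_id,product,quantity\n101,kettle,2\n"
rows = list(csv.DictReader(io.StringIO(structured_source)))

# Semi-structured: self-describing, but fields can vary from record to record.
semi_structured_source = '{"user": "ama", "event": "review", "tags": ["kettle", "gift"]}'
event = json.loads(semi_structured_source)

# Unstructured: free text with no predefined fields.
unstructured_source = "Called support twice about the kettle. Lid still leaks."
mentions_kettle = "kettle" in unstructured_source.lower()

print(rows[0]["quantity"], event["tags"], mentions_kettle)
```

Notice that the structured row can be queried by column name straight away, the JSON event needs some knowledge of which keys might be present, and the free text needs searching or further processing before it yields anything at all.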
Big data is often characterized by the three V’s, namely:
- Volume: the large quantities of data held in many environments;
- Variety: the wide array of data types frequently stored in big data systems; and
- Velocity: the speed at which much of the data is generated, collected and processed.
We’ve come to the end of this post and I hope you have a better appreciation of what data is, how it differs from information and under what conditions we classify it as Big. Stay tuned for more informative posts from DataWithData.
Credit to Doug Laney, who first described these three characteristics back in 2001.