Big Data: Why Does It Benefit AI?


A classic definition of big data

The term “Big Data” has been in use since the 1990s and has spread rapidly in the decades since. Big data is often cited as a source for better understanding politics, the economy, society, and more. People might imagine that big data simply refers to a very large dataset, but that is not a precise or complete picture. Let’s first take a look at a helpful and classic definition provided by Gartner, Inc.: the three Vs.

“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Doug Laney, Analyst, Gartner, Inc.

“Volume,” “Velocity,” and “Variety” have since become widely known as the three Vs of big data.

  • Volume: The quantity of data. Working with big data usually means being able to process high volumes of low-density, often unstructured data.
  • Velocity: The speed at which data is received and acted on. Big data often arrives, and has to be processed, in real time or close to it, and it is produced more continuously than traditional data flows.
  • Variety: The type and nature of the data. Big data typically comes in many different types, and new unstructured or semi-structured data types frequently require additional preprocessing.
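
To make the preprocessing mentioned under Variety a little more concrete, here is a minimal Python sketch. It assumes a hypothetical feed in which each payload is either a single JSON object or CSV text with a header row, and it maps both onto one common record shape. The field names `user` and `amount` and the function `normalize` are made up for illustration; real pipelines will differ.

```python
import csv
import io
import json
from typing import Dict, List


def normalize(raw: str) -> List[Dict[str, object]]:
    """Turn one raw payload (JSON object or CSV text) into clean records.

    Hypothetical schema: every record should end up with a lowercase
    'user' string and a float 'amount'. Incomplete rows are dropped.
    """
    raw = raw.strip()
    if raw.startswith("{"):
        # Semi-structured input: a single JSON object.
        rows = [json.loads(raw)]
    else:
        # Structured input: CSV text whose first line is the header.
        rows = list(csv.DictReader(io.StringIO(raw)))

    cleaned = []
    for row in rows:
        user = str(row.get("user") or "").strip().lower()
        amount = row.get("amount")
        if not user or amount in (None, ""):
            continue  # skip rows with missing fields
        cleaned.append({"user": user, "amount": float(amount)})
    return cleaned


if __name__ == "__main__":
    print(normalize('{"user": " Alice ", "amount": "3.5"}'))
    print(normalize("user,amount\nBob,2\n,5\nCarol,\n"))
```

Both calls return records in the same shape, so later steps do not need to know which format the data originally arrived in.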

Although the three Vs definition has been challenged by other industry authorities over the past decades, and additional Vs such as “Value” and “Veracity” have been proposed, the original three Vs remain a popular and well-known introduction to big data.

What exactly is big data?

Actually, big data is a buzzword and a “vague term.” In general, it refers to a volume of data that is too large to process with existing techniques that rely on in-memory computation. In other words, you can think of big data as larger, more complex, often unstructured datasets, especially from new data sources. Traditional data processing software has difficulty managing these voluminous datasets. However, these massive volumes of data open the door to business problems you would not have been able to tackle before.

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.

Snijders, C.; Matzat, U.; Reips, U.-D., 2012
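
One common way around the in-memory limitation described above is to process the data as a stream, keeping only a small running summary instead of the whole dataset. The Python sketch below illustrates the idea with a running mean. `read_values` is a hypothetical stand-in for a real source such as a large file or a message queue, and this is only one of many possible approaches.

```python
from typing import Iterable, Iterator


def read_values(n: int) -> Iterator[float]:
    """Hypothetical data source that yields one value at a time.

    A real source could be lines of a huge file or messages from a queue;
    the point is that the full dataset is never held in memory.
    """
    for i in range(n):
        yield float(i % 10)


def streaming_mean(values: Iterable[float]) -> float:
    """Compute the mean in one pass using constant memory."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: shift the mean toward the new value.
        mean += (value - mean) / count
    if count == 0:
        raise ValueError("cannot compute the mean of an empty stream")
    return mean


if __name__ == "__main__":
    # One million values are consumed, but only two numbers are stored.
    print(streaming_mean(read_values(1_000_000)))
```

The same pattern (read a piece, update a summary, discard the piece) extends to counts, sums, and other statistics that can be updated incrementally.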

The size of big data is a constantly moving target because of the rapid development of both hardware and software. According to a report by IDC (International Data Corporation), the global data volume was predicted to grow exponentially from 4.4 Zettabytes to 44 Zettabytes between 2013 and 2020 (\(1\) Zettabyte \(= 10^{12}\) GB). By 2025, IDC predicts there will be 163 Zettabytes of data. That is to say, data volumes are doubling in size roughly every two years. The open question is how to manage and make use of this data efficiently.
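
The "doubling roughly every two years" figure can be checked from the numbers quoted above. Assuming steady exponential growth from volume V0 to V1 over t years, the doubling time is t · ln 2 / ln(V1 / V0). The short Python sketch below applies that formula; the function name `doubling_time` is ours, and the inputs are simply the IDC figures cited in this article.

```python
import math


def doubling_time(start_volume: float, end_volume: float, years: float) -> float:
    """Years needed to double, assuming steady exponential growth."""
    growth_factor = end_volume / start_volume
    return years * math.log(2) / math.log(growth_factor)


if __name__ == "__main__":
    # 4.4 ZB (2013) -> 44 ZB (2020): roughly 2.1 years per doubling.
    print(round(doubling_time(4.4, 44.0, 7), 1))
    # 44 ZB (2020) -> 163 ZB (2025): roughly 2.6 years per doubling.
    print(round(doubling_time(44.0, 163.0, 5), 1))
```

Under this assumption, the 2013 to 2020 forecast matches the two-year rule of thumb closely, while the 2025 projection implies somewhat slower growth.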

So… why does big data benefit AI?

In short, AI and big data complement each other. In general, the more data a model is given, the better it becomes. AI, or more specifically machine learning and deep learning models, needs large amounts of data to improve. Big data can help data owners understand the patterns in their data far better, in ways that seemed impossible in the past. Humans cannot deal with big data efficiently without the help of algorithms. However, it is worth noting that a conventional computational algorithm cannot capture and analyze all the data either. This is why AI and big data complement each other. To extract the valuable information hidden in big data, it is promising to use top-notch AI technologies that can process big data and apply machine learning or deep learning models.
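
The claim that more data tends to make a model better can be illustrated with a toy experiment. The Python sketch below fits a one-parameter model (the slope of a line through the origin) to synthetic noisy data and measures how far the fitted slope is from the true one for different sample sizes. Everything here is an assumption chosen for illustration: the true slope, the noise level, and the number of repeated trials. It is not a statement about any particular real-world model.

```python
import math
import random

TRUE_SLOPE = 2.0  # assumed ground truth for the synthetic data


def fit_slope(n: int, rng: random.Random) -> float:
    """Least-squares slope (through the origin) from n noisy samples."""
    sum_xy = 0.0
    sum_xx = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = TRUE_SLOPE * x + rng.gauss(0.0, 1.0)  # noisy observation
        sum_xy += x * y
        sum_xx += x * x
    return sum_xy / sum_xx


def rmse_of_fit(n: int, trials: int = 200, seed: int = 0) -> float:
    """Root-mean-square error of the fitted slope over repeated trials."""
    rng = random.Random(seed)
    squared_errors = [(fit_slope(n, rng) - TRUE_SLOPE) ** 2 for _ in range(trials)]
    return math.sqrt(sum(squared_errors) / trials)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(rmse_of_fit(n), 3))
```

In this setup the error shrinks as the sample size grows, roughly in proportion to one over the square root of the number of samples. Real models are far more complex, but the same intuition is one reason large datasets matter for machine learning.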


Big data gives people new insights that open up new opportunities and business models. However, a data scientist often needs to spend more than half of the time “cleaning” or “curating” the raw data before it can be used for machine learning projects. So, what’s the solution?

Yes! AI reveals the possibilities.

Big data is going to get bigger and bigger. The future trend will be increased demand for AI, especially ML algorithms, to integrate, manage, and analyze the data. People have only scratched the surface of big data and AI. Both are important branches of computer science today. The development of big data technology depends on AI, while AI requires large amounts of data to support and improve itself. Technological innovation has just begun; keeping up with big data technology will remain an ongoing challenge. We hope this article helps you understand big data and AI. We will have more informative articles in the future!


Editor: Chieh-Feng Cheng
Ph.D. in ECE, Georgia Tech
Technical Writer, inwinSTACK

