Like “the cloud,” the term “big data” comes up often in discussions of modern technology, but it is not always fully understood. Few people grasp its meaning, purpose and potential. The following sections explain big data and related topics in analytics.
What Is Big Data?
TechTarget defines big data as “an evolving term that describes any voluminous amount of structured, semistructured and unstructured data that has the potential to be mined for information.” It doesn’t necessarily equate to any specific volume of data, but it is often used to describe terabytes (1,000 gigabytes), petabytes (1 million gigabytes) and even exabytes (1 billion gigabytes) of data captured over time.
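The sizes above use decimal (SI) units, where each step up is a factor of 1,000. The sketch below assumes decimal units throughout. It defines a hypothetical helper, `to_decimal_unit`, that expresses a raw byte count in the largest unit that fits. Some tools report binary units instead (a tebibyte, TiB, is 2^40 bytes), and this sketch does not handle those.

```python
# Decimal (SI) storage units: each step is a factor of 1,000.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_decimal_unit(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps the value >= 1.

    Hypothetical helper for illustration; it assumes SI units (1 TB = 10**12 bytes)
    rather than binary units (1 TiB = 2**40 bytes).
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1,000 until the value drops below 1,000 or the units run out.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:g} {UNITS[unit_index]}"


# Gigabyte equivalents quoted in the text.
GB = 10**9
print(to_decimal_unit(1_000 * GB))          # 1 TB
print(to_decimal_unit(1_000_000 * GB))      # 1 PB
print(to_decimal_unit(1_000_000_000 * GB))  # 1 EB
```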
“The basic requirements for working with big data are the same as the requirements for working with datasets of any size,” according to cloud computing platform DigitalOcean. “However, the massive scale, the speed of ingesting and processing, and the characteristics of the data that must be dealt with at each stage of the process present significant new challenges when designing solutions. The goal of most big data systems is to surface insights and connections from large volumes of heterogeneous data that would not be possible using conventional methods.”
Big data can be “difficult to nail down because projects, vendors, practitioners and business professionals use it quite differently,” DigitalOcean says. The company defines big data as large datasets and the category of computing strategies and technologies used to handle large datasets.
3 V’s of Big Data
The 3 V’s of big data (volume, velocity and variety) describe what separates big data from other kinds of data processing.
Volume
“Volume is the V most associated with big data because, well, volume can be big,” ZDNet says. “What we’re talking about here is quantities of data that reach almost incomprehensible proportions.”
For example, Facebook currently stores more than 250 billion images. The task manager Todoist has roughly 10 million active installs on Android alone, and each user has multiple task lists. Large data collections also exist in enterprises in fields ranging from energy and health care to national security.
Big data systems are often defined by the sheer scale of information they handle. In many cases, the volume of data that is captured, stored and processed is difficult to imagine. “Often, because the work requirements exceed the capabilities of a single computer, this becomes a challenge of pooling, allocating, and coordinating resources from groups of computers,” DigitalOcean says. “Cluster management and algorithms capable of breaking tasks into smaller pieces become increasingly important.”
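The idea of breaking a task into smaller pieces and coordinating workers can be sketched on a single machine. In this minimal example, a thread pool stands in for the worker nodes of a cluster. That is an assumption for illustration; real cluster managers, such as Hadoop's YARN, schedule work across separate machines. The input is split into chunks, each worker handles one chunk, and a coordinator combines the partial results. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_chunks):
    """Partition a list into num_chunks roughly equal, contiguous slices."""
    size, remainder = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks get one extra item each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Work done independently by one stand-in 'node': here, summing its slice."""
    return sum(chunk)


def distributed_sum(items, num_workers=4):
    """Coordinate workers: assign chunks, gather partial results, combine them."""
    chunks = split_into_chunks(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


data = list(range(1, 1_000_001))
print(distributed_sum(data))  # 500000500000
```

Python's global interpreter lock limits the threads here, so they illustrate the coordination pattern rather than a real speedup. Parallel CPU work would need `ProcessPoolExecutor` or an actual cluster.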
Velocity
Big data also involves the speed at which information moves through a system. For many applications, data arrives in real time and must be analyzed as it arrives to yield timely insights.
Social media is an ideal example of the velocity of big data. In addition to the more than 250 billion photos it already stores, Facebook receives more than 900 million new photo uploads a day. “So that 250 billion number from last year will seem like a drop in the bucket in a few months,” according to ZDNet.
“Velocity is the measure of how fast the data is coming in,” ZDNet continues. “Facebook has to handle a tsunami of photographs every day. It has to ingest it all, process it, file it, and somehow, later, be able to retrieve it.”
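The figures ZDNet cites convert to an ingest rate with simple arithmetic. Spreading 900 million uploads over the 86,400 seconds in a day gives roughly 10,400 photos per second on average, though an average hides peaks and lulls. A common way to handle data arriving this fast is to aggregate it in fixed, non-overlapping ("tumbling") time windows. This minimal sketch assumes events arrive as Unix timestamps in seconds; the helper names are hypothetical.

```python
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: int) -> float:
    """Average ingest rate implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def tumbling_window_counts(timestamps, window_seconds=60):
    """Count events in fixed, non-overlapping windows.

    Each timestamp (Unix seconds) is assigned to the window that starts at the
    nearest lower multiple of window_seconds.
    """
    counts = Counter()
    for ts in timestamps:
        window_start = int(ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Daily upload figure cited by ZDNet.
print(round(average_rate_per_second(900_000_000)))  # 10417

# A toy stream of upload timestamps grouped into one-minute windows.
stream = [0, 12, 59, 60, 61, 125, 130, 179]
print(tumbling_window_counts(stream))  # {0: 3, 60: 2, 120: 3}
```

This batch version only shows how events map to windows. A production stream processor would typically emit each window's count incrementally as windows close, rather than collecting the whole stream first.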
Variety
The variety of data types collected further complicates the challenges of volume and velocity. Not only is a wide range of sources processed, but those sources also differ in quality.
A great deal of data is unstructured and doesn’t easily fit into the fields of a spreadsheet or a database application. Consolidating varied data can be complex and difficult, whether that means locating something for a legal case in thousands or millions of email messages or organizing photos, videos and audio recordings for a media company.
Big data systems often accept and store data close to its raw state, and any transformations take place later, when the data is processed.
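Storing data close to its raw state and transforming it only at processing time is often called schema-on-read. This minimal sketch assumes records from different sources arrive as JSON strings with inconsistent fields. The records are kept exactly as received. A hypothetical `read_with_schema` function imposes a common structure only when the data is read, filling in missing fields with `None`.

```python
import json

# Raw records are stored exactly as received; their shapes differ by source.
raw_store = [
    '{"type": "email", "from": "a@example.com", "subject": "Contract"}',
    '{"type": "photo", "filename": "IMG_001.jpg", "width": 4032}',
    '{"type": "email", "from": "b@example.com", "body": "See attached"}',
]

# The schema is applied when the data is read, not when it is written.
SCHEMA = ("type", "from", "subject", "filename")


def read_with_schema(raw_records, fields=SCHEMA, record_type=None):
    """Parse raw JSON records and project them onto a fixed set of fields.

    Fields absent from a record become None; extra fields are ignored.
    Optionally keep only records whose "type" matches record_type.
    """
    rows = []
    for line in raw_records:
        record = json.loads(line)
        if record_type is not None and record.get("type") != record_type:
            continue
        rows.append({field: record.get(field) for field in fields})
    return rows


for row in read_with_schema(raw_store, record_type="email"):
    print(row)
```

The raw records are never rewritten, so a different schema can be applied later (for example, one that includes `width`) without reingesting anything. The trade-off is that parsing happens on every read.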
Big Data Terminology
| Term | Definition |
|---|---|
| Algorithm | Mathematical “logic” or a set of rules used to make calculations. The logic or rules are coded into software as a set of steps that eventually lead to an output. |
| The Cloud | Software or data running on remote servers rather than locally. Data stored “in the cloud” is typically accessible over the internet. |
| Cluster Computing | The practice of pooling the resources of multiple machines to complete tasks. It requires a cluster management layer to handle communication between individual nodes and coordinate work assignments. |
| Data Analyst/Data Scientist | Two job titles for analytics professionals. |
| Data Mining | Data mining involves “processing data and identifying patterns and trends in that information,” according to IBM. “Data mining principles have been around for many years, but, with the advent of big data, it is even more prevalent.” |
| ETL | Extract, transform and load: the process of taking raw data and preparing it for the system’s use. |
| Hadoop | An open-source software framework from Apache that is widely used in big data. Its collection of programs allows for storage, retrieval and analysis of large datasets using distributed hardware (data spread across many smaller storage devices rather than a single large one). |
| Internet of Things (IoT) | Everyday items that are connected to the internet and able to collect, analyze and transmit data. For instance, self-driving cars and self-stocking refrigerators are becoming more common. |
| MapReduce | A big data programming model that breaks an analysis into pieces that can be distributed across different computers. It distributes the analysis (map) and then combines the intermediate results into a single output (reduce). |
| NoSQL | Databases that don’t rely on the relational tables used in traditional database systems. |
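The map and reduce steps in the MapReduce entry can be illustrated with the classic word-count example. This is a single-process sketch, not Hadoop's API. The hypothetical `map_phase` emits a (word, 1) pair for every word in a document. A shuffle step then groups the pairs by word; in a real cluster, this step moves data between machines. Finally, `reduce_phase` sums each group.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    # Each key could likewise be reduced on a different machine.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data is big", "data moves fast"]
print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

Each map call depends only on its own document, and each reduce call depends only on one key's values. That independence is what lets the work spread across a cluster.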
Careers in Big Data
“Job postings seeking data scientists and business analytics specialists abound these days,” Management Information Systems Quarterly says. “There is a clear shortage of professionals with the ‘deep’ knowledge required to manage the three V’s of big data: volume, velocity, and variety.”
The online master’s degree in analytics from Notre Dame of Maryland University prepares students for advanced roles in the growing field of big data. In a convenient and flexible learning environment, students gain multidisciplinary competencies in knowledge management technologies, qualitative processes and economic principles of change risk management. This program is offered fully online.