A New Frontier for GPU Analytics
OmniSci is the pioneer in GPU-accelerated analytics (also referred to as GPU analytics), enabling businesses and government to rapidly find insights in data beyond the limits of mainstream analytics tools.
How is GPU-Acceleration Used in Analytics?
Most people are familiar with mainstream CPU-based analytics tools. They consist of the common Business Intelligence (BI) and Data Visualization solutions, as well as analytics tools for Geographic Information Systems (GIS). These are feature-rich tools, primarily designed to provide self-service reporting dashboards, drill-down, and visualization capabilities to large numbers of workers. They typically rely on underlying CPU-based processing technologies and require complex, expensive system architectures and data pipelines to support them.
In contrast, GPU-accelerated analytics refers to a growing array of use cases that require two fundamental capabilities, handling big data using GPUs and delivering a radically more interactive analytics experience:
- Big Data: the large volumes, high velocity, and new types of data that organizations are managing; and
- Interactive Experience: the agile and interactive (zero latency) analytics experience needed by data engineers, analysts, and data scientists.
When it comes to making use of the explosion of data in the world, GPU acceleration use cases involve some combination of three fundamental data attributes (not all three need to be present, but they often are):
Volume
A very large volume of structured data. Most often we see tables ranging from tens of millions to tens of billions of rows (although we also work with some organizations that have single tables in the hundreds of billions of records).
Velocity
High-velocity data streams are being generated by IoT sensors, clickstreams, server logs, transactions, and telematics from moving objects such as mobile devices, cars, trucks, aircraft, satellites, and ships. Often this data pours in at millions of records per second.
Location and Time (Spatiotemporal) Data
At least 80% of data records created today contain location-time (or spatiotemporal) data. This represents a big challenge to mainstream CPU-based tools, because quickly analyzing granular-level spatiotemporal data is incredibly compute intensive and lends itself poorly to traditional indexing and pre-aggregation techniques. All mainstream BI and GIS-analytics systems fail to cope with spatiotemporal datasets above relatively low volumes of data.
The second part of the GPU-accelerated analytics equation deals with the analytics experience. Firstly, how agile is an organization’s workflow at getting large volumes of data from sources to the analytics engine? Secondly, how effortlessly can an analyst build dashboards and interactively explore the data? Together, the two factors broadly define the “time-to-insight” of an analytics platform, or how long it takes to get from raw uningested data to being able to generate insights from that data. Again, this typically isn’t an issue with small volumes of data but becomes a huge issue in Big Data settings.
An Agile Data Pipeline
Traditionally, organizations expend huge amounts of money and time wrangling big datasets to get good data from their sources all the way through to the eyeballs of an analyst. With mainstream systems, based on traditional CPU architectures, this involves very large hardware footprints (often up to thousands of machines), due to the low parallelism of this architecture. Even then, they still need to wrangle data into a form that can be queried in a (potentially) performant way. This involves data engineers doing tasks such as downsampling, indexing, and pre-aggregating (often called “cubing”) data. This low-value work is becoming a major cost in IT departments. Furthermore, the techniques are often inappropriate for GPU data analytics use cases. For example, downsampling and pre-aggregation are antithetical to the idea of finding an individual record that an analyst might be concerned about, like a rogue object within a network.
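To make that last point concrete, here is a minimal Python sketch of how pre-aggregation can hide a single record of interest. The record layout, device names, traffic values, and the 1,000-byte threshold are all hypothetical and chosen purely for illustration. The code runs on an ordinary CPU and says nothing about how any particular GPU engine is implemented.

```python
from collections import defaultdict

# Hypothetical raw records: (device_id, hour, bytes_sent). All values are made up.
records = [(f"dev{d}", hour, 100) for d in range(1000) for hour in range(24)]
records.append(("rogue", 3, 5000))  # a single anomalous record

# Pre-aggregation ("cubing"): keep only the average bytes sent per hour.
totals = defaultdict(lambda: [0, 0])  # hour -> [sum of bytes, record count]
for _device, hour, sent in records:
    totals[hour][0] += sent
    totals[hour][1] += 1
hourly_avg = {hour: total / count for hour, (total, count) in totals.items()}

# The cube shows only a slightly higher average in hour 3, and the device
# identity is gone, so the rogue record cannot be recovered from it.
print("average for hour 3:", round(hourly_avg[3], 2))
print("average for hour 0:", round(hourly_avg[0], 2))

# A full scan over the raw records finds the individual record directly.
rogue_rows = [r for r in records if r[2] > 1000]
print("rows above threshold:", rogue_rows)
```

In this toy dataset the anomaly shifts the hourly average by only a few percent, which is easy to miss on a dashboard, while a scan of the un-aggregated records isolates it exactly.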
With GPU-accelerated analytics, the organization avoids this low-value human wrangling effort by ingesting the entire dataset into the system. Such an approach is viable due to the supercomputing level of parallelism provided by the system’s use of GPUs, which means queries can be evaluated in real time without relying on ingest-slowing pre-computation.
Mainstream analytics tools typically provide a “click-and-wait” experience for analysts, regardless of data volumes (the wait period can range from seconds to hours, depending on the dataset size and query complexity). While feature-rich, these tools are simply not designed with high performance in mind, so Big Data analysts find them unsatisfactory for insights discovery and end up using them for reporting and interesting visualizations rather than true analytical exploration.
In contrast to mainstream CPU-based analytics, GPU-accelerated analytics use cases require analysts to perform “speed-of-thought” exploratory analysis. Often these use cases are considered mission critical, and any discernible latency in returning query results can dramatically impair the ability to explore the data and find “needle-in-a-haystack” insights. The latency threshold is in the low hundreds of milliseconds, even on datasets in the tens of billions of records. With the right GPU-accelerated software, this enables not only speed-of-thought analysis but also “conversational interactivity” with the data for people in meetings.
In their seminal work “The Effects of Interactive Latency on Exploratory Visual Analytics,” Liu and Heer concluded:
“In this research, we have found that interactive latency can play an important role in shaping user behavior and impacts the outcomes of exploratory visual analysis. Delays of 500ms incurred significant costs, decreasing user activity and data set coverage while reducing rates of observation, generalization and hypothesis.”
Read the full paper here: https://idl.cs.washington.edu/files/2014-Latency-InfoVis.pdf.
Types of GPU-based Analytics Use Cases
There are dozens of GPU analytics use cases within industries such as Telecommunications; Financial Services; Automotive; Logistics; Oil & Gas; Utilities; Advertising; and Defense & Intelligence. Examples include:
- Telecommunications: Network Reliability Analysis
- Automotive & Transport: Vehicle Telematics Analysis
- Investment Banking: Alternative Data Insights
- Utilities: Smart Meter Analysis
- Oil & Gas: Well Log Analysis
- Pharmaceutical: Clinical Trial Analysis
- Cyber Incident Investigation
- Defense & Intelligence: GEOINT
Learn more about how OmniSci addresses these and other use cases here.
The Medium to Long Term
The trends that got us here are not going away. Data continues to grow at 40% year over year. Competitiveness in virtually every industry is now dramatically affected by analytics capabilities, the ultimate goal being to find more insights faster than the competition. Additionally, organizations are fighting a talent war to attract and retain analysts and data scientists, and they realize the need to equip them with technologies that deliver exceptional productivity.
Therefore, we believe that over the medium to long term, the capabilities that we define today as GPU-accelerated analytics will become mainstream, fundamentally transforming how analytics work is done in any organization.