Source: msrg.org
THE BIGBENCH PROPOSAL
End to end benchmark
Application level
Based on a product retailer (TPC-DS)
Focused on Parallel DBMS and MR engines
History
Launched at 1st WBDB, San Jose
Published at SIGMOD 2013
Spec at WBDB proceedings 2012 (queries & data set)
Full kit at WBDB 2014
Collaboration with Industry & Academia
First: Teradata, University of Toronto, Oracle, InfoSizing
Now: bankmark, CLDS, Cisco, Cloudera, Hortonworks, InfoSizing, Intel, Microsoft, MSRG, Oracle, Pivotal, SAP
05.09.2014 EXTENDING BIGBENCH 2
DATA MODEL
Structured: TPC-DS + market prices
Semi-structured: website click-stream
Unstructured: customers’ reviews
[Figure: BigBench data model. Structured data: Marketprice, Item, Sales, Customer, Web Page. Semi-structured data: Web Log. Unstructured data: Reviews. Legend: Adapted TPC-DS, BigBench Specific.]
DATA MODEL – 3 VS
Variety
Different schema parts
Volume
Based on scale factor
Similar to TPC-DS scaling, but continuous
Weblogs & product reviews also scaled
Velocity
Refresh for all data
WORKLOAD
Workload Queries
30 “queries”
Specified in English (sort of)
No required syntax (first implementation in Aster SQL MR)
Kit implemented in Hive, HadoopMR, Mahout, OpenNLP
Business functions (Adapted from McKinsey)
Marketing
Cross-selling, Customer micro-segmentation, Sentiment analysis, Enhancing multichannel consumer experiences
Merchandising
Assortment optimization, Pricing optimization
Operations
Performance transparency, Product return analysis
Supply chain
Inventory management
Reporting (customers and products)
WORKLOAD - TECHNICAL ASPECTS
Generic Characteristics
Data Sources          #Queries  Percentage
Structured            18        60%
Semi-structured       7         23%
Un-structured         5         17%

Analytic techniques   #Queries  Percentage
Statistics analysis   6         20%
Data mining           17        57%
Reporting             8         27%

Hive Implementation Characteristics
Query Types           #Queries  Percentage
Pure HiveQL           14        46%
Mahout                5         17%
OpenNLP               5         17%
Custom MR             6         20%
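The percentages in the tables above can be re-derived from the query counts and the total of 30 queries. The following is a minimal sketch of that arithmetic, not part of the BigBench kit; the dictionary and function names are hypothetical. Under nearest-integer rounding, 14 of 30 comes out as 47%, whereas the slide lists 46%, possibly so that the column sums to 100%. The analytic-technique counts add up to 31, which suggests that at least one query is counted under more than one technique; this is an inference from the numbers, not something the slides state.

```python
# Query counts copied from the slide tables; names below are illustrative only.
TOTAL_QUERIES = 30

data_sources = {"Structured": 18, "Semi-structured": 7, "Un-structured": 5}
query_types = {"Pure HiveQL": 14, "Mahout": 5, "OpenNLP": 5, "Custom MR": 6}
analytic_techniques = {"Statistics analysis": 6, "Data mining": 17, "Reporting": 8}


def shares(counts, total=TOTAL_QUERIES):
    """Return each count as a percentage of `total`, rounded to the nearest integer."""
    return {name: round(100 * n / total) for name, n in counts.items()}


if __name__ == "__main__":
    for title, counts in [
        ("Data sources", data_sources),
        ("Query types", query_types),
        ("Analytic techniques", analytic_techniques),
    ]:
        # Print the group total so overlaps (a total above 30) are visible.
        print(f"{title} (sum of counts: {sum(counts.values())})")
        for name, pct in shares(counts).items():
            print(f"  {name}: {counts[name]} queries, {pct}%")
```

Data sources and query types each partition the 30 queries, so their counts sum to exactly 30; only the analytic-technique grouping overlaps.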