This text is based on the publication "Efficient Window Aggregation with General Stream Slicing." by Traub et al. (EDBT 2019).
Cite PublicationScotty uses the window aggregation technique stream slicing. The idea of stream slicing is to divide the stream into non-overlapping subsets of data, so-called slices. Every time a window starts or ends, a new slice is started. Consequently, the number of slices depends on the workload and not on the amount of input data.
For each slice, the system stores a partial aggregate of the tuple values. When a tuple arrives, it is assigned to exactly one slice. In an incremental aggregation step, the partial aggregate is incremententally updated with each value of a newly arrived tuple. If it is possible, slicing techniques only store partial aggregates instead of all the tuples resulting in a small memory footprint. The window aggregation result is computed in a final aggregation step by combining the partial aggregates of the slices which leads to a low latency. Partial aggregates are shared among the overlapping windows, concurrent queries, and multiple users (i.e., aggregate sharing). This prevents redundant computations for overlapping windows as shown in Figure 1.
Window aggregations can have different workload characteristics that impact the performance of stream processing systems.
in-order streams, out-of-order streams
associative, invertible, commutative, distributive, algebraic, holistic
context free, forward-context free, forward-context aware
time-based, count-based, arbitrary
Scotty integrates general stream slicing (see Traub et al. EDBT 2019). This window aggregation technique supports the different workload characteristics of aggregation queries to ensure high aggregation performance while being generally applicable. The decision trees illustrate the decisions that Scotty takes to adapt to the current workload.
Based on the given characteristics, general stream slicing decides whether individual tuples are kept in memory or dropped after computing partial aggregates.
There are three different fundamental operations that can be performed on slices: merging, splitting, and updating. Two slices can be merged to form one slice. One slice can be split into two seperate slices. This is an expensive operation because it requires recomputing the partial aggregates in both slices aggregates from scratch. To enable this recomputation, Scotty needs to store the individual tuples in memory. Updating includes adding in-order tuples, adding out-of-order tuples, removing tuples, or changing metadata (e.g., the start timestamp or end timestamp of the slice).
Scotty has the following components (Figure 5).
The Stream Slicer creates new slices when tuples arrive. To store the slices, it accesses the Aggregate Store, Scotty's shared data structure.
The task of this component is to trigger all split, merge, and update operations. It checks whether these operations are required.
The Window Manager is reponsible for the computation of the final aggregation results for windows using the partial aggregates of the slices.
Scotty can be integrated as a window operator winthin the API of different stream processing systems. For this, a connector class has to be implemented that instantiates Scotty and implements (or extends) the operator class of the repective system. Scotty already provides different connectors to several open-source systems.
This example can be found in the Demo folder of our open-source repository.
DataStream> stream = ...;
KeyedScottyWindowOperator, Tuple2> processingFunction =
new KeyedScottyWindowOperator<>(new SumWindowFunction());
processingFunction
.addWindow(new TumblingWindow(WindowMeasure.Time, 2000))
.addWindow(new SlidingWindow(WindowMeasure.Time, 5000,1000));
stream
.keyBy(0)
.process(processingFunction)
.print();
This article explains how session windows are processed with stream slicing for out-of-order streams.
Continue readingTraub, J., Grulich, P. M., Cuéllar, A. R., Breß, S., Katsifodimos, A., Rabl, T., & Markl, V. (2019, March). Efficient Window Aggregation with General Stream Slicing. In EDBT (Vol. 19, pp. 97-108).
@inproceedings{traub2019efficient,
title = {Efficient Window Aggregation with General Stream Slicing.},
author = {Traub, Jonas and Grulich, Philipp M and Cu{\'e}llar, Alejandro Rodr{\'\i}guez and Bre{\ss}, Sebastian and
Katsifodimos, Asterios and Rabl, Tilmann and Markl, Volker},
booktitle = {EDBT},
volume = {19},
pages = {97--108},
year = {2019}
}