Scotty

This article explains how session windows are processed with stream slicing for out-of-order streams.

Outline

Session Windows
Aggregate Sharing for Session Windows
Processing out-of-order Tuples

Reference

This text is based on the publication "Scotty: Efficient window aggregation for out-of-order stream processing" by Traub et al. (ICDE 2018).

Cite Publication

Session Windows

Session windows split the stream into sessions that represent periods of activity. Periods of activity are periods of time where tuples arrive. A session ends, if no tuples arrive in a predefined period of the time referred to as the gap. The timestamp of the last tuple that arrived before the gap then represents the end timestamp of the session. The next session begins as soon as the next tuple arrives after the gap. The start timestamp of a session is therefore the timestamp of its first tuple. As a result, tuples need to be processed to derive sessions which makes the session window a data-driven window type. Session windows are used, for example, to detect browser sessions or interactions with an ATM.

For tumbling and sliding windows, we can compute the event-times of the window edges based on the size and slide parameters. Since a new slice starts everytime a window starts or ends, we know the event-timetamps of the slices up front. In contrast, the start and end timestamp of a session depends on the timestamps of the arriving tuples. We have to monitor the gaps between the tuples to detect session timeouts. Consequently, stream slicing is more complex for session windows than for tumbling or sliding windows.

Aggregate Sharing for Session Windows

Figure 1 shows an example of session window stream slicing with four session window queries with gaps of 3, 5, 6, and 7 seconds. After the first 9 tuples have arrived, there is a gap of 4 seconds where no tuples arrive. This period of inactivity is bigger than the 3 second gap defined in the first query. As a result, the first session of this query ends after the first 9 tuples. The gap defined in the second query is bigger than the 4 second. Hence, there is no timeout for the first session of the second query. Then, the first session of the second query contains a gap of tuples of 3 seconds. The last query defines a gap of 7 seconds. There is no other gap equal or bigger than 7 seconds before the last one. Consequently, one long session is created that includes all tuples shown in this example.

This example also shows the slices that we can derive from the start and end timestamps of the session windows. Longer sessions, such as the first session from query 2, are a combination of the two smaller sessions above. As a result, the partial aggregates of their slices can together form the aggregation result of the longer session. Similarly, the final aggregate of the long session of query 4 can be computed by combining the partial aggregates of all the slices below.

From this example, we dervive the following observations:

The slicing logic solely depends on one session window: the one with the smallest gap. All session windows with larger gaps are compositions of the slices made for the session window with the smallest minimum gap
Sessions of a single query have no overlap. Thus, a single session window query cannot benefit from aggregate sharing
Multiple session window queries with different gaps benefit from aggregate sharing.
Slices can cover the gaps between sessions because gaps do not cover any tuples by definition. Respectively, a slice which covers a session and a gap is logically equivalent to a slice which covers the session only.
Session windows would also share aggregates with concurrent sliding and tumbling windows.

Processing out-of-order Tuples

An out-of-order tuple may belong to an existing session, fuses sessions, or creates a new session (Figure 2). In the first case, the out-of-order tuple belongs to an existing session. Three different insertions can take place. In case 1.1, the tuple is inserted into the session window. If the tuple has a timestamp within the gap of the session, we have to extend the session as shown in case 1.2. Since the slice of the session also includes the gap, we do not have to change the start or end of the slice. In both cases, we can simply insert the tuple into the slice that represents the session. In contrast, we have to change the slice, when a tuple extends a session at the start. This is shown in case 1.3, where the tuple has a timestamp before the start of the session, but after the gap of the session before. This requires changing the start of the slice before the tuple can be inserted into the slice. An out-of-order tuple can also fuse two sessions. As illustrated in case 2, the tuple extends the session end. As a consequence, the gap between the session shrinks below the minimum defined gap. This means that the session does not end, but includes the next session. Here, the slices of the respective sessions have to be combined to one slice. Case 3 shows how an out-of-order tuple creates a new session. We have to create a new slice for the new session. This requires to split the slice where the new session begins. This is possible without moving tuples to the new slice, because the gap contains no tuples.

Figure 2: Out-of-order Processing of a session window.

Stream Slicing for Session Windows

This article explains how session windows are processed with stream slicing for out-of-order streams.

Outline

Reference

Session Windows

Aggregate Sharing for Session Windows

Processing out-of-order Tuples

Scotty: Efficient Window Aggregation for out-of-order Stream Processing

APA:

BibTeX: