Motivation
Many classical data structures require elements to be inserted in a specific monotonic order to function correctly. This is fine when input arrives sorted - but what if it doesn't?
This post presents a general technique for converting such data structure into one that handles arbitrary insertion order. I was introduced to this idea by my teammate - Abdelmaged Nour - and couldn't find it written up anywhere, so here it is.
Prerequisites
No prerequisites are needed beyond basic complexity analysis. The example problem used to validate the technique uses the Convex Hull Trick. If you're unfamiliar with it, see this blog first.
The Problem
Suppose you have a data structure with elements that supports:
- Insertion of an element in time
- Query in time
with the constraint that elements must be inserted in monotonically increasing order of some comparison function.
Can we support arbitrary insertion order while preserving reasonable complexity?
We can! We can insert elements in amortized total time and answer each query in .
The Solution
The idea is reminiscent of how a binary counter works. Maintain an array of size , where holds either an empty slot, or an instance of our data structure of size , sorted in the required order.
To insert a new element:
- Wrap it a single-element instance of our data structure
- Starting from index , if is non-empty, merge it with the current data structure using merge sort and clear . Then, increment .
- Repeat until you find an empty slot , and place the final merged instance there; it should be of size
DS merge(DS& a, DS& b) {
// Merge two sorted structures into one sorted structure.
// Implementation depends on the data structure.
}
void insert(T element) {
DS current;
current.insert(element);
int i = 0;
while (!blk[i].empty()) {
current = merge(current, blk[i]);
blk[i].clear();
i++;
}
blk[i] = current;
}Notice that after every merge, the size of the underlying data structure doubles; each element 'participates' in a merge at most times. Since we insert elements each taking time, the total time is amortized .
To answer a query, just combine the answer from all the non-empty structures.
T query(U input) {
T ans = IDENTITY;
for (auto& ds : blk) {
if (ds.empty()) continue;
ans = combine(ans, ds.query(input));
}
return ans;
}Here, combine merges two candidate answers (e.g. takes the minimum, maximum, or performs some other kind of idempotent merge). Since there are blocks and each query costs , the total query time is .
Practice Problems
As an exercise, try applying this technique to solve this problem.
There's also this problem - check out this blog for a walkthrough.
I originally posted this blog on Codeforces