Log Decomposition · ismail fateen

Motivation

Many classical data structures require elements to be inserted in a specific monotonic order to function correctly. This is fine when input arrives sorted - but what if it doesn't?

This post presents a general technique for converting such data structure into one that handles arbitrary insertion order. I was introduced to this idea by my teammate - Abdelmaged Nour - and couldn't find it written up anywhere, so here it is.

Prerequisites

No prerequisites are needed beyond basic complexity analysis. The example problem used to validate the technique uses the Convex Hull Trick. If you're unfamiliar with it, see this blog first.

The Problem

Suppose you have a data structure with $N$ elements that supports:

Insertion of an element in $O(f(N))$ time
Query in $O(g(N))$ time

with the constraint that elements must be inserted in monotonically increasing order of some comparison function.

Can we support arbitrary insertion order while preserving reasonable complexity?

We can! We can insert $N$ elements in amortized $\mathcal{O}(f(N) \log N)$ total time and answer each query in $\mathcal{O}(g(N) \log N)$ .

The Solution

The idea is reminiscent of how a binary counter works. Maintain an array $\text{blk}$ of size $\lceil\log N\rceil$ , where $\text{blk}_i$ holds either an empty slot, or an instance of our data structure of size $2^i$ , sorted in the required order.

To insert a new element:

Wrap it a single-element instance of our data structure
Starting from index $0$ , if $\text{blk}_i$ is non-empty, merge it with the current data structure using merge sort and clear $\text{blk}_i$ . Then, increment $i$ .
Repeat until you find an empty slot $\text{blk}_i$ , and place the final merged instance there; it should be of size $2^i$

DS merge(DS& a, DS& b) {
    // Merge two sorted structures into one sorted structure.
    // Implementation depends on the data structure.
}
 
void insert(T element) {
    DS current;
    current.insert(element);
    
    int i = 0;
    while (!blk[i].empty()) {
        current = merge(current, blk[i]);
        blk[i].clear();
        i++;   
    }
    blk[i] = current;
}

Notice that after every merge, the size of the underlying data structure doubles; each element 'participates' in a merge at most $\mathcal{O}(\log N)$ times. Since we insert $N$ elements each taking $\mathcal{O}(f(N))$ time, the total time is amortized $\mathcal{O}(f(N)N\log N)$ .

To answer a query, just combine the answer from all the non-empty structures.

T query(U input) {
    T ans = IDENTITY;
    for (auto& ds : blk) {
        if (ds.empty()) continue;
        ans = combine(ans, ds.query(input));
    }
    return ans;
}

Here, combine merges two candidate answers (e.g. takes the minimum, maximum, or performs some other kind of idempotent merge). Since there are $\mathcal{O}(\log N)$ blocks and each query costs $\mathcal{O}(g(N))$ , the total query time is $\mathcal{O}(g(N) \log N)$ .

Practice Problems

As an exercise, try applying this technique to solve this problem.

There's also this problem - check out this blog for a walkthrough.

I originally posted this blog on Codeforces