Example: linked lists
Suppose you have a list of elements
and a sequence of sublists
of
, where each sublist consists of a subset of the elements in
, in the same order they appear in
. For example, let
= ABCD, and let these be the sublists:
s_1: AIn this case, each sublist differs from the sublist before it by at most one element. We can represent these sublists as linked lists, with head 0 and tail
s_2: AB
s_3: AB D
s_4: B D
s_5: BCD

In the above diagram, there is an arrow from an element
to another element
with a number i whenever next(
) is changed to
in the
th sublist. You can see that, for instance, to reconstruct s_3 you would start at 0 and trace through the diagram following the arrows with the largest numbers less than or equal to 3, following 0, then A, then B, then D, then
.
The basic idea here is that you have a sequence of data structures each consisting of nodes and pointers to nodes. Each data structure differs from the one before it in the sequence by the alteration of a small number of pointers. Since each data structure is only slightly different from the one before it, I'll call the sequence of data structures ``revisions'' of a data structure (for example, reflecting how multiple insertions and removals revise a binary tree over time). The objective is to access any node of any revision quickly, and also to store all the revisions using a small amount of space.
Associate with each possible outgoing pointer from a node a sequence of pointers
, each of which represents a revision of that pointer at a time
. For instance, in a singly linked list each node A has a ``next'' pointer, next(A). If next(A) = B at time 0, next(A) is changed to C at time 1.5, and next(A) is changed to D at time 2, then we would associate with next(A) the sequence
. The sequence associated with each possible pointer is maintained in increasing order of time. Now, given a time
, we can determine the value of the pointer at time
by searching in the sequence for the
such that
is maximized but
. This represents the latest revision of the pointer at time
and it can be found by a binary search in time
, where
is the number of revisions that have been performed on that pointer. In our previous example, if
, the latest revision would be
, so we know that next(A) = C at time 1.8. Additionally, we can update the pointer to
at a new time
by inserting
into the sequence, which can also be done in time
if we represent the sequence by a red-black tree.
Therefore any operation on the data structure at any time
can be performed, simply by following the latest revision of each pointer at time
. Suppose that an operation X on the data structure accesses or updates
pointers on the data structure, and that over the entire history of the data structure before this operation, the total number of times any pointer has been revised is
. Then if X would have been
without revision control, it is
with revision control. Since
without revision control, this is not too bad a slowdown, with a worst case of
.
The extra space required to store the
revisions is only
, probably a large improvement over storing each revision of the data structure as a separate copy. Also, if revisions are stored as separate copies then every update operation on the data structure must create a completely new copy, probably slowing down the update dramatically.
Application
I started thinking about this in relation to a naive way of finding which rectangle of many on the screen contains a given point. Naively, you could sort all left and right vertical boundaries of rectangles by their x-coordinate. These boundaries divide up the screen into vertical strips, such that within a given strip, all vertical lines pass through the same rectangles. So within each vertical strip, you could sort those rectangles by their y-coordinate. Then to locate which rectangle a given point lies in, first do a binary search of the x-coordinate to locate the vertical strip it lies in, then do another binary search of the y-coordinate to find the rectangle. With
rectangles this takes
time, but it requires
space (and preparation time). Instead, you could consider each vertical strip sorted by y-coordinate as a red-black tree. Each vertical strip differs from the one to its left by only the addition or removal of a small number of rectangles, so consider the strips as revisions of a single red-black tree (where ``time'' is the x-coordinate). Each insertion or removal of a rectangle involves the revision of
pointers, and there are
insertions and removals total, so
and
. So locating a point becomes
, a minor slowdown, and revision control dramatically reduces the space requirement to
.
© Bart Parkis
Last modified:Tue May 19 19:07:52 2009