Temporal Data Structures
Data structures which make time-travel Lijie Chen
Tsinghua Institute for Interdisciplinary Information Sciences
What is temporal data strucuture
s
Data structures which make time-travel persistent data structures retroactive data structures
Pointer Machine
An abstract computational machine model. In this model we think of data structures as collections of nodes of a bounded size with entries for data. Each piece of data in the node can be either actual data, or a pointer to a node. Just like struct in C++, record in Pascal, we can store actual data or pointer in a node.
How to work in pointer machine
Unlike RAM model, you can’t access variable directly, but only though series of pointer. A virtual root has pointer to some useful thing. Access in pointer Machine: We have a accessed set node set S , each time we can pick u ∈ S and access v if u has a pointer of v , and add v into set S . Modi?cation in pointer Machine: During access, we can modify some attribute of node in accessed set S .
How to work in pointer machine
So the primitive operation is:
Why pointer machine?
Everything is based on pointer, persistent is very easy. We can stimulate other model(like RAM) with some overhead.
Segment tree in Pointer Machine
Pointer Machine
Many data structures can be described in point machine. But variable size data structures like array can’t. But we can using a binary tree to implement array, with a overhead of O (log n). Later we will talk about fully persistent array in O (log log n). Which is also the lower bound. The everything we will talk about below will using pointer machine paradigm by default.
Persistent data structures
Partial Persistence Full Persistence Con?uent Persistence Functional Persistence Speci?c persistent data structures like segment trees, binary search trees are well studied in OI. The general method to make every data structures persistent. If the data structure has bounded in-degree which is O (1), then it can be made partial persistent in worst case O (1). And can be made fully persistent in amortized time O (1). To better illustrate the category of persistence, we can use version graph.
Partial Persistence
What is partial persistence? Modi?cation only occur now, query about the past.
Full Persistence
What is Full persistence? Modi?cation about the past, query about the past.
Con?uent Persistence
What is Con?uent Persistence? In addition to full persistence, we can even combine two previous version into one.
Functional Persistence
What is Functional Persistence? We restrict the way of implementing persistent data structure. We can not modify, we can only create new node if we need. Functional way.
The general method to make data structures partial persistent in a pointer machine
Every data structures in pointer machine, its function can be recognized as a sequence of write and read. So In partial persistent data structures, we have: read(node u,version v,variable i): read the variable i in the version v of node u. write(node u,variable i,value var): write value var to the variable i of node u. When the in-degree is bounded by O (1), it can be done in worst case O (1).
The general method to make data strucutres fully persistent in a pointer machine
Every data structures in pointer machine, its function can be recognized as a sequence of write and read. So In fully persistent data structures, we have: read(node u,version v,variable i): read the variable i in the version v of node u. write(node u,version v,variable i,value var): write value var to the variable i in the version v of node u. which return a new version. When the in-degree is bounded by O (1), it can be done in amortized case O (1).
Stimulate the RAM model
As RAM can be seen as a big array. If we can stimulate the fully persistent array in O (g (n)) time, then every data structure take O (f (n)) time can be made full persistent in O (g (n)f (n)) time. Actually fully persistent array can be solved int O (log log n) time. which is also a lower bound.
van Emde Boas Tree
van Emde Boas Tree is a data structure which support fast predecessor query of integer. Suppose we deal with integer in [0, u ). * insert: log log n * predecessor(x), successor(x): log log n
van Emde Boas Tree
Denote V (u ) to be the data structure which integer in [0, u ), And T (u ) to be the time complexity fro V (u ). √ We can show T (u ) ≤ T ( u ) + O (1) √ First we divide [0, u ) into u block, let V (u ).block [i ] denote the i ? th block. And we keep the minimum value as V (u ).min, and add others value into the blocks and summary(not the minimum). Also we record V (u ).max . Also let V (u ).summary to store the index of every non-empty block. Then we can see that block and summary can be seen as instance √ of V ( u ). We can use hash to get the block with index.
van Emde Boas Tree:Insertion
Let insertion time to be T (u ). √ Suppose we insert x , let x = a u + b . If V (u ) is empty, then we let V (u ).min = x , and we are done. Otherwise we check x with V (u ).min to maintain the minimum value, and insert b (or the old minimum’s b ) to V (u ).block [a] and a to V (u ).summary . √ We can see T (u ) = T ( u ) + O (1) So T (u ) = O (log log n)
van Emde Boas Tree:Query
√ Suppose we Query x ’s predecessor at V (u ), let x = a u + b . If V (u ).block [a] exist, we check whether x ’s predecessor is in it using its min and max. If so, we query in V(u).block[a]. Otherwise, we query in V (u ).summary to get the predecessor block of a, and return its max. We can see that the time complexity is O (log log n) again.
Fully persistent array
We maintain the version tree. Maintain the ET of it. For each array index i , we denote Li as the nodes in version tree where A[i ] has been changed. We need to ?nd the lowest ancestor of u which in Li . If we order Li by their position in ET, it is exactly the predecessor problem.
Labels
But we need integer label in VEB tree solution. And as we can insert a label in ET sequence, things get complicated. Now we need a data structure, which support: * insert(x ,t ) insert a element t right after element x . * each element has a integer label which preserve their order.
weight BST
We know that weight BST can give each element a label of O (log n) bits which preserve their order and support insert in O (log n) time.
Labels
For better performance, we partition the sequence in many blocks, each blocks have size O (log n). For each block, we use weight BST to give the whole block a label of O (log n) bits using this block’s minimum element. Inside a block, we use brute force to give each element in a block a label of O (log n) bits. When a block have size > 2 log n, we split it in half.
Labels
Seems good, but if our operation change a block’s minimum element. we need to relabel O (log n) elements. which need O (log n log log n) time, which is even worse.
Bucketing
When ?nding predecessor, we partition the sequence in blocks of size O (log2 ). Inside a block, we use a BST to ?nd the predecessor, which need O (log log n) time. And we use a VEB tree to maintain each block’s order using each block’s minimum element’s label. Then we only need the label of each block’s header. There is O (n/ log2 n) such headers, as one insert might cause O (log n) element’s label to be changed, we might expect O (log?1 n) header’s label change on average.
Weighted Label Maintainance
But as the header may be nonuniform. We can changed the previous labeling method to be based on header weight(A header has weight 1, other has weight 0). Then we can have a O (log log n) solution for fully persistent array. This means, we can make EVERY normal data structure with worst case complexity O (f (n)) fully persistent in O (f (n) log log n) time.
Open Questions
De-amortization of full persistence Is there a matching lower bound for both full and partial persistence?
Functional Tries
Why Functional Tries is important? We can see ?le system as a trie, which leaf node represent ?le, internal node represent folder, edge name represent ?le name or folder name. Then Functional Tries can be used to represent version of the ?le system.
Functional Tries:Naive approach
As it is a tree, we can simply use the path-copying method. For a tree which has tree depth d , the running time is O (d ). What if the tree is very deep?
Functional Tries:Motivation
In real work, we often do some work in one folder, then switch to its parent folder, and do some other work, and switch back, and do some other work. If we employ the path-copying and we the tree has huge deep, it could be very slow.
Functional Tries:The Concept of Finger
To abstract, we can see in previous example, where we do the job don’t vary much. We can say that we have some ”?nger”, each ?nger point to some node in the Trie. And we can only do modi?cation at where the ?nger stand. At the previous example, we just maintain one ?nger, and each time move this ?nger to neighbouring node.
Functional Tries:The Concept of Finger
Using this concept, we can do much faster. Let the maximum degree of the node to be ?. And there is n nodes. And suppose we only have O (1) amount of ?nger. Then we support functional trie which ?nger move take O (log ?) time, and modi?cation take O (log ?) time. If we just want fully persistent, then we can even have both O (log log ?) time.
Functional Tries
Suppose there is k ?ngers, denote them by F . Get the Steiner tree of those ?ngers, which will have at most 2k vertices after compressing every 2-degree node. denote those node by PF . We maintain the compressed tree represented by PF . For each edge on it, it is actually a path in the original trie. We denoted such a path as a tendon. For each tendon, we represent it by a functional dequeue, the elements in this dequeue is the vertices on that path. For each vertex, we maintain a functional BST of depth O (log ?), which represent the other adjacent vertices of it. We use a functional BST to store every ?nger by their node-id. For each ?nger, we use a functional BST of depth O (1) to store the adjacent tendon, and a functional BST of depth O (log ?) to store other neighbor.
Functional Tries: Movement
How to perform operation on this? Discussion.
Functional Dequeue
How to make functional dequeue? Discussion.
Functional Dequeue
Let dequeue D = P + M + S . P , S are two bu?er which have length at most 3. M is again a dequeue itself. We may change D ’s representation tripe(for example take the last element o P and put it in front of M ). But won’t change its content. PushLeft,popLeft works trivial if P is not full and empty. Otherwise recursive do it in M . Potential analysis show it is amortized O (1).
Functional Link-Cut Tree
The worst case O (log n) algorithm described in the original paper is good for functional. Why the amortized one doesn’t work?
Refresh of that method
Decompose the trie into a set of heavy paths, and represent each heavy path by a globally biased binary tree. Tied together into one big tree which we call the representation tree. An edge is heavy if more than half of the descendants of the parent are also descendants of the child. A heavy path is a contiguous sequence of heavy edges. Because any node has at most one heavy child, we can decompose the trie into a set of heavy paths, connected by light (non-heavy) edges.
Refresh of that method
The weight wv of a node v is 1 plus the number of descendants of v through a light child of v . Global biased binary tree will make the total wv of both children balanced. So the representation tree is of O (log n) depth.
Why path-coping does not work
For the expose operation to work, for each heavy-path’s head, we should store its parent. But it is bad for path copying. Why?
Finger
Again, we introduce ?nger here too. For each relevant vertex, we store a ?nger, which store all its ancestor in the represent tree. For each vertex, we make an auxiliary globally biased tree of its light children. We can use catenable functional dequeue of dequeue to store each ?nger. It can support the parent operation.
Retroactive Data Structures
What is retroactive data structures? We can query about the current version of the data structures, while we can insert or delete the modi?cations in the past! Partial retroactivity Full retroactivity Nonoblivious retroactivity
Partial retroactivity
Query about the current version of the data structures, while we can insert or delete the modi?cations in the past. Insert(t ,modi?cation): insert a modi?cation(which is an original modi?cation) at time t . Delete(t ,modi?cation): Delete the modi?cation at time t . query: query the current version.
Fully retroactivity
Query about the every version(current or the past) of the data structures, while we can insert or delete the modi?cations in the past. Insert(t ,modi?cation): insert a modi?cation(which is an original modi?cation) at time t . Delete(t ,modi?cation): Delete the modi?cation at time t . Query(t ,query) : query the version at time t .
Nonoblivious retroactivity
Now we even put the query into the sequences. For each change we made, we want to know the ?rst a?ected query.
A example of priority queue
Initially it is empty. insert 1 insert 2 delete-min insert 3 : current one is [2,3] if we change the insert 2 into insert 0 insert 1 insert 0 delete-min insert 3 : current one is [1,3]
How to achive this?
The bad news is that it is indeed impossible to change every data structures into their retroactive version. Suppose we have a data structures, which hold two variables X , Y which are initial 0. And it support the following operation. X =a Y =Y +a Y =X ·Y query the value of Y We make the following modi?cation sequence: X = x Y = Y + an Y = X · Y Y = Y + an?1 · · · Y = Y + a0 . i Then we know the answer is n i =0 ai x . If we change the ?rst operation. Then we need to evaluate this polynomial for another x .
How to achive this?
In history-independent algebraic decision tree, for any ?eld, independent of pre-processing of the coe?cients, need ?(n) ?eld operations (result from 2001), where n is the degree of the polynomial. So it takes at least ?(n) time. We can’t do better than brute force.
How to achive this?
But good news is that we can make many data structures retroactive. queue: O (1) partial O (log n) full deque: O (log n) full decomposable search problem: priority queue: O (log n) partial
Commutative and Inversion operation
commutative: a, b ? b , a inversion: ?a, ? a?1 s .t . a, a?1 ? do nothing Then Insert (t , modi?cation) ? Insert (now , modi?cation) Then Delete (t , modi?cation) ? Modify (now , modi?cation?1 )
Queue
Operation: enqueue(x): put x at the end of the queue dequeue(): remove the ?rst element Query: front(): return the ?rst element of the queue back(): return the last element of the queue Time complexity: partial: O (1) full: O (log n)
Queue:Partial
How to make queue partial retroactive? Discussion.
Queue:Partial
Quite trivial, we can use a double linked list to maintain every enqueue operation. Then we know that the current version is a continuous sub-sequence of it. Maintain two pointer: F , B point to the front and the back of the queue. Update is trivial.
Queue:Full
How to make queue fully retroactive? Discussion.
Queue:Full
Due to the property of queue. It is still not hard. We maintain two balanced binary tree Te , Td , store the enqueue and dequeue operation ordered by time respectively. Query(t ,front): ?rst count the number of dequeue operation at time ≤ t . Denote it by d . Then return the d + 1-th elements in Te . Query(t ,back): return the last element in Te which inserted time ≤ t . O (log n)
Deque:Fully
How to make queue fully retroactive? Discussion.
Deque
Operation: pushL(x),pushR(x): put x at the left(right) of the queue popL(),popR(): remove the ?rst(last) element Query: front(): return the ?rst element of the queue back(): return the last element of the queue Time complexity: full: O (log n)
Deque:Full
In a array-based deque implementation, we use two index L, R , and A[L...R ] of the array is our deque. initially: L=1,R=0 pushL(x): L=L-1,A[L]=x popL: L=L+1 pushR(x): R=R+1,A[R]=x popR: R=R-1 Use a BST store pushL,popL by their time, and encode them as -1 and +1. Another one for pushR,popR in the same way. Then we can know the L, R at the query time t .
Deque:Full
Then we can know the L, R at the query time t . front: we want to know the array value A[L] at time t . Which is the last pushL which write to it. Find the last pushL at time < t whose pre?x sum is L. Can be done by simple BST search(store the minimum and maximum pre?x sum in a node).
Decomposable Searching Problems
Search Problems: insert(x),delete(x) : S = S ∪ {x }, S = S \ {x } query(x,S): only a?ect by the elements of S . Search Problems are commutative and have inversion operation. So they are naturally partial retroactive. A query is decomposable:Q(x ,A ∪ B ) = Q(x ,A) ? Q(x ,B )
Decomposable Searching Problems:Full
How to make decomposable searching problems fully retroactive? Discussion.
Decomposable Searching Problems:Full
We can maintain a timeline, for each element x , we know his existence interval are [Lx , Rx ]. Then we use a segment tree, each node maintain a original structure. Split [Lx , Rx ] into O (log n) nodes in the segment tree, and insert x into those nodes’s original structure. Query at time t can be done by querying every t ’s ancestor in the segment tree.
Priority queue
How to make priority queue partial retroactive? Discussion.
Priority queue
Non-trivial because add an insert in the past can have many e?ect on succeeding operations. We can instead using a geometry view.
Geometry view: Insertion
Geometry view: Insertion and Deletion
Geometry view: Insertion and Deletion
Geometry view: Cascading e?ect of past operation
How to handle this?
When we add an insert in the past, there will be only one elements insert into the current version of the priority queue. When we add a delete-min in the past, there will be only one elements removed from the current version of the priority queue. Just ?nd them is ok.
Insert in the past
When we insert element x at the time t , which will happen? it just remain, and x is inserted into the current priority queue. it got removed sometimes later, so someone got free, and goes on, make some other guys ?nally entered the current priority queue. We can see the element which are inserted into the current priority queue is max(x ,max{x |x are deleted after time t}). Taking look at the cascading line(red line), if the largest x are still deleted, then it lie above the red line, it means that when it got deleted, there is something below it, which is a contradiction(because it is delete-min).
Insert in the past
So we want to know what is max{x |x are deleted after time t}. It is still hard to approach, because the deleted time will change a lot as an e?ect of modi?cation. So can we change deleted time into insertion time? which will not change and the problem get easier.
Bridges
We call time t a bridge if Qt ? Qnow .(Qi means the priority queue at the time i ). So bridge means that the element remain at time t will continue its way to the end.
Geometry view: Bridges
Insert in the past
Suppose time t is the preceding bridges of time t . Then we have: max{x |x are deleted after time t } = max{x |x ∈ Qnow and are inserted after time t } Let A ={x |x are deleted after time t },B ={x |x ∈ Qnow and are inserted after time t }. Proof: ? if x are deleted after time t , then x ∈ Qnow , so x ∈ Qt , so x is inserted after time t . So A ? B , max(A) ≤ max(B ).
Insert in the past
Suppose time t is the preceding bridges of time t . Then we have: max{x |x are deleted after time t } = max{x |x ∈ Qnow and are inserted after time t } Let A ={x |x are deleted after time t },B ={x |x ∈ Qnow and are inserted after time t }. Proof: ? if max(B ) > max(A), let k = max(B ). Then k ∈ A, so t < dk < t ,dk is the deleted time of k . We should know that dk is not a bridges since there is no bridges between t and t . So there is an element k , which inserted before dk and deleted after dk . k ∈ Qt because it got deleted, so the inserted time of k > t , so k ∈ B and k > k , which contradict the fact that k is the largest.
delete-min in the past
When we insert a delete-min, who will be ?nally deleted? We delete a1 now, but it is supposed to be deleted at time da1 , so at that time we will delete a2 instead of a1 ,but it is supposed to be deleted at time da2 , so at that time we will delete a3 instead of a2 . Move on and we ?nally ?nd ak such that it will not be deleted. So we got a1 < a2 < a3 < · · · < ak , and ak ∈ Qnow . First we should notice that at time dak ?1 , everything < ak is deleted, and ak remain to Qnow , so everything > ak will too, so the time dak ?1 is a bridge. Also we can notice that from every time between t and dak ?1 will not be a bridge because from the chain something got deleted. So dak ?1 is the succeeding bridge of time t . and ak is the smallest element on that bridge.
delete-min in the past
To summarize, when we insert a delete-min at time t . We ?rst ?nd the next bridge after t , and delete the smallest element on it.
Other modi?cation
Delete a delete-min: as same as insert the deleted elements back Delete an insert: insert a delete-min right before it got deleted
The Algorithm
What we want is the following: Query at the current priority queue(?nd-min). Query max{x |x ∈ Qnow and are inserted after time t }. Find the preceding or succeeding bridges of time t .
Query at the current priority queue(?nd-min)
This is easy, we can use a BST to store the elements of Qnow . Everytime we need to insert something in Qnow or delete something from Qnow , we simply do it in this BST. ?nd-min is trivial.
Query max{x |x ∈ Qnow and are inserted after time t }
We again use a BST to store every insert which ∈ Qnow by time. Then it is simply a su?x maximum value query.
Find the preceding or succeeding bridges of time t .
We use a BST of every operation ordered by time and encode them as following: insert x (x ∈ Qnow ): +1 delete-min: -1 insert x (x ∈ Qnow ): 0 Then we can see that time t is a bridge if every thing before time t in this BST summed to 0. Then we need to ?nd a t preceding or succeeding t which has pre?x sum 0. It is a trivial BST operation.
The Algorithm: Complexity
We can see that every operation can be done by constant operations on those BSTs. So the overall complexity is O (log n). Which is as same as the original complexity in a sense.
Priority Queue: Full
How to make it fully retroactive? Discussion.
Priority Queue: Full
We will introduce the general methods to make every partial retroactive data structures fully retroactive. Supposed this data structure has bounded in-degree, then we can make it persistent with no extra time cost. √ Suppose there is O (m) operations. We split them into O ( m) √ blocks. Each block has size O ( m). Then we update the operation one by one, and keep the persistent version of it at each block’s starting point. For each query at the previous time t , we ?rst get the version at the t ’s block’s starting time. And do the updates between it and t . Then we get the version at time t . When we insert operation at time t , we insert it into the corresponding block, and do the retroactive update on those block’s starting point after t . Rebuild the whole data structure when some block’s size is larger √ tphen 2 m.
Priority Queue: Full
The retroactive priority queue consist of some BST which has bounded in-degree. So we can make it fully retroactive, and the time complexity is √ O ( m log n).
Open Questions
Can we do better in fully retroactive priority queue?