Succinct Data Structures for Approximating Convex Functions with Applications
Prosenjit Bose1 , Luc Devroye2 , and Pat Morin1
1 2
School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6, {jit,morin}@cs.carleton.ca School of Computer Science, McGill University, Montr? eal, Canada, H3A 2K6, luc@cs.mcgill.ca
Abstract. We study data structures for providing -approximations of convex functions whose slopes are bounded. Since the queries are e?cient in these structures requiring only O(log(1/ε)+log log n) time, we explore di?erent applications of such data structures to e?ciently solve problems in clustering and facility location. Our data structures are succinct using only O((1/ε) log2 (n)) bits of storage. We show that this is optimal by providing a matching lower bound showing that any data structure providing such an -approximation requires at least ? ((1/ε) log2 (n)) bits of storage.
1
Introduction
We consider the problem of approximating convex functions of one variable whose slopes are bounded. We say that a non-negative number y is an εapproximation to a non-negative number x if (1 ? ε)x ≤ y ≤ x 1 . We say that a function g is an ε-approximation to a function f if g (x) is an ε-approximation to f (x) for all x in the domain of f . Let f : R → R+ be a convex function that is non-negative everywhere. In this paper we show that, if the absolute value of the slope of f is bounded above by n, then there exists a piecewise-linear function g that ε approximates f at all points x except where the slope of f is small (less than 1) and that consists of O(logE n) pieces, where E = 1/(1 ? ε). The function g can be computed in O(K logE n) time, where K is the time it takes to evaluate expressions of the form sup{x : f (x) ≤ t} and f is the ?rst derivative of f . Once we have computed the function g , we can store the pieces of g in an array sorted by x values so that we can evaluate g (x) for any query value x in O(log logE n) time. Since we are interested in the joint complexity as a function of ε < 1/2 and n ≥ 10, it is worth noting that logE n = Θ((1/ε) log n) and thus that log logE n = Θ(log(1/ε) + log log n). As an application of these results, we consider functions de?ned by sums of Euclidean distances in d dimensions and show that that they can be approximated using the above results. To achieve this, we use a random rotation
1
This research was partly supported by NSERC. This de?nition is a bit more one-sided that the usual de?nition, which allows any y such that |x ? y | ≤ εx.
J. Akiyama and M. Kano (Eds.): JCDCG 2002, LNCS 2866, pp. 97–107, 2003. c Springer-Verlag Berlin Heidelberg 2003
98
Prosenjit Bose, Luc Devroye, and Pat Morin
technique similar to the method of random projections [6]. We show that the sum of Euclidean distances from a point to a set of n points can be closely approximated by many sums of Manhattan distances from the point to the set. This technique is very simple and of independent interest. The remainder of the paper is organized as follows. Section 2 presents our result on approximating convex functions using few linear pieces. Section 3 discusses how these results can be interpreted in terms of data structures for approximating convex functions. Section 4 gives lower bounds on the space complexity of approximating convex function. Section 5 describes applications of this work to facility location and clustering problems. Finally, Section 6 summarizes and concludes the paper.
2
Approximating Convex Functions
Let h(x) = c + |nx|, for some c, n ≥ 0. Then, it is clear that the function g such that g (x) = c + (1 ? ε)|nx| is an ε-approximation of h. Furthermore, g is an ε-approximation for any function h2 such that g (x) ≤ h2 (x) ≤ h(x) for all x ∈ R. (see Fig. 1). This trivial observation is the basis of our data structure for approximating convex functions. h(x) = |nx| h2 (x) g (x) = (1 ? )|nx| c x
Fig. 1. The function g is an approximation of h and of h2 .
Let f be a non-negative convex function and let f be the ?rst derivative of f . Assume that f (x) is de?ned for all but a ?nite number of values of x and that |f (x)| ≤ n for all x in the domain of f . For convenience, we de?ne the right derivative f ? (x) as follows: If f (x) is de?ned, then f ? (x) = f (x). Otherwise, f ? (x) = limδ→0+ f (x + δ ). Let a be the largest value at which the slope of f is at most ?(1 ? ε)n, i.e., a = max{x : f ? (x) ≤ ?(1 ? ε)n} . (Here, and throughout, we use the convention that max ? = ?∞ and min ? = ∞.) Similarly, let b = min{x : f ? (x) ≥ (1 ? ε)n}. Then, from the above discussion, it is clear that the function ? ? f (a) + (1 ? ε)(x ? a)n if x ≤ a g (x) = f (b) + (1 ? ε)(b ? x)n if x ≥ b (1) ? f (x) otherwise is an ε-approximation of f (see Fig. 2).
Succinct Data Structures for Approximating Convex Functions
99
f (x) g (x)
a
b
x
Fig. 2. The function g is a (1 ? ε) approximation of f .
Equation (1) tells us that we can approximate f by using two linear pieces and then recursively approximating f in the range (a, b). However, in the range (a, b), f ? is in the range (?(1 ? ε)n, (1 ? ε)n). Therefore, if we recurse logE n times, we obtain a function g with O(logE n) = O((1/ε) log n) linear pieces that approximates f at all points except possibly where f ? is less than one. Theorem 1 Let f and f ? be de?ned as above. Then there exists a piecewiselinear function g with O(logE n) pieces that is an ε-approximation to f at all values except where |f ? (x)| ≤ 1.
3
Data Structures
In this section, we consider the consequences of Theorem 1 in terms of data structures for approximating convex functions. By storing the pieces of g in an array sorted by x values, we obtain the following. Theorem 2 Let f and f ? be de?ned as in Section 2. Then there exists a data structure of size O((1/ε) log n) that can compute an ε-approximation to f (x) in O(log(1/ε) + log log n) time for any query value x where |f ? (x)| ≥ 1. Next, we consider a more dynamic model, in which the function f is updated over time. In particular, we consider the following operations that are applied to the initial function f (x) = 0, for all x ∈ R. 1. Query(x): Return an ε-approximation to f (x). 2. Insert(a): Increase the slope of f by 1 in the range (a, ∞), i.e., set f (x) ← f (x) + x ? a for all x ∈ [a, ∞). 3. Delete(x): Decrease the slope of f by 1 in the range (x, ∞). In order to maintain convexity, the number of calls to Delete(x) may not exceed the number of calls to Insert(x) for any value of x. Note that a sequence of Insert and Delete operations can only produce a monotonically increasing function f whose slopes are all integers. This is done
100
Prosenjit Bose, Luc Devroye, and Pat Morin
to simplify the exposition of the data structure. If an application requires that f be allowed to decrease and increase then two data structures can be used and their results summed. The function f has some number m of breakpoints, where the slope of f changes. We store these breakpoints in a balanced search tree T , sorted by xcoordinate. With each breakpoint x, we also maintain the value ?(x) by which the slope of f increases at x. In addition, we link the nodes of T in a doublylinked list, so that the immediate successor and predecessor of a node can be found in constant time. It is clear that T can be maintained in O(log n) time per operation using any balanced search tree data structure. In addition to the search tree T , we also maintain an array A of size O((1/ε) log n) that contains the piecewise linear approximation of f . The ith element in this array contains the value xi such that xi = min{x : f ? (x) ≥ E i }, a pointer to the node in T that contains xi , and the values of f (xi ) and f ? (xi ), i.e., the value of f at xi and slope of f at xi . To update this array during an Insert or Delete operation, we ?rst update the values of f (xi ) and f ? (xi ) for each i. Since there are only O((1/ε) log n) array entries, this can be done in O((1/ε) log n) time. Next, we go through the array again and check which values of xi need to be changed (recall that xi = min{x : f ? (x) ≥ E i }). Note that, since Insert or Delete can only change the value of f ? (x) by 1, if the value of xi changes then it changes only to its successor or predecessor in T . Since the nodes of T are linked in a doubly-linked list, and we store the values of f (xi ) and f ? (xi ) we can detect this and update the value of xi , f (xi ) and f ? (xi ) in constant time. Therefore, over all array entries, this takes O((1/ε) log n) time. To evaluate an approximation to f (x), we do a binary search on A to ?nd the index i such that [xi , xi+1 ) contains x and then output f (xi ) + (x ? xi )f ? (xi ). By the choice of xi , this is a ε-approximation to f (x). We have just proven the following: Theorem 3 There exists a data structure of size O(n) that supports the operations Insert, Delete in O((1/ε) log n) time and Query in O(log(1/ε) + log log n) time, where n is the maximum slope of the function f being maintained.
4
A Lower Bound on Storage
In this section we prove an ? ((1/ε) log2 n) lower bound on the number of bits required by any data structure that provides an ε-approximation for convex functions. The idea behind our proof is to make m = Θ((1/ε) log n) choices from a set of n elements. We then encode these choices in the form of a convex function f whose slopes are in [0, n]. We then show that given a function g that is an ε-approximation to f we can recover the m = Θ((1/ε) log n) choices. Therefore, any data structure that can store an ε-approximation to convex functions whose n slopes lie in [0, n] must be able to encode m di?erent possibilities and must 2 therefore store ? ((1/ε) log n) bits in the worst case.
Succinct Data Structures for Approximating Convex Functions
101
Let x1 , . . . , xn be an increasing sequence where x1 = 0 and each xi , 2 ≤ i ≤ n satis?es xi ? xi?1 xi ? xi?1 > xi?1 + , (2) xi?1 + (1 ? ε) 1 ? 2ε 1 ? 2ε which is always possible since (1 ? )/(1 ? 2ε) > 1. Let p1 , . . . , pm be any increasing sequence of m = logE n integers in the range [1, n], where E = 1/(1 ? 2ε). We construct the function f as follows: 1. For x ∈ [?∞, 0), f (x) = 0. 2. For x ∈ (xpi , xpi+1 ), f (x) has slope 1/(1 ? 2ε)i . 3. For x > pm , f (x) has slope n. The following lemma, illustrated in Fig. 3 allows us to decode the values of p1 , . . . , pm given an ε-approximation to f . f (x) xpj+1 ? xpj (1 ? 2ε)j ?1
(1 ? )f (x)
f (xpj ) +
xpj
Fig. 3. An illustration of Lemma 1.
xpj+1
x
Lemma 1 For the function f de?ned above and for any i such that i = pj for some 1 ≤ j ≤ m, (1 ? ε)f (xpj+1 ) > f (xpj ) + xpj+1 ? xpj . (1 ? 2ε)j ?1
Proof. The lemma follows (with some algebra) from (2). Suppose that g is an ε-approximation to f , i.e, for all x ∈ R, g (x) satis?es (1 ? ε)f (x) ≤ g (x) ≤ f (x). Then Lemma 1 can be used to recover the values of p1 , . . . , pm from g . Suppose, that we have already recovered p1 , . . . , pj and that we now wish to recover pj +1 . Note that, since we have recovered p1 , . . . , pj we can compute the exact value of f (xpj ). We then evaluate g (xpj + 1), g (xpj + 2), and so on until encountering a value k such that g (xpj + k ) > f (xpj ) + xpj + k ? xpj (1 ? 2ε)j
102
Prosenjit Bose, Luc Devroye, and Pat Morin
Lemma 1 then guarantees that pj +1 = pj + k ? 1. In this way, we can reconstruct the entire function f and recover the values of p1 , . . . , pm . Although in the above discussion the slopes used in the construction of f are not always integral it is clear that carefully rounding values appropriately would n yield the same results using only integer valued slopes. Since we can encode m 2 n di?erent choices of p1 , . . . , pm in this manner and log m = ? ((1/ε) log n), we conclude the following: Theorem 4 Any data structure that can represent an ε-approximation to any convex function whose slopes are integers in the range [0, n] must use ? ((1/ε) log2 n) bits of storage in the worst case. Remark 1 Some readers may complain that the function used in our lower bound construction uses linear pieces whose lengths are exponential in n. However, one should take into account that the endpoints of these pieces have xcoordinates that are integral powers of 2 and they can therefore be encoded in O(log n) bits each using, e.g., a ?oating point representation. Remark 2 Another easy consequence of Lemma 1 is that any piecewise linear function that ε-approximates f has ? ((1/ε) log n) pieces.
5
Applications
Next, we consider applications of our approximation technique for convex functions to the problem of approximating sums of distances in d dimensions. Let S be a set of n points in d dimensions. The Fermat-Weber weight of a point q ∈ Rd is FW(p) = pq ,
p∈S
where pq denotes the distance between points p and q . Of course, di?erent definitions of distance (e.g., Euclidean distance, Manhattan distance) yield di?erent Fermat-Weber weights. 5.1 The 1-Dimensional Case
One setting in which distance is certainly well de?ned is in one dimension. In this case, pq = |p ? q | , so the Fermat-Weber weight of x is given by FW(x) = f (x) =
y ∈S
|x ? y | .
Note that the function f is convex (it is the sum of n convex functions) and has slopes bounded below by ?n and above by n, so it can be approximated using
Succinct Data Structures for Approximating Convex Functions
103
the techniques Section 3. Furthermore, adding or removing a point p to/from S decreases the slope of f by 1 in the range (?∞, p) and increases the slope of f by 1 in the range (p, ∞), so the dynamic data structure of the previous section can be used to maintain an ε-approximation of f in O(logE n) = O((1/ε) log n) time per update. Given the set S , constructing the ε-approximation for f can be done in O(n/ε) time by a fairly straightforward algorithm. Using a linear-time selection algorithm, one ?nds the elements of S with ranks εn/2 and (1 ? ε/2)n . These are the values a and b in (1). Once this is done, the remaining problem has size (1 ? ε)n and is solved recursively. Although some care is required to compute the values f (a) and f (b) at each stage, the deatails are not di?cult and are left to the interested reader. Remark 3 A general result of Agarwal and Har-Peled [1] implies that the Fermat-Weber weight of points in one dimension can actually be ε-approximated by a piecewise-linear function with O(1/ε) pieces. However, it is not clear how easily this approach can be made dynamic to handle insertion and deletions of points. 5.2 The Manhattan Case
The Manhattan distance between two points p and q in Rd is
d
pq
1
=
i=1
|pi ? qi | ,
where pi denotes the ith coordinate of point p. We simply observe that Manhattan distance is the sum of d one-dimensional distances, so the Fermat-Weber weight under the Manhattan distance can be approximated using d onedimensional data structures. 5.3 The Euclidean Case
The Euclidean distance between two points p and q in Rd is
d 1/2
pq
2
=
i=1
(pi ? qi )2
.
A general technique used to approximate Euclidean distance is to use a polyhedral distance function, in which the unit sphere is replaced with a polyhedron that closely resembles a sphere. For example, the Manhattan distance function is a polyhedral distance function in which the unit sphere is replaced with a unit hypercube. Although this technique works well when d is small, such metrics generally require a polyhedron with a number of vertices that is exponential in d, making them poorly suited for high dimensional applications.
104
Prosenjit Bose, Luc Devroye, and Pat Morin
Another technique, that works well when d is very large (greater than log n), and for most distance functions, is that of random projections [6]. Here, a random O(log n)-?at is chosen and the points of S are projected orthogonally onto this ?at. With high probability, all interpoint distances are faithfully preserved after the projection, so the problem is reduced to one in which the dimension of the point set is O(log n). The di?culty with this approach, when using Euclidean distances, is that sums of Euclidean distances are di?cult to deal with even when d = 2 [2], thus the reduction in dimension does not help signi?cantly. Here we present a third approach that combines both of these techniques and adds two new twists: (1) we obtain a polyhedral metric as the sum of several Manhattan metrics and (2) our polyhedron is random. The ?rst twist allows us to apply approximate data structures for one-dimensional convex functions while the second allows us to achieve approximation guarantees using an a number of vertices that increases only linearly with d. Let f (p) denote the Fermat-Weber weight of p under the Euclidean distance function. Choose k independent random orientations of the coordinate axes. Let fi (p) denote the Fermat-Weber weight of p under the Manhattan distance function after rotating the axes according to the √ ith random orientation. Then fi (p) may take on any value in the range [f (p), df (p)]. In particular, fi (p) has an expected value E[fi (p)] = cd f (p) , where cd is a constant, dependent only on d, whose value is derived in Appendix A. Consider the function k 1 g (p) = × fi (p) kcd i=1 that approximates the Fermat-Weber weight under Euclidean distance. Lemma 2 Pr{|g (p) ? f (p)| ≥ εf (p)} = exp(?? (ε2 k )) Proof. The value of g (p) is a random variable whose expected value is f (p) and it is the √sum of k independent random variables, all of which are in the range [f (p), df (p)]. Applying Hoe?ding’s inequality [4] immediately yields the desired result. In summary, g (p) is an ε-approximation of f (p) with probability 1 ? e?? (ε k) . Furthermore, g (p) is the sum of k Fermat-Weber weights of points under the Manhattan distance function. Each of these Manhattan distance functions is itself a sum of d Fermat-Weber weights in 1 dimension. These 1-dimensional Fermat-Weber weights can be approximated using the results of Section 3 or the results of Agarwal and Har-Peled [1]. 5.4 Clustering and Facility Location
2
Bose et al [3] describe data structures for approximating sums of distances. They show how to build a data structure in O(n log n) time that can (1 ? ε)approximate the Fermat-Weber weight of any point in O(log n) time. However, the constants in their algorithms depend exponentially on the dimension d.
Succinct Data Structures for Approximating Convex Functions
105
The same authors also give applications of this data structure to a number of facility-location and clustering problems, including evaluation of the Medoid and AverageDistance clustering measures, the Fermat-Weber problem, the constrained Fermat-Weber problem, and the constrained obnoxious facility-location problem. All of these applications also work with the data structure of Section 3, many with improved running times. A summary of these results is given in Table 1.
Table 1. Applications of the data structure for evaluating the Fermat-Weber weights of points under the Euclidean distance function. Problem Exact solution Average distance O(n2 ) Medoid (1-Median) O(n2 ) Discrete Fermat-Weber O(n2 ) Fermat-Weber – Constrained Fermat-Weber O(n2 ) Constrained OFL O(n2 )
a b
Previous ε-approx. O(n)a O(n log n) O(n)a O(n log n) O(n)a O(n log n) O(n)b O(n log n) O(n)b O(n log n) O(n)a O(n log n)
Ref. [5,3] [5,3] [5,3] [5,3] [5,3] [5,7,3]
New ε-approx. O(n log log n) O(n log log n) O(n log log n) O(n) O(n) O(n log log n)
Refers to a randomized algorithm that outputs a (1 ? )-approximation with constant probability. Refers to a randomized algorithm that outputs a (1 ? )-approximation with high probability, i.e., with probability 1 ? n?c , for some c > 0.
6
Summary and Conclusions
We have given static and dynamic data structures for approximating convex functions of one variable whose slopes are bounded. These data structures have applications to problems involving sums of distances in d dimensions under both the Manhattan and Euclidean distance functions. In developing these applications we have arrived at a technique of independent interest, namely that of approximating Euclidean distance as the sum of several Manhattan distances under several di?erent orientations of the coordinate system.
References
1. P. K. Agarwal and S. Har-Peled. Maintaining the approximate extent measures of moving points. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 148–157, 2001. 2. C. Bajaj. The algebraic degree of geometric optimization problems. Discrete & Computational Geometry, 3:177–191, 1988. 3. P. Bose, A. Maheshwari, and P. Morin. Fast approximations for sums of distances, clustering and the Fermat-Weber problem. Computational Geometry: Theory and Apllications, 24:135–146, 2002. 4. W. Hoe?ding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
106
Prosenjit Bose, Luc Devroye, and Pat Morin
5. P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st ACM Symposium on Theory of Computing (STOC’99), 1999. 6. P. Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 10–33, 2001. 7. J. Kleinberg. Two algorithms for nearest neighbour search in high dimensions. In Proceedings of the 29th ACM Symposium on Theory of Computing (STOC’97), pages 599–608, 1997. 8. D. S. Mitrinovic. Analytic Inequalities. Springer, New York, 1970.
A
The Value of cd
The value of cd is given by
d
cd = E
i=1
|Xi |
,
where (X1 , . . . , Xd ) is a point taken from the uniform distribution on the surface 2 2 of the unit ball in Rd . We observe that (X1 , . . . , Xd ) is distributed as
2 N1 N2 ,..., d 2 N N2
,
where N 2 =
d i=1
Ni2 and (N1 , . . . , Nd ) are i.i.d. normal(0, 1). Clearly,
2 2 G( 1 N1 N1 L 2) = = d 1 1 2 2 N2 G( 2 ) + G( d? N1 + i=2 Ni 2 )
d?1 1 d?1 where G( 1 2 ), and G( 2 ) are independent gamma( 2 ) and gamma( 2 ) random 1 d?1 2 variables, respectively. Thus, N1 /N is distributed as a beta( 2 , 2 ) random d?1 variable, β ( 1 2 , 2 ). We have: d
E
i=1
|Xi | = d E
1
β
1
1 d?1 , 2 2
d?1
=d
0
x 2 ?1 (1 ? x) 2 d?1 B( 1 2, 2 )
?1
·
√
x dx
d?1 B( 1 2, 2 ) 2 = , d+1 B( 1 , 2 2 )
=d·
1 B (1, d? 2 )
where B (a, b) is the beta function.
Succinct Data Structures for Approximating Convex Functions
107
From Mitrinovic [8, p. 286], we note: 2 ≥ d+1 B( 1 , 2 2 ) =2 ≥ Furthermore,
d
1 1 d 1 + + · 2 4 16d + 32 Γ ( 1 2) d 1 1 1 + + ·√ 2 4 16d + 32 π 2d + 1 . π
(3) (4) (5)
E
i=1
|Xi | = d E[|X1 |] ≤√ ≤√ ≤ d+1 π·
d 2
+
3 4
+
1 16d+48
2(d + 1) √ π · 2d + 3 2(d + 1) . π
In summary, 2Γ ( d + 1) 2d + 1 ≤ cd = √ 2 d+1 ≤ π π · Γ( 2 ) 2(d + 1) . π