cccg09_bichromatic

Upload: sagar

Post on 08-Apr-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/7/2019 cccg09_bichromatic

    1/4

    CCCG 2009, Vancouver, BC, August 1719, 2009

    The Bichromatic Rectangle Problem in High Dimensions

    Jonathan Backer J. Mark Keil

    Abstract

    Given a set of blue points and a set of red points in d-dimensional space, we show how to find an axis-alignedhyperrectangle that contains no red points and as manyblue points as possible. Our algorithm enumerates theset ofrelevant hyperrectangles (inclusion maximal axis-aligned hyperrectangles that do not contain a red point)and counts the number of blue points in each one. Theruntime of our algorithm depends on the total numberof relevant hyperrectangles. We prove asymptotically

    tight bounds on this quantity in the worst case. Thetechniques developed directly apply to the maximumempty rectangle problem in high dimensions.

    1 Introduction

    We show how to solve instances of the bichromatic shapeproblem: find a figure of a certain shape that con-tains no red points and as many blue points as possi-ble. Specifically, we present an efficient solution to thebichromatic axis-aligned hyperrectangle problem. Therestriction to axis-aligned shapes is common in the liter-

    ature. As is customary, we assume that hyperrectanglesare axis-aligned, unless otherwise noted, to save space.Two other common conventions that we follow are theuse ofd to denote dimension and n to denote the totalnumber of red and blue points.

    Recently, Aronov and Har-Peled posed the bichro-matic problem for a variety of shapes [2], includingsquares and rectangles. In their paper, they present an(1+)-approximation algorithm for the bichromatic ballproblem that runs in O(nd/2(2 logn)d/2+1) time,for dimensions d 3.

    Eckstein et al. explore the bichromatic problem for

    hyperrectangles in high-dimensions, a problem moti-vated by data-analysis [7]. They show that the bichro-matic hyperrectangle problem is NP-hard, for arbi-trarily high dimensions. They also present a simpleO(n2d+1) time algorithm, for any fixed dimension d. Asthis runtime grows exponentially in d, they develop aheuristic algorithm suitable for high dimensions. In asubsequent paper, Liu and Nediak propose a O(n2 logn)

    Department of Computer Science, University of

    Saskatchewan, [email protected] of Computer Science, University of

    Saskatchewan, [email protected]

    time and O(n) space algorithm for the two-dimensionalbichromatic rectangle problem [10].

    The bichromatic shape problem is a natural variantof the maximum empty shape problem: find a figure ofa certain shape that is contained in a given boundingbox, contains no red points, and has as large a vol-ume as possible. The techniques used to solve thesetwo different problems are quite similar. The enumera-tion approach used in this paper was previously appliedto the maximum empty rectangle problem in two andthree dimensions [11, 13, 12, 5]. In a separate paper [3],

    we solve the two-dimensional bichromatic square andrectangle problems in O(n logn) and O(n log3 n) spacerespectively. Our approach uses techniques previouslyapplied to the maximum empty square and rectangleproblems [4, 1]. All of the techniques just mentionedrequire at least linear storage, which makes them un-suitable for large data sets. Edmonds et al. describea heuristic solution to the maximum empty rectangleproblem that typically requires much less space [8].

    For completeness, we also mention the maximum dis-crepancy problem, which is similar to the bichromaticshape problem: find a figure of a certain shape that

    maximises the difference between the number of con-tained blue points and number of contained red points.Dobkin et al. solve the maximum discrepancy problemfor rectangles in O(n2 logn) time [6]. In their paper,they relate various discrepancy problems to problemsin machine learning and computer graphics.

    2 Preliminaries

    By hyperrectangle, we mean a Cartesian product of dclosed intervals

    di=1[ai, bi]. Let Ebe any hyperrectan-

    gle enclosingall of the red and blue points. We say thata hyperrectangle is feasible, if it is contained in E and

    no red point lies in its interior. To solve the bichromatichyperrectangle problem, we only need to consider feasi-ble hyperrectangles. By relevant hyperrectangle (RHR),we mean a feasible hyperrectangle that is not properlycontained in any feasible hyperrectangle. Clearly, someRHR solves the bichromatic hyperrectangle problem.

    A side of a RHR is a region of its boundary mostextreme in some co-ordinate axis (i.e. boundary pointssuch that the ith co-ordinate equals ai or bi). Clearly,there are 2d sides to an RHR. We say that a side issupported, if its interior touches a red point or E. Aproperty of RHRs that we repeatedly exploit is that

  • 8/7/2019 cccg09_bichromatic

    2/4

    21st Canadian Conference on Computational Geometry, 2009

    each side of a RHR is supported: if some side S is un-supported, we can push S out while keeping the RHRfeasible, contradicting that the RHR is maximal.

    In Section 3, we bound the number of RHRs inthe worst case. This complements a prior bound ofO(n logd1 n) RHRs on average [5], under modest as-

    sumptions on the input (essentially that the red pointsare chosen independently).

    In Section 4, we show how to enumerate all RHRsand count the number of blue points contained in eachone. Such enumeration can also be used to solve themaximum empty rectangle problem. In three dimen-sions, our enumeration is asymptotically slower by alogn factor than the algorithm by Datta and Soundar-alakshmi [5], in the worst case. However, our enumera-tion is much simpler, it has the same asymptotic averagecase run time, and it generalises to higher dimensions.

    3 Number of Relevant Rectangles

    We now prove the following result conjectured in [5].

    Theorem 1 For any fixed dimension d, there are(nd) RHRs, in the worst case.

    3.1 Worst-case Lower Bound

    In this section, we arrange n red points so that thereare at least (n/d)d RHRs that contain the origin. Thisresult is a generalisation of a 3D construction [5]. Asan enclosing hyperrectangle E, we take the width twohypercube centred at the origin.

    Let x+i be the unit vector in the positive directionof the ith co-ordinate axis, for 1 i d. Define xianalogously. We first partition these 2d vectors into dpairs so that ifx+i is paired with either x

    +j or x

    j , then

    xi is paired with neither x+j nor x

    j . Figure 1 illustrates

    the odd and even cases of one pairing scheme.

    x1 x2 x3 x4 x5 x6x1 x2 x3 x4 x5

    +

    Figure 1: Partitions for d = 5 and d = 6.

    For each pair of vectors and contained in a par-tition, we evenly place n/d red points along the linesegment between and . In the projection onto theplane P defined by and , most of the red points areplaced at the origin, n/d points are placed on the axis, and n/d points are placed on the axis (seeFigure 2).

    Let H be any RHR in Rd that contains the origin.To visualise H, consider its projection H into P (see

    Figure 2). Let T denote the side ofH furthest in direc-tion . The top side T ofH is the projection ofT intoP. If we were to push T in or out, T would slide upor down. Moreover, H would remain feasible, as longas T did pass through a red point. Finally, T would besupported if and only if T were supported (assuming

    H still contained the origin). Similar observations holdwith respect to side R ofH furthest in direction andthe right side ofH. Figure 2 illustrates the projectionof n/d + 1 feasible, supported placements of T andR. These placements do not affect the placement of anyof the other sides of H. Therefore, there are at least(n/d + 1)d RHRs containing the origin.

    H

    Figure 2: Potential upper right corners of H form astaircase.

    3.2 Worst-case Upper Bound

    In this section, we prove that there are O(nd) RHRs byinduction on d. The base case where d = 1 is trivial.

    First, we bound the number of RHRs that are sup-ported by E on at least one side. Let H be a RHR,and let the top and bottom of H refer to its extremesides in the dth dimension. Suppose that the top of Htouches E (the other cases are similar). Let R be theregion between the two hyperplanes coinciding with thetop and bottom ofH. To apply our inductive hypoth-esis, we project everything contained in R into d 1dimensions (drop the dth co-ordinate). Note that theprojection ofH is a (d 1)-dimensional RHR. Hence,there are O(nd1) RHRs supported by E on the top,for each of the O(n) possible bottoms.

    Second, we bound the number of RHRs that are sup-ported by red points on each side. We do this by (a)mapping each such RHR to one of O(nd) kernels and(b) arguing that at most 4d RHRs map to a single ker-nel. Our argument is simplified by assuming that theco-ordinates of the red points in each axis are unique. Itis possible to symbolically perturb the set of red pointsso that this condition holds.

    To define our mapping, we choose an arbitrary orderof the 2d sides of a d-dimensional hyperrectangle (seeFigure 3). Given a RHR H, we visit each side ofH inorder. If a side contains a red point on its interior, we

  • 8/7/2019 cccg09_bichromatic

    3/4

    CCCG 2009, Vancouver, BC, August 1719, 2009

    1

    2

    3

    4H

    Figure 3: Deflation of a RHR in the given order thesides.

    push it in until a point occurs on its boundary. Whenthis process finishes, we are left with a kernelH, whereevery side touches a red point, but only on that sidesboundary. Let P be the set of red points touching H.Then |P| d because (a) each side ofH touches at mostone red point (by uniqueness of point co-ordinates) and(b) each point ofP touches at least two sides. Hence,there are O(nd) possible H because H is the smallesthyperrectangle containing P and |P| d.

    Given H, we can recover Hby visiting the sides ofH

    in reverse order and pushing each side out until its in-terior touches a red point. This requires knowing whichsides to inflate. There are 22d such possibilities.

    This counting argument was inspired by an argumentapplied to a related problem [9].

    4 Enumerating Relevant Hyperrectangles

    In this section we prove the following theorem.

    Theorem 2 For any fixed dimension d, we can solvebichromatic rectangle problem inO(k logd2 n) time andO(k+n logd2 n) space, where k is the number of RHRs.

    The core of our approach is an efficient enumerationof all RHRs. The details are simplified by assumingthat the co-ordinates of the red points in each dimen-sion are unique. This can be imposed with a symbolicperturbation of the point set.

    To find RHRs, we sweep Rd from to with ahyperplane P that is orthogonal to the dth axis. We re-fer to the sides ofP that extend towards and inthe dth axis as above and below respectively. Similarly,

    we refer to the sides of a hyperrectangle that are paral-lel to P and closest to and as top and bottomrespectively.

    Let E be the region of the bounding hyperrectan-gle E that lies below P. As we sweep, we ensure thatwe have discovered all of the RHRs with respect to E

    (i.e. the RHRs that result from taking E as the bound-ing hyperrectangle and restricting our attention to thered points inside ofE). To do this, we maintain a listL of all of the RHRs with respect to E that touch P.We represent each such RHR by how each of its sides issupported (i.e. by which red point or side of E). Oc-

    casionally, we must add to and delete from L. All suchevents occur when P passes through a red point.

    Let r1, r2, . . . denote the red points sorted in increas-ing order of dth co-ordinate. We now describe how toupdate L as P passes through ri. Let P

    + and P besweep planes lying just above and below ri respectively

    (see Figure 4). Let L

    +

    and L

    be the lists associatedwith P+ and P respectively. The RHRs common toboth L and L+ are not supported by ri on any bound-ary.

    Let Di be the set of RHRs in L such that ri liesdirectly above the top of each one. By our assumptionof co-ordinate uniqueness, ri supports the top of eachRHR in Di when the sweep plane reaches ri. So Diis the set of RHRs deleted from L+. To compute Di,we perform an orthogonal range query whenever a newRHR H is discovered. Specifically, we add H to Dj ,where rj is the red point with the lowest index that liesdirectly above the bottom ofH. If there is no such rj ,

    we add H to a special set D. We can locate rj inO(logd1 n) time by using a range tree with fractionalcascading. Such a tree requires O(n logd1 n) time andspace to construct.

    Let Ai be the set of RHRs in L+ that are supportedon some side by ri. Clearly, Ai is the set of RHRsadded to L+. Let Ha be a RHR ofAi. Then ri doesnot support the top ofHa, which is supported by P+. Ifri supports the bottom ofHa, then Ha is the region ofE above ri and below P

    +. Otherwise, Ha correspondsto some RHR of Di as follows (see Figure 4): Let H

    be the result of pushing the top of Ha down so that it

    coincides with P

    . When the sweep plane is at P

    ,the top ofH is supported by the plane, but one otherside is not. There is at most one such unsupported sidebecause the co-ordinates of the red points are unique.Push this one side out until it hits an obstacle. Thisresults in a RHR Hd that belongs to Di.

    Ha

    Hd

    ri

    P+

    P

    H

    Figure 4: An added hyperrectangle Ha corresponds toa deleted hyperrectangle Hd (illustrated in 2D).

    This correspondence between RHRs ofAi and RHRsofDi suggests a procedure for generating Ai from Di.Let Hd be a RHR ofDi. Let Sbe one of the (d1) axis-aligned hyperplanes through ri that is perpendicular toP. For each such S, split Hd along S, which resultsin two pieces. For each piece, check if it is properlysupported on all all but two sides: the side coincident

  • 8/7/2019 cccg09_bichromatic

    4/4

    21st Canadian Conference on Computational Geometry, 2009

    with S (which will be supported by ri) and the sidecoincident with P (which will be supported by P+).If a piece is properly supported add it to Ai. Otherwise,discard it. This process can be executed in O(|Di|) time,for any fixed d.

    The set of all RHRs are

    iDi. We can count the

    number of blue points contained in each one using arange tree with fractional cascading. Note that we donot need to compute L explicitly (it can be constructedfrom the Di). We only used it to simplify the descrip-tion of our algorithm. Let k be the number of RHRs.Clearly, n O(k) because each red point supports thebottom of at least one RHR. It is straightforward to ver-ify that this sweep-plane algorithm takes O(k logd1 n)time and O(k + n logd1 n) space.

    This sweep plane algorithm is similar to an algorithmfor finding maximal negative orthants [9].

    4.1 Saving a Logarithmic Factor

    A simple observation allows us to shave a dimension offof our red-point range tree: When we query the red-point range tree for a point lying above the bottom ofa RHR, we know that the desired point rj lies abovethe sweep plane. Hence, the height of the bottom isirrelevant, if we remove red points from the range treeas the sweep plane passes over them.

    Likewise, a simple observation allows us to shave adimension off of our blue-point range tree: The numberof points in a RHR is the number of blue points directlyabove the bottom minus the number of blue points di-rectly above the top. This allows us to perform a sep-arate sweep to count the number of blue points in eachRHR. Similarly, we remove points from the blue-pointrange tree as the sweep plane passes over them.

    With these slight modifications, our algorithm takesO(k logd2 n) time and O(k + n logd2 n) space.

    5 Open Problems

    In 2D, the bichromatic rectangle and maximum emptyrectangle problems can be solved without examining allof the RHRs. In the worst case, these algorithms per-form much better asymptotically than the enumeration

    algorithms. At the core of these non-enumeration algo-rithms is a fundamental observation that does not seemto generalise to higher dimensions. Still, the gap be-tween worst case run times in 2D suggests that we cando better than enumeration in three or higher dimen-sions. This remains an outstanding open problem.

    References

    [1] A. Aggarwal and S. Suri. Fast algorithms for com-puting the largest empty rectangle. In Proceedingsof the third annual Symposium on Computational

    Geometry, pages 278290. ACM New York, NY,USA, 1987.

    [2] B. Aronov and S. Har-Peled. On Approximatingthe Depth and Related Problems. SIAM Journalon Computing, 38(3):899921, 2008.

    [3] J. Backer and J. Keil. The bichromatic square andrectangle problems. Technical Report 2009-01, Uni-versity of Saskatchewan, 2009.

    [4] B. Chazelle, R. Drysdale, and D. Lee. Comput-ing the largest empty rectangle. SIAM Journal onComputing, 15:300, 1986.

    [5] A. Datta and S. Soundaralakshmi. An efficient al-gorithm for computing the maximum empty rect-angle in three dimensions. Information Sciences,128(1-2):4365, 2000.

    [6] D. Dobkin, D. Gunopulos, and W. Maass. Comput-ing the maximum bichromatic discrepancy, withapplications to computer graphics and machinelearning. Journal of Computer and System Sci-ences, 52(3):453470, 1996.

    [7] J. Eckstein, P. Hammer, Y. Liu, M. Nediak, andB. Simeone. The maximum box problem and itsapplication to data analysis. Computational Opti-mization and Applications, 23(3):285298, 2002.

    [8] J. Edmonds, J. Gryz, D. Liang, and R. Miller. Min-ing for empty spaces in large data sets. TheoreticalComputer Science, 296(3):435452, 2003.

    [9] H. Kaplan, N. Rubin, M. Sharir, and E. Verbin.Counting colors in boxes. In Proceedings of theeighteenth annual ACM-SIAM Symposium on Dis-

    crete algorithms, pages 785794. Society for Indus-trial and Applied Mathematics Philadelphia, PA,USA, 2007.

    [10] Y. Liu and M. Nediak. Planar case of the maxi-mum box and related problems. In Proceeding ofthe Canadian Conference on Computational Geom-

    etry, pages 1113, 2003.

    [11] A. Naamad, D. Lee, and W. Hsu. Maximum emptyrectangle problem. Discrete Appl. Math., 8(3):267277, 1984.

    [12] S. Nandy and B. Bhattacharya. Maximal emptycuboids among points and blocks. Computers andMathematics with Applications, 36(3):1120, 1998.

    [13] M. Orlowski. A new algorithm for the largest emptyrectangle problem. Algorithmica, 5(1):6573, 1990.