kd 树 | Yuan Tian's Blog

本次作业需要写一个数据类型来代表在一个单元格中的一组点（所有点的 x 轴坐标和 y 轴坐标都在 0 到 1之间），用 2d 树实现高效的区间搜索（找出所有在查询矩形中的点）和最近邻搜索（找出一个距查询点最近的点）。
kdtree-ops

1 几何基元（Geometric primitives）

使用下面的平面上的点和边与轴平行的矩形的几何基元
RectHV

2 数据结构

使用 algs4.jar 中的不可变类型 Point2D 表示平面上的点，下面是可能会用到的一部分 API：

public class Point2D implements Comparable<Point2D> {
   public Point2D(double x, double y)              // construct the point (x, y)
   public  double x()                              // x-coordinate 
   public  double y()                              // y-coordinate 
   public  double distanceTo(Point2D that)         // Euclidean distance between two points 
   public  double distanceSquaredTo(Point2D that)  // square of Euclidean distance between two points 
   public     int compareTo(Point2D that)          // for use in an ordered symbol table 
   public boolean equals(Object that)              // does this point equal that object? 
   public    void draw()                           // draw to standard draw 
   public  String toString()                       // string representation 
}

使用 algs4.jar 中的不可变类型 RectHV 表示边与轴平行的矩形，下面是可能会用到的一部分 API：

public class RectHV {
   public    RectHV(double xmin, double ymin,      // construct the rectangle [xmin, xmax] x [ymin, ymax] 
                    double xmax, double ymax)      // throw a java.lang.IllegalArgumentException if (xmin > xmax) or (ymin > ymax)
   public  double xmin()                           // minimum x-coordinate of rectangle 
   public  double ymin()                           // minimum y-coordinate of rectangle 
   public  double xmax()                           // maximum x-coordinate of rectangle 
   public  double ymax()                           // maximum y-coordinate of rectangle 
   public boolean contains(Point2D p)              // does this rectangle contain the point p (either inside or on boundary)? 
   public boolean intersects(RectHV that)          // does this rectangle intersect that rectangle (at one or more points)? 
   public  double distanceTo(Point2D p)            // Euclidean distance from point p to closest point in rectangle 
   public  double distanceSquaredTo(Point2D p)     // square of Euclidean distance from point p to closest point in rectangle 
   public boolean equals(Object that)              // does this rectangle equal that object? 
   public    void draw()                           // draw to standard draw 
   public  String toString()                       // string representation 
}

不要修改这些数据类型

3 暴力算法实现（Brute-force implementation）

写一个可变数据类型 PointSET.java 来表示一个单元格中的一组点，通过使用红黑二叉搜索树实现以下 API（还需要用到 algs4.jar 或 java.util.TreeSet 中的 SET）

public class PointSET {
   public         PointSET()                               // construct an empty set of points 
   public           boolean isEmpty()                      // is the set empty? 
   public               int size()                         // number of points in the set 
   public              void insert(Point2D p)              // add the point to the set (if it is not already in the set)
   public           boolean contains(Point2D p)            // does the set contain point p? 
   public              void draw()                         // draw all points to standard draw 
   public Iterable<Point2D> range(RectHV rect)             // all points that are inside the rectangle 
   public           Point2D nearest(Point2D p)             // a nearest neighbor in the set to point p; null if the set is empty 

   public static void main(String[] args)                  // unit testing of the methods (optional) 
}

3.1 极端案例（Corner cases）

如果有参数为 null，抛出异常 java.lang.NullPointerException

3.2 性能要求（Performance requirements）

insert() 和 contains() 的时间复杂度与最坏情况下集合中点的数目的对数成正比；nearest() 和 range() 的时间复杂度与集合中点的数目成正比

4 2d 树实现

写一个可变数据类型 KdTree.java 来表示一个单元格中的一组点，通过使用 2d 树实现相同的 API（用 KdTree 替换 PointSET）。一个 2d 树是一个一般化的二叉搜索树，键是二维的。思路是构建一棵结点存放点的二叉搜索树，用点的 x 轴坐标和 y 轴坐标交替作为键

搜索和插入
搜索和插入的算法和二叉搜索树类似，但是在根节点用 x 轴坐标作为键（如果插入点的 x 轴坐标小于根节点的，去左子树，否则去右子树）；然后在下一层，用 y 轴坐标作为键（如果插入点的 y 轴坐标小于根节点的，去左子树，否则去右子树）；然后在下一层用 x 轴坐标作为键，一直循环下去（如下图所示）
绘图
一个 2d 树可以将空间简单的切分：所有根结点左边的点在左子树；右边的点在右子树；然后如此递归。draw() 方法需要在标准绘图中用黑色画出所有点，用红色竖向切分，用蓝色横向切分。这个方法没什么效率——它主要用来 debugging

2d 树对于二叉搜索树的主要优势是可以高效的实现区间搜索和最近邻搜索。根结点代表整个平面，它的左右子结点代表用 x 轴坐标切分的两个矩形，以此类推。
区间搜索和最近邻搜索见上一篇文章，这里不再赘述，题干上是这么写的：

Range search. To find all points contained in a given query rectangle, start at the root and recursively search for points in both subtrees using the following pruning rule: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). A subtree is searched only if it might contain a point contained in the query rectangle.
Nearest neighbor search. To find a closest point to a given query point, start at the root and recursively search in both subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, a node is searched only if it might contain a point that is closer than the best one found so far. The effectiveness of the pruning rule depends on quickly finding a nearby point. To do this, organize your recursive method so that when there are two possible subtrees to go down, you always choose the subtree that is on the same side of the splitting line as the query point as the first subtree to explore—the closest point found while exploring the first subtree may enable pruning of the second subtree.

5 Clients

用以下的交互客户端程序测试和调试程序

KdTreeVisualizer.java
计算和绘制用户鼠标点击产生的一系列点产生的 2d 树
RangeSearchVisualizer.java
从一个文件（用命令行参数指定）中读取一系列点，并这些点插入 2d 树。然后，执行区间搜索，其中边与轴平行的矩形由用户拖动而成
NearestNeighborVisualizer.java
从一个文件（用命令行参数指定）中读取一系列点，并这些点插入 2d 树。然后，执行最近邻搜索，其中边与轴平行的矩形由用户点击而成