
<html>

  <head>
    <title>
      POINT_MERGE - Count or Index Unique or Tolerably Unique Points
    </title>
  </head>

  <body bgcolor="#EEEEEE" link="#CC0000" alink="#FF3300" vlink="#000055">

    <h1 align = "center">
      POINT_MERGE <br> Count or Index Unique or Tolerably Unique Points
    </h1>

    <hr>

    <p>
      <b>POINT_MERGE</b>
      is a FORTRAN90 library which
      counts or indexes the unique or
      "tolerably unique" points in a collection of N points in
      M-dimensional space.
    </p>

    <p>
      This problem is distinct from, though similar to, problems such
      as finding the nearest neighbor, or counting all the points that
      lie within a given distance of each point, or finding the optimal
      assignment of N points into K clusters (the K-Means problem).
    </p>

    <p>
      The "tolerably unique" problem can be pictured as the "Starbucks problem":
      choose a list of Starbucks cafes to shut down, so that
      no Starbucks cafe is left across the street from another one.
      The cafes that remain open are "tolerably unique", that
      is, no open cafe has another open cafe within the given tolerance.
    </p>

    <p>
      Given sets of data with some points very close to each other,
      there are a number of ways of resolving the data.  Here, a simple
      greedy approach is taken: we start with one tolerably unique point,
      and consider the remaining points one at a time, accepting the next
      point only if it is not closer than the tolerance to any already
      accepted point.
    </p>
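
    <p>
      The acceptance rule described above can be sketched in a few lines.
      This is a minimal Python illustration, not the library's FORTRAN90 code.
      The function name <b>tol_unique_greedy</b> and the sample data are
      hypothetical, and the choice to reject a point whose distance is
      exactly equal to the tolerance is an assumption.
    </p>

```python
import math


def tol_unique_greedy(points, tol):
    """Return the indices of the points accepted by the greedy rule.

    Hypothetical helper for illustration only.  A point is rejected when
    its distance to an accepted point is at most tol (assumed convention).
    """
    accepted = []  # indices of tolerably unique points, in input order
    for i, p in enumerate(points):
        # Accept p only if every accepted point is farther away than tol.
        if all(math.dist(p, points[j]) > tol for j in accepted):
            accepted.append(i)
    return accepted


# Five "cafes"; two pairs sit closer together than the tolerance.
cafes = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.02), (3.0, 0.0)]
kept = tol_unique_greedy(cafes, tol=0.1)
print(kept)  # [0, 2, 4]
```

    <p>
      Because points are considered in input order, a different ordering
      of the same data can keep a different set of points.
    </p>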

    <p>
      This is simpler than trying to maximize the number of points
      that can be kept while satisfying the tolerance, or trying
      to replace two nearby points by their average, for instance.
    </p>

    <p>
      For the unique case, in 1D, a simple and efficient procedure sorts
      the data, and then compares consecutive entries.
      For the unique case in multiple dimensions, the same procedure
      can still be used with a lexicographic sort, since identical points
      end up adjacent in the sorted list.
    </p>
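
    <p>
      As a rough illustration of the sort-and-compare idea for exact
      duplicates, the following Python sketch may help.  It is not taken
      from the library, and the name <b>unique_count_sorted</b> is
      hypothetical.  Points are compared for exact equality, so values that
      differ only by roundoff count as distinct.
    </p>

```python
def unique_count_sorted(points):
    """Count distinct points by sorting, then comparing consecutive entries.

    Hypothetical helper for illustration only.
    """
    # Tuples sort lexicographically, so identical points become adjacent.
    ordered = sorted(tuple(p) for p in points)
    count = 0
    previous = None
    for p in ordered:
        if p != previous:
            count += 1
            previous = p
    return count


data_2d = [(1.0, 2.0), (0.0, 5.0), (1.0, 2.0), (0.0, 5.0), (3.0, 3.0)]
data_1d = [(2.0,), (1.0,), (2.0,)]
print(unique_count_sorted(data_2d))  # 3
print(unique_count_sorted(data_1d))  # 2
```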

    <p>
      For the "tolerably unique" case in 1D, the same sorting procedure
      can be used, but in multiple dimensions, the usual kinds of lexicographic
      sorting will interleave near and far points in a way that is
      hard to deal with.
    </p>
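
    <p>
      For the 1D tolerance case, a sketch along the following lines is one
      possibility.  It is written in Python for illustration and is not the
      library's code.  The name <b>tol_unique_count_1d</b> is hypothetical,
      and a value is assumed to be rejected when it lies within the
      tolerance (inclusive) of the last value kept.
    </p>

```python
def tol_unique_count_1d(values, tol):
    """Count tolerably unique values in 1D using a sort.

    Hypothetical helper for illustration only.  After sorting, the nearest
    kept value to the left of x is always the most recently kept one, so a
    single comparison per value is enough.
    """
    count = 0
    last_kept = None
    for x in sorted(values):
        if last_kept is None or x - last_kept > tol:
            count += 1
            last_kept = x
    return count


samples = [1.0, 0.05, 0.0, 1.01, 0.12]
print(tol_unique_count_1d(samples, tol=0.1))  # 3
```

    <p>
      Here the values are considered in sorted order rather than input
      order, so the particular values kept may differ from those chosen by
      a pass over the unsorted data.
    </p>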

    <p>
      A reliable method for the tolerably unique case in multiple dimensions
      is simply to compute the distance between every pair of points.
      However, this is an O(N^2) computation, and becomes impractical
      when the number of points considered is in the tens of thousands or more.
    </p>

    <p>
      The "radial" approach, implemented in <b>POINT_RADIAL_TOL_UNIQUE_COUNT</b>,
      picks a random base point Z, computes the radial distance R(I) of each point
      P(I) to Z, and then sorts the data by R.  It then counts tolerably unique
      items by inspecting the R array in order.  By the triangle inequality,
      two points within TOL of each other have R values that differ by at most
      TOL, so two points are possible neighbors only if they lie within a TOL
      interval in R.  Assuming the points are in general position, the number
      of points that need to be compared will be small enough that the cost is
      dominated by the O(N log N) sort, rather than the O(N^2) pairwise comparison.
    </p>
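
    <p>
      The radial idea might be sketched as follows.  This is a Python
      illustration under stated assumptions, not a transcription of
      <b>POINT_RADIAL_TOL_UNIQUE_COUNT</b>.  The function name
      <b>radial_tol_unique_count</b> is hypothetical, the base point is
      assumed to be drawn uniformly from the bounding box of the data, and
      a point at distance exactly equal to the tolerance is assumed to be
      rejected.
    </p>

```python
import math
import random


def radial_tol_unique_count(points, tol, seed=0):
    """Count tolerably unique points using a sort on radial distance.

    Hypothetical helper for illustration only.
    """
    n = len(points)
    if n == 0:
        return 0
    rng = random.Random(seed)
    m = len(points[0])
    # Assumed choice of base point Z: uniform in the data's bounding box.
    z = tuple(
        rng.uniform(min(p[k] for p in points), max(p[k] for p in points))
        for k in range(m)
    )
    r = [math.dist(p, z) for p in points]
    order = sorted(range(n), key=r.__getitem__)  # indices sorted by R
    unique = [True] * n
    count = 0
    for a in range(n):
        i = order[a]
        if not unique[i]:
            continue
        count += 1
        # Only points whose R lies within tol of r[i] can be neighbors of i.
        for b in range(a + 1, n):
            j = order[b]
            if r[j] - r[i] > tol:
                break
            if unique[j] and tol >= math.dist(points[i], points[j]):
                unique[j] = False
    return count


cafes = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.02), (3.0, 0.0)]
print(radial_tol_unique_count(cafes, tol=0.1))  # 3
```

    <p>
      The inner loop stops as soon as the gap in R exceeds the tolerance,
      which is what keeps the number of distance computations small when
      the R values are well spread out.  Points are visited in order of R,
      so on data with chains of nearby points the count can differ from a
      pass made in input order.
    </p>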

    <p>
      In MATLAB, the <b>unique</b> command can select the unique points;
      there is also a user-written function called <b>consolidator</b>
      that can merge points with a tolerance.
    </p>

    <h3 align = "center">
      Licensing:
    </h3>

    <p>
      The computer code and data files described and made available on this web page
      are distributed under
      <a href = "../../txt/gnu_lgpl.txt">the GNU LGPL license.</a>
    </p>

    <h3 align = "center">
      Languages:
    </h3>

    <p>
      <b>POINT_MERGE</b> is available in
      <a href = "../../c_src/point_merge/point_merge.html">a C version</a>,
      <a href = "../../cpp_src/point_merge/point_merge.html">a C++ version</a>,
      <a href = "../../f77_src/point_merge/point_merge.html">a FORTRAN77 version</a>,
      <a href = "../../f_src/point_merge/point_merge.html">a FORTRAN90 version</a>, and
      <a href = "../../m_src/point_merge/point_merge.html">a MATLAB version</a>.
    </p>

    <h3 align = "center">
      Related Data and Programs:
    </h3>

    <p>
      <a href = "../../cpp_src/ann/ann.html">
      ANN</a>,
      a C++ library which
      computes Approximate Nearest Neighbors,
      by David Mount, Sunil Arya;
    </p>

    <p>
      <a href = "../../cpp_src/ann_test/ann_test.html">
      ANN_TEST</a>,
      a C++ program which
      uses ANN to approximate the nearest
      neighbors of a set of points stored in a file;
    </p>

    <p>
      <a href = "../../datasets/cities/cities.html">
      CITIES</a>,
      a dataset directory which
      contains sets of information about cities and the distances between them;
    </p>

    <p>
      <a href = "../../f_src/cities/cities.html">
      CITIES</a>,
      a FORTRAN90 library which
      handles various problems associated with a set of "cities" on a map.
    </p>

    <p>
      <a href = "../../f_src/kmeans/kmeans.html">
      KMEANS</a>,
      a FORTRAN90 library which
      contains several different algorithms for the K-Means problem.
    </p>

    <p>
      <a href = "../../f_src/spaeth/spaeth.html">
      SPAETH</a>,
      a FORTRAN90 library which
      can cluster data according to various principles.
    </p>

    <p>
      <a href = "../../f_src/spaeth2/spaeth2.html">
      SPAETH2</a>,
      a FORTRAN90 library which
      can cluster data according to various principles.
    </p>

    <p>
      <a href = "../../f_src/table_merge/table_merge.html">
      TABLE_MERGE</a>,
      a FORTRAN90 program which
      reads a file of N points in M dimensions, removes duplicates or points
      that are closer than some tolerance, and writes the reduced set of points
      to a file.
    </p>

    <h3 align = "center">
      Source Code:
    </h3>

    <p>
      <ul>
        <li>
          <a href = "point_merge.f90">point_merge.f90</a>, the source code.
        </li>
        <li>
          <a href = "point_merge.sh">point_merge.sh</a>,
          commands to compile the source code.
        </li>
      </ul>
    </p>

    <h3 align = "center">
      Examples and Tests:
    </h3>

    <p>
      <ul>
        <li>
          <a href = "point_merge_prb.f90">point_merge_prb.f90</a>,
          a sample calling program.
        </li>
        <li>
          <a href = "point_merge_prb.sh">point_merge_prb.sh</a>,
          commands to compile and run the sample program.
        </li>
        <li>
          <a href = "point_merge_prb_output.txt">point_merge_prb_output.txt</a>,
          the output file.
        </li>
      </ul>
    </p>

    <h3 align = "center">
      List of Routines:
    </h3>

    <p>
      <ul>
        <li>
          <b>I4_UNIFORM</b> returns a scaled pseudorandom I4.
        </li>
        <li>
          <b>POINT_UNIQUE_COUNT</b> counts the number of unique points.
        </li>
        <li>
          <b>POINT_RADIAL_UNIQUE_COUNT</b> counts the number of unique points.
        </li>
        <li>
          <b>POINT_RADIAL_TOL_UNIQUE_COUNT</b> counts the number of tolerably unique points.
        </li>
        <li>
          <b>POINT_RADIAL_TOL_UNIQUE_INDEX</b> indexes the tolerably unique points.
        </li>
        <li>
          <b>POINT_TOL_UNIQUE_COUNT</b> counts the number of tolerably unique points.
        </li>
        <li>
          <b>POINT_TOL_UNIQUE_INDEX</b> indexes the tolerably unique points.
        </li>
        <li>
          <b>R8COL_DUPLICATES</b> generates an R8COL with some duplicate columns.
        </li>
        <li>
          <b>R8COL_SORT_HEAP_INDEX_A</b> does an indexed heap ascending sort of an R8COL.
        </li>
        <li>
          <b>R8MAT_TRANSPOSE_PRINT</b> prints an R8MAT, transposed.
        </li>
        <li>
          <b>R8MAT_TRANSPOSE_PRINT_SOME</b> prints some of an R8MAT, transposed.
        </li>
        <li>
          <b>R8MAT_UNIFORM_01</b> returns a unit pseudorandom R8MAT.
        </li>
        <li>
          <b>R8VEC_COMPARE</b> compares two R8VEC's.
        </li>
        <li>
          <b>R8VEC_PRINT</b> prints an R8VEC.
        </li>
        <li>
          <b>R8VEC_SORT_HEAP_INDEX_A</b> does an indexed heap ascending sort of an R8VEC.
        </li>
        <li>
          <b>R8VEC_UNIFORM_01</b> returns a unit pseudorandom R8VEC.
        </li>
        <li>
          <b>TIMESTAMP</b> prints the current YMDHMS date as a time stamp.
        </li>
      </ul>
    </p>

    <p>
      You can go up one level to <a href = "../f_src.html">
      the FORTRAN90 source codes</a>.
    </p>

    <hr>

    <i>
      Last revised on 23 July 2010.
    </i>

    <!-- John Burkardt -->

  </body>

  <!-- Initial HTML skeleton created by HTMLINDEX. -->

</html>