The Sign of the Marginal Q

CLUSTER: Finding the k nearest neighbors

Andreas Lehnert, Francis Quimby, and Darrell Ashton


Notes

  1. Program requires input file with specific fields in CSV format. (See below.)
  2. Uses Haversine great circle distance formula.
  3. The algorithm may not be the most efficient (but it does fine).
  4. This program is distributed as is with no warranty, including any implied guarantee of usability or correctness.
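
For readers unfamiliar with note 2, the following is a minimal Python sketch of the Haversine formula together with a brute-force neighbor search. It is not the program's own source. The function names, the Earth radius constant, and the brute-force strategy are assumptions made for illustration, and coordinates are taken to be in radians already.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not taken from the program


def haversine(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in radians."""
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def k_nearest(points, k):
    """Brute-force search: for each point, indices of its k closest other points.

    points is a list of (lon, lat) pairs in radians.
    """
    result = []
    for i, (lon_i, lat_i) in enumerate(points):
        others = [(haversine(lon_i, lat_i, lon_j, lat_j), j)
                  for j, (lon_j, lat_j) in enumerate(points) if j != i]
        others.sort()  # nearest first; ties broken by index
        result.append([j for _, j in others[:k]])
    return result
```

The brute-force search compares every pair of points, so its cost grows with the square of the number of observations. That is acceptable for modest files and is consistent with the caveat in note 3, though the program itself may work differently.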

Usage:

cluster -ooutputfile datafile

Datafile Format:

Every line of the input file must be a CSV data line; the file must not contain header or description lines. Fields are separated by commas and records are terminated by newlines. The file should contain no additional whitespace.

Fields:

  1. Integer longitude
  2. Integer latitude
  3. A non-empty variable (e.g. record id)
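
As an illustration of the format described above, here is a small Python sketch that reads such a file. The helper names parse_datafile_line and read_datafile are hypothetical, and the strictness shown (exactly three fields, no commas inside the third field) is an assumption rather than documented behavior of the program.

```python
def parse_datafile_line(line):
    """Split one datafile line into (longitude, latitude, label)."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    lon_text, lat_text, label = fields
    if not label:
        raise ValueError(f"third field must be non-empty: {line!r}")
    # int() rejects non-integer text such as "12.5"
    return int(lon_text), int(lat_text), label


def read_datafile(path):
    """Return parsed records; the list index is the zero-based observation number."""
    with open(path, encoding="utf-8") as handle:
        return [parse_datafile_line(line) for line in handle]
```

For example, a made-up line such as "-77,38,dc" would parse to the tuple (-77, 38, "dc").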

Outputfile Format:

The first line will be the name of the original file from which the locations were read. All subsequent lines are data.

Fields:

  1. Observation number: The line number from the datafile (starting from zero).
  2. Longitude in radians
  3. Latitude in radians
  4. Field 3 from datafile
  5. Observation number of closest neighbor.
  6. Observation number of second-closest neighbor.
  7. Observation number of third-closest neighbor.
  8. And so on, one field per additional neighbor.
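
To make the layout above concrete, the following Python sketch assembles output lines in that field order. The function name format_output is hypothetical, and the comma separator is an assumption, since the output delimiter is not stated here. Coordinates are taken to be in radians already.

```python
def format_output(source_name, records, neighbors, sep=","):
    """Build output lines: a first line naming the source file, then one per record.

    records   -- list of (lon_radians, lat_radians, label) in datafile order
    neighbors -- neighbors[i] lists observation numbers, closest first
    sep       -- assumed field separator; the format description does not name one
    """
    lines = [source_name]
    for obs, ((lon, lat, label), near) in enumerate(zip(records, neighbors)):
        fields = [str(obs), repr(lon), repr(lat), label] + [str(n) for n in near]
        lines.append(sep.join(fields))
    return lines
```

With two made-up records that are each other's nearest neighbor, the result is the source file name followed by two data lines, each ending in the observation number of the other record.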

Download


Andreas W. Lehnert <Andreas@marginalq.com>
Last update: 25-Nov-2003