CLUSTER: Finding the k nearest neighbors
Andreas Lehnert, Francis Quimby, and Darrell Ashton
Notes
- The program requires an input file in CSV format with specific
fields (see Datafile Format below).
- Distances are computed with the haversine great-circle formula.
- The algorithm may not be the most efficient, but it performs adequately.
- This program is distributed as is with no warranty,
including any implied guarantee of usability or correctness.
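For reference, the haversine formula mentioned in the notes can be sketched as below. This is a minimal illustration, not the program's own source: the function name haversine, the argument order, and the Earth radius of 6371 km are assumptions made for the example, and the coordinates are taken to be in radians.

```python
import math

def haversine(lon1, lat1, lon2, lat2, radius=6371.0):
    """Great-circle distance between two points given in radians.

    The default radius (mean Earth radius in km) is an assumption for
    this sketch; the result is in the same unit as the radius.
    """
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine of the central angle between the two points.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for antipodal points.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

# Distance from the equator/prime meridian to the north pole:
# one quarter of a great circle.
print(haversine(0.0, 0.0, 0.0, math.pi / 2))
```

Passing radius=1.0 returns the central angle in radians, which is enough when only the ordering of neighbors matters.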
Usage:
cluster -ooutputfile datafile
Datafile Format:
Every line of the input file must be a CSV data record; the file must
contain no header or description lines. Fields are separated by commas
and records are terminated by newlines. The file should contain no
additional whitespace.
Fields:
- Integer longitude
- Integer latitude
- A non-empty value (e.g. a record ID)
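A file in this format could be read along the following lines. This is only a sketch: read_datafile is a hypothetical helper, not part of the program, and because the notes do not state the unit of the integer coordinates, the values are kept as plain integers here.

```python
import csv
import io

def read_datafile(lines):
    """Parse records of the form: longitude,latitude,label.

    Returns a list of (observation number, longitude, latitude, label),
    with observation numbers starting from zero.
    """
    records = []
    for obs, row in enumerate(csv.reader(lines)):
        # Exactly three fields are expected; anything else raises ValueError.
        lon_text, lat_text, label = row
        if not label:
            raise ValueError(f"line {obs}: third field must be non-empty")
        records.append((obs, int(lon_text), int(lat_text), label))
    return records

# Two sample records with made-up values.
sample = io.StringIO("10,20,a\n-30,40,b\n")
print(read_datafile(sample))
```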
Outputfile Format:
The first line is the name of the original file from which the
locations were read. All subsequent lines are data.
Fields:
- Observation number: the line number from the datafile (starting from zero)
- Longitude in radians
- Latitude in radians
- Field 3 from the datafile
- Observation number of the closest neighbor
- Observation number of the second-closest neighbor
- Observation number of the third-closest neighbor
- etc.
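The neighbor columns described above can be produced by a simple brute-force ranking, sketched below. The notes do not say which algorithm the program actually uses or how many neighbors it lists, so both the approach and the parameter k are assumptions; central_angle and nearest_neighbors are hypothetical names, and the sample points are made up.

```python
import math

def central_angle(lon1, lat1, lon2, lat2):
    """Haversine central angle (radians) between two points in radians."""
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(a)))

def nearest_neighbors(points, k):
    """For each point, list the observation numbers of its k closest others.

    points is a list of (longitude, latitude) pairs in radians. Every pair
    of points is compared, so the cost grows quadratically with the input.
    """
    result = []
    for i, (lon_i, lat_i) in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: central_angle(lon_i, lat_i, *points[j]))
        result.append(others[:k])
    return result

# Four made-up points on the equator.
points = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0), (0.12, 0.0)]
for obs, neighbors in enumerate(nearest_neighbors(points, 2)):
    # Observation number followed by neighbors, closest first.
    print(obs, *neighbors, sep=",")
```

Comparing every pair is the simplest approach and is workable for modest inputs; a spatial index would be the usual alternative for large files.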
Download
Andreas W. Lehnert <Andreas@marginalq.com>
Last update: 25-Nov-2003