| OCR Text |
Initially, the entire set of data points is approximated by a single least squares line. No parametrization is required for this, since an explicit line of the form y(x) = a x + b or x(y) = a y + b can be used. The initial set of data points is then subdivided and each subset is approximated by a least squares line; each of those subsets is in turn subdivided and approximated, and so on, until each segment approximates no more than (MinPts + 1)/2 data points. When a data set is subdivided, the subdivision point is included in both subsets. The approximating line is used to determine where to subdivide the data set. Each data set is subdivided at a point that is within MinPts/2 of the central point of the data set and is farther from the line than any of its MinPts/2 neighbors; if no such point is found, subdivision is at the central point. This serves two purposes: it biases the method toward subdividing at data points which may be corners, which will be important later, and it ensures that data sets are subdivided into subsets of approximately equal size, which is useful in the early stages, when the linear approximation reflects the shape of the data so poorly that subdivision points cannot be picked very reliably. Once the subdivision is complete, lines which approximate fewer than MinPts data points are recombined with one of the neighboring lines, whichever combination results in the smallest error. The over-subdividing and recombining also make corner detection more likely. The final step is to recombine all consecutive lines in the approximation that can be recombined without greatly increasing the error. One useful characteristic of this approximation is that apparent corners in the data can be easily detected in the linear approximation, since it tends to subdivide there, producing a corner in the first-order approximation. 
Also, the final recombination of similar lines tends to coalesce lines in regions without large fluctuations, resulting in one long approximating line. The heuristics take advantage of these characteristics. Low curvature is the easiest to detect, indicated by relatively long or nearly parallel approximating lines. Short approximating lines indicate regions of curvature; sharp angles |
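
The subdivision stage described above can be illustrated with a minimal Python sketch. It is not the original implementation, and several details are assumptions made here for illustration: the function names (`fit_line`, `distances`, `choose_split`, `subdivide`) are hypothetical, the explicit form y(x) or x(y) is chosen by comparing coordinate spreads, distance from the line is taken as perpendicular distance, "MinPts/2 neighbors" is read as MinPts/2 points on each side, and ties between candidate points are broken by largest distance. The two recombination steps are not covered.

```python
import numpy as np

def fit_line(pts):
    """Least squares explicit line y = a*x + b, or x = a*y + b if swapped."""
    x, y = pts[:, 0], pts[:, 1]
    # Assumption: regress on the coordinate with the larger spread.
    swapped = np.ptp(y) > np.ptp(x)
    if swapped:
        x, y = y, x
    a, b = np.polyfit(x, y, 1)
    return a, b, swapped

def distances(pts, line):
    """Perpendicular distance of each point from the fitted line (assumption)."""
    a, b, swapped = line
    x, y = (pts[:, 1], pts[:, 0]) if swapped else (pts[:, 0], pts[:, 1])
    return np.abs(a * x + b - y) / np.hypot(a, 1.0)

def choose_split(pts, min_pts):
    """Index at which to subdivide a data set of at least 3 points."""
    n = len(pts)
    half = max(1, min_pts // 2)
    c = n // 2                                   # central point
    d = distances(pts, fit_line(pts))
    best = None
    # Candidates lie within MinPts/2 of the centre and are interior points.
    for i in range(max(1, c - half), min(n - 2, c + half) + 1):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighbours = np.delete(d[lo:hi], i - lo)
        # Keep points farther from the line than all of their neighbours.
        if np.all(d[i] > neighbours) and (best is None or d[i] > d[best]):
            best = i
    return c if best is None else best           # fall back to the centre

def subdivide(pts, min_pts, start=0):
    """Return inclusive (first, last) index ranges of the final subsets."""
    n = len(pts)
    # Stop once a subset holds no more than (MinPts + 1)/2 points.
    if n < 3 or 2 * n <= min_pts + 1:
        return [(start, start + n - 1)]
    i = choose_split(pts, min_pts)
    # The subdivision point is included in both subsets.
    return (subdivide(pts[:i + 1], min_pts, start)
            + subdivide(pts[i:], min_pts, start + i))

# Example: an L-shaped data set whose corner is the point at index 10.
horizontal = [(float(x), 0.0) for x in range(11)]
vertical = [(10.0, float(y)) for y in range(1, 11)]
pts = np.array(horizontal + vertical)
print(choose_split(pts, min_pts=4))
print(subdivide(pts, min_pts=8))
```

In this sketch the first subdivision of the L-shaped example falls on the corner point, because that point lies farther from the single fitted line than its neighbors on either side. On nearly straight subsets no point stands out, so subdivision falls back to the central point.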