Description |
Intrinsic dimension estimation is a fundamental problem in manifold learning. In applications, high-dimensional data frequently exhibit an underlying lower-dimensional structure that, if understood, would allow for faster and more complete analysis of the data. Understanding this underlying structure requires first determining its dimension. In this dissertation, a new intrinsic dimension estimator is proposed. This estimator does not estimate the intrinsic dimension of a set directly; rather, it estimates the dimension of the sampling measure. This approach acknowledges the fact that in applications the lower-dimensional structure is unknown; the only information about the set that is available to researchers is what is collected via some sampling method. Theoretical performance guarantees are proven, showing that the estimator performs well under only very mild restrictions. Finally, the results of several numerical experiments are provided as evidence that the estimator performs as well as or better than the estimators that have been proposed in the literature. Once the intrinsic dimension of a set has been determined, it remains to examine the geometry of the underlying lower-dimensional structure. This dissertation examines a new technique called Kernel Map Manifolds, proposed by Samuel Gerber, that does precisely this. The Kernel Map Manifolds algorithm uses the complementary ideas of principal surfaces and kernel regression to estimate the geometry of the underlying structure of sample data. This algorithm relies on a conjecture about the nature of the class of minimizers of a distance function. If this conjecture is true, then a gradient descent method can be employed to produce estimated coordinate maps for a principal surface of a distribution.
While this conjecture is not addressed directly herein, it is shown that if the coordinate map of a principal surface of a given distribution is known, then sample data can be used in conjunction with this knowledge to produce accurate estimates of the principal surface. Consequently, if the conjecture is true, the Kernel Map Manifolds algorithm will produce accurate estimates of the underlying lower-dimensional geometry of the set. |