Description
Matrices are ubiquitous in data analysis. Most real-world datasets are formulated as an n × d matrix A, where n is the number of data points and d is the number of features; matrices are also used to store pairwise similarities between data points. Performing large-scale machine-learning tasks efficiently requires storing large matrices in memory, often distributed across many machines. Since the actual signal in most datasets has a lower intrinsic dimension, a smaller sketch matrix can approximate the original matrix and yield a low-rank approximation, which reduces the memory requirements of machine-learning tasks. Computing the low-rank approximation with the best accuracy is done via the Singular Value Decomposition (SVD), which is computationally expensive and ill-suited to distributed environments. In this thesis, we survey various algorithms for matrix approximation and present improvements over existing work.
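To make the trade-off concrete, the following is a minimal, illustrative sketch in NumPy, not the thesis's algorithm: it compares the exact truncated SVD (the best rank-k approximation, by the Eckart-Young theorem) against a simple random-projection sketch that compresses the n rows of A into a much smaller m × d sketch matrix. All sizes (n, d, k, m) are placeholder values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 300, 10  # placeholder dimensions and target rank

# Synthetic dataset with low intrinsic dimension: rank-k signal plus noise,
# mimicking data whose signal lives in far fewer than d dimensions.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A += 0.01 * rng.standard_normal((n, d))

# Exact best rank-k approximation via truncated SVD: the accurate but
# computationally expensive baseline discussed above.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A simple sketch: compress the n rows down to m << n rows with a random
# Gaussian map, so only the small m x d sketch B must be kept in memory.
m = 50
S = rng.standard_normal((m, n)) / np.sqrt(m)
B = S @ A  # the sketch matrix

# Recover an approximate top-k row space from the sketch, then project A
# onto it to obtain a rank-k approximation without a full SVD of A.
_, _, Vt_b = np.linalg.svd(B, full_matrices=False)
P = Vt_b[:k, :].T @ Vt_b[:k, :]
A_sketch = A @ P

fro = np.linalg.norm(A, "fro")
print("SVD rel. error:   ", np.linalg.norm(A - A_svd, "fro") / fro)
print("sketch rel. error:", np.linalg.norm(A - A_sketch, "fro") / fro)
```

The sketch is cheaper because the expensive decomposition is applied only to the small matrix B, and B can be accumulated incrementally or merged across machines, which is why sketching-based methods are attractive in the distributed settings this thesis targets.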