Predicting audience demographics of web sites using local cues

Title Predicting audience demographics of web sites using local cues
Publication Type dissertation
School or College David Eccles School of Business
Department Entrepreneurship & Strategy
Author Kim, Iljoo
Date 2011-12
Description The size and dynamism of the Web pose challenges for all its stakeholders, including producers and consumers of content and advertisers who want to place advertisements next to relevant content. A critical piece of information for these stakeholders is the demographics of the consumers who are likely to visit a given web site. However, predicting these demographics, while essential, remains a challenging task. Hence, in this dissertation we ask the following questions: Is it possible to deduce the audience demographics of a web site based solely on local cues such as the design or the content of the web site? If so, is it design, content, or a combination of the two that provides a good predictive model? In addition to the design or the content, is it also possible to use the semantics embedded within the content to further improve prediction performance? We explore these questions with statistical analyses as well as predictive models built using various modeling schemes. From the results, we find that it is indeed possible to effectively predict the demographics of consumers of a web site using cues embedded in the design or the content of its homepage. In addition, we build and evaluate an ensemble classifier that combines the predictions from both design and content cues. An analysis of the ensemble suggests the possible use of the approach for better prediction. In addition to the classification-based predictive model, which predicts a discrete demographic class (e.g., female) within each demographic dimension (e.g., gender), we also explore a regression-based predictive model that predicts the demographic composition (e.g., 63.5% female) of a web site, which is a continuous dependent variable. We show that this model also works effectively, with good estimation performance.
Finally, we suggest a feature selection approach using the Latent Dirichlet Allocation (LDA) method and show that semantics extracted from web site content with this method can achieve competitive prediction performance while significantly improving prediction efficiency. The approaches in this study serve as low-burden complements to the more intrusive and costly registration- and cookie-based techniques.
Type Text
Publisher University of Utah
Subject Audience demographics; Data mining; Online marketing strategies; Predictive modeling; text; classification; Web mining
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Rights Management Copyright © Iljoo Kim 2011
Format application/pdf
Format Medium application/pdf
Format Extent 3,260,093 bytes
Identifier us-etd3,72162
Source original in Marriott Library Special Collections; HF91.5 2011 .K55
ARK ark:/87278/s6902jgz
Setname ir_etd
ID 194323
Reference URL https://collections.lib.utah.edu/ark:/87278/s6902jgz