Jie Tao

Date of Award

Spring 4-2015

Document Type


Degree Name

Doctor of Science in Information Systems


Department

Business and Information Systems

First Advisor

Amit V. Deokar

Second Advisor

Yen-Ling Chang

Third Advisor

John Nelson


The Initial Public Offering (IPO) process and its associated pricing strategies are of great interest to researchers and practitioners (e.g., underwriters, investors) in the finance and accounting domains. IPO prospectuses, regulated by the Securities and Exchange Commission (SEC), serve as the most reliable publicly available information source in the IPO process. They disclose a variety of information; however, traditional studies do not leverage the rich knowledge hidden in their extensive textual content. This research gap can be partially attributed to the lack of an underlying formal knowledge structure to support the extraction of implicit knowledge from the prospectuses, as well as to the absence of quantitative metrics that reflect management's outlook and awareness as embedded in the prospectuses. The primary research question addressed in this work is: "How do management's awareness of risks (expressed via the emphasized mentions in the Risk Factors sections of the Form 424 filings) and its confidence about the firm's outlook (expressed through the sentiments in the MD&A sections) affect IPO valuations?" This research question is further broken down into two research goals: a) to develop an actionable knowledge structure for guiding the extraction of, storing the results of, and facilitating reasoning over the knowledge hidden in the textual content of the IPO prospectuses; and b) to use this knowledge structure, together with predictive models, to estimate pricing volatility prior to and right after the IPO date. To identify and quantify such interrelationships, an underlying knowledge structure needs to be constructed and updated with minimal manual intervention, so that knowledge acquisition is efficient and knowledge representation is accurate. In this dissertation, to bridge the aforementioned research gaps, I propose a text analytics framework for assisting the investment and underwriting decision-making processes.
The proposed framework has two major components: an ontology enrichment methodology that updates the ontology online and in real time, and predictive modeling techniques that use the information extracted with the ontology to predict IPO pricing. The framework is implemented as a research prototype, which is used to predict pricing trends during and after the IPO process. I use real-world data to evaluate both the framework itself and its prediction results through a set of experiments, which yield promising results.

Design science research is applied as the methodological framework in this study. Two motivational scenarios are provided to illustrate the significance and relevance of the study. The process of searching for and developing a solution is documented in detail. I compare my approach to the existing body of research and illustrate its novelty. Further, I evaluate the proposed IT artifact (the analytical framework), first through feasibility and functionality testing and second through an experimental approach for analyzing efficiency and accuracy.

The proposed analytical framework is evaluated in three ways. First, a case study is designed to evaluate the functionality and efficiency of the framework. Second, the practical relevance of the framework is evaluated through the results of the predictive models. Third, the design artifacts are evaluated against design requirements drawn from the existing literature. The evaluation results are satisfactory, indicating that the framework is promising for practical use.

This work makes two key research contributions: a) a (semi-)automatic approach for enriching the specifications of domain knowledge bases (i.e., ontologies) is developed and evaluated, serving as the underlying knowledge structure for the analytical process. The approach is unique in incorporating feature-based word sense disambiguation and relation extraction methods into the enrichment process; b) several predictive models are designed based on knowledge extracted from the prospectuses, for the purpose of predicting pre- and post-IPO pricing volatility. The results of this phase of the study demonstrate its practical relevance. In addition to these two primary contributions, two metrics are designed as proxies for management's awareness of risks and management's confidence regarding the organization's future operations. These metrics are based on the textual content of the more informative sections of the prospectuses (i.e., Risk Factors and Management's Discussion and Analysis) and, to the best of my knowledge, are the first of their kind to quantify such information. Further, the analytical framework and the development approaches of the design artifacts can be adopted in other application domains, such as healthcare informatics and social media analysis.