Outlet Title
Proceedings of the 58th Hawaii International Conference on System Sciences
Document Type
Conference Proceeding
Publication Date
2025
Abstract
Topic modeling is a crucial unsupervised machine learning technique for identifying themes within unstructured text. This study compares traditional topic modeling methods, such as Latent Dirichlet Allocation (LDA), with advanced embedding-based models, specifically BERTopic with OpenAI embeddings (BERTopic-OpenAI). The analysis uses two distinct datasets: user reviews of the mental health app Replika and the 20 Newsgroups dataset. On the Replika dataset, both methods identified common themes, but BERTopic-OpenAI uncovered additional nuanced topics, demonstrating its stronger semantic capabilities. Quantitative evaluation on the 20 Newsgroups dataset further highlighted BERTopic-OpenAI's advantage: it achieved higher topic coherence and diversity than the best-performing LDA model. These results suggest that embedding-based models produce more coherent, interpretable, and diverse topics, making them valuable tools for extracting meaningful insights from large, variable-length text corpora. Future research should focus on refining these advanced techniques to improve their applicability and effectiveness in dynamic and varied textual environments.
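The abstract's quantitative comparison rests on two standard metrics: topic coherence and topic diversity. The paper does not specify its exact formulas here, so the following is a minimal illustrative sketch, assuming the common definitions: diversity as the fraction of unique words among all topics' top words, and a UMass-style coherence based on document co-occurrence counts. All function names and toy data are hypothetical, for illustration only.

```python
from itertools import combinations
from math import log

def topic_diversity(topics):
    """Fraction of unique words across all topics' top-k word lists.

    1.0 means no topic shares a top word with another (maximally diverse).
    """
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

def umass_coherence(topic, docs):
    """UMass-style coherence for one topic's top words.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs, where D counts
    documents containing the given word(s); averaged over scored pairs.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score, pairs = 0.0, 0
    for w_j, w_i in combinations(topic, 2):  # earlier word conditions the later one
        denom = doc_freq(w_j)
        if denom:
            score += log((doc_freq(w_i, w_j) + 1) / denom)
            pairs += 1
    return score / pairs if pairs else 0.0

# Toy example (hypothetical data, not from the paper's datasets):
topics = [["cat", "dog"], ["cat", "fish"]]
docs = [["cat", "dog"], ["cat", "fish"]]
print(topic_diversity(topics))                # 0.75 (3 unique words out of 4)
print(umass_coherence(["cat", "dog"], docs))  # 0.0 for this toy corpus
```

In practice, the paper's pipeline would compute such scores over the topics produced by each model on 20 Newsgroups; higher coherence and diversity values indicate more interpretable, less redundant topic sets.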
Recommended Citation
Wahbeh, Abdullah; Al-Ramahi, Mohammad; El-Gayar, Omar F.; Elnoshokaty, Ahmed; and Nasralah, Tareq, "Evaluating Topic Models with OpenAI Embeddings: A Comparative Analysis on Variable-Length Texts Using Two Datasets" (2025). Research & Publications. 438.
https://scholar.dsu.edu/bispapers/438