Topic Modeling: How and Why to Use in Management Research
DOI: https://doi.org/10.5585/ijsm.v18i3.14561
Keywords: Topic modeling, Latent Dirichlet allocation, Computer-aided text analysis, Machine learning, Big data
Abstract
Objective: To exemplify how topic modeling can be used in management research, I pursue two objectives. First, I introduce topic modeling as a research tool for the social sciences and map key published studies in management and other social sciences that have employed topic modeling rigorously. Second, I illustrate how to do topic modeling by applying it to the last five years of research published in this journal, the Iberoamerican Journal of Strategic Management (IJSM).
Methodology: I analyze the last five years (2014 to 2018) of articles published in the IJSM, a sample of 164 articles. Their abstracts were subjected to a standard topic modeling text pre-processing routine, yielding 1,252 unique tokens.
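A minimal sketch of such a pre-processing routine, assuming Python with gensim (Rehurek & Sojka, 2010) and NLTK's Portuguese stopword list and RSLP stemmer, is shown below; the exact pipeline is the one deposited in the OSF/GitHub repositories, and the placeholder abstracts and parameter choices here are illustrative assumptions only.

# Assumed pre-processing sketch, not the exact routine from the OSF/GitHub code:
# lowercase and tokenize, drop Portuguese stopwords, stem each token.
import nltk
from gensim.utils import simple_preprocess
from nltk.corpus import stopwords
from nltk.stem import RSLPStemmer

nltk.download("stopwords")  # one-time NLTK resource downloads
nltk.download("rslp")

stop_words = set(stopwords.words("portuguese"))
stemmer = RSLPStemmer()  # Portuguese stemmer; the article's exact stemmer is an assumption

def preprocess(abstract):
    """Tokenize one abstract and return its cleaned, stemmed tokens."""
    tokens = simple_preprocess(abstract, deacc=False)  # lowercase, strip punctuation, keep accents
    return [stemmer.stem(t) for t in tokens if t not in stop_words]

# `abstracts` stands in for the 164 IJSM abstracts (2014-2018); these two
# Portuguese snippets are hypothetical placeholders.
abstracts = [
    "Este estudo analisa a vantagem competitiva das empresas.",
    "O artigo investiga capacidades dinâmicas e inovação.",
]
tokenized = [preprocess(a) for a in abstracts]
print(len({token for doc in tokenized for token in doc}))  # number of unique tokens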
Originality/Relevance: Proposing topic modeling as a valid and timely methodology for analyzing textual data can shift the old paradigm that textual data belongs only to the qualitative realm. Furthermore, it allows textual data to be labeled and quantified in a reproducible manner that mitigates (or nearly eliminates) researcher bias.
Main Results: Six topics were generated through Latent Dirichlet Allocation (LDA): Topic 1 – Strategy and Competitive Advantage; Topic 2 – International Business and Top Management Team; Topic 3 – Entrepreneurship; Topic 4 – Learning and Cooperation; Topic 5 – Finance and Strategy; and Topic 6 – Dynamic Capabilities.
Theoretical/methodological Contributions: I present the state of the art of the literature published in the IJSM and show how readers can perform their own topic modeling. The full data and code used are available in free open science repositories on the Open Science Framework (OSF) and GitHub.
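As a starting point for readers who want to fit their own model, here is a minimal sketch of estimating a six-topic LDA with gensim; it reuses the token lists from the pre-processing sketch above, and the number of passes and random seed are assumptions for illustration, not the settings of the deposited code.

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Token lists from the pre-processing sketch above (shortened placeholder here).
tokenized = [["estratég", "compet", "vantag"], ["empreend", "inov", "capac"]]

dictionary = Dictionary(tokenized)                        # maps each unique token to an integer id
corpus = [dictionary.doc2bow(doc) for doc in tokenized]   # bag-of-words representation per abstract

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=6,      # the number of topics reported in the article
    passes=10,         # assumed number of training passes
    random_state=42,   # assumed seed, for reproducibility only
)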
References
Baumer, E. P. S., Mimno, D., Guha, S., Quan, E., & Gay, G. K. (2017). Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence? Journal of the Association for Information Science and Technology, 68(6), 1397–1410. https://doi.org/10.1002/asi.23786
Bendle, N. T., & Wang, X. (2016). Uncovering the message from the mess of big data. Business Horizons, 59(1), 115–124. https://doi.org/10.1016/j.bushor.2015.10.001
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(1), 993–1022.
Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17–35. https://doi.org/10.1214/07-aoas136
Bordag, S. (2008). A comparison of co-occurrence and similarity measures as simulations of context. Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing, 52–63. https://doi.org/10.1007/978-3-540-78135-6_5
Chang, J. (2011). lda: Collapsed Gibbs sampling methods for topic models. R package.
Debortoli, S., Müller, O., Junglas, I., & vom Brocke, J. (2016). Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial. Communications of the Association for Information Systems, 39(1), 110–135. https://doi.org/10.17705/1CAIS.03907
Denny, M. J., & Spirling, A. (2018). Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It. Political Analysis, 26(2), 168–189. https://doi.org/10.1017/pan.2017.44
DiMaggio, P., Nag, M., & Blei, D. M. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41(6), 570–606. https://doi.org/10.1016/j.poetic.2013.08.004
DiMaggio, P. (2015). Adapting computational text analysis to social science (and vice versa). Big Data & Society, 2(2), 205395171560290. https://doi.org/10.1177/2053951715602908
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Chicago, IL: Aldine.
Hannigan, T., Haans, R. F. J., Vakili, K., Tchalian, H., Glaser, V., Wang, M. & Jennings, P. D. (2019). Topic modeling in management research: Rendering new theory from textual data. Academy of Management Annals. https://doi.org/10.5465/annals.2017.0099
Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.
Hornik, K., & Grün, B. (2011). topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software, 40(13), 1–30.
Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-Assisted Text Analysis for Comparative Politics. Political Analysis, 23(2), 254–277. https://doi.org/10.1093/pan/mpu019
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A. & Adam, S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures, 12(2–3), 93–118. https://doi.org/10.1080/19312458.2018.1430754
McCallum, A. K. (2002). MALLET: A Machine Learning for Language Toolkit. Retrieved from http://mallet.cs.umass.edu/index.php
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 262–272.
Mimno, D. (2013). mallet: A wrapper around the Java machine learning tool MALLET. Retrieved from https://cran.r-project.org/package=mallet
Mohr, J. W., & Bogdanov, P. (2013). Introduction—Topic models: What they are and why they matter. Poetics, 41(6), 545–569. https://doi.org/10.1016/j.poetic.2013.10.001
Nelson, L. K. (2017). Computational Grounded Theory. Sociological Methods & Research, 1-40. https://doi.org/10.1177/0049124117729703
Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2018). The Future of Coding. Sociological Methods & Research, 1-36. https://doi.org/10.1177/0049124118769114
Nikolenko, S. I., Koltcov, S., & Koltsova, O. (2017). Topic modelling for qualitative studies. Journal of Information Science, 43(1), 88–102. https://doi.org/10.1177/0165551515617393
Ottolinger, P. (2019). bib2df: Parse a BibTeX File to a Data Frame. Retrieved from https://cran.r-project.org/package=bib2df
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137. https://doi.org/10.1108/eb046814
Rehurek, R., & Sojka, P. (2010). Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (pp. 45–50). Valletta, Malta: ELRA. https://doi.org/10.13140/2.1.2393.1847
Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K. & Rand, D. G. (2014). Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58(4), 1064–1082. https://doi.org/10.1111/ajps.12103
Roberts, M. E., Stewart, B. M., & Tingley, D. (2014). stm: R package for structural topic models. Journal of Statistical Software, 10(2), 1–40.
Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp. 63–70).
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. Handbook of Latent Semantic Analysis. Mahwah, NJ, US: Lawrence Erlbaum Associates Publishers.
Storopoli, J. (2019, July 22). Topic Modeling IJSM-RIAE. Retrieved from osf.io/97w6z
Torgerson, W. S. (1958). Theory and methods of scaling. New York: J. Wiley.
APPENDIX I - Topic-term weights
First Topic - 0.212*"estratég" + 0.049*"organiz" + 0.042*"competi" + 0.040*"prát" + 0.036*"merc" + 0.036*"conceit" + 0.035*"empr" + 0.022*"vantag" + 0.018*"gerenc" + 0.016*"comport"
Second Topic - 0.055*"gest" + 0.047*"teor" + 0.042*"negóci" + 0.027*"internac" + 0.024*"caracterís" + 0.022*"futur" + 0.021*"decis" + 0.020*"abord" + 0.018*"país" + 0.017*"relacion"
Third Topic - 0.080*"desenvolv" + 0.043*"ambi" + 0.032*"inform" + 0.030*"empreend" + 0.029*"públic" + 0.026*"sustent" + 0.025*"instituc" + 0.023*"institu" + 0.022*"internacion" + 0.022*"empreendedor"
Fourth Topic - 0.090*"process" + 0.037*"conhec" + 0.030*"entrev" + 0.030*"context" + 0.028*"qualit" + 0.023*"perspec" + 0.023*"form" + 0.023*"particip" + 0.020*"envolv" + 0.018*"mudanç"
Fifth Topic - 0.125*"empr" + 0.043*"fat" + 0.042*"recurs" + 0.040*"brasil" + 0.032*"ativ" + 0.026*"estrut" + 0.023*"corpor" + 0.022*"financ" + 0.019*"efici" + 0.019*"capit"
Sixth Topic - 0.084*"inov" + 0.082*"desempenh" + 0.067*"capac" + 0.062*"organizac" + 0.059*"model" + 0.027*"dimens" + 0.024*"dinâm" + 0.024*"produt" + 0.022*"ges" + 0.018*"pequen"
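The lines above follow gensim's weight*"stem" string format. Assuming the `lda` model fitted in the earlier sketch, a listing in the same form can be reproduced with show_topics:

# Print the ten highest-weight stemmed terms per topic, in gensim's
# 'weight*"term"' string format (the same style as the appendix listing).
for topic_id, terms in lda.show_topics(num_topics=6, num_words=10):
    print(f"Topic {topic_id + 1}: {terms}")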
Copyright (c) 2019 Iberoamerican Journal of Strategic Management
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.