Analyzing the influence of hyper-parameters and regularizers of topic modeling in terms of Renyi entropy

dc.contributor.author: Koltcov, Sergei
dc.contributor.author: Ignatenko, Vera
dc.contributor.author: Boukhers, Zeyd
dc.contributor.author: Staab, Steffen
dc.date.accessioned: 2024-10-16T08:16:37Z
dc.date.available: 2024-10-16T08:16:37Z
dc.date.issued: 2020
dc.date.updated: 2020-05-02T20:28:56Z
dc.description.abstract: Topic modeling is a popular technique for clustering large collections of text documents. Many different types of regularization are implemented in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Renyi entropy, this approach is inspired by concepts from statistical physics, where the inferred topical structure of a collection can be considered an information statistical system residing in a non-equilibrium state. We test our approach on four models: Probabilistic Latent Semantic Analysis (pLSA), Additive Regularization of Topic Models (BigARTM), Latent Dirichlet Allocation (LDA) with Gibbs sampling, and LDA with variational inference (VLDA). First, we show that the minimum of Renyi entropy coincides with the “true” number of topics, as determined in two labelled collections. We also find that the Hierarchical Dirichlet Process (HDP) model, a well-known approach for topic number optimization, fails to detect this optimum. Next, we demonstrate that large values of the regularization coefficient in BigARTM significantly shift the minimum of entropy away from the topic number optimum, an effect that is not observed for hyper-parameters in LDA with Gibbs sampling. We conclude that regularization may introduce unpredictable distortions into topic models, and that these distortions need further research.
dc.description.sponsorship: National Research University Higher School of Economics [de]
dc.description.sponsorship: Deutsche Forschungsgemeinschaft [de]
dc.identifier.issn: 1099-4300
dc.identifier.other: 1906947651
dc.identifier.uri: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-150772 [de]
dc.identifier.uri: http://elib.uni-stuttgart.de/handle/11682/15077
dc.identifier.uri: http://dx.doi.org/10.18419/opus-15058
dc.language.iso: en [de]
dc.relation.uri: doi:10.3390/e22040394 [de]
dc.rights: info:eu-repo/semantics/openAccess [de]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [de]
dc.subject.ddc: 620 [de]
dc.title: Analyzing the influence of hyper-parameters and regularizers of topic modeling in terms of Renyi entropy [de]
dc.type: article [en]
ubs.fakultaet: Informatik, Elektrotechnik und Informationstechnik [de]
ubs.fakultaet: Fakultätsübergreifend / Sonstige Einrichtung [de]
ubs.institut: Institut für Parallele und Verteilte Systeme [de]
ubs.institut: Fakultätsübergreifend / Sonstige Einrichtung [de]
ubs.publikation.seiten: 13 [de]
ubs.publikation.source: Entropy 22 (2020), No. 394 [de]
ubs.publikation.typ: Zeitschriftenartikel [de]

Files

Original bundle

Name: entropy-22-00394-v2.pdf
Size: 1.3 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.39 KB
Format: Plain Text