nimare.annotate.lda.LDAModel
- class LDAModel(n_topics, max_iter=1000, alpha=None, beta=0.001, text_column='abstract')
Bases: NiMAREBase
Generate a latent Dirichlet allocation (LDA) topic model.
This class is a light wrapper around scikit-learn tools for tokenization and LDA.
- Parameters
n_topics (int) – Number of topics for the topic model. This corresponds to the model’s n_components parameter. Must be an integer >= 1.
max_iter (int, optional) – Maximum number of iterations to use during model fitting. Default is 1000.
alpha (float or None, optional) – The alpha value for the model. This corresponds to the model’s doc_topic_prior parameter. Default is None, which evaluates to 1 / n_topics, as was used in Poldrack et al. [1].
beta (float or None, optional) – The beta value for the model. This corresponds to the model’s topic_word_prior parameter. If None, it evaluates to 1 / n_topics. Default is 0.001, which was used in Poldrack et al. [1].
text_column (str, optional) – The source of text to use for the model. This should correspond to an existing column in the texts attribute. Default is “abstract”.
- Variables
model (LatentDirichletAllocation)
Notes
Latent Dirichlet allocation was first developed in Blei et al. [2] and was first applied to neuroimaging articles in Poldrack et al. [1].
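The default handling of alpha and beta described under Parameters can be sketched in plain Python. This is a minimal illustration, not NiMARE's code: `resolve_priors` is a hypothetical helper, and it only assumes what the parameter descriptions state, namely that None falls back to 1 / n_topics and that the values map onto the n_components, doc_topic_prior, and topic_word_prior parameters.

```python
def resolve_priors(n_topics, alpha=None, beta=0.001):
    """Map the documented arguments onto scikit-learn style keyword names.

    Hypothetical helper for illustration only; it is not part of NiMARE.
    """
    if not isinstance(n_topics, int) or n_topics < 1:
        raise ValueError("n_topics must be an integer >= 1")
    # None falls back to the symmetric prior 1 / n_topics, as documented.
    doc_topic_prior = 1.0 / n_topics if alpha is None else float(alpha)
    topic_word_prior = 1.0 / n_topics if beta is None else float(beta)
    return {
        "n_components": n_topics,
        "doc_topic_prior": doc_topic_prior,
        "topic_word_prior": topic_word_prior,
    }


# With the documented defaults, only alpha is derived from n_topics.
priors = resolve_priors(n_topics=50)
print(priors)
```

With the defaults, alpha is derived from n_topics while beta stays fixed at 0.001; passing beta=None makes both priors equal to 1 / n_topics.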
References
- [1]
Russell A Poldrack, Jeanette A Mumford, Tom Schonberg, Donald Kalar, Bishal Barman, and Tal Yarkoni. Discovering relations between mind, brain, and mental disorders using topic mapping. PLOS Computational Biology, 2012. URL: https://doi.org/10.1371/journal.pcbi.1002707, doi:10.1371/journal.pcbi.1002707.
- [2]
David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003. URL: https://dl.acm.org/doi/10.5555/944919.944937.
See also
CountVectorizer – Used to build a vocabulary of terms and their associated counts from texts in the self.text_column of the Dataset’s texts attribute.
LatentDirichletAllocation – Used to train the LDA model.
Methods
fit(dset) – Fit the LDA topic model to text from a Dataset.
get_params([deep]) – Get parameters for this estimator.
load(filename[, compressed]) – Load a pickled class instance from file.
save(filename[, compress]) – Pickle the class instance to the provided file.
set_params(**params) – Set the parameters of this estimator.
- fit(dset)
Fit the LDA topic model to text from a Dataset.
- Parameters
dset (Dataset) – A Dataset with, at minimum, text available in the self.text_column column of its texts attribute.
- Returns
dset – A new Dataset with an updated annotations attribute.
- Return type
Dataset
- Variables
distributions (dict) – A dictionary containing additional distributions produced by the model, including:
p_topic_g_word: numpy.ndarray of shape (n_topics, n_tokens) containing the topic-term weights for the model.
p_topic_g_word_df: pandas.DataFrame of shape (n_topics, n_tokens) containing the topic-term weights for the model.
- classmethod load(filename, compressed=True)
Load a pickled class instance from file.
- Parameters
- Returns
obj – Loaded class object.
- Return type
class object
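The save/load round trip described above can be sketched with the standard library. This is an assumption-laden illustration rather than NiMARE's implementation: `save_object` and `load_object` are hypothetical helpers, and using gzip for the compressed case is an assumption, since the page does not name the compression format.

```python
import gzip
import os
import pickle
import tempfile


def save_object(obj, filename, compress=True):
    """Pickle obj to filename; gzip is an assumed compression format."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as file_obj:
        pickle.dump(obj, file_obj)


def load_object(filename, compressed=True):
    """Load a pickled object written by save_object with the same setting."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as file_obj:
        return pickle.load(file_obj)


# Round-trip a plain dict standing in for a fitted model instance.
settings = {"n_topics": 50, "text_column": "abstract"}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl.gz")
    save_object(settings, path, compress=True)
    restored = load_object(path, compressed=True)
```

The compression setting used when loading has to match the one used when saving. Pickle files can execute arbitrary code when loaded, so only load files from sources you trust.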
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Return type
self
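The <component>__<parameter> convention can be illustrated with a toy class. This is a simplified sketch of the naming scheme only: `ToyEstimator` is hypothetical and is not the scikit-learn or NiMARE implementation, which performs additional validation.

```python
class ToyEstimator:
    """Toy stand-in for an estimator; hypothetical, for illustration only."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "lda__n_components" into "lda" and "n_components".
            name, delimiter, sub_key = key.partition("__")
            if name not in self.params:
                raise ValueError(f"Invalid parameter {name!r}")
            if delimiter:
                # Nested form: forward the remainder to the named component.
                self.params[name].set_params(**{sub_key: value})
            else:
                # Simple form: set the parameter on this object.
                self.params[name] = value
        # Returning self allows calls to be chained.
        return self


pipeline = ToyEstimator(
    vectorizer=ToyEstimator(max_features=100),
    lda=ToyEstimator(n_components=10, max_iter=1000),
)
result = pipeline.set_params(lda__n_components=50, vectorizer__max_features=500)
```

Each double-underscore name updates only the named component, and parameters that are not mentioned (here, max_iter) keep their previous values.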