`nimare.annotate.lda`.LDAModel

class LDAModel(n_topics, max_iter=1000, alpha=None, beta=0.001, text_column='abstract')

Bases: NiMAREBase

Generate a latent Dirichlet allocation (LDA) topic model.

This class is a light wrapper around scikit-learn tools for tokenization and LDA.

Parameters

n_topics (int) – Number of topics in the topic model. This corresponds to the n_components parameter of scikit-learn’s LatentDirichletAllocation. Must be an integer >= 1.
max_iter (int, optional) – Maximum number of iterations to use during model fitting. Default = 1000.
alpha (float or None, optional) – The alpha value for the model. This corresponds to the model’s doc_topic_prior parameter. Default is None, which evaluates to 1 / n_topics, as was used in Poldrack et al. [1].
beta (float or None, optional) – The beta value for the model. This corresponds to the model’s topic_word_prior parameter. If None, it evaluates to 1 / n_topics. Default is 0.001, which was used in Poldrack et al. [1].
text_column (str, optional) – The source of text to use for the model. This should correspond to an existing column in the texts attribute. Default is “abstract”.
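
The default rules for alpha and beta described above can be sketched in plain Python. The helper name resolve_priors is hypothetical and is not part of NiMARE; it only mirrors the documented behavior (None evaluates to 1 / n_topics), and raising ValueError for an invalid n_topics is an assumption made for illustration.

```python
def resolve_priors(n_topics, alpha=None, beta=0.001):
    """Hypothetical helper: return the effective (alpha, beta) priors."""
    # The documentation requires n_topics to be an integer >= 1.
    if not isinstance(n_topics, int) or n_topics < 1:
        raise ValueError("n_topics must be an integer >= 1")
    # A prior left as None falls back to 1 / n_topics.
    fallback = 1.0 / n_topics
    effective_alpha = fallback if alpha is None else alpha
    effective_beta = fallback if beta is None else beta
    return effective_alpha, effective_beta


print(resolve_priors(50))             # (0.02, 0.001)
print(resolve_priors(50, beta=None))  # (0.02, 0.02)
```

With the class defaults, only alpha depends on the number of topics; beta stays fixed at 0.001 unless it is explicitly set to None.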

Variables

model (LatentDirichletAllocation) – The scikit-learn estimator used to train the LDA model.

Notes

Latent Dirichlet allocation was introduced by Blei et al. [2] and was first applied to neuroimaging articles by Poldrack et al. [1].

References

[1] Russell A Poldrack, Jeanette A Mumford, Tom Schonberg, Donald Kalar, Bishal Barman, and Tal Yarkoni. Discovering relations between mind, brain, and mental disorders using topic mapping. PLOS Computational Biology, 2012. URL: https://doi.org/10.1371/journal.pcbi.1002707.
[2] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003. URL: https://dl.acm.org/doi/10.5555/944919.944937.

See also

CountVectorizer: Used to build a vocabulary of terms and their associated counts from the text in the self.text_column column of the Dataset’s texts attribute.
LatentDirichletAllocation: Used to train the LDA model.

Methods

`fit`(dset)	Fit the LDA topic model to text from a Dataset.
`get_params`([deep])	Get parameters for this estimator.
`load`(filename[, compressed])	Load a pickled class instance from file.
`save`(filename[, compress])	Pickle the class instance to the provided file.
`set_params`(**params)	Set the parameters of this estimator.

fit(dset)

Fit the LDA topic model to text from a Dataset.

Parameters

dset (Dataset) – A Dataset with, at minimum, text available in the self.text_column column of its texts attribute.

Returns

dset – A new Dataset with an updated annotations attribute.

Return type

Dataset

Variables

distributions (dict) –

A dictionary containing additional distributions produced by the model, including:

p_topic_g_word: numpy.ndarray of shape (n_topics, n_tokens) containing the topic-term weights for the model.
p_topic_g_word_df: pandas.DataFrame of shape (n_topics, n_tokens) containing the topic-term weights for the model.
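
To make the shape of the topic-term weights concrete, here is a minimal sketch that uses scikit-learn directly instead of NiMARE. It assumes the weights correspond to the fitted estimator's components_ attribute, which this page does not state. The toy abstracts, the reduced max_iter, and random_state are illustrative choices only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the text stored in a Dataset's texts attribute.
abstracts = [
    "memory encoding engages the hippocampus",
    "reward learning engages the striatum",
    "working memory load modulates prefrontal activity",
    "reward prediction error signals in the striatum",
]
n_topics = 2

# Build the vocabulary and the document-by-term count matrix.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(abstracts)
n_tokens = len(vectorizer.vocabulary_)

# alpha=None (1 / n_topics) and beta=0.001 mirror the documented defaults.
model = LatentDirichletAllocation(
    n_components=n_topics,
    max_iter=50,  # kept small for the toy example
    doc_topic_prior=None,
    topic_word_prior=0.001,
    random_state=0,  # illustrative, for reproducibility
)
doc_topic = model.fit_transform(counts)  # shape (n_documents, n_topics)
topic_word = model.components_           # shape (n_topics, n_tokens)

print(topic_word.shape == (n_topics, n_tokens))
```

Each row of doc_topic is a normalized topic distribution for one document, while topic_word holds one row of term weights per topic.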

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

classmethod load(filename, compressed=True)

Load a pickled class instance from file.

Parameters

filename (str) – Name of file containing object.
compressed (bool, optional) – If True, the file is assumed to be compressed and gzip will be used to load it. Otherwise, it will assume that the file is not compressed. Default = True.

Returns

obj – Loaded class object.

Return type

class object

save(filename, compress=True)

Pickle the class instance to the provided file.

Parameters

filename (str) – File to which object will be saved.
compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default = True.
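
The save and load behavior described above (pickling, with optional gzip compression) can be sketched with the standard library alone. The functions save_object and load_object are hypothetical stand-ins, not NiMARE's implementation, and a plain dictionary is pickled in place of a fitted model.

```python
import gzip
import os
import pickle
import tempfile


def save_object(obj, filename, compress=True):
    """Pickle obj to filename, through gzip when compress is True."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as file_obj:
        pickle.dump(obj, file_obj)


def load_object(filename, compressed=True):
    """Unpickle an object from filename, through gzip when compressed is True."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as file_obj:
        return pickle.load(file_obj)


# Round-trip a small object through a temporary file.
settings = {"n_topics": 50, "beta": 0.001}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl.gz")
    save_object(settings, path, compress=True)
    restored = load_object(path, compressed=True)

print(restored == settings)  # True
```

Note that the flag is named compress on save and compressed on load, and the two must agree for a given file. As with any pickle-based format, only load files from sources you trust.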

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type: self
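
The <component>__<parameter> naming convention can be illustrated with a short sketch that separates top-level parameters from nested ones. The function split_nested_params is hypothetical, and the parameter names in the example are illustrative; whether a given component of LDAModel can be addressed this way is an assumption.

```python
def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params({"max_iter": 500, "model__random_state": 0})
print(own)     # {'max_iter': 500}
print(nested)  # {'model': {'random_state': 0}}
```

Parameters without a double underscore apply to the estimator itself, while the prefixed ones are routed to the named component.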
