A Latent Dirichlet Allocation implementation in Python.

kzhai/PyLDA

PyLDA

PyLDA is a Latent Dirichlet Allocation topic modeling package, developed by the Cloud Computing Research Team at the University of Maryland, College Park.

Please download the latest version from our GitHub repository.

Please report any bugs or problems to Ke Zhai (kzhai@umd.edu).

Install and Build

This package depends on several external Python libraries, such as numpy, scipy and nltk.
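Before launching, it can help to confirm that these libraries are importable. The following is a minimal sketch, not part of PyLDA itself: the helper name `find_missing` is hypothetical, and the module list only covers the three libraries named above, so the package may need others.

```python
import importlib.util

# Dependencies named in this README; the actual list may be longer.
REQUIRED_MODULES = ("numpy", "scipy", "nltk")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Any package reported as missing can usually be installed with pip, e.g. `pip install numpy scipy nltk`.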

Launch and Execute

Assume the PyLDA package has been downloaded into the directory $PROJECT_SPACE/src/, i.e.,

$PROJECT_SPACE/src/PyLDA

To prepare the example dataset, extract the archive,

tar zxvf associated-press.tar.gz
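On systems without the `tar` command, the same extraction can be done from Python's standard library. This is a sketch under the assumption that the archive is a gzip-compressed tarball, as the `z` flag above implies; the function name `extract_archive` is hypothetical and not part of PyLDA.

```python
import tarfile


def extract_archive(archive_path, destination="."):
    """Extract a gzip-compressed tar archive and return the names of its members."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        # Only extract archives from a trusted source.
        archive.extractall(path=destination)
    return names


# Example usage (assumes the archive is in the current directory):
# extract_archive("associated-press.tar.gz")
```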

To launch PyLDA, first change to the PyLDA source directory,

cd $PROJECT_SPACE/src/PyLDA

and run the following command on the example dataset,

python -m launch_train --input_directory=./associated-press --output_directory=./ --number_of_topics=10 --training_iterations=100

The general form of the command to run PyLDA is

python -m launch_train --input_directory=$INPUT_DIRECTORY/$CORPUS_NAME --output_directory=$OUTPUT_DIRECTORY --number_of_topics=$NUMBER_OF_TOPICS --training_iterations=$NUMBER_OF_ITERATIONS
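When training is driven from another script, the command line can be assembled programmatically. The sketch below uses only the four flags shown above; the helper names `build_train_command` and `run_training` are hypothetical, and it assumes the command must be run from the PyLDA source directory, as described earlier.

```python
import subprocess
import sys


def build_train_command(input_directory, output_directory,
                        number_of_topics, training_iterations):
    """Assemble the launch_train command as an argument list."""
    return [
        sys.executable, "-m", "launch_train",
        f"--input_directory={input_directory}",
        f"--output_directory={output_directory}",
        f"--number_of_topics={number_of_topics}",
        f"--training_iterations={training_iterations}",
    ]


def run_training(source_directory, *args):
    """Run launch_train from the PyLDA source directory; raises on a non-zero exit."""
    command = build_train_command(*args)
    return subprocess.run(command, cwd=source_directory, check=True)


# Example usage mirroring the example dataset command
# (PROJECT_SPACE must be replaced with a real path):
# run_training("PROJECT_SPACE/src/PyLDA", "./associated-press", "./", 10, 100)
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.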

You should be able to find the output in the directory $OUTPUT_DIRECTORY/$CORPUS_NAME.
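To locate the results from a script, the output path can be derived from the two directory arguments. This sketch assumes that $CORPUS_NAME is the last component of the input directory path, which matches the generic command above but is not otherwise confirmed here; the function name is hypothetical.

```python
import os


def expected_output_directory(input_directory, output_directory):
    """Join the output directory with the corpus name taken from the input path."""
    # normpath strips a trailing slash and a leading "./" before taking the basename.
    corpus_name = os.path.basename(os.path.normpath(input_directory))
    return os.path.join(output_directory, corpus_name)


# Example usage with the example dataset arguments:
# expected_output_directory("./associated-press", "./")
```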

At any time, you can display help information and usage hints by running the following command

python -m launch_train --help