The number of topics is given in advance.
Topics: each topic is a distribution over the terms of a fixed vocabulary.
tm: R package for preprocessing the data. Remove stop words (commonly occurring, not useful). SnowballC: R package for stemming.
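
The two preprocessing steps above (stop-word removal, then stemming) can be sketched without any packages. This is a minimal Python illustration, not the tm or SnowballC API: the stop-word list is a hypothetical toy list, and crude_stem is a made-up suffix stripper standing in for a real Snowball stemmer.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for illustration);
# real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "on"}

# Crude suffix stripping as a stand-in for a real Snowball/Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The cats are chasing the dogs in the garden")
print(tokens)  # ['cat', 'chas', 'dog', 'garden']
```

Note that "chas" is not a word: stemmers map related forms to a shared stem, not to a dictionary entry.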
Algorithm: a clustering algorithm.
Self-consistency: each word is assigned a topic based on the topics assigned to the other words.
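
One common way to realize this self-consistency is collapsed Gibbs sampling. The notes do not say which inference method is used, so this is an assumption. In the sketch below, each word's topic is resampled from counts built from all the other words' current topics. The function name gibbs_lda, the hyperparameters alpha and beta, and the toy corpus are all illustrative choices.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Collapsed Gibbs sampling sketch; docs are lists of integer word ids."""
    rng = random.Random(seed)
    # z[d][i] is the topic currently assigned to the i-th word of document d.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this word's own assignment so only the other words count.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight of topic t: how much document d uses t, times how much t uses word w.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word


# Toy corpus (assumption): word ids 0-2 and 3-5 never co-occur in a document.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
z, doc_topic, topic_word = gibbs_lda(docs, num_topics=2, vocab_size=6)
print(doc_topic)
```

The counts are kept in sync with the assignments throughout, so each row of doc_topic always sums to the length of its document.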
Generative model: each document has a probability distribution over topics (the prior on its parameters is Dirichlet). Each word is then drawn independently: first a topic is drawn from the document's topic distribution, then the word is drawn from that topic's distribution over terms.
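
The generative process can be sketched by sampling a document directly. This is a minimal illustration under stated assumptions: the vocabulary, the two topic distributions, and alpha = 0.5 are made-up values, and the Dirichlet draw uses the standard construction of normalizing independent Gamma draws.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def generate_document(rng, topics, vocab, alpha, length):
    """Sample one document: topic proportions first, then each word independently."""
    # Per-document distribution over topics, drawn from a symmetric Dirichlet prior.
    theta = sample_dirichlet(rng, [alpha] * len(topics))
    words = []
    for _ in range(length):
        # Pick a topic for this word, then a word from that topic's term distribution.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        words.append(rng.choices(vocab, weights=topics[k])[0])
    return theta, words


# Hypothetical vocabulary and topics (each topic is a distribution over the vocabulary).
vocab = ["gene", "dna", "cell", "ball", "team", "goal"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology" topic
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # a "sports" topic
]
rng = random.Random(0)
theta, words = generate_document(rng, topics, vocab, alpha=0.5, length=10)
print(theta, words)
```

A small alpha tends to concentrate theta on few topics, so most sampled documents are dominated by one topic.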