Welcome to TPME

Topic models are a suite of probabilistic algorithms that automatically classify a corpus of natural language documents into a set of categories (topics). Each topic is represented by its most relevant words in the vocabulary of the corpus. In addition, each document in the corpus is assigned a mixture of topics that summarizes its content. This application aims to validate topic models through two user tests.

Word intrusion test

This test aims to validate the semantic coherence of the topics created by the algorithm. In the word intrusion task, we show you all the topics generated by the model. Each topic is represented by N words in a horizontal box: (N – 1) of them are the most representative words for that topic, while 1 is an intruder. Your task is to find the intruder word for each topic, that is, the word that seems least semantically coherent with the others in the box.
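For readers curious about how such a box could be assembled, here is a minimal sketch in Python. It is an illustration only, not the application's actual code. The function name build_word_intrusion_item, the input layout (a dict from topic id to words sorted by decreasing relevance), and the rule for picking the intruder (a top word of another topic that does not appear in this topic's word list) are all assumptions.

```python
import random


def build_word_intrusion_item(topics, topic_id, n_words, rng):
    """Return (box, intruder) for one topic.

    topics: dict mapping a topic id to its words, sorted by decreasing
            relevance (hypothetical input layout).
    n_words: N, the number of words shown in the box.
    rng: a random.Random instance, passed in for reproducibility.
    """
    # The (N - 1) most representative words of the topic.
    top_words = topics[topic_id][: n_words - 1]

    # Assumed intruder rule: a top word of some other topic that does
    # not occur anywhere in this topic's own word list.
    own_words = set(topics[topic_id])
    candidates = [
        word
        for other_id, words in topics.items()
        if other_id != topic_id
        for word in words[: n_words - 1]
        if word not in own_words
    ]
    if not candidates:
        raise ValueError("no intruder candidate available for this topic")

    intruder = rng.choice(candidates)

    # Shuffle so that the position in the box does not reveal the intruder.
    box = top_words + [intruder]
    rng.shuffle(box)
    return box, intruder


if __name__ == "__main__":
    example_topics = {
        0: ["gene", "dna", "cell", "protein"],
        1: ["market", "stock", "bank", "trade"],
    }
    box, intruder = build_word_intrusion_item(
        example_topics, topic_id=0, n_words=4, rng=random.Random(0)
    )
    print(box, "intruder:", intruder)
```

Shuffling the box matters in this sketch: if the intruder always appeared last, the answers would say nothing about the coherence of the topic.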

Topic intrusion test

This test aims to check whether a document is well represented by the topics assigned to it. In the topic intrusion task, we show you D documents, randomly sampled from the whole corpus. For each document, you will see its content together with M topics (each represented by its most relevant words): (M – 1) of them are the most relevant topics for that specific document, and 1 is an intruder. Your task is to read the document, grasp its content, and find the intruder topic, that is, the one least related to the document.
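The selection of the M topics shown for a document could be sketched as follows in Python. This is a hedged illustration, not the application's actual implementation. The function name build_topic_intrusion_item, the representation of a document as a list of topic weights, and the choice of the intruder uniformly at random among the remaining topics are assumptions.

```python
import random


def build_topic_intrusion_item(doc_topic_weights, m_topics, rng):
    """Return (shown_topics, intruder) for one document.

    doc_topic_weights: list where position t holds the weight of topic t
                       in the document's mixture (hypothetical layout).
    m_topics: M, the number of topics shown next to the document.
    rng: a random.Random instance, passed in for reproducibility.
    """
    # Rank topic ids from most to least relevant for this document.
    ranked = sorted(
        range(len(doc_topic_weights)),
        key=lambda t: doc_topic_weights[t],
        reverse=True,
    )

    # The (M - 1) most relevant topics, and everything else.
    relevant = ranked[: m_topics - 1]
    remaining = ranked[m_topics - 1:]
    if not remaining:
        raise ValueError("the model has too few topics to pick an intruder")

    # Assumed intruder rule: any topic outside the (M - 1) most relevant.
    intruder = rng.choice(remaining)

    # Shuffle so that the display order does not reveal the intruder.
    shown = relevant + [intruder]
    rng.shuffle(shown)
    return shown, intruder


if __name__ == "__main__":
    weights = [0.05, 0.60, 0.02, 0.30, 0.03]
    shown, intruder = build_topic_intrusion_item(
        weights, m_topics=3, rng=random.Random(1)
    )
    print(shown, "intruder:", intruder)
```

Under these assumptions, the D documents themselves could be drawn with rng.sample over the document ids before calling the function once per sampled document.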

After carefully reading these instructions, please continue to the explanatory examples of the two tasks you are going to perform.