UMKC University Libraries

Introduction to Text Data Mining

This is a beginner's guide to the principles and concepts of text data mining (TDM). TDM is the computational and statistical analysis of large corpora of texts. In this guide you'll find brief descriptions of different types of text mining, some low bar

Text Mining Overview

What is text mining?

Text mining is a research practice that involves using computers to discover information in large amounts of unstructured text.

Unstructured text is data that is not formatted according to a markup or encoding scheme such as HTML or XML.

Examples of unstructured data used for text mining include journal and news articles, blog posts, and email.
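To make the idea of discovering information in unstructured text more concrete, here is a minimal sketch in Python that turns a short piece of free text into a structured table of word counts. The sample sentence, the small stopword list, and the function name term_frequencies are all invented for illustration; real projects typically use much larger stopword lists and more careful tokenization.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}

def term_frequencies(text):
    """Count how often each non-stopword appears in a piece of text."""
    # Lowercase the text and keep only runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

# A made-up email snippet standing in for unstructured text.
email = "The committee approved the budget. The budget covers travel and lodging."
freqs = term_frequencies(email)

# Show the single most frequent remaining word and its count.
print(freqs.most_common(1))
```

Counting words is only a first step, but many text mining methods start from a structured representation like this one.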

Researchers apply text mining techniques such as:

  • sentiment analysis
  • named entity extraction
  • document summarization

By using these methods, researchers can make connections and draw conclusions about the content of large text corpora. 
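As a rough illustration of one of the tasks listed above, the sketch below shows a very simple, lexicon-based form of sentiment analysis in Python: it counts positive and negative words in each document and reports the difference. The word lists, sample documents, and function names are hypothetical and chosen only for this example; research-grade sentiment analysis usually relies on curated lexicons or trained models.

```python
import re
from collections import Counter

# Tiny hand-made word lists, invented for this illustration only.
POSITIVE = {"good", "great", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "terrible", "confusing"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    counts = Counter(tokenize(text))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return pos - neg

# Two made-up documents standing in for a larger corpus.
docs = [
    "The workshop was great and the handouts were helpful.",
    "The interface is confusing and the documentation is poor.",
]
scores = [sentiment_score(d) for d in docs]
print(scores)  # [2, -2]
```

A score above zero suggests mostly positive wording and a score below zero suggests mostly negative wording. This simple approach ignores negation and context, which is one reason more sophisticated methods exist.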

Text Mining Goals

Text mining helps researchers detect patterns and connections in large volumes of textual material.

According to researcher Marti Hearst, "In text mining, the goal is to discover heretofore unknown information, something that no one yet knows and so could not have yet written down." Text mining enables researchers to draw conclusions from volumes of material too large for them to otherwise read, synthesize, and incorporate into their scholarship.

Researchers in fields ranging from biological sciences to the humanities have begun using text mining to detect patterns and discover unknown information. 

Questions? Ask Us!

If you have questions about text mining tools, techniques, or resources, reach out to UMKC Libraries Digital Scholarship Services for a research consultation.
