Research Guides: Digital Scholarship Services: Text Data Mining

Text Mining Overview

What is text mining?

Text mining is a research practice that involves using computers to discover information in large amounts of unstructured text.

Unstructured text is data not formatted according to an encoding structure like HTML or XML.

Examples of unstructured data used for text mining include journal and news articles, blog posts, and email.

Common text mining tasks include:

sentiment analysis
named entity extraction
document summarization

By using these methods, researchers can make connections and draw conclusions about the content of large text corpora.
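To make one of these tasks concrete, here is a minimal sketch of lexicon-based sentiment analysis in Python. It is an illustration only, not a recommended tool: the word lists, the sample documents, and the function names `tokenize` and `sentiment_score` are all hypothetical and chosen for this example. Research-grade sentiment analysis typically relies on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, chosen only for this illustration.
POSITIVE = {"good", "great", "excellent", "helpful", "love"}
NEGATIVE = {"bad", "poor", "terrible", "slow", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return the count of positive words minus the count of negative words."""
    counts = Counter(tokenize(text))
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return positive - negative


# Made-up sample documents standing in for a real corpus.
documents = [
    "The library staff were helpful and the workshop was great.",
    "The download was slow and the interface is terrible.",
]

for doc in documents:
    print(sentiment_score(doc), doc)
```

A score above zero suggests a mostly positive document and a score below zero a mostly negative one. Simple word counting like this ignores negation and context (for example, "not good"), which is one reason researchers often turn to dedicated text mining tools.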

Introduction to Text Data Mining LibGuide
Visit this guide for more information on tools and resources for text data mining.

Questions? Ask Us!

If you have questions about text mining tools, techniques, or resources, reach out to UMKC Libraries Digital Scholarship Services for a research consultation.

Sources for Text Data

Here are some sources where you can find text data. If you need text data from one of the Library's subscription databases, please email umkcdss@umkc.edu and we can help you get started.

Data Is Plural
A curated newsletter of interesting open data sources, including text.
HathiTrust Digital Library
HathiTrust holds more than 18 million digitized books and offers a suite of text mining tools through HathiTrust Research Center Analytics. Log in with your UM System ID.
Project Gutenberg
Project Gutenberg is a library of over 70,000 free eBooks.


Project Gutenberg texts are commonly analyzed with the tidytext R package, typically after being downloaded with the gutenbergr package.
U.S. Government Data Repositories
