Research Guides: Introduction to Text Data Mining

Text Mining Overview

What is text mining?

Text mining is a research practice that involves using computers to discover information in large amounts of unstructured text.

Unstructured text is text that has not been organized or marked up according to an encoding structure such as HTML or XML.

Examples of unstructured data used for text mining include journal and news articles, blog posts, and email.

Researchers use text mining for tasks such as:

sentiment analysis
named entity extraction
document summarization

By using these methods, researchers can make connections and draw conclusions about the content of large text corpora.
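As a rough illustration of one of the tasks listed above, the following Python sketch scores the sentiment of a few short documents by counting words from two small word lists. It is a minimal sketch, not a recommended method: the word lists, the sample sentences, and the function names `tokenize` and `sentiment_score` are all invented for this example, and real projects typically rely on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists used only for illustration.
POSITIVE = {"good", "great", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "confusing", "slow"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    counts = Counter(tokenize(text))
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return positive - negative


# Invented sample documents standing in for a much larger corpus.
corpus = [
    "The new archive search is great and the staff were helpful.",
    "The interface is slow and the documentation is confusing.",
    "The reading room opens at nine.",
]

scores = [sentiment_score(doc) for doc in corpus]
for doc, score in zip(corpus, scores):
    print(score, doc)
```

A score above zero suggests positive wording, a score below zero suggests negative wording, and zero means no listed words were found. The same loop could run over thousands of documents, which is where text mining becomes more practical than reading each item by hand.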

Text Mining Goals

Text mining helps researchers detect patterns and connections in large volumes of textual material.

According to researcher Marti Hearst, "In text mining, the goal is to discover heretofore unknown information, something that no one yet knows and so could not have yet written down." Text mining enables researchers to draw conclusions from large volumes of material they would not be able to otherwise read, synthesize, and incorporate into their scholarship.

Researchers in fields ranging from biological sciences to the humanities have begun using text mining to detect patterns and discover unknown information.

Questions? Ask Us!

If you have questions about text mining tools, techniques, or resources, reach out to UMKC Libraries Digital Scholarship Services for a research consultation.

Highlights from Our Collections

Text Mining for Information Professionals by Manika Lamba; Margam Madhusudhan
ISBN: 9783030850845
Publication Date: 2022-04-22
This book focuses on a basic theoretical framework dealing with the problems, solutions, and applications of text mining and its various facets in a very practical form of case studies, use cases, and stories. The book contains 11 chapters with 14 case studies showing 8 different text mining and visualization approaches, and 17 stories. In addition, both a website and a GitHub account are maintained for the book. They contain the code, data, and notebooks for the case studies; a summary of all the stories shared by the librarians/faculty; and hyperlinks to open an interactive virtual RStudio/Jupyter Notebook environment. The interactive virtual environment runs case studies based on the R programming language for hands-on practice in the cloud without installing any software. From understanding different types and forms of data to case studies showing the application of each text mining approach to data retrieved from various resources, this book is a must-read for all library professionals interested in text mining and its application in libraries. Additionally, this book will be helpful to archivists, digital curators, or any other humanities and social science professionals who want to understand the basic theory behind text data, text mining, and the various tools and techniques available to solve and visualize their research problems.
Macroanalysis by Matthew L. Jockers
Call Number: PN73 .J63 2013
ISBN: 9780252037528
Publication Date: 2013-04-01
In this volume, Matthew L. Jockers introduces readers to large-scale literary computing and the revolutionary potential of macroanalysis--a new approach to the study of the literary record designed for probing the digital-textual world as it exists today, in digital form and in large quantities. Using computational analysis to retrieve key words, phrases, and linguistic patterns across thousands of texts in digital libraries, researchers can draw conclusions based on quantifiable evidence regarding how literary trends are employed over time, across periods, within regions, or within demographic groups, as well as how cultural, historical, and societal linkages may bind individual authors, texts, and genres into an aggregate literary culture. Moving beyond the limitations of literary interpretation based on the "close-reading" of individual works, Jockers describes how this new method of studying large collections of digital material can help us to better understand and contextualize the individual works within those collections.
The Text Mining Handbook by Ronen Feldman; James Sanger
Call Number: QA76.9.D343 F45 2007
ISBN: 9780521836579
Publication Date: 2006-12-11
Text mining tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, this book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches. Finally, it explores current real-world, mission-critical applications of text mining and link detection in such varied fields as M&A business intelligence, genomics research and counter-terrorism activities.
Distant Horizons by Ted Underwood
Call Number: PN73 .U53 2019
ISBN: 9780226612669
Publication Date: 2019-02-14
Just as a traveler crossing a continent won't sense the curvature of the earth, one lifetime of reading can't grasp the largest patterns organizing literary history. This is the guiding premise behind Distant Horizons, which uses the scope of data newly available to us through digital libraries to tackle previously elusive questions about literature. Ted Underwood shows how digital archives and statistical tools, rather than reducing words to numbers (as is often feared), can deepen our understanding of issues that have always been central to humanistic inquiry. Without denying the usefulness of time-honored approaches like close reading, narratology, or genre studies, Underwood argues that we also need to read the larger arcs of literary change that have remained hidden from us by their sheer scale. Using both close and distant reading to trace the differentiation of genres, transformation of gender roles, and surprising persistence of aesthetic judgment, Underwood shows how digital methods can bring into focus the larger landscape of literary history and add to the beauty and complexity we value in literature.
