UMKC University Libraries

Introduction to Text Data Mining

This is a beginner's guide to the principles and concepts of text data mining (TDM). TDM is the computational and statistical analysis of large corpora of texts. In this guide you'll find brief descriptions of different types of text mining, some low bar…

Machine Learning Tools

Topic Modeling
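Topic modeling infers recurring themes (topics) from patterns of word co-occurrence across documents. As an illustration only, the sketch below implements latent Dirichlet allocation (LDA), one widely used topic model, with collapsed Gibbs sampling in plain Python. The function name `lda_gibbs`, its hyperparameter defaults, and the toy corpus are hypothetical. For real corpora, a maintained library (for example gensim or scikit-learn in Python) is the practical choice.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (phi, theta): phi[k] maps each word to P(word | topic k), and
    theta[d][k] is P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics

    # Count tables: n_dk (doc-topic), n_kw (topic-word), n_k (tokens per topic).
    n_dk = [[0] * K for _ in docs]
    n_kw = [defaultdict(int) for _ in range(K)]
    n_k = [0] * K

    # Start by assigning every token to a random topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Resample its topic from the collapsed full conditional:
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final sample.
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab} for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return phi, theta


# Toy corpus (hypothetical): two nautical and two legal "documents".
docs = [
    "whale ship sea captain sea".split(),
    "ship sea voyage whale harpoon".split(),
    "court judge law trial verdict".split(),
    "law judge court appeal trial".split(),
]
phi, theta = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(phi):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

On a corpus this small, a single sample from a short run may not separate the two themes cleanly; real analyses use far more text and usually compare several runs and topic counts.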

Software

Visit the IS Lab Software page to see what software is available in campus labs.

  • R (also available for free online)
  • Python (also available for free online)

Online Tools

Natural Language Processing
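Many NLP workflows begin by splitting text into sentences and word tokens, then filtering out very common words. The sketch below does this with regular expressions from Python's standard library. The stopword list, function names, and sample text are hypothetical, and the rules are deliberately naive: they mishandle abbreviations such as "Dr." and drop digits and non-ASCII letters. Libraries such as NLTK or spaCy provide far more robust tokenizers.

```python
import re

# Hypothetical, deliberately short stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def split_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase and extract alphabetic tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def preprocess(text):
    """Return one list of non-stopword tokens per sentence."""
    return [
        [tok for tok in tokenize(sent) if tok not in STOPWORDS]
        for sent in split_sentences(text)
    ]


def bigrams(tokens):
    """Adjacent token pairs, a common input to later NLP steps."""
    return list(zip(tokens, tokens[1:]))


text = "The ship left the harbor. It was a cold morning! Did the crew sleep?"
sentences = preprocess(text)
print(sentences)
print(bigrams(sentences[0]))
```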

Software

  • Python (also available for free online)

Online Tools

Stylometry

  • stylo (Stylometric Multivariate Analyses): an R package for computational stylistics, including authorship attribution and other multivariate stylometric analyses.
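Many stylometric methods compare texts by the relative frequencies of their most frequent words, which are often function words. One well-known measure is Burrows's Delta: standardize each word's relative frequency across the texts as a z-score, then average the absolute z-score differences between two texts; a smaller Delta suggests a more similar style. The plain-Python sketch below illustrates that idea and is not stylo's implementation. The function names, the 30-word default, and the toy texts are hypothetical.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase and split into simple word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta(texts, n_words=30):
    """Compute Burrows's Delta for every pair of texts.

    texts maps a label to raw text. The features are the n_words most
    frequent words in the combined corpus.
    """
    tokens = {name: tokenize(t) for name, t in texts.items()}
    combined = Counter(tok for toks in tokens.values() for tok in toks)
    features = [w for w, _ in combined.most_common(n_words)]

    # Relative frequency of each feature word in each text.
    rel = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        rel[name] = {w: counts[w] / len(toks) for w in features}

    # Standardize each feature across texts as a z-score.
    # (Population standard deviation here; implementations vary.)
    z = {name: {} for name in texts}
    for w in features:
        values = [rel[name][w] for name in texts]
        mu, sigma = mean(values), pstdev(values)
        for name, v in zip(texts, values):
            z[name][w] = (v - mu) / sigma if sigma > 0 else 0.0

    # Delta = mean absolute difference of z-scores; lower suggests closer style.
    names = list(texts)
    return {
        (a, b): mean(abs(z[a][w] - z[b][w]) for w in features)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


texts = {  # hypothetical toy texts; real analyses need much longer samples
    "A": "the cat sat on the mat and the cat slept by the door",
    "B": "the dog sat on the rug and the dog slept by the fire",
    "C": "of all the ships of the sea none was as swift as this one",
}
for pair, delta in sorted(burrows_delta(texts, n_words=10).items(), key=lambda kv: kv[1]):
    print(pair, round(delta, 3))
```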

Network and Citation Analysis Tools
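In a citation network, each document is a node and each citation is a directed edge from the citing work to the cited work. A simple first measure is how often each work is cited (its in-degree); another is degree centrality, the share of other nodes a node is directly linked to. The sketch below computes both with Python's standard library. The function names and the four papers are hypothetical, and self-citations are assumed absent. Dedicated tools such as Gephi, or the networkx library in Python, offer these and many other network measures.

```python
from collections import defaultdict


def citation_counts(edges):
    """In-degree per node: how many times each work is cited.

    edges is an iterable of (citing, cited) pairs.
    """
    counts = {}
    for citing, cited in edges:
        counts.setdefault(citing, 0)  # works that are never cited still get 0
        counts[cited] = counts.get(cited, 0) + 1
    return counts


def degree_centrality(edges):
    """Degree centrality on the undirected version of the graph:
    number of distinct neighbors divided by (n - 1).
    Assumes no self-citations (a node is never its own neighbor)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical citation data: (citing paper, cited paper).
edges = [
    ("Smith2019", "Jones2010"),
    ("Lee2020", "Jones2010"),
    ("Lee2020", "Smith2019"),
    ("Park2021", "Jones2010"),
]
for paper, count in sorted(citation_counts(edges).items(), key=lambda kv: -kv[1]):
    print(paper, count)
print(degree_centrality(edges))
```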

Software

Visit the IS Lab Software page to see what software is available in campus labs.

  • Gephi (network analysis; also available online)

Online Tools

Qualitative Mark-Up and Annotation

Software

  • ATLAS.ti
  • NVivo

Online Tools

Text Data Visualization Tools

Software

Visit the IS Lab Software page to see what software is available in campus labs.

  • ATLAS.ti
  • NVivo
  • R (also available for free online)
  • Gephi (also available online)

Online Tools

Word Frequency Analysis Tools
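Word frequency analysis counts how often each word occurs, usually after lowercasing and removing stopwords, and often reports each count as a share of all remaining tokens. A minimal standard-library Python sketch of that process follows. The stopword list and the function name are hypothetical, and the tokenizer rule is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is"}


def word_frequencies(text, top_n=5, stopwords=STOPWORDS):
    """Return (word, count, share of remaining tokens) for the top_n words."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stopwords]
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(word, n, n / total) for word, n in counts.most_common(top_n)]


text = "It was the best of times, it was the worst of times, it was the age of wisdom"
for word, n, share in word_frequencies(text):
    print(f"{word:<8} {n:>2} {share:.1%}")
```

On this sentence, "was" comes out on top with 3 of the 9 tokens left after stopword removal.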

Software

Visit the IS Lab Software page to see what software is available in campus labs.

  • ATLAS.ti
  • NVivo
  • R
  • SAS Text Miner
  • Python

Online Tools