Text analysis often relies on machine learning, a branch of computer science that trains computers to recognize patterns. Two kinds of machine learning are used in text analysis: supervised learning, where a human supplies labeled examples to train the pattern-detecting model, and unsupervised learning, where the computer finds patterns in text with little human intervention. Naive Bayes classification is an example of supervised learning. See Natural Language Processing and Topic Modeling for examples of unsupervised machine learning.
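To make the supervised case concrete, here is a minimal sketch of Naive Bayes classification using only the Python standard library. The training sentences, the labels ("travel", "cooking"), and the function names are all invented for illustration; a real project would use a much larger labeled corpus and most likely an established library.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; real projects use fuller tokenizers.
    return text.lower().split()

def train(labeled_docs):
    """Count words per label from (text, label) pairs supplied by a human."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Prior: how common the label is in the training data.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, hand-labeled training data (invented for illustration).
training = [
    ("the ship sailed across the sea", "travel"),
    ("we sailed to the harbor at dawn", "travel"),
    ("the recipe needs flour and butter", "cooking"),
    ("bake the bread with flour and yeast", "cooking"),
]
model = train(training)
print(classify("flour and butter for the bread", model))
```

The human contribution is the labeled training list: the model only learns the categories that someone has already assigned.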
Example Project Using Classification (Supervised Machine Learning):
- Horton, R., Morrissey, R., Olsen, M., Roe, G., & Voyer, R. (2009). Mining Eighteenth Century Ontologies: Machine Learning and Knowledge Classification in the Encyclopédie. Digital Humanities Quarterly, 3(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/3/2/000044/000044.html.
Topic Modeling
Topic modeling, a form of machine learning, is a way of identifying patterns and themes in a body of text. It relies on statistical algorithms such as Latent Dirichlet Allocation (LDA), which group words into "topics" based on which words frequently co-occur in the same documents.
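As a rough illustration of how co-occurrence drives topic assignment, the sketch below implements a toy collapsed Gibbs sampler for Latent Dirichlet Allocation using only the Python standard library. The function name `lda_gibbs`, the parameter values (`alpha`, `beta`, `iterations`), and the four tiny documents are assumptions made for this example; production work would normally rely on an established topic modeling package and far more text.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Start by assigning every word occurrence to a random topic.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [Counter(z) for z in assignments]        # topic counts per doc
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics
    for doc, z in zip(docs, assignments):
        for w, t in zip(doc, z):
            topic_word[t][w] += 1
            topic_total[t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assignments[d][i]
                # Remove the current assignment before resampling it.
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Weight each topic by how strongly this document and
                # this word are already associated with it.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                t = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return topic_word

# Hypothetical pre-tokenized documents (invented for illustration).
docs = [
    "ship sea harbor sail ship sea".split(),
    "sail harbor sea ship sail".split(),
    "flour bread yeast bake flour".split(),
    "bake bread butter flour yeast".split(),
]
topics = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topics):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print(f"topic {k}: {top_words}")
```

No labels are supplied here, which is what makes the method unsupervised. The sampler is random, so the topics it finds can vary with the seed and the data, and interpreting what each topic "means" is still left to the researcher.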
Example Project Using Topic Modeling:
- Mendenhall, R., Brown, N., Black, M., Van Moer, M., Lourentzou, I., Flynn, K., McKee, M., Zerai, A. (2016). Rescuing lost history: Using big data to recover black women's lived experiences. In Proceedings of XSEDE 2016: Diversity, Big Data, and Science at Scale (Vol. 17-21-July-2016). https://doi.org/10.1145/2949550.2949642.
Natural Language Processing
Natural language processing, a field that often draws on machine learning, is the attempt to use computational methods to extract meaning from free text. Among other things, natural language processing algorithms can identify names of people and places, dates, sentiment, and parts of speech.
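The sketch below gives a feel for two of these tasks, date extraction and sentiment scoring, using only the Python standard library. It is a deliberately simplified, rule-based stand-in: the date pattern covers a single format, the word lists are a hypothetical mini-lexicon invented for this example, and real natural language processing toolkits typically use trained statistical models rather than hand-written rules like these.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches dates written like "July 21, 2016" (this one format only).
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")

# Hypothetical mini-lexicon, invented for illustration.
POSITIVE = {"good", "great", "joyful", "hopeful"}
NEGATIVE = {"bad", "sad", "terrible", "bleak"}

def extract_dates(text):
    """Return every substring that matches the date pattern."""
    return DATE_PATTERN.findall(text)

def sentiment_score(text):
    """Count positive words minus negative words (above zero leans positive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

sample = ("The letter dated July 21, 2016 was hopeful, "
          "though the winter had been bleak and sad.")
print(extract_dates(sample))    # ['July 21, 2016']
print(sentiment_score(sample))  # -1
```

Rules like these break down quickly on real text (other date formats, negation, sarcasm), which is one reason the field leans on machine learning.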
Example Project Using Natural Language Processing: