Data Mining

What is text and data mining?

Text and data mining (TDM) is the computational analysis of vast quantities of digital information, whether free-form natural language text or structured data. 

Using specialized software, researchers can extract data, identify trends, look for patterns and better understand the relationships of terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for key words, common phrases and other patterns. 
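The kinds of analysis listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the method used by any particular TDM tool. The function names and the sample sentence are invented for this example, and the tokenizer is deliberately simple (lowercase letters and apostrophes only).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, top_n=5):
    """Return the top_n most frequent words with their counts."""
    return Counter(tokens).most_common(top_n)


def adjacent_pairs(tokens, top_n=5):
    """Return the top_n most frequent pairs of neighboring words."""
    return Counter(zip(tokens, tokens[1:])).most_common(top_n)


def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of keyword with up to `window` words on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Made-up sample text, used only for illustration.
sample = "The governess taught the children. The children ignored the governess."
tokens = tokenize(sample)

print(word_frequencies(tokens, top_n=3))
print(adjacent_pairs(tokens, top_n=3))
print(keyword_in_context(tokens, "taught", window=2))
```

A real project would usually also remove very common words ("the", "and") before counting and would work on a full corpus rather than a single sentence.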

Materials to be analyzed range from websites (such as publicly available Facebook posts) and 16th-century manuscripts to DNA sequences and old newspapers.

This is a graphic analysis, constructed using Voyant, of the frequency of terms in the novel Agnes Grey, by Anne Brontë.

Policies for Mining Licensed Content

If you wish to undertake a text or data mining project with content from the Libraries’ licensed databases, please contact a librarian to investigate options, which may include negotiating with the vendor or purchasing access to the data. Although many database licenses prohibit text and data mining and the use of software such as scripts, agents, or robots, it is possible to actively negotiate text mining rights with database vendors. Unauthorized text or data mining in violation of our licenses can result in loss of access for the entire NCU community.

Please also see our Best Practice Tips for mining licensed databases.
