Text is everywhere in research, from interview transcripts and historical documents to clinical notes and scientific publications. Whether you’re cleaning up survey responses or building generative models, Research Computing supports text analysis workflows across disciplines.
What We Support
We offer guidance at every level of the text analysis pipeline:
Text Cleaning & Preprocessing: Turning unstructured or messy text into structured data for analysis
Named Entity Recognition (NER) & Keyword Extraction: Tagging people, places, concepts, or chemicals in your corpus
Topic Modeling & Clustering: Discovering patterns in large text collections with LDA, NMF, or BERTopic
Text Visualization: Tools like word clouds, concordance plots, topic projections, and term frequency charts
Text Classification: Supervised workflows for coding documents, sentiment analysis, and category prediction
Embeddings & Representation: Word2Vec, BERT, and other vector-based approaches for semantic similarity and clustering
Generative AI & LLMs: Using large language models, such as those behind ChatGPT, for summarization, rephrasing, translation, or question answering in domain-specific contexts
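To give a sense of what the earliest stage of this pipeline can look like, here is a minimal Python sketch of text cleaning followed by a term frequency count, using only the standard library. It is an illustration under stated assumptions, not a description of any particular workflow we run: the stopword list, the sample responses, and the function names `clean_text` and `term_frequencies` are all hypothetical, and a real project would likely use a fuller stopword list and a dedicated tokenizer.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it"}


def clean_text(raw: str) -> list[str]:
    """Normalize a raw string and return lowercase word tokens without stopwords."""
    # Normalize Unicode compatibility forms (e.g. full-width letters, ligatures).
    text = unicodedata.normalize("NFKC", raw)
    # Replace anything that looks like an HTML tag with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    text = text.lower()
    # Keep runs of ASCII letters, allowing one internal straight apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(docs: list[str]) -> Counter:
    """Count how often each cleaned token appears across all documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(clean_text(doc))
    return counts


# Made-up survey responses, standing in for a real corpus.
responses = [
    "The clinic was <b>helpful</b>, and the staff was kind.",
    "Staff were KIND; the wait was long.",
]
freqs = term_frequencies(responses)
print(freqs.most_common(3))
```

The resulting counts are the kind of structured data that term frequency charts or word clouds are built from. The ASCII-only token pattern is a simplifying assumption and would need adjusting for non-English text.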
Who We Work With
Our text analysis clients span the humanities, social sciences, and biomedical research:
History, Literature & Digital Humanities: Thematic analysis, archival text mining, authorship attribution, and document exploration
Sociology & Political Science: Analyzing open-ended survey responses, policy documents, interviews, and speeches
Clinical & Biomedical Sciences: De-identifying and analyzing clinical notes, and extracting information from medical records