cmu.textstat

The cmu.textstat package is for use in the 36-468/668 course (Special Topics in Statistics & Data Science) at Carnegie Mellon University.

Installing cmu.textstat

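Use devtools to install the package from GitHub. If devtools is not yet installed, install it from CRAN first.

install.packages("devtools")
devtools::install_github("browndw/cmu.textstat")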

Running cmu.textstat

The package itself serves primarily as a wrapper for four other packages, which are installed automatically when you install cmu.textstat:

mda.biber
ngramr.plus
quanteda.extras
vnc

The documentation also describes pseudobibeR, which must be installed separately. It is included here because it offers one way to generate input data for the mda.biber functions.

When you load the cmu.textstat library, those four packages are attached as well, giving you access to all of their functions.

library(cmu.textstat)
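To confirm that the four packages listed above were attached, you can inspect R's search path. This minimal sketch uses only base R (search(), paste0(), setNames()) and assumes library(cmu.textstat) has already been run. The variable names wrapped and attached are illustrative. Each entry is TRUE if that package is on the search path.

```r
# The four packages that cmu.textstat is described as attaching
wrapped <- c("mda.biber", "ngramr.plus", "quanteda.extras", "vnc")

# search() lists attached packages as "package:<name>";
# TRUE means the package is currently on the search path
attached <- setNames(paste0("package:", wrapped) %in% search(), wrapped)
print(attached)
```

A FALSE entry usually means the dependency failed to install or attach, so reinstalling cmu.textstat is a reasonable first step.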

The main functions in the packages associated with cmu.textstat are described in the table below. The functions are designed to facilitate the analysis of textual data, assisting in the exploration of questions related to language variation and change, language use, and language structure.

Many (though not all) of the functions were written to support the procedures described by Brezina in the required textbook, and they replicate many of his web-based statistics tools in an R environment.

Many of the functions are designed to be used at the end of a processing pipeline. For our class, we will largely rely on tidyverse packages and quanteda for pre-processing, corpus creation, and tokenization.
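The upstream part of such a pipeline might look like the following minimal sketch, which uses only standard quanteda functions (corpus(), tokens(), tokens_tolower(), dfm(), topfeatures()). It assumes quanteda is installed and uses two made-up toy documents. The output is a document-feature matrix, the kind of object that downstream functions typically consume. This example does not call any cmu.textstat function.

```r
library(quanteda)

# Two toy documents (hypothetical example data); names become document names
texts <- c(doc1 = "The cat sat on the mat.",
           doc2 = "The dog chased the cat.")

# Build a corpus, tokenize with punctuation removed, and lowercase the tokens
corp <- corpus(texts)
toks <- tokens(corp, remove_punct = TRUE)
toks <- tokens_tolower(toks)

# Document-feature matrix: one row per document, one column per word type
d <- dfm(toks)

# Most frequent features across the whole corpus
topfeatures(d, 3)
```

In practice, the corpus would come from real text files, often read and cleaned with tidyverse tools before being passed to corpus().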

Functions

cmu.textstat functions
- from_play

Data

cmu.textstat data

Packages

Labs

Labs Overview
- Labs
