Analyzing Textual Information
SAGE Publications, Inc.
Listing of Website Resources
We have used two copyrighted databases in this book: one derived from the Congressional Globe for the 39th Congress, and the other derived from several volumes of the Territorial Papers of the United States. These databases are not in the public domain. We have copyrighted them because we spent many years building them into a usable form for this book and other projects.
You may access the 39th Congress files, combine39.txt and PrelimData.RData, from our website only for use with this book. You may not transfer, give, sell, or publish this database, or parts of this database, in any form. If you wish to use the database for publication, or for any other use, please seek written permission from Lea VanderVelde, Director of the RAOS project, at Lfirstname.lastname@example.org. The Territorial Papers database (datacomb.csv) is also under copyright and is not currently available for use.
- Chapter 1: Introduction
- Chapter 2: A Description of the Studied Text Corpora and a Discussion of Our Modeling Strategy
- Chapter 3: Preparing Text for Analysis: Text Cleaning and Formatting
- Chapter 4: Word Distributions: Document-Term Matrices of Word Frequencies and the “Bag of Words” Representation
- Chapter 5: Metavariables and the Text Analysis Stratified on Metavariables
- Chapter 6: Sentiment Analysis
- Chapter 7: Clustering of Documents
- Chapter 8: Classification of Documents
- Chapter 9: Modeling Text Data: Topic Models
- Chapter 10: n-Grams and Other Ways of Analyzing Adjacent Words
- Chapter 11: Concluding Remarks