I ran these textual analyses on my complete, scrubbed copy text. To scrub the text, I needed a stoplist. I started with a basic English stoplist from the Journal of Machine Learning Research, then ran a simple analysis of my entire text in Textalyser, which reports statistics such as word frequency. I added the high-frequency Scots words to the English stoplist, creating a custom list of words to ignore during analysis.
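The stoplist-building step can be sketched in Python. This is only an illustration, not the Textalyser workflow itself: the function name `stoplist_candidates`, the two-word base list, and the sample sentence are all made up for the example, and the final choice of which words to add was a manual judgment.

```python
import re
from collections import Counter

def stoplist_candidates(text, base_stoplist, top_n=10):
    """Return the top_n most frequent words not already in base_stoplist."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in base_stoplist)
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical stand-ins: a real base stoplist and corpus are far larger.
base_stoplist = {"the", "and"}
sample = "sae sweet the sang, sae bonny the wood, and sae wild the winter"

# List frequent words the English stoplist does not already cover.
candidates = stoplist_candidates(sample, base_stoplist, top_n=1)

# Review the candidates by hand, then merge the chosen ones into a custom list.
custom_stoplist = base_stoplist | set(candidates)
```

Reviewing the candidates before merging them matters, because a frequent word is not necessarily a meaningless one.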
I then scrubbed the text using Lexos. The scrub settings removed all punctuation (retaining hyphens and possessive apostrophes), removed digits, and converted the text to lowercase. They also applied the stoplist to cut out many common words. Here are the first lines of the text, unscrubbed and then scrubbed:
Unscrubbed:
THE AMBITIOUS MITE.
A FABLE.
WHEN hope persuades, and fame inspires us, And pride with warm ambition fires us, Let Reason instant sieze the bridle, And wrest us frae the Passions’ guidal; Else, like the Hero of our fable, We’ll aft be plung’d into a habble.

Scrubbed:
ambitious mite
fable hope persuades fame inspires pride warm ambition fires reason instant sieze bridle wrest frae passions guidal hero fable aft plungd habble
Although a scrubbed text is unsuitable for some purposes, such as word-frequency analysis of the complete text, it is much easier to work with in most visualization tools. Many of these tools include their own stoplists or text scrubbers, but scrubbing the text yourself lets you keep the input consistent across different tools.
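For readers who want to scrub a text themselves, here is a minimal Python sketch of the settings described above. It is not Lexos's implementation: the `scrub` function and the four-word stoplist are assumptions for illustration, and the rule for "possessive apostrophes" (keep a straight apostrophe only when it is followed by a word-final "s") is one guess at how such a setting might behave.

```python
import re

def scrub(text, stoplist):
    """Lowercase, strip digits and punctuation, then drop stoplist words."""
    text = text.lower()
    text = re.sub(r"\d", "", text)            # remove digits
    text = re.sub(r"[^\w\s'-]|_", "", text)   # keep letters, spaces, hyphens, straight apostrophes
    text = re.sub(r"'(?!s\b)", "", text)      # keep only possessive 's apostrophes
    return " ".join(w for w in text.split() if w not in stoplist)

# Tiny stand-in stoplist; the custom list used for the project is much longer.
stoplist = {"when", "and", "us", "with"}
line = "WHEN hope persuades, and fame inspires us, And pride with warm ambition fires us,"
scrubbed = scrub(line, stoplist)
print(scrubbed)  # hope persuades fame inspires pride warm ambition fires
```

Running the same function over the text before loading it into each tool gives every tool identical input, whatever its built-in scrubber would have done.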
Analyses and Visualizations
Voyant Tools: access the entire corpus, split into 97 separate poems and songs (uncleaned): http://voyant-tools.org/?corpus=1428033701363.9614
Voyant Tools - Cirrus Word Cloud Generator
Voyant Tools - Word Frequency
Voyant Tools - Cirrus Summary
TagCrowd Word Visualization
ane (20) auld (27) bard (21) bonny (31) burns (24) coggie (19) dark (22) dear (34) een (18) eer (19) eye (22) fame (26) friend (31) green (20) hae (23) heart (49) hes (24) ill (33) joy (32) lang (19) life (33) lines (22) love (31) man (48) mcnl (22) mind (21) muse (20) nature (36) neer (19) night (24) poor (43) song (25) soul (29) speak (19) sweet (26) tear (28) tho (31) thought (19) thro (21) till (33) tis (28) true (19) twas (24) warm (19) wha (28) wild (19) winter (18) wood (27) worth (37) yon (32) created at TagCrowd.com
Textexture Network Visualization - Scrubbed Text
Nodes (Words): 100, Edges (Co-Occurrences): 892.
Most influential keywords in this text: oer dear nature man
Most influential contexts in this text:
#0: oer frae sweet bonny
#1: dear worth bard burn
#2: nature man poor winter
#3: life dark lay lie
http://textexture.com/index.php?text_id=25223
Textexture Network Visualization - Unscrubbed Text
Nodes (Words): 100, Edges (Co-Occurrences): 919.
Most influential keywords in this text: o thy ye sae
Most influential contexts in this text:
#0: o ye winter wood
#1: thy thou oer social
#2: sae sweet bonny frae
#3: joy tear nature wild
http://textexture.com/index.php?text_id=25225
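To make the node and edge counts above more concrete, here is a rough Python sketch of how a word co-occurrence network can be counted. It is not Textexture's algorithm, whose windowing and weighting are not described here: the `cooccurrence_edges` function and its `window` parameter are assumptions, and the sample is just the opening words of the scrubbed text.

```python
from collections import Counter

def cooccurrence_edges(words, window=2):
    """Count unordered pairs of distinct words that fall inside the same window.

    A window of 2 links each word only to the word that follows it.
    """
    edges = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                edges[tuple(sorted((word, other)))] += 1
    return edges

words = "ambitious mite fable hope persuades fame".split()
edges = cooccurrence_edges(words, window=2)

# Nodes are the distinct words that take part in at least one edge.
nodes = {word for pair in edges for word in pair}
print(len(nodes), len(edges))  # 6 5
```

Because stopwords sit between content words, removing them changes which words end up adjacent, which is one reason the scrubbed and unscrubbed networks differ.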
Tools Used
Future Use: