Word Frequency Analysis

I haven’t blogged for a while – I guess I’ve not really found anything worthwhile to say.

I’m currently preparing the third draft of the sequel to The Glittering Cage, and going through various stages of angst about form and structure. In particular, I’m obsessing over my over-use of favourite words and those little beggars that seem to insinuate themselves into the text without you noticing: adverbs. So I wanted to find out just how often I use certain words.

You may be familiar with Wordle, or have at least seen the strange graphics popping up here and there on the web. If you’re not, go and have a look… http://www.wordle.net/

I forced all 130k words of the novel down its throat, and it didn’t even hiccup. The visual map it produces shows which words I used the most, emphasising the most frequent by increasing their size. By default, it removes common English words such as ‘the’, ‘and’ and ‘to’. There is a setting to include them if you want.
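Wordle does all of this for you, but the basic idea is simple enough to sketch. Below is a minimal Python version, assuming plain-text input. The short stop-word list is my own illustrative stand-in, not Wordle’s actual list, and the function name `word_frequencies` is hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list only; Wordle's real list is much longer
# and is not reproduced here.
STOP_WORDS = {"the", "and", "to", "a", "of", "in", "she", "he", "it", "was"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words in text, skipping any that are in stop_words."""
    # Runs of letters, allowing internal straight or curly apostrophes,
    # so "didn't" and "didn’t" each count as one word.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(w for w in words if w not in stop_words)

sample = "She looked away. Her eyes met his eyes, and she looked away again."
for word, count in word_frequencies(sample).most_common(3):
    print(word, count)
```

The bigger the count, the bigger the word in the cloud; `most_common` simply gives you the top of that list as text instead of a picture.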

Book 2 Wordle

This is fascinating stuff. As I’d expected, my main character Samanta is by far the largest word. What’s a little worrying is how prominent words such as eyes, away, like, head and hand are. This suggests a penchant for conveying emotion and interaction through eye and head movements, and probably for using too many similes (hence the prominence of the word ‘like’).

For confirmation, I did the same thing for the text of The Glittering Cage. Yep, I’m big on eyes in that one too, as well as back, head and like. You could say this is stylistic but, less kindly and perhaps more honestly, it’s lazy and unimaginative. But now that I know about it, I can probably address it.
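One caveat when comparing two books: raw counts from manuscripts of different lengths aren’t directly comparable, so it helps to express each word as a rate, say per 1,000 words. Here’s a minimal Python sketch; the two sample sentences are hypothetical stand-ins for the manuscripts, and `rate_per_thousand` is my own name for the helper.

```python
import re
from collections import Counter

def rate_per_thousand(text, targets):
    """Return how often each target word occurs per 1,000 words of text."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    counts = Counter(words)
    total = len(words)
    # Guard against an empty text to avoid dividing by zero.
    return {w: (1000 * counts[w] / total if total else 0.0) for w in targets}

# Hypothetical stand-ins for the two manuscripts.
book1 = "His eyes narrowed. She turned her head and looked back."
book2 = "Her eyes widened like a cat's. She shook her head."
targets = ["eyes", "head", "like", "back"]

r1 = rate_per_thousand(book1, targets)
r2 = rate_per_thousand(book2, targets)
for w in targets:
    print(f"{w:>6}: book 1 {r1[w]:6.1f}  book 2 {r2[w]:6.1f}")
```

Rates like these let a 130k-word draft sit side by side with a shorter or longer book without the longer one looking worse just because it has more words.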

Book 1 Wordle

Even more interesting was a feature in Wordle I’d not noticed before. At the top of the word cloud there are some drop-down menus. Under Language, right at the bottom, is a function called Show Word Counts. This gives a table of every word used and the exact number of instances. Use Ctrl+A to select all, then copy and paste it into Excel. Then, using sorting and filtering, you can really get to know how you use the language – and possibly spot where you can improve your use of it.
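If you’d rather skip the copy-and-paste step, the same sort-and-filter idea can be sketched in Python. This assumes you already have word counts in a `Counter`; the ‘ends in -ly’ filter is my own crude stand-in for finding adverbs (it catches ‘slowly’ but also ‘family’, and misses ‘soon’), and the function names and output file name are hypothetical.

```python
import csv
import re
from collections import Counter

def ly_words(counts, min_count=1):
    """Words ending in 'ly', most frequent first: a rough adverb filter."""
    rows = [(w, c) for w, c in counts.items() if w.endswith("ly") and c >= min_count]
    # Sort by count (descending), then alphabetically to break ties.
    return sorted(rows, key=lambda row: (-row[1], row[0]))

def save_csv(rows, path):
    """Write (word, count) rows to a CSV file that Excel can open."""
    # The utf-8-sig encoding adds a byte-order mark, which helps Excel
    # recognise the file as UTF-8.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(rows)

text = "She slowly turned. He quickly, quietly left. She slowly smiled at her family."
counts = Counter(re.findall(r"[a-z]+", text.lower()))
rows = ly_words(counts)
print(rows)
save_csv(rows, "ly_words.csv")
```

Opening `ly_words.csv` in Excel gives you the same kind of table as Show Word Counts, already narrowed down to the suspects.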

But wait, there’s more!!! (Hmm, I used 62 exclamation marks in the book – which is 62 too many)

On the Wordle forum pages I found a link to a tool (http://holme.se/stem/) that lets you look at stem words and block others from the analysis. By removing character names, you can then see in great detail the way you choose words, which has got to be a useful insight. You can even feed those filtered results back into a Wordle word map.
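I don’t know what stemming method that tool uses, so the sketch below is not its algorithm. It’s a toy suffix-stripper (a real project would use an established stemmer such as Porter’s) combined with a block list for character names, just to show the idea of counting ‘look’, ‘looked’ and ‘looking’ together. The suffix list and function names are my own assumptions, and it will mangle some words: ‘this’, for example, becomes ‘thi’.

```python
import re
from collections import Counter

# Hypothetical suffix list, checked in this order; only one suffix is stripped.
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def crude_stem(word):
    """Strip one common suffix: a toy stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three letters, so short words like "is" or "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_counts(text, blocked=()):
    """Count stems, skipping any word in blocked (e.g. character names)."""
    blocked = {b.lower() for b in blocked}
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(w) for w in words if w not in blocked)

text = "Samanta looked up. Looking away, Samanta looks at her hands."
print(stem_counts(text, blocked={"Samanta"}).most_common(3))
```

With the name blocked, the three forms of ‘look’ collapse into one entry, which is exactly the kind of habit that’s easy to miss when each form only appears a handful of times.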