Principia Discordia

Principia Discordia => Techmology and Scientism => Topic started by: Rococo Modem Basilisk on March 26, 2010, 08:18:54 PM

Title: ATTN 000: information theory nerdspaggery
Post by: Rococo Modem Basilisk on March 26, 2010, 08:18:54 PM
I wrote a short Python script that builds a first-order Markov model of a document (tokenizing on whitespace), and for each token pair in the model computes i = log2((a_n * b_{n-1}) / (a_{n-1} * b_n)) to measure the change in information from one token to the next.

I ran it on the first 4539 words in the Phrack archives, and used Google Docs to graph it:
(http://namcub.accela-labs.com/pics/information_content_per_word_in_phrack-first_4500_words.png)

Edit: Whoops! I forgot to mention: I model the Markov chain in terms of ratios a:b, where a is the frequency of a given pair and b is the frequency of all pairs with the same first token. a_n is the pair's frequency after the current token has been added, and a_{n-1} is its frequency before it has been added.
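For anyone who wants to play along before the script goes up, here's a minimal sketch of the idea as described above: split on whitespace, keep running counts for each pair (a) and for each first token (b), and score each pair with log2((a_n * b_{n-1}) / (a_{n-1} * b_n)). One assumption on my part: counts start at 1 rather than 0 so the log is defined the first time a pair appears — the actual script may handle that differently.

```python
import math
from collections import defaultdict

def information_per_token(text):
    """Score each consecutive whitespace-token pair with
    i = log2((a_n * b_{n-1}) / (a_{n-1} * b_n)), where a counts this
    pair and b counts all pairs sharing the same first token;
    _{n-1} is before the pair is added to the model, _n after.
    """
    tokens = text.split()
    # Counts start at 1 (add-one smoothing) to avoid log(0) on a
    # pair's first occurrence -- an assumption, not necessarily
    # what the original markinf.py does.
    pair_count = defaultdict(lambda: 1)
    first_count = defaultdict(lambda: 1)
    scores = []
    for prev, cur in zip(tokens, tokens[1:]):
        a_before = pair_count[(prev, cur)]
        b_before = first_count[prev]
        pair_count[(prev, cur)] += 1   # update the model with this pair
        first_count[prev] += 1
        a_after = pair_count[(prev, cur)]
        b_after = first_count[prev]
        scores.append(math.log2((a_after * b_before) / (a_before * b_after)))
    return scores
```

On "a b a c" the first pair scores 0 (pair and context counts move in lockstep), while "a c" scores log2(4/3), since "a" has already been seen as a first token but the pair is new.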

I can post the code up in a bit.
Edit: here it is (http://namcub.accela-labs.com/stories/markinf.py)
Title: Re: ATTN 000: information theory nerdspaggery
Post by: Triple Zero on March 26, 2010, 10:42:41 PM
I saw you post that image on twitter, wondered what it was about.

I have alcohol in my head right now, I will check this out later.