Supercomputer predicts revolution

Started by Cain, September 11, 2011, 03:58:46 PM


Cain

Fo' realz

Quote
Feeding a supercomputer with news stories could help predict major world events, according to US research.

A study, based on millions of articles, charted deteriorating national sentiment ahead of the recent revolutions in Libya and Egypt.

While the analysis was carried out retrospectively, scientists say the same processes could be used to anticipate upcoming conflict.

The system also picked up early clues about Osama Bin Laden's location.

Kalev Leetaru, from the University of Illinois' Institute for Computing in the Humanities, Arts and Social Science, presented his findings in the journal First Monday.
Mood and location

The study's information was taken from a range of sources including the US government-run Open Source Centre and BBC Monitoring, both of which monitor local media output around the world.

News outlets which published online versions were also analysed, as was the New York Times' archive, going back to 1945.

In total, Mr Leetaru gathered more than 100 million articles.

Reports were analysed for two main types of information: mood - whether the article represented good news or bad news, and location - where events were happening and the location of other participants in the story.

http://www.bbc.co.uk/news/technology-14841018

Unsurprisingly, this device has already outperformed US intelligence agencies.

rong

"a real smart feller, he felt smart"

Salty

Yes but was it fed gossip rags?
That seems to be a major influence on people's view of the world and I wonder how such a machine would react to it.
The world is a car and you're the crash test dummy.

Triple Zero

from what I read about it (same article, btw):

the revolutions were picked up by peaks in sentiment analysis (basically counting +1 for positive emotional words and -1 for negative) on articles for certain topics.
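To illustrate, that counting scheme amounts to something like this sketch (the tiny word lists here are made-up stand-ins, not whatever lexicon the paper actually used):

```python
# Naive lexicon-based sentiment scoring: +1 per positive word, -1 per negative.
# These tiny word lists are illustrative placeholders, not the paper's lexicon.
POSITIVE = {"good", "peace", "growth", "hope", "stable"}
NEGATIVE = {"bad", "war", "crisis", "riot", "unrest"}

def sentiment_score(text):
    """Sum +1 for each positive word and -1 for each negative word."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in text.lower().split())

print(sentiment_score("hope for peace and growth"))    # 3
print(sentiment_score("riot and unrest amid crisis"))  # -3
```

Real sentiment analysis at least handles negation and word weighting; this is the bare +1/-1 version described in the article.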

the location of Bin Laden was sorta/kinda estimated by tracking the locations mentioned alongside his name, or something similarly unclear. The scientific article explains their method in more detail, but I'm skeptical.

it's a lot of very nice data analysis based on huge quantities of news though:
http://contentanalysis.ichass.illinois.edu/Culturomics20/nyt-movie-1000x1000.gif

but really the Arab Spring and Bin Laden capture are the only two news events they mention.

so yeah, it's kind of Law of Fives / hindsight bias.

I bet they researched a couple more media events that didn't show significant correlation.

Or even if they didn't, notice how one event was found through a peak in a (rather noisy) graph of sentiment analysis, which, btw, is quite tweakable in the keywords it looks for. Not to mention I wonder how much research has been done on non-English sentiment analysis.

While the second one was found through an entirely different method, namely geospatial drawing of curves of beautifully anti-aliased lines. A technique known for centuries in the science of drawing astrological birth-charts.

:)

I'm going to believe it when they actually make predictions for the future, so it can be tested with the scientific method.

OK, let's say it worked for the Arab Spring and actually holds predictive value. The most this graph peak could have told them is "something's going to happen soon in Libya or Egypt [or any of the other ME countries where they found this sentiment peak]" -- not very informative, since "something" happens there every other week or so.
And I really doubt they could drill down into these graphs to get any clearer view of what this "something" could be beforehand, because the graphs are noisy enough when you ask them broad questions; they're not going to get more accurate if you dive into specifics.

One other thing that shows they haven't actually found very much: if their news analysis could make very accurate predictions about the future, we wouldn't be looking at noisy graphs with thick blue lines, where you sort of have to guess which peak is the significant "indicator peak" and whether it's part of an upward or downward trend, just as you cannot with stock market values.

Instead, they'd have gotten a result that said "if sentiment indicator X goes over threshold value Y, an important event is about to happen", with a big red arrow on the timeline wherever this happens and the event it turned out to predict. Then, given this formula, they'd take a database of important geopolitical events that they do NOT get to cherry-pick, and they could claim "in 77% of the events we got a big red arrow shortly before; additionally, 44% of our big red arrows came when nothing happened". Which brings me to another thing that makes me a skeptic: no mention of false positives.
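A sketch of what that evaluation would look like (the threshold, horizon and data here are all invented for illustration):

```python
# Flag a "big red arrow" whenever the indicator crosses a threshold, then score
# the flags against event dates we do NOT get to cherry-pick.
# All numbers here are toy data for illustration.

def flag_alerts(series, threshold):
    """Time steps at which the indicator exceeds the threshold."""
    return [t for t, value in enumerate(series) if value > threshold]

def score_alerts(alerts, events, horizon):
    """(true positives, false positives): events preceded by an alert within
    `horizon` steps, and alerts not followed by any event within `horizon`."""
    true_pos = sum(any(0 <= e - a <= horizon for a in alerts) for e in events)
    false_pos = sum(not any(0 <= e - a <= horizon for e in events) for a in alerts)
    return true_pos, false_pos

series = [0.1, 0.2, 0.9, 0.3, 0.1, 0.8, 0.2, 0.1]
events = [3, 7]                               # when real events happened
alerts = flag_alerts(series, threshold=0.7)   # [2, 5]
print(score_alerts(alerts, events, horizon=1))  # (1, 1): one hit, one false alarm
```

Both numbers matter: a high true-positive rate is meaningless without knowing the false-alarm rate.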

BTW is it me or do those blue lines look a lot like Terence McKenna's "Timewave Zero" plots, based on the I-Ching and also supposed to predict, well, the 2012 singularity, among other things:
http://en.wikipedia.org/wiki/File:Timewave_9_11_2001.png :lol:

And for finding Bin Laden, if they had any useful result, they could have said "we loaded all the geospatial news data into the algorithm and processed it in a complicated manner" and the output result should have been a sort of heatmap with colours indicating different levels of probability of being Bin Laden's location, not an incomprehensible cluster of curved lines from which you can divine patterns.

I want to know why they are lines and not dots btw, doesn't make sense.

Eh, just as I feared:
Quote
Each city or other geographic landmark (such as islands, oceans, mountains, rivers, etc) is color-coded on a 400-point scale from bright green (high positivity) to bright red (high negativity), based on the average tone of all articles mentioning that city in 2005. Each article mentioning two or more cities together results in a link being drawn between those cities, and the average tone of all articles mentioning both cities is used to color-code that link on the same color scale as the cities.

yeah, do that with large datasets and you'll get useless scratchy drawings like those, especially because an article mentioning both New York and Moscow draws a line all the way across Europe, even though the article in question probably doesn't contain any information relevant to Europe.
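For what it's worth, the quoted scheme boils down to averaging tone per city and per co-mentioned pair, something like this (toy data, not theirs):

```python
# Sketch of the quoted scheme: average article tone per city, and per pair of
# co-mentioned cities (each pair becomes a drawn link). Toy data, not theirs.
from collections import defaultdict
from itertools import combinations

articles = [  # (tone, cities mentioned)
    (0.5, ["New York", "Moscow"]),
    (-0.3, ["Moscow"]),
    (0.1, ["New York", "Moscow", "Cairo"]),
]

city_tones = defaultdict(list)
link_tones = defaultdict(list)
for tone, cities in articles:
    for city in cities:
        city_tones[city].append(tone)
    for pair in combinations(sorted(cities), 2):
        link_tones[pair].append(tone)

def avg(xs):
    return sum(xs) / len(xs)

print({city: round(avg(tones), 2) for city, tones in city_tones.items()})
print({pair: round(avg(tones), 2) for pair, tones in link_tones.items()})
```

Note how every multi-city article smears its tone across all its city pairs, which is exactly why the lines carry so little information about the places they cross.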

additionally, if your dataset grows large enough, the rough shape of the sketchy blob of lines is going to approximate the convex hull of your location points.
there are some tricks to fix that; among other things, you can let the proximity of locations be determined by the amount of correlation rather than by their geographical position, which gives you a sort of "news map" of world locations. There are a few graphs in the paper that do this, though they appear to be cited from other publications; that's the direction I'd expect more useful results from.


sorry for the rambling tone of this post, it's a bit of a braindump.
Ex-Soviet Bloc Sexual Attack Swede of Tomorrow™
e-prime disclaimer: let it seem fairly unclear I understand the apparent subjectivity of the above statements. maybe.

INFORMATION SO POWERFUL, YOU ACTUALLY NEED LESS.

rong

Quote
sorry for the rambling tone of this post, it's a bit of a braindump

if i had a super computer, i'd develop an algorithm to divine the meaning out of your post.  :wink:



edit:  incidentally, a coworker (who pronounces it "nucular") claimed that, a few weeks before the tragic event happened 10 years ago, he had a feeling that something very bad was going to happen soon.  he also told me an ex-girlfriend called him up that day to tell him he was right.  i find it harder to believe he actually had an ex-girlfriend.
"a real smart feller, he felt smart"

Iason Ouabache

 :news:

PATTERN SEEKING MACHINE FIND SIGNAL IN NOISE! FILM AT 11!!!
   \
:nigel:

Please to note that the super computer is still relying on humans to produce the noise. I'd like some rigorous tests to see if it can make predictions that are better than humans and random chance.
You cannot fathom the immensity of the fuck i do not give.
    \
┌( ಠ_ಠ)┘┌( ಠ_ಠ)┘┌( ಠ_ಠ)┘┌( ಠ_ಠ)┘

Faust

I'd like to see how many other predictions it has made and how many of them are wrong before I suspend my disbelief.
Sleepless nights at the chateau

The Johnny

<<My image in some places, is of a monster of some kind who wants to pull a string and manipulate people. Nothing could be further from the truth. People are manipulated; I just want them to be manipulated more effectively.>>

-B.F. Skinner

Triple Zero

Joh'Nyx's answer is shorter, but yeah:

Quote from: Faust on September 12, 2011, 11:02:42 AM
I'd like to see how many other predictions it has made and how many of them are wrong before I suspend my disbelief.

The answer is zero. They only found patterns in the data they claim might have predictive value.

They didn't actually predict anything.

The proper way to go about this, as per standard Machine Learning procedure, would be to divide your dataset into a training set and a test set, figure out the predictive algorithm (aka training) using only the training set, and then measure its accuracy on the test set.

That way you get a complete picture of your False Positives, False Negatives, True Positives and True Negatives.
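In sketch form (the "model" here is a trivial hand-picked threshold on toy data, standing in for a real trained predictor):

```python
# Minimal train/test evaluation: hold out part of the data, fit on the rest,
# then tally all four cells of the confusion matrix on the held-out part.
# The "model" is a hand-picked threshold on toy data, standing in for training.

def confusion(y_true, y_pred):
    """(TP, FP, FN, TN) over paired boolean labels and predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

# toy dataset: (sentiment value, did an event actually follow?)
data = [(0.9, True), (0.8, True), (0.2, False), (0.7, False),
        (0.1, False), (0.95, True), (0.3, False), (0.6, True)]
train, test = data[:4], data[4:]   # train on the first half, test on the second

threshold = 0.75                   # "learned" from the training half (by hand)

y_true = [label for _, label in test]
y_pred = [value > threshold for value, _ in test]
print(confusion(y_true, y_pred))   # (1, 0, 1, 2)
```

The point is that the test set gives you all four cells, not just the flattering true-positive one.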

The scientific article linked in the BBC article really only mentions True Positives, which is quite meaningless, and in that aspect, quite similar to divination systems such as astrology, I-Ching and Tarot.

Now I'll be the first to admit that Tarot and related systems can have the advantage of making you look at the problems from novel and different angles and may make you learn something you hadn't thought of before.

Additionally this particular system has the property that the input data actually is somewhat vaguely related to the problem domain (most divination systems have randomness as their data source). Unfortunately this is both an advantage as well as a dangerous disadvantage:
- The advantage is of course that, however unlikely, there might be some clues in the general "sentiment" of global news media to predict future political events.
- The dangerous disadvantage is the Law of Fives, to paraphrase: "You'll find the correlation between global news sentiment and future events to become more and more manifest the harder you look."[5] You're going to see peaks and dips in your pretty sentiment graphs, and because you know your data must somehow be correlated with the events, you're all the more likely to go looking hard for coincidences.
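That hindsight-bias point is easy to demonstrate with pure noise: pick arbitrary "event dates" in a random series and nearly every one has a local peak shortly before it. A quick sketch:

```python
# Hindsight-bias demo: in pure random noise, nearly every arbitrarily chosen
# "event date" has a local peak somewhere in the window just before it.
import random

random.seed(42)
series = [random.random() for _ in range(1000)]

def peak_before(series, event, window=10):
    """Is there a local maximum within `window` steps before `event`?"""
    return any(series[t - 1] < series[t] > series[t + 1]
               for t in range(max(1, event - window), event))

events = random.sample(range(20, 990), 50)   # 50 arbitrary "event dates"
hits = sum(peak_before(series, e) for e in events)
print(f"{hits}/50 events had a 'predictive' peak just before them")
```

With a ten-step window, almost all of the fifty random "events" get a "predictive" peak, despite the data containing no information at all.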

And finally, you'd want to throw some decent, proven-effective Machine Learning algorithms at this problem, not just basic Computational Linguistics and manual data inspection. Well, maybe do that first to get a feel for what you're looking at, but if you want to find serious, non-obvious and measurably verifiable correlations, get an ML algorithm to do it. For some reason I'm thinking of Restricted Boltzmann Machines / Deep Learning, because they're very good at unsupervised learning on unlabeled or partially labeled data, and quite good at unaided feature discovery. For example, at the lower levels an RBM could probably discover positive and negative sentiment on its own (as opposed to it being hardcoded, like in this research) -- at least, as long as positive/negative sentiment analysis is actually relevant, something this research has also only given a hand-wavy, intuitive justification for.

Example of what RBMs can do, Google TechTalks:
http://www.youtube.com/watch?v=AyzOUbkUf3M&feature=player_detailpage#t=1892s
(Short explanation: at about 31m30s in, he starts explaining how RBMs can work with documents. It starts out sorta technical, but keep listening: at 33:20 he shows documents classified using Latent Semantic Analysis and PCA, which is a bit messy, and at 33:30 he shows what an RBM does with the same data, separating the documents really nicely. After that he goes on to explain many other awesome things that aren't really relevant here, but it's pretty cool stuff and he's a damn good speaker, so you might enjoy it.)
Ex-Soviet Bloc Sexual Attack Swede of Tomorrow™
e-prime disclaimer: let it seem fairly unclear I understand the apparent subjectivity of the above statements. maybe.

INFORMATION SO POWERFUL, YOU ACTUALLY NEED LESS.

The Johnny


The problem "How can we predict an event based on discourse?" branches out into a series of implicit suppositions:

1. Discourse reflects inner thoughts and sentiment
2. Discourse transforms into action
3. The discourse that is analyzed reflects the general populations thoughts/sentiments


1. is a given that I don't have a problem with when the subject has no reason to mislead: but is there a reason to mislead, or are there special interests, in news articles?

2. This is true in some cases, but it's limited by all the macro-level interactions such as economics, military, group cohesion, etcetera. So no, discourse does not always transform into action, and in the cases where it does, there is no sure way of telling what type of action it will be. For example, given a kleptocracy ruling a country, does the population rise up and riot? Or does it cater to the kleptocracy in hopes of gaining benefits?

3. Discourse in the news is the voice of the elites and is usually slanted in opinion; in many cases it ignores important news deemed unfavourable by its overlords.

So, no, the chosen method does not work for this problem. What COULD work is semi-structured interviews revolving around loyalty, desire for change and willingness to change, paired with an analysis of the cultural, economic and military tendencies of a given population.
<<My image in some places, is of a monster of some kind who wants to pull a string and manipulate people. Nothing could be further from the truth. People are manipulated; I just want them to be manipulated more effectively.>>

-B.F. Skinner