
Michael Hastings

Started by Cain, June 20, 2013, 12:36:08 AM


Mesozoic Mister Nigel

You negative nellies just need to turn those frowns upside down and look at the bright side! Things aren't so bad, it's all about your attitude.
"I'm guessing it was January 2007, a meeting in Bethesda, we got a bag of bees and just started smashing them on the desk," Charles Wick said. "It was very complicated."


The Good Reverend Roger

Quote from: FOCUS GROUP RAGEMONKEY OF HATE HATE HATE on August 23, 2013, 11:03:56 PM
You negative nellies just need to turn those frowns upside down and look at the bright side! Things aren't so bad, it's all about your attitude.

I'm in danger of having a thought, here.
" It's just that Depeche Mode were a bunch of optimistic loveburgers."
- TGRR, shaming himself forever, 7/8/2017

"Billy, when I say that ethics is our number one priority and safety is also our number one priority, you should take that to mean exactly what I said. Also quality. That's our number one priority as well. Don't look at me that way, you're in the corporate world now and this is how it works."
- TGRR, raising the bar at work.

Mesozoic Mister Nigel

Quote from: The Good Reverend Roger on August 23, 2013, 11:39:44 PM
Quote from: FOCUS GROUP RAGEMONKEY OF HATE HATE HATE on August 23, 2013, 11:03:56 PM
You negative nellies just need to turn those frowns upside down and look at the bright side! Things aren't so bad, it's all about your attitude.

I'm in danger of having a thought, here.

Hey hey hey now big guy, slow down! That kind of stuff is for policy-makers, not regular folks like you.
"I'm guessing it was January 2007, a meeting in Bethesda, we got a bag of bees and just started smashing them on the desk," Charles Wick said. "It was very complicated."


The Good Reverend Roger

Quote from: FOCUS GROUP RAGEMONKEY OF HATE HATE HATE on August 23, 2013, 11:47:07 PM
Quote from: The Good Reverend Roger on August 23, 2013, 11:39:44 PM
Quote from: FOCUS GROUP RAGEMONKEY OF HATE HATE HATE on August 23, 2013, 11:03:56 PM
You negative nellies just need to turn those frowns upside down and look at the bright side! Things aren't so bad, it's all about your attitude.

I'm in danger of having a thought, here.

Hey hey hey now big guy, slow down! That kind of stuff is for policy-makers, not regular folks like you.

:lulz:
" It's just that Depeche Mode were a bunch of optimistic loveburgers."
- TGRR, shaming himself forever, 7/8/2017

"Billy, when I say that ethics is our number one priority and safety is also our number one priority, you should take that to mean exactly what I said. Also quality. That's our number one priority as well. Don't look at me that way, you're in the corporate world now and this is how it works."
- TGRR, raising the bar at work.

The Johnny

Quote from: Triple Zero on August 12, 2013, 06:47:14 PM
BTW (somewhat related to my braindump in the Surveillance thread), "Latent Semantic Indexing" is a Natural Language Processing / Machine Learning algorithm that can do "fuzzy" text matching according to semantic content. Meaning it doesn't require sets of specific keywords to group texts with similar topics, or to calculate a "semantic distance" between two texts.

There's no real parsing or linguistic "understanding" involved, it's mainly a statistical technique that correlates groups of words and phrases used in similar contexts between different texts. But neither the words nor the contexts need to be identical in a strict word-for-word sense in order to get a (partial) match.

Quote from: en.wikipedia.org/wiki/Latent_semantic_indexing
Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.[1]

LSI is also an application of correspondence analysis, a multivariate statistical technique developed by Jean-Paul Benzécri[2] in the early 1970s, to a contingency table built from word counts in documents.

Called Latent Semantic Indexing because of its ability to correlate semantically related terms that are latent in a collection of text, it was first applied to text at Bell Laboratories in the late 1980s. The method, also called latent semantic analysis (LSA), uncovers the underlying latent semantic structure in the usage of words in a body of text and how it can be used to extract the meaning of the text in response to user queries, commonly referred to as concept searches. Queries, or concept searches, against a set of documents that have undergone LSI will return results that are conceptually similar in meaning to the search criteria even if the results don't share a specific word or words with the search criteria.

(...) LSI is also used to perform automated document categorization.

(...) Dynamic clustering based on the conceptual content of documents can also be accomplished using LSI. Clustering is a way to group documents based on their conceptual similarity to each other without using example documents (this is called "unsupervised learning", btw - 000) to establish the conceptual basis for each cluster. This is very useful when dealing with an unknown collection of unstructured text.

Because it uses a strictly mathematical approach, LSI is inherently independent of language. This enables LSI to elicit the semantic content of information written in any language without requiring the use of auxiliary structures, such as dictionaries and thesauri. LSI can also perform cross-linguistic concept searching and example-based categorization. For example, queries can be made in one language, such as English, and conceptually similar results will be returned even if they are composed of an entirely different language or of multiple languages.

(...) LSI automatically adapts to new and changing terminology, and has been shown to be very tolerant of noise (i.e., misspelled words, typographical errors, unreadable characters, etc.).[9] This is especially important for applications using text derived from Optical Character Recognition (OCR) and speech-to-text conversion. LSI also deals effectively with sparse, ambiguous, and contradictory data.

Text does not need to be in sentence form for LSI to be effective. It can work with lists, free-form notes, email, Web-based content, etc. As long as a collection of text contains multiple terms, LSI can be used to identify patterns in the relationships between the important terms and concepts contained in the text.

(full WP article)

It's a really cool (elegant / relatively simple) algorithm, btw.
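A minimal sketch of the idea in Python, assuming scikit-learn (just one convenient way to build an LSI pipeline; the toy documents and the query are made up for illustration): build a TF-IDF term-document matrix, truncate it with SVD, and do the "concept search" as cosine similarity in the reduced space.

# LSI sketch: TF-IDF term-document matrix -> truncated SVD -> cosine
# similarity in the reduced "concept" space.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The agency collects phone metadata and call records in bulk.",
    "Intelligence agencies gather phone records for surveillance programs.",
    "My cat knocked the coffee cup off the kitchen table.",
    "Bulk surveillance programs intercept call metadata.",
]

# Rows = documents, columns = weighted term counts.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Truncated SVD keeps only the strongest latent "concepts" (2 here), which is
# what lets texts match even when they share few or no literal terms.
svd = TruncatedSVD(n_components=2, random_state=0)
doc_concepts = svd.fit_transform(tfidf)

# Concept search: project the query into the same latent space and rank
# documents by cosine similarity there.
query = "phone records surveillance"
query_concepts = svd.transform(vectorizer.transform([query]))
scores = cosine_similarity(query_concepts, doc_concepts)[0]

for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:+.3f}  {doc}")

On a toy corpus like this the effect is small, but the documents about surveillance tend to land near each other (and near the query) in the reduced space, while the unrelated one doesn't; that's the "fuzzy" matching described above.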

So yeah you can probably guess how this sort of technology would be very useful to an organisation that has the need for automatic classification and relevance filtering / selection of huge amounts of textual data. I couldn't say whether it would be feasible to apply it to all data, or whether its computational complexity restricts it to use only on certain groups of targets and/or people on certain "lists".

Additionally, new developments in a different technique, Restricted Boltzmann Machines / Deep Learning Networks, are said to yield even better results for unsupervised learning and Semantic Indexing of Big Data. Geoffrey Hinton is the big name in this field; he works for Google now. His talks are quite enjoyable to watch, IMHO. What I further understand about RBMs is that, because of their simple structure, they can be implemented in FPGAs and specialized computation hardware to increase performance. On the other hand, research in this field has only produced big results in the last few years, so governments are probably not using it just yet.
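For a sense of what the RBM building block looks like in code, here's a minimal single-layer sketch, assuming scikit-learn's BernoulliRBM on binary bag-of-words vectors. This is not the stacked, deep setup Hinton talks about, just the basic unsupervised layer; the toy documents and parameters are invented for illustration.

# Single-layer RBM sketch: binary word-presence vectors in, latent codes out.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import BernoulliRBM

docs = [
    "leaked documents describe a bulk collection program",
    "the court order covers phone metadata collection",
    "grilled cheese sandwich recipe with tomato soup",
    "slow cooker recipe for tomato and basil soup",
]

# BernoulliRBM expects features in [0, 1], so use binary word presence.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(docs).toarray()

# Contrastive-divergence training; the hidden units model which words tend
# to co-occur across documents.
rbm = BernoulliRBM(n_components=4, learning_rate=0.05, n_iter=200, random_state=0)
rbm.fit(X)

# transform() gives the hidden-unit activation probabilities: an unsupervised
# latent code for each document.
for doc, code in zip(docs, rbm.transform(X)):
    print([round(float(p), 2) for p in code], doc)

A deep network in the sense Hinton describes stacks several of these layers, each one trained on the codes produced by the layer below it.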

That's the type of heuristics I was talking about in regards to qualitative analysis... it's so efficient that it looks for synonyms or even themes to build the analytical categories, and arranges them accordingly, so no amount of noise is going to block that out; it simply means that in your given case there will be a greater number of categories.
<<My image in some places, is of a monster of some kind who wants to pull a string and manipulate people. Nothing could be further from the truth. People are manipulated; I just want them to be manipulated more effectively.>>

-B.F. Skinner