Messages - Triple Zero

#76
Aneristic Illusions / Re: Michael Hastings
August 12, 2013, 06:47:14 PM
BTW (somewhat related to my braindump in the Surveillance thread), "Latent Semantic Indexing" is a Natural Language Processing / Machine Learning algorithm that can do "fuzzy" text matching according to semantic content. Meaning it doesn't require sets of specific keywords to group texts with similar topics, or to calculate a "semantic distance" between two texts.

There's no real parsing or linguistic "understanding" involved; it's mainly a statistical technique that correlates groups of words and phrases used in similar contexts across different texts. But neither the words nor the contexts need to be identical in a strict word-for-word sense in order to get a (partial) match.

Quote from: en.wikipedia.org/wiki/Latent_semantic_indexing
Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.[1]

LSI is also an application of correspondence analysis, a multivariate statistical technique developed by Jean-Paul Benzécri[2] in the early 1970s, to a contingency table built from word counts in documents.

Called Latent Semantic Indexing because of its ability to correlate semantically related terms that are latent in a collection of text, it was first applied to text at Bell Laboratories in the late 1980s. The method, also called latent semantic analysis (LSA), uncovers the underlying latent semantic structure in the usage of words in a body of text and how it can be used to extract the meaning of the text in response to user queries, commonly referred to as concept searches. Queries, or concept searches, against a set of documents that have undergone LSI will return results that are conceptually similar in meaning to the search criteria even if the results don't share a specific word or words with the search criteria.

(...) LSI is also used to perform automated document categorization.

(...) Dynamic clustering based on the conceptual content of documents can also be accomplished using LSI. Clustering is a way to group documents based on their conceptual similarity to each other without using example documents (this is called "unsupervised learning", btw - 000) to establish the conceptual basis for each cluster. This is very useful when dealing with an unknown collection of unstructured text.

Because it uses a strictly mathematical approach, LSI is inherently independent of language. This enables LSI to elicit the semantic content of information written in any language without requiring the use of auxiliary structures, such as dictionaries and thesauri. LSI can also perform cross-linguistic concept searching and example-based categorization. For example, queries can be made in one language, such as English, and conceptually similar results will be returned even if they are composed of an entirely different language or of multiple languages.

(...) LSI automatically adapts to new and changing terminology, and has been shown to be very tolerant of noise (i.e., misspelled words, typographical errors, unreadable characters, etc.).[9] This is especially important for applications using text derived from Optical Character Recognition (OCR) and speech-to-text conversion. LSI also deals effectively with sparse, ambiguous, and contradictory data.

Text does not need to be in sentence form for LSI to be effective. It can work with lists, free-form notes, email, Web-based content, etc. As long as a collection of text contains multiple terms, LSI can be used to identify patterns in the relationships between the important terms and concepts contained in the text.

(full WP article)

It's a really cool (elegant / relatively simple) algorithm, btw.

So yeah, you can probably guess how this sort of technology would be very useful to an organisation that needs automatic classification and relevance filtering / selection of huge amounts of textual data. I couldn't say whether it would be feasible to apply it to all data, or whether its computational complexity restricts its use to certain groups of targets and/or people on certain "lists".
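
To make that a bit more concrete, here's a minimal sketch of the LSI idea in Python. The library choice (scikit-learn) and the toy documents are mine, not anything from the article: build a TF-IDF term-document matrix, reduce it with truncated SVD, and compare documents by cosine similarity in the reduced "concept" space.

    # Minimal LSI sketch: TF-IDF matrix -> truncated SVD -> cosine similarity
    # in the reduced "concept" space. Toy documents, tiny rank, purely illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "The cat sat on the mat and purred.",
        "A kitten was sleeping on the rug.",
        "Stock prices fell sharply on Wall Street today.",
        "Markets dropped as investors sold shares.",
    ]

    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(docs)          # term-document matrix (docs x terms)

    svd = TruncatedSVD(n_components=2)     # the SVD step that makes it "latent"
    concepts = svd.fit_transform(X)        # documents projected into concept space

    # Documents about similar topics end up close together even when they share
    # few or no literal words ("cat"/"kitten", "stock"/"markets").
    print(cosine_similarity(concepts))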

Additionally, new developments in a different technique, Restricted Boltzmann Machines / Deep Learning Networks, are said to yield even better results for unsupervised learning and Semantic Indexing of Big Data. Geoffrey Hinton is the big name in this field; he works for Google now. His talks are quite enjoyable to watch, IMHO. What I further understand about RBMs is that because of their simple structure they can be implemented in FPGAs and specialized computation hardware to increase performance. On the other hand, research in this field has only started producing big results in the last few years, so governments are probably not using it just yet.
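
For a feel of how simple the core computation is (which is also why it maps so nicely onto FPGAs and other specialized hardware), here's a toy numpy sketch of one contrastive-divergence (CD-1) training step for a binary RBM. The sizes, learning rate and data are all made up for illustration; a real deep network stacks several of these and trains on actual data.

    # Toy binary RBM with one step of contrastive divergence (CD-1), numpy only.
    # Everything here (sizes, learning rate, data) is made up for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n_visible, n_hidden, lr = 6, 3, 0.1

    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))  # weights
    b = np.zeros(n_visible)                              # visible biases
    c = np.zeros(n_hidden)                               # hidden biases

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0):
        """One CD-1 step on a single binary visible vector v0."""
        global W, b, c
        # Up: sample hidden units given the data
        p_h0 = sigmoid(v0 @ W + c)
        h0 = (rng.random(n_hidden) < p_h0).astype(float)
        # Down: reconstruct the visible units
        p_v1 = sigmoid(h0 @ W.T + b)
        v1 = (rng.random(n_visible) < p_v1).astype(float)
        # Up again: hidden probabilities for the reconstruction
        p_h1 = sigmoid(v1 @ W + c)
        # Contrastive divergence: data correlations minus model correlations
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b += lr * (v0 - v1)
        c += lr * (p_h0 - p_h1)

    v = np.array([1, 0, 1, 1, 0, 0], dtype=float)
    for _ in range(100):
        cd1_update(v)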
#77
Quote from: YOUR Social Science Thinkmonkey on August 12, 2013, 03:54:52 AM
I AM THE PERSON WHO GIVES PEOPLE JACKETS. IF YOU WANT TO BE WARM AND COMFORTABLE YOU MUST PLEASE ME.

FOR SOME REASON I AM LESS POPULAR DURING THE SUMMER.

I still have your T-shirt, does that count too?

(you weren't kidding about them being high quality shirts, I've been wearing the shit out of it over the years, lately mostly to sleep in, and it's still strong as fuck, great shirt :) )
#78
What about Bobby Tables?

On the one hand they say it's a lesson in not accepting unsanitized user-input, but on the other hand they made their kid's name a WEAPONIZED CYBER ATTACK ... :)

In a world dominated by the von Neumann computing paradigm, a world where there is no inherent distinction between CODE and DATA, can you really make a distinction between Freedom of Speech (DATA) and "freedom of action*" (CODE EXECUTION)? ;-)

* there is probably a proper legal term for this; I'm referring to the fact that while you should be able to SAY anything you want, you cannot just DO anything you want, there are rules against certain acts
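
For the record, the standard fix for little Bobby Tables is to pass user input as bound parameters instead of pasting it into the SQL string, which is exactly the point of keeping DATA from being executed as CODE. A tiny sqlite3 sketch of my own (table and column names made up):

    # The Bobby Tables lesson: never build SQL by string concatenation.
    # Minimal sqlite3 sketch; table/column names are made up for illustration.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")

    name = "Robert'); DROP TABLE students;--"   # xkcd 327's little Bobby Tables

    # BAD: the input becomes part of the SQL itself (code and data mixed):
    #   conn.execute("INSERT INTO students (name) VALUES ('%s')" % name)

    # GOOD: the input stays data, the driver handles quoting
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

    print(conn.execute("SELECT name FROM students").fetchall())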
#79
Quote from: YOUR Social Science Thinkmonkey on June 13, 2013, 01:48:08 AM
Yeah, he's talking about the Harvard implicit association tests I linked to in another thread; very worth checking out, if only for some self-insight.

https://implicit.harvard.edu/implicit/demo/

I've never seen anyone take one and not be mad afterwards.

I'm biased against white people and prefer light-skinned brown people over dark-skinned brown people, FWIW. Almost all of us carry prejudices we aren't conscious of, it's what we do with them when we know about them that's important.

Wow, very cool test! I took the dark/light skinned test, since that's what this thread is about.

Apparently it suggests I have "a slight automatic preference for Light Skin compared to Dark Skin".

It's weird, because initially/intuitively I felt like the order of the images influenced the outcome, but the more I think back about what really happened, the more I realize it did indeed feel easier to correctly classify the words/faces when positive words were under the same key as light-skinned faces. The only other factor I can think of is that the second time around I was a bit more practiced at the task, and I never feel comfortable doing these kinds of timed tests. But then, at the end they said something about correcting for such biases.

I'm fairly sure that short drops in my focus/attention caused most of the actual errors (got ADD, after all). But if I understand correctly, it's not so much about the errors you make as about the timing, and I'm pretty sure it felt like I could do the second test much faster.

So yeah I guess it really is easier for me to classify positive words if they're under the same key as light skinned people, than vice versa.

Though the test said it was only "slight". I'm not really sure what to make of this outcome, I'll have to give it some time and see what my brain makes of it :) I guess I'll end up trying to slightly adjust for this bias or something (but only "slight" since that's what the test said :P). And only in the positive direction of course. I usually don't have to make many snap judgements this quickly, though.

Very cool test, anyway. I might take a few of the others as well, when I get the time.
#81
Yes. It'd have to be some sort of redundant peer-to-peer communications protocol using fountain codes, with the fountains being firehoses, because CDNs tend to drop anything that looks like DDoS traffic, but once you get around that, you'd totally have them by the nads.

Sorry for answering that seriously, my head is still in braindump mode :)

I shall make it good with a pun: Two IP packets walk into a Tier-1 Switch. They start drinking like madmen, ordering beer-to-beer, because they didn't have much TTL.
#82
Quote from: Carlos Danger on August 08, 2013, 06:50:47 PM


Adam Curtis, the BBC documentary maker, is having fun.

Just got this link via IRC: http://www.bbc.co.uk/blogs/adamcurtis/posts/BUGGER -- didn't watch all the embedded video clips, but it's a pretty good read.

Additionally, about the idea of flooding Them with noise / bad information / etc: in theory it might work, but Cain and V3X already explained pretty well that there are a few catches. The main problem seems to be that we don't quite know what exactly they are looking for, and what techniques they already use to filter out the (ubiquitous) noise in their data.

Additionally, not all of the data is textual. There's been this talk about "just metadata". That is: location data, duration/timing data, IP/MAC addresses, cellphone tower data, electricity/water/power usage, traffic cameras, anything, and most importantly data about network nodes. That is, any kind of "social" network or anything that bears a vague resemblance to one -- not what you write on FB, email or chat, but the graphs made up of the nodes you interact with, your buddy lists, your contact lists, and those of the nodes around you. And more.

This kind of data contains huge amounts of information, especially if you feed it to a Machine Learning algorithm that eats Big Data for lunch.

It also has the nice quality of consisting mostly of numbers and other easily machine-readable datatypes -- unlike email/chat text logs, which need keyword matching or natural language parsing before they become useful. Even if the keywords are provided by a human agent looking for something in particular, that still means large amounts of text data cannot be mined for their true information potential. Metadata, however, paints a more complete picture the more of it they collect.

It's also hard to fake. Humans have a hard time visualizing the "shape" of your local social network to, say, 3 degrees of separation. That's probably a few thousand people, after all. To a machine, that's peanuts. The result? You probably have no idea how highly specific and unique the shape of your local social graph is. Say you are absent from one particular social network, or you have taken pains to keep separate identities for work / discordians / family. That's great for keeping private eyes and crazy exes off your trail, but with the NSA's resources it's a simple pattern matching task. Even without any names they will locate the "hole" in the graph of the social network you are not participating in, and they will easily find the connections in your life you tried to keep compartmentalised. All they need to look at are the connections of your friends, friends-of-friends and friends-of-friends-of-friends, and then fit those very, very specific and unique structures -- just a bunch of nodes connected by lines. It doesn't quite matter whether it's your FB friends, cellphone contacts, email contacts or IM chat buddies: pattern matching will find the same loci even if they aren't exactly the same. I'm convinced the structures they form are more uniquely identifying than DNA plus fingerprints.
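
To illustrate what I mean by fitting structures, here's a toy of my own making (networkx is my choice of library, and real de-anonymisation work uses far richer graph features than this): reduce each person's neighbourhood out to 3 hops to a purely structural fingerprint, and match those fingerprints across two networks that share no names at all.

    # Crude illustration of "your local graph shape is an identifier".
    # Library choice and toy data are mine; this only shows the idea.
    import networkx as nx

    def fingerprint(G, node, radius=3):
        """Structural fingerprint of a node: sorted degree sequence of its
        ego graph out to `radius` hops. No names involved, only shape."""
        ego = nx.ego_graph(G, node, radius=radius)
        return tuple(sorted(d for _, d in ego.degree()))

    # Two "different networks" (say phone contacts vs. email contacts) that
    # happen to contain the same underlying people under different labels.
    phone = nx.erdos_renyi_graph(60, 0.08, seed=1)
    email = nx.relabel_nodes(phone, {n: f"user{n}" for n in phone.nodes()})

    # The anonymous node 7 in one network matches "user7" in the other,
    # because their 3-hop neighbourhoods have the same shape.
    print(fingerprint(phone, 7) == fingerprint(email, "user7"))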

About the previous paragraph, a slight disclaimer: this is a hunch. I haven't read any scientific articles about doing exactly the above, but I've read a couple that did rather similar things (there was one where they guessed a person's sexual orientation to a high degree of accuracy using the shape of their extended social graph). And I believe I have a fairly good idea of the information-theoretical requirements that need to be -- and are -- present to make this possible. The website http://33bits.org got its name from the FACT that you only need 33 bits of statistically independent data, 33 bits of entropy, to uniquely identify any individual in the world. Here "entropy" and "bit" are used in the information-theoretical sense: each bit has to be a 50/50 coinflip that is entirely independent of the bits you've already gathered, otherwise it counts as less than a whole bit (for instance, knowing someone is straight, going with the oft-quoted statistic that this is the case for 90% of people, yields you about 0.15 bits, whereas knowing the converse would yield about 3.3 bits). Anyway, social graphs contain lots and lots of bits (even given that your local graph structure is not quite statistically independent from that of your friends); the trick is to extract them into a useful machine-searchable and machine-learnable form.
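
Those bit counts in the parenthesis are just the surprisal -log2(p) of the observation; a quick sanity check using the 90%/10% split quoted above:

    # Information content ("surprisal") of an observation with probability p,
    # in bits: -log2(p). Using the 90%/10% figure quoted above.
    from math import log2

    def bits(p):
        return -log2(p)

    print(bits(0.90))   # ~0.15 bits: learning someone is straight
    print(bits(0.10))   # ~3.32 bits: learning the converse
    print(bits(0.50))   # exactly 1 bit: a fair coinflip
    # 33 independent fair coinflips: 2**33 ~ 8.6 billion, more than the world population
    print(2 ** 33)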

I'm just not sure where to even start spoofing or misleading these types of datamining. You could have a group of people make 5 FB accounts and friend each other in meticulously planned ways, but you'd never be sure you'd escape actual fingerprinting.

Then there is another problem with the way they use the text data. This is about what happens IF you ever become a "person of interest", or maybe one of your friends does: they can search the backlog of information. And you can spoof and write disinfo all you want, but it'd take a huge amount of effort to sufficiently "noise up" all of it, for any particular slice of time they decide to look into, including the stuff they already gathered before you even knew of the NSA leaks.

Now, they said that they weren't able to store all that data for longer than 24 hours. But for one, that slide is from 2008, five years ago. Also, that doesn't prevent them at all from storing *some* of the data on everyone -- anything that might fit into a "personal report/file". I imagine that if you filter out all the obvious crud (the cat pictures, the retweets, the spam, the newsletters, repeated quotes in a discussion), most people's textual communications wouldn't fill more than a few megabytes per year (for comparison: The Hitchhiker's Guide to the Galaxy parts 1-6 are just 1.5MB in uncompressed ASCII, and Lord of the Rings is about 2-3x that).

Then there's voice. Say the average person speaks 20 minutes on the phone/Skype per day; the GSM standard can squeeze intelligible voice into as little as 6.5 kbit/s, so that's only about 360MB per person on a yearly basis. Times the US population, that's roughly 114 petabytes. The Utah Data Centre is estimated to have capacity for between 3,000 and 12,000 petabytes. Whether they have the computing capacity to make this data full-text searchable (fuzzy speech matching is easier than actual speech-to-text, and false positives can be checked by hand after a query is done), I just don't know. Requesting "calls made to people on list XYZ between dates A and B" is easy and powerful enough, however.
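
Spelling out that back-of-envelope calculation (the US population figure is my rough assumption for 2013):

    # Back-of-envelope check of the voice-storage numbers above.
    # The US population figure (~316 million in 2013) is my rough assumption.
    kbit_per_s = 6.5
    seconds_per_year = 20 * 60 * 365                       # 20 minutes of talking per day
    bytes_per_year = kbit_per_s * 1000 / 8 * seconds_per_year
    print(bytes_per_year / 1e6, "MB per person per year")  # ~356 MB

    us_population = 316e6
    total_pb = bytes_per_year * us_population / 1e15
    print(total_pb, "petabytes in total")                  # ~112 PB, same ballpark as above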

I'm pretty sure that the overwhelming majority of the rest of all Internet data nowadays is made up of streaming video of the Torrent, YouTube, porn and Netflix kinds, which they probably have no reason to store. Although if they could filter out all the duplicate content, and wanted to, they could still do it (IIRC all Hollywood movies ever made, in 10GB BluRay format, come to less than 5 petabytes, but don't quote me on that) (which incidentally is the reason why I believe the 25 petabytes they seized from MegaUpload and lost must have been mostly non-infringing).

For completeness, I bet a good fraction of traffic is also made up of random DDoS collateral, the contents of which are complete junk and the origins generally untraceable. It would be funny if they stored that, though :) (but they can't, the numbers are stupidly immense).
#83
Quote from: Carlos Danger on August 05, 2013, 11:52:24 PM
Worth mentioning this, from all the way back in 2007

http://www.cryptogon.com/?p=624

(snip)

Somehow I get the feeling I've read this article before, but that's all pretty solid advice. Some of it used to (seem to) be a bit on the tinfoil side back then, but it's pretty effective. And with what we know now to be the case, tinfoil's again something to pack your lunch with.
#84
Aneristic Illusions / Re: Wow WHAT THEEEE FUCK EVEN
August 11, 2013, 07:38:18 PM
actually doesn't that particular instance also put the men in a role of interchangeable nonpeople? ("if boyfriend can do it, anyone can do it")

not that that makes it make any more sense.

no, no sense.
#85
#86
Quote from: YOUR Social Science Thinkmonkey on August 11, 2013, 04:32:04 AM
I think I've been Socialist leaning all my life. I think that if anything, I might call myself a Nigelist.

you should team up with someone named Eva, spread your faith and start evanigelizing people.
#87
Quote from: Pæs on August 08, 2013, 07:24:37 AM


Aahahaha yes beautiful!

They really are positioned perfectly for some sort of cartoon thing :D

And I want that suit. Yes, with the pixels in it.

Quote from: Pæs on August 10, 2013, 12:53:24 AM


Is that ENKI? What is going on?

and ENKI is slightly taller than I expected
#88
Two vast and trunkless legs of stone / Re: Spagbook
August 11, 2013, 01:03:07 PM
Awesome pics Cram!! thx for posting and making them, can't believe nobody else took pics--me included, I'll bring a camera next time

Quote from: Pixie on August 10, 2013, 02:28:54 AM
Zippletits and Regret have seemingly not cut their hair since 2010

It's to spite my balding friends, mainly.

Quote from: Pæs on August 11, 2013, 05:33:05 AM
Quote from: Pæs on August 10, 2013, 02:15:07 AM
I don't even know where to put this, but I accidentally this while WOMPing.




Paes thank you, good to know somebody was keeping that pic around for just this occasion.
#89
* kiki [Mibbit@hide-202A4FE1.sub-174-239-40.myvzw.com] entered the room.
<triplezero> that's funny, 'kiki' is the name of the bot I tried crashing yesterday :)
<kiki> aww that's not nice.
<triplezero> no what is not nice is creating a bot that can't do anything except SCREAM
<kiki> true that.
* Wilhelm [aaaaa@hide-F90C9797.warpnet.nl] entered the room.
<Wilhelm> AAAAAAAAAAAAAAAAAAAAAAAAAAAA
<triplezero> there he is
<triplezero> Wilhelm, tell the people how you feel each day, every day, each and every single agonizing second of your day
<Wilhelm> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
<Cram> you poor poor robot
<triplezero> crashing them is mercy.
#90
It's just, they really can see every fart I make on the internet, read my emails, everybody's emails, and to top it off those slides show an interface that a 13-year-old could use--that last part really drives it home. That they could technically see everything, I knew for a long time; it's practically built into the Internet. But, damn, the technology! Those are some serious, hardcore pieces of software; I guess you need to be a tech-head to really see the magnitude of (software) engineering involved in even making this system possible. Of course that's stupid, because Google has been using this very same technology to handle their Big Data--well, not exactly the same, they rolled their own, and last I heard the frameworks the NSA uses were made by the Apache Foundation (yeah, that part is open source! Of course you won't have the machinery to put it to this kind of use, but still).