
Hacker Gets the Goods on Global Warming... or something

Started by Bebek Sincap Ratatosk, November 20, 2009, 09:47:55 PM


AFK

Quote from: Doctor Rat Bastard on December 18, 2009, 06:33:48 PM
Quote from: Rev. What's-His-Name? on December 18, 2009, 06:18:01 PM
Well, my research protocol would be laid out and my sources would be cited.  Everything that is necessary to pass the IRB process.  IMO, if that's not good enough for the MPP, then yes, I would tell them, politely, to go suck eggs.  If a research design can pass muster with IRB, I'm not sure why it is necessary, or useful, to share raw data with an entity like the MPP.  And the MPP of course would be welcome to check with my sources if they really wanted to.  But I was commenting more on the general public, and I don't see how it makes any sense to give raw data to untrained eyes.  I am the professional, it is my job to process and present the data.  If they don't trust me, they need to take it up with the IRB that approved my research.  

Hrmmm, but without raw data, how can someone duplicate your results? How can someone verify your interpretation... how can someone see if perhaps you made a mistake in your math or your data manipulation?

This seems especially important with climate data, as it's now clear that only some data was used (the homogenized data), lots of data was tossed, and the computer programs that manipulated the data now appear to produce results that don't match.

So the CRU data is based on a subset of measurements that were then run through a computer model (which can't now be replicated, at least not according to the developer in the "Harry Read Me" file).

So even if your research protocols are good and your data collection is sound... the code used to manipulate the data might be buggy... if so, how is anyone going to discover the error, if they simply accept your report because the IRB said you were cool?

It seems absurd to me that we're even having this discussion... how does refusing access to raw data get us closer to knowledge?

If my research is published, it has already gone through this process.  If I'm simply gathering multiple sources of indicator data, then anyone is free to gather that as well, and they'll have the sources I cite to refer to. 
Cynicism is a blank check for failure.

Bebek Sincap Ratatosk

Also, I'm entirely open to being shown that my interpretation of this situation is incorrect... If they are sharing data and methodology and I just missed it... I would like to know!!!

- I don't see race. I just see cars going around in a circle.

"Back in my day, crazy meant something. Now everyone is crazy" - Charlie Manson

Bebek Sincap Ratatosk

#62
Quote from: Rev. What's-His-Name? on December 18, 2009, 06:38:52 PM
Quote from: Doctor Rat Bastard on December 18, 2009, 06:33:48 PM
Quote from: Rev. What's-His-Name? on December 18, 2009, 06:18:01 PM
Well, my research protocol would be laid out and my sources would be cited.  Everything that is necessary to pass the IRB process.  IMO, if that's not good enough for the MPP, then yes, I would tell them, politely, to go suck eggs.  If a research design can pass muster with IRB, I'm not sure why it is necessary, or useful, to share raw data with an entity like the MPP.  And the MPP of course would be welcome to check with my sources if they really wanted to.  But I was commenting more on the general public, and I don't see how it makes any sense to give raw data to untrained eyes.  I am the professional, it is my job to process and present the data.  If they don't trust me, they need to take it up with the IRB that approved my research.  

Hrmmm, but without raw data, how can someone duplicate your results? How can someone verify your interpretation... how can someone see if perhaps you made a mistake in your math or your data manipulation?

This seems especially important with climate data, as it's now clear that only some data was used (the homogenized data), lots of data was tossed, and the computer programs that manipulated the data now appear to produce results that don't match.

So the CRU data is based on a subset of measurements that were then run through a computer model (which can't now be replicated, at least not according to the developer in the "Harry Read Me" file).

So even if your research protocols are good and your data collection is sound... the code used to manipulate the data might be buggy... if so, how is anyone going to discover the error, if they simply accept your report because the IRB said you were cool?

It seems absurd to me that we're even having this discussion... how does refusing access to raw data get us closer to knowledge?

If my research is published, it has already gone through this process.  If I'm simply gathering multiple sources of indicator data, then anyone is free to gather that as well, and they'll have the sources I cite to refer to.  


"The British Meteorological Office is to launch a review of its temperature data and has asked 188 nations - including Australia - for permission to release raw weather data in the wake of the so-called 'Climate-gate' email scandal." - Dec. 7, 2009, from an Australian source ("The Age"): http://www.theage.com.au/world/climategate-forces-weather-data-review-20091206-kcrk.html

The government is ASKING permission to release the raw data, indicating that the raw data is NOT currently 'free to gather'.

If the data was all open and available, then sure, we'd only need the methodology to repeat the testing. If the data isn't available, how can the conclusions be 'peer reviewed'?


EDIT:

Quote
I am sure that, over 20 years ago, the CRU could not have foreseen that the raw station data might be the subject of legal proceedings by the CEI and Pat Michaels. Raw data were NOT secretly destroyed to avoid efforts by other scientists to replicate the CRU and Hadley Centre-based estimates of global-scale changes in near-surface temperature. In fact, a key point here is that other groups -- primarily at the NOAA National Climatic Data Center (NCDC) and at the NASA Goddard Institute for Space Studies (GISS), but also in Russia -- WERE able to replicate the major findings of the CRU and UK Hadley Centre groups. The NCDC and GISS groups performed this replication completely independently. They made different choices in the complex process of choosing input data, adjusting raw station data for known inhomogeneities (such as urbanization effects, changes in instrumentation, site location, and observation time), and gridding procedures. NCDC and GISS-based estimates of global surface temperature changes are in good accord with the HadCRUT data results.  - Ben Santer, a climate scientist at Lawrence Livermore National Laboratory

This sounds good.... but it doesn't seem to fit with being required to ask countries to allow the Brits to release the data...

I wonder how much of this mess is just horrible reporting?


EDIT 2: Hrmmm, apparently not just bad reporting...


http://www.realclimate.org/index.php/archives/2009/12/please-show-us-your-code/#more-2452

Quote
It should be a common courtesy to provide methods requested by other scientists in order to speedily get to the essence of the issue, and not to waste time with the minutiae of which year is picked to end the analysis.

The reason why Gavin and I were not able to repeat Scafetta's analysis in exact details is that his papers didn't disclose all the necessary details.

That is the thing that bothers me...
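The "which year is picked to end the analysis" problem is easy to demonstrate with a toy sketch (made-up numbers, nothing to do with the actual CRU or Scafetta code; slope here is just ordinary least squares):

```python
# Toy demonstration (fictional numbers, not any real climate dataset):
# the same raw series gives a different trend depending on an
# undisclosed choice of which year ends the analysis.
def slope(series):
    """Ordinary least-squares slope of series against its index."""
    n = len(series)
    xbar = (n - 1) / 2
    ybar = sum(series) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(series))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

temps = [14.0, 14.2, 14.1, 14.4, 14.6, 15.1, 14.7, 14.5]  # fictional yearly means

full_trend = slope(temps)       # analysis runs through the last year
truncated = slope(temps[:6])    # analysis quietly ends two years earlier
```

With these made-up numbers the truncated trend comes out nearly double the full one. Without the code (or at least the exact end year), a reader has no way of knowing which trend a paper is reporting, even though the raw data and the stated methodology are identical in both cases.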

- I don't see race. I just see cars going around in a circle.

"Back in my day, crazy meant something. Now everyone is crazy" - Charlie Manson

Hangshai

I really wish you guys would have watched that video link I posted.  It's a 2-part interview with Freeman Dyson.  He talks about how he was on a climate study for a while, and they were watching the intake of CO2 by vegetation; even though the amount of CO2 has risen in the atmosphere, the data shows that in some places (Brazil and somewhere else, I can't remember) the CO2 is being absorbed by forests and whatnot.  Canada, not so much, I guess.  Maybe because of all the industrialization in N.A., I don't really know.  But then he goes on to explain how most of the data used for climate change is computer models, and not actually OBSERVED data.  He also states that the observed data hasn't been going on long enough for anyone to even know WHAT is going on.  All they know is something is going on.  Anyway, I'm sure he can say it a lot better than I can, but it pretty much seems to invalidate any data based on computer models.
All text and pictures uploaded by/to/from this person/account is/are purely fictional and for entertainment purposes only. Or not.

Bebek Sincap Ratatosk

Quote from: Hangshai on December 18, 2009, 08:46:47 PM
I really wish you guys would have watched that video link I posted.  It's a 2-part interview with Freeman Dyson.  He talks about how he was on a climate study for a while, and they were watching the intake of CO2 by vegetation; even though the amount of CO2 has risen in the atmosphere, the data shows that in some places (Brazil and somewhere else, I can't remember) the CO2 is being absorbed by forests and whatnot.  Canada, not so much, I guess.  Maybe because of all the industrialization in N.A., I don't really know.  But then he goes on to explain how most of the data used for climate change is computer models, and not actually OBSERVED data.  He also states that the observed data hasn't been going on long enough for anyone to even know WHAT is going on.  All they know is something is going on.  Anyway, I'm sure he can say it a lot better than I can, but it pretty much seems to invalidate any data based on computer models.

*nods* It was an interesting interview... but it's separate from my basic issue, which is: how can one trust 'science' if we keep bits of the process secret? If we can't see your choices in computer code, choices in data manipulation, etc., then how can we trust your conclusion?

(And I've heard others in the business say the same thing about the models)
- I don't see race. I just see cars going around in a circle.

"Back in my day, crazy meant something. Now everyone is crazy" - Charlie Manson

Bebek Sincap Ratatosk

So I guess this boils down to two questions:

1. In general how much source data should be available when someone produces a scientific paper?

2. If computer models/code are used, should that source be available for review?

I guess it just seems stunning to me that any scientist would not want all the data out there to provide support for their position... and more importantly, to double-check their work. It's good that other groups have done other studies with very similar results, but it seems absurd to be against the free exchange of information if you pursue knowledge. I'm not sure "well, assholes will misinterpret it" is a good excuse.
- I don't see race. I just see cars going around in a circle.

"Back in my day, crazy meant something. Now everyone is crazy" - Charlie Manson

Hangshai

This just in from the Beeb: a preliminary agreement has been made by the USA, China, India, and the EU.  It is not THE treaty, but a 'first step'.  What does this mean? Who knows.  They did also say America will be passing climate legislation in the Senate next year (because of this).
All text and pictures uploaded by/to/from this person/account is/are purely fictional and for entertainment purposes only. Or not.

Triple Zero

Quote from: Doctor Rat Bastard on December 18, 2009, 09:18:43 PM
2. If computer models/code are used, should that source be available for review?

Preferably, but a detailed description of the algorithms used should be enough.
Ex-Soviet Bloc Sexual Attack Swede of Tomorrow™
e-prime disclaimer: let it seem fairly unclear I understand the apparent subjectivity of the above statements. maybe.

INFORMATION SO POWERFUL, YOU ACTUALLY NEED LESS.

Triple Zero

Oh and in addition, it's a bit of a "me too", but Rat has been wording it excellently and I have nothing useful to add, but Rat is riding the correct scientific motorcycle, a lot.

I can think of some valid reasons for not releasing raw data. For instance, a friend of mine measures cosmic particles using satellite dish arrays in Argentina, and it produces several gigabytes of raw data per hour (or something like that), so naturally they just keep the filtered stuff, which is a lot smaller.
Another reason could be pending patents, but that data can be released once it's not "sensitive" anymore.
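For the curious, that kind of on-the-fly reduction looks roughly like this (purely hypothetical numbers and threshold, not my friend's actual pipeline):

```python
# Hypothetical sketch of why raw data sometimes can't be archived:
# reduce a stream of detector samples to the few "interesting" events
# plus summary statistics, discarding the bulk on the fly.
def reduce_stream(samples, threshold):
    """Keep only samples at or above threshold, plus summary stats."""
    kept = [s for s in samples if s >= threshold]   # candidate events
    summary = {
        "n_raw": len(samples),
        "n_kept": len(kept),
        "mean": sum(samples) / len(samples),
    }
    return kept, summary

samples = [0.1, 0.3, 9.7, 0.2, 0.4, 12.5, 0.1, 0.2]  # fictional readings
events, summary = reduce_stream(samples, threshold=5.0)
```

The catch, tying it back to the thread: replication then depends on the filter code and the threshold being published, because nobody can re-download the discarded raw stream.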
Ex-Soviet Bloc Sexual Attack Swede of Tomorrow™
e-prime disclaimer: let it seem fairly unclear I understand the apparent subjectivity of the above statements. maybe.

INFORMATION SO POWERFUL, YOU ACTUALLY NEED LESS.

Requia ☣

In practice, it's more that "anybody who is one of us" can repeat the experiment.  People outside the academic community are not only locked out of the raw data, but locked out of everything else as well by the paywalls journals set up.

It has nothing to do with climate change specifically, though; it's just arrogance and profit motive in general.

NOTHING in those emails surprised me even a little bit; it's business as usual from what I can see.
Inflatable dolls are not recognized flotation devices.

Rococo Modem Basilisk

Quote from: Triple Zero on December 18, 2009, 10:11:44 PM
Oh and in addition, it's a bit of a "me too", but Rat has been wording it excellently and I have nothing useful to add, but Rat is riding the correct scientific motorcycle, a lot.
Agreed.
Quote
I can think of some valid reasons for not releasing raw data. For instance, a friend of mine measures cosmic particles using satellite dish arrays in Argentina, and it produces several gigabytes of raw data per hour (or something like that), so naturally they just keep the filtered stuff, which is a lot smaller.
Some individuals and organizations with an interest in this type of thing have big disks and fat pipes. Although there is no full precedent for this, I could imagine something like wikileaks popping up dedicated to caching raw data, with a bunch of concerned laypeople keeping it around in chunks via BitTorrent or some other distributed system. Mind you, it would be fundamentally different from both, so I may be making a failanalogy, but if it keeps this kind of situation from happening, I could see people being interested in distributedly caching big chunks of raw and ostensibly useless scientific data that would otherwise be pitched out.
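The chunked-cache idea could work roughly like a .torrent piece list: split the raw dataset into fixed-size chunks and publish a manifest of chunk hashes, so any volunteer holding one chunk can prove it's unaltered. A minimal sketch (hypothetical data and chunk size):

```python
import hashlib

# Sketch of the "distributed cache" idea: hash fixed-size chunks of a
# raw dataset into a manifest, so a volunteer holding a single chunk
# can verify it against the published manifest (this is roughly what
# a .torrent file's piece list does, with SHA-256 instead of SHA-1).
def make_manifest(data: bytes, chunk_size: int):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [hashlib.sha256(c).hexdigest() for c in chunks]

def verify_chunk(chunk: bytes, index: int, manifest) -> bool:
    return hashlib.sha256(chunk).hexdigest() == manifest[index]

# Fictional station data standing in for a big raw archive:
raw = b"station_id,year,temp\n001,1998,14.6\n001,1999,14.8\n" * 100
manifest = make_manifest(raw, chunk_size=1024)

# A volunteer holding only chunk 2 can still verify it:
ok = verify_chunk(raw[2048:3072], 2, manifest)
```

Nobody needs to trust any single volunteer, only the (tiny) manifest published by whoever produced the data.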

Quote
Another reason could be pending patents, but that data can be released once it's not "sensitive" anymore.
Would patents on the sensors and other things correspond to restrictions on the data recorded? That doesn't really jibe with my understanding of IP law -- it's kind of like claiming that everything written in Word belongs to Microsoft and everything typed on a Mac keyboard belongs to Apple.


I am not "full of hate" as if I were some passive container. I am a generator of hate, and my rage is a renewable resource, like sunshine.

Triple Zero

No, I meant that if the research is about to discover something that can be patented, you might want to wait to release the data until you get the patent approved.
Ex-Soviet Bloc Sexual Attack Swede of Tomorrow™
e-prime disclaimer: let it seem fairly unclear I understand the apparent subjectivity of the above statements. maybe.

INFORMATION SO POWERFUL, YOU ACTUALLY NEED LESS.

Rococo Modem Basilisk

Quote from: Requia ☣ on December 18, 2009, 10:34:48 PM
In practice, it's more that "anybody who is one of us" can repeat the experiment.  People outside the academic community are not only locked out of the raw data, but locked out of everything else as well by the paywalls journals set up.
It really shouldn't matter much WHO has access to raw data, since only experts with the equipment and intimate knowledge of the subject can make sense of it. That's something that hits me funny about not releasing the raw data -- releasing only cooked data makes sense for dead-tree stuff, and sometimes (when there's lots of it) for other formats, but size doesn't seem to be the issue here. The only people who could benefit from raw data in a zero-sum sense are experts, so when someone says they'd "rather destroy the raw data" than let it leak, I can't imagine these guys are paranoid about laypeople.


I am not "full of hate" as if I were some passive container. I am a generator of hate, and my rage is a renewable resource, like sunshine.

Rococo Modem Basilisk

Quote from: Triple Zero on December 18, 2009, 10:51:00 PM
No, I meant that if the research is about to discover something that can be patented, you might want to wait to release the data until you get the patent approved.

My bad. Still, I don't think you'd be publishing stuff in scientific journals in the first place if you were doing it for industry. There are probably counterexamples (those harmonic motors that Dyson did, some types of experimental dexterous manipulators for robot arms, etc. -- terribly applied stuff novel enough to represent lots of original research).


I am not "full of hate" as if I were some passive container. I am a generator of hate, and my rage is a renewable resource, like sunshine.

Requia ☣

Quote from: Enki v. 2.0 on December 18, 2009, 10:51:49 PM
Quote from: Requia ☣ on December 18, 2009, 10:34:48 PM
In practice, it's more that "anybody who is one of us" can repeat the experiment.  People outside the academic community are not only locked out of the raw data, but locked out of everything else as well by the paywalls journals set up.
It really shouldn't matter much WHO has access to raw data, since only experts with the equipment and intimate knowledge of the subject can make sense of it. That's something that hits me funny about not releasing the raw data -- releasing only cooked data makes sense for dead-tree stuff, and sometimes (when there's lots of it) for other formats, but size doesn't seem to be the issue here. The only people who could benefit from raw data in a zero-sum sense are experts, so when someone says they'd "rather destroy the raw data" than let it leak, I can't imagine these guys are paranoid about laypeople.


I'm talking more about access to the research in general, not the raw data.  But as for raw data:

Part of it is that everything still revolves around dead-tree publication: your career in academia depends on which particular dead trees you get published on, and on what other people write about your work on dead trees.  The other part is that the filtering method is more important than the raw data itself; anybody (outside of the very-expensive-to-run experiments) can get new raw data.  In fact, you're *supposed* to go get new raw data.  It guards against faked data, statistical anomalies, and unknown variables.

Quote from: Enki v. 2.0 on December 18, 2009, 10:54:24 PM
Quote from: Triple Zero on December 18, 2009, 10:51:00 PM
No, I meant that if the research is about to discover something that can be patented, you might want to wait to release the data until you get the patent approved.

My bad. Still, I don't think you'd be publishing stuff in scientific journals in the first place if you were doing it for industry. There are probably counterexamples (those harmonic motors that Dyson did, some types of experimental dexterous manipulators for robot arms, etc. -- terribly applied stuff novel enough to represent lots of original research).

Thanks to the Bayh-Dole Act, universities can patent the work of their students/professors now too, and they frequently do.
Inflatable dolls are not recognized flotation devices.