Nov 22 2010

Freakonomics is an interesting book. It brought back a lot of memories of projects I’ve worked on and papers and presentations I’ve seen. Some of the special bits for me are the surprising statistical results. Like: Car seats haven’t done much to save kids’ lives, but they’ve made some companies a lot of money. I’d love to have the time to sit and look for this sort of stuff; it sounds cool.

[Image: Filtering Water]

And I was just vaguely citing something like this a couple of weeks ago when camping: I didn’t bother to bring my water filter. I thought about it and figured that:

  1. Parks organizations feel they need to warn people about giardia (‘Beaver Fever’) so that they don’t get sued.
  2. Water filter manufacturers warn people about it so that they can make money.

The kids and I drank unfiltered water from the Ghost River all weekend with no ill effects.

But mining statistics is a funny one. Working in digital radiography and developing image processing algorithms made me think a bit differently. Radiologists are looking for unusual things in images, not necessarily the standard things (hey, look, two lungs!). Patients come in with all sorts of deformities: missing fingers, legs, you name it. So an image segmentation algorithm that looks for a hand in an image by trying to identify the 5 digits may be a bit unreliable; you don’t always get to see 5 digits on an ER radiograph of a hand! And when you do, they might not present themselves in a sensible set of straight lines. Crushing trauma does a lot of crazy stuff to the phalanges. Then you have X-ray images with metal plates and other implants. These aren’t necessarily the norm, but your algorithms still need to deal with stuff that isn’t normal, because most of the stuff you see isn’t normal. After all, that’s why these people are getting X-rays taken!

What about the few dead pixels in a CCD that you might encounter? Suppose it’s 5 out of 16 million pixels that are bad. Statistically insignificant. But important, because they distract the reader from the real pathology: a human eye will pick out the flaw immediately.
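To put rough numbers on the dead pixel point, here is a small Python sketch. It is only an illustration, not anything from a real imaging pipeline: the synthetic flat-field image, the 200 x 200 size, the pixel values, the threshold, and the names `make_flat_field` and `find_dead_pixels` are all made up for the example. It shows that 5 bad pixels out of 16 million is a fraction of about 0.00003%, that a global statistic like the mean barely notices them, and that a simple local comparison against neighbouring pixels (loosely what the eye does) picks them out right away.

```python
import random


def make_flat_field(width, height, level=1000, noise=5, seed=0):
    """Synthetic, roughly uniform exposure. All values are hypothetical."""
    rng = random.Random(seed)
    return [[level + rng.randint(-noise, noise) for _ in range(width)]
            for _ in range(height)]


def find_dead_pixels(image, threshold=200):
    """Flag pixels that differ from the median of their neighbours
    by more than `threshold` (an arbitrary value for this sketch)."""
    height, width = len(image), len(image[0])
    flagged = []
    for y in range(height):
        for x in range(width):
            neighbours = sorted(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (nx, ny) != (x, y)
            )
            # Upper median when the neighbour count is even.
            median = neighbours[len(neighbours) // 2]
            if abs(image[y][x] - median) > threshold:
                flagged.append((x, y))
    return flagged


# Fraction of bad pixels in the example from the text.
fraction_full_sensor = 5 / 16_000_000

# A much smaller image keeps the pure-Python loop fast.
width, height = 200, 200
image = make_flat_field(width, height)
dead = [(10, 10), (57, 120), (199, 0), (100, 100), (3, 180)]
for x, y in dead:
    image[y][x] = 0  # a dead pixel reads as zero in this sketch

mean = sum(map(sum, image)) / (width * height)
found = find_dead_pixels(image)

print(f"bad fraction on a 16 MP sensor: {fraction_full_sensor:.3e}")
print(f"mean of the small test image:   {mean:.2f}")
print(f"pixels flagged by local check:  {sorted(found)}")
```

Even on this small image, where the bad pixels are a far larger share than 5 in 16 million, the mean stays within a fraction of a count of the nominal level, while the neighbourhood check flags exactly the five pixels that were zeroed.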

Similarly, if I had been the one who got sick with beaver fever after drinking water from the Ghost River, I probably wouldn’t care so much about how unlikely it was.

Posted at 10:27 pm

  2 Responses to “Freakonomics”

  1. Interesting points about data mining, indeed.
    But I wouldn’t use the Freakonomics books as a positive illustration here, as they are the worst example of how you can get rich (they sold millions of copies) if you can tell a good story and don’t let ethics stand in the way. I thought they were just sloppy and sensationalist after reading through Freakonomics, but I found out they are intellectually dishonest as well after reading through SuperFreakonomics.

  2. Perhaps the Freakonomics team have fallen victim to their own warnings about incentives and their unintended consequences.