data mining

All You Need is Hate – Stanley Fish

Hate Springs Eternal – Krugman
Law enforcement requests for postal info granted –

U.S. postal authorities have approved more than 10,000 law enforcement requests to record names, addresses and other information from the outside of letters and packages of suspected criminals every year since 1998, according to U.S. Postal Inspection Service data.

Online Advertising: So Good, Yet So Bad for Us is my latest Wired News column.

The column discusses a debate at this year’s Computers, Freedom and Privacy conference between Jeff Chester of the Center for Digital Democracy and Mike Zaneis from the Interactive Advertising Bureau. On one side, Chester argued that online advertising is privacy-invasive and should be subject to consumer opt-in. On the other side, Zaneis argued that advertising makes great content possible, gives people ads that are relevant to them, and doesn’t collect sensitive information. In the column, I find something to disagree with in what each man says, and I don’t necessarily come up with a better answer. Both Chester and Zaneis wrote me nice emails about the column, which I really appreciate.

I weigh in on NSLs and the FBI with today’s Circuit Court column: FBI Slips Demand Patriot Act Cuts.

I think the most interesting part of the column is the end, where I try to grapple with the FBI’s assertion that the lower standard of proof for national security letters is really helpful to its investigations. While it’s a small part of this column, it’s an issue I plan to discuss more in future columns, especially as Congress begins to reconsider the powers granted by the USA PATRIOT Act in light of the FBI abuses.

One of the most challenging problems for national security is predicting and stopping terrorist attacks before they happen. The government proposes that data mining is a useful tool for finding terrorists. By using database technology, statistical analysis and modeling, the government says it can search our email, phone calls, shopping habits and educational records to find the needle (terrorists) in the haystack (the general population). One has to know a bit about the science and statistics behind data mining to evaluate this claim.

The debate over data mining is often cast as a trade-off between security and the privacy of individuals. But the real problem with national security data mining is that there is no trade-off. There’s an invasion of privacy, but no corresponding uptick in security. Why is this so?

I’ve argued that data mining can’t work. This recent report from Jim Harper and Jeff Jonas at CATO takes a closer look at the practice and agrees with me.

Meanwhile, the city of Philadelphia is trying to use data mining to predict which ex-cons will become murderers.

Why might data mining help Philly find murderers, but not help the United States find terrorists?

First, Philly is analysing a discrete population: people on probation. This narrows the ratio of killers to subjects to one in one hundred. In contrast, the ratio of terrorists to subjects in the United States is one in millions.

Second, though it’s a relatively rare offense, there have been a lot of murders, and so we have a lot of information about the characteristics of people who kill. We know what the indicators are that incline someone toward violence. Similarly, with consumer behavior, identity theft and credit card fraud, the models for suspicious activity are based on hundreds of thousands of known examples. Terrorism, in contrast, has no broad-based model. As the CATO report says, there are “a relatively small number of attempts every year and only one or two major terrorist incidents every few years—each one distinct in terms of planning and execution—there are no meaningful patterns.”

Data mining raises two main public policy questions. First, does it help with resource allocation? In a world of scarce resources, you can’t check every lead. Does data mining narrow the leads effectively, or does it generate so many false leads that it exacerbates the resource allocation problem?

Next, what are the costs of false positives, and given the number that data mining will generate, can we bear that cost?
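
To make the false-positive arithmetic concrete, here is a rough sketch in Python. It applies Bayes' rule to the two base rates mentioned above: one in one hundred for the probation population, and "one in millions" for the general population, which I take as exactly one in a million purely for illustration. The 99 percent sensitivity and specificity figures are hypothetical and deliberately generous; they are not numbers from the Philadelphia program or from any government system, and the function names are my own.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a flagged person is a true target (Bayes' rule)."""
    true_pos = sensitivity * base_rate                    # targets correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - base_rate)   # innocents wrongly flagged
    return true_pos / (true_pos + false_pos)


def false_leads_per_true_lead(base_rate, sensitivity, specificity):
    """How many false leads investigators chase for each real one."""
    ppv = positive_predictive_value(base_rate, sensitivity, specificity)
    return (1.0 - ppv) / ppv


# Hypothetical accuracy figures, chosen to favor the data miner.
SENSITIVITY = 0.99
SPECIFICITY = 0.99

# Base rates: the first is from the text; the second is an assumed
# stand-in for "one in millions".
scenarios = {
    "probation population (1 in 100)": 1 / 100,
    "general population (assumed 1 in 1,000,000)": 1 / 1_000_000,
}

for label, base_rate in scenarios.items():
    ppv = positive_predictive_value(base_rate, SENSITIVITY, SPECIFICITY)
    ratio = false_leads_per_true_lead(base_rate, SENSITIVITY, SPECIFICITY)
    print(f"{label}: {ppv:.4%} of flags are real, "
          f"about {ratio:,.0f} false leads per true lead")
```

Under these assumptions, roughly half of the flags in the probation population point at a real target, while in the general population about ten thousand innocent people are flagged for every real one. The exact figures depend entirely on the assumed accuracy, but the gap between the two populations comes from the base rates, not from the quality of the model.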

Philadelphia is able to better its odds with data mining because it’s dealing with a relatively high rate of murderers within a finite, tested population and because it has a model based on known data. It will be interesting to see how well the program works. But the results cannot be extrapolated to the hunt for terrorists.