Thursday, April 12, 2007

Are you really a data miner?

As we plan for the next Data Mining Users Group (May 9), we find ourselves debating what data mining really is. How does your organization define data mining? Does it include classical statistics (like regression and clustering)? Is it limited to machine learning and artificial intelligence? Does it include OLAP and reporting?

We'll be discussing this topic - and why it matters - on May 9 at the Toronto Data Mining Users Group (www.torontodatamining.ca).

2 comments:

Enrique said...

Ref:
http://www.webopedia.com/TERM/D/data_mining.html

A class of database applications that look for hidden patterns in a group of data that can be used to predict future behavior. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data.
Data mining is popular in the science and mathematical fields but also is utilized increasingly by marketers trying to distill useful consumer data from Web sites.

Am I a Data Miner?
If (pursuant to the above definition of Data Mining) a Data Miner is someone who uses some unspecified user interface to access a database containing customer behavioural data with an intent to predict future behaviour, the answer is yes.

I would argue, however, that the definition should be tightened up. If one used a similar definition for traditional mining, it would include prospectors simply panning for gold. I think the key difference is the proportion of luck required for success. Not to denigrate prospectors, but just as modern methods have generally replaced panning, I think that most true data miners are much farther up the technology curve.

Anonymous said...

Data Mining is a knowledge discovery approach where analysis of information provides insights that can ultimately be actioned to produce supposedly incremental ROI. In the business world, it is important to understand that knowledge discovery without any foresight into its impact on the overall business results in situations often referred to as 'Analysis Paralysis.' Clearly, there must be direction on how data mining results will be used to deliver incremental business ROI.

The other misconception about data mining is that it refers exclusively to software and automated processes. This is false, as data mining is more about an approach adopted to achieve a solution that will deliver incremental ROI. Some of these processes will involve software and automated procedures, but many will involve human intervention and require intellectual capital to be a significant component of the overall solution.

Data Mining as a discipline will always involve the marriage of technology/mathematics and the human element if solutions are going to deliver superior results for a given business challenge.



Richard Boire