Target-Independent Mining

Remotely sensed satellite data sets are very large. Data centers therefore cannot keep a significant fraction of this data online and must archive it on much slower tertiary storage. For data mining, this means high access latency: data must be moved from tertiary storage to disk, and ultimately to memory, before it can be mined. Mining techniques should therefore minimize access to data held on tertiary storage. Target-independent mining addresses this by producing concentrated data, in which most of the original volume is removed while most of its interesting characteristics are retained.

The figure below shows how transient events in SSM/I data are identified by applying a threshold, and how data compression (in percent) relates to the loss of transient data (in percent). For example, at a threshold of 15 K, 92% of the mesoscale convective systems (MCS) that were detectable in the raw SSM/I data remain detectable in enriched data that is only 2% of the original volume.
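The threshold trade-off described above can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the source, that a sample counts as transient when its deviation from a fixed baseline reaches the threshold, and that an event remains detectable if at least one of its samples survives. The function names (`enrich`, `retained_percent`, `detectable_percent`) and all values are hypothetical and synthetic; they do not reproduce the SSM/I results.

```python
from typing import Dict, List, Sequence, Tuple


def enrich(values: Sequence[float], baseline: float,
           threshold: float) -> List[Tuple[int, float]]:
    """Keep (index, value) pairs whose deviation from the baseline
    reaches the threshold. Assumed criterion, for illustration only."""
    return [(i, v) for i, v in enumerate(values)
            if abs(v - baseline) >= threshold]


def retained_percent(values: Sequence[float],
                     kept: Sequence[Tuple[int, float]]) -> float:
    """Percentage of the original samples that survive thresholding."""
    return 100.0 * len(kept) / len(values) if values else 0.0


def detectable_percent(events: Dict[str, Sequence[int]],
                       kept: Sequence[Tuple[int, float]]) -> float:
    """Percentage of events with at least one surviving sample."""
    if not events:
        return 0.0
    kept_idx = {i for i, _ in kept}
    hits = sum(1 for idxs in events.values() if kept_idx.intersection(idxs))
    return 100.0 * hits / len(events)


# Synthetic series in kelvin: a flat 250 K background with three events.
baseline = 250.0
values = [baseline] * 20
values[3] = 230.0    # event A, deviation 20 K
values[4] = 228.0    # event A, deviation 22 K
values[10] = 262.0   # event B, deviation 12 K (weak)
values[15] = 233.0   # event C, deviation 17 K
events = {"A": [3, 4], "B": [10], "C": [15]}

# Sweep the threshold to see volume retained versus events still detectable.
for threshold in (5.0, 15.0, 25.0):
    kept = enrich(values, baseline, threshold)
    print(f"threshold={threshold:5.1f} K  "
          f"retained={retained_percent(values, kept):5.1f}%  "
          f"detectable={detectable_percent(events, kept):5.1f}%")
```

Raising the threshold shrinks the enriched data but eventually drops weaker events, which is the kind of trade-off the figure plots for real SSM/I data.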

Collaboration with Domain Expert:
Dr. Tom Hinke (UAH Computer Science)