When data mining, the first step is to obtain the data that you would like to mine. I have decided that I would like to try my hand at playing the stock market, so I needed historical stock market data. To that end, I have devised a method to obtain end-of-day results for every listing on the NYSE, AMEX and NASDAQ since their inception. The data is still being assembled, and I expect it to be complete within a few days. I currently estimate that it will take up approximately 2GB, making it the largest single dataset that I have ever played with. Just having this much data makes my data-hoarding senses tingle.
I'll probably spend a little time putting the data into a format that is easy to understand and use, and then I'll start looking for patterns. I'm hoping to throw my modeling background and experience at the stock market to see if I can beat the system. If I can beat the stock market and make bajillions of dollars (or euros, if the dollar collapses), that would be pretty sweet. If I can't, at the very least I expect to have fun playing with lots and lots of numbers.
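To make "easy to understand and use" a bit more concrete, here is a minimal Python sketch of one possible format: one record per symbol per day, grouped by symbol and sorted by date. The column names (symbol, date, open, high, low, close, volume), the `EodBar` class, the `load_bars` helper and the sample rows are all assumptions made up for illustration; the real data may well arrive in a different shape.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EodBar:
    """One end-of-day record for one listing (hypothetical layout)."""
    symbol: str
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: int


def load_bars(csv_text):
    """Parse CSV text into {symbol: [EodBar, ...]}, each list sorted by date."""
    bars = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        bars[row["symbol"]].append(
            EodBar(
                symbol=row["symbol"],
                day=date.fromisoformat(row["date"]),
                open=float(row["open"]),
                high=float(row["high"]),
                low=float(row["low"]),
                close=float(row["close"]),
                volume=int(row["volume"]),
            )
        )
    for series in bars.values():
        series.sort(key=lambda bar: bar.day)
    return dict(bars)


# Made-up sample rows, purely to show the shape of the data.
SAMPLE = """symbol,date,open,high,low,close,volume
XYZ,2008-01-03,10.50,10.90,10.40,10.80,120000
XYZ,2008-01-02,10.00,10.60,9.90,10.50,150000
ABC,2008-01-02,5.00,5.10,4.90,5.05,80000
"""

bars = load_bars(SAMPLE)
```

With everything keyed by symbol and ordered by day, pulling out a single listing's closing prices is a one-liner such as `[bar.close for bar in bars["XYZ"]]`, which seems like a reasonable starting point for pattern hunting.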
As a second approach, since this sort of data turns out to be rather difficult to get in the first place, I'm half-considering cleaning it up a bit and then reselling it myself.