Recently, I felt I needed to find a project to go along with my data mining course at U of I. The project needed to be (1) fun, (2) challenging, and (3) worthy of my time and potentially useful to a large number of users. To my delight, a friend of mine, Robb Seaton, had already created a project dubbed bkz: https://github.com/robertseaton/bkz. It is designed to predict which books a user would like to read. Books take a long time to read, so it is definitely worth using a recommendation method to choose them. Consider that there are roughly 42 million books on Amazon, yet there are currently only a few ways to get a book recommended to you: through a friend, through Amazon (which does not really have a dedicated book recommendation tool), or perhaps through Goodreads (now owned by Amazon). Unfortunately, none of these options accounts for both the sheer number of books and the taste of each individual reader. How many times have you been recommended a book and never had the desire to read past the first chapter? No longer.
Robb agreed to let me work on his project, and I feel I can make a significant contribution. I had already offered a few suggestions on where to take the project (my side hobby has been reading about financial and market analysis). Now I am getting my feet wet implementing features to gather more information. For example, I recently added a feature to search by author as well as by title, and to gather each book's Amazon rank. It took a bit to brush up on my Ruby after not using the language for a few years, but after a couple of hours of work there were clear results, and I enjoyed it.
My hope is that within the next year or two we can launch a website or app that recommends books to all users. One thing that troubles me is the idea that the only way to properly predict books for a user is either to build a map of similar users or to profile each user's personality and predict books from their preferences, perhaps with a random forest trained on their data. It will be interesting to see what we implement, because I think the lightest-weight, slightly less accurate method may actually be better in our case, since we have little money to buy or rent servers.
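To make the "map of similar users" idea concrete, here is a minimal sketch of user-based collaborative filtering. It is not bkz's implementation; the ratings, book titles, and the function names `cosine_similarity` and `recommend` are hypothetical. Cosine similarity is assumed as the similarity measure. Each user's similarity to the others is computed over the books both have rated. The k most similar neighbors then vote on unread books, with each vote weighted by similarity.

```python
from math import sqrt

# Hypothetical toy data: user -> {book title: rating on a 1-5 scale}
ratings = {
    "alice": {"Dune": 5, "Neuromancer": 4, "Emma": 1},
    "bob": {"Dune": 4, "Neuromancer": 5, "Foundation": 5},
    "carol": {"Emma": 5, "Persuasion": 4, "Dune": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings, over the books both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=2, top_n=3):
    """Predict ratings for unread books from the k most similar users."""
    mine = ratings[user]
    # Rank every other user by similarity and keep the k closest positive matches.
    neighbors = sorted(
        ((cosine_similarity(mine, theirs), other)
         for other, theirs in ratings.items() if other != user),
        reverse=True,
    )
    neighbors = [(sim, other) for sim, other in neighbors[:k] if sim > 0]

    scores, weights = {}, {}
    for sim, other in neighbors:
        for title, rating in ratings[other].items():
            if title in mine:
                continue  # only recommend books the user has not rated yet
            scores[title] = scores.get(title, 0.0) + sim * rating
            weights[title] = weights.get(title, 0.0) + sim

    # Similarity-weighted average rating for each candidate book.
    predicted = {title: scores[title] / weights[title] for title in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend("alice", ratings))
```

This approach is "lightweight" in the sense discussed above: there is no model to train, only pairwise similarities. The trade-off is that comparing every user with every other user grows quadratically with the user base. A profile-based model such as a random forest shifts that cost into training instead.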