Collaborative Filtering
Using predictive analysis to make recommendations
By: Joe Danziger
Collaborative filtering on the Web has existed for a long time, dating all the way back to the original incarnations of sites like CDNow and Amazon.com. Recommendation systems are a powerful tool for businesses to extract additional value from their e-commerce and customer databases. They benefit customers by enabling them to find products they like, and help businesses by generating more sales.

We're going to look at some of the basic principles of predictive systems and introduce some methods you can utilize to make recommendations in your own applications. Along the way, we'll attempt to point out the benefits and limitations of each type of system.

Basic Predictions

This scenario can provide quick, quality recommendations as the computer is not guessing at the association and also does not have to perform any on-the-fly calculations. The technique suffers, however, by requiring your product administrator to have a deep knowledge of the products in your store, which may be unrealistic for larger sites. It also requires you to continuously update the "related items" lists of older items as new products are added to the catalog.

User-Based Collaborative Filtering

We'll assume an Items table and a Users table in the database with respective primary keys of ItemID and UserID, and we'll rate using a scale of 1 (lowest) to 5 (highest). You can go as high as you'd like, though statistically there's not much value in going above 7. The system will determine, on the fly, a community of like users whose ratings of items most closely match those of the current user. We'll set up a sample table of five users providing ratings for each of the colors of the rainbow (Figure 1).
[Figure 1: sample ratings from five users for the colors of the rainbow]

To determine our community of users, we'll use the "Mean Squared Differences (MSD)" algorithm. This measures the degree of dissimilarity between two user profiles. Squaring adds more weight to the larger differences, which is appropriate since points further from the mean may be more significant (we care more about things that a user has a positive or negative feeling about than about items they are ambivalent about).

To perform the calculation in layman's terms: take the difference between the two users' ratings on each item that they have both rated, square that number, add those all up, and take the average. The lower the result, the closer that user's preferences are to the current user's.

Listing 2 provides the query used to determine the community of users with the lowest mean squared difference to the current user. Figure 2 provides the results of the query and the MSD values. We'll use a TOP value of 5 at the beginning of our query to display only the five users most similar to userID 1.
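The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the query from Listing 2: the function names, the dictionary layout, and the sample ratings are all hypothetical (the values in Figure 1 are not reproduced here), and the `top` parameter simply mirrors the role of the TOP value in the query.

```python
from typing import Dict, List, Optional, Tuple

Ratings = Dict[str, Dict[str, int]]  # user -> {item: rating on the 1-5 scale}


def mean_squared_difference(a: Dict[str, int], b: Dict[str, int]) -> Optional[float]:
    """Average squared rating gap over the items both users rated.

    Returns None when the two users share no rated items.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum((a[item] - b[item]) ** 2 for item in shared) / len(shared)


def nearest_neighbors(ratings: Ratings, user: str, top: int = 5) -> List[Tuple[str, float]]:
    """Rank every other user by ascending MSD and keep the first `top`."""
    scored = []
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        msd = mean_squared_difference(ratings[user], other_ratings)
        if msd is not None:
            scored.append((other, msd))
    scored.sort(key=lambda pair: pair[1])  # lowest MSD = most similar
    return scored[:top]


# Hypothetical ratings, NOT the values from Figure 1.
sample: Ratings = {
    "current": {"red": 5, "orange": 2, "yellow": 4},
    "mike": {"red": 5, "orange": 3, "yellow": 4, "green": 5},
    "laura": {"red": 4, "orange": 1, "yellow": 5, "blue": 2},
    "pat": {"red": 1, "orange": 5, "yellow": 2, "green": 1},
}

neighbors = nearest_neighbors(sample, "current", top=2)
```

With this made-up data, "mike" differs from "current" by one point on a single shared item, so he ranks first; "pat" disagrees strongly on all three shared items and falls outside the top two.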
[Figure 2: query results with MSD values]

We're going to use the three most like-minded users to come up with predictions on what colors this user would like. We see from Figure 2 that our three closest neighbors are Mike, Laura, and Sam, since they have the lowest MSD values. Products that this community likes most will then be recommended to the user, as he will probably also like them.

We loop over each member in the community and assign a weighted rating (based upon their MSD value) to each of the other items that they have rated (see Listing 3). These weighted ratings from the query in Listing 3 are then inserted into a database table (see Listing 4) to aid with our calculations. Now that we have all of our weighted ratings in the database, we total up the weighted ratings and divide by the total MSDs to give us the items with the highest weighted averages that have not already been rated by the user (see Listing 5). Our final results are shown in Listing 6.

Although this is a simplified example, it allows us to see where our recommendations come from. Better predictions can be obtained by increasing the neighborhood size (up to a point), so you should experiment to find a reasonably large neighborhood size that does not significantly affect processing time. Since we were using a scale of 1-5, the higher the weighted average for the prediction, the more likely this user is to desire this item (or color in our case).

Although we used the Mean Squared Differences algorithm, there are several other mathematical formulas, each with its own drawbacks and limitations. The model presented could easily be modified to provide recommendations of favorite artists, authors, or whatever your site calls for. You could also base recommendations on the demographics of your users, or you may want to provide an explicit survey for all of your users to fill out to gain knowledge of their preferences on whatever topic your site deals with.
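The weighted-average step described in this section can be sketched in Python as follows. The exact weighting used in Listings 3 through 5 is not reproduced here, so this sketch makes its own assumption: each neighbor gets a weight of 1 / (1 + MSD), so that closer neighbors count more, and each item's weighted sum is divided by the total weight of the neighbors who rated it. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def predict_ratings(
    user_ratings: Dict[str, int],
    neighbors: List[Tuple[Dict[str, int], float]],
) -> List[Tuple[str, float]]:
    """Weighted-average predictions for items the user has not rated.

    `neighbors` pairs each neighbor's ratings with that neighbor's MSD.
    ASSUMPTION: weight = 1 / (1 + MSD), so closer neighbors count more.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for neighbor_ratings, msd in neighbors:
        weight = 1.0 / (1.0 + msd)
        for item, rating in neighbor_ratings.items():
            if item in user_ratings:
                continue  # only predict items the user has not rated yet
            weighted_sum[item] += weight * rating
            weight_total[item] += weight
    predictions = [
        (item, weighted_sum[item] / weight_total[item]) for item in weighted_sum
    ]
    predictions.sort(key=lambda pair: pair[1], reverse=True)  # best first
    return predictions


# Hypothetical data, NOT the values from the article's figures.
current = {"red": 5, "orange": 2}
community = [
    ({"red": 5, "green": 5, "blue": 2}, 0.0),  # weight 1.0
    ({"red": 4, "green": 3, "blue": 4}, 1.0),  # weight 0.5
    ({"orange": 2, "green": 4}, 3.0),          # weight 0.25
]
predictions = predict_ratings(current, community)
```

Here "green" comes out ahead of "blue" because the closest neighbor rated it highly. Because the result is a weighted average of 1-5 ratings, each prediction stays on the same 1-5 scale, which keeps it directly comparable to real ratings.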
Drawbacks of User-Based Collaborative Filtering

Item-Based Collaborative Filtering

This is the simplest way to provide quality item-based recommendations. It should perform quickly on the fly, but could always be run offline as a scheduled job for your entire database. A more in-depth discussion is beyond the scope of this article, but you can visit the link below for articles that will lead you in the right direction.

Conclusion

Credit should also be given to Peter Boot who put out the first collaborative filter custom tag back in 2001. For more info on the science of collaborative filtering, you can visit http://jamesthornton.com/cf/ to find links to more than 40 articles and research papers that deal with the subject. Much research continues to be done on the science of determining which collaborative filtering algorithms work best.