(See Getting Started with SFrames for setup instructions)
import graphlab
# Limit number of worker processes. This preserves system memory, which prevents hosted notebooks from crashing.
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)
song_data = graphlab.SFrame('song_data.gl/')
Music data shows how many times a user listened to a song, as well as the details of the song.
song_data.head()  # view the first few rows of the table
graphlab.canvas.set_target('ipynb')
song_data['song'].show()
len(song_data)
unique(): drops duplicate user_id values and returns only the distinct ones.
users = song_data['user_id'].unique()
len(users)  # number of distinct users
# Split the data into training and test sets: 80% goes to training,
# and seed=0 makes the split identical on every run.
train_data, test_data = song_data.random_split(.8, seed=0)
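To illustrate why a fixed seed makes the split reproducible, here is a minimal pure-Python sketch. It is not GraphLab's implementation: the helper name random_split_rows and the row-by-row sampling strategy are assumptions made for illustration only.

```python
import random

def random_split_rows(rows, fraction, seed=0):
    """Hypothetical helper: send each row to the first part with probability `fraction`."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1000))
train_a, test_a = random_split_rows(rows, 0.8, seed=0)
train_b, test_b = random_split_rows(rows, 0.8, seed=0)

# Same seed, same split; the two parts together cover every row exactly once.
print(train_a == train_b, len(train_a) + len(test_a))
```

Because rows are assigned independently, the training part holds roughly, not exactly, 80% of the rows.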
popularity_model = graphlab.popularity_recommender.create(train_data,
                                                          user_id='user_id',
                                                          item_id='song')
A popularity model makes the same prediction for all users, so it provides no personalization.
popularity_model.recommend(users=[users[0]])
popularity_model.recommend(users=[users[1]])  # a popularity-based recommender suggests the same most-popular songs to every user
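The idea behind a popularity recommender can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not GraphLab's code: the function popularity_recommend, the toy listens list, and the choice to skip songs a user has already heard are all hypothetical.

```python
from collections import Counter

def popularity_recommend(listens, user, k=3):
    """Hypothetical helper: return the k most-listened songs the user has not heard yet.

    listens is a list of (user_id, song) pairs.
    """
    counts = Counter(song for _, song in listens)
    heard = {song for u, song in listens if u == user}
    # Rank by listen count (descending), breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, _ in ranked if song not in heard][:k]

# Toy data: song A has 3 listens, B has 2, C and D have 1 each.
listens = [("u1", "A"), ("u2", "A"), ("u3", "A"),
           ("u1", "B"), ("u2", "B"), ("u3", "C"), ("u4", "D")]

print(popularity_recommend(listens, "u4", k=2))
print(popularity_recommend(listens, "u5", k=2))
```

The ranking depends only on global listen counts, so two users who have not heard the top songs receive the same list regardless of their own taste.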
We now create a model that allows us to make personalized recommendations to each user.
# Use an item-similarity recommender
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                 user_id='user_id',
                                                                 item_id='song')
As you can see, different users get different recommendations now.
personalized_model.recommend(users=[users[0]])
personalized_model.recommend(users=[users[1]])
personalized_model.get_similar_items(['With Or Without You - U2'])
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])
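One common way to score item similarity is the Jaccard index over the sets of users who listened to each song. The sketch below assumes that measure; the notebook does not state which similarity the model uses, and the helper jaccard_similar_items and the toy data are hypothetical.

```python
from collections import defaultdict

def jaccard_similar_items(listens, song, k=3):
    """Hypothetical helper: rank other songs by Jaccard similarity of their listener sets."""
    listeners = defaultdict(set)
    for user, title in listens:
        listeners[title].add(user)

    target = listeners.get(song, set())
    scores = []
    for other, users in listeners.items():
        if other == song:
            continue
        union = target | users
        # Jaccard index: shared listeners divided by all listeners of either song.
        score = len(target & users) / len(union) if union else 0.0
        if score > 0:
            scores.append((other, score))

    # Highest similarity first, ties broken alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]

listens = [("u1", "A"), ("u2", "A"), ("u3", "A"),
           ("u1", "B"), ("u2", "B"),
           ("u3", "C"), ("u4", "C"), ("u4", "D")]

similar_to_a = jaccard_similar_items(listens, "A")
print(similar_to_a)
```

In this toy data, songs A and B share two of three listeners, so B ranks first; D shares no listeners with A and is left out.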
We now formally compare the popularity and the personalized models using precision-recall curves.
# Compare version components as integers; a plain string comparison would rank "1.10" below "1.6".
if tuple(int(part) for part in graphlab.version.split('.')[:2]) >= (1, 6):
    model_performance = graphlab.compare(test_data, [popularity_model, personalized_model], user_sample=0.05)
    graphlab.show_comparison(model_performance, [popularity_model, personalized_model])
else:
    # Render matplotlib figures inline in the notebook; matplotlib only draws the plots, it does not compute the metrics.
    %matplotlib inline
    model_performance = graphlab.recommender.util.compare_models(test_data, [popularity_model, personalized_model], user_sample=.05)
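For readers unfamiliar with the metric, here is a minimal sketch of how one point on a precision-recall curve can be computed for a single user. The function precision_recall_at_k and the sample lists are hypothetical; GraphLab's own evaluation aggregates over many users and may differ in detail.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Hypothetical helper: precision and recall of the top-k recommendations.

    recommended: ranked list of songs, best first, without duplicates.
    relevant: set of songs the user actually listened to in the test data.
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                                # share of the k recommendations that were relevant
    recall = hits / len(relevant) if relevant else 0.0  # share of relevant songs that were recommended
    return precision, recall

recommended = ["A", "B", "C", "D", "E"]
relevant = {"B", "E", "F"}

# Sweeping the cutoff k traces out one (precision, recall) point per value.
curve = [precision_recall_at_k(recommended, relevant, k) for k in range(1, 6)]
print(curve)
```

As k grows, recall can only stay level or rise, while precision typically falls; a model whose curve sits higher at the same recall makes better recommendations under this metric.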