Python Learning Notes (7): Data Visualization

1 Building a Google Maps application from geocoded data

PythonLearningNote7_1

Note: the complete code for every example in this post can be viewed here

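where.data: holds the locations to be looked up
geoload.py: stores each location, together with the data retrieved for it, in the database geodata.sqlite
geodump.py: reads the data back from geodata.sqlite and writes the GPS coordinates and place names to where.js as a JSON-like JavaScript array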

geoload.py:

import urllib
import sqlite3
import json
import time
import ssl

# If you are in China use this URL:
# serviceurl = "http://maps.google.cn/maps/api/geocode/json?"
serviceurl = "http://maps.googleapis.com/maps/api/geocode/json?"

# Deal with SSL certificate anomalies Python > 2.7
# scontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
scontext = None

conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()

cur.execute('''
CREATE TABLE IF NOT EXISTS Locations (address TEXT, geodata TEXT)''')

fh = open("where.data")
count = 0
for line in fh:
    if count > 200 : break
    address = line.strip() # strip leading and trailing whitespace
    print ''
    cur.execute("SELECT geodata FROM Locations WHERE address= ?", (buffer(address), )) # buffer() binds the raw bytes as a BLOB, which avoids Unicode decoding errors

    try:
        data = cur.fetchone()[0]
        print "Found in database ",address
        continue
    except:
        pass # not in the database yet; fall through and query the API

    print 'Resolving', address
    url = serviceurl + urllib.urlencode({"sensor":"false", "address": address})
    print 'Retrieving', url
    uh = urllib.urlopen(url, context=scontext)
    data = uh.read()
    print 'Retrieved',len(data),'characters',data[:20].replace('\n',' ')
    count = count + 1
    try: 
        js = json.loads(str(data))
        # print js  # We print in case unicode causes an error
    except: 
        continue

    if 'status' not in js or (js['status'] != 'OK' and js['status'] != 'ZERO_RESULTS') : 
        print '==== Failure To Retrieve ===='
        print data
        break

    cur.execute('''INSERT INTO Locations (address, geodata) 
            VALUES ( ?, ? )''', ( buffer(address),buffer(data) ) )
    conn.commit() 
    time.sleep(1)

print "Run geodump.py to read the data from the database so you can visualize it on a map."
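
The core of geoload.py is a "check the cache before hitting the network" pattern. The sketch below isolates that pattern so it can be run offline. It is written for Python 3 (unlike the Python 2 listing above), and `lookup` and `fake_fetch` are hypothetical names that stand in for the real HTTP request; only the `Locations` table comes from the original script.

```python
import sqlite3

def lookup(cur, address, fetch):
    """Return (geodata, from_cache); call fetch() only on a cache miss."""
    cur.execute("SELECT geodata FROM Locations WHERE address = ?", (address,))
    row = cur.fetchone()
    if row is not None:
        return row[0], True           # already stored: no network call needed
    data = fetch(address)             # cache miss: ask the (stand-in) service
    cur.execute("INSERT INTO Locations (address, geodata) VALUES (?, ?)",
                (address, data))
    return data, False

conn = sqlite3.connect(":memory:")    # in-memory database, for illustration only
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS Locations (address TEXT, geodata TEXT)")

calls = []                            # records every simulated network request

def fake_fetch(address):
    # Hypothetical stand-in for the geocoding request made by geoload.py
    calls.append(address)
    return '{"status": "OK"}'

first = lookup(cur, "Stanford University", fake_fetch)
second = lookup(cur, "Stanford University", fake_fetch)
conn.commit()
print(first[1], second[1], len(calls))
```

The second lookup is served from the database, which is why rerunning geoload.py after an interruption does not repeat requests that already succeeded.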

geodump.py:

import sqlite3
import json
import codecs

conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()

cur.execute('SELECT * FROM Locations')
fhand = codecs.open('where.js','w', "utf-8") # codecs.open sets the file's encoding; unicode strings are encoded as UTF-8 automatically when written
fhand.write("myData = [\n")
count = 0
for row in cur :
    data = str(row[1])
    try: js = json.loads(str(data))
    except: continue

    if not('status' in js and js['status'] == 'OK') : continue

    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    if lat == 0 or lng == 0 : continue
    where = js['results'][0]['formatted_address']
    where = where.replace("'","")
    try :
        print where, lat, lng

        count = count + 1
        if count > 1 : fhand.write(",\n")
        output = "["+str(lat)+","+str(lng)+", '"+where+"']"
        fhand.write(output)
    except:
        continue

fhand.write("\n];\n")
cur.close()
fhand.close()
print count, "records written to where.js"
print "Open where.html to view the data in a browser"
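
To make the shape of where.js concrete, here is a minimal Python 3 sketch of the same extraction step applied to a single response. The `sample` payload and its coordinates are made up for illustration and trimmed to the fields geodump.py reads, and `to_entry` is a hypothetical helper, not part of the original script.

```python
import json

# Hypothetical geocoding response, reduced to the fields geodump.py uses
sample = json.dumps({
    "status": "OK",
    "results": [{
        "formatted_address": "Ann Arbor, MI, USA",
        "geometry": {"location": {"lat": 42.2808, "lng": -83.743}},
    }],
})

def to_entry(raw):
    """Turn one stored response into a where.js row, or None if unusable."""
    try:
        js = json.loads(raw)
    except ValueError:                       # not valid JSON
        return None
    if js.get("status") != "OK" or not js.get("results"):
        return None
    first = js["results"][0]
    loc = first["geometry"]["location"]
    if loc["lat"] == 0 or loc["lng"] == 0:   # same guard as geodump.py
        return None
    where = first["formatted_address"].replace("'", "")
    return "[%s,%s, '%s']" % (loc["lat"], loc["lng"], where)

# Unusable rows are dropped; the rest are joined into the myData array
entries = [e for e in (to_entry(sample), to_entry("not json")) if e]
where_js = "myData = [\n" + ",\n".join(entries) + "\n];\n"
print(where_js)
```

Joining the rows with `",\n"` produces the same separators that geodump.py writes by hand with its `count > 1` check.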

2 PageRank[^1]

PythonLearningNote7_2

PythonLearningNote7_4
Pages have a many-to-many relationship with one another (if page a contains a link to page b, then a is the from_page and b is the to_page), so a join table is added between them.

The simplified tables:
PythonLearningNote7_3
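
The many-to-many relationship described above can be sketched with sqlite3 as follows. This is a minimal Python 3 illustration under assumed names: the tables `Pages` and `Links`, the columns `from_id` and `to_id`, and the three sample URLs are hypothetical and may differ from the schema shown in the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database, for illustration only
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE Links (
    from_id INTEGER REFERENCES Pages(id),
    to_id   INTEGER REFERENCES Pages(id),
    PRIMARY KEY (from_id, to_id)     -- each (from, to) pair is stored once
);
""")

urls = ["a.html", "b.html", "c.html"]
cur.executemany("INSERT INTO Pages (url) VALUES (?)", [(u,) for u in urls])
ids = dict((url, pid) for pid, url in cur.execute("SELECT id, url FROM Pages"))

# a links to b and c; c links to b
for src, dst in [("a.html", "b.html"), ("a.html", "c.html"), ("c.html", "b.html")]:
    cur.execute("INSERT OR IGNORE INTO Links (from_id, to_id) VALUES (?, ?)",
                (ids[src], ids[dst]))

# Which pages link to b.html? Join the link table back to Pages.
cur.execute("""
SELECT Pages.url FROM Links JOIN Pages ON Links.from_id = Pages.id
WHERE Links.to_id = ? ORDER BY Pages.url""", (ids["b.html"],))
inbound = [row[0] for row in cur.fetchall()]
print(inbound)
```

Each row of the join table records one directed link, so a page can appear any number of times as a source and as a target.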

[^1]: The paper by Google founders Larry Page and Sergey Brin that sets out the early ideas behind the PageRank algorithm: "The Anatomy of a Large-Scale Hypertextual Web Search Engine", http://infolab.stanford.edu/~backrub/google.html