今天跑來聽 OSSF::自由軟體鑄造場 舉辦的 Python network programming -進階 課程,紀錄上課的心得,以及講師提到的一堆重點整理,分享給大家,收穫實在是太多了,本身在南部能聽到的課程就很少,一看到有開課程,就非常開心報名參加,講師對於上課準備的講義也很用心,學習到平常看書學不到的經驗跟實作。 1. 字串處理函式
# 字串轉換小寫 string.lower # 字串轉換大寫 string.upper # 切割字串 string.split # 合併字串 string.join # 找尋字串 string.find
底下來一個範例:
import string # 切割字串 print str.split(str.lower(str.upper("hi appleboy"))) # 合併 array print str.join("",str.split("hi appleboy")) # 找尋字串 print str.find("hi appleboy", "apple") # # 字串替換 template s = string.Template("$who likes $what") print s.substitute(who='appleboy', what='eat apple')
2. 日期處理
import time from datetime import date from datetime import datetime # 時間 d = date(2005, 7, 14) print d.isoformat() print date.today().isoformat() print d - date.today()這裡講師有提到說,在 import module 的時候,希望有用到的 module 在 import 進來就可以了,這樣可以增進效能,也可以避免不需要的 load,在很多 MVC 裡面,大部分很多套件都會預先載入,可是我們在寫程式真的有用到嗎,講師提到 java,當我們想要 System.output 輸出,載入的 module 就很多,造成系統讀取速度降低阿。
3. Random 亂數處理 可以參考官方網站文件:http://docs.python.org/library/random.html 4. bsddb — Interface to Berkeley DB library 教學網站 bsddb module 提供一個 interface 介面來連接 Berkeley DB,使用者可以隨意新增 ash、btree 或 record,可以利用 pickle.dumps() 或者 marshal.dumps() 儲存。 python 預設沒有安裝這個 module,底下可以利用 FreeBSD ports 安裝
# Python bindings to the Berkeley DB library cd /usr/ports/databases/py-bsddb; make install clean給一個範例:
import bsddb db = bsddb.btopen('/tmp/spam.db', 'c') for i in range(10): db['%d'%i] = '%d'% (i*i) # 印出第四筆資料 print db['3'] # 印出所有 key 值 print db.keys() # 輸出 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] # for 迴圈把 key & value 列出來 for k, v in db.iteritems(): print k, v
5. Regular Expression 直接舉例,分別取出 IP Address 四個欄位數字
import re phonePattern = re.compile(r'^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$') a = phonePattern.search('140.123.107.249').groups() for v in a: print v投影片資料:
* ^ matches the beginning of a string. * $ matches the end of a string. * \b matches a word boundary. * \d matches any numeric digit. * \D matches any non-numeric character. * x? matches an optional x character (in other words, it matches an x zero or one times). * x* matches x zero or more times. * x+ matches x one or more times. * x{n,m} matches an x character at least n times, but not more than m times. * (a|b|c) matches either a or b or c. * (x) in general is a remembered group. You can get the value of what matched by using the groups() method of the object returned by re.search.如果您在單一程式大量重複利用 regular expression,那可以用 re.compile(pattern[, flags]) object 來讓程式更有效率,學習 Regular Expression 好處很多,可以幫助您解決字串處理,擷取您想要的字串,底下推薦一些網站給大家:
zlib(gzip) & bz2 基本的壓縮介紹,gzip 速度優於 bz2,適合用於壓縮檔案在網路傳輸方面,那 bz2 適合用在系統備份部份,可以把檔案壓縮更小,其實用的地方不太一樣,各有優缺點,那 apache2 就有提供 mod_deflate 增進傳輸效能,大量的 request 如果經過 mod_deflate 壓縮,可以大大減少網路傳輸流量。 最後當然就是介紹 python 怎麼寫抓取網頁 tag 部份,以及 Unicode 中文編碼的一些介紹,相當不錯,老師給的範例,等於是提供一隻小型的程式,可以當作指令來用,也有寫 help 用法:
def help(): print ("Usage: %s [Option] [Location]") % os.path.basename(sys.argv[0]) print ("Option: ") print ("\t%2s, %-25s %s" % ("-h","--help","show this usage message.")) print ("\t%2s, %-25s %s" % ("-v","--version","print version number.")) print ("\t%2s, %-25s %s" % ("-c","--csv","print result in csv.")) print ("Location: ") for x in allpart.keys(): print ("\t%-12s"%x)這隻程式包含了整個 python 寫程式的基本功能,看完這個 code 可以大上把老師這次上課跟上一堂課程整合在一起,包含 thread、class…等,那底下是老師寫的範例,抓取各地溫度,輸出成 csv 檔案,以及 unicode 的處理:
#!/usr/bin/env python # # Copyleft, No Right Reserved. # # kevinwatt 2006/04/25 # import urllib import os,sys,re,string import time import threading version="cwb.py version 0426" TIMEOUT=1.0*10 # timeout for the operation in seconds MAX_THREADS=4 # max thread. Taipei=['Keelung','Taipei','Yangmingshan','Taoyuan','Sinwu','Hsinchu','Guanwu','Sanyi','Jhunan'] Tainan=['Tainan','Kaohsiung','Jiasian','Sandimen','Hengchun'] Yilan=['Yilan','Su-ao','Taipingshan','Hualien','Yuli','Chenggong','Taitung','Dawu','Lanyu'] Taichung=['Taichung','Wuci','Lishan','Yuanlin','Lugang','Sun-Moon-Lake','Lushan','Hehuan-Mountain','Huwei','Caoling','Chiayi','Alishan','Yushan'] Penghu=['Penghu','Kinmen','Matsu'] allpart={'Taipei': Taipei, 'Taichung':Taichung, 'Tainan':Tainan, 'Yilan':Yilan, 'Penghu':Penghu} listtitle=['zone','datetime','rep','temp','direct','ane','max_ane','km','humidity','hPa','Rmm','uvi'] class cwb(threading.Thread): "Get weather information from www.cwb.gov.tw" def __init__(self,zonename): threading.Thread.__init__(self) self.zonename=zonename self.reg=re.compile('<([^>]|\n)*>|
|
| | ') self.retab=re.compile('tabletype1-2') self.reconvspace=re.compile(' | ') def run(self): self.contents = self.getinfo() def urlcontent(self): try: b = urllib.urlopen("http://www.cwb.gov.tw/pda/observe/"+self.zonename+".htm",proxies={}).read() except: b = "Connection false" sys.exit(0) return b def getinfo(self): b=unicode(self.urlcontent(), "cp950") list=[] firsttarget="tabletype1-1" b=b[b.find(firsttarget)+len(firsttarget)+2:] b=b[:b.find('')] b=re.sub("\n+", "\n", self.reconvspace.sub(" ",self.reg.sub("",self.retab.sub("\>--",b)))) conlist=string.split(b,"\n") info=0 for x in conlist: if x[0:2]=='--' and info<2: info+=1 # list[1] is data time. list.append(x[4:]) elif x[0:2]=='--': list.append(re.sub('\s+','', x[4:])) return list def help(): print ("Usage: %s [Option] [Location]") % os.path.basename(sys.argv[0]) print ("Option: ") print ("\t%2s, %-25s %s" % ("-h","--help","show this usage message.")) print ("\t%2s, %-25s %s" % ("-v","--version","print version number.")) print ("\t%2s, %-25s %s" % ("-c","--csv","print result in csv.")) print ("Location: ") for x in allpart.keys(): print ("\t%-12s"%x) def count_active(tail): """ returns the number of Getter threads that are alive """ num_active = 0 for g in tail: if g.isAlive(): num_active += 1 return num_active def listprint(list,style): if style=="c": i=1 for x in list: if len(list)==i and i==12: print ('"%s"' % x) elif len(list)==i and i<12: print ('"%s",""' % x) else: print ('"%s",' % x), i+=1 else: for x in list: print ("%-8s" % x), print if len(sys.argv) == 1: help() sys.exit(2) # common exit code for syntax error else: if sys.argv[1:]: arglist=sys.argv[1:] getargv=arglist[0] if allpart.has_key(getargv): location=allpart[getargv] elif allpart.has_key(arglist[len(arglist)-1]): location=allpart[arglist[len(arglist)-1]] if arglist[0:] in (['--help'], ['-h'], ['--usage'], ['-?']): help() sys.exit(0) if arglist[0:] in (['-v'],['--version']): print version sys.exit(0) if arglist[0] in ('-c','--csv') and len(arglist)>1: style="c" else: style="nor" tail=[] try: # get location len(location) except: print "Error: Could not find location" help() sys.exit(2) for zone in location: while count_active(tail) >= MAX_THREADS: #print "too many active, others wait here." time.sleep(1) g=cwb(zone) tail.append(g) g.start() # execute cwb.run() #print "there are",threading.activeCount()-1,"connecton thread started" listprint(listtitle,style) for waterlist in tail: waterlist.join(TIMEOUT) # print waterlist.getName()+":", listprint(waterlist.contents,style)