Sharing a Method for Extracting Web Page Links in Python

The code is as follows:

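#encoding:utf-8
# Python 2 only: htmllib and formatter were removed in Python 3.
import socket
import htmllib, formatter

def open_socket(host, servname):
    # Open a TCP connection to the named service (e.g. 'http') on host.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    port = socket.getservbyname(servname)
    s.connect((host, port))
    return s

# raw_input returns the typed text as-is; input() would evaluate it.
host = raw_input('Enter the host name (e.g. www.example.com):\n')
mysocket = open_socket(host, 'http')
# Minimal HTTP/1.0 request; the server closes the connection after replying.
message = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % (host,)
mysocket.sendall(message)
file = mysocket.makefile()
htmldata = file.read()
file.close()
mysocket.close()

# htmllib.HTMLParser records the href of every <a> tag in parser.anchorlist.
parser = htmllib.HTMLParser(formatter.NullFormatter())
parser.feed(htmldata)
parser.close()
print '\n'.join(parser.anchorlist)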
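
The listing above relies on htmllib and formatter, which exist only in Python 2. For readers on Python 3, the same idea can be sketched roughly as follows using html.parser and urllib.request from the standard library. This is an illustrative sketch, not the original author's code: the names AnchorCollector, extract_links and fetch_html are made up for this example, and the fallback to UTF-8 when the server does not declare a charset is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []  # same attribute name htmllib used in Python 2

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.anchorlist.append(value)


def extract_links(html_text):
    """Return the href values of all <a> tags in html_text, in document order."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.anchorlist


def fetch_html(url):
    """Download url and decode it to text (assumes UTF-8 if no charset is sent)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

To reproduce the behaviour of the original script, one could read a host name, call fetch_html('http://%s/' % host), and print the result of extract_links joined with newlines. Using urlopen instead of a raw socket also means redirects and response headers are handled by the library rather than being fed to the parser.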
