Kjwon’s Web_Crawler (using sqlite3)

Usage:
Web_Crawler <seed_uri> [domain]
ex> Web_Crawler http://www.google.com/search?q=query google.com

Functions:

UnescapeURI:
http://server.domain/page?name=%76%61%6c%75%65 -> http://server.domain/page?name=value
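The same percent-decoding can be sketched in Python. This is only an illustration and not the crawler's own code; the name unescape_uri is made up here, and urllib.parse.unquote from the standard library does the actual work.

```python
from urllib.parse import unquote


def unescape_uri(uri):
    """Decode %XX escapes in a URI (hypothetical stand-in for UnescapeURI)."""
    return unquote(uri)


print(unescape_uri("http://server.domain/page?name=%76%61%6c%75%65"))
# prints http://server.domain/page?name=value
```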

GetHtml:
Downloads the content at a given URI.
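A minimal sketch of such a download step in Python, assuming the standard urllib.request module is acceptable. The name get_html, the timeout value and the UTF-8 fallback are assumptions for illustration; the original tool may fetch pages differently.

```python
import urllib.request


def get_html(uri, timeout=10):
    """Fetch a URI and return its body as text (hypothetical stand-in for GetHtml)."""
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset (assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling get_html("http://server.domain/page") would return the page body as a string, or raise urllib.error.URLError if the request fails.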

FindNode:
Finds each “href=” in the downloaded content and saves the linked URIs to the database.
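One way to sketch this step in Python is shown below. It uses the standard html.parser module instead of a raw search for “href=”, and the table layout (a uris table with a parsed flag) is an assumption, since the post does not show the real schema. The names HrefCollector and find_node are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the value of every href attribute it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def find_node(conn, html):
    """Store every href found in html; URIs already stored are skipped."""
    collector = HrefCollector()
    collector.feed(html)
    conn.executemany(
        "INSERT OR IGNORE INTO uris (uri) VALUES (?)",
        [(href,) for href in collector.hrefs],
    )
    conn.commit()
    return collector.hrefs


# Assumed schema: one row per URI, parsed = 0 until the page has been crawled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uris (uri TEXT PRIMARY KEY, parsed INTEGER DEFAULT 0)")
find_node(conn, '<a href="http://server.domain/a">A</a> <a href="http://server.domain/b">B</a>')
```

Relative links are stored as written in this sketch; resolving them against the page URI is left out.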

Parsing:
Picks an unparsed URI from the database, then calls GetHtml and FindNode on it.
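The loop described above might look roughly like this in Python. Everything here is a sketch under assumptions: the uris table with its parsed flag, the name parse_pending, the max_pages limit, and the simple regular expression standing in for FindNode are not taken from the original program. The download function is passed in as fetch so the example runs without network access.

```python
import re
import sqlite3


def parse_pending(conn, fetch, max_pages=100):
    """Crawl unparsed URIs until none remain or max_pages is reached."""
    processed = 0
    while processed < max_pages:
        row = conn.execute("SELECT uri FROM uris WHERE parsed = 0 LIMIT 1").fetchone()
        if row is None:
            break  # nothing left to crawl
        uri = row[0]
        html = fetch(uri)  # stands in for GetHtml
        links = re.findall(r'href="([^"]+)"', html)  # stands in for FindNode
        conn.executemany(
            "INSERT OR IGNORE INTO uris (uri) VALUES (?)", [(link,) for link in links]
        )
        conn.execute("UPDATE uris SET parsed = 1 WHERE uri = ?", (uri,))
        conn.commit()
        processed += 1
    return processed


# Two fake pages that link to each other, used instead of real downloads.
pages = {
    "http://server.domain/": '<a href="http://server.domain/a">a</a>',
    "http://server.domain/a": '<a href="http://server.domain/">home</a>',
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uris (uri TEXT PRIMARY KEY, parsed INTEGER DEFAULT 0)")
conn.execute("INSERT INTO uris (uri) VALUES (?)", ("http://server.domain/",))
count = parse_pending(conn, lambda uri: pages.get(uri, ""))
```

Marking a URI as parsed only after its links are stored means an interrupted run can pick the same page up again on the next start.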

URItoFilename:
http://server.domain/page?name=value -> server.domain/page/name=value
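The mapping in that example can be reproduced with a short Python sketch. The rule used here (drop the scheme, turn “?” into “/”) is inferred from the single example above; the real function may handle more cases. The name uri_to_filename is hypothetical.

```python
def uri_to_filename(uri):
    """Map a URI to a relative file path (rule inferred from one example)."""
    without_scheme = uri.split("://", 1)[-1]  # drop "http://" if present
    return without_scheme.replace("?", "/")   # query string becomes a path part


print(uri_to_filename("http://server.domain/page?name=value"))
# prints server.domain/page/name=value
```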

CheckDomain:
Checks whether a URI matches the base domain.
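A possible version of this check in Python is sketched below. It assumes that a match means the host is the base domain itself or one of its subdomains; the post does not say how the original compares them. The name check_domain is hypothetical.

```python
from urllib.parse import urlsplit


def check_domain(uri, base_domain):
    """Return True if the URI's host is base_domain or a subdomain of it."""
    host = (urlsplit(uri).hostname or "").lower()
    base = base_domain.lower()
    return host == base or host.endswith("." + base)


print(check_domain("http://www.google.com/search?q=query", "google.com"))
# prints True
```

Comparing on the dot boundary keeps a host such as notgoogle.com from matching google.com.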

cfile9.uf.151C4F484D68A962064776.rar
cfile4.uf.1741DA4C4D68A96003CE98.exe

2011/02/08 – [Computer/Programing] – Kjwon15’s Web Crawler (WebBot)

Kjwon15’s Web Crawler (WebBot)

cfile24.uf.1738124F4D68A95F0A5239.rar
cfile30.uf.141A6A4A4D68A95F2D2DF7.exe