Kjwon’s Web_Crawler (using sqlite3)

Usage:
Web_Crawler <seed_uri> [domain]
Example: Web_Crawler http://www.google.com/search?q=query google.com

Functions:

UnescapeURI:
http://server.domain/page?name=%76%61%6c%75%65 -> http://server.domain/page?name=value
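The mapping above is ordinary percent-decoding. Here is a minimal sketch in Python using the standard library; the function name unescape_uri is my own and is not taken from the crawler's source.

```python
from urllib.parse import unquote


def unescape_uri(uri: str) -> str:
    """Decode %XX escapes in a URI (hypothetical stand-in for UnescapeURI)."""
    return unquote(uri)


print(unescape_uri("http://server.domain/page?name=%76%61%6c%75%65"))
# http://server.domain/page?name=value
```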

GetHtml:
Downloads the content at a given URI.
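A download step like this could be sketched as follows. It assumes Python's urllib rather than whatever HTTP code the crawler actually uses, and the name get_html, the timeout, and the UTF-8 fallback are all illustrative choices.

```python
from urllib.request import urlopen


def get_html(uri: str, timeout: float = 10.0) -> str:
    """Fetch a URI and return its body as text (hypothetical stand-in for GetHtml)."""
    with urlopen(uri, timeout=timeout) as resp:
        # Use the charset announced by the server, falling back to UTF-8 (an assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A real crawler would probably also want a User-Agent header and a size limit, which this sketch leaves out.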

FindNode:
Finds each “href=” in the downloaded HTML and saves the linked URIs to the database.
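Roughly, link extraction plus storage might look like the sketch below. Note that it uses Python's HTMLParser instead of searching for the literal text “href=”, and that the table name uris and its columns are assumptions, not the crawler's real schema.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical schema: one row per URI plus a flag telling whether it was parsed.
SCHEMA = ("CREATE TABLE IF NOT EXISTS uris "
          "(uri TEXT PRIMARY KEY, parsed INTEGER NOT NULL DEFAULT 0)")


class HrefCollector(HTMLParser):
    """Collects the value of every href attribute seen in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def find_node(db: sqlite3.Connection, base_uri: str, html: str) -> list:
    """Store every link found in html; returns the absolute URIs seen."""
    collector = HrefCollector()
    collector.feed(html)
    db.execute(SCHEMA)
    found = [urljoin(base_uri, href) for href in collector.hrefs]
    for uri in found:
        # INSERT OR IGNORE keeps each URI only once.
        db.execute("INSERT OR IGNORE INTO uris (uri) VALUES (?)", (uri,))
    db.commit()
    return found
```

Relative links are resolved against the page they came from, so the database only ever holds absolute URIs.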

Parsing:
Fetches an unparsed URI from the database and calls GetHtml and FindNode on it.
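The crawl loop this describes can be sketched like so. The uris table with a parsed flag is an assumed schema, the limit parameter is my addition, and get_html and find_node are passed in as plain callables so the example stays self-contained.

```python
import sqlite3

# Hypothetical schema, not taken from the crawler's source.
SCHEMA = ("CREATE TABLE IF NOT EXISTS uris "
          "(uri TEXT PRIMARY KEY, parsed INTEGER NOT NULL DEFAULT 0)")


def parse_pending(db: sqlite3.Connection, get_html, find_node, limit: int = 100) -> int:
    """Process unparsed URIs until none remain or limit is reached."""
    db.execute(SCHEMA)
    done = 0
    while done < limit:
        row = db.execute("SELECT uri FROM uris WHERE parsed = 0 LIMIT 1").fetchone()
        if row is None:
            break  # nothing left to crawl
        uri = row[0]
        try:
            find_node(db, uri, get_html(uri))
        except (OSError, ValueError):
            pass  # a failed download is still marked below so the loop moves on
        db.execute("UPDATE uris SET parsed = 1 WHERE uri = ?", (uri,))
        db.commit()
        done += 1
    return done
```

Marking a URI as parsed even when the download fails is a design choice made here to guarantee the loop terminates; retrying failed pages would need an extra status column.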

URItoFilename:
http://server.domain/page?name=value -> server.domain/page/name=value
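One way to reproduce the mapping shown above, as a sketch: drop the scheme and turn the “?” before the query into a path separator. The name uri_to_filename is hypothetical, and the behavior for anything beyond the single example (fragments, unsafe characters) is my guess.

```python
from urllib.parse import urlsplit


def uri_to_filename(uri: str) -> str:
    """Map a URI to a relative file path (hypothetical stand-in for URItoFilename)."""
    parts = urlsplit(uri)
    path = parts.netloc + parts.path
    if parts.query:
        # The query string becomes one more path component.
        path = path.rstrip("/") + "/" + parts.query
    return path


print(uri_to_filename("http://server.domain/page?name=value"))
# server.domain/page/name=value
```

This does no sanitizing, so characters that are not valid in file names on a given system would still need handling.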

CheckDomain:
Checks whether a URI belongs to the base domain.
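A domain check along these lines could look like the sketch below. It assumes that "belongs to" means the host equals the base domain or is a subdomain of it; the crawler itself may do a simpler match, and the name check_domain is mine.

```python
from urllib.parse import urlsplit


def check_domain(uri: str, base_domain: str) -> bool:
    """True if the URI's host is base_domain or one of its subdomains."""
    host = (urlsplit(uri).hostname or "").lower()
    base = base_domain.lower().lstrip(".")
    # Comparing on a dot boundary stops notgoogle.com from matching google.com.
    return host == base or host.endswith("." + base)


print(check_domain("http://www.google.com/search?q=query", "google.com"))  # True
print(check_domain("http://notgoogle.com/", "google.com"))                 # False
```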

cfile9.uf.151C4F484D68A962064776.rar
cfile4.uf.1741DA4C4D68A96003CE98.exe

2011/02/08 – [Computer/Programing] – Kjwon15’s Web Crawler (WebBot)

kjwon15

