Although there are many frameworks for crawling the web, such as Scrapy, there's no need to use them for writing simple crawlers.
The aim of this tutorial is to write a Wikipedia crawler that downloads all images on a given Wikipedia page.
Requirements:
1. Python
2. BeautifulSoup - Python package
3. urllib and urllib2 - Python 2 standard-library modules (in Python 3, urllib2 was merged into urllib.request)
Make sure the above packages are installed. Python comes preinstalled on most Linux distributions, and urllib ships with it as part of the standard library. To install BeautifulSoup, run the following command in a terminal:
$ pip install beautifulsoup4
Have a look at the code here.
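The linked code is not reproduced in this post. As a rough, hypothetical sketch of the same idea, here is a Python 3 version that uses urllib.request (the Python 3 home of urllib2's functionality) to fetch pages and BeautifulSoup to find the <img> tags. The function names fetch, extract_image_urls and download_images, the out_dir default and the User-Agent string are all illustrative assumptions, not the author's code; the sketch also does no rate limiting or error handling.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumption: Wikimedia asks clients to send a descriptive User-Agent.
# This value is a placeholder; replace it with your own tool name and contact info.
HEADERS = {"User-Agent": "simple-image-crawler/0.1 (example)"}


def fetch(url):
    """Return the raw bytes served at url."""
    req = Request(url, headers=HEADERS)
    with urlopen(req) as resp:
        return resp.read()


def extract_image_urls(html, base_url):
    """Return absolute http(s) URLs of all <img> tags in html, in page order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img", src=True):
        # Wikipedia often uses protocol-relative src values such as
        # //upload.wikimedia.org/...; urljoin resolves those (and relative
        # paths) against the page URL.
        absolute = urljoin(base_url, img["src"])
        # Skip inline data: URIs and anything else that is not a web URL.
        if absolute.startswith(("http://", "https://")) and absolute not in urls:
            urls.append(absolute)
    return urls


def download_images(page_url, out_dir="images"):
    """Download every image on page_url into out_dir and return the saved file paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in extract_image_urls(fetch(page_url), page_url):
        # Use the last path segment as the file name (hypothetical naming scheme;
        # images with the same name on the page overwrite each other).
        name = os.path.basename(urlparse(url).path) or "image"
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:
            f.write(fetch(url))
        saved.append(path)
    return saved
```

Calling download_images("https://en.wikipedia.org/wiki/Web_crawler") would save the images into an images folder in the current directory. Parsing is kept in its own function so it can be tried on a saved HTML file without touching the network. Note that the <img> tags in a Wikipedia article mostly point at resized thumbnails rather than the full-resolution originals.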