Monday 29 April 2013

Finally 2 Karma On News Combinator

I joined news combinator 57 days ago (counting from today, i.e. 30th April). Right from the moment I joined, I was trying to figure out how to increase Karma on the site. I googled it and searched on Quora too, but nothing resulted in success.

But just now I saw a 2 beside my name on news combinator, and that feeling itself was awesome. Still, one question keeps coming to my mind: why does every site, be it a URL aggregator like news combinator or a simple question-and-answer site, adopt a Karma system?

Can't we have another system to control spam?

Python Wikipedia Crawler To Get All Images Of A Page


Although many frameworks exist for crawling the web, like Scrapy, there is no need for any of them when writing a simple crawler.

The aim of this tutorial is to write a Wikipedia crawler that downloads all the images from a given Wikipedia page.

Requirements:

1. Python
2. BeautifulSoup - Python package
3. urllib and urllib2 - Python packages


Make sure the above packages are installed. Python and urllib come preinstalled on most versions of Linux. To install BeautifulSoup, type the following command in a terminal:

pip install bs4

Have a look at the code here.

Steps followed:

1. Send a request to Wikipedia to connect to the site, so that data can be transferred.

2. The URL (the variable site here) is opened, and the source code of the page is retrieved and stored in page.

3. Find every img tag in the page.

4. Save the images one by one in the output folder.
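The four steps above can be sketched roughly as follows. This is a minimal illustration, not the post's exact code: it assumes Python 3 (urllib.request in place of the urllib2 named above) and BeautifulSoup 4, and the function names `get_image_urls` and `crawl_images` are my own.

```python
import os
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def get_image_urls(html, base_url=""):
    """Step 3: find every img tag and return the absolute URL of its src."""
    soup = BeautifulSoup(html, "html.parser")
    # Skip img tags that have no src attribute at all.
    return [urljoin(base_url, img["src"])
            for img in soup.find_all("img")
            if img.get("src")]


def crawl_images(site, out_dir="output"):
    """Steps 1, 2 and 4: fetch the page, then save each image to out_dir."""
    # Wikipedia may reject requests with no User-Agent header, so set one.
    req = urllib.request.Request(site, headers={"User-Agent": "Mozilla/5.0"})
    page = urllib.request.urlopen(req).read()  # source code of the URL
    os.makedirs(out_dir, exist_ok=True)
    for url in get_image_urls(page, base_url=site):
        name = os.path.join(out_dir, os.path.basename(url))
        img_req = urllib.request.Request(
            url, headers={"User-Agent": "Mozilla/5.0"})
        with open(name, "wb") as f:
            f.write(urllib.request.urlopen(img_req).read())
```

Usage would be something like `crawl_images("https://en.wikipedia.org/wiki/Python_(programming_language)")`. Passing the page URL as `base_url` lets `urljoin` resolve the relative and protocol-relative `src` values Wikipedia uses into full URLs.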

Please contribute on git to make this crawler more useful.