Create re-usable spider to scrape information from website

In Progress, Posted Aug 26, 2014, Paid on delivery

We need a re-usable script to iterate through many web pages to pull a table of information from each page.

The script will need to iterate through a list of 600,000 URLs. Not every URL will return a table of data, so we need to record only those that return valid data.

It is very important not to crash the website that is being scraped, so the script must wait 2-3 seconds between requests to the server.
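As a rough illustration of the delay requirement, the loop could look something like the sketch below. The names `polite_fetch_all`, `fetch`, `min_delay`, `max_delay` and `sleep` are hypothetical and not part of the brief; `fetch` stands for whatever function downloads a single page.

```python
import random
import time


def polite_fetch_all(urls, fetch, min_delay=2.0, max_delay=3.0, sleep=time.sleep):
    """Yield (url, result) pairs, pausing between consecutive requests.

    `fetch` is a caller-supplied function that downloads one URL.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    for index, url in enumerate(urls):
        if index > 0:
            # Wait a random 2-3 seconds before every request except the first.
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch(url)
```

In a real run, `fetch` might wrap `urllib.request.urlopen` or a similar HTTP client. At one request every 2-3 seconds, 600,000 URLs would take roughly two to three weeks of continuous running, so being able to stop and resume is probably worth planning for.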

The results of the scraping should be stored in a CSV file.
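A minimal sketch of the extraction and storage step, using only the Python standard library. The brief does not name the target site or the table layout, so this assumes the data sits in an ordinary HTML `<table>`. The names `TableParser`, `extract_rows` and `append_rows` are hypothetical. A page with no table rows yields an empty list, so nothing is written for that URL.

```python
import csv
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> on a page as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in `html`, or [] if there are none."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def append_rows(csv_file, url, rows):
    """Write each row to the CSV, prefixed with its source URL."""
    writer = csv.writer(csv_file)
    for row in rows:
        writer.writerow([url] + row)
    return len(rows)
```

Prefixing each row with its URL keeps a record of which pages returned valid data. The output file would typically be opened with `open(path, "a", newline="")` so that results are appended as the run progresses.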

Python

Project ID: #6372625

About the project

Online project, Active Aug 26, 2014