Website crawler and extractor

In Progress. Posted Aug 23, 2014. Paid on delivery.

Please make sure you read this job description completely before you apply. There are many details that you must know.

I am looking for someone who can create a desktop application that will extract and recreate a website archived on [login to view URL].

The main steps the software must follow are:

• Crawl the specified website to find all of its URLs available on [login to view URL]

• Download all pages and documents, including video, audio, PDF, images, CSS, JavaScript, etc.

• Modify the URL structure (all pages and documents must be linked together correctly), so that the website can be uploaded to another domain/directory.

• Save the results in a ZIP file
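
The last two steps (rewriting links and saving a ZIP) can be sketched as follows. This is only a minimal illustration, not a finished tool: it assumes the pages have already been downloaded into a dictionary keyed by their original URLs, it rewrites only `href` and `src` attributes with a simple regular expression, and it ignores query strings. The function names (`local_path`, `relative_url`, `rewrite_links`, `build_zip`) are hypothetical.

```python
import io
import posixpath
import re
import zipfile
from urllib.parse import urldefrag, urljoin, urlsplit

# Crude matcher for href/src attributes; a real tool would likely use an HTML parser.
LINK_ATTR = re.compile(r'\b(href|src)=(["\'])(.*?)\2', re.I)


def local_path(url: str) -> str:
    """Map an original URL to a file path inside the ZIP (query strings are ignored)."""
    path = urlsplit(url).path.lstrip("/")
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumption: directory URLs are served from index.html
    return path


def relative_url(page_url: str, target_url: str) -> str:
    """Express target_url relative to page_url so links survive a move to another domain/directory."""
    page_dir = posixpath.dirname(urlsplit(page_url).path or "/")
    target_path = urlsplit(target_url).path or "/"
    rel = posixpath.relpath(target_path, page_dir)
    if target_path.endswith("/") and not rel.endswith("/"):
        rel += "/"  # keep directory-style URLs as they were on the original site
    return rel


def rewrite_links(html: str, page_url: str, known: set) -> str:
    """Rewrite links to downloaded files as relative URLs; leave external links untouched."""
    def repl(match):
        attr, quote, link = match.groups()
        target, fragment = urldefrag(urljoin(page_url, link))
        if target not in known:
            return match.group(0)  # external or not downloaded
        rel = relative_url(page_url, target)
        if fragment:
            rel += "#" + fragment
        return f"{attr}={quote}{rel}{quote}"

    return LINK_ATTR.sub(repl, html)


def build_zip(pages: dict) -> bytes:
    """Write every downloaded file into an in-memory ZIP, rewriting links in HTML files."""
    buf = io.BytesIO()
    known = set(pages)
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for url, body in pages.items():
            path = local_path(url)
            if path.endswith((".html", ".htm")):
                text = body.decode("utf-8", errors="replace")
                body = rewrite_links(text, url, known).encode("utf-8")
            zf.writestr(path, body)
    return buf.getvalue()
```

Using relative links is one way to make the same ZIP work in a domain root or in a subdirectory; the bytes returned by `build_zip` would then be written to the chosen output path.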

Important:

• It MUST work for ALL websites, including websites that use server-side programming, such as PHP, ASP and databases. (I know this sounds challenging, but if you apply for this job or contact me, I will show you a shortcut for this.)

• All dead links must be redirected to the root directory, using a permanent 301 redirect (this is a simple line of code to add to an .htaccess file).

• Some content will need to be included in a [login to view URL] file in the root directory (I will provide this content).

• External links must work

• The sites extracted from [login to view URL] must work on a different domain/directory than the original one

• You need to extract the content up to 10 pages deep in the website.

• There must be no [login to view URL] footprint left in the source code when I upload a website to a domain.

• The URLs shown in the browser's address bar must be the same as they were on the original site. (If the site is uploaded to the same domain, of course; otherwise only the domain name in the URL differs.)
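
Three of the requirements above (no archive footprint, the 10-level depth limit, and the 301 redirect for dead links) can be illustrated with a short sketch. It rests on several assumptions that are not stated in this posting: that snapshot links look like `web.archive.org/web/<timestamp><optional modifier>/<original URL>`, that "10 pages deep" means ten link hops from the start page, and that the target server is Apache with mod_rewrite enabled. The names `strip_footprint`, `crawl` and `HTACCESS_301` are hypothetical, and the rewrite rules are one possible approach rather than the one-line rule mentioned above.

```python
import re
from collections import deque

# Assumed snapshot prefix, absolute or site-relative, followed by the original URL.
WAYBACK_PREFIX = re.compile(
    r"(?:(?:https?:)?//web\.archive\.org)?/web/\d{1,14}(?:[a-z]{2}_)?/(?=https?:)",
    re.I,
)

# One possible Apache rule set: send any request for a missing file or
# directory to the site root with a permanent (301) redirect.
HTACCESS_301 = """RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ / [R=301,L]
"""


def strip_footprint(html: str) -> str:
    """Remove the assumed archive prefix so only the original URLs remain."""
    return WAYBACK_PREFIX.sub("", html)


def crawl(start: str, get_links, max_depth: int = 10) -> list:
    """Breadth-first crawl that stops following links beyond max_depth hops.

    get_links is a caller-supplied function returning the links found on a URL.
    """
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # page is kept, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

The redirect target `/` assumes the site sits in the domain root; for a subdirectory install the rule would need that directory's path instead.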

Keep reading, there is more important information below.

The inputs for the software are:

• The URL from the Wayback Machine, e.g. [login to view URL]://[login to view URL]

• The path where I want to save the ZIP file on my computer

The output:

• A ZIP file that contains all the files (HTML, images, video, WAV audio, etc.) that I can upload to the domain/directory of my choice.
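
The two inputs and the single output described above map naturally onto a small command-line interface. The sketch below is only one way to express that contract; the argument names `snapshot_url` and `zip_path` are hypothetical, and a desktop GUI could collect the same two values instead.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two inputs: the archived snapshot URL and the ZIP output path."""
    parser = argparse.ArgumentParser(
        description="Extract an archived website into a ZIP file."
    )
    parser.add_argument("snapshot_url", help="URL of the archived site to extract")
    parser.add_argument("zip_path", type=Path, help="where to save the resulting ZIP file")
    args = parser.parse_args(argv)
    if args.zip_path.suffix.lower() != ".zip":
        parser.error("zip_path must end in .zip")
    return args
```

Passing a list to `parse_args` makes it easy to exercise without a real command line; with no argument it reads `sys.argv` as usual.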

To summarize, I want software that lets me extract a website from [login to view URL] into a ZIP file, so that I can then upload and extract this ZIP on a domain and have the website working immediately, without any further work from me.

Keep reading...

If you apply for this job I will show you a couple of websites that were extracted using this method, so you can see how it was done.

IMPORTANT: When you apply for this job, please have "ARCHIVE" in the title of your job application or cover letter.

If I don't see ARCHIVE in the title I will not open it and you don't get the job. I have to do this to make sure freelancers read the job description before they apply.

Additional note:

I will be the sole owner of the resulting software, and I require all source files.

Software Development, Web Scraping

Project ID: #6359986

About the project

10 proposals. Online project. Active Sep 9, 2014.

10 freelancers are bidding an average of $368 for this job

mhmhz

Hi, ready to provide a desktop solution if you are interested. Thanks.

$789 USD in 5 days
(72 Reviews)
7.1
aliarshad9691

Hi, I read your requirements and would like to discuss them with you. I can develop such a system for you. Please let me know when you are free to discuss the project details. I am interested in developing an online system t...

$222 USD in 7 days
(2 Reviews)
5.4
Blackhatwarrior

ARCHIVE Hi there, I'm an automation expert and have created many Windows-based applications for automation. Do send more info regarding the project. Will give it a shot. Awaiting your response! Thanks!

$450 USD in 3 days
(12 Reviews)
4.6
KomalKosare

A proposal has not yet been provided

$327 USD in 10 days
(1 Review)
2.1
balamuralip

Hi sir, I can do this job and write a crawler or extractor covering all your requirements. Looking forward to working with you. balamurali

$277 USD in 3 days
(0 Reviews)
2.4
sflogics

Hi, we’ve had a good look at your description and we’re very interested in providing a solution. We are a professional development company headquartered in Pakistan. Potentially we have deployed data processing s...

$200 USD in 30 days
(0 Reviews)
0.0
socfocus

I will give you your final deliverable prior to payment. You won't pay unless you are 100% satisfied. Please see my resume. I have programmed for some of the largest companies in the world. I am bidding this job chea...

$740 USD in 30 days
(0 Reviews)
0.0