Need an expert data scientist developer to develop a set of scrapers to scrape firmware files and their info from several vendors' websites -- 2

We need a set of crawlers, based on the Scrapy framework, that download and synchronize all product firmware (including all versions) from the web pages of a predefined list of vendors and store the firmware information (metadata) in an SQLite database. The mandatory metadata fields are: Manufacturer, Model, Version, Type, Name, Release Date (if available), Download link, and the calculated SHA-2 hash of the file. For example: (Cisco, Video Surveillance 6030 IP Camera, 2.7.0, IP Camera, [login to view URL], 21/08/2015, "link", "Sha2"). There is also a non-mandatory boolean field that indicates whether the device is discontinued, depending on whether the vendor's website provides that information. The firmware files themselves will be stored in the file system and referenced from SQLite. The developer is required to follow the DB schema and code templates provided by us. It is also the developer's responsibility to test each crawler and ensure the solution is complete, with full coverage of the firmware files and product pages.
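To make the storage requirement concrete, here is a minimal sketch of how the metadata record, the SHA-2 hash, and the "already downloaded" check might fit together using only the Python standard library. The actual DB schema is supplied by the client and is not shown in this posting, so the table name `firmware`, every column name, the ISO date format, the choice of SHA-256 as the SHA-2 variant, and the helper names `sha256_of_file`, `is_known`, and `record_firmware` are all assumptions made for illustration.

```python
import hashlib
import sqlite3

# Hypothetical schema: the real one is provided by the client.
SCHEMA = """
CREATE TABLE IF NOT EXISTS firmware (
    id            INTEGER PRIMARY KEY,
    manufacturer  TEXT NOT NULL,
    model         TEXT NOT NULL,
    version       TEXT NOT NULL,
    type          TEXT NOT NULL,
    name          TEXT NOT NULL,
    release_date  TEXT,                  -- optional; format is an assumption
    download_link TEXT NOT NULL UNIQUE,  -- used here as the "seen before" key
    sha256        TEXT NOT NULL,         -- SHA-256 chosen as the SHA-2 variant
    discontinued  INTEGER,               -- optional flag: 1, 0, or NULL if unknown
    file_path     TEXT NOT NULL          -- where the file lives on disk
);
"""


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large firmware images are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known(conn, download_link):
    """Return True if this download link was already recorded by an earlier run."""
    row = conn.execute(
        "SELECT 1 FROM firmware WHERE download_link = ?", (download_link,)
    ).fetchone()
    return row is not None


def record_firmware(conn, meta, file_path):
    """Insert one firmware row; `meta` is a dict holding the scraped fields."""
    conn.execute(
        """
        INSERT INTO firmware (manufacturer, model, version, type, name,
                              release_date, download_link, sha256,
                              discontinued, file_path)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """,
        (
            meta["manufacturer"], meta["model"], meta["version"],
            meta["type"], meta["name"],
            meta.get("release_date"),      # optional field
            meta["download_link"],
            sha256_of_file(file_path),     # computed from the stored file
            meta.get("discontinued"),      # optional field
            str(file_path),
        ),
    )
    conn.commit()
```

A crawler built along these lines would call `is_known` before requesting a file and `record_firmware` after the download has been written to disk. Keying the check on the download link alone is a simplification; a vendor that replaces a file under the same URL would need a different key, such as the hash or the version.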

There are no GUI components on the server that runs crawlers. Therefore, headless browsing mode should be used.

Solution Scope

1. Crawlers will be written per vendor. This is required because each vendor website will have its own implementation of the firmware download page.

2. The user should be able to pause and resume crawling jobs.

3. Crawlers should detect previously downloaded files and download only new and updated content and firmware files. On its first execution, each crawler will download all available firmware files; subsequent runs will download only firmware files added since the last crawl. This will be achieved by analysing the data available in SQLite and skipping files that have already been downloaded and processed.

4. The developer is required to manually analyze each provided vendor site before writing a crawler, to identify the following required information:

a. URLs for the firmware download page, including all firmware versions for each product

b. URLs/files for each product that contain the information to be scraped: "Manufacturer", "Model", "Version", "Type", "Release Date", and whether the product is discontinued

c. Credential requirements (simple signups, specific signups, no signups)

d. Any CAPTCHA on the page

e. Any honeypot traps

5. If a vendor site requires credentials for firmware download, the developer is required to sign up for an account using a Gmail address dedicated to this project.

6. The script will try to imitate human-like behaviour (to a limit) while scraping the web pages, and use Tor if required, so that any scraper/crawler detection logic on the vendor site can be evaded. This can be achieved by adding random delays and random view times, and by avoiding honeypot traps identified through manual analysis.
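The pause/resume and random-delay requirements above map fairly directly onto standard Scrapy settings, so a per-vendor `settings.py` might start from something like the sketch below. The setting names are real Scrapy settings; the values, the `crawls/example_vendor` directory, and the `random_view_time` helper are illustrative assumptions. Tor routing and honeypot avoidance are not covered here.

```python
import random

# Candidate entries for a Scrapy project's settings.py (values are illustrative).
DOWNLOAD_DELAY = 5                   # base delay, in seconds, between requests to a site
RANDOMIZE_DOWNLOAD_DELAY = True      # wait a random 0.5x to 1.5x of DOWNLOAD_DELAY
AUTOTHROTTLE_ENABLED = True          # adapt the delay to the observed server latency
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # one request at a time per vendor site

# Persist scheduler state so a crawl can be stopped and later resumed.
# The directory name is hypothetical; use one directory per crawl job.
JOBDIR = "crawls/example_vendor"


def random_view_time(min_seconds=2.0, max_seconds=8.0):
    """Hypothetical helper: how long to dwell on a page in a headless browser."""
    return random.uniform(min_seconds, max_seconds)
```

With `JOBDIR` set, stopping the spider gracefully (a single Ctrl-C) and running the same `scrapy crawl` command again should continue from the saved state. The same setting can be passed on the command line with `-s JOBDIR=...` instead of being hard-coded.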

Solution Brief

*The crawler set is expected to cover 100 vendors (each vendor could be quite different from the others). Milestones are defined per vendor, and each milestone is at most €50, paid after we verify the completeness of the crawler and see no errors. The developer MUST test the completeness of each crawler before delivering it to us and present evidence of test completion in the form of a populated SQLite database for that vendor.

*The NDA must be signed before the beginning of the project.

*Please apply only if you have fully read and understood the project and agree to the conditions.

Skills: Scrapy, Web Scraping, Python, Data Science, Software Development

About the employer:
( 4 reviews ) Brussels, Belgium

Project ID: #25625646

18 freelancers are bidding an average of €3948 for this job


Hello Zahra K., please discuss your project with me in more detail (Need an expert data scientist developer to develop a set of scrapers to scrape firmware files and their info from several vendors websites - ...

€3500 EUR in 36 days
(9 reviews)

Hi, I understand the firmware, and I have built a couple of projects using Scrapy and Python. I can crawl a thousand or so websites with my current system. Please send me a private chat message.

€3176 EUR in 8 days
(4 reviews)

Hello, I'm a data scientist with extensive expertise and a mathematician with a number of publications. I'm also a participant in and problem writer for many algorithm competitions (Topcoder, ACM ICPC). Feel free to cont...

€4000 EUR in 7 days
(28 reviews)

Hi sir, I am a Web & Data Scraping Expert with over 6 years of experience. I am very happy to bid on your job. I have already worked on several similar projects collecting data, contact, and business information such a...

€4000 EUR in 7 days
(6 reviews)

Greetings, Encourage Infotech is a team of leading Python developers that specializes in the scraping and data mining industry. Our previous work includes scraping in the retail and gambling industries to analyse the Up...

€4000 EUR in 7 days
(11 reviews)

I am a full-time full-stack freelancer with 7+ years of experience and have a team working on web app development and design (specialising in CRM, ERP, e-commerce, website development and design, Android apps, iOS apps, web-...

€4000 EUR in 7 days
(4 reviews)

Hi there, I have several years of experience building advanced web scraping scripts. My price will be €35 per site, and I will be able to create scrapers for the 100 vendors in 20 days. Hit me up with a PM if you're in...

€3500 EUR in 20 days
(4 reviews)

Hi, I hope you are doing well! Infogrex Technologies is an IT technology company based in Hyderabad, India. We are a diverse team of Data Scientists, Market Researchers, Analysts, Programmers, and Project Managers. ...

€3242 EUR in 10 days
(1 review)

Hi, I hope you are doing well. I am a full-time freelancer working as a data science developer with 3+ years of experience, and I have worked with 100+ clients. Let's have a chat to discuss in detail as per your...

€3200 EUR in 30 days
(1 review)

This will be an ongoing project, better put on an hourly basis, since every vendor website is different and every scraper's code will be different from the previous one. Although the data collected will be dumped in the necessary format, ...

€4000 EUR in 7 days
(1 review)

Hi, Manager! How are you? I have gone through your requirements and I am very pleased because this job is a good fit for my skill set. I have been working as a full-stack developer for 7+ years and I have a good experi...

€4000 EUR in 20 days
(1 review)

Dear Sir, Greetings for the day! AYN INFOTECH is India's Fastest Growing IT Software Consulting Company with the latest Tech Stack powered by AI. We have a huge amount of experience in Web and application deve...

€5000 EUR in 45 days
(0 reviews)

Hey! I’ve carefully checked your requirements and am really interested in this job. I’m a full-stack Node.js developer working on large-scale apps as a lead developer with U.S. and European teams. I’m offering best qualit...

€4440 EUR in 7 days
(0 reviews)

Narinder Alliance Technologies LLC: an IT consulting and software development company. We have a team of experts in web design, application development, and databases. We have worked on various projects across various in...

€5000 EUR in 120 days
(0 reviews)

Hi, Relevant Skills and Experience: PHP, Android, Software architecture, MySQL, React Native, HTML, CSS, Bootstrap, JavaScript, jQuery, AngularJS, Node.js, C programming, Python, Java, WordPress, Drupal, Joomla, ...

€4000 EUR in 7 days
(0 reviews)

Hi, I have read all your project requirements carefully. I can start by signing the NDA and finish with correct scraping. After building the scraping tool, I can maintain and improve its functions over the long term. Regards.

€4000 EUR in 30 days
(0 reviews)

I am an Information Technology enthusiast with a Bachelor of Science degree in Information Technology. I specialize in data science with sub-skills in data analysis and visualization in Python, Excel, and Tableau, software ...

€4000 EUR in 3 days
(0 reviews)

Hi sir, how are you? Thanks for taking your valuable time to review my proposal. I have read your project description very carefully. I am pretty sure that I am the best candidate for this job. The reason is that I have a rich ...

€4000 EUR in 35 days
(0 reviews)