Break up Wikipedia pages into articles

€8-30 EUR

Cancelled
Posted more than 6 years ago

Paid on delivery
PRELIMINARY NOTE: this project requires parsing content from Wikipedia. Wikipedia is released under a Creative Commons license, so this is not only perfectly legal but encouraged. Wikipedia has a page advising users on how to do exactly this, and we won't be scraping the website; we will use a downloadable version that Wikipedia itself provides.

---

We need to break up every page on Wikipedia into multiple articles. For instance, this article: [login to view URL] is already divided into the following contents:

1 Etymology
2 History
  2.1 Prehistory
  2.2 Bronze Age
  2.3 Iron Age
  2.4 Migration period
  2.5 Viking Age
  2.6 Kalmar Union
  2.7 Union with Denmark
  2.8 Union with Sweden
  2.9 Dissolution of the union
  2.10 First and Second World Wars
  2.11 Post-World War II history
3 Geography
  3.1 Climate
  3.2 Biodiversity
  3.3 Environment
4 Politics and government
  4.1 Administrative divisions
  4.2 Judicial system and law enforcement
  4.3 Foreign relations
  4.4 Military
5 Health
6 Economy
  6.1 Resources
    6.1.1 Oil fields
  6.2 Transport
7 Demographics
  7.1 Migration
    7.1.1 Emigration
    7.1.2 Immigration
  7.2 Religion
  7.3 Largest cities of Norway
  7.4 Education
  7.5 Languages
8 Culture
  8.1 Human rights
  8.2 Religion
  8.3 Cinema
  8.4 Music
  8.5 Literature
  8.6 Research
  8.7 Architecture
  8.8 Art
  8.9 Cuisine
  8.10 Sports
9 International rankings
10 See also
11 Notes
12 References
13 Bibliography

On Wikipedia, these links point to an area of the page. Instead, we need each area of the page to be a standalone article, so that we can import it as a module. For each Wikipedia article processed, we need to generate a JSON file or database entry with metadata such as the page title and the category the page was filed under, plus an array of the articles generated from it (including the article introduction, which is not listed under "Contents").
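The splitting step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a specified implementation: it assumes each page's HTML marks sections with plain `<h2>` to `<h6>` headings (so `<h2>` maps to "2", `<h3>` to "2.1", and so on, as in the contents list) and it uses regular expressions instead of a full HTML parser. HTML taken from a real ZIM dump may wrap sections in extra elements, which this sketch does not handle. The function names, the number "0" for the introduction, the title "Introduction", and the metadata field names are all hypothetical.

```python
import json
import re

# Assumption: section headings are plain <h2>..<h6> elements with no nested headings.
HEADING_RE = re.compile(r"<h([2-6])\b[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def split_sections(html):
    """Split one page's HTML into the introduction plus one entry per heading.

    Numbers mimic Wikipedia's contents list: <h2> -> "1", <h3> -> "1.1", and so on.
    The introduction (text before the first heading) gets the hypothetical number "0".
    """
    matches = list(HEADING_RE.finditer(html))
    intro_end = matches[0].start() if matches else len(html)
    sections = [{"number": "0", "title": "Introduction", "html": html[:intro_end].strip()}]

    counters = []  # one counter per heading level currently open
    for i, match in enumerate(matches):
        depth = int(match.group(1)) - 1  # <h2> is depth 1
        counters = counters[:depth]      # close deeper levels
        while len(counters) < depth:     # open any skipped levels at 0
            counters.append(0)
        counters[-1] += 1
        # A section runs until the next heading of any level, or the end of the page.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        sections.append({
            "number": ".".join(str(c) for c in counters),
            "title": TAG_RE.sub("", match.group(2)).strip(),
            "html": html[match.end():end].strip(),
        })
    return sections


def build_metadata(page_title, category, sections):
    """Hypothetical per-page JSON record: title, category and the generated articles."""
    return {
        "title": page_title,
        "category": category,
        "articles": [{"number": s["number"], "title": s["title"]} for s in sections],
    }


if __name__ == "__main__":
    # Made-up sample page for illustration only.
    page = ("<p>Intro.</p><h2>Storia</h2><p>A</p>"
            "<h3>Preistoria</h3><p>B</p><h2>Geografia</h2><p>C</p>")
    sections = split_sections(page)
    print(json.dumps(build_metadata("Norvegia", "Stati europei", sections),
                     ensure_ascii=False, indent=2))
```

One design choice worth noting: a section's HTML here stops at its first subsection, so "2 History" would hold only the text between its heading and "2.1". If each top-level article should also contain its subsections, the end index would instead be the next heading of the same or a higher level.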
If opting for JSON files, we could have a folder with the articles saved as individual HTML files (for instance, "1 [login to view URL]", "2 [login to view URL]", and "[login to view URL]" for the introduction).

We also need to generate a JSON file with the tree of all categories on Wikipedia.

Because Wikipedia is CC-licensed, anyone can download it. The software will need to parse the ZIM file containing all articles. We will be using the Italian version (downloadable here: [login to view URL], file [login to view URL]). While the locale shouldn't matter, we will ultimately need to populate Imparato with content from the Italian version.

The software should ideally run from the command line on Unix systems, with something like:

zim-extract-categories --zim-file [login to view URL] --dest .
zim-extract-articles --zim-file [login to view URL] --dest . --category 22
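The output side of the two commands can be sketched as follows. This is a hedged sketch: it does not read the ZIM file itself, and it assumes the category rows (id, parent id, name) and the per-page sections have already been obtained by some earlier step that is not shown. Wikipedia's real category graph allows several parents per category and can contain cycles, so the single-parent tree built here is a simplification. All function names and output file names are hypothetical; only the flags `--zim-file`, `--dest` and `--category` come from the brief.

```python
import argparse
import json
import os


def parse_args(argv, with_category):
    """Hypothetical parser mirroring the flags in the brief (--zim-file, --dest, --category)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--zim-file", required=True, help="path to the .zim dump")
    parser.add_argument("--dest", default=".", help="output directory")
    if with_category:
        parser.add_argument("--category", type=int, required=True, help="category id")
    return parser.parse_args(argv)


def build_category_tree(rows):
    """Nest flat (id, parent_id, name) rows; rows with no known parent become roots.

    Assumes each category has exactly one parent and the graph has no cycles.
    """
    nodes = {cid: {"id": cid, "name": name, "children": []} for cid, _, name in rows}
    roots = []
    for cid, parent, _ in rows:
        if parent in nodes:
            nodes[parent]["children"].append(nodes[cid])
        else:
            roots.append(nodes[cid])
    return roots


def write_json(data, dest, filename):
    """Write data as UTF-8 JSON, keeping accented Italian titles readable."""
    os.makedirs(dest, exist_ok=True)
    path = os.path.join(dest, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    return path


def write_sections(sections, dest):
    """Save each section dict ({"number", "html"}) as <number>.html (hypothetical naming)."""
    os.makedirs(dest, exist_ok=True)
    paths = []
    for s in sections:
        path = os.path.join(dest, f"{s['number']}.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(s["html"])
        paths.append(path)
    return paths
```

In this sketch, `zim-extract-categories` would call `parse_args(sys.argv[1:], with_category=False)`, build the tree, and save it with something like `write_json(tree, args.dest, "categories.json")`; `zim-extract-articles` would pass `with_category=True` and write one HTML file per section plus the page's metadata record. The file name `categories.json` is an assumption.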
Project ID: 15237610

About the project

2 proposals
Remote project
Active 7 years ago

2 freelancers are bidding an average of €79 EUR for this job
I am an experienced Wikipedia editor and have published more than 100 pages. I have also created pages for clients on Freelancer.com; you can visit my profile and read their reviews.
Relevant Skills and Experience
I assure you that your page will go live and stay there forever. I am fully aware of how to write content for a Wiki page and how to publish the page according to Wikipedia's policies and rules.
Proposed Milestones
€150 EUR - First, for research and content writing
Additional Services Offered
€70 EUR - For publishing the same page in any other language (if a translation is provided)
I have questions but can't ask them here due to the word limit. Send me a message so we can discuss.
€150 EUR in 1 day
5.0 (22 reviews)
5.6
Hello, my name is Antonio and I'm a native Italian translator, copywriter and e-commerce specialist.
Relevant Skills and Experience
Check my reviews to learn more about me, and don't hesitate to contact me for any need. Thank you for your attention! Antonio
Proposed Milestones
€8 EUR - ITALIAN
€8 EUR in 1 day
5.0 (1 review)
0.9

About the client

Italy
0.0
0
Payment method verified
Member since Jan 22, 2016

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)