Hello,
I implemented a similar project a few years ago, though it was somewhat more complex: it processed thousands of text documents on a cluster of distributed machines. So I have solid experience with Information Retrieval techniques.
Based on your description, I would first automatically clean up each document (remove punctuation, etc.) and then extract the plain words. These words are combined into n-grams (for n = 1, ..., m, with a user-defined m), weighted with "term frequency - inverse document frequency" (TF-IDF), and finally the documents are compared with cosine similarity. This produces a score from 0 (not similar) to 1 (identical).
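To give you an idea, here is a minimal sketch of that pipeline with scikit-learn; the sample documents and the choice of m = 2 are just placeholders for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for your real documents.
docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A quick brown dog jumps over a lazy fox!",
    "Completely unrelated text about databases.",
]

m = 2  # user-defined upper bound for the n-gram length
# TfidfVectorizer lowercases and strips punctuation during tokenization,
# builds all n-grams for n = 1..m, and applies TF-IDF weighting.
vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, m))
tfidf = vectorizer.fit_transform(docs)  # rows: documents, columns: n-grams

# Pairwise cosine similarity; entries lie in [0, 1] for TF-IDF vectors.
scores = cosine_similarity(tfidf)
print(scores.round(2))
```

The diagonal is 1 (each document compared with itself), and the two fox/dog sentences score noticeably higher with each other than with the unrelated third document.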
Based on the score, it is possible to identify all documents DS which are similar to D by thresholding the score. And of course it is also possible to identify the k most similar documents.
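Both retrieval modes are a few lines on top of the similarity scores. A sketch, with a made-up corpus, query, threshold, and k purely for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "apples and oranges are fruit",
    "apples are a popular fruit",
    "cars and trucks are vehicles",
    "oranges are citrus fruit",
]
query = "which fruit is similar to apples"  # stands in for document D

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)
query_vec = vectorizer.transform([query])  # reuse the fitted vocabulary

# One similarity score per corpus document.
scores = cosine_similarity(query_vec, doc_matrix).ravel()

# Mode 1: all documents DS above a similarity threshold.
threshold = 0.1  # would be tuned on your data
similar = [i for i, s in enumerate(scores) if s >= threshold]

# Mode 2: the k most similar documents, best first.
k = 2
top_k = np.argsort(scores)[::-1][:k]
```

The vehicle document shares no terms with the query, so its score is 0 and it never appears in the top-k results.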
I would implement it in Python with the scikit-learn package (BSD license).
If you have any questions, do not hesitate to send me a message.
Sincerely,
Sebastian