Building a Citation Tracker

$500-5000 USD

Cancelled

Created about 15 years ago

Paid on delivery

## Deliverables

I would like to build a website for tracking and storing citations to papers. There are services like Google Scholar, Live Academic, Citeseer, and so on, but they are good only up to a point: they do not keep "clean" citations, they index only academic papers (and not, for example, academic seminars or popular press mentions), and they do not keep copies of papers that may stop being available online. The service should therefore offer the following capabilities:

* The system should support multiple users.
* A user should be able to register and then enter the list of publications that they want to track.
* For each publication we have a title, a list of authors, a venue type (conference, journal, workshop), a venue title, a publication date/year, and some additional details like volume, month, number, acceptance rate, etc. Additionally, we may want fields like a DOI identifier, a pointer to the PDF, a pointer to PPT slides, and so on.
* Papers should also be taggable (e.g., with a "project name").
* Each publication should have a list of "cited by" papers. These are the papers that cite a given paper, and they can be entered by the user. (See also the list of additions below, which can make this process better.)
* In addition to other publications, the system should also allow users to add other types of citations (e.g., "Patent", "Thesis project", "Used in academic seminar", "Mentioned in a blog", "Mentioned in press", and so on). Since many of these citations are transient and the web page may disappear, we want the ability to store these pages.
* The system should have a "report" function that generates a report with the total number of citations for each paper, potentially broken down by type, and so on. I can provide an example of a citation report.
* Ability to import/export publications as BibTeX entries, both for the user's own papers and for the papers that cite them.

These are the minimum requirements for the system.
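The data requirements above (open-ended publication fields, tags, typed "cited by" entries, a report broken down by type, BibTeX export) can be illustrated with a small data model. This is a minimal in-memory sketch, assuming Python dataclasses; the class and function names, the free-form type strings, the `archived_copy` field, and the simplified BibTeX mapping (journals -> `@article`, everything else -> `@inproceedings`) are hypothetical choices rather than part of the specification, and storage (e.g. SimpleDB) is left out.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Citation:
    """One item that cites a tracked publication (a paper, patent, blog post, ...)."""
    kind: str                            # free-form type, e.g. "Paper", "Patent", "Mentioned in a blog"
    title: str
    url: Optional[str] = None
    archived_copy: Optional[str] = None  # hypothetical: path to the locally stored copy of the page/PDF


@dataclass
class Publication:
    title: str
    authors: List[str]
    venue_type: str                      # "conference", "journal", "workshop", ...
    venue_title: str
    year: int
    tags: List[str] = field(default_factory=list)        # e.g. project names
    extra: Dict[str, str] = field(default_factory=dict)  # open-ended: doi, volume, pages, pdf_url, ...
    cited_by: List[Citation] = field(default_factory=list)


def citation_report(pubs: List[Publication]) -> Dict[str, Counter]:
    """Per-publication citation counts, broken down by citation type."""
    return {pub.title: Counter(c.kind for c in pub.cited_by) for pub in pubs}


def to_bibtex(pub: Publication, key: str) -> str:
    """Render a publication as a minimal BibTeX entry (no escaping of special characters)."""
    # Simplified mapping (assumption): journals -> @article, everything else -> @inproceedings.
    entry_type = "article" if pub.venue_type == "journal" else "inproceedings"
    venue_field = "journal" if entry_type == "article" else "booktitle"
    fields = {
        "title": pub.title,
        "author": " and ".join(pub.authors),  # BibTeX separates authors with "and"
        venue_field: pub.venue_title,
        "year": str(pub.year),
        **pub.extra,                           # extra fields are emitted as-is
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


# Usage with a hypothetical entry and two hypothetical citations.
pub = Publication(
    title="An example paper",
    authors=["A. Author", "B. Author"],
    venue_type="conference",
    venue_title="Proceedings of an Example Conference",
    year=2008,
    tags=["example-project"],
    extra={"pages": "1--10"},
)
pub.cited_by += [
    Citation("Paper", "A citing paper"),
    Citation("Mentioned in a blog", "A blog post", url="http://blog.example/post"),
]
report = citation_report([pub])
entry = to_bibtex(pub, "author2008example")
print(report)
print(entry)
```

Keeping the optional details in a single `extra` dictionary is one way to meet the "extensible, easy to add new fields" wish without schema changes; a schemaless store such as SimpleDB fits that shape, though this sketch does not depend on it.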
Here are a few of the desired features:

* Ability to register "alerts" (Google queries) for each paper: the system should periodically query the web to discover pages that mention that paper. The user should be able to look at the matching pages and mark them as "keep", "ignore", or "save for later". Of course, "ignored" pages should not come up again in future results.
* Install it on Amazon EC2 and use SimpleDB as the supporting database.
* Integrate with academic search engines such as Google Scholar, Citeseer, and so on. When a new citation appears for a given paper, the system should generate a notification for the user, and the user should be able to mark it as "keep", "ignore", or "save for later".
* Integrate the system with Amazon Mechanical Turk, asking people to verify the completeness of entries, fix errors, augment incomplete entries, and so on.
* It would be nice to be able to export a LaTeX file that can be compiled to generate the citation report. I can provide the template, which can very easily be generated programmatically from the information in the database.

Below I detail the technical specifications, listing the 4-5 main components of the system and their functionality.

# User database

* A user should have a username and password to log in to the system. When they first log in, they give their name and email address. The user should be able to retrieve the password through a "Forgot Password" link, which will send the password (or a link to reset the password) to their email.

# Creating a record of publications for a user

* After logging in, the user should be able to enter his publications. A publication can be a journal article, a conference paper, a workshop paper, a book chapter, or an unpublished paper/technical report/working paper.
* Each publication has a set of (one or more) authors, a title, a publication date, a venue, a DOI, a URL pointing to the PDF, and a URL pointing to the related slides.
  It would be nice to make the architecture extensible, so that new fields can be added easily.
* The website should allow the author to enter his own publications manually, and should also be able to import publications from existing sources. At a minimum, the website should support import from:
  - BibTeX entries
  - DBLP (the user gives the URL of the author page, and the website retrieves the BibTeX entry for each paper from DBLP and imports it). Note: DBLP exports all its data as a big XML file at [login to view URL]
  - Google Scholar (the user gives a query for Google Scholar, and the website retrieves the BibTeX entry for each article and imports it). Note: go to Google Scholar Preferences and enable "Import citations into bibtex".
  - Optional: support SSRN author pages
* Since there will be duplicates if the author imports from multiple sites, the website should support a "merge" function, where the user decides which publications should be merged. If the two "merged" entries have different values, the user can select which field value to keep.

# Import and check for citations

* For each publication in the user's profile, the website should be able to query citation engines and identify the publications that cite it.
  - Google Scholar: each publication has a "Cited by XXX" link. Clicking on this link gives a list of papers that cite the publication. These publications should be imported into the system and marked as citing the publication. The user should be able to mark each such publication as "keep", "ignore", or "examine later". If a publication has already been imported into the system and marked as "keep" or "ignore", the system should not notify the user again; only the "new" publications should appear. The user should also have the option of revisiting the "ignored"/"keep" publications and changing their status.
    Since Google Scholar sometimes splits the same publication into multiple entries, the user should also be able to "merge" multiple Google Scholar entries into a single one. Since the papers may disappear from the web, the website should keep local copies.
  - Web search: for each publication, the user can create a set of queries that will be used to query Google and monitor the web for pages that match them. Usually the user would just enter the title of the publication as a quoted phrase, but sometimes it may make sense to have more relaxed versions of the query. For example, in addition to ["Approximate string joins in a database (almost) for free"], the user may add the query ["Approximate string joins" "for free"], which will match more results but will also be noisier. The web search returns web pages, which can again be marked as "keep", "ignore", or "examine later". Since pages may disappear from the web, the website should keep local copies of them for future reference.
  - Optional: add additional academic search engines, such as CiteSeerX, Libra, SSRN, and so on
  - Optional: add additional general search engines, such as Yahoo, MSN, Ask, and so on
* The user should be able to annotate web pages with tags and notes (e.g., "PhD seminar", "blog post", "popular press", and so on).

# Monitoring

* As described above, each publication has an associated list of other publications and web pages that mention it. The website should periodically issue queries to the academic search engines and the general search engines and show the user the "new" results that have not been fetched before.
* The user should be able to mark each of the new results as "keep", "ignore", or "examine later".

# Reporting

* The user should be able to generate a report indicating the total number of other publications that cite each of the user's publications.
* Furthermore, the user should be able to examine the list of web pages that mention each publication.

As you can see, it is a rather open-ended project. If I am happy with the work, I would like to continue working on the project, adding the "nice" functionality that I discuss. Feel free to bid by component.

How I will pick the winner:

* Please do not send template bids. I will ignore them immediately. If you do not spend time reading the project description and understanding what is going on, why should I expect that you will pay attention to the project?
* I value reputation; please do not bid if you have a bad reputation.
* I would like to see a portfolio of prior work. The ability to build fast, "web 2.0"-ish sites would be a big plus.
* Knowledge of the Amazon EC2/SimpleDB environment is a plus.
* Knowledge of Amazon Mechanical Turk is a plus. I expect to integrate the tool with Mechanical Turk, so that humans can inspect the publications, fix them, and provide feedback for improving the quality of the entries in the database.
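The "keep" / "ignore" / "examine later" workflow that recurs in the import, web search, and monitoring sections above comes down to remembering every result already shown for a publication, so that only genuinely new results trigger a notification and ignored ones never resurface. A minimal in-memory sketch follows; it assumes each result is identified by its URL and that decisions are held in a Python dict rather than in SimpleDB. `ResultTracker` and its methods are hypothetical names, and a real system would need a sturdier identity for results (for example to cope with Google Scholar splitting one paper into several entries).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# The three statuses named in the specification.
KEEP, IGNORE, LATER = "keep", "ignore", "examine later"
STATUSES = (KEEP, IGNORE, LATER)


@dataclass
class ResultTracker:
    """Remembers every triaged result for one publication, keyed by URL (assumption)."""
    status: Dict[str, str] = field(default_factory=dict)  # url -> user's decision

    def new_results(self, fetched: Iterable[str]) -> List[str]:
        """Filter a batch of fetched results down to the ones the user has not triaged."""
        fresh: List[str] = []
        seen_in_batch = set()
        for url in fetched:
            # Skip anything the user already marked (including "ignore"),
            # and duplicates within the same batch.
            if url in self.status or url in seen_in_batch:
                continue
            seen_in_batch.add(url)
            fresh.append(url)
        return fresh

    def mark(self, url: str, decision: str) -> None:
        """Record (or later revise) the user's decision for a result."""
        if decision not in STATUSES:
            raise ValueError(f"unknown status: {decision!r}")
        self.status[url] = decision

    def with_status(self, decision: str) -> List[str]:
        """List results with a given status, e.g. to revisit "ignore" or "examine later"."""
        return [url for url, s in self.status.items() if s == decision]


# Usage: two monitoring rounds over hypothetical search results.
tracker = ResultTracker()
first = tracker.new_results(
    ["http://a.example/1", "http://b.example/2", "http://a.example/1"]
)
print(first)   # ['http://a.example/1', 'http://b.example/2']
tracker.mark("http://a.example/1", KEEP)
tracker.mark("http://b.example/2", IGNORE)
second = tracker.new_results(["http://b.example/2", "http://c.example/3"])
print(second)  # ['http://c.example/3']: the ignored page does not come back
```

In this sketch, results the user has not yet marked are deliberately not recorded, so they reappear in the next monitoring round until the user triages them; revisiting an "ignored" or "keep" result is just another call to `mark`.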
