The Excel file with the URLs is attached.
The Python/Perl program needs to do the following (Word file attached):
(1) remove HTML tags and leave only the main text;
(2) mark the observation if the 200 words after “ex-10” contain both
(2.1) a word from the list sell, sale, order, procurement, supply, supplier, purchase, purchaser, in conjunction with a word in the same sentence from the list agreement, agrmt, Agree, agmt, form, plan, contract, letter, confirmation, commitment, order, NO; and
(2.2) a word from the list seller, purchaser, buyer, subscriber, producer, carrier, supplier, customer, consumer, manufacturer;
(3) cancel the mark if the 200 words after “ex-10” contain a word from the list interest, registration, receivable, acquisition, merge, real estate, patent, lease, compensation plan, real property, property, properties, bonus, financing, equity, loan, debt, lend, borrow, debenture, incentive plan, executive, stock, security, securities, bond, option, employee, asset, note, land, credit, warrant, residual, rent, share, bank, dollar, employ;
(4) create a new column “Mark” in the Excel file: Mark=1 if the observation is marked, otherwise Mark=0.
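
Steps (1) to (3) could be sketched in Python roughly as follows. This is only an illustration under several assumptions that the description does not settle: words are matched exactly and case-insensitively (so "sales" does not match "sale"), only the first occurrence of “ex-10” is used, a sentence ends at ".", "!" or "?" followed by whitespace, and in (2.1) the two words must be different tokens, since "order" is in both lists. The function names (strip_html, window_after_marker, compute_mark) are made up for this example. Downloading the URLs and writing the Mark column into the Excel file are left out.

```python
import re
from html.parser import HTMLParser

SALE_WORDS = {"sell", "sale", "order", "procurement", "supply", "supplier",
              "purchase", "purchaser"}
DOC_WORDS = {"agreement", "agrmt", "agree", "agmt", "form", "plan", "contract",
             "letter", "confirmation", "commitment", "order", "no"}
PARTY_WORDS = {"seller", "purchaser", "buyer", "subscriber", "producer",
               "carrier", "supplier", "customer", "consumer", "manufacturer"}
EXCLUDE_WORDS = {"interest", "registration", "receivable", "acquisition",
                 "merge", "patent", "lease", "property", "properties", "bonus",
                 "financing", "equity", "loan", "debt", "lend", "borrow",
                 "debenture", "executive", "stock", "security", "securities",
                 "bond", "option", "employee", "asset", "note", "land",
                 "credit", "warrant", "residual", "rent", "share", "bank",
                 "dollar", "employ"}
EXCLUDE_PHRASES = ["real estate", "compensation plan", "real property",
                   "incentive plan"]


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html):
    """Step (1): drop tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def window_after_marker(text, marker="ex-10", n_words=200):
    """Return the n_words whitespace-separated words after the first marker."""
    match = re.search(re.escape(marker), text, flags=re.IGNORECASE)
    if match is None:
        return ""
    return " ".join(text[match.end():].split()[:n_words])


def compute_mark(html):
    """Steps (2) and (3): return 1 if the observation is marked, else 0."""
    window = window_after_marker(strip_html(html))
    if not window:
        return 0
    # (2.1) a sale word and a document word in the same sentence
    pair_found = False
    for sentence in re.split(r"(?<=[.!?])\s+", window):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        sale_idx = [i for i, t in enumerate(tokens) if t in SALE_WORDS]
        doc_idx = [i for i, t in enumerate(tokens) if t in DOC_WORDS]
        if any(i != j for i in sale_idx for j in doc_idx):
            pair_found = True
            break
    all_tokens = re.findall(r"[a-z]+", window.lower())
    # (2.2) a party word anywhere in the window
    party_found = any(t in PARTY_WORDS for t in all_tokens)
    # (3) any exclusion word or phrase cancels the mark
    padded = " " + " ".join(all_tokens) + " "
    excluded = (any(t in EXCLUDE_WORDS for t in all_tokens)
                or any(" " + p + " " in padded for p in EXCLUDE_PHRASES))
    return int(pair_found and party_found and not excluded)
```

For step (4), compute_mark would be applied to the downloaded page of each URL and the results written to the new “Mark” column with a spreadsheet library. The simple sentence split also breaks at abbreviations such as "No.", which may need special handling on real filings.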
2 freelancers are bidding an average of $151 for this job
Experienced software developer with 2+ years of industry experience.
Relevant Skills and Experience: Proficient with Python, Ruby and Java.
Proposed Milestones: $111 USD - Completion