I have designed a macro that scrapes a single page of a court listing in Firefox with the iMacros extension and dumps it to a CSV file. I now need to extend it so that it runs without me having to increment the case number and go from page to page myself.
Here is an outline of the program:
go to url [url removed, login to view]
click on the yes button
choose civil from dropdown
choose 2010 from dropdown
enter the starting case number (one past the last case number successfully processed; see the saved file. The first case number ever would be 724357)
click on the "all" button
See this screencast for an idea of what I want:
[url removed, login to view]
This software could be helpful to you:
<[url removed, login to view]>
loop the following:
    use the parser to check whether the following code is in the page: <span id="lblError">Invalid CASE number. Please try again.</span>... if so, save the current case number in the file and end for the day
    scrape the data (I have included an iMacros file with the proper code to scrape the data on the page)
    strip "$", "," and " " out of column C / Prayer Amount
    leave only the first three words of the "plaintiff name" and remove the rest
    strip all spaces (" ") that pad the beginning or end of any field
    include a pause of a random length between 5 and 15 seconds so that the website will not guess that a machine is scraping data
    if the "case designation" string doesn't contain "foreclosure", trash the scraped data, increment the case number, press the submit button and begin the loop again
    if the "prayer amount" number isn't greater than 150000, trash the data, increment the case number, press the submit button and begin the loop again
    else save the data, enter the incremented case number, press the submit button and begin the loop again
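The cleaning and filtering rules in the loop could be sketched roughly as below. This is only an illustration in Python, not the iMacros code itself. The field names (`case_designation`, `prayer_amount`, `plaintiff_name`) are made up for the example, and matching "foreclosure" without regard to case is an assumption.

```python
import re

MIN_PRAYER_AMOUNT = 150000  # threshold taken from the outline


def clean_record(raw):
    """Return a cleaned copy of one scraped record (a dict of strings)."""
    # Strip padding spaces from the beginning and end of every field.
    rec = {key: value.strip() for key, value in raw.items()}
    # Remove "$", "," and any spaces from the prayer amount.
    rec["prayer_amount"] = re.sub(r"[$,\s]", "", rec.get("prayer_amount", ""))
    # Keep only the first three words of the plaintiff name.
    rec["plaintiff_name"] = " ".join(rec.get("plaintiff_name", "").split()[:3])
    return rec


def should_keep(rec):
    """True only for foreclosure cases with a prayer amount above the threshold."""
    # Assumption: the match on "foreclosure" ignores upper/lower case.
    if "foreclosure" not in rec.get("case_designation", "").lower():
        return False
    try:
        amount = float(rec["prayer_amount"])
    except (KeyError, ValueError):
        # A missing or non-numeric amount is treated as "trash the data".
        return False
    return amount > MIN_PRAYER_AMOUNT
```

A record would first go through `clean_record`, and only be written to the CSV file when `should_keep` returns True. Either way the case number is incremented afterwards.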
This is an iMacros file that scrapes the data from the page. The only thing left to do is to get this working so that it can be launched from the Windows scheduler and will check through the court files, saving the relevant data as it goes.
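To give an idea of the unattended run, here is a rough Python sketch of the outer loop: resume from the saved case number, stop for the day on the "Invalid CASE number" message, and pause 5 to 15 seconds between pages. `fetch_page` and `handle_page` are placeholders for whatever actually drives the browser and scrapes the page (for example the iMacros macro); they are not real library functions. The sketch also assumes the saved file holds a single number, the next case to try.

```python
import random
import time
from pathlib import Path

# Error markup quoted from the outline; its presence means "no such case yet".
ERROR_MARKER = '<span id="lblError">Invalid CASE number. Please try again.</span>'
FIRST_CASE_NUMBER = 724357  # first case number ever, per the outline


def load_next_case(state_file):
    """Read the case number to start from, or fall back to the first one."""
    path = Path(state_file)
    if path.exists():
        text = path.read_text().strip()
        if text:
            return int(text)
    return FIRST_CASE_NUMBER


def save_next_case(state_file, case_number):
    """Remember where the next scheduled run should start."""
    Path(state_file).write_text(str(case_number))


def run(fetch_page, handle_page, state_file, sleep=time.sleep):
    """Walk case numbers upward until the site reports an invalid case number.

    fetch_page(case_number) -> page HTML as a string (placeholder).
    handle_page(case_number, html) scrapes, filters and saves (placeholder).
    """
    case_number = load_next_case(state_file)
    while True:
        html = fetch_page(case_number)
        if ERROR_MARKER in html:
            # Save the current case number and end for the day.
            save_next_case(state_file, case_number)
            return case_number
        handle_page(case_number, html)
        case_number += 1
        # Random pause of 5 to 15 seconds between requests.
        sleep(random.uniform(5, 15))
```

Because the position is written to a file only when the run stops, a crash in the middle of a run would restart from the previously saved number. Saving after every page would be a small change if that matters.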