Extracting data and paginating on webpage

by e2241 on Fri Feb 17, 2017 3:20 am

Hello respected coders and moderators,

iMacros - latest version (I am not sure where to find that), Firefox 51.0.1 (32-bit), Windows 7 SP1

I have a list of rows which I use as a datasource for a search form on a website. The website then returns anywhere from 0 to any number of results, shown with pagination. My requirement is to find the results, paginate through the pages, extract the data and save it as a CSV file. I wrote a script which does most of it. However, there are 3 problems:

1) I am never sure how many records will show up.
2) I am not sure how to handle the no-records-found scenario.
3) I am not sure how to paginate when the pagination control exists, and I obviously also need to handle the scenario where it doesn't exist.


Code:
VERSION BUILD=6861208
TAB T=1
'TAB CLOSEALLOTHERS

'SET !EXTRACT_TEST_POPUP YES
'SET !ERRORIGNORE YES

'Declaring all the variables which the macro will need

SET !DATASOURCE C:\pc\list.csv
'The datasource has two columns, so the count only needs to be set once
SET !DATASOURCE_COLUMNS 2

SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}

URL GOTO=http://www.mywebsite.com/amember4/mymember/myday/mysearch.html

'Set the values in the search fields from my source csv file
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:formtwnonly ATTR=ID:q3237 CONTENT={{!COL1}}
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:formtwnonly ATTR=ID:q3238 CONTENT={{!COL2}}
'Invoke search
TAG POS=1 TYPE=INPUT:IMAGE FORM=NAME:formtwnonly ATTR=ID:ajax_bt7

FRAME F=1

'Extract text
TAG POS=1 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
'TAG POS=2 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
'TAG POS=3 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT

'Remove quotes from extracted text
SET !VAR2 EVAL("var extr2=\"{{!EXTRACT}}\"; extr2.replace(/'/g,'');")

'FRAME NAME="iframe_a"
'Paginate
'However it may not be always shown
'TAG POS=1 TYPE=FONT ATTR=TXT:Next<SP>50<SP>>>

'Save extracted data
SAVEAS TYPE=EXTRACT FOLDER=C:\pc FILE=TestData.csv


I would appreciate any help!

Re: Extracting data and paginating on webpage

by e2241 on Fri Feb 17, 2017 5:10 am

My own reply:

I have figured out that if I can check whether the pagination control exists on the page, that would mean there is another page to navigate to; I can then simply go to the next page and repeat the iMacros code. I am thinking of bringing in some JavaScript.

Code:
var exists = doesPaginationExist();
if (exists) {
    iimPlay("#myimacro.iim");
}


My problem is that the pagination item looks like this:
Code:
TAG POS=1 TYPE=FONT ATTR=TXT:Next<SP>50<SP>>>


In JS we can get the element using ID or name. This control has none. Any ideas what I can put in my doesPaginationExist() routine?

However, in case there are records on the first page and the pagination control is not there, I would still need to copy the first page's records. So my code would be:

Code:
iimPlay("#myimacro.iim");
var exists = doesPaginationExist();
if (exists) {
    iimPlay("#myimacro.iim");
}



Again, looking at the code, what would happen if there are no records displayed? Presumably my iMacros code would give some error such as no tag found. I am not sure how to handle this.

Re: Extracting data and paginating on webpage

by chivracq on Fri Feb 17, 2017 8:30 am

Oh...!, very good, I'm impressed, ah-ah...! :D
Your Thread Title is now indeed much more Descriptive than your original "Extracting web information and saving to file." which is more or less what everybody is doing on this Sub-Forum, ah-ah...!
And you've even added your FCI before I had to ask for it, as I was a bit "confused" by the v6.86 Version in your Script, which dates from about 8 years ago: if you had really been using iMacros for that long already, you wouldn't need our Help, I would think...
I'm only surprised you didn't find the Forum Rules, which explain how to identify your iMacros Version. On FF, you can find it either by starting to record a Macro or by looking at the Properties for iMacros in the FF Add-ons Manager. If you recently installed iMacros for FF, it will then probably be v9.0.3 for FF.

And you even start answering your own Qt, ah-ah...!, good-good...! 8)

So, OK, you've chosen to go with a main '.js' Script to handle all the Conditional Logic related to Data/No_Data Extraction on the Page and Next_Page/No_Next_Page Navigation. Good, that's indeed the "Standard" way to do it. It could be done in pure '.iim' as well, but that would be a bit more cumbersome/complex and you would have to play with Nested Loops.

Using your main '.js' Script, you simply have to split the Logic of your whole Scenario into several Parts.
First, to handle the Data/No_Data Extraction on the first Page, you can simply try to "fake" extract the first Result with 'Macro1.iim'. If it returns something, you can launch the "real" Macro ('Macro2.iim') doing the Extraction. One option is to handle the whole Page in one Macro with 10 Blocks of extracting Data if you expect 10 Results per Page, but any last Page with fewer than 10 Results will then return a few Rows with no valid Data.
The other option is for 'Macro2.iim' to extract only one Result at a time. It can then either check whether a next Result is available and pass that Info to the main '.js' Script handling the Looping, so that the Looping aborts when there is no Next Result, or 'Macro1.iim' could already extract/compute the Number of Results available on the Page and pass that Info to the main '.js' Script, which will then know exactly how many times to loop 'Macro2.iim'.
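The one-Result-at-a-time variant could be sketched roughly as follows. This is only a minimal sketch under assumptions: the names `hasData` and `collectResults`, the variable `N` and the limit of 50 are made up for illustration, and it assumes that 'Macro2.iim' contains a line like `TAG POS={{N}} ... EXTRACT=TXT` and that a missing Result comes back as `#EANF#`. The iMacros calls sit behind a guard so the looping Logic can also be tried outside of iMacros.

```javascript
// Marker iMacros returns when the anchor of a TAG ... EXTRACT is not found.
var EANF = "#EANF#";

// True when an extracted value holds real data (not empty, not #EANF#).
function hasData(extract) {
  return typeof extract === "string" &&
         extract.trim() !== "" &&
         extract.indexOf(EANF) === -1;
}

// Calls extractResult(1), extractResult(2), ... and stops at the first miss.
// extractResult is supplied by the caller; maxResults is a safety limit.
function collectResults(extractResult, maxResults) {
  var rows = [];
  for (var n = 1; n <= maxResults; n++) {
    var text = extractResult(n);
    if (!hasData(text)) {
      break; // no Result at this position, so the Page is done
    }
    rows.push(text);
  }
  return rows;
}

// Wiring inside iMacros (assumed macro name and variable name).
if (typeof iimPlay === "function") {
  var pageRows = collectResults(function (n) {
    iimSet("N", String(n));      // available in the macro as {{N}}
    iimPlay("Macro2.iim");       // extracts Result number N
    return iimGetLastExtract(1); // first extracted value of that run
  }, 50);
  iimDisplay("Extracted " + pageRows.length + " result(s)");
}
```

With this shape, a Page with zero Results simply yields an empty list, which covers the No_Data case without a separate error path.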
Once a Page has been fully handled, either with 10 Blocks in 'Macro2.iim' or by looping 'Macro2.iim' x10, you then check if the 'Next Page' Button/Link is present with something like:
Code:
TAG POS=1 TYPE=FONT ATTR=TXT:Next* EXTRACT=TXT
..., before actually clicking on it if it exists (with 'Macro3.iim') in order to land on the next Page of 10 Results (or maybe 50 Results...?).
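For the `doesPaginationExist()` routine asked about earlier in the Thread, one possible sketch is to run that same TAG line from the '.js' Script and look at what comes back. Several things here are assumptions rather than confirmed Behaviour for this Site: that the 'Next' Link sits in `FRAME F=1` like the Results Table, that a missing Link is reported as `#EANF#`, and that `!TIMEOUT_STEP` is available in the installed Version to shorten the wait. The two function parameters are only there so the check can be exercised outside of iMacros.

```javascript
// Returns true when the "Next ..." pagination control is found on the page.
// play       : function that runs a macro (iimPlay inside iMacros)
// getExtract : function that returns the n-th extracted value (iimGetLastExtract)
function doesPaginationExist(play, getExtract) {
  var macro = "CODE:" + [
    "SET !ERRORIGNORE YES",   // a missing link must not abort the script
    "SET !TIMEOUT_STEP 1",    // assumption: keeps the wait short when absent
    "FRAME F=1",              // assumption: same frame as the results table
    "TAG POS=1 TYPE=FONT ATTR=TXT:Next* EXTRACT=TXT"
  ].join("\n");
  play(macro);
  var text = getExtract(1);
  return typeof text === "string" &&
         text !== "" &&
         text.indexOf("#EANF#") === -1;
}

// Usage inside iMacros.
if (typeof iimPlay === "function") {
  if (doesPaginationExist(iimPlay, iimGetLastExtract)) {
    iimPlay("Macro3.iim"); // assumed macro that clicks the Next link
  }
}
```

Because the check goes through a TAG line, the element needs no ID or name; the `TXT:Next*` wildcard is enough to locate it.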

Search the Forum for "Nested Loops" if you need some '.js' Script Examples...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: Extracting data and paginating on webpage

by e2241 on Fri Feb 17, 2017 8:56 am

Thank you for your detailed reply chivracq.

Yes I was able to find the version installed for iMacros on FF which was 9.0.3. :)

Yes, I had used iMacros during my university days to pull data from a yellow pages website (well, it was like a yellow pages website; yes, I may have gone over the top in saying that :!:). It was a huge success and I actually crashed their site with the code routine I was using. :lol: They traced me and asked me to sign a document saying that I would not use the data for 5 years! I dug up some old code from that time in my emails and was editing and using it, hence the old version v6.86. Good spot. :shock: I actually never used that data or iMacros after that.

Since I spent the whole morning today looking through the forums, I spotted you answering a few threads and read about FCIM. Yes, I added that later. The same goes for the change of the thread title. :roll:

As for your answer, I will have to see how it goes after I implement it. I will still pepper you with questions if needed. I hope that is fine.
Last edited by e2241 on Sat Feb 18, 2017 3:48 pm, edited 1 time in total.

Re: Extracting data and paginating on webpage

by chivracq on Sat Feb 18, 2017 10:39 am

OK for v9.0.3 for FF, which is pretty buggy and limited btw. If you intend to use iMacros again "a bit seriously", the advice is to rather use v8.9.7, which is much more stable... :idea:

Ah-ah-ah...!, funny Story about Yellow-Pages. I find it a bit surprising that you managed to "crash" their Web-Site, even if you had been using 50 concurrent Instances at the same time, which I guess was not the case... I would expect their Site to be able to handle Thousands of concurrent Users, even 8 years ago. But most YP affiliated Sites at that time were trying to sell their Data on CD-Roms and didn't like "clever Nerds" harvesting their Data from their Site(s), ah-ah...!

For your Script for this current Thread, the following Thread has an Example of a '.js' Script with Nested Loops like the one you'll be needing; then it's just a question of "placing" the split parts of your Code in the right place...:
- Re: Nested Loops with Javascript
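For orientation, the overall shape of such Nested Loops for this Scenario might look like the sketch below. It is not taken from the linked Thread: `scrapeAll` and the `steps` object are hypothetical names, and each step is expected to wrap its own macro (search form, Data/No_Data check, Extraction, Next check, Next click), so the actual iMacros calls are only indicated in comments.

```javascript
// Outer loop: one search per datasource row.
// Inner loop: extract the current page, then follow "Next" while it exists.
// steps is a caller-supplied object with five functions:
//   search(row), hasResults(), extractPage(), hasNextPage(), goToNextPage()
// maxPages is a safety limit against endless pagination.
// Returns the number of pages extracted for each row.
function scrapeAll(rowCount, steps, maxPages) {
  var pagesPerRow = [];
  for (var row = 1; row <= rowCount; row++) {
    steps.search(row);
    var pages = 0;
    if (steps.hasResults()) {
      while (true) {
        steps.extractPage();
        pages++;
        if (pages >= maxPages || !steps.hasNextPage()) {
          break; // last page for this row (or safety limit reached)
        }
        steps.goToNextPage();
      }
    }
    pagesPerRow.push(pages);
  }
  return pagesPerRow;
}

// Inside iMacros the steps could wrap macros along these lines (assumed names):
//   search:       iimSet("ROW", String(row)); iimPlay("Search.iim");
//   hasResults:   play 'Macro1.iim' and test the extract against "#EANF#"
//   extractPage:  iimPlay("Macro2.iim");
//   hasNextPage:  extract TXT:Next* and test against "#EANF#"
//   goToNextPage: iimPlay("Macro3.iim");
```

Keeping the loop free of direct iMacros calls is a design choice for the sketch only; the same Logic works just as well with the calls written inline.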

Well, good luck and it's always nice if you can post your final Script as that could always help other Users in the Future... 8)