Loop through results

andrew2221
Posts: 3
Joined: Fri Aug 05, 2016 7:49 am

Loop through results

Post by andrew2221 » Fri Aug 05, 2016 8:24 am

Hi everyone.

I need to extract data from a site. There is a simple SELECT option, an INPUT, and a SEARCH BUTTON. So far so good.

The result pages are not paginated in the traditional way. There are no links to page 1, 2, and so on, only a next-page button, which is an input that acts as a submit. This input has a NAME attribute, which is missing on the last page. That suits me, because the macro will stop automatically once it has finished.

I need to go through all the pages and extract the data. Bear with me: I only started looking into iMacros yesterday, so I'm not really familiar with all the options.
This is the code I've come up with so far, and it works for the first page:

Code:

VERSION BUILD=8970419 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES

URL GOTO=https://www.eofcom.admin.ch/eofcom/public/searchEofcom_InaFree.do
' select option 2 from the dropdown
TAG POS=1 TYPE=SELECT FORM=ID:chofcomelicensingpresentationnainaformEofcomInaSearchForm ATTR=NAME:nrt CONTENT=%2
' insert value into input box
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:chofcomelicensingpresentationnainaformEofcomInaSearchForm ATTR=NAME:pnp CONTENT=0002
' submit search
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:chofcomelicensingpresentationnainaformEofcomInaSearchForm ATTR=NAME:doSearchFreeByNumber

' *** this is the block I would need to loop through
' get out the text from the result table
TAG POS=1 TYPE=TBODY ATTR=TXT:* EXTRACT=TXT
' remove all the spaces
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/\s/g, \"\");")
' show the result
PROMPT "{{!EXTRACT}}"
' goto next page
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:chofcomelicensingpresentationnainaformEofcomInaSearchForm ATTR=NAME:getNextInaPage
' *** end block of loop
I've tried

Code:

SET !LOOP 25
and then started the macro in loop mode, but it doesn't work. I also don't know beforehand how many result pages there will be.
From my research online I couldn't figure out how to achieve that. Any help would be greatly appreciated.

I am using the free iMacros Firefox extension (latest, downloaded yesterday) on Debian 8.5, but I also have an XP machine with the trial version of iMacros 11.1.

Note: before anyone raises legal concerns about scraping data, I should mention that my company contacted the owners of the site and asked for an API. None is available, and they suggested using the search page instead.

Thanks.
iimfun
Posts: 239
Joined: Tue Jul 19, 2016 1:06 pm

Re: Loop through results

Post by iimfun » Sat Aug 06, 2016 12:00 pm

Let's suppose that you are on the first page. Play the part of the macro you want to loop through in loop mode with an arbitrary value ( > 1 ) of the 'Max:' field (just for testing).

Code:

' *** this is the block I would need to loop through
' get out the text from the result table
TAG POS=1 TYPE=TBODY ATTR=TXT:* EXTRACT=TXT
' remove all the spaces
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/\s/g, \"\");")
' show the result
PROMPT "{{!EXTRACT}}"
' goto next page
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:chofcomelicensingpresentationnainaformEofcomInaSearchForm ATTR=NAME:getNextInaPage
' *** end block of loop
Does that work as expected?
andrew2221
Posts: 3
Joined: Fri Aug 05, 2016 7:49 am

Re: Loop through results

Post by andrew2221 » Mon Aug 08, 2016 6:02 am

Yes, it does work as expected, but I need a way to go through all the pages without knowing the number of loops in advance. It would also be automated by date, as in, it should run once a week.
iimfun
Posts: 239
Joined: Tue Jul 19, 2016 1:06 pm

Re: Loop through results

Post by iimfun » Mon Aug 08, 2016 12:09 pm

1. You can set the 'Max' number of loops to 99999, and the macro will simply stop with an error when it can't find the next-page button.

2. What do you mean by "automated by date"? And where is a date in your macro?
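
For point 1, a minimal sketch of the looped part saved as its own macro could look like the block below. It assumes the search has already been submitted (by hand or by a separate macro) so that the browser is on the first result page, and that the macro is started with the Play (Loop) button and a large 'Max' value. The file name results.csv is only a placeholder, and SAVEAS stands in here for the PROMPT used during testing. SET !ERRORIGNORE YES is deliberately left out: with it, the failed TAG on the last page would be ignored and the loop would presumably keep running instead of stopping.

```
' Loop macro: start it on the first result page with Play (Loop), Max = 99999
' !ERRORIGNORE stays at its default (NO) so that a failed TAG ends the run
SET !EXTRACT_TEST_POPUP NO
' get the text of the result table
TAG POS=1 TYPE=TBODY ATTR=TXT:* EXTRACT=TXT
' append the extracted text to a file (file name is an assumption)
SAVEAS TYPE=EXTRACT FOLDER=* FILE=results.csv
' go to the next page; on the last page the NAME attribute is missing,
' so this TAG fails and the loop stops with an error
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:chofcomelicensingpresentationnainaformEofcomInaSearchForm ATTR=NAME:getNextInaPage
```

Keeping URL GOTO and the search steps out of this macro matters, because loop mode replays the whole macro on every pass and would otherwise restart the search each time.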
andrew2221
Posts: 3
Joined: Fri Aug 05, 2016 7:49 am

Re: Loop through results

Post by andrew2221 » Tue Aug 09, 2016 7:56 am

Thank you for the reply.

1. Oh, I did not know that. Thanks for the info.

2. Automated, as in I created an AutoIt script which launches a browser, navigates to the page, and starts the macro based on the current date, but this has nothing to do with iMacros. :)

But anyway, I am looking for alternative solutions, like running a PHP script or using cURL inside a cron job...