!ENDOFPAGE to "loop" through tags to extract data

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
vlady_2009
Posts: 2
Joined: Mon Nov 20, 2017 4:27 pm

!ENDOFPAGE to "loop" through tags to extract data

Post by vlady_2009 » Mon Nov 20, 2017 5:04 pm

Using iMacros 10, IE 11, Windows 8

I'm trying to extract the text of the TD cells in the "Cash Assets" row of the following HTML, i.e. I want to extract "11,723", "4,180" and "11,089". The number of TD elements can vary.

Code: Select all

<tr class="tblDataRow">
    <td>Cash Assets</td>
    <td align="right" style=" white-space: nowrap; ">11,723</td>
    <td align="right" style=" white-space: nowrap; ">4,180</td>
    <td align="right" style=" white-space: nowrap; ">11,089</td>
  </tr>
<tr class="tblDataRowAlternate">
    <td>Receivables</td>
    <td align="right" style=" white-space: nowrap; ">18</td>
    <td align="right" style=" white-space: nowrap; ">419</td>
    <td align="right" style=" white-space: nowrap; ">1,256</td>
  </tr>

Using the following iMacros code:

Code: Select all

TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Receivables
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets
TAG POS=R{{myloop}} TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*

but this produces no output other than #EANF#.

However, with the following iMacros code:

Code: Select all

TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*

I get the desired extracted text. However, as mentioned above, the number of "rows" can vary and is not always three.

Advice on what I may be doing wrong with !ENDOFPAGE (and/or on an alternative I should be using) is much appreciated.

Regards
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: !ENDOFPAGE to "loop" through tags to extract data

Post by chivracq » Mon Nov 20, 2017 6:07 pm

Your Script looks correct to me, although I've never used the '!ENDOFPAGE' Variable myself, as it is only available for iMB and IE (and I only use FF). It looks directly taken from the Example in the Wiki..., but hum, that Example indeed uses a "myloop" Var which is never defined...
=> You didn't post your whole Script, but do you define that 'myloop' Var before using it...?
You can probably use the built-in '!LOOP' directly, like in:

Code: Select all

TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Receivables
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets
TAG POS=R{{!LOOP}} TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
vlady_2009
Posts: 2
Joined: Mon Nov 20, 2017 4:27 pm

Re: !ENDOFPAGE to "loop" through tags to extract data

Post by vlady_2009 » Mon Nov 20, 2017 7:19 pm

I tried using !LOOP, but this gives only a single iteration (with a warning dialog about the default being 1, etc.). With the HTML example from the post above, it results in "11,723" in the .csv file. If I use the Play (Loop) control and set the repeat count to 2 or 3, this results in "4,180" or "11,089" respectively in the .csv file (i.e. the text of the TD element at that relative position). The same happens if I use myloop and define the variable myself. I only get one relative step/search for TYPE=TD up to the !ENDOFPAGE mark, for whatever relative "step size" is set, and not the desired extract of every TD instance (e.g. when !LOOP = 1) up to the !ENDOFPAGE mark.

It seems I am misunderstanding what !ENDOFPAGE (coupled with the anchor tag and the R{{!LOOP}} ... lines) is supposed to enable.

The desired functionality is this: given a set of tags without distinguishing characteristics (in this case a series of <TD>), of which an unknown, variable number sit between two other tags that can be identified a priori (e.g. through their TXT attribute), cycle through them and extract the text of each.

I have only been using iMacros for a few days, so any advice is appreciated.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: !ENDOFPAGE to "loop" through tags to extract data

Post by chivracq » Mon Nov 20, 2017 8:21 pm

Well, if you use any Loop Mechanism, then yep, you need to play your Macro from the 'Play (Loop)' Button and specify a Number in the 'Max' Field that is at least as high as the number of Rows you can expect on the Page. Say 10, for example, for your Source with "only" 3 Rows; I then expect the Macro to stop at the 3rd or 4th Loop with some RuntimeError saying something like "End of Page reached, aborting Macro...".
I never had a chance to use/test that Functionality as it is not implemented for iMacros for FF... (Even if I have an easy Workaround for FF that I have posted in this Thread... (Hum, is already nearly 2 years old, I still remember that Thread pretty clearly, was an interesting one, ah-ah...!))
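
For illustration, a minimal sketch of the Loop setup described above, reusing the Macro from earlier in this Thread with Comments added. It is untested on iMacros for IE; the 'Max' Value of 10 is only an example, and what happens once the End Marker is passed is the expected Behaviour described above, not a verified one.

```
' Play with the 'Play (Loop)' Button, 'Max' set to e.g. 10.
' {{!LOOP}} starts at 1 by default and goes up by 1 on each Loop.

' End Marker: first Cell of the Row after the one to extract
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Receivables
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}

' Anchor: Label Cell of the Row to extract
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets

' Loop 1 extracts the 1st TD after the Anchor, Loop 2 the 2nd, and so on
TAG POS=R{{!LOOP}} TYPE=TD ATTR=TXT:* EXTRACT=TXT

' Each Loop saves its extracted Value to the default Extract File
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*
```

The whole Macro is replayed on every Loop, so the Anchor is set afresh each time and only the relative Offset 'R{{!LOOP}}' changes. That matches the Results reported above ("11,723" for Loop 1, "4,180" for Loop 2, "11,089" for Loop 3).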

But hum, '!ENDOFPAGE' is meant to make the Macro stop when you have a 2nd Table whose Rows have the same Attributes as in Table_1 and would otherwise get extracted as well. If your Page/Site only has one Table with 3 Rows, so that 'TAG POS=4' won't find anything anyway, then you could simply add a "real" TAG on the same Field you want to extract (before the 'EXTRACT'). At Loop=4, your Macro will then automatically abort with a RuntimeError telling you that 'TAG POS=4 ...' was not found (without using '!ERRORIGNORE' of course...), and you'll know that it ran and correctly extracted the first 3 Rows...
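
A rough sketch of that idea, untested and resting on two Assumptions. First, no further matching TD follows the last wanted Cell (not the case in the HTML posted above, where the 'Receivables' Row comes next). Second, a plain TAG shifts the relative Anchor the same way the extracting 'R1' TAGs in the first Post do, which is why the Anchor is set a second time before the EXTRACT. Note that a plain TAG also clicks the Cell, assumed harmless here.

```
' Play with the 'Play (Loop)' Button; do not set !ERRORIGNORE to YES.

' Anchor: Label Cell of the Row to extract
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets

' "Real" TAG without EXTRACT: stops the Macro with an Error
' once there is no matching TD left at this relative Position
TAG POS=R{{!LOOP}} TYPE=TD ATTR=TXT:*

' Set the Anchor again, then extract that same TD
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets
TAG POS=R{{!LOOP}} TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*
```

The extra TAG is there because a TAG with EXTRACT that finds nothing returns #EANF# instead of stopping the Macro.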

But hum, mini-Remark..., how come, if you've only been using iMacros for IE for a few days, you are using v10.x (v10.0.3 I reckon), which is already about 5 or 6 years old, while the current/latest Version is now v12.0...?