!ENDOFPAGE to "loop" through tags to extract data

by vlady_2009 on Mon Nov 20, 2017 10:04 am

Using iMacros 10, IE 11, Windows 8

I'm trying to extract the text of the TD cells in the "Cash Assets" row of the following HTML, i.e. I want to extract "11,723", "4,180" and "11,089". The number of TD elements can vary.

Code: Select all
<tr class="tblDataRow">
    <td>Cash Assets</td>
    <td align="right" style=" white-space: nowrap; ">11,723</td>
    <td align="right" style=" white-space: nowrap; ">4,180</td>
    <td align="right" style=" white-space: nowrap; ">11,089</td>
  </tr>
<tr class="tblDataRowAlternate">
    <td>Receivables</td>
    <td align="right" style=" white-space: nowrap; ">18</td>
    <td align="right" style=" white-space: nowrap; ">419</td>
    <td align="right" style=" white-space: nowrap; ">1,256</td>
  </tr>


Using the following iMacros code
Code: Select all
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Receivables
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets
TAG POS=R{{myloop}} TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*


but this produces no output other than #EANF#.

However, with the following iMacros code
Code: Select all
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*


I get the desired extracted text. However, as mentioned above, the number of cells can vary and is not always three.

Advice on what I may be doing wrong with !ENDOFPAGE (and/or which alternative I should be using) is much appreciated.

Regards
vlady_2009
 

Re: !ENDOFPAGE to "loop" through tags to extract data

by chivracq on Mon Nov 20, 2017 11:07 am

Your script looks correct to me, although I have never used '!ENDOFPAGE' myself, as it is only available for iMB and IE (and I only use FF). It looks like it was taken directly from the example in the Wiki, but that example uses a "myloop" variable which is never actually defined...
=> You didn't post your whole script: do you define 'myloop' before using it?
You can probably use the built-in '!LOOP' directly, as in:
Code: Select all
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Receivables
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
TAG POS=1 TYPE=TD FORM=NAME:Form1 ATTR=TXT:Cash<SP>Assets
TAG POS=R{{!LOOP}} TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: !ENDOFPAGE to "loop" through tags to extract data

by vlady_2009 on Mon Nov 20, 2017 12:19 pm

I tried using !LOOP, but this only gives a single iteration (with a warning dialog about the default being 1). With the HTML example from my post above, this puts "11,723" in the .csv file. If I use the Play (Loop) control and set the repeat value to 2 or 3, I get "4,180" or "11,089" respectively in the .csv file, i.e. the text of the TD element at that relative position. The same happens if I define and use myloop. In each case I only get one relative step to a TD before the !ENDOFPAGE mark, for whatever relative "step size" is set, and not the desired extraction of every TD up to the !ENDOFPAGE mark (e.g. when !LOOP = 1).

It seems I am misunderstanding what !ENDOFPAGE (together with the anchor tag and the R{{!LOOP}} lines) is supposed to enable.

The functionality I want is this: there is a series of tags with no distinguishing characteristics (here, a run of <TD> elements), and an unknown, variable number of them sits between two other tags that can be identified in advance (e.g. through their TXT attribute). I want to cycle through the tags in between and extract the text of each one.
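To make the target behaviour concrete outside of iMacros, here is a minimal Python sketch of the same idea, using only the standard library. It is not an iMacros feature or a replacement for the macro; the names `RowCells`, `cells_after_label` and `SAMPLE` are made up for this illustration, and it assumes the label cell is always the first cell of its row, as in the HTML posted above.

```python
from html.parser import HTMLParser


class RowCells(HTMLParser):
    """Collect the text of every <td>, grouped by the <tr> it belongs to."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell texts per <tr>
        self._in_td = False   # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows[-1][-1] += data


def cells_after_label(html, label):
    """Return the texts of all cells that follow the `label` cell in its row."""
    parser = RowCells()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        cells = [cell.strip() for cell in row]
        if cells and cells[0] == label:
            return cells[1:]
    return []  # label not found


# Sample taken from the HTML in the first post.
SAMPLE = """
<tr class="tblDataRow">
    <td>Cash Assets</td>
    <td align="right" style=" white-space: nowrap; ">11,723</td>
    <td align="right" style=" white-space: nowrap; ">4,180</td>
    <td align="right" style=" white-space: nowrap; ">11,089</td>
  </tr>
<tr class="tblDataRowAlternate">
    <td>Receivables</td>
    <td align="right" style=" white-space: nowrap; ">18</td>
    <td align="right" style=" white-space: nowrap; ">419</td>
    <td align="right" style=" white-space: nowrap; ">1,256</td>
  </tr>
"""

print(cells_after_label(SAMPLE, "Cash Assets"))
```

The row boundary does the job of the second "end" tag here: however many value cells the row contains, the function returns all of them and nothing from the next row.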

I have only been using iMacros for a few days, so any advice is appreciated.

Re: !ENDOFPAGE to "loop" through tags to extract data

by chivracq on Mon Nov 20, 2017 1:21 pm

Well, if you use any loop mechanism, then yes, you need to play your macro with the 'Play (Loop)' button and specify a number in the 'Max' field that is at least as high as the number of rows you can expect on the page, say 10 for your source with "only" 3 rows. I then expect the macro to stop at the 3rd or 4th loop with a RuntimeError saying something like "End of Page reached, aborting Macro...".
I never had a chance to use or test that functionality, as it is not implemented in iMacros for FF. (I do have an easy workaround for FF that I posted in this Thread. It is already nearly 2 years old, but I still remember that thread pretty clearly, it was an interesting one!)

But '!ENDOFPAGE' is meant to make the macro stop when the page has a 2nd table whose rows have the same attributes as those in Table_1 and would otherwise get extracted as well. If your page only has one table with 3 rows, so that 'TAG POS=4' won't find anything anyway, you could simply add a "real" TAG on the same field you want to extract (before the 'EXTRACT'). At Loop=4 your macro will then abort automatically with a RuntimeError telling you that 'TAG POS=4 ...' was not found (without using '!ERRORIGNORE', of course), and you'll know that it ran and correctly extracted the first 3 rows.

One small remark: if you have only been using iMacros for IE for a few days, how come you are using v10.x (v10.0.3, I reckon), which is already about 5 or 6 years old, while the current version is v12.0?
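The loop behaviour discussed in this thread can be modelled in a few lines of Python. This is only a simplified model of what the posts above describe, not iMacros's actual implementation: each pass of 'Play (Loop)' extracts exactly one cell at offset `loop` from the anchor, and the run stops once that offset reaches the end marker. The names `run_looped_extract` and `CELLS` are hypothetical, and the page is reduced to a flat list of TD texts in document order.

```python
def run_looped_extract(cells, anchor_text, end_text, max_loops):
    """Model of a looped macro: one relative extraction per loop pass."""
    anchor = cells.index(anchor_text)  # cell matched by the anchor TAG
    end = cells.index(end_text)        # cell that marks the end boundary
    extracted = []
    for loop in range(1, max_loops + 1):  # plays the role of !LOOP
        target = anchor + loop            # models POS=R{{!LOOP}}
        if target >= end:                 # boundary reached: stop the run
            break
        extracted.append(cells[target])
    return extracted


# TD texts of the sample HTML from the first post, in document order.
CELLS = ["Cash Assets", "11,723", "4,180", "11,089",
         "Receivables", "18", "419", "1,256"]

print(run_looped_extract(CELLS, "Cash Assets", "Receivables", max_loops=10))
print(run_looped_extract(CELLS, "Cash Assets", "Receivables", max_loops=1))
```

With `max_loops=1` the model returns only the first value, which matches what a single run of the macro put into the .csv file. A 'Max' at least as large as the number of cells is needed to reach them all, and the boundary check is what keeps a generous 'Max' from running into the next row.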

