extract data from not specified class

dawidR
Posts: 13
Joined: Thu May 31, 2018 3:27 pm

Post by dawidR » Thu Apr 25, 2019 2:33 pm

Hello,
I am wondering whether it is possible to extract data (text) from a TD or TR that has no class specified?

This is the HTML:
<tr>
<td><b style="outline: 1px solid blue;">Reward</b></td>
<td align="center"><b>540</b></td>
<td align="right"><b>20</b></td>
<td align="right"><b>80</b></td>
<td colspan="2">&nbsp;</td>
<td align="right"><b>2</b></td>
<td align="right"><b>5</b></td>
<td align="right"><b>1</b></td>
</tr>

I am trying to extract the number 540 (it changes daily), which comes right after the word "Reward".

iMacros for FF V8.9.7
FF v55.0.2
MacOS v10.13
chivracq
Posts: 8523
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: extract data from not specified class

Post by chivracq » Thu Apr 25, 2019 3:45 pm

Yep, that's exactly the "Purpose" of 'Relative Positioning'...! 8)

=> Something like this:

Code:

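' Tag the anchor: the first TD whose text is "Reward"
TAG POS=1 TYPE=TD ATTR=TXT:"Reward"
' Clear any previously extracted value
SET !EXTRACT NULL
' R1 = the first TD after the anchor
TAG POS=R1 TYPE=TD ATTR=* EXTRACT=TXT
' Show the extracted value
PROMPT EXTRACT:<SP>_{{!EXTRACT}}_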
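
To make the anchor-then-offset idea concrete, here is a minimal Python sketch that models it outside iMacros. It is only an illustration: `CellCollector` and `cell_after` are hypothetical helper names, and treating the row as a flat list of TD texts is an assumption made for the model, not a description of how iMacros works internally.

```python
from html.parser import HTMLParser

# The row from the question, copied from the thread.
ROW_HTML = """<tr>
<td><b style="outline: 1px solid blue;">Reward</b></td>
<td align="center"><b>540</b></td>
<td align="right"><b>20</b></td>
<td align="right"><b>80</b></td>
<td colspan="2">&nbsp;</td>
<td align="right"><b>2</b></td>
<td align="right"><b>5</b></td>
<td align="right"><b>1</b></td>
</tr>"""


class CellCollector(HTMLParser):
    """Collects the text of every <td>, in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the open <td>.
        if self._in_td:
            self.cells[-1] += data


def cell_after(html, anchor_text, offset=1):
    """Return the text of the cell `offset` positions after the anchor cell."""
    parser = CellCollector()
    parser.feed(html)
    cells = [cell.strip() for cell in parser.cells]
    anchor = cells.index(anchor_text)  # models TAG POS=1 ... ATTR=TXT:<anchor_text>
    return cells[anchor + offset]      # models TAG POS=R<offset>


print(cell_after(ROW_HTML, "Reward"))     # 540
print(cell_after(ROW_HTML, "Reward", 2))  # 20
```

The point of the model is that the class or alignment of the target cell never matters: only the anchor text and the distance from it are used.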
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: extract data from not specified class

Post by dawidR » Fri Apr 26, 2019 3:39 pm

Thank you very much, this worked perfectly! If I understand it correctly, it finds the TD containing "Reward" and then R1 takes the first TD after it, correct?

Is there a way to also select and extract multiple values?

For example, find the first TD in each row (1, 2, 3, etc.) and then extract the next two TDs:

540 ; 9
600 ; 10
700 ; 9
etc.

Would it be something like this?

Code:

TAG POS=1 TYPE=TD ATTR=TXT:ALL TD
SET !EXTRACT NULL
TAG POS=R1&R2 TYPE=TD ATTR=* EXTRACT=TXT

Code:

<tr>
<td><b style="outline: 1px solid blue;">1</b></td>
<td align="center"><b>540</b></td>
<td align="center"><b>9</b></td>
</tr>

<tr>
<td><b style="outline: 1px solid blue;">2</b></td>
<td align="center"><b>600</b></td>
<td align="center"><b>10</b></td>
</tr>

<tr>
<td><b style="outline: 1px solid blue;">3</b></td>
<td align="center"><b>700</b></td>
<td align="center"><b>9</b></td>
</tr>

Re: extract data from not specified class

Post by chivracq » Fri Apr 26, 2019 5:16 pm

Yep, you first tag an 'Anchor', and afterwards you can experiment with 'POS=R1', 'POS=R2', etc. to see what each relative position extracts.

But nope, you cannot use "TAG POS=R1&R2": you can only tag/extract one HTML Field/Element at a time!

But when the Data on a Page is presented in a Table, you can extract it at the 'TYPE=TABLE' Level. That extracts the whole Table in one Statement, and iMacros automatically formats the Data the way you would expect when saving the 'EXTRACT' to a '.CSV' File and opening it in Excel.
You can also extract at the 'TYPE=TH' (Table Header), 'TYPE=TR' (Row) and 'TYPE=TD' (Cell) Levels.
=> 'TYPE=TR' would extract a whole Row in one Statement here, but I usually don't find that very useful, as you then need to split the Content of the Cells apart again.
=> Extracting Cell by Cell is usually the easiest and cleanest way. For the 1st Row, for example, something like:

Code:

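' Anchor: the first TD whose text is "1" (the Row label)
TAG POS=1 TYPE=TD ATTR=TXT:1
SET !EXTRACT NULL
' R1 = the first TD after the anchor
TAG POS=R1 TYPE=TD ATTR=* EXTRACT=TXT
SET Rwd_Nb1 {{!EXTRACT}}
SET !EXTRACT NULL
' R1 again = the next TD, counted from the TD just tagged
TAG POS=R1 TYPE=TD ATTR=* EXTRACT=TXT
SET Rwd_Nb2 {{!EXTRACT}}
PROMPT Row_1:<BR>Reward:<SP>_{{Rwd_Nb1}}_<SP>/<SP>_{{Rwd_Nb2}}_

... and so on for the next Rows.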

But you have to be careful: if you have a Row_9, for example, then:

Code:

TAG POS=1 TYPE=TD ATTR=TXT:9
SET !EXTRACT NULL
TAG POS=R1 TYPE=TD ATTR=* EXTRACT=TXT
SET Rwd_Nb1 {{!EXTRACT}}
SET !EXTRACT NULL
TAG POS=R1 TYPE=TD ATTR=* EXTRACT=TXT
SET Rwd_Nb2 {{!EXTRACT}}
PROMPT Row_9:<BR>Reward:<SP>_{{Rwd_Nb1}}_<SP>/<SP>_{{Rwd_Nb2}}_

... will not work as expected, because iMacros will take the Cell in the 1st Row that also contains "9" as the Anchor. Oops...!
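
The collision can be traced by hand with a small Python model. This is a sketch under assumptions, not iMacros itself: rows 1 to 3 come from the HTML posted above, the row labelled "9" and its values 800 and 7 are invented for illustration, and `find_anchor` and `extract_relative` are hypothetical helpers that treat the table as a flat list of cell texts.

```python
# Cell texts per row. Rows 1-3 are from the thread; the last row is hypothetical.
ROWS = [
    ["1", "540", "9"],
    ["2", "600", "10"],
    ["3", "700", "9"],
    ["9", "800", "7"],  # hypothetical Row_9, values invented for this sketch
]

# All TD texts in document order, the way a page-wide TD search would meet them.
CELLS = [text for row in ROWS for text in row]


def find_anchor(cells, text, pos=1):
    """Index of the pos-th cell whose text equals `text` (models POS=n ATTR=TXT:text)."""
    seen = 0
    for index, cell in enumerate(cells):
        if cell == text:
            seen += 1
            if seen == pos:
                return index
    raise LookupError(f"no match number {pos} for {text!r}")


def extract_relative(cells, anchor, count):
    """Texts of the next `count` cells after the anchor (models repeated POS=R1 steps)."""
    return cells[anchor + 1 : anchor + 1 + count]


anchor = find_anchor(CELLS, "9")
print(anchor)                              # 2: the third cell of Row 1, not Row 9
print(extract_relative(CELLS, anchor, 2))  # ['2', '600']
print(ROWS[3][1:])                         # ['800', '7'], the values that were intended
```

In this model the macro reports "2 / 600" for Row_9, because the first TD with the text "9" is the last cell of Row 1 and the two relative steps then walk into Row 2.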

There are several Solutions though, I'll let you think about it a little bit, ah-ah...!