Extract Portion of Text Not Surrounded by Tags


Extract Portion of Text Not Surrounded by Tags

Post by z38601 » Fri Feb 01, 2019 2:51 pm

So I have code that looks like this:

<TR>
<TD class="formTableTdNoBorder" align="left">
<P>
<b>Expire Date: </b>
07-31-2020
<br>
</TD>
</TR>

The iMacros Browser generates the following TAG line to capture "Expire Date: 07-31-2020":

VERSION BUILD=11.5.497.9113
...
TAG POS=14 TYPE=P FORM=NAME:validateSelectForm ATTR=* EXTRACT=TXT

However, all I really want to extract is the date portion after "Expire Date:". How do I extract just the date when it's not in its own tag?

Re: Extract Portion of Text Not Surrounded by Tags

Post by chivracq » Fri Feb 01, 2019 3:55 pm

FCI: iMB v11.5, Win7/8.x/10...(?)

Yep, that's exactly the purpose of 'EVAL()': apply it to your Extract to keep only the part that you want, with something like this (2 Solutions):

Code:

TAG POS=14 TYPE=P FORM=NAME:validateSelectForm ATTR=* EXTRACT=TXT
SET Exp_Date_1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; z=s.replace('Expire Date: ',''); z;")
SET Exp_Date_2 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split(': '); y=x[1]; z=y.trim(); z;")
PROMPT EXTRACT:<SP>_{{!EXTRACT}}_<BR><BR>Exp_Date_1:<SP>_{{Exp_Date_1}}_<BR><BR>Exp_Date_2:<SP>_{{Exp_Date_2}}_

The 'trim()' in the 2nd Implementation may not be needed; I'm not sure how iMacros will treat the "<br>" in the Cell.
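
The string handling inside the two 'EVAL()' calls can be tried outside iMacros as plain JavaScript. This is only a sketch: the function names are hypothetical, the sample string assumes the Extract comes back as "Expire Date: 07-31-2020" with some stray whitespace, the 'trim()' in the first helper is an addition, and the regex variant is an extra that is not part of the macro above.

```javascript
// Hypothetical helpers mirroring the two EVAL() solutions (not an iMacros API).

// Solution 1: remove the label; trim() added here to drop leftover whitespace.
function dateByReplace(s) {
  return s.replace('Expire Date: ', '').trim();
}

// Solution 2: split on ': ' and keep the part after the label.
function dateBySplit(s) {
  const parts = s.split(': ');
  return parts.length > 1 ? parts[1].trim() : '';
}

// Extra variant (assumption: the date is always MM-DD-YYYY): match the date itself.
function dateByRegex(s) {
  const m = s.match(/\d{2}-\d{2}-\d{4}/);
  return m ? m[0] : '';
}

// Assumed shape of the extracted text, with a trailing space.
const sample = 'Expire Date: 07-31-2020 ';
console.log(dateByReplace(sample)); // 07-31-2020
console.log(dateBySplit(sample));   // 07-31-2020
console.log(dateByRegex(sample));   // 07-31-2020
```

The regex variant does not depend on the exact label text or spacing, which may help if the Extract turns out to contain a line break between the label and the date.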

And in your case, as the "Expire Date: " part is in Bold and can be tagged directly, you can probably use 'Relative Positioning', with something like this:

Code:

TAG POS=1 TYPE=B FORM=NAME:validateSelectForm ATTR=TXT:Expire<SP>Date:*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=TXT
PROMPT EXTRACT:<SP>_{{!EXTRACT}}_
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: Extract Portion of Text Not Surrounded by Tags

Post by z38601 » Fri Feb 01, 2019 9:15 pm

Thank you so much, chivracq, for your help on this. I was finally able to extract just the date using a modified version of your solution 1.

Re: Extract Portion of Text Not Surrounded by Tags

Post by chivracq » Fri Feb 01, 2019 10:17 pm

OK, good to hear! Please post your final Script with your "modified version", it will also be useful for other Users... :wink: