Extracting only the searched text

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
tata668
Posts: 42
Joined: Sun Jun 14, 2009 2:34 am

Post by tata668 » Mon Sep 21, 2009 10:08 pm

Let's say I don't know exactly where the text I'm searching for will be located, or which HTML tags will surround it. Is it possible to extract only the searched text?

Ex:

TAG POS=1 TYPE=* ATTR=TXT:*hello<SP>world* EXTRACT=TXT
When I use the above code, the extract contains a lot of text (all of the page text). I would like the extract to contain only "hello world". Is that possible?

The following would be really cool too: being able to specify what the extracted value should be when the text is found:

TAG POS=1 TYPE=* ATTR=TXT:*hello<SP>world* EXTRACT="hello<SP>world<SP>found"
Is there any workaround to achieve what I'm trying to do?
Hannes, Tech Support

Re: Extracting only the searched text

Post by Hannes, Tech Support » Tue Sep 22, 2009 7:50 am

Two workarounds:

1) Use a script, read the extraction via iimGetLastExtract(), and use the scripting language's string functions to find the string you are looking for.

2) Use a script and if your TAG (with TXT:*Hello<SP>world*) returns #EANF#, you know that it wasn't found.
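Both workarounds come down to post-processing the string that iimGetLastExtract() hands back. Here is a minimal Python sketch of that step, not an official iMacros sample. The helper name `searched_text_only` and its parameters are hypothetical, the case-insensitive and whitespace-tolerant matching is an assumption, and the call that plays the macro and fetches the extract is left out, so the function simply takes the extract as a plain string.

```python
from typing import Optional

# Marker mentioned in the thread: the extract is "#EANF#" when the TAG matched nothing.
EANF = "#EANF#"


def searched_text_only(extract: Optional[str],
                       needle: str,
                       found_value: Optional[str] = None) -> Optional[str]:
    """Reduce a raw extract to just the searched text, or to a custom flag.

    Hypothetical helper: returns None when the text was not found,
    otherwise `found_value` if given, else the needle itself.
    """
    if extract is None or extract.strip() == EANF:
        return None  # workaround 2: the TAG command found nothing

    # Assumption: collapse whitespace and ignore case so that line breaks
    # or extra spaces inside the page text do not hide the match.
    normalized = " ".join(extract.split()).lower()
    if " ".join(needle.split()).lower() not in normalized:
        return None

    # Workaround 1: return only what the caller cares about,
    # not the whole page text.
    return found_value if found_value is not None else needle
```

In a real script the `extract` argument would be the value returned by iimGetLastExtract() after playing the macro; passing `found_value="hello world found"` gives the kind of custom flag asked about above.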

Re: Extracting only the searched text

Post by tata668 » Tue Sep 22, 2009 1:00 pm

Thanks for the reply.

This is what I currently do. But since there are a lot of those found/not-found checks in my macro, I would prefer the macro itself to return only a flag instead of the complete text of the page.

But I can live with this! In fact it's just a suggestion, maybe for a future version: to be able to return a custom value instead of the extracted text.
Hannes, Tech Support

Re: Extracting only the searched text

Post by Hannes, Tech Support » Wed Sep 23, 2009 7:21 am

tata668 wrote: maybe for a future version: to be able to return a custom value instead of the extracted text.
The upcoming v7 will indeed support returning substrings of the matching TAG's content, so you may want to subscribe to the announcement forum.

Re: Extracting only the searched text

Post by tata668 » Wed Sep 23, 2009 2:25 pm

Thanks!