Just getting started ... Regular Expressions

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
Forum rules
iMacros EOL - Attention!

The renewal maintenance has officially ended for Progress iMacros effective November 20, 2023 and all versions of iMacros are now considered EOL (End-of-Life). The iMacros products will no longer be supported by Progress (aside from customer license issues), and these forums will also no longer be moderated from the Progress side.

Thank you again for your business and support.

Sincerely,
The Progress Team

Before asking a question or reporting an issue:
1. Please review the list of FAQs.
2. Use the search box (at the top of each forum page) to see if a similar problem or question has already been addressed.
3. Try searching the iMacros Wiki - it contains the complete iMacros reference as well as plenty of samples and tutorials.
4. We can respond much faster to your posts if you include the important information described in the forum's posting guidelines.

Post by erikcsmith » Fri Jan 27, 2006 12:02 am

Hello,

I am researching iOpus iMacros for scraping. Currently, I am using homegrown scripts based on Perl and cURL.

I am trying to extract some basic data and am still having difficulties, despite reading much of the documentation and reviewing the demos.

There is a basic line of HTML from which I am trying to grab PART of the text. It looks similar to this:

<td>Today's date is 01/01/06</td>

I can extract 'Today's date is 01/01/06' without a problem. However, I only want '01/01/06' extracted. It's really frustrating because it seems like it should be so simple (I am impressed with the complete table scrape in the demo).

What am I doing wrong? Shouldn't I be able to grab PART of the text without resorting to parsing it in Perl/VB/WSH?

Also, is there not a way to incorporate a customized regular expression into the EXTRACT? Seems like that would be quite handy.

Kind regards to anyone who can point me in the right direction.

Erik

Post by Tech Support » Fri Jan 27, 2006 8:32 pm

erikcsmith wrote:
Shouldn't I be able to grab PART of the text without resorting to parsing it in Perl/VB/WSH?

The idea behind the current IIM design is that it integrates very easily with any scripting language. Thus, you can use Perl to call iimPlay, get the data you need, and do the parsing in Perl. A Perl tutorial is available at http://www.iopus.com/iim/tutorials/perl.htm
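As a rough illustration of that post-processing step, here is a minimal sketch in Python rather than Perl. It assumes the EXTRACT has already returned the whole cell text as a plain string and that dates are US-style MM/DD/YY; `extract_date` and `DATE_RE` are hypothetical names, not part of IIM.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: dates are US-style MM/DD/YY, e.g. 01/01/06.
# The \b anchors stop the pattern from matching part of a four-digit year.
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{2})\b")


def extract_date(cell_text: str) -> Optional[date]:
    """Return the first MM/DD/YY date found in cell_text, or None."""
    match = DATE_RE.search(cell_text)
    if match is None:
        return None
    # %y maps two-digit years 00-68 to 2000-2068.
    return datetime.strptime(match.group(1), "%m/%d/%y").date()


# Hypothetical value, standing in for what EXTRACT returned for the <td> cell.
extracted = "Today's date is 01/01/06"
print(extract_date(extracted))  # 2006-01-01
```

Converting to a real date (instead of keeping the substring) is optional, but it makes later comparisons and sorting straightforward.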
erikcsmith wrote:
Also, is there not a way to incorporate a customized regular expression into the EXTRACT?

Not yet.

erikcsmith wrote:
Seems like that would be quite handy.

Agreed :wink:

Post by erikcsmith » Fri Jan 27, 2006 9:02 pm

Thank you for your response.

Just to clarify.

Is there no IIM way to EXTRACT '01/01/2010' from <td align=right>Verified as of 01/01/2010</td> without outside scripting?

I was hoping that I could key in on the date using TAG and EXTRACT
or Relative Tagging:

TAG POS=1 TYPE=TD ATTR=TXT:<SP>Verified<SP>as<SP>of<SP>
EXTRACT POS=R1 TYPE=TEXT ATTR=*<SP><SP>
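Outside IIM, the same "key on the label, keep what follows" idea takes only a few lines of post-processing. A minimal Python sketch, assuming the EXTRACT already returned the full cell text; `text_after_label` is a hypothetical helper, not an IIM command:

```python
from typing import Optional


def text_after_label(cell_text: str, label: str) -> Optional[str]:
    """Return the text following `label` in cell_text, or None if the label is missing.

    Whitespace in both strings is collapsed first, so extra spaces or line
    breaks in the HTML source do not break the match.
    """
    normalized = " ".join(cell_text.split())
    label = " ".join(label.split())
    _, sep, tail = normalized.partition(label)
    if not sep:
        return None
    return tail.strip()


print(text_after_label("Verified as of 01/01/2010", "Verified as of"))  # 01/01/2010
```

Anchoring on the fixed label avoids any assumption about the date format, at the cost of breaking if the site rewords the label.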

or something similar. Can you clarify whether this can currently be accomplished?
I am testing IIM alongside screen-scraper.com's tool to find out which solution will work best for our company's needs.

Cheers,

Erik

Post by erikcsmith » Fri Jan 27, 2006 9:42 pm

PS: You said 'not yet' regarding integrated regular expressions.
Is this a feature on the near horizon for IIM? It would make this product so much more versatile, IMO. If so, can you give a sneak peek at the time frame in which it will be integrated?

Thanks again,

Erik

Post by Tech Support » Fri Jan 27, 2006 10:28 pm

Relative Tagging, and EXTRACT in general, work at the HTML tag level. So the smallest item of information that you can extract is the text (or link, etc.) inside a tag such as <b>price: US$ 99.95</b>. In this example, you would use simple post-processing to extract the price from the string.
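For the price example, that post-processing might look like the following minimal Python sketch. It assumes US number formatting (comma as thousands separator, period as decimal point); `parse_price` is a hypothetical helper. `Decimal` is used instead of `float` so that amounts like 99.95 are kept exactly.

```python
import re
from decimal import Decimal
from typing import Optional

# A number with optional thousands separators and decimal part, e.g. 99.95 or 1,299.00.
PRICE_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)")


def parse_price(text: str) -> Optional[Decimal]:
    """Return the first number in text as a Decimal, ignoring thousands separators."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


# Hypothetical EXTRACT result for <b>price: US$ 99.95</b>.
print(parse_price("price: US$ 99.95"))  # 99.95
```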

Post by Tech Support » Fri Jan 27, 2006 10:30 pm

There is no specific release date for regular expressions yet, but it is on our "to do" list. It has somewhat lower priority because the Scripting Interface allows a good and powerful workaround.
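As a sketch of that workaround, a small script can apply a user-supplied regular expression to every extracted value, which approximates the "custom regex in EXTRACT" idea from this thread. Two assumptions here: the macro's combined extract string separates multiple values with the literal token `[EXTRACT]`, and `apply_regex` is a hypothetical helper, not part of the Scripting Interface.

```python
import re
from typing import List, Optional

# Assumption: multiple EXTRACT values come back in one string,
# separated by the literal token "[EXTRACT]".
EXTRACT_DELIMITER = "[EXTRACT]"


def apply_regex(extract_output: str, pattern: str) -> List[Optional[str]]:
    """Apply a user-supplied regex to each extracted value.

    For each value, return the first capture group of the first match
    (or the whole match if the pattern has no groups), or None if the
    pattern does not match that value.
    """
    compiled = re.compile(pattern)
    results: List[Optional[str]] = []
    for value in extract_output.split(EXTRACT_DELIMITER):
        match = compiled.search(value)
        if match is None:
            results.append(None)
        elif compiled.groups:
            results.append(match.group(1))
        else:
            results.append(match.group(0))
    return results


raw = "Today's date is 01/01/06[EXTRACT]Verified as of 01/01/2010"
print(apply_regex(raw, r"(\d{2}/\d{2}/\d{2,4})$"))  # ['01/01/06', '01/01/2010']
```

Keeping the pattern as a parameter means the same script can be reused for dates, prices, or IDs by changing only the regex.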