How to deal with information that is only sometimes there

Post by Tech Support » Sat Dec 06, 2008 12:25 am

When you extract information, e.g. from a job website, you might notice that not every listing contains every piece of information. If you use relative extraction, the TAG command that serves as the anchor will trigger an error message whenever its element is missing from the page.

As an example, we use our demo listing at http://www.iopus.com/imacros/demo/v6/ex ... sting1.htm

The standard extraction macro is:

URL GOTO=http://www.iopus.com/imacros/demo/v6/extract1/listing1.htm
TAG POS=1 TYPE=B ATTR=TXT:Salary:
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=B ATTR=TXT:*Position*
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=B ATTR=TXT:*Ref*
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT
(Note: POS=R-1 is used because the HTML tag that holds the information starts before the anchor tag that holds the label text.)

The problem with this macro is that if, e.g., the "Position:" information is missing, the anchor TAG command for it fails and you will get a TAG error message.

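Solution: Extract the anchor text, too! A TAG command with EXTRACT that does not find its element returns the placeholder #EANF# instead of stopping the macro. This way you will (a) get no TAG error if the anchor is missing and (b) be able to match each anchor text with the information extracted for it: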

URL GOTO=http://www.iopus.com/imacros/demo/v6/extract1/listing1.htm
TAG POS=1 TYPE=B ATTR=TXT:Salary: EXTRACT=TXT
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=B ATTR=TXT:*Position* EXTRACT=TXT
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=B ATTR=TXT:*Ref* EXTRACT=TXT
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT
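
Once the macro has run, the extracted values still have to be paired with their anchor texts. The following is only a minimal sketch in Python of how that matching could be done outside iMacros. It assumes the extraction result is available as one string in which the values are separated by [EXTRACT] and a missing anchor shows up as #EANF#. The function name pair_extractions and the sample values are hypothetical and not taken from the demo page.

```python
# Placeholder iMacros returns when an extraction TAG finds no matching element.
NOT_FOUND = "#EANF#"
# Separator iMacros puts between multiple extracted values.
SEPARATOR = "[EXTRACT]"


def pair_extractions(raw: str) -> dict:
    """Pair each anchor text with the value extracted right after it.

    Pairs whose anchor was not found are skipped, because a value
    extracted relative to a missing anchor cannot be trusted.
    """
    fields = [field.strip() for field in raw.split(SEPARATOR)]
    result = {}
    # The macro extracts anchor, value, anchor, value, ... in that order.
    for anchor, value in zip(fields[0::2], fields[1::2]):
        if anchor == NOT_FOUND:
            continue
        # Drop the trailing colon so "Salary:" becomes the key "Salary".
        result[anchor.rstrip(":")] = value
    return result


# Hypothetical result for a listing without a "Position:" entry. The value
# that follows the missing anchor is an assumption and is discarded anyway.
sample = "Salary:[EXTRACT]55,000[EXTRACT]#EANF#[EXTRACT]55,000[EXTRACT]Ref:[EXTRACT]A-1001"
print(pair_extractions(sample))
```

Skipping the whole pair when the anchor is #EANF# is a deliberate choice in this sketch: the relative TAG that follows a missing anchor has no reliable reference point, so whatever it returns is ignored.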