How to deal with information that is only sometimes there

Posted: Sat Dec 06, 2008 12:25 am
by Tech Support
When you extract information from a website, e.g. a job site, you might notice that not every listing contains every field. If you use relative extraction, the TAG command that serves as the anchor will then trigger an error message whenever its field is missing.

As an example, we use our demo listing at http://www.iopus.com/imacros/demo/v6/ex ... sting1.htm

The standard extraction macro is:

Code:

URL GOTO=http://www.iopus.com/imacros/demo/v6/extract1/listing1.htm     
TAG POS=1 TYPE=B ATTR=TXT:Salary:  
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT 
TAG POS=1 TYPE=B ATTR=TXT:*Position*    
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT 
TAG POS=1 TYPE=B ATTR=TXT:*Ref*     
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT 
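(Note: POS=R-1 is a relative position. It selects the first matching NOBR element before the anchor, because on this page the HTML tag that holds the information starts before the tag that holds the label text.)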

The problem with this macro is that if, for example, the "Position:" information is missing, you will get a TAG error message.

Solution: Extract the anchor text, too! This way (a) you get no TAG error if the anchor is missing, and (b) you can easily match the anchor text with the extracted information:

Code:

URL GOTO=http://www.iopus.com/imacros/demo/v6/extract1/listing1.htm     
TAG POS=1 TYPE=B ATTR=TXT:Salary:  EXTRACT=TXT 
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT 
TAG POS=1 TYPE=B ATTR=TXT:*Position*    EXTRACT=TXT  
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT 
TAG POS=1 TYPE=B ATTR=TXT:*Ref*     EXTRACT=TXT 
TAG POS=R-1 TYPE=NOBR ATTR=TXT:* EXTRACT=TXT
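
Matching anchor text with extracted information can then be done in whatever script processes the extraction result. The following is only a rough Python sketch, not part of the macro. It assumes that the extracted values arrive as one string separated by [EXTRACT], and that a TAG whose element was not found yields #EANF#; please verify both against your iMacros version. The function name pair_fields and the sample values are made up for illustration.

```python
# Assumed iMacros conventions (not stated in the post; verify for your version):
EXTRACT_SEPARATOR = "[EXTRACT]"   # assumed delimiter between extracted values
ANCHOR_NOT_FOUND = "#EANF#"       # assumed placeholder for "element not found"


def pair_fields(extract_string):
    """Pair each extracted anchor label with the value extracted right after it."""
    parts = [p.strip() for p in extract_string.split(EXTRACT_SEPARATOR)]
    fields = {}
    # The macro alternates anchor, value, anchor, value, ...
    for label, value in zip(parts[0::2], parts[1::2]):
        if label == ANCHOR_NOT_FOUND:
            # Anchor missing: the value that follows cannot be trusted, so skip it.
            continue
        fields[label.rstrip(":")] = None if value == ANCHOR_NOT_FOUND else value
    return fields


# Hypothetical extraction result for a listing without a "Position:" row.
sample = "Salary:[EXTRACT]50,000 USD[EXTRACT]#EANF#[EXTRACT]#EANF#[EXTRACT]Ref:[EXTRACT]A-1234"
print(pair_fields(sample))
```

The value that follows a missing anchor is discarded on purpose: without the anchor, the relative TAG has nothing reliable to be relative to, so whatever it returned should not be assigned to that field.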