extract tag position keeps on varying

Post by hamzajosh » Tue Dec 06, 2005 6:43 pm

I need to extract the descriptions from these links. When I use this link, the generated code is as follows:
http://www.sigmaaldrich.com/catalog/sea ... LUKA/54465

VERSION BUILD=5010115
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.sigmaaldrich.com/catalog/sea ... LUKA/54465
SIZE X=801 Y=602
EXTRACT POS=2 TYPE=TXT ATTR=<SPAN<SP>class=sectionHeader>*
EXTRACT POS=12 TYPE=TXT ATTR=<DIV<SP>class=leftColumn>*
EXTRACT POS=10 TYPE=TXT ATTR=<DIV<SP>class=rightColumn>*

When I change the link to
http://www.sigmaaldrich.com/catalog/sea ... LUKA/95209 via the loop, it no longer gets the description and picks up some other data instead. How do I write the last two lines so that only the description is picked up? Please help ASAP. Hamza Josh

Post by Tech Support » Wed Dec 07, 2005 2:55 pm

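Your extraction anchors are well chosen. In addition, you can make the extraction more robust against website changes by using a relative extraction anchor (EXTRACT POS=R1 ...). A relative position is counted from the element selected by the preceding TAG command instead of from the top of the page, so it still matches when the number of elements above the description varies from page to page.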

This macro extracts the description on both pages correctly:

VERSION BUILD=5010115 
TAB T=1 
TAB CLOSEALLOTHERS 
'URL GOTO=http://www.sigmaaldrich.com/catalog/search/ProductDetail/FLUKA/54465 
URL GOTO=http://www.sigmaaldrich.com/catalog/search/ProductDetail/FLUKA/95209
TAG POS=1 TYPE=SPAN ATTR=TXT:Descriptions*
EXTRACT POS=R1 TYPE=TXT ATTR=<DIV<SP>class=rightColumn>*
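
For the loop mentioned in the question, the same relative anchor can be combined with a datasource so that each pass loads a different product page. The following is only a minimal sketch, not a tested macro: products.csv is a hypothetical file with one product number per line (for example 54465 and 95209), and it assumes that your iMacros version supports the built-in variables !DATASOURCE, !DATASOURCE_COLUMNS, !DATASOURCE_LINE and !LOOP. Neither the file nor the loop setup comes from the original posts.

```
VERSION BUILD=5010115
TAB T=1
TAB CLOSEALLOTHERS
' Assumption: products.csv (hypothetical name) holds one product number per line
SET !DATASOURCE products.csv
SET !DATASOURCE_COLUMNS 1
' Read the line that matches the current loop pass
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO=http://www.sigmaaldrich.com/catalog/search/ProductDetail/FLUKA/{{!COL1}}
' Anchor on the "Descriptions" heading, then extract the first rightColumn DIV after it
TAG POS=1 TYPE=SPAN ATTR=TXT:Descriptions*
EXTRACT POS=R1 TYPE=TXT ATTR=<DIV<SP>class=rightColumn>*
```

Run it with the loop function, setting the maximum loop count to the number of lines in the file. Because the TAG anchor is located again on every page, the R1 position should not need to change between products.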
Last edited by Tech Support on Wed Dec 07, 2005 3:07 pm, edited 2 times in total.

Post by Tech Support » Wed Dec 07, 2005 3:04 pm

More information about relative extraction is available at http://forum.imacros.net/viewtopic.php?p=1029