extracting data from tables using TYPE=HTM

Jat

extracting data from tables using TYPE=HTM

Post by Jat » Tue Oct 18, 2005 3:20 pm

Hi,

I am currently working on a project to extract data tables from various websites. The problem I am having is as follows:

I extract the table using the TYPE=HTM method. The user manual says that this method lets you separate data in a table cell that is delimited by <br> tags. After using the method, I find that the extracted data does not contain these tags, so I cannot separate it. Could someone please advise me on what I can do? Thank you.
Tech Support

Post by Tech Support » Tue Oct 18, 2005 3:58 pm

Can you post the URL of the web page and/or the Internet macro that creates the problem? This would allow me to quickly re-create the issue on our test systems.
Jat

Data Extraction

Post by Jat » Tue Oct 18, 2005 4:26 pm

I am trying to extract the data from the table into a .txt file, but the saved output does not contain the <br> tags.

The link

http://www.national.com/quality/green/compliance.cgi

My Macro

VERSION BUILD=4310722
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.national.com/quality/green/
SIZE X=1132 Y=879
SET !EXTRACT_TEST_POPUP NO
SET !DATASOURCE H:\ComponentDatabase\NatSemiPartNumbers.csv
SET !DATASOURCE_COLUMNS 1
SET !DATASOURCE_LINE {{!LOOP}}
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:complianceForm ATTR=NAME:nsid CONTENT={{!COL1}}
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:complianceForm ATTR=NAME:&&VALUE:Submit
EXTRACT POS=184 TYPE=HTM ATTR=<table>*
SAVEAS TYPE=EXTRACT FOLDER=H:\ComponentDatabase\NatSemi FILE={{!COL1}}_{{!NOW:yyyymmdd_hhnnss}}.txt
SAVEAS TYPE=BMP FOLDER=H:\ComponentDatabase\NatSemi FILE={{!COL1}}_{{!NOW:yyyymmdd_hhnnss}}.bmp
Jat

Part Number

Post by Jat » Tue Oct 18, 2005 4:27 pm

Please enter this part number into the field to get the table:

LM10CWM
Jat

ANY NEWS <br> tags not displaying

Post by Jat » Mon Oct 24, 2005 10:38 am

Has anyone found a solution to the problem of <br> tags not appearing when extracting tables using TYPE=HTM? If someone could help me as soon as possible, it would be very much appreciated.
Tech Support

Post by Tech Support » Mon Oct 24, 2005 7:16 pm

SAVEAS TYPE=HTM... keeps the <BR> tags in place, but SAVEAS TYPE=EXTRACT does NOT save the HTML tags to the file. It saves the table in CSV format (text only). If you want to access the table with the <br> and all other HTML elements in place, you need to access the data through the Scripting Interface:

i = iim1.iimPlay ("your_macro")

s = iim1.iimGetLastExtract()

s contains the data you need!

Note: If you use the Scripting Interface, I recommend that you remove the SAVEAS TYPE=EXTRACT from your macro.
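
Once the extracted HTML is available as a string, separating the entries is a matter of splitting on the <br> tags. The following is only a rough sketch in Python, not part of iMacros: the helper names split_on_br and fetch_and_split are made up for illustration, and the iim argument is assumed to be some object that exposes the iimPlay and iimGetLastExtract calls shown above (how that object is created depends on your installation and is not covered here).

```python
import re
from html import unescape

# Matches <br>, <BR>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Matches any other HTML tag so it can be stripped from a fragment.
ANY_TAG = re.compile(r"<[^>]+>")


def split_on_br(cell_html):
    """Split an HTML fragment on <br> tags and return the non-empty text parts."""
    parts = []
    for fragment in BR_TAG.split(cell_html):
        # Remove leftover tags, decode entities such as &amp;, trim whitespace.
        text = unescape(ANY_TAG.sub("", fragment)).strip()
        if text:
            parts.append(text)
    return parts


def fetch_and_split(iim, macro_name):
    """Play a macro and split its last extract on <br> tags.

    Assumption: `iim` provides iimPlay(name) and iimGetLastExtract(),
    as in the Scripting Interface snippet above.
    """
    iim.iimPlay(macro_name)
    return split_on_br(iim.iimGetLastExtract())


if __name__ == "__main__":
    # Hypothetical cell content, not taken from the real page.
    sample = "<td>first entry<br>second entry<BR />third entry</td>"
    for part in split_on_br(sample):
        print(part)
```

The splitting is kept separate from the iMacros call so it can be tried on any saved HTML fragment first.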

For best results, use the new version 5: http://www.iopus.com/iim/support/changelog
Jat

extracting data from tables using TYPE=HTM

Post by Jat » Tue Oct 25, 2005 1:37 pm

I am still having problems getting the <br> tags to appear. The manual says to use TYPE=HTM, but this clearly does not work. The suggestion about using Windows scripting has left me puzzled: does this code go into the macro itself or into a separate file?