Extract a table line by line

by ripper99 on Wed Oct 26, 2005 3:01 am

I am looking for help extracting a table line by line from eBay. In the manual I see a reference to {{mypos}}, and I also tried this demo, http://www.iopus.com/iim/demo/v4/pos/index.htm, but the example doesn't seem to loop.

I am confused about how the manual mentions "and then loop through changing the POS parameter "mypos" from 1 to 999". How exactly is this done? Any example would be appreciated. Also, if the script ran "mypos" from 1 to 40, for example, and there were only 38 positions, would that stop the script?

I also tried the following and got an error on line 8: EXTRACT POS={{LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*

Basically, I would just like to visit the page, walk down and extract all the positions, then hop to the next page and do the same, saving everything to a .csv file. Can it also show a popup saying it is finished when it can no longer click "Next"? Will it know?

**Also, why are there " " around everything that is extracted? I mentioned this to Mike back in January, and he said it was a problem that would be fixed. Thanks for any help someone can provide. Here is the code I have so far:

------------------------------------------------------------------------------------


VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
SIZE X=1132 Y=623
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items
EXTRACT POS=2 TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS=2 TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS=2 TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS=2 TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS=1 TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS=1 TYPE=TXT ATTR=<TD<SP>class=ebcTim>*
TAG POS=1 TYPE=A ATTR=TXT:Next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=pioneer.csv

by Tech Support on Wed Oct 26, 2005 8:41 am

I also tried the following and got an error on line 8: EXTRACT POS={{LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*

This line was almost correct. Please use:
EXTRACT POS={{!LOOP}} ...

Remember to start the macro with the LOOP button (not the PLAY button) in order to trigger the looping.

by ripper99 on Wed Oct 26, 2005 2:47 pm

Using the code below, it does not seem to grab line by line. After the first loop it grabs most positions from line 2, but it also grabs one from line 3, and it does the same on each iteration after that.

Also why are " " around things that are extracted?

VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
SIZE X=1132 Y=623
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS={{!LOOP}} TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTim>*
TAG POS={{!LOOP}} TYPE=A ATTR=TXT:Next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=pioneer.csv

by ripper99 on Fri Oct 28, 2005 11:05 pm

*bump*

by Tech Support on Sat Oct 29, 2005 3:57 pm

Your extraction commands are basically OK, but you can make the extraction more reliable (that is, less affected by changes on the page) by using RELATIVE extraction. This macro works well in my test:

VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
SIZE X=1132 Y=623
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items
TAG POS={{!LOOP}} TYPE=INPUT:CHECKBOX FORM=NAME:find ATTR=NAME:coitem&&VALUE:* CONTENT=NO
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS=R1 TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTim>*
'TAG POS=R1 TYPE=A ATTR=TXT:Next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=pioneer.csv

All extraction is now relative to the TAG command with POS={{!LOOP}}. This command finds the checkbox at the beginning of each line, and every EXTRACT command then looks for the first (R1) matching element AFTER that checkbox.
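To see why the relative form is steadier, here is a small Python sketch of the idea. It is a toy model only, not iMacros itself: the page is an invented flat list of elements, and the names PAGE, absolute and relative are made up for this illustration. Row 1 has no shipping cell, so counting shipping cells from the top of the page goes out of step with the titles, which is the "one value from line 3" effect described above.

```python
# Toy model of a results page: (kind, text) elements in document order.
# Row 1 has no shipping cell, so absolute counting drifts for that column.
PAGE = [
    ("checkbox", ""), ("title", "Item A"),
    ("checkbox", ""), ("title", "Item B"), ("ship", "$5.00"),
    ("checkbox", ""), ("title", "Item C"), ("ship", "$7.50"),
]


def absolute(page, kind, pos):
    """Text of the pos-th element of this kind on the whole page (like POS=n)."""
    matches = [text for k, text in page if k == kind]
    return matches[pos - 1] if pos <= len(matches) else None


def relative(page, anchor_kind, anchor_pos, kind):
    """Find the anchor_pos-th anchor, then the first `kind` after it (like POS=R1)."""
    seen = 0
    for i, (k, _) in enumerate(page):
        if k != anchor_kind:
            continue
        seen += 1
        if seen == anchor_pos:
            for k2, text in page[i + 1:]:
                if k2 == kind:
                    return text
            return None
    return None  # fewer than anchor_pos anchors on the page


# Row 2 by absolute position: title from row 2, shipping from row 3.
print(absolute(PAGE, "title", 2), absolute(PAGE, "ship", 2))
# Row 2 relative to its checkbox: both values from row 2.
print(relative(PAGE, "checkbox", 2, "title"), relative(PAGE, "checkbox", 2, "ship"))
```

In this model the relative lookup re-synchronizes at every checkbox. A missing cell can still pull a value from the following row (row 1 would report "$5.00" here), but the error does not carry over to later rows.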
Last edited by Tech Support on Sat Oct 29, 2005 4:02 pm, edited 1 time in total.

by Tech Support on Sat Oct 29, 2005 4:01 pm

For faster extraction, I recommend that you split the macro into three parts:

Macro1 does the search.
Macro2 does the extraction (this macro is repeated x times).
Macro3 clicks on the NEXT link.

' Macro1: runs the search once
VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items

' Macro2: extracts one row; my_loop is the row number and is set by the caller on each repetition
TAG POS={{my_loop}} TYPE=INPUT:CHECKBOX FORM=NAME:find ATTR=NAME:coitem&&VALUE:* CONTENT=NO
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS=R1 TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTim>*

' Macro3: clicks the Next link to move to the following results page
TAG POS=R1 TYPE=A ATTR=TXT:Next


Then you have the same structure as shown in http://forum.iopus.com/viewtopic.php?t=6 or http://www.iopus.com/iim/help/faq_extract_pages.htm
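The control flow around the three macros can be sketched as follows. This is only an outline in Python under stated assumptions, not the script from the links above: play(name, variables) is a hypothetical callable standing in for whatever actually runs a macro, and it is assumed to return False as soon as a command in the macro fails (for example when row 39 does not exist on a 38-row page, or when there is no Next link left). make_fake_site is likewise invented so the sketch can be run without a browser.

```python
def scrape_all_pages(play, max_rows=999, max_pages=100):
    """Search once, extract rows until one fails, then click Next; repeat.

    `play(name, variables)` is a hypothetical macro runner that returns
    True on success and False when a command in the macro fails.
    Returns the number of rows extracted on each page.
    """
    rows_per_page = []
    if not play("search", {}):
        return rows_per_page
    for _ in range(max_pages):
        row = 1
        # A failed extraction means we ran past the last row on this page.
        while row <= max_rows and play("extract", {"my_loop": row}):
            row += 1
        rows_per_page.append(row - 1)
        if not play("next", {}):
            break  # no Next link left: the whole extraction is finished
    return rows_per_page


def make_fake_site(rows_on_each_page):
    """Build a stand-in `play` that simulates a site with the given page sizes."""
    state = {"page": 0}

    def play(name, variables):
        if name == "search":
            return True
        if name == "extract":
            return variables["my_loop"] <= rows_on_each_page[state["page"]]
        if name == "next":
            if state["page"] + 1 < len(rows_on_each_page):
                state["page"] += 1
                return True
            return False
        raise ValueError("unknown macro: " + name)

    return play


print(scrape_all_pages(make_fake_site([40, 38])))
```

With two simulated pages of 40 and 38 rows, the loop stops each page at the first missing row and ends when "next" fails, which is also the natural place to report that the run is finished.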
Last edited by Tech Support on Sat Oct 29, 2005 4:14 pm, edited 2 times in total.

by Tech Support on Sat Oct 29, 2005 4:07 pm

Also why are " " around things that are extracted?

The " " as in
"NEW Pioneer DVD Rec.","http://cgi.ebay.com/NEW...","-","$299.99","$19.95","1m"
are part of the definition of the CSV file format (CSV = comma-separated values).
Without the "", an extracted value that contains a comma could not be saved correctly.
Example: "AAA,AAA", "BBBB", "C,C,C"
Good examples are some European eBay sites, where prices contain a comma instead of a point ("EUR 19,95").
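The same point can be checked with a few lines of Python and its standard csv module. This is just an illustration of the file format, not of how iMacros writes its file; the row values are made up apart from the "EUR 19,95" price mentioned above.

```python
import csv
import io

row = ["NEW Pioneer DVD Rec.", "EUR 19,95", "1m"]

# Write the row with every field quoted, like the extracted line shown above.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(row)
line = buffer.getvalue().strip()
print(line)

# A CSV reader uses the quotes to recover the original three fields.
parsed = next(csv.reader([line]))
print(parsed)

# Without quotes, splitting on commas cuts the price into two fields.
naive = ",".join(row).split(",")
print(naive)
```

The quoted line reads back as three fields, while the unquoted version splits into four because the comma inside the price is taken as a separator.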

by ripper99 on Sun Oct 30, 2005 1:49 am

Ahh, now I understand the "". Thanks for all your help and the example scripts, they really help a lot!