Extract a table line by line

Post by ripper99 » Wed Oct 26, 2005 10:01 am

I am looking for help extracting a table line by line from eBay. In the manual I see a reference to {{mypos}}, and I also tried this demo http://www.iopus.com/iim/demo/v4/pos/index.htm, but the example doesn't seem to loop.

I am confused by how the manual mentions "and then loop through changing the POS parameter "mypos" from 1 to 999". How exactly is this done? Any example would be appreciated. Also, if the script ran "mypos" from 1 to 40 and there were only 38 positions, would that stop the script?

I also tried the following and got an error on line 8: EXTRACT POS={{LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*

Basically I would just like to visit the page, walk down and extract all the positions I have, then hop to the next page and do the same while saving to .csv. Can you also have a popup that says it is finished when it cannot click "Next" anymore? Will it know?

Also, why are there " " around everything that is extracted? I had actually mentioned this back in January to Mike, and he said it was a problem that would be fixed. Thanks for any help someone can provide; here is the code I have so far:

------------------------------------------------------------------------------------


VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
SIZE X=1132 Y=623
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items
EXTRACT POS=2 TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS=2 TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS=2 TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS=2 TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS=1 TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS=1 TYPE=TXT ATTR=<TD<SP>class=ebcTim>*
TAG POS=1 TYPE=A ATTR=TXT:Next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=pioneer.csv

Post by Tech Support » Wed Oct 26, 2005 3:41 pm

ripper99 wrote: I also tried the following and got an error on line 8: EXTRACT POS={{LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*

This line was almost correct. The built-in loop counter is written with an exclamation mark, so please use:
EXTRACT POS={{!LOOP}} ...

Remember to start the macro with the LOOP button (not the PLAY button) in order to trigger the looping.

Post by ripper99 » Wed Oct 26, 2005 9:47 pm

Using the code below, it does not seem to grab line by line. After the first loop it grabs most positions from line 2, but it also grabs one from line 3, and every iteration after that does the same.

Also, why are there " " around the things that are extracted?

VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
SIZE X=1132 Y=623
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS={{!LOOP}} TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS={{!LOOP}} TYPE=TXT ATTR=<TD<SP>class=ebcTim>*
TAG POS={{!LOOP}} TYPE=A ATTR=TXT:Next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=pioneer.csv

Post by ripper99 » Sat Oct 29, 2005 6:05 am

*bump*

Post by Tech Support » Sat Oct 29, 2005 10:57 pm

Your extraction commands are basically OK, but you can make the extraction more reliable (that is, less affected by changes on the page) by using RELATIVE extraction. This macro works well in my test:

VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
SIZE X=1132 Y=623
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items
TAG POS={{!LOOP}} TYPE=INPUT:CHECKBOX FORM=NAME:find ATTR=NAME:coitem&&VALUE:* CONTENT=NO
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS=R1 TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTim>*
'TAG POS=R1 TYPE=A ATTR=TXT:Next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=pioneer.csv

All extraction is now relative to the TAG command that uses POS={{!LOOP}}. This command finds the checkbox at the beginning of each line. All EXTRACT commands look for the first (R1) element AFTER this TAG command.
Last edited by Tech Support on Sat Oct 29, 2005 11:02 pm, edited 1 time in total.

Post by Tech Support » Sat Oct 29, 2005 11:01 pm

For faster extraction I recommend that you split the macro into three parts:

Macro1 does the search.
Macro2 does the extraction (this macro is repeated x times).
Macro3 clicks on the NEXT link.

' Macro1: run the search (played once)
VERSION BUILD=5001024
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.ebay.com/
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:searchform ATTR=NAME:satitle CONTENT=pioneer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:searchform ATTR=NAME:&&VALUE:Search
TAG POS=1 TYPE=A ATTR=TXT:see<SP>all<SP>*<SP>items

' Macro2: extract one row (played once per row, with my_loop set to the row number)
TAG POS={{my_loop}} TYPE=INPUT:CHECKBOX FORM=NAME:find ATTR=NAME:coitem&&VALUE:* CONTENT=NO
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTtl>*
EXTRACT POS=R1 TYPE=HREF ATTR=<A<SP>href="http://cgi.ebay.com/*">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcBid>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcPr>*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class="ebcShpNew<SP>">*
EXTRACT POS=R1 TYPE=TXT ATTR=<TD<SP>class=ebcTim>*

' Macro3: go to the next results page
TAG POS=R1 TYPE=A ATTR=TXT:Next


Then you have the same structure as shown in http://forum.imacros.net/viewtopic.php?t=6 or http://www.iopus.com/iim/help/faq_extract_pages.htm
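To make the control flow of the three-macro split a bit more concrete, here is a rough Python sketch of the driver loop. It is not the iMacros scripting interface: run_search, extract_row and click_next are hypothetical stand-ins for "play Macro1", "play Macro2 with my_loop set" and "play Macro3", and the sketch simply assumes that a failed row extraction can be reported as None and a missing Next link as False. How a real macro run signals those failures is not covered in this thread.

```python
from typing import Callable, List, Optional

Row = List[str]


def scrape_all_pages(
    run_search: Callable[[], None],                # hypothetical: plays Macro1
    extract_row: Callable[[int], Optional[Row]],   # hypothetical: plays Macro2 with my_loop = pos
    click_next: Callable[[], bool],                # hypothetical: plays Macro3, False if no Next link
    max_rows_per_page: int = 999,
) -> List[Row]:
    """Run the search once, then extract row by row and page by page."""
    rows: List[Row] = []
    run_search()
    while True:
        for pos in range(1, max_rows_per_page + 1):
            row = extract_row(pos)
            if row is None:
                # No row at this position: the page has fewer rows than
                # max_rows_per_page, so stop and move on to the next page.
                break
            rows.append(row)
        if not click_next():
            # No further results page: we are finished.
            break
    return rows
```

With this shape, a page holding 38 rows ends its inner loop at position 39 instead of aborting the whole run, and the point where click_next returns False is the natural place to show a "finished" message.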
Last edited by Tech Support on Sat Oct 29, 2005 11:14 pm, edited 2 times in total.

Post by Tech Support » Sat Oct 29, 2005 11:07 pm

ripper99 wrote: Also why are " " around things that are extracted?

The " " as in
"NEW Pioneer DVD Rec.","http://cgi.ebay.com/NEW...","-","$299.99","$19.95","1m"
are part of the definition of the CSV file format (CSV = comma-separated values).
Without the "", an extracted value that contains a comma could not be saved correctly:
Example: "AAA,AAA", "BBBB", "C,C,C"
Good examples are some European eBay sites, where prices use a comma instead of a decimal point ("EUR 19,95") ;-)
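The effect of the quotes can be checked with a few lines of Python and its standard csv module. This is only an illustration of the CSV quoting rule, not of how iMacros writes its file; the helper names to_csv_line and parse_csv_line are made up for this sketch.

```python
import csv
import io
from typing import List


def to_csv_line(fields: List[str]) -> str:
    """Write one CSV record with every field wrapped in double quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue().rstrip("\n")


def parse_csv_line(line: str) -> List[str]:
    """Read one CSV record back into its fields."""
    return next(csv.reader([line]))


fields = ["AAA,AAA", "BBBB", "C,C,C"]
quoted = to_csv_line(fields)        # "AAA,AAA","BBBB","C,C,C"
unquoted = ",".join(fields)         # AAA,AAA,BBBB,C,C,C

print(parse_csv_line(quoted))       # the three original fields come back
print(unquoted.split(","))          # without quotes the line falls apart into six pieces
```

Spreadsheet programs and CSV parsers strip the quotes again when reading the file, so they do not end up in the cell values.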

Post by ripper99 » Sun Oct 30, 2005 8:49 am

Ahh, now I understand the "". Thanks for all your help and the example scripts, they really help a lot!