Problem scraping using VBS Looping Sample Code

fujifoo
Posts: 3
Joined: Fri Nov 14, 2008 9:23 am

Problem scraping using VBS Looping Sample Code

Post by fujifoo » Fri Nov 14, 2008 9:33 am

Hi there,

I need some help with the above.

Extract ...
macro = macro & "TAG POS=" & CStr(counter) & " TYPE=DIV ATTR=A:* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=" & CStr(counter) & " TYPE=A ATTR=B* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=" & CStr(counter) & " TYPE=A ATTR=C* EXTRACT=TXT" & vbNewLine
[Attachment: error.jpg]
The above works well when all three tags (A, B, C) are present in every row. However, row 2 has no C2, so the macro picks up C3 instead. As a result, an error occurs towards the last row of the table. Does anyone have a solution for this problem? Your advice is much appreciated. Thanks in advance.
Tech Support
Posts: 4948
Joined: Tue Sep 20, 2005 7:25 pm

Re: Problem scraping using VBS Looping Sample Code

Post by Tech Support » Tue Nov 18, 2008 12:18 pm

I highly recommend relative positioning:

http://wiki.imacros.net/Data_Extraction ... ositioning

Example: If you have something like

Price: B2 then "Price" can be the anchor.

Or maybe you have headlines like
Part 1
A1 B1 C1
-----
Part 2
A2 B2
-----
Part 3
A3 B3 C3

then the "Part 1" (etc.) headline (= web site text!) can be the anchor.

Or even the black lines in your screenshot (if they are also lines in the website) can be used as an anchor for the TAG command!
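
The anchoring idea can also be pictured outside iMacros. The following is a minimal Python sketch, not iMacros code: the flat `tokens` list and the `group_by_anchor` helper are hypothetical and only mirror the Part 1 / Part 2 / Part 3 example above. It shows why tying each value to the headline before it keeps a short row (Part 2, with no C value) from borrowing a value from the next row.

```python
def group_by_anchor(tokens, is_anchor):
    """Group each run of tokens under the anchor (headline) that precedes it."""
    groups = {}
    current = None
    for tok in tokens:
        if is_anchor(tok):
            # A new headline starts a new group.
            current = tok
            groups[current] = []
        elif current is not None:
            # Everything after a headline belongs to that headline.
            groups[current].append(tok)
    return groups


# Hypothetical page text, mirroring the example above: Part 2 has no C value.
tokens = ["Part 1", "A1", "B1", "C1",
          "Part 2", "A2", "B2",
          "Part 3", "A3", "B3", "C3"]
rows = group_by_anchor(tokens, lambda t: t.startswith("Part "))

for name, cells in rows.items():
    # Pad short rows so a missing cell stays empty instead of shifting.
    a, b, c = (cells + [None] * 3)[:3]
    print(name, a, b, c)
```

With a global position counter, the third "C" lookup for Part 2 would land on C3; grouped by anchor, Part 2 simply has an empty third cell.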
fujifoo
Posts: 3
Joined: Fri Nov 14, 2008 9:23 am

Re: Problem scraping using VBS Looping Sample Code

Post by fujifoo » Thu Nov 20, 2008 5:53 am

Hi there,

Thank you very much for your help. I have tried relative positioning as per your advice, but it still fails for rows like these:

Row 1
...
<h5><span class="rRowCompany">Company A</span></h5><br />

Row 2
...
<h5><a href="http://company.com" class="rRowCompanyClick">Company B</a></h5><br/>

Row 3
...
<h5><span class="rRowCompany">Company C</span></h5><br />

For cases like the above, a command like the following still gives the wrong result, i.e. Company C is picked instead of Company B:

TAG POS=R1 TYPE=A ATTR=CLASS:rRowCompany* EXTRACT=TXT

Is there anything I can do to pick up the text whether the class is rRowCompany or rRowCompanyClick? I tried using * after rRowCompany, but it doesn't work. Thanks.
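
One thing visible in the three rows above is that the company name sits in a `<span>` in some rows and in an `<a>` in others, so a filter tied to a single element type cannot see every row. The Python sketch below illustrates the alternative of matching on the class prefix alone, regardless of tag. It is an illustration under assumptions rather than an iMacros solution: the `CompanyParser` class is hypothetical, and the sample markup is just the three lines quoted above.

```python
from html.parser import HTMLParser


class CompanyParser(HTMLParser):
    """Collect the text of any element whose class starts with 'rRowCompany'."""

    def __init__(self):
        super().__init__()
        self.companies = []
        self._capture_tag = None  # tag name currently being captured, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        # Match on the class prefix only, so both <span> and <a> qualify.
        if self._capture_tag is None and cls.startswith("rRowCompany"):
            self._capture_tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._capture_tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture_tag:
            self.companies.append("".join(self._buf).strip())
            self._capture_tag = None


html = '''
<h5><span class="rRowCompany">Company A</span></h5><br />
<h5><a href="http://company.com" class="rRowCompanyClick">Company B</a></h5><br/>
<h5><span class="rRowCompany">Company C</span></h5><br />
'''

parser = CompanyParser()
parser.feed(html)
print(parser.companies)  # ['Company A', 'Company B', 'Company C']
```

The sketch assumes the matched element does not contain a nested element of the same tag name, which holds for the markup quoted here.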
Tech Support
Posts: 4948
Joined: Tue Sep 20, 2005 7:25 pm

Re: Problem scraping using VBS Looping Sample Code

Post by Tech Support » Thu Nov 20, 2008 5:39 pm

Can you upload the complete web page (just one page with the data)? It is difficult for us to work with just this one line of HTML code.
fujifoo
Posts: 3
Joined: Fri Nov 14, 2008 9:23 am

Re: Problem scraping using VBS Looping Sample Code

Post by fujifoo » Fri Nov 21, 2008 6:10 am

Here you go ...

<div class="rTable">
<div class="rRow">
<div class="rRowDate">21 Nov<br /><span class="blur">1:34 pm</span></div>
<div class="rRowJob">
<h4 class="rRowTitle"><a href="http://siva-sg.jobstreet.com/_ads/sg/jo ... 3.htm?fr=J" title='View Job Details - Financial Analyst - Job ID : 56316' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_ads/sg/jo ... 3.htm?fr=J');return false;">Financial Analyst - Job ID : 56316</a></h4><br />
<h5><a href="http://siva-sg.jobstreet.com/_profile/p ... er_id=6917" class="rRowCompanyClick" title='View Employer Details & Other Jobs Posted by Philips Electronics (S) Pte Ltd (Main)' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_profile/p ... er_id=6917');return false;">Philips Electronics (S) Pte Ltd (Main)</a></h5><br />
<a href="http://job-search.jobstreet.com.sg/sing ... ction-jobs" title="Manufacturing / Production jobs in Singapore" class="rRowInd">Manufacturing / Production industry</a><br />
</div>
<div class="rRowDetail">5 yrs exp<br /> </div>
<div class="rRowLoc">Singapore</div>
</div>
<div class="enter"></div>
<br class="cf" />

<div class="rRow">
<div class="rRowDate">21 Nov<br /><span class="blur">1:33 pm</span></div>
<div class="rRowJob">
<h4 class="rRowTitle"><a href="http://siva-sg.jobstreet.com/_ads/sg/jo ... 9.htm?fr=J" title='View Job Details - TD - CMOS Transistor Design Principal Engineer' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_ads/sg/jo ... 9.htm?fr=J');return false;">TD - CMOS Transistor Design Principal Engineer</a></h4><br />
<h5><span class="rRowCompany">Chartered Semiconductor Manufacturing Ltd</span></h5><br />
<a href="http://job-search.jobstreet.com.sg/sing ... ation-jobs" title="Semiconductor jobs in Singapore" class="rRowInd">Semiconductor industry</a><br />

</div>
<div class="rRowDetail">5 yrs exp<br /> </div>
<div class="rRowLoc">Woodlands</div>
</div>
<div class="enter"></div>
<br class="cf" />

<div class="rRow">
<div class="rRowDate">21 Nov<br /><span class="blur">1:33 pm</span></div>
<div class="rRowJob">
<h4 class="rRowTitle"><a href="http://siva-sg.jobstreet.com/_ads/sg/jo ... 1.htm?fr=J" title='View Job Details - Director of Strategic Marketing (BU MidPower) - Job ID : 56300' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_ads/sg/jo ... 1.htm?fr=J');return false;">Director of Strategic Marketing (BU MidPower) - Job ID : 56300</a></h4><br />
<h5><a href="http://siva-sg.jobstreet.com/_profile/p ... er_id=6917" class="rRowCompanyClick" title='View Employer Details & Other Jobs Posted by Philips Electronics (S) Pte Ltd (Main)' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_profile/p ... er_id=6917');return false;">Philips Electronics (S) Pte Ltd (Main)</a></h5><br />
<a href="http://job-search.jobstreet.com.sg/sing ... ction-jobs" title="Manufacturing / Production jobs in Singapore" class="rRowInd">Manufacturing / Production industry</a><br />
</div>
<div class="rRowDetail">5 yrs exp<br /></div>
<div class="rRowLoc">Singapore</div>
</div>
<div class="enter"></div>
<br class="cf" />
</div><!--End rTable-->

<div class="rPage">
<div class="rSort">
<a href="#">^ Back to top</a> </div>
<div class="rPaging">