Problem scraping using VBS Looping Sample Code

by fujifoo on Fri Nov 14, 2008 2:33 am

Hi there,

Need some help on the above.

Extract ...
macro = macro + "TAG POS="+Cstr(counter)+" TYPE=DIV ATTR=A:* EXTRACT=TXT"+vbNewLine
macro = macro + "TAG POS="+Cstr(counter)+" TYPE=A ATTR=B* EXTRACT=TXT"+vbNewLine
macro = macro + "TAG POS="+Cstr(counter)+" TYPE=A ATTR=C* EXTRACT=TXT"+vbNewLine

[Attachment: error.jpg (screenshot)]


The above works well if all three tags (A, B, C) are present in every row. However, row 2 has no C element, so for row 2 the macro picks up C3 because there is no C2. As a result, an error occurs at the last row of the table. Does anyone have a solution for this problem? Any advice is much appreciated. Thanks in advance.
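
To make the drift concrete, here is a small stand-alone sketch (plain VBScript, no iMacros needed; run it with cscript). It only simulates how a page-wide POS counter behaves; the arrays and the NthMatch helper are made up for illustration and are not part of iMacros.

```vbscript
Option Explicit

' Page-wide match lists, as an absolute POS=n would see them.
' Row 2 has no C element, so the C list is one item short.
Dim colA, colB, colC, counter
colA = Array("A1", "A2", "A3")
colB = Array("B1", "B2", "B3")
colC = Array("C1", "C3")

' Hypothetical helper: returns the n-th match (1-based),
' or a marker when the counter runs past the end of the list.
Function NthMatch(matches, n)
    If n - 1 <= UBound(matches) Then
        NthMatch = matches(n - 1)
    Else
        NthMatch = "(no match)"
    End If
End Function

For counter = 1 To 3
    WScript.Echo "Row " & counter & ": " & NthMatch(colA, counter) & ", " & _
        NthMatch(colB, counter) & ", " & NthMatch(colC, counter)
Next

' Output:
' Row 1: A1, B1, C1
' Row 2: A2, B2, C3
' Row 3: A3, B3, (no match)
```

Row 2 gets C3, and the counter runs out of C matches on the last row, which matches the error described above.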

Re: Problem scraping using VBS Looping Sample Code

by Tech Support on Tue Nov 18, 2008 5:18 am

I highly recommend relative positioning:

http://wiki.imacros.net/Data_Extraction ... ositioning

Example: if you have something like "Price: B2", then "Price" can be the anchor.

Or maybe you have headlines like
Part 1
A1 B1 C1
-----
Part 2
A2 B2
-----
Part 3
A3 B3 C3

then the "Part 1" (etc.) headline (= web site text!) can be the anchor.

Or even the black lines in your screenshot (if they are also lines on the website) can be used as the anchor for the TAG command!
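
A minimal sketch of the anchor idea in the style of the looping sample. The H3 element type, the "Part N" headline text and the choice of the first link after it are assumptions for illustration only; adjust them to the real page.

```vbscript
Option Explicit

' Assumptions (not from the thread): the headline is an H3 element whose
' text is "Part N", and the wanted value is the first link after it.
Dim macro, part
part = 2

macro = "CODE:"
' Anchor: locate the headline of the current part by its visible text.
macro = macro + "TAG POS=1 TYPE=H3 ATTR=TXT:Part<SP>" + CStr(part) + vbNewLine
' R1 = first matching element after the anchor.
macro = macro + "TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=TXT" + vbNewLine

' Print the generated macro; in the looping sample it would be passed to iimPlay.
WScript.Echo macro
```

Note that, as far as we can tell, R1 simply takes the next match after the anchor. If a part has no matching element at all, the match may come from the following part, so it is worth checking the extracted value.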

Re: Problem scraping using VBS Looping Sample Code

by fujifoo on Wed Nov 19, 2008 10:53 pm

Hi there,

Thank you very much for your help. I have tried relative positioning as per your advice, but it still fails on rows like these:

Row 1
...
<h5><span class="rRowCompany">Company A</span></h5><br />

Row 2
...
<h5><a href="http://company.com" class="rRowCompanyClick">Company B</a></h5><br/>

Row 3
...
<h5><span class="rRowCompany">Company C</span></h5><br />

For instances like the above, a command like the following still results in an error, i.e. Company C is picked instead of Company B:

TAG POS=R1 TYPE=A ATTR=CLASS:rRowCompany* EXTRACT=TXT

Is there anything I can do to pick up the text for both rRowCompany and rRowCompanyClick? I tried using * after rRowCompany, but it doesn't work. Thanks.
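
One idea that might be worth trying, given the snippets above: the company name sits inside an h5 in every row, whether it is a span or a link, so the h5 itself could be the extraction target. TYPE=A can only ever match the rows where the company is a link. The sketch below is untested against the real page. It assumes the page is already loaded in the iMacros browser, that there are three rows, that the page has no other h5 elements, and that the scripting object is created as in the stock VBS samples.

```vbscript
Option Explicit

Dim iim1, iret, macro, counter, company

' ProgID as used in the iMacros VBS samples (assumed to apply to this install).
Set iim1 = CreateObject("imacros")
iret = iim1.iimInit()

' Assumption: three result rows, and no h5 elements outside the rows.
For counter = 1 To 3
    macro = "CODE:"
    ' Target the h5 wrapper, which exists for both the span and the link variant.
    macro = macro + "TAG POS=" + CStr(counter) + " TYPE=H5 ATTR=TXT:* EXTRACT=TXT" + vbNewLine
    iret = iim1.iimPlay(macro)
    company = iim1.iimGetLastExtract(1)
    WScript.Echo "Row " & counter & ": " & company
Next

iret = iim1.iimExit()
```

Because every row has exactly one h5 in the snippets, the counter should stay aligned with the row number even when the inner element changes type.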

Re: Problem scraping using VBS Looping Sample Code

by Tech Support on Thu Nov 20, 2008 10:39 am

Can you upload the complete web page (just one page with the data)? It is difficult for us to work with just this one line of HTML code.

Re: Problem scraping using VBS Looping Sample Code

by fujifoo on Thu Nov 20, 2008 11:10 pm

Here you go ...

<div class="rTable">
<div class="rRow">
<div class="rRowDate">21 Nov<br /><span class="blur">1:34 pm</span></div>
<div class="rRowJob">
<h4 class="rRowTitle"><a href="http://siva-sg.jobstreet.com/_ads/sg/jobs/2008/11/p/20/2015193.htm?fr=J" title='View Job Details - Financial Analyst - Job ID : 56316' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_ads/sg/jobs/2008/11/p/20/2015193.htm?fr=J');return false;">Financial Analyst - Job ID : 56316</a></h4><br />
<h5><a href="http://siva-sg.jobstreet.com/_profile/previewProfile.asp?advertiser_id=6917" class="rRowCompanyClick" title='View Employer Details &amp; Other Jobs Posted by Philips Electronics (S) Pte Ltd (Main)' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_profile/previewProfile.asp?advertiser_id=6917');return false;">Philips Electronics (S) Pte Ltd (Main)</a></h5><br />
<a href="http://job-search.jobstreet.com.sg/singapore/job/vacancy/ind/manufacturing-production-jobs" title="Manufacturing / Production jobs in Singapore" class="rRowInd">Manufacturing / Production industry</a><br />
</div>
<div class="rRowDetail">5 yrs exp<br /> </div>
<div class="rRowLoc">Singapore</div>
</div>
<div class="enter"></div>
<br class="cf" />

<div class="rRow">
<div class="rRowDate">21 Nov<br /><span class="blur">1:33 pm</span></div>
<div class="rRowJob">
<h4 class="rRowTitle"><a href="http://siva-sg.jobstreet.com/_ads/sg/jobs/2008/11/c/20/2022289.htm?fr=J" title='View Job Details - TD - CMOS Transistor Design Principal Engineer' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_ads/sg/jobs/2008/11/c/20/2022289.htm?fr=J');return false;">TD - CMOS Transistor Design Principal Engineer</a></h4><br />
<h5><span class="rRowCompany">Chartered Semiconductor Manufacturing Ltd</span></h5><br />
<a href="http://job-search.jobstreet.com.sg/singapore/job/vacancy/ind/semiconductor-wafer-fabrication-jobs" title="Semiconductor jobs in Singapore" class="rRowInd">Semiconductor industry</a><br />

</div>
<div class="rRowDetail">5 yrs exp<br /> </div>
<div class="rRowLoc">Woodlands</div>
</div>
<div class="enter"></div>
<br class="cf" />

<div class="rRow">
<div class="rRowDate">21 Nov<br /><span class="blur">1:33 pm</span></div>
<div class="rRowJob">
<h4 class="rRowTitle"><a href="http://siva-sg.jobstreet.com/_ads/sg/jobs/2008/11/p/20/2016031.htm?fr=J" title='View Job Details - Director of Strategic Marketing (BU MidPower) - Job ID : 56300' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_ads/sg/jobs/2008/11/p/20/2016031.htm?fr=J');return false;">Director of Strategic Marketing (BU MidPower) - Job ID : 56300</a></h4><br />
<h5><a href="http://siva-sg.jobstreet.com/_profile/previewProfile.asp?advertiser_id=6917" class="rRowCompanyClick" title='View Employer Details &amp; Other Jobs Posted by Philips Electronics (S) Pte Ltd (Main)' target ='_blank' onclick="popWin('http://siva-sg.jobstreet.com/_profile/previewProfile.asp?advertiser_id=6917');return false;">Philips Electronics (S) Pte Ltd (Main)</a></h5><br />
<a href="http://job-search.jobstreet.com.sg/singapore/job/vacancy/ind/manufacturing-production-jobs" title="Manufacturing / Production jobs in Singapore" class="rRowInd">Manufacturing / Production industry</a><br />
</div>
<div class="rRowDetail">5 yrs exp<br /></div>
<div class="rRowLoc">Singapore</div>
</div>
<div class="enter"></div>
<br class="cf" />
</div><!--End rTable-->

<div class="rPage">
<div class="rSort">
<a href="#">^ Back to top</a> </div>
<div class="rPaging">