Extracting Text

by Oscar1998 on Fri Mar 25, 2011 11:17 am

Hello,

I am using iMacros version 7.2 with Firefox 4.0 on Windows 7 and would like some help if possible.

I am trying to extract a block of text that includes the name, address, phone number, and fax number from a webpage. Here is the HTML code:


Code: Select all
<table style="margin:20px 0px 20px 0px;" align="center" cellpadding="0" cellspacing="0">
   
   <tr>
      <td align="center" style="font-weight:bold;font-size:20px;">
      ROBERT HALF INTERNATIONAL INC.
      </td>
   </tr>
   
</table>



<table align="center" style="font-size:11px;" cellpadding="0" cellspacing="0" width="90%">
   
   <tr>
      <td valign="top" width="60%"><strong>ROBERT HALF INTERNATIONAL INC.</strong>
      
         <div>Principal Executive Offices</div>
         <div>2884 Sand Hill Rd.</div>
         <div>Menlo Park, CA 94025</div>
         <div style="padding-top:10px;"><b>Phone:</b> 650-234-6000</div>
         <div><b>Fax:</b> 650-234-6999</div>
      </td>



Here is the macro I am working on:

Code: Select all
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:login ATTR=VALUE:<SP><SP><SP><SP><SP>Login<SP><SP><SP><SP><SP>
WAIT SECONDS=10
FRAME F=2
TAG POS=1 TYPE=SPAN ATTR=TXT:All<SP>*<SP>Major<SP>Accounts
TAG POS=1 TYPE=SELECT ATTR=ID:HQState CONTENT=6
TAG POS=1 TYPE=IMG ATTR=ID:all
WAIT SECONDS=10
TAG POS=1 TYPE=SELECT ATTR=ID:AlphaStart CONTENT=5
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:alphaForm ATTR=VALUE:View<SP>This<SP>Report
FRAME F=2
TAG POS=1 TYPE=STRONG ATTR=TXT:* EXTRACT=TXT


Any help would be much appreciated.

Re: Extracting Text

by Tom, Tech Support on Thu Mar 31, 2011 2:49 am

Hello Oscar,

There are several ways to do this, and the best approach depends on the other content that appears on the page. Since you have provided only a snippet of the page, my suggestions are written against that snippet and will likely need adjusting for the full page, so keep that in mind.

Approach 1 - Use absolute positioning. This example assumes that the name is in the first STRONG element on the page, that the remaining fields are in DIV elements, and that no other DIV elements appear on the page before the target fields.

Code: Select all
' Name
TAG POS=1 TYPE=STRONG ATTR=TXT:* EXTRACT=TXT
' Office description ("Principal Executive Offices")
TAG POS=1 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' Street address
TAG POS=2 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' City/State/Zip
TAG POS=3 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' Phone
TAG POS=4 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' Fax
TAG POS=5 TYPE=DIV ATTR=TXT:* EXTRACT=TXT

Approach 2 - Use relative positioning. This example assumes that there are two and only two tables on the page and that the target fields are in the second table: the name in a STRONG element and the remaining fields in DIV elements.

Code: Select all
' Set extraction anchor (the first table)
TAG POS=1 TYPE=TABLE ATTR=TXT:*
' Name
TAG POS=R1 TYPE=STRONG ATTR=TXT:* EXTRACT=TXT
' Office description ("Principal Executive Offices")
TAG POS=R1 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' Street address
TAG POS=R1 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' City/State/Zip
TAG POS=R1 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' Phone
TAG POS=R1 TYPE=DIV ATTR=TXT:* EXTRACT=TXT
' Fax
TAG POS=R1 TYPE=DIV ATTR=TXT:* EXTRACT=TXT

These are just two examples, and there are others. The right choice ultimately depends on how you want to extract the data (as individual fields or as one entire block) and, as I already mentioned, on the layout of the entire page, particularly the elements that appear before the desired fields.
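
To illustrate the "one entire block" option, here is a minimal sketch written against the posted snippet only, so treat it as a starting point rather than a tested macro. The first command assumes that the contact details sit in the first TD whose text contains both "Phone:" and "Fax:", and that no enclosing TD on the full page also matches (if one does, it would be picked up instead). The last two commands are an optional variation that selects the phone and fax lines by their labels rather than by position, assuming each label begins the text of its DIV.

Code: Select all
' Whole block as a single extracted value
' (assumption: first TD whose text contains both labels)
TAG POS=1 TYPE=TD ATTR=TXT:*Phone:*Fax:* EXTRACT=TXT
' Optional: phone and fax selected by label instead of position
' (assumption: the DIV text starts with the label)
TAG POS=1 TYPE=DIV ATTR=TXT:Phone:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=TXT:Fax:* EXTRACT=TXT

Matching on the labels is less sensitive to extra DIV elements appearing earlier on the page than counting positions, but it only helps for fields that carry a label, so the name and address lines would still need one of the positional approaches.
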
Regards,

Tom, iOpus Support

