Yellow Pages macro Difficulty

by maxdoldan on Wed Jun 16, 2010 11:00 am

Hi, I have scripted this macro to extract data from Yellow Pages.
The idea is to generate a CSV file with the data from each individual record in any given search.
This is the code I have so far.
Code: Select all
VERSION BUILD=6021121
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.yellowpages.com/miami-fl/web-hosting?g=Miami%2C+FL&q=web+hosting
SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=2 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=3 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=4 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=5 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=6 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=7 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=8 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=9 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=10 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=11 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=12 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=13 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=14 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=17 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=18 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=20 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=19 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=yellowpages{{!NOW:yymmdd}}.csv
TAG POS=1 TYPE=A ATTR=TXT:Next

Now here are the problems I have with this script.
First, the CSV file contains no formatting. There is no separation between the records, which makes them much harder to analyze.
Second, each page on Yellow Pages contains 30 records, but I can only TAG up to POS=20. Past POS=20 I get the following error: Automatic POS detection failed (POS > 20). please try to specify the element further.

My questions are the following.
How can I make the macro format the CSV file so that I have one value on each line?
And is there any way around the POS limit of 20?
Best Regards.
maxdoldan
 
Posts: 9
Joined: Wed Jun 16, 2010 10:33 am

Re: Yellow Pages macro Difficulty

by billbell52 on Wed Jun 16, 2010 1:01 pm

I posted a solution to this a few weeks ago

forum.iopus.com/viewtopic.php?f=7&t=10267
billbell52
 
Posts: 125
Joined: Tue Mar 23, 2010 1:45 pm

Re: Yellow Pages macro Difficulty

by maxdoldan on Wed Jun 16, 2010 2:32 pm

Yes, I tried that, but I can't get it past the 20th POS, or even get it to go to the second POS.
What can I add? I ran it in loop mode, but I just got the first company copied over and over again. Please take into consideration that this is the first macro I have written myself. My previous work with macros was just running them.
I will keep trying and will paste the full code once I get it right.

Re: Yellow Pages macro Difficulty

by billbell52 on Thu Jun 17, 2010 10:20 am

You need to use a loop to extract the data, like I posted previously. Something like this will pull each info div:

'SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP NO
TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=C:\TEMP FILE=YP.csv

This pulls the whole company info into one string.
Scraping each piece of company data separately is more difficult, since some listings have no street/city info. iMacros ends up pulling data from some other company rather than skipping the missing data. Maybe someone at iOpus can shed some light on this.

If it were me, I would use the Scripting Edition and pull the data that way. I could save each company's info as HTM and then use regular expressions to get the pieces (i.e. street, city, etc.).
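
As a rough illustration of the regular-expression idea, here is a minimal sketch in Python (used here only because it is easy to run; it is not the Scripting Edition's VBScript). It works on the combined text of one info div rather than on saved HTML, and only pulls out the pieces with a predictable shape: phone, state and ZIP. The function name parse_info and both patterns are assumptions based on US-style listing text, not on the site's actual markup. Name, street and city have no fixed shape in the plain text and would need the HTML structure.

```python
import re

# Assumed patterns for US-style listing text such as
# "..., Miami, FL 33131 » Map (305) 622-6669 0.4 miles".
PHONE_RE = re.compile(r"\((\d{3})\)\s*(\d{3})-(\d{4})")
STATE_ZIP_RE = re.compile(r",\s*([A-Z]{2})\s+(\d{5})\b")


def parse_info(text):
    """Return phone, state and ZIP found in one listing string.

    A field that is not present comes back as an empty string instead
    of being borrowed from a neighbouring listing.
    """
    result = {"phone": "", "state": "", "zip": ""}
    phone = PHONE_RE.search(text)
    if phone:
        result["phone"] = "({}) {}-{}".format(*phone.groups())
    place = STATE_ZIP_RE.search(text)
    if place:
        result["state"] = place.group(1)
        result["zip"] = place.group(2)
    return result


# A listing that only has a phone number keeps its address fields empty.
print(parse_info("at&t Serving the Miami Area. (888) 283-7331 "))
```

Because each listing is parsed on its own, a missing address simply stays empty.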

Finally cracked, just needs some finishing touches.

by maxdoldan on Fri Jun 25, 2010 9:02 pm

I have made some tweaks to what you did to get exactly what I wanted. This is what the code looks like now.


Code: Select all
VERSION BUILD=6021121     
'SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP NO
TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=H3 ATTR=CLASS:business-name<SP>fn<SP>org&&TXT:* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:business-phone<SP>phone&&TXT:* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:street-address&&TXT:* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:locality&&TXT:* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:region&&TXT:* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:postal-code&&TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=yellowpages{{!NOW:yymmdd}}.csv     


This is the result it gives you. Copy and paste it into Notepad and save it as {name}.csv; then you can import it into Excel or whatever.
Code: Select all
"Web Hosting Headquarter 200 S Biscayne Blvd # 5300, Miami, FL 33131 » Map (305) 622-6669 0.4 miles ","Web Hosting Headquarter ","(305) 622-6669 ","200 S Biscayne Blvd # 5300, ","Miami","FL","33131"
"Beyond Hosting Mainville, OH 45039 (724) 790-4678 ","Beyond Hosting ","(724) 790-4678 ","4995 NW 72nd Ave, ","Mainville","OH","45039"
"Weby Host 4995 NW 72nd Ave, Miami, FL 33166 » Map (305) 406-3822 8.1 miles ","Weby Host ","(305) 406-3822 ","2440 SW 64 Ave, ","Miami","FL","33166"
"MIAMI WEB DESIGN CORCO INTERNATIONAL 2440 SW 64 Ave, Miami, FL 33155 » Map (786) 291-4828 6.8 miles ","MIAMI WEB DESIGN CORCO INTERNATIONAL ","(786) 291-4828 ","9563 SW 145th Ct, ","Miami","FL","33155"
"at&t Serving the Miami Area. (888) 283-7331 ","at&t ","(888) 283-7331 ","Wm, ","Miami","FL","33186"
"Mojo Media Miami, Inc. 9563 SW 145th Ct, Miami, FL 33186 » Map (786) 200-9169 16 miles ","Mojo Media Miami, Inc. ","(786) 200-9169 ","P.O. Box 170267, ","Miami","FL","33144"
"at&t Serving the Miami Area. (888) 436-8638 ","at&t ","(888) 436-8638 ","12361 SW 99 Street, ","Miami","FL","33017"
"Loffler Turner Solutions Wm, Miami, FL 33144 (305) 263-3100 ","Loffler Turner Solutions ","(305) 263-3100 ","15225 NE 6th Ave, ","Miami","FL","33186"
"Hostway Serving the Miami Area. (888) 507-9659 ","Hostway ","(888) 507-9659 ","12507 NW 11th Ln, ","Miami","FL","33162"
"iWeb, Inc. P.O. Box 170267, Miami, FL 33017 (786) 426-9249 ","iWeb, Inc. ","(786) 426-9249 ","13218 SW 131st St, ","Miami","FL","33182"
"Advanced Data Technologies 12361 SW 99 Street, Miami, FL 33186 » Map (305) 469-3994 14.3 miles ","Advanced Data Technologies ","(305) 469-3994 ","PO Box 772582, ","Miami","FL","33186"
"1804 Design Miami Web Design 15225 NE 6th Ave, Miami, FL 33162 » Map (305) 407-1642 9.7 miles ","1804 Design Miami Web Design ","(305) 407-1642 ","10300 SW Sunset Drive, ","Miami","FL","33177"
"Advanced Website Design 12507 NW 11th Ln, Miami, FL 33182 » Map (305) 379-1809 12.6 miles ","Advanced Website Design ","(305) 379-1809 ","12134 SW 117th Ct., ","Miami","FL","33173"

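For the "one value per line" layout asked about in the first post, rows like the ones above can be split with any CSV parser. The following is a minimal Python sketch, not an iMacros feature; the function name values_per_line is made up for this example, and the sample row is a shortened version of one of the rows above.

```python
import csv
import io


def values_per_line(csv_text):
    """Parse quoted CSV text and return one stripped value per line,
    with a blank line between records."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        records.append("\n".join(field.strip() for field in row))
    return "\n\n".join(records)


# Shortened sample row; the comma inside the quoted name is preserved.
sample = '"Mojo Media Miami, Inc. ","(786) 200-9169 ","Miami","FL","33186"\n'
print(values_per_line(sample))
```

Using a real CSV parser rather than splitting on commas matters here, because business names such as "Mojo Media Miami, Inc." contain commas themselves.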

Now the finishing touch would be to add the next-page feature, so I can leave it running until it has gone through all the results.
I can't figure out a way to tell iMacros that when the loop hits 30 it should go to the next page and restart the loop count, or something similar. I would really appreciate help translating this, or a pointer to a thread that explains it.
NOTE: for more flexibility, this macro assumes that you are already on the page you want to search. In this case the URL is http://www.yellowpages.com/miami-fl/web-hosting?g=Miami%2C+FL&page=1&q=web+hosting
Best Regards

Re: Yellow Pages macro Difficulty

by billbell52 on Sat Jun 26, 2010 5:41 am

You can automate the Next Page using the scripting version of iMacros. You need to write a loop that extracts the data and checks when it gets an error. When you get an error you can assume you are at the end of page. This should get you started.

http://wiki.imacros.net/Web_Scripting
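
The control flow described above can be sketched as follows. This is Python used purely to show the loop structure, not iMacros scripting code: extract_entry and go_to_next_page are hypothetical stand-ins for playing an extraction macro and a next-page macro, and the page size of 30 comes from this thread. The sketch additionally assumes that a failed next-page step means there are no more pages.

```python
def scrape_all_pages(extract_entry, go_to_next_page, per_page=30):
    """Collect entries page by page.

    extract_entry(position) returns a record, or None when there is no
    listing at that position. go_to_next_page() returns True if a next
    page was opened. Both are hypothetical stand-ins for macro calls.
    """
    records = []
    while True:
        page_was_full = True
        for position in range(1, per_page + 1):
            record = extract_entry(position)
            if record is None:
                # Error at this position: end of a short (last) page.
                page_was_full = False
                break
            records.append(record)
        if not page_was_full or not go_to_next_page():
            return records


# Simulated site: two pages, the second one shorter than per_page.
pages = [["a", "b", "c"], ["d"]]
state = {"page": 0}


def fake_extract(position):
    page = pages[state["page"]]
    return page[position - 1] if position <= len(page) else None


def fake_next():
    if state["page"] + 1 < len(pages):
        state["page"] += 1
        return True
    return False


print(scrape_all_pages(fake_extract, fake_next, per_page=3))
```

The position counter restarts at 1 on every page, which is the part a plain !LOOP counter cannot do on its own.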

Re: Yellow Pages macro Difficulty

by mike007 on Sat Jun 26, 2010 6:16 am

Or use Firefox and JavaScript.
mike007
 
Posts: 31
Joined: Wed Jun 23, 2010 2:03 am

Re: Yellow Pages macro Difficulty

by billbell52 on Sat Jun 26, 2010 9:54 am

Correct, the web scripting version is supported in Firefox, and you can use JavaScript.

Re: Yellow Pages macro Difficulty

by maxdoldan on Tue Jun 29, 2010 9:06 am

I have the Scripting Edition. My problem right now is this.
The macro I have pulls the data out of each page.
The problem is that the macro runs based on the loop count, so for every page the loop must run 30 times.
Is there a way to run the macro I pasted before, then run another macro that goes to the next page, and finally reset the loop count to 0 so that the second page is extracted correctly?
Guys, you just tossed some links at me that I can't quite handle. I would really appreciate it if someone from iOpus gave me a hand.

Re: Yellow Pages macro Difficulty

by maxdoldan on Tue Jun 29, 2010 12:07 pm

More problems... I started experimenting with VBS scripts and created one that plays my macros in order. The only problem is that the main macro relies on the !LOOP value, so if I run the macro directly from the VBS script, !LOOP never changes. I have tried setting it with SET !LOOP 1 and other numbers, but I can't.
An additional problem is that I need to run the macro 30 times, and only then go to the next page.
If I include the next-page step within the macro itself, the macro runs only once and doesn't wait for the loop to reach 30, so not every value is extracted.
Please help, I'm going insane with this...
Is there a way to do this in a quicker macro script? I don't know how to keep going with this; every single thing I try has a complication.
Regards.

Re: Yellow Pages macro Difficulty

by billbell52 on Wed Jun 30, 2010 3:46 pm

There are a few problems with your current script. I mentioned this in my June 17th post. Look at at&t on the first page. It only has a telephone number, but the script picks up an address from another customer. Maybe someone from iOpus can comment. If it did work, a VBS script like the one below would work (almost). It has a bug: if there are exactly 30, 60, etc. results, it tries to click Next and the link won't be there. The only way I see to fix it is to extract the HTML and use regular expressions.

Code: Select all
Option Explicit
Dim fso, csvPath, csvFile
Dim iim1, cnt
Dim allDone, pageDone
Dim csvStr, k, iret, value

csvPath = "C:\temp\YP.csv"
allDone = False
pageDone = False

' Create the output file (True = overwrite an existing file)
Set fso = CreateObject("Scripting.FileSystemObject")
Set csvFile = fso.CreateTextFile(csvPath, True)
csvFile.WriteLine("Header" + vbCrLf)

' Start iMacros and open the search results page
Set iim1 = CreateObject("imacros")
iret = iim1.iimInit
iret = iim1.iimPlay("C:\temp\QryYP.iim")
While Not allDone
  cnt = 1
  ' Extract up to 30 listings from the current page
  While Not pageDone And cnt <= 30
    iret = iim1.iimSet("Cnt", cnt)
    iret = iim1.iimPlay("C:\temp\ExtractEntry.iim")
    csvStr = ""
    For k = 1 To 7
      value = iim1.iimGetLastExtract(CInt(k))
      ' EANF in the first field means there is no listing at this position
      If k = 1 And InStr(value, "EANF") > 0 Then
        pageDone = True
        allDone = True
        Exit For
      Else
        csvStr = csvStr + """" + value + ""","
      End If
    Next
    ' Do not write an empty row when the page ran out of listings
    If Not pageDone Then csvFile.WriteLine(csvStr + vbCrLf)
    cnt = cnt + 1
  Wend
  ' Known bug: with exactly 30, 60, etc. results there is no Next link to click
  If Not allDone Then
    iret = iim1.iimPlay("C:\temp\Next.iim")
  End If
Wend
csvFile.Close
iret = iim1.iimExit
Set iim1 = Nothing
Set fso = Nothing
WScript.Quit(0)


ExtractEntry.iim

Code: Select all
'SET !EXTRACT_TEST_POPUP NO
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=H3 ATTR=CLASS:business-name<SP>fn<SP>org&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:business-phone<SP>phone&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:street-address&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:locality&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:region&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:postal-code&&TXT:* EXTRACT=TXT


Next.iim

Code: Select all
TAG POS=1 TYPE=A ATTR=TXT:Next   
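
The mismatch described at the top of this post can be checked mechanically in the CSV that was posted on June 25. There the first column holds the whole info text, so a street value that really belongs to a listing has to appear inside that text. Below is a minimal Python sketch of such a check (illustration only; find_mismatched_rows is a made-up name, and the column positions assume the field order of that macro). The two sample rows are copied from the posted output.

```python
import csv
import io


def find_mismatched_rows(csv_text, info_col=0, street_col=3):
    """Return 1-based row numbers whose street value does not occur in
    the full info text of the same row."""
    bad = []
    for number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        street = row[street_col].strip()
        if street and street not in row[info_col]:
            bad.append(number)
    return bad


rows = (
    '"Web Hosting Headquarter 200 S Biscayne Blvd # 5300, Miami, FL 33131 '
    '» Map (305) 622-6669 0.4 miles ","Web Hosting Headquarter ",'
    '"(305) 622-6669 ","200 S Biscayne Blvd # 5300, ","Miami","FL","33131"\n'
    '"Beyond Hosting Mainville, OH 45039 (724) 790-4678 ","Beyond Hosting ",'
    '"(724) 790-4678 ","4995 NW 72nd Ave, ","Mainville","OH","45039"\n'
)
print(find_mismatched_rows(rows))  # prints [2]
```

Row 2 is flagged because "4995 NW 72nd Ave" does not appear in the Beyond Hosting info text; in the posted output that street belongs to the Weby Host listing.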

Re: Yellow Pages macro Difficulty

by Tom, Tech Support on Thu Jul 01, 2010 7:40 am

Hello,

Billbell52, I like your solution. I've modified it a bit to address the issue when there's incomplete address information for a particular listing by using Extraction with Relative Positioning. I also added some handling to take care of the "bug" in your script (not really a bug imo, just a condition that needed to be tested for).

Code: Select all
Option Explicit

Dim fso, csvPath, csvFile, macroPath
Dim iim1, cnt
Dim allDone, pageDone
Dim csvStr, k, iret, value

macroPath = "C:\temp\"
csvPath = "C:\temp\YP.csv"
allDone = False
pageDone = False

' Create the output file (True = overwrite an existing file)
Set fso = CreateObject("Scripting.FileSystemObject")
Set csvFile = fso.CreateTextFile(csvPath, True)
csvFile.WriteLine("Header")

' Start iMacros and open the search results page
Set iim1 = CreateObject("imacros")
iret = iim1.iimInit()
iret = iim1.iimPlay(macroPath + "QryYP.iim")

While Not allDone
  cnt = 1
  ' Extract up to 30 listings from the current page
  While Not pageDone And cnt <= 30
    iret = iim1.iimSet("Cnt", cnt)
    iret = iim1.iimPlay(macroPath + "ExtractEntry.iim")

    If iret = 1 Then
      ' Build one quoted CSV row; missing fields (#EANF#) become empty
      csvStr = ""
      For k = 1 To 6
        value = Replace(iim1.iimGetLastExtract(CInt(k)), "#EANF#", "")
        csvStr = csvStr + """" + value + ""","
      Next
      ' Write only after a successful extraction, so a failed
      ' position does not repeat the previous row
      csvFile.WriteLine(csvStr)
    Else
      ' No listing at this position: this was the last page
      pageDone = True
      allDone = True
    End If

    cnt = cnt + 1
  Wend
  If Not allDone Then
    ' A full page was read; stop if there is no Next link
    iret = iim1.iimPlay(macroPath + "Next.iim")
    If iret < 0 Then
      allDone = True
    End If
  End If
Wend

csvFile.Close
iret = iim1.iimExit
Set iim1 = Nothing
Set fso = Nothing
WScript.Quit(0)

ExtractEntry.iim
Code: Select all
SET !TIMEOUT_TAG 1
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:listing_actions
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=H3 ATTR=CLASS:business-name<SP>fn<SP>org&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:business-phone<SP>phone&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:street-address&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:locality&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:region&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:postal-code&&TXT:* EXTRACT=TXT

Next.iim
Code: Select all
SET !TIMEOUT_TAG 1
TAG POS=1 TYPE=A ATTR=TXT:Next   
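
Why the anchor matters can be shown with a small simulation. It is written in Python and models the page as a flat list of elements; nothing in it is iMacros API. extract_absolute mimics counting the n-th element of a class over the whole page, as POS={{Cnt}} does. extract_relative mimics anchoring on the n-th info element and taking the first match after it. The sketch assumes the relative search does not look past the next info element; it does not model how iMacros bounds the search. The class names are shortened, and the two listings are taken from the output posted earlier in the thread.

```python
# Page modelled as a flat list of (css_class, text) in document order.
# The first listing (at&t) has no street address.
PAGE = [
    ("info", "at&t"),
    ("phone", "(888) 283-7331"),
    ("info", "Mojo Media Miami, Inc."),
    ("phone", "(786) 200-9169"),
    ("street-address", "9563 SW 145th Ct"),
]


def extract_absolute(page, css_class, n):
    """n-th element of css_class counted over the whole page."""
    matches = [text for cls, text in page if cls == css_class]
    return matches[n - 1] if n <= len(matches) else "#EANF#"


def extract_relative(page, css_class, n, anchor_class="info"):
    """First css_class element after the n-th anchor, not looking past
    the next anchor (assumed boundary of one listing)."""
    anchors = [i for i, (cls, _) in enumerate(page) if cls == anchor_class]
    if n > len(anchors):
        return "#EANF#"
    start = anchors[n - 1] + 1
    end = anchors[n] if n < len(anchors) else len(page)
    for cls, text in page[start:end]:
        if cls == css_class:
            return text
    return "#EANF#"


# Absolute counting hands listing 1 the street of listing 2.
print(extract_absolute(PAGE, "street-address", 1))
# Anchored lookup reports the field as missing instead.
print(extract_relative(PAGE, "street-address", 1))
```

The first call returns the Mojo Media street for the at&t listing, which is the error seen in the earlier CSV; the second returns #EANF#, which the VBS script above turns into an empty field.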
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3298
Joined: Mon May 31, 2010 9:59 am

Re: Yellow Pages macro Difficulty

by billbell52 on Thu Jul 01, 2010 11:24 am

Thanks for the feedback. I did try relative positioning but I only set the anchor once. I did not interleave it between each relative extract.

Re: Yellow Pages macro Difficulty

by maxdoldan on Mon Jul 12, 2010 9:09 am

Thanks so much, Tom. I will try this.


Tom, I tried it, but I have a question: why is there another macro called QryYP.iim?
I also get a second error now. It says SET !TIMEOUT_TAG 1 is not a command.

Re: Yellow Pages macro Difficulty

by Tom, Tech Support on Thu Jul 15, 2010 4:41 am

Maxdoldan,

Sorry I didn't post QryYP.iim, it's just a one-line URL GOTO command that navigates to the desired search page:

QryYP.iim:
Code: Select all
URL GOTO=http://www.yellowpages.com/miami-fl/web-hosting?g=Miami%2C+FL&page=1&q=web+hosting

!TIMEOUT_TAG is valid only in version 6.80 and later. Are you running a version older than 6.80?
Regards,

Tom, iMacros Support
