Yellow Pages macro Difficulty

maxdoldan
Posts: 9
Joined: Wed Jun 16, 2010 5:33 pm

Yellow Pages macro Difficulty

Post by maxdoldan » Wed Jun 16, 2010 6:00 pm

Hi, I have written this macro to extract data from Yellow Pages.
The idea is to generate a CSV file with the data from each individual record in any given search.
So far this is the code I have:
VERSION BUILD=6021121
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.yellowpages.com/miami-fl/web ... eb+hosting
SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=2 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=3 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=4 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=5 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=6 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=7 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=8 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=9 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=10 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=11 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=12 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=13 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=14 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=17 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=18 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=20 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=19 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=yellowpages{{!NOW:yymmdd}}.csv
TAG POS=1 TYPE=A ATTR=TXT:Next

Here are the problems I have with this script.
First, the CSV file has no formatting. There is no separation between the records, which makes them much harder to analyze.
Second, each page in Yellow Pages contains 30 records, but I can only TAG up to POS=20. Past that I get the following error: "Automatic POS detection failed (POS > 20). please try to specify the element further."

My questions are the following:
How can I make the macro format the CSV file so that I have one value on each line?
And is there any way around the 20 POS limitation?
Best Regards.
Paste this code in your browser. Direct international toll free number.

Code: Select all

http://www.theclicktocall.com/areftoe.aspx?&key=0474A5AD5832329B4A165E21EC0A4385&refnum=ogfd8duyhSMZF1xkci4Z1g==
billbell52
Posts: 125
Joined: Tue Mar 23, 2010 8:45 pm

Re: Yellow Pages macro Difficulty

Post by billbell52 » Wed Jun 16, 2010 8:01 pm

I posted a solution to this a few weeks ago

forum.imacros.net/viewtopic.php?f=7&t=10267
maxdoldan
Posts: 9
Joined: Wed Jun 16, 2010 5:33 pm

Re: Yellow Pages macro Difficulty

Post by maxdoldan » Wed Jun 16, 2010 9:32 pm

Yes, I tried that, but I can't get it past the 20th POS, or even get it to go to the second POS.
What can I add? I ran it in loop mode but I just got the first company copied over and over again. Please take into consideration that this is the first macro I have written myself. My previous work with macros was just running them.
I will keep trying and paste the full code once I get it right.
billbell52
Posts: 125
Joined: Tue Mar 23, 2010 8:45 pm

Re: Yellow Pages macro Difficulty

Post by billbell52 » Thu Jun 17, 2010 5:20 pm

You need to use a loop to extract the data, like I posted previously. Something like this will pull each info div:

'SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP NO
TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=C:\TEMP FILE=YP.csv

This pulls the whole company info into a single string.
Scraping each piece of company data separately is more difficult, since some listings have no street/city info. iMacros ends up pulling data from some other company rather than skipping the missing field. Maybe someone at iOpus can shed some light on this.

If it were me, I would use the Scripting Edition and pull the data that way. I could save each company's info as HTM and then use regular expressions to get the pieces (i.e. street, city, etc.).
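
As a rough illustration of the regular-expression idea, here is a minimal Python sketch. It works on the combined text string that the info div extract produces rather than on saved HTML, which is a simplification of what is described above. The function name and both patterns are assumptions made up for this example, based only on the sample listings posted in this thread.

```python
import re

# Both patterns are assumptions derived from the sample listings in this thread.
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
CITY_STATE_ZIP_RE = re.compile(
    r"(?P<city>[A-Za-z .'-]+), (?P<state>[A-Z]{2}) (?P<zip>\d{5})"
)


def parse_listing(info):
    """Pull phone, city, state and zip out of one combined listing string.

    Fields that are not present come back as empty strings, so a listing
    without an address never borrows data from a neighbouring listing.
    """
    phone = PHONE_RE.search(info)
    place = CITY_STATE_ZIP_RE.search(info)
    return {
        "phone": phone.group(0) if phone else "",
        "city": place.group("city").strip() if place else "",
        "state": place.group("state") if place else "",
        "zip": place.group("zip") if place else "",
    }


# Usage: a listing with only a phone number leaves the address fields empty.
print(parse_listing("at&t Serving the Miami Area. (888) 283-7331 "))
```

This is only a sketch. A listing that has a city but no street would have the business name absorbed into the city capture, so parsing the individual HTML elements is likely to be more reliable than parsing the flattened text.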
maxdoldan
Posts: 9
Joined: Wed Jun 16, 2010 5:33 pm

Finally Cracked just needs some finishing touches.

Post by maxdoldan » Sat Jun 26, 2010 4:02 am

I have made some tweaks to what you did to get exactly what I wanted. This is what the code looks like now.

Code: Select all

VERSION BUILD=6021121     
'SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP NO
TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=H3 ATTR=CLASS:business-name<SP>fn<SP>org&&TXT:* EXTRACT=TXT 
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:business-phone<SP>phone&&TXT:* EXTRACT=TXT 
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:street-address&&TXT:* EXTRACT=TXT 
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:locality&&TXT:* EXTRACT=TXT 
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:region&&TXT:* EXTRACT=TXT 
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:postal-code&&TXT:* EXTRACT=TXT 
SAVEAS TYPE=EXTRACT FOLDER=* FILE=yellowpages{{!NOW:yymmdd}}.csv     
This is the result it gives you. Copy and paste it into Notepad and save it as {name}.csv, then you can import it into Excel or whatever.

Code: Select all

"Web Hosting Headquarter 200 S Biscayne Blvd # 5300, Miami, FL 33131 » Map (305) 622-6669 0.4 miles ","Web Hosting Headquarter ","(305) 622-6669 ","200 S Biscayne Blvd # 5300, ","Miami","FL","33131"
"Beyond Hosting Mainville, OH 45039 (724) 790-4678 ","Beyond Hosting ","(724) 790-4678 ","4995 NW 72nd Ave, ","Mainville","OH","45039"
"Weby Host 4995 NW 72nd Ave, Miami, FL 33166 » Map (305) 406-3822 8.1 miles ","Weby Host ","(305) 406-3822 ","2440 SW 64 Ave, ","Miami","FL","33166"
"MIAMI WEB DESIGN CORCO INTERNATIONAL 2440 SW 64 Ave, Miami, FL 33155 » Map (786) 291-4828 6.8 miles ","MIAMI WEB DESIGN CORCO INTERNATIONAL ","(786) 291-4828 ","9563 SW 145th Ct, ","Miami","FL","33155"
"at&t Serving the Miami Area. (888) 283-7331 ","at&t ","(888) 283-7331 ","Wm, ","Miami","FL","33186"
"Mojo Media Miami, Inc. 9563 SW 145th Ct, Miami, FL 33186 » Map (786) 200-9169 16 miles ","Mojo Media Miami, Inc. ","(786) 200-9169 ","P.O. Box 170267, ","Miami","FL","33144"
"at&t Serving the Miami Area. (888) 436-8638 ","at&t ","(888) 436-8638 ","12361 SW 99 Street, ","Miami","FL","33017"
"Loffler Turner Solutions Wm, Miami, FL 33144 (305) 263-3100 ","Loffler Turner Solutions ","(305) 263-3100 ","15225 NE 6th Ave, ","Miami","FL","33186"
"Hostway Serving the Miami Area. (888) 507-9659 ","Hostway ","(888) 507-9659 ","12507 NW 11th Ln, ","Miami","FL","33162"
"iWeb, Inc. P.O. Box 170267, Miami, FL 33017 (786) 426-9249 ","iWeb, Inc. ","(786) 426-9249 ","13218 SW 131st St, ","Miami","FL","33182"
"Advanced Data Technologies 12361 SW 99 Street, Miami, FL 33186 » Map (305) 469-3994 14.3 miles ","Advanced Data Technologies ","(305) 469-3994 ","PO Box 772582, ","Miami","FL","33186"
"1804 Design Miami Web Design 15225 NE 6th Ave, Miami, FL 33162 » Map (305) 407-1642 9.7 miles ","1804 Design Miami Web Design ","(305) 407-1642 ","10300 SW Sunset Drive, ","Miami","FL","33177"
"Advanced Website Design 12507 NW 11th Ln, Miami, FL 33182 » Map (305) 379-1809 12.6 miles ","Advanced Website Design ","(305) 379-1809 ","12134 SW 117th Ct., ","Miami","FL","33173"
Now the finishing touch would be to add the next-page feature so I can leave it running until it has gone through all the results.
I can't figure out a way to tell iMacros that when the loop hits 30 it should go to the next page and start the loop count over, or something like that. I would really appreciate help translating this, or a pointer to a thread that explains it.
NOTE: for more flexibility, this macro assumes that you are already on the page you want to search. In this case the URL is http://www.yellowpages.com/miami-fl/web ... eb+hosting
Best Regards
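
A quick way to sanity-check the CSV rows posted above is to test whether each street field actually occurs inside the combined info field of the same row. The Python sketch below does that with the standard `csv` module. The function name is made up for this example, and the column positions (info first, name second, street fourth) are taken from the sample output in this post.

```python
import csv
import io


def misaligned_rows(csv_text):
    """Return business names whose street field is absent from the combined info field."""
    flagged = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4:
            continue  # skip blank or incomplete lines
        info, name, street = row[0], row[1].strip(), row[3].strip()
        if street and street not in info:
            flagged.append(name)
    return flagged


# The first two rows of the output posted above, copied verbatim.
sample = '''"Web Hosting Headquarter 200 S Biscayne Blvd # 5300, Miami, FL 33131 » Map (305) 622-6669 0.4 miles ","Web Hosting Headquarter ","(305) 622-6669 ","200 S Biscayne Blvd # 5300, ","Miami","FL","33131"
"Beyond Hosting Mainville, OH 45039 (724) 790-4678 ","Beyond Hosting ","(724) 790-4678 ","4995 NW 72nd Ave, ","Mainville","OH","45039"
'''

print(misaligned_rows(sample))
```

On these two rows the check flags Beyond Hosting: its street field holds "4995 NW 72nd Ave," which appears in the combined info of the Weby Host row instead. That suggests the per-field extracts drift out of step whenever a listing has no street address.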
billbell52
Posts: 125
Joined: Tue Mar 23, 2010 8:45 pm

Re: Yellow Pages macro Difficulty

Post by billbell52 » Sat Jun 26, 2010 12:41 pm

You can automate the Next Page using the scripting version of iMacros. You need to write a loop that extracts the data and checks when it gets an error. When you get an error you can assume you are at the end of page. This should get you started.

http://wiki.imacros.net/Web_Scripting
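
The control flow described here can be sketched independently of iMacros. The Python below is only a model of the loop: `extract_entry` and `click_next` are hypothetical stand-ins for playing an extraction macro and a "Next" macro, and `FakeSite` exists purely so the sketch can run on its own.

```python
def scrape_all(extract_entry, click_next, page_size=30):
    """Collect rows page by page until there is no next page.

    extract_entry(pos) returns a row, or None when position `pos` does not exist.
    click_next() returns True if a next page was opened, False otherwise.
    Both callables are hypothetical stand-ins for playing a macro.
    """
    rows = []
    while True:
        # The position counter restarts at 1 on every page.
        for pos in range(1, page_size + 1):
            row = extract_entry(pos)
            if row is None:  # an error is taken to mean the end of this page
                break
            rows.append(row)
        if not click_next():  # no Next link: this was the last page
            return rows


class FakeSite:
    """In-memory stand-in for a paginated result list, for demonstration only."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0

    def extract_entry(self, pos):
        page = self.pages[self.current]
        return page[pos - 1] if pos <= len(page) else None

    def click_next(self):
        if self.current + 1 < len(self.pages):
            self.current += 1
            return True
        return False


site = FakeSite([["a", "b", "c"], ["d"]])
print(scrape_all(site.extract_entry, site.click_next, page_size=3))
```

Because the loop stops only when `click_next` reports failure, a last page that happens to be completely full is handled the same way as a short one.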
mike007
Posts: 31
Joined: Wed Jun 23, 2010 9:03 am

Re: Yellow Pages macro Difficulty

Post by mike007 » Sat Jun 26, 2010 1:16 pm

Or use Firefox and JavaScript.
billbell52
Posts: 125
Joined: Tue Mar 23, 2010 8:45 pm

Re: Yellow Pages macro Difficulty

Post by billbell52 » Sat Jun 26, 2010 4:54 pm

Correct, the web scripting version is supported in FF and you can use JavaScript.
maxdoldan
Posts: 9
Joined: Wed Jun 16, 2010 5:33 pm

Re: Yellow Pages macro Difficulty

Post by maxdoldan » Tue Jun 29, 2010 4:06 pm

I have the Scripting Edition. My problem right now is this.
The macro I have pulls the data out of each page.
The problem is that the macro runs based on the loop count, so for every page the loop must run 30 times.
Is there a way to run the macro I pasted before, then run another macro that changes to the next page, and finally reset the loop count to 0 so that the second page is extracted correctly?
Guys, you just tossed some links at me that I can't quite handle. I would really appreciate it if someone from iOpus gave me a hand.
maxdoldan
Posts: 9
Joined: Wed Jun 16, 2010 5:33 pm

Re: Yellow Pages macro Difficulty

Post by maxdoldan » Tue Jun 29, 2010 7:07 pm

More problems... I started experimenting with VBS scripts and created one that plays my macros in order. The only problem is that the main macro depends on the !LOOP value, so if I run the macro directly from the VBS script the !LOOP value isn't affected. I have tried setting it with SET !LOOP 1 and other numbers, but I can't.
An additional problem is that I need to run the macro 30 times, and only then go to the next page.
If I include the next-page step within the macro itself, the macro runs only once and doesn't wait for the loop to reach 30 so that every value is extracted.
Please help, I'm going insane with this...
Is there a quicker way to do this in a macro script? I don't know how to go on with this; every single thing I try has a complication.
Regards.
billbell52
Posts: 125
Joined: Tue Mar 23, 2010 8:45 pm

Re: Yellow Pages macro Difficulty

Post by billbell52 » Wed Jun 30, 2010 10:46 pm

There are a few problems with your current script. I mentioned this in my June 17th post. Look at at&t on the first page. It only has a telephone number, but the script picks up an address from another customer. Maybe someone from iOpus can comment. If it did work, a VBS script like the one below would work (almost). It has a bug: if there are exactly 30, 60, etc. results, it tries to click Next and the link won't be there. The only way I see to fix it is to extract the HTML and use regular expressions.

Code: Select all

Option Explicit
Dim fso, csvPath, csvFile
Dim done, i, iim1, cnt
Dim allDone, pageDone
Dim csvStr, k, iret, value

csvPath = "C:\temp\YP.csv" 
Set fso = CreateObject("Scripting.fileSystemObject")
Set csvFile  = fso.CreateTextFile(csvPath, TRUE)
csvFile.WriteLine("Header" + vbCrLf)

Set iim1= CreateObject ("imacros")
iret = iim1.iimInit
iret = iim1.iimPlay("C:\temp\QryYP.iim")
While Not allDone
  cnt = 1
  While Not pageDone And cnt <= 30
    iret = iim1.iimSet("Cnt", cnt)
    iret = iim1.iimPlay("C:\temp\ExtractEntry.iim")
    csvStr = ""
    For k = 1 to 7
      value = iim1.iimGetLastExtract(CInt(k))
      If k = 1 And InStr(value, "EANF") > 0 Then 
        pageDone = True
        allDone = True
        Exit For
      Else
        csvStr = csvStr + """" + value + """," 
      End If
    Next
    csvFile.WriteLine(csvStr + vbCrLf)
    cnt = cnt + 1
  Wend
  If Not allDone then
    iret = iim1.iimPlay("C:\temp\Next.iim")
  End If
Wend   
csvFile.Close
iret = iim1.iimExit
Set iim1 = Nothing
Set fso = Nothing 
WScript.Quit(0)
ExtractEntry.iim

Code: Select all

'SET !EXTRACT_TEST_POPUP NO
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=H3 ATTR=CLASS:business-name<SP>fn<SP>org&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:business-phone<SP>phone&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:street-address&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:locality&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:region&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=SPAN ATTR=CLASS:postal-code&&TXT:* EXTRACT=TXT 
Next.iim

Code: Select all

TAG POS=1 TYPE=A ATTR=TXT:Next   
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Yellow Pages macro Difficulty

Post by Tom, Tech Support » Thu Jul 01, 2010 2:40 pm

Hello,

Billbell52, I like your solution. I've modified it a bit to address the issue when there's incomplete address information for a particular listing by using Extraction with Relative Positioning. I also added some handling to take care of the "bug" in your script (not really a bug imo, just a condition that needed to be tested for).

Code: Select all

Option Explicit

Dim fso, csvPath, csvFile, macroPath
Dim done, i, iim1, cnt
Dim allDone, pageDone
Dim csvStr, k, iret, value

macroPath = "C:\temp\"
csvPath = "C:\temp\YP.csv"

Set fso = CreateObject("Scripting.fileSystemObject")
Set csvFile  = fso.CreateTextFile(csvPath, TRUE)
csvFile.WriteLine("Header")

Set iim1= CreateObject ("imacros")
iret = iim1.iimInit()
iret = iim1.iimPlay(macroPath + "QryYP.iim")

While Not allDone
  cnt = 1
  While Not pageDone And cnt <= 30
	iret = iim1.iimSet("Cnt", cnt)
	iret = iim1.iimPlay(macroPath + "ExtractEntry.iim")
	
	If iret = 1 Then
		csvStr = ""
		For k = 1 to 6
		  value = Replace(iim1.iimGetLastExtract(CInt(k)), "#EANF#", "")
		  csvStr = csvStr + """" + value + ""","
		Next
		' Write the row only on success, so a failed extract does not repeat the previous row
		csvFile.WriteLine(csvStr)
	Else
		pageDone = True
		allDone = True
	End If
		
	cnt = cnt + 1
  Wend
  If Not allDone then
	iret = iim1.iimPlay(macroPath + "Next.iim")
	If iret < 0 Then
		allDone = True
	End If
  End If
Wend   

csvFile.Close
iret = iim1.iimExit
Set iim1 = Nothing
Set fso = Nothing
WScript.Quit(0)
ExtractEntry.iim

Code: Select all

SET !TIMEOUT_TAG 1
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:listing_actions
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=H3 ATTR=CLASS:business-name<SP>fn<SP>org&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:business-phone<SP>phone&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:street-address&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:locality&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:region&&TXT:* EXTRACT=TXT
TAG POS={{Cnt}} TYPE=DIV ATTR=CLASS:info&&TXT:*
TAG POS=R1 TYPE=SPAN ATTR=CLASS:postal-code&&TXT:* EXTRACT=TXT 
Next.iim

Code: Select all

SET !TIMEOUT_TAG 1
TAG POS=1 TYPE=A ATTR=TXT:Next   
Regards,

Tom, iMacros Support
billbell52
Posts: 125
Joined: Tue Mar 23, 2010 8:45 pm

Re: Yellow Pages macro Difficulty

Post by billbell52 » Thu Jul 01, 2010 6:24 pm

Thanks for the feedback. I did try relative positioning but I only set the anchor once. I did not interleave it between each relative extract.
maxdoldan
Posts: 9
Joined: Wed Jun 16, 2010 5:33 pm

Re: Yellow Pages macro Difficulty

Post by maxdoldan » Mon Jul 12, 2010 4:09 pm

Thanks so much, Tom. I will try this.

Tom, I tried it, but I have a question: why is there another macro called QryYP.iim?
I also get a second error now. It says SET !TIMEOUT_TAG 1 is not a command.
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Yellow Pages macro Difficulty

Post by Tom, Tech Support » Thu Jul 15, 2010 11:41 am

Maxdoldan,

Sorry I didn't post QryYP.iim, it's just a one-line URL GOTO command that navigates to the desired search page:

QryYP.iim:

Code: Select all

URL GOTO=http://www.yellowpages.com/miami-fl/web-hosting?g=Miami%2C+FL&page=1&q=web+hosting
!TIMEOUT_TAG is valid only in version 6.80 and later. Are you running a version older than 6.80?
Regards,

Tom, iMacros Support