Need help with simple google result page extract

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.


Forum rules
Before asking a question or reporting an issue:
1. Please review the list of FAQs.
2. Use the Google search box (at the top of each forum page) to see if a similar problem or question has already been addressed. This will search the entire contents of the forums as well as the iMacros Wiki.
3. We can respond much faster to your posts if you include the versions of iMacros, your browser and your OS.

Answering your own posts (e.g. attempting to "bump" your topic) drops your topic from the list of unanswered threads, so it may actually receive fewer views.

Need help with simple google result page extract

by green.pine on Tue Sep 26, 2017 2:41 pm

Hello,
I need help with my code: it does not work and gives me this error:
allocation size overflow, line: 10 (Error code: -1001)

What I want to do is simple.
Part 1:
Search Google for a keyword and, when the results appear, copy the whole page to a text or CSV file,
just like when you select the page with Ctrl+A and copy-paste it into a text file.

Part 2:
Then I want the code to continue on to the next page of results, do the same thing, add it to the same file, and continue until
the end of the result pages. But I am stuck on the first part; I have not tried the second part yet.
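Part 2 boils down to visiting one results page after another. A minimal Python sketch, assuming Google's classic `q` (query) and `start` (0-based offset of the first result) URL parameters and 10 results per page; these are assumptions about Google's URL scheme, which can change, and Google may block automated queries. The helper name `result_page_urls` is hypothetical.

```python
from urllib.parse import urlencode


def result_page_urls(query, pages, per_page=10):
    """Hypothetical helper: build one search URL per results page.

    Assumes `start` is the 0-based offset of the first result on a page,
    so page 1 -> start=0, page 2 -> start=per_page, and so on.
    """
    urls = []
    for page in range(pages):
        # urlencode quotes with quote_plus, so spaces become '+'
        params = urlencode({"q": query, "start": page * per_page})
        urls.append("https://www.google.com/search?" + params)
    return urls


if __name__ == "__main__":
    for url in result_page_urls("dog grooming", 3):
        print(url)
```

Run as a script, it prints three URLs ending in `start=0`, `start=10` and `start=20`. A looped macro would need the same offset arithmetic on each iteration before opening the page.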

Here is my code:

Code:
VERSION BUILD=9030808 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
TAB T=1
URL GOTO=https://www.google.com/
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:tsf ATTR=ID:lst-ib CONTENT=dog<sp>grooming
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:tsf ATTR=NAME:btnK
TAG POS=1 TYPE=HTML ATTR=* EXTRACT=TXT
WAIT SECONDS=2
SAVEAS TYPE=EXTRACT FOLDER=* FILE=+{{!NOW:ddmmyyyy}}.txt



I'd appreciate your help making this work.
green.pine
 
Posts: 61
Joined: Thu Nov 04, 2010 10:21 am

Re: Need help with simple google result page extract

by chivracq on Tue Sep 26, 2017 5:57 pm


FCIM...! :mrgreen: (Read my Sig...)
=> iMacros for FF v9.0.3, FF55/56...?, OS...?

Your Script only has 9 Lines btw, where is Line_10...?

But OK, your Script runs fine for me on 2 different Versions of iMacros for FF:
- iMacros for FF v8.8.2, Pale Moon v26.3.3 (=FF47), Win10_x64.
- iMacros for FF v8.9.7, FF v55.0.3, Win10_x64.
(I don't use v9.0.3 for FF and never installed it as it is a bit too limited and buggy compared to v8.9.7 for FF which is much more stable than v9.0.3.)

Strangely enough, for both Envs, with apparently the same Results (100 per Page), the Size of the 'SAVEAS' File is quite different: 340 KB on PM and 700 KB on FF. And for both, the Content is full of "garbage" JavaScript Functions...

I tried with a 'SAVEAS TYPE=TXT' as well, which could be a Workaround and sounds more "straightforward" to me than your 'SAVEAS TYPE=EXTRACT' on the 'TYPE=HTML' Element/Page. It works fine as well, and it only saves some "real" Text Content without any JavaScript, which looks closer to a 'Ctrl^a + Ctrl^c' result, for 32 KB on PM and 37 KB on FF. And it doesn't add an extra ".csv" Extension on top of the ".txt" like 'SAVEAS TYPE=EXTRACT' does.
But you would need to give a different Name to each 'SAVEAS' for each Page of Results, as 'SAVEAS TYPE=TXT' seems to overwrite the previous File instead of appending to it like 'SAVEAS TYPE=EXTRACT' does...
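If each results page is saved to its own file, as suggested above for 'SAVEAS TYPE=TXT', the pages can still be combined into one file afterwards. A minimal Python sketch of that post-processing step; the `<stem>_page<N>.txt` naming scheme and both helper names are hypothetical, not something iMacros produces by itself.

```python
import os


def page_file_names(folder, stem, pages):
    """Hypothetical naming scheme: one distinct file per results page,
    because an overwriting save would clobber a shared file name."""
    return [os.path.join(folder, f"{stem}_page{n}.txt") for n in range(1, pages + 1)]


def merge_page_files(page_paths, out_path):
    """Concatenate the per-page files into one file, in page order."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in page_paths:
            with open(path, encoding="utf-8") as page:
                # Normalise the trailing newline so pages don't run together.
                out.write(page.read().rstrip("\n") + "\n")
    return out_path
```

Opening the output once with `"w"` and writing every page into it gives the "append to the same file" result without relying on the save command's own append behaviour.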

But hum, another Workaround would be to simply do the 'Ctrl^a + Ctrl^c' using the 'EVENT' Mode. (I didn't try...)
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6381
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Need help with simple google result page extract

by green.pine on Wed Sep 27, 2017 2:05 am

Thanks a lot for your reply.
Now it is working.
Sorry about the line 10 mismatch:
one line was empty, and when I pasted the code here I removed it, so that is why it looks that way. That is the whole code.

I removed the new version of iMacros and installed an older one (v8.9.7) on the same FF version I have, and it works fine.

I am still trying to figure out whether I can save the data from the next pages to the same file without the previous text being overwritten.

If anyone has ideas on this, I'd appreciate your help.

Thanks,
green.pine

Re: Need help with simple google result page extract

by chivracq on Wed Sep 27, 2017 5:56 am


OK for reverting to v8.9.7 for FF, good news..., but sorry, "FCIM" again for me to follow up... Which "same FF version" are you using...? + OS...?
chivracq

Re: Need help with simple google result page extract

by green.pine on Wed Sep 27, 2017 9:04 am

Sorry about that.
I did include the info, but it took some time and I got logged out,
so the second time I left out the info about my Firefox version and the rest.

I have FF v55.0.3 32-bit, running on Win 7 32-bit.
I now have iMacros v8.9.7.

About adding more content to the file, I am still trying to figure it out.

But I realized that I cannot use a text file, because I want the content of that file to be pasted into a text area of a form, and it seems that iMacros cannot read from a text file.

So I made it a CSV file, but now I have another problem: I cannot pull the complete data from the CSV file, it reads it one row at a time,
which I guess is how it is supposed to work.

Is there a way to pull all the data from the CSV file in one go and put it into a text area in a form?

Thanks
green.pine

Re: Need help with simple google result page extract

by chivracq on Wed Sep 27, 2017 10:37 am


Ah...!, OK for your FCI... (I usually don't react to Threads when that Info is missing...)

"About adding more content to file,..." => Well, that's the Purpose of the 'SAVEAS TYPE=EXTRACT' Mechanism, which appends Data if you save to an existing File...
Oh...!, and I figured out why the File had a '.txt.csv' Extension: it came from the "FILE=+" (or "FILE=*"), which creates/saves to a File called "extract.csv" per Default...

About pulling the complete Data from your '.csv' or '.txt', I don't really understand what you want to do, ah-ah...! Your Google Search on "Dog Grooming" yields nearly 20 million Results..., and you want to paste all those 20M Results into one Input Field in some Form with one Copy&Paste...!?! :shock: That's an "ambitious" Plan, I would think...!? :?
If 100 Results represent about 30 KB of Data in "clean" Text (without all the JS Garbage from your 'EXTRACT' on 'TYPE=HTML'), 20M Results represent 200,000 Pages x 30 KB = 6M KB, i.e. about 6 GB of Data...! I'm not sure the Web-Server will be happy...! Or is this some new form of DDOS Attack...!?
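The back-of-the-envelope estimate in the post, spelled out as a small Python calculation. The 30 KB per 100 results figure is the measurement quoted in the post; decimal units (1 GB = 1,000,000 KB) are an assumption, and the function name is hypothetical.

```python
def estimate_total_kb(total_results, results_per_page, kb_per_page):
    """Rough storage estimate: number of result pages (rounded up) x size of one page."""
    pages = -(-total_results // results_per_page)  # ceiling division without math.ceil
    return pages * kb_per_page


total_kb = estimate_total_kb(20_000_000, 100, 30)  # 200,000 pages x 30 KB
total_gb = total_kb / 1_000_000                    # decimal units: 1 GB = 1,000,000 KB
print(f"{total_kb:,} KB, about {total_gb:g} GB")   # prints: 6,000,000 KB, about 6 GB
```

So the whole result set would be in the gigabyte range, still far too much to paste into a single form field.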

>>>
:arrow: TEMP_SAVE:
I'm running some "dangerous" Test that might crash my poor Browser, and I might lose my Post, ah-ah...!

:arrow: I'M BACK:
A bit later, "dangerous" Test didn't crash my Browser but froze it indeed for several minutes...! iMacros for FF has Difficulties doing an 'EXTRACT=HTM' on a large Page (like 100 Results on Google) with '!EXTRACT_TEST_POPUP' not disabled.
>>>

But OK, I managed to retrieve the following Info about the Data Structure on the Page by running the following Statement:
Code:
TAG POS=4 TYPE=DIV ATTR=TXT:Trenton<SP>Pet<SP>Grooming,<SP>Boarding,<SP>and<SP>Daycaresu* EXTRACT=HTM

... which shows that each Result is built from 8 nested 'DIV's, of which the first 3 could be used to extract the whole Data related to one Result in one Statement.
Code:
<div style="outline: 1px solid blue;" class="srg">

<div class="g"><!--m-->
  <div data-hveid="38" data-ved="0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBAVCCYoADAA">
    <div class="rc"><h3 style="outline: 1px solid blue;" class="r"><a href="http://sundogpets.com/" onmousedown="return rwt(this,'','','','101','AFQjCNHlzu-L4Iqg4byKdpQ7pjnguqbO4g','','0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBAWCCcwAA','','',event)">Trenton Pet Grooming, Boarding, and Daycare</a></h3>
      <div class="s">
        <div>
          <div class="f kv _SWb" style="white-space: nowrap; outline: 1px solid blue;"><cite class="_Rm">sundogpets.com/</cite>
            <div class="action-menu ab_ctl"><a class="_Fmb ab_button" href="#" id="am-b100" aria-label="Result details" aria-expanded="false" aria-haspopup="true" role="button" jsaction="m.tdd;keydown:m.hbke;keypress:m.mskpe" data-ved="0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBDsHQgoMAA"><span class="mn-dwn-arw"></span></a>
              <div class="action-menu-panel ab_dropdown" role="menu" tabindex="-1" jsaction="keydown:m.hdke;mouseover:m.hdhne;mouseout:m.hdhue" data-ved="0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBCpHwgpMAA"><ol><li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="http://webcache.googleusercontent.com/search?q=cache:BVYPfGsdjZIJ:sundogpets.com/+&amp;cd=101&amp;hl=en&amp;ct=clnk&amp;gl=nl" onmousedown="return rwt(this,'','','','101','AFQjCNHSL3PiIds6m4mVQDGkPnSb0Hsq_A','','0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBAgCCowAA','','',event)">Cached</a></li></ol>
              </div>
            </div>
          </div>
        <span class="st">Trenton, Georgia <em>Pet Grooming</em> | Overnight Pet Boarding | <em>Dog Grooming</em> | Cat Grooming | Dog and Cat Boarding | Pet Treats | Sun Dog <em>Pet Grooming</em>.</span>
        </div>
      </div>
    </div>
  </div>
<!--n-->
</div>

<div class="g"><!--m-->
<div data-hveid="43" data-ved="0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBAVCCsoATAB">
<div class="rc"><h3 class="r"><a href="http://groomersmall.com/" onmousedown="return rwt(this,'','','','102','AFQjCNEIC0-kBtaY8xWQjJssUQRuXOIbOQ','','0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBAWCCwwAQ','','',event)">©The Groomer's Mall-Professional Pet Grooming Supplies and ...</a></h3>
<div class="s">
<div>
<div class="f kv _SWb" style="white-space:nowrap"><cite class="_Rm">groomersmall.com/</cite>
<div class="action-menu ab_ctl"><a class="_Fmb ab_button" href="#" id="am-b101" aria-label="Result details" aria-expanded="false" aria-haspopup="true" role="button" jsaction="m.tdd;keydown:m.hbke;keypress:m.mskpe" data-ved="0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBDsHQgtMAE"><span class="mn-dwn-arw"></span></a>
<div class="action-menu-panel ab_dropdown" role="menu" tabindex="-1" jsaction="keydown:m.hdke;mouseover:m.hdhne;mouseout:m.hdhue" data-ved="0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBCpHwguMAE"><ol><li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="http://webcache.googleusercontent.com/search?q=cache:_tb259OEVbUJ:groomersmall.com/+&amp;cd=102&amp;hl=en&amp;ct=clnk&amp;gl=nl" onmousedown="return rwt(this,'','','','102','AFQjCNE3lIIThhImDUPQXptpxe38HJ1lOA','','0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBAgCC8wAQ','','',event)">Cached</a></li></ol>
</div>
</div>
</div><span class="st"><em>Pet Grooming</em> Supplies for Professional and Non-Professional Groomers.</span>
</div>
</div>
</div>
</div><!--n-->
</div>

<div class="g"><!--m-->
<div data-hveid="48" data-ved="0ahUKEwjy95GA-MPWAhWFLcAKHV_qBjM4ZBAVCDAoAjAC">
<div class="rc"><h3 class="r">
... etc...


And the following 'EXTRACT' on the Containing 'DIV' extracts the whole Data for all Results on the Page:
Code:
TAG POS=1 TYPE=DIV ATTR=CLASS:srg&&TXT:* EXTRACT=TXT

And this Data can be saved via 'SAVEAS TYPE=EXTRACT' to a '.txt' or '.csv' File for each Page of Results.
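For post-processing a saved copy of the results HTML outside iMacros, the nested structure shown above (an `<h3 class="r">` title link and a `<span class="st">` snippet per result) can be walked with Python's standard `html.parser`. This is a minimal sketch that assumes the 2017 class names from that snippet; Google has changed its markup since, so treat the class names as assumptions to re-check. `GoogleResultParser` and `parse_results` are hypothetical names.

```python
from html.parser import HTMLParser


class GoogleResultParser(HTMLParser):
    """Collects title, URL and snippet per result from 2017-style markup:
    <h3 class="r"><a href="...">title</a></h3> ... <span class="st">snippet</span>
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_title = False   # inside <h3 class="r">
        self._snippet_depth = 0  # > 0 while inside <span class="st"> (counts nested spans)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "h3" and "r" in classes:
            # Each result starts with its title heading.
            self.results.append({"title": "", "url": "", "snippet": ""})
            self._in_title = True
        elif tag == "a" and self._in_title and not self.results[-1]["url"]:
            self.results[-1]["url"] = attr.get("href") or ""
        elif tag == "span":
            if self._snippet_depth:
                self._snippet_depth += 1
            elif "st" in classes and self.results:
                self._snippet_depth = 1

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False
        elif tag == "span" and self._snippet_depth:
            self._snippet_depth -= 1

    def handle_data(self, data):
        # Text inside <em> also arrives here, so highlighted words are kept.
        if self._in_title:
            self.results[-1]["title"] += data
        elif self._snippet_depth:
            self.results[-1]["snippet"] += data


def parse_results(html):
    parser = GoogleResultParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

Links outside the title heading, such as the "Cached" menu entry, are ignored, so each dict holds just the three fields per result.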

Now if you want to retrieve the complete Data from that '.txt' or '.csv' File (it's actually easier with a '.txt' File), I've explained and demonstrated the Technique (that I use and which is the quickest) in the following Thread...:
- Re: Get number of lines from CSV and use as variable?
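The linked thread describes chivracq's own iMacros technique, which is not reproduced here. If the saved file is instead post-processed outside iMacros, reading it in one go rather than row by row is straightforward. A minimal Python sketch, assuming a UTF-8 file and, for the CSV variant, standard comma-separated quoting; both helper names are hypothetical.

```python
import csv


def read_whole_file(path):
    """Return the entire file as one string, e.g. a .txt saved by SAVEAS."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def csv_cells_as_one_block(path):
    """Read every row of a CSV and join all cells into one text block, one cell per line."""
    with open(path, newline="", encoding="utf-8") as f:
        return "\n".join(cell for row in csv.reader(f) for cell in row)
```

Either function yields a single block of text, which is the "all data at once" shape the question asks for instead of one row per read.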
Last edited by chivracq on Wed Sep 27, 2017 11:58 am, edited 3 times in total.
chivracq

Re: Need help with simple google result page extract

by green.pine on Wed Sep 27, 2017 11:17 am

Thanks for your reply.
:lol: :lol: Yes, you are right, that's a huge amount of data;
maybe it was a stupid idea after all.

Well no, I don't want to grab all the data, perhaps 10 pages of results.
What I want to do is grab the text of a few pages of Google results and paste it into a text area on a website that searches text for emails (there are a few websites that offer this for free). So I thought I would grab the data with iMacros and paste it there to get possible emails for certain keywords.

I was looking for an iMacros script to extract emails from Google search results. I found only one on this forum, and it is not complete; since my iMacros scripting sucks, I am not sure how to complete it for my needs.
http://forum.imacros.net/viewtopic.php?f=7&t=24571

Thanks for helping.
green.pine

Re: Need help with simple google result page extract

by chivracq on Wed Sep 27, 2017 1:44 pm


OK, I didn't know this Thread, or at least I don't remember it as I didn't (need to) participate in it... (But Advanced User @Shugar on SOF is indeed very good...)

OK, I did a little bit of Research because I knew there were a few other similar Threads on the Forum and I wanted to refer you to this one, but @OP has removed their Script and truncated most of their Posts in that Thread, which now makes it a bit useless for other Users, grrr...! (That's why I now quote all Posts, but I wasn't doing it yet in 2014, ah-ah...!) :roll:
- Re: extracted data saving to one cell?

+ A few other similar/related Threads:
- Find email on webpage
- extract email from text
The 2nd one is the one you already referred to; now they are all in one single Post with their Title... I found another one as well, but that User didn't share their Solution either, even after being asked several times. They started from the same or a very similar 'REGEX' as in the 'extract email from text' Thread...
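The REGEX from those threads is not reproduced here. As a stand-in, a minimal Python sketch of the same idea (pull every e-mail address out of a block of scraped text). The pattern is a deliberately simplified assumption that catches common addresses like name@example.com; it is not a full RFC 5322 validator, and `find_emails` is a hypothetical name.

```python
import re

# Simplified pattern (an assumption, not RFC 5322): local part, '@',
# domain labels, and a top-level domain of at least two letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return unique addresses in first-seen order (case-insensitive de-duplication)."""
    seen = set()
    found = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            found.append(match)
    return found
```

Feeding it the text extracted from a few result pages gives a de-duplicated address list directly, without pasting the data into a third-party website.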
chivracq

Re: Need help with simple google result page extract

by green.pine on Wed Sep 27, 2017 2:21 pm

Thank you for your reply,
you have been very helpful; the link to "Find email on webpage" was especially useful.
It makes things easier now: I don't need to scrape the data into a file,
it's enough to just have it in the clipboard and use it.

It is working fine.
Thanks again
green.pine

Re: Need help with simple google result page extract

by chivracq on Wed Sep 27, 2017 2:24 pm


Thanks for the Feedback... Glad to help... :D
chivracq

