Trouble extracting data

Post by erikcw » Mon Nov 21, 2005 11:07 pm

Hi,

I'm trying to extract keywords for my Google AdWords campaign from the Google keyword suggestion tool.

I only want the keywords listed under "more specific keywords".

I've tried several macros, but can't get any of them to work right. Here is the first one:

VERSION BUILD=5010115
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=https://adwords.google.com/select/KeywordSandbox
SIZE X=977 Y=883
TAG POS=1 TYPE=TEXTAREA FORM=ACTION:/select/KeywordSandbox ATTR=NAME:keywords CONTENT=mortgage
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ACTION:/select/KeywordSandbox ATTR=NAME:submit&&VALUE:Show<SP>matching<SP>queries<SP>and<SP>alternatives
EXTRACT POS=1 TYPE=TXT ATTR=<UL>*

This one extracts the correct data, but it is all lumped into one "paragraph". I need each keyword separated by a newline or some other marker (like [EXTRACT]) so I can use a script to organize the data.
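
For illustration, here is how a script could organize the data once each keyword does arrive with a marker between values. This is a minimal Python sketch, assuming the combined extraction string has already been read from iMacros and that the values are joined by [EXTRACT], with #EANF# marking a failed extraction. The function name split_extract and the sample string are hypothetical, not taken from the thread.

```python
EXTRACT_SEPARATOR = "[EXTRACT]"  # assumed marker between extracted values
NOT_FOUND = "#EANF#"             # value returned when nothing was matched


def split_extract(raw):
    """Split a combined extraction string into a list of clean keywords."""
    keywords = []
    for part in raw.split(EXTRACT_SEPARATOR):
        keyword = part.strip()
        # Skip empty fragments and failed extractions.
        if keyword and keyword != NOT_FOUND:
            keywords.append(keyword)
    return keywords


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    sample = "mortgage rates[EXTRACT] mortgage calculator [EXTRACT]#EANF#"
    for keyword in split_extract(sample):
        print(keyword)
```

This only helps once the values arrive separated; it does not solve the lumped output of ATTR=<UL>* by itself.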

I also tried:

EXTRACT POS={{LOOP}} TYPE=TXT ATTR=<LI>*

But this has two problems:
1) It's hard to tell how many loops to run, because the Google page returns a different number of keywords every time (it could be 12, it could be 200).
2) If I loop too many times, the macro starts extracting the "Expanded Broad Matches" section of the page. (This is why looping until EXTRACT returns #EANF# won't work in this case.)

The ideal solution would be something like ATTR=<UL>*, but with a separator after each <LI>.

Thanks for your help!

Post by Tech Support » Tue Nov 22, 2005 11:05 pm

We ran some tests on this, but we are not sure we are looking at the same page. Can you send an example screenshot to support2 AT iopus.com?

In our AdWords test account, we see no "Expanded Broad Matches" section.

But there is a "download as CSV file" option. Would this work for you?

Post by erikcw » Tue Nov 22, 2005 11:08 pm

I'm not using the "logged in" AdWords keyword interface. I'm using the one you don't have to log in for:

https://adwords.google.com/select/KeywordSandbox

I want the keywords that appear in the left column, not the "expanded" words in the right column.

Post by Tech Support » Wed Nov 23, 2005 12:01 am

Your method #2 (looping over <LI>) will work well with the following addition.

To know when you reach the Expanded Broad Matches section, do the following:

1. Extract the first Expanded Broad Matches keyword using Relative Extraction (http://www.iopus.com/iim/help/extract_relative.htm); see the macro below. Assume we store this value in the keyword_broad_match variable.

2. Use this keyword in your script to stop at the broad matches section:

if keyword = keyword_broad_match then msgbox "All regular keywords extracted!"

VERSION BUILD=5010115
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=https://adwords.google.com/select/KeywordSandbox
TAG POS=1 TYPE=STRONG ATTR=TXT:Expanded<SP>Broad<SP>Matches*
EXTRACT POS=R1 TYPE=TXT ATTR=<LI>*
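
The stop condition from step 2 could be sketched in a script roughly as follows. This is an illustrative Python version, not the VBScript the one-liner above suggests. It assumes the per-<LI> values are already available as a list in page order and that keyword_broad_match holds the value from the relative extraction. The function name regular_keywords is hypothetical.

```python
NOT_FOUND = "#EANF#"  # value returned when an extraction finds nothing


def regular_keywords(extracted, keyword_broad_match):
    """Return the keywords that come before the first broad-match keyword.

    extracted: values from the looped <LI> extraction, in page order.
    keyword_broad_match: first keyword of the Expanded Broad Matches
    section, or "#EANF#" if the relative extraction found no such section.
    """
    result = []
    for keyword in extracted:
        # Stop at the end of the page or at the start of the broad matches.
        if keyword == NOT_FOUND or keyword == keyword_broad_match:
            break
        result.append(keyword)
    return result
```

If the page has no Expanded Broad Matches section, keyword_broad_match would be #EANF# and the loop simply runs until the extraction itself fails. Note that the comparison stops early if a regular keyword happens to equal the first broad-match keyword.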

Post by erikcw » Thu Dec 01, 2005 12:29 am

Still running into problems.

On some queries, Google only returns one set of results, the ones on the left. Other times it returns two sets of results, but no "Expanded Broad Matches" section.

Is it possible to somehow do positioning relative to </ul>? (i.e., grab the last keyword in the list, and then check each extraction against that in scripting?)
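
The scripting half of that idea might look like the sketch below. It is a minimal Python illustration, assuming the last keyword of the left-hand list can be obtained somehow (which is the open question here) and that the looped extractions are available as a list. The name keywords_through is hypothetical.

```python
def keywords_through(extracted, last_keyword):
    """Collect keywords up to and including last_keyword."""
    result = []
    for keyword in extracted:
        if keyword == "#EANF#":  # nothing more to extract on the page
            break
        result.append(keyword)
        if keyword == last_keyword:  # reached the end of the wanted list
            break
    return result
```

One caveat: if the same keyword also appears earlier in the list, the check stops at the first occurrence.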

Thanks!

Post by Tech Support » Thu Dec 01, 2005 3:25 pm

Can you post an example page where the above macro fails?