Trouble extracting data

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.


by erikcw on Mon Nov 21, 2005 4:07 pm

Hi,

I'm trying to extract keywords for my Google AdWords campaign from the Google Keyword Suggestion Tool.

I only want the keywords listed under "more specific keywords".

I've tried several macros, but I can't get any of them to work right.

VERSION BUILD=5010115
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=https://adwords.google.com/select/KeywordSandbox
SIZE X=977 Y=883
TAG POS=1 TYPE=TEXTAREA FORM=ACTION:/select/KeywordSandbox ATTR=NAME:keywords CONTENT=mortgage
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ACTION:/select/KeywordSandbox ATTR=NAME:submit&&VALUE:Show<SP>matching<SP>queries<SP>and<SP>alternatives
EXTRACT POS=1 TYPE=TXT ATTR=<UL>*


This one extracts the correct data, but it is all lumped into one "paragraph". I need each keyword separated by a newline or some other marker (like [EXTRACT]) so I can use a script to organize the data.

I also tried
EXTRACT POS={{LOOP}} TYPE=TXT ATTR=<LI>*


But this has two problems:
1) It's hard to tell how many loops to run, because the Google page returns a different number of keywords every time (it could be 12, it could be 200).
2) If I loop too many times, the macro starts extracting the "Expanded Broad Matches" section of the page. (This is why looping until EXTRACT returns #EANF# won't work in this case.)

The ideal solution would be something like ATTR=<UL>*, but with a separator after each <LI>.

Thanks for your help!
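If the extracted values do come back joined by a marker such as the [EXTRACT] mentioned above, the script side reduces to a split. The sketch below assumes exactly that: one string whose values are separated by the literal text `[EXTRACT]`, with `#EANF#` marking an extraction that found nothing. The function name `split_extract` is hypothetical, and how the string reaches the script is not covered here.

```python
# Sketch: turn one combined extraction string into a keyword list.
# Assumption: values are joined by the literal "[EXTRACT]" marker.
EXTRACT_SEPARATOR = "[EXTRACT]"
NOT_FOUND = "#EANF#"  # value iMacros returns when the extraction anchor is not found


def split_extract(raw):
    """Return the non-empty, found values from a combined extraction string."""
    keywords = []
    for item in raw.split(EXTRACT_SEPARATOR):
        item = item.strip()  # drop stray spaces and newlines around each value
        if item and item != NOT_FOUND:
            keywords.append(item)
    return keywords


if __name__ == "__main__":
    sample = "mortgage rates[EXTRACT]mortgage calculator[EXTRACT]#EANF#"
    for keyword in split_extract(sample):
        print(keyword)  # one keyword per line
```

This only helps once a separator exists; a single EXTRACT over the whole <UL> returns plain text with no marker to split on.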

by Tech Support on Tue Nov 22, 2005 4:05 pm

We did some tests on this, but we are not sure if we are looking at the same page. Can you send us an example screenshot to support2 AT iopus.com ?

In our adwords test account, we see no "Extended broad matches" section.

But there is a "download as CSV file" option - would this work for you?

by erikcw on Tue Nov 22, 2005 4:08 pm

I'm not using the "logged in" adwords keyword interface. I'm using the one you don't have to login for.

https://adwords.google.com/select/KeywordSandbox

I want the keywords that appear in the left column - not the "expanded" words in the right column.

by Tech Support on Tue Nov 22, 2005 5:01 pm

Your method #2 (loop over <LI>) will work well with the following addition:

In order to know when you reach the Expanded Broad Matches section do the following:

1. Extract the first Expanded Broad Matches keyword using Relative Extraction ( http://www.iopus.com/iim/help/extract_relative.htm ), see the macro below. Assume we store this value in the keyword_broad_match variable.

2. You can use this keyword to stop at the broad matches section:

if keyword = keyword_broad_match then msgbox "All regular keywords extracted!"

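VERSION BUILD=5010115
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=https://adwords.google.com/select/KeywordSandbox
' Submit the query first (same steps as in the original macro)
TAG POS=1 TYPE=TEXTAREA FORM=ACTION:/select/KeywordSandbox ATTR=NAME:keywords CONTENT=mortgage
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ACTION:/select/KeywordSandbox ATTR=NAME:submit&&VALUE:Show<SP>matching<SP>queries<SP>and<SP>alternatives
' Anchor on the section heading, then extract the first <LI> after it
TAG POS=1 TYPE=STRONG ATTR=TXT:Expanded<SP>Broad<SP>Matches*
EXTRACT POS=R1 TYPE=TXT ATTR=<LI>*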
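The stopping rule described above (stop when the extracted keyword equals keyword_broad_match) can be sketched as follows. Assumptions: `extracted` stands in for the successive results of `EXTRACT POS={{LOOP}} TYPE=TXT ATTR=<LI>*` for LOOP=1,2,3,..., and `keyword_broad_match` is the value from the relative extraction. The function name is hypothetical; this is not the iMacros scripting interface itself.

```python
NOT_FOUND = "#EANF#"  # returned when an EXTRACT finds no matching element


def collect_regular_keywords(extracted, keyword_broad_match):
    """Collect <LI> values in page order until the Expanded Broad Matches
    section starts (its first keyword is the sentinel) or extraction fails."""
    # If the relative extraction found no Expanded Broad Matches heading,
    # there is no sentinel and only #EANF# can end the loop.
    if keyword_broad_match in (None, NOT_FOUND):
        sentinel = None
    else:
        sentinel = keyword_broad_match
    keywords = []
    for keyword in extracted:
        if keyword == NOT_FOUND:
            break  # ran past the last <LI> on the page
        if keyword == sentinel:
            break  # "All regular keywords extracted!"
        keywords.append(keyword)
    return keywords


if __name__ == "__main__":
    # Hypothetical values, one per loop iteration
    page = ["mortgage rates", "mortgage calculator", "home loans", "refinance", NOT_FOUND]
    print(collect_regular_keywords(page, "home loans"))
```

Because `extracted` can be any iterable, a generator that runs one extraction per step stops issuing extractions as soon as the sentinel appears, so the loop count never has to be known in advance. One caveat: if a left-column keyword happens to be identical to the first broad-match keyword, the loop stops early.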

by erikcw on Wed Nov 30, 2005 5:29 pm

Still running into problems.

On some queries, Google only returns one set of results, the ones on the left. Other times it returns two sets, but the second one is not "Expanded Broad Match".

Is it possible to position the extraction relative to </ul>? (I.e., grab the last keyword in the list, and then check each extraction against it in a script?)

Thanks!
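The check proposed in this post (grab the last keyword of the left-hand list, then compare each extraction against it) differs from the sentinel approach: the match is inclusive, and it does not depend on what the second list is called or whether it exists. The sketch assumes `last_keyword` has already been obtained somehow; whether iMacros can anchor an extraction on </ul> is not answered in this thread. `extracted` is again a hypothetical stand-in for the successive `<LI>` extraction results.

```python
NOT_FOUND = "#EANF#"  # returned when an EXTRACT finds no matching element


def collect_through_last(extracted, last_keyword):
    """Collect <LI> values up to and including last_keyword (the final entry
    of the left-hand list); stop early if an extraction returns #EANF#."""
    keywords = []
    for keyword in extracted:
        if keyword == NOT_FOUND:
            break  # no more <LI> elements on the page
        keywords.append(keyword)
        if keyword == last_keyword:
            break  # end of the left-hand list; later <LI>s belong to another list
    return keywords


if __name__ == "__main__":
    # Hypothetical page with a second, differently named list after the first
    page = ["mortgage rates", "mortgage calculator", "refinance", "car loans", NOT_FOUND]
    print(collect_through_last(page, "refinance"))
```

When Google returns only the left-hand list, the loop ends on whichever comes first, the last keyword or #EANF#, so the same code covers both page layouts described above.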

by Tech Support on Thu Dec 01, 2005 8:25 am

Can you post an example page where the above macro fails?