IMPORTANT: Help Needed! :-(

soulgeek
Posts: 16
Joined: Sat Jan 23, 2010 6:25 am

IMPORTANT: Help Needed! :-(

Post by soulgeek » Sun Jan 24, 2010 10:06 am

Folks,

I bought the iMacros Scripting Edition a few weeks ago and, sad to say, I have not been able to get even a single macro working properly.

My requirement:
I have a CSV file with URLs. I want to search for each of them on one site and collect a few data points about each.

Problem (refer to the output below):
For some URLs the data comes out correct, but for others the macro picks up some other value. I am sure it has something to do with relative positioning or something similar.

My code:
VERSION BUILD=6861208

SET !EXTRACT_TEST_POPUP NO
SET !DATASOURCE urls.csv
SET !DATASOURCE_COLUMNS 6
SET !LOOP 2
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO=https://www.majesticseo.com
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:urlSearch ATTR=ID:search_text CONTENT={{!COL1}}
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:urlSearch ATTR=VALUE:Submit
TAG POS=2 TYPE=FONT ATTR=TXT:* EXTRACT=TXT
TAG POS=46 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
TAG POS=47 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
TAG POS=48 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
TAG POS=49 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=E:\imacro\Datasources FILE=urls.csv




'WAIT SECONDS=1
TAB T=1
TAB CLOSEALLOTHERS


Output:
http://www.freer.com 6-Sep-09 7 2,043 34
http://www.123.co.uk 6-Sep-09 10 1,704 1,299
http://www.abc.com # Search result Crawl date /First seen ACRank
http://www.wow.co.de # Search result Crawl date /First seen ACRank

Could someone please help, or could the iMacros team reproduce the problem and provide me with the correct code?

Wishing you all a good day,
Soulgeek
Hannes, Tech Support

Re: IMPORTANT: Help Needed! :-(

Post by Hannes, Tech Support » Mon Jan 25, 2010 9:17 am

Code:

TAG POS=46 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
TAG POS=47 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
TAG POS=48 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
TAG POS=49 TYPE=SMALL ATTR=TXT:* EXTRACT=TXT
These high POS values look suspicious. What data do you want to extract on the result page?
It's probably "relative positioning" that will solve this issue.

(As a registered customer you may as well open a support ticket if you encounter problems using iMacros. However, as this issue might help other users as well, this forum is also a good place to put it.)
soulgeek

Re: IMPORTANT: Help Needed! :-(

Post by soulgeek » Tue Jan 26, 2010 2:52 am

Hannes, Tech Support wrote: "What data do you want to extract on the result page?"
When you search for a URL on the website I mentioned, the resulting page has a block called "Search Results". All the data I want comes from row #1 of that block. The values I want to pick are:

Domain name (Just the URL)
Crawl date /First seen
ACRank
External
Ref Domains

(Note: all of these values belong to row #1 of the "Search Results" block. For each URL, I want the data from row #1 only.)

You are correct, it is the relative positioning that is causing the problem.

Please tell me what code I should use to achieve this.

Wishing you a great day,
Soulgeek
Hannes, Tech Support

Re: IMPORTANT: Help Needed! :-(

Post by Hannes, Tech Support » Tue Jan 26, 2010 9:15 am

Here you are:

Code:

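VERSION BUILD=6880125
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.majesticseo.com/search.php?folder=&q=123.co.uk
' Anchor on the "Search results" heading (no EXTRACT, this only sets the reference point)
TAG POS=1 TYPE=B ATTR=TXT:Search<SP>results
' POS=R1 = first match after the previously tagged element: the cell with row number 1
TAG POS=R1 TYPE=TD ATTR=TXT:1
TAG POS=R1 TYPE=P ATTR=TXT:*
' Link of row 1
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=HREF
' Date text of row 1 (pattern: two spaces, then something starting with 20)
TAG POS=R1 TYPE=P ATTR=TXT:*<SP>*<SP>20* EXTRACT=TXT
' The next three table cells of row 1
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT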
Sidenote: you can speed up your solution slightly by skipping the search field on the main site and putting the URL you are searching for directly into the query string of the search URL (as done above). That saves two automation steps per link.
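
If the macro is driven from a script, the direct search URL can be built from each datasource entry before the macro runs. The following is only a rough Python sketch: the endpoint and the folder/q parameters are copied from the macro above, while the function name search_url and the idea of stripping the scheme and a leading "www." are my own assumptions, based on the q=123.co.uk example.

```python
from urllib.parse import urlencode, urlsplit

# Search endpoint as it appears in the macro above.
SEARCH_ENDPOINT = "http://www.majesticseo.com/search.php"


def search_url(site_url: str) -> str:
    """Turn a datasource entry such as http://www.123.co.uk into a direct search URL."""
    # urlsplit only fills .netloc when a scheme or a leading // is present.
    parts = urlsplit(site_url if "//" in site_url else "//" + site_url)
    host = parts.netloc.lower()
    # Assumption: the search expects the bare domain, as in q=123.co.uk above.
    if host.startswith("www."):
        host = host[4:]
    # urlencode also escapes any characters that are not safe in a query string.
    return SEARCH_ENDPOINT + "?" + urlencode({"folder": "", "q": host})


print(search_url("http://www.123.co.uk"))
```

In a plain macro without a script, the same effect would come from placing the datasource column into the URL GOTO line, provided the column already holds the bare domain.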
soulgeek

Re: IMPORTANT: Help Needed! :-(

Post by soulgeek » Tue Jan 26, 2010 12:49 pm

Thanks...

Is there a way to put the output into particular columns? For example, with the 5 extractions above, how do I put each one into Col2, Col3, Col4, Col5 and Col6?
soulgeek

Oops... Just found one big problem!

Post by soulgeek » Tue Jan 26, 2010 1:25 pm

Hello iOpus team,

I just found a big gap in my description of the requirement. I missed one thing...

There can be a few URLs for which there is no data and therefore no output, and this is where the macro fails. :(

An example:
Observe the output... It does not produce the desired result, which is why the macro fails.

We need some kind of error handler here. How can we do that?

Will !ENDOFTHEPAGE and !TAGSOURCEINDEX help?

What additions to the code do we need?

Good Day,
Soulgeek
Hannes, Tech Support

Re: IMPORTANT: Help Needed! :-(

Post by Hannes, Tech Support » Tue Jan 26, 2010 2:36 pm

Concerning your first question: if you use SAVEAS only after the 5 extractions, each of them will be put into a separate column as SAVEAS takes all extractions and puts them into a single line.

Concerning your second question: this is where a script would come into play. It would call your macro, and in case the macro returns an error, it would skip that extraction.
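
To illustrate the idea, here is a minimal Python sketch of such a script loop. The names scrape_all and imacros_player and the macro variable {{url}} are hypothetical. The COM wiring assumes the Scripting Interface calls iimInit, iimSet, iimPlay and iimGetLastExtract behave as I remember them for this version (negative return code on error, extractions separated by [EXTRACT]), so please check those details against the wiki before relying on them.

```python
from typing import Callable, Iterable, List, Tuple

# A "player" runs the macro for one URL and returns (status_code, extracted_text).
Player = Callable[[str], Tuple[int, str]]


def scrape_all(urls: Iterable[str], play: Player) -> Tuple[List[List[str]], List[str]]:
    """Run the macro once per URL; collect rows for successes and URLs for failures."""
    rows: List[List[str]] = []
    skipped: List[str] = []
    for url in urls:
        code, extract = play(url)
        if code < 0:  # assumption: negative return codes signal a macro error
            skipped.append(url)
            continue
        # Assumption: individual extractions are separated by the marker [EXTRACT].
        values = extract.split("[EXTRACT]")
        if values and values[-1] == "":
            values.pop()  # drop the empty piece left by a trailing separator
        rows.append([url] + values)
    return rows, skipped


def imacros_player(macro_name: str) -> Player:
    """Hypothetical wiring to the Scripting Interface via COM (Windows, pywin32)."""
    import win32com.client  # imported lazily so scrape_all works without iMacros

    im = win32com.client.Dispatch("imacros")
    im.iimInit("", True)

    def play(url: str) -> Tuple[int, str]:
        im.iimSet("-var_url", url)  # the macro would then refer to {{url}}
        code = im.iimPlay(macro_name)
        return code, im.iimGetLastExtract() or ""

    return play
```

Keeping the player separate from the loop means the skipping logic can be tried out with a dummy function first and only later connected to the real browser.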

If you want to stay with a plain macro, you may use !ERRORIGNORE to make the macro continue even if there is no result. Then, you may need to "clean up" the result file (i.e. remove empty lines).
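
The clean-up step could look roughly like the Python sketch below. It assumes the result file is ordinary comma-separated text and that a failed extraction appears either as an empty field or as the placeholder #EANF#. The names EMPTY_MARKERS, drop_empty_rows and clean_csv are made up for this example, and the sample data is invented.

```python
import csv
import io
from typing import Iterable, Iterator, List

# Assumption: a failed extraction shows up as an empty field or as "#EANF#".
# Adjust this set to whatever the result file actually contains.
EMPTY_MARKERS = {"", "#EANF#"}


def drop_empty_rows(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield only rows that contain at least one real value."""
    for row in rows:
        if any(field.strip() not in EMPTY_MARKERS for field in row):
            yield row


def clean_csv(text: str) -> str:
    """Return the CSV text with blank lines and all-empty rows removed."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(drop_empty_rows(reader))
    return out.getvalue()


# Invented sample: one blank line and one row of failed extractions.
sample = (
    '"http://a.example","6-Sep-09","7"\n'
    "\n"
    '"#EANF#","#EANF#","#EANF#"\n'
    '"http://b.example","1-Jan-10","3"\n'
)
print(clean_csv(sample))
```

Rows where only some fields are missing are kept on purpose, so partially extracted results stay visible instead of silently disappearing.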
soulgeek

Thank you, Hannes!

Post by soulgeek » Tue Jan 26, 2010 5:16 pm

:)