Extracting a URL from a simple page

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.

Post by Chib » Wed Mar 04, 2009 4:10 am

The entirety of the page consists of the following:

Your URL is: 
<a href='http://x.com/abc'>http://x.com/abc</a> 

Note that this is the complete source of the page; there is no <html>, <body> or any other markup around it.

The URL changes depending on what is entered on the previous page, and I can't find a way to extract it. Could anyone help?

Post by dharmendra2000 » Thu Mar 05, 2009 8:25 am

You can extract the URL using the command below:
EXTRACT {{!URLCURRENT}}

Post by Chib » Fri Mar 06, 2009 6:03 am

That returns the URL of the page you're currently on, which is not what I want.

I want the URL that appears in the source code I pasted.

Post by BrianR » Fri Mar 06, 2009 1:51 pm

The quick answer is that when you extract with the TAG command, use HREF instead of TXT:

EXTRACT=HREF
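
To make the TXT versus HREF distinction concrete, here is a minimal Python sketch (not iMacros) that does the equivalent on the exact page source from the question, using the standard-library html.parser, which parses a bare fragment with no <html> or <body> without complaint. The names LinkCollector, extract_texts and extract_hrefs are illustrative only. On the original page the link text and the href happen to be identical, so the second sample uses different visible text to show that only the href-based extraction still returns the URL.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples, in document order
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; single or double quotes both work.
            # Links without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _collect(page_source):
    parser = LinkCollector()
    parser.feed(page_source)
    parser.close()
    return parser.links


def extract_texts(page_source):
    """Visible link text: roughly what EXTRACT=TXT gives you."""
    return [text for _href, text in _collect(page_source)]


def extract_hrefs(page_source):
    """Link targets: roughly what EXTRACT=HREF gives you."""
    return [href for href, _text in _collect(page_source)]


# The complete page source from the question.
PAGE = "Your URL is: \n<a href='http://x.com/abc'>http://x.com/abc</a> \n"

if __name__ == "__main__":
    print(extract_hrefs(PAGE))  # ['http://x.com/abc']
    other = "<a href='http://x.com/abc'>click here</a>"
    print(extract_texts(other))  # ['click here']
    print(extract_hrefs(other))  # ['http://x.com/abc']
```

Reading the attribute rather than the text is what keeps this working when the visible link text is something like "click here".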

Here is an example from one of my macros. I run it from a script, and the line below retrieves the link address.

'----------------------
'Website Address
'----------------------
TAG POS={{NAMEPOS}} TYPE=A ATTR=CLASS:pagebody&&TXT:* EXTRACT=HREF
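
One way to read that macro line: among the <a> tags whose class is pagebody (with any text, per TXT:*), take the one at position {{NAMEPOS}} and return its href. Below is a rough Python analogue of that selection. It assumes whole-value matching on the class attribute and 1-based positions; it is not how iMacros works internally, and ClassLinkFinder, nth_link_href and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ClassLinkFinder(HTMLParser):
    """Records the href of each <a> whose class attribute equals css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []  # hrefs of matching links, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Assumption: exact match on the whole class value, like ATTR=CLASS:pagebody
        # without wildcards. The TXT:* part (any text) needs no check here.
        if attributes.get("class") == self.css_class:
            self.hrefs.append(attributes.get("href"))


def nth_link_href(page_source, pos, css_class="pagebody"):
    """Hypothetical analogue of TAG POS=pos TYPE=A ATTR=CLASS:css_class&&TXT:* EXTRACT=HREF.

    pos is 1-based, like POS in the macro. Returns None if there is no pos-th match.
    """
    finder = ClassLinkFinder(css_class)
    finder.feed(page_source)
    finder.close()
    if 1 <= pos <= len(finder.hrefs):
        return finder.hrefs[pos - 1]
    return None


# Made-up markup for illustration only.
SAMPLE = (
    "<a class='nav' href='/home'>Home</a>"
    "<a class='pagebody' href='http://example.com/one'>First</a>"
    "<a class='pagebody' href='http://example.com/two'>Second</a>"
)

if __name__ == "__main__":
    # A calling script would supply the position, as {{NAMEPOS}} is supplied above.
    for name_pos in (1, 2, 3):
        print(name_pos, nth_link_href(SAMPLE, name_pos))
```

Returning None for an out-of-range position is a choice made here so a calling loop can stop cleanly; in a macro, a missing element typically produces an error instead.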