Is this possible?

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.

Post by acematti » Wed Jun 18, 2008 8:43 am

Hi there, I'm pretty new to this, so please bear with me.

I have a page with lots of email addresses and want to be able to extract them. I have had a play around but have not yet managed to automate the extraction process. The site is laid out as an index page with sub-pages. Here is a quick breakdown of how they are set out.

index.htm:

This lists the companies like so (all being links):

Company 1
Company 2
Company 3

The code looks something like this:

<br />
<a href="/directory_members_list.asp?CompanyID=1" target="_parent">Company 1</a><br />

<br />
<a href="/directory_members_list.asp?CompanyID=2" target="_parent">Company 2</a><br />

Clicking Company 1 takes me to directory_members_list.asp, which looks like this:

Company
Company 1 More Information
Foo Bar Manufacturer

More Information is a link that takes me to directory_members_details.asp.

This page displays all of the information about the company.

The particular bit that I would like is:

Email: company1@company1.com

I worked out how to get this information by extracting relative to a tag, since the "Email:" label is always on the page.
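The label-relative idea can be illustrated outside iMacros. The following is a minimal Python sketch, not iMacros macro syntax: it assumes the details page text contains the literal label "Email:" followed by the address. The sample page text and the `extract_email` helper are hypothetical; only the "Email:" line comes from the post.

```python
import re

# Hypothetical text of a details page; only the "Email:" line is taken from the post.
page_text = """Company 1
Foo Bar Manufacturer
Email: company1@company1.com
"""

def extract_email(text):
    """Return the address that follows the 'Email:' label, or None if there is none."""
    match = re.search(r"Email:\s*([^\s<>]+@[^\s<>]+)", text)
    return match.group(1) if match else None

print(extract_email(page_text))
```

Anchoring on the label rather than on a fixed position means the extraction should keep working even if the rest of the page layout varies between companies.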

I have been able to record the extraction of one email address successfully. However, I have not been able to make the macro move to the next link in the list on index.htm and then click through the pages to get to each email. This automation is the bit that I am struggling with. I realise it may be harder because none of the data is in tables.

Any help or advice would be gratefully received, and if you need any further details please just give me a shout.

Cheers in advance

Matt

Post by jackmacokc » Wed Jun 18, 2008 12:47 pm

You would need to write a script in order to iterate through all the links and emails like you're wanting to do.

And the data not being in tables really doesn't impact anything. You need to think of a way to iterate through those links, either reading the list of links into an array and iterating through that, or if the company names and/or links have ID numbers that are sequential you could attack it that way as well.
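Both options can be sketched in a few lines. This is a hedged illustration in Python rather than an iMacros script: the `LinkCollector` class is a hypothetical helper built on the standard library's `html.parser`, the sample HTML is the snippet from the first post, and the sequential variant assumes the CompanyID values run contiguously from 1, which the thread does not confirm.

```python
from html.parser import HTMLParser

# The two anchors are copied from the first post; a real index page would hold many more.
INDEX_HTML = """
<br />
<a href="/directory_members_list.asp?CompanyID=1" target="_parent">Company 1</a><br />
<br />
<a href="/directory_members_list.asp?CompanyID=2" target="_parent">Company 2</a><br />
"""

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Option 1: read the links on the page into a list.
collector = LinkCollector()
collector.feed(INDEX_HTML)
company_links = collector.links

# Option 2: build the URLs directly, ASSUMING the IDs are sequential with no gaps.
sequential_links = [
    f"/directory_members_list.asp?CompanyID={company_id}" for company_id in range(1, 3)
]

for link in company_links:
    print(link)
```

Reading the links from the page is the safer of the two, since it does not depend on the IDs being gap-free.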

Hope that helps.

Post by acematti » Wed Jun 18, 2008 1:10 pm

Hi, thanks for helping. It's starting to make more sense now.

So the first thing I need to do on index.htm is read all the links on the page into an array, then go through each one, stepping through the two different pages.
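That plan can be sketched end to end as follows. This is a minimal Python illustration, not an iMacros macro. To stay runnable without a network it uses a hypothetical in-memory `PAGES` dictionary in place of the real site; the query string on the details page, the markup of the "More Information" link, and the second company's address are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the site. Only the index anchors and the first
# email address come from the thread; everything else is assumed.
PAGES = {
    "/index.htm": (
        '<a href="/directory_members_list.asp?CompanyID=1" target="_parent">Company 1</a><br />\n'
        '<a href="/directory_members_list.asp?CompanyID=2" target="_parent">Company 2</a><br />'
    ),
    "/directory_members_list.asp?CompanyID=1":
        '<a href="/directory_members_details.asp?CompanyID=1">More Information</a>',
    "/directory_members_list.asp?CompanyID=2":
        '<a href="/directory_members_details.asp?CompanyID=2">More Information</a>',
    "/directory_members_details.asp?CompanyID=1": "Email: company1@company1.com",
    "/directory_members_details.asp?CompanyID=2": "Email: company2@company2.com",
}

def fetch(url):
    """Return the page body for url (a real script would issue an HTTP request here)."""
    return PAGES[url]

def hrefs_starting_with(html, prefix):
    """Return every href value in html that begins with prefix, in page order."""
    return [href for href in re.findall(r'href="([^"]+)"', html) if href.startswith(prefix)]

def collect_emails():
    emails = []
    # Step 1: read all company links on the index page into a list.
    for list_url in hrefs_starting_with(fetch("/index.htm"), "/directory_members_list.asp"):
        # Step 2: on the list page, find the link to the details page.
        details = hrefs_starting_with(fetch(list_url), "/directory_members_details.asp")
        if not details:
            continue  # no details link for this company; skip it
        # Step 3: on the details page, take the address after the "Email:" label.
        match = re.search(r"Email:\s*(\S+@\S+)", fetch(details[0]))
        if match:
            emails.append(match.group(1))
    return emails

print(collect_emails())
```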

Right, I will try to get a script going along those lines.

Post by acematti » Wed Jun 18, 2008 2:47 pm

OK, things are moving on nicely now. However, I seem to be hitting a problem trying to grab all of the links on index.htm.

I have just noticed that they all start after this tag:

<br class="clear" />

This is the first and only instance of the tag in the whole of index.htm, so I was hoping to base my extraction on it.
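The idea of keying on that marker can be illustrated with a short Python sketch (again, not iMacros syntax). It assumes the marker appears once and that every link after it is a company link. The `LinksAfterMarker` class is a hypothetical helper, and the "Home" link placed before the marker is invented purely to show that earlier links are skipped.

```python
from html.parser import HTMLParser

# Hypothetical index page: the link before the marker is invented for illustration.
MARKED_INDEX_HTML = """
<a href="/home.asp">Home</a>
<br class="clear" />
<br />
<a href="/directory_members_list.asp?CompanyID=1" target="_parent">Company 1</a><br />
<br />
<a href="/directory_members_list.asp?CompanyID=2" target="_parent">Company 2</a><br />
"""

class LinksAfterMarker(HTMLParser):
    """Collect hrefs only from <a> tags that appear after <br class="clear" />."""

    def __init__(self):
        super().__init__()
        self.seen_marker = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "br" and "clear" in classes:
            self.seen_marker = True  # everything from here on is in the company list
        elif tag == "a" and self.seen_marker and attributes.get("href"):
            self.links.append(attributes["href"])

marker_parser = LinksAfterMarker()
marker_parser.feed(MARKED_INDEX_HTML)
print(marker_parser.links)
```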

I started to record a macro and used the extraction wizard to test it. I put in the ATTR box of the wizard: ATTR:<br class="clear" />

However, I just got the error Not Found (#EANF#), which I guess means I am doing something incorrectly.

I have studied the tutorials and screen demos on extraction, but I can't seem to find anything relating to this. I will keep looking on the forum, but if anyone has any ideas, please fire away.

Thanks again

Matt

Post by acematti » Wed Jun 18, 2008 3:26 pm

I have had more success with this now.

I have managed to get the first item out of the list in a script by checking the starting part of the URL.

Cheers guys