Meta name Description, Keywords...to extract


Post by Green Thumbs » Wed Jul 01, 2009 2:48 am

Hello,

iMacros is a good application!
So far I have been able to automate most of what I needed, but I am stuck on this one.

I need to extract the meta name tags from a web page's header: Description, Keywords, location, etc.
I was able to extract the title and the URL of the visited site and save them to a CSV file. So far so good for a newcomer!

Here is an example of the tags in the header:

Code:

<meta name="Description" content="Description I would like to extract">
<meta name="Keywords" content="Keywords I would like to extract">

I have tried several approaches, but none has worked so far. Every attempt gave me either a "wrong format" error for the TAG command or the "#EANF#" error code.

I searched the forum and found some related posts, but none of them gave a solution or even said whether this is possible.

Please help!
Many thanks

Post by Hannes, Tech Support » Thu Jul 02, 2009 10:19 am

Hi,

You will need a script here.

1) Extract the tags' code using EXTRACT=HTM (http://wiki.imacros.net/TAG#The_EXTRACT_Parameter).

2) Get the extraction via iimGetLastExtract() and perform whatever string manipulation your scripting/programming language provides.
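
The string manipulation in step 2 could look roughly like the sketch below. It is written in Python purely for illustration; the thread does not say which scripting language is used. The function name meta_content and the sample string are hypothetical, and the input is assumed to be the raw tag text that EXTRACT=HTM hands back through iimGetLastExtract(). The call to iMacros itself is left out.

```python
from html.parser import HTMLParser
from typing import Optional

# Value iMacros puts in the extraction when the TAG command matched nothing.
EANF = "#EANF#"


class _MetaContentParser(HTMLParser):
    """Remembers the content attribute of the first <meta> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta" and self.content is None:
            self.content = dict(attrs).get("content")


def meta_content(extracted_html: str) -> Optional[str]:
    """Return the content attribute of an extracted <meta> tag, or None."""
    if extracted_html.strip() == EANF:
        return None
    parser = _MetaContentParser()
    parser.feed(extracted_html)
    parser.close()
    return parser.content


# Hypothetical extraction result, copied from the example tag in this thread.
sample = '<meta name="Description" content="Description I would like to extract">'
print(meta_content(sample))  # Description I would like to extract
```

Using an HTML parser instead of slicing the string by hand keeps the sketch working when the attributes appear in a different order or the tag is written in upper case.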

Post by Green Thumbs » Fri Jul 03, 2009 1:14 am

Hello,

I was out for a few days; it was too nice to be inside!
Thanks, I will try this and let you know.

Merci

Post by Green Thumbs » Fri Jul 03, 2009 3:23 pm

OK... that is out of my league for now!
My scripting/programming skills are very limited. So far I have learned iMacros by example and by trial and error, but I have run out of luck.

I have tried every permutation I could think of and still get either "SyntaxError: wrong format of TAG command" or #EANF# in the EXTRACT results.

Here is what I've got:

Code:

VERSION BUILD=4201129
TAB T=1
ADD !EXTRACT {{!URLCURRENT}}
TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT
TAG POS=1 TYPE=META:description ATTR=CONTENT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\Users\Marc\URLEXTRACT\ FILE=URL.csv

The last TAG line is the one that got me furthest without a syntax error, but it does not return any results.

Just even a hint would be appreciated.

Merci (Thanks)

Post by Hannes, Tech Support » Mon Jul 06, 2009 6:33 am

Green Thumbs wrote:

Code:

TAG POS=1 TYPE=META:description ATTR=CONTENT:* EXTRACT=TXT

The last TAG line is the one that got me furthest without a syntax error, but it does not return any results.

The EXTRACT=TXT parameter tells the macro to extract the text that is enclosed by this HTML tag, so if the HTML read "<h1>heading</h1>", the string "heading" would be extracted. But what you need is part of the HTML tag itself. This data is only accessible through EXTRACT=HTM:

Code:

TAG POS=1 TYPE=META ATTR=CONTENT:Description* EXTRACT=HTM

This, however, means that the full HTML tag is extracted, not just the part you need. That is why a script comes in handy: there you can "cut off" anything that you are not interested in.
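
To make the difference between enclosed text and tag attributes concrete, here is a small Python sketch. It does not call iMacros at all; it only parses the two HTML snippets mentioned in this thread and shows where each piece of data lives. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextAndAttrs(HTMLParser):
    """Collects the enclosed text and the tag attributes separately."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts = []
        self.tags_seen = []

    def handle_starttag(self, tag, attrs):
        # Attributes live inside the tag itself, not between tags.
        self.tags_seen.append((tag, dict(attrs)))

    def handle_data(self, data):
        # Data is whatever sits between an opening and a closing tag.
        self.text_parts.append(data)


def text_and_attrs(html: str):
    """Return (enclosed text, list of (tag, attributes)) for a snippet."""
    parser = TextAndAttrs()
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts).strip(), parser.tags_seen


# A heading encloses text, which is the kind of data EXTRACT=TXT is about.
print(text_and_attrs("<h1>heading</h1>"))

# A meta tag encloses nothing; the wanted value is an attribute of the tag.
meta = '<meta name="Description" content="Description I would like to extract">'
text, tags = text_and_attrs(meta)
print(repr(text))
print(tags[0][1]["content"])
```

For the meta tag the enclosed text is empty, which is why the value has to be taken from the full tag and then trimmed down to the content attribute.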

Post by pinktoes » Fri Oct 21, 2011 7:51 pm

Green Thumbs-

Did you ever get this resolved?

I am encountering the very same thing:

TAG POS=1 TYPE=META:description ATTR=CONTENT:* EXTRACT=TXT

returns a value of: #EANF#


Any help is really appreciated!
~Lori

Post by Green Thumbs » Sat Oct 22, 2011 9:25 pm

Hello Pinktoes,

No, unfortunately I did not get any further. I had to move on and put that aside... 3 years... I might as well say that I forgot about it. It is still there waiting.

If you do find a script to do this, let me know!

Good luck!

Post by Jay527 » Thu Mar 28, 2019 9:59 pm

I registered in 2019 to post a reply to this thread.

For description, you can use:
TAG POS=1 TYPE=META ATTR=Property:*description* EXTRACT=HTM

For TITLE, the usual:
TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT

Thanks.
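
For anyone who post-processes the page source in a script instead, the wildcard idea behind *description* can be sketched in Python as follows. This is only an illustration, not iMacros behaviour: the function name find_meta is hypothetical, matching is done case-insensitively by assumption, and both the name and the property attribute are checked so that tags such as og:description are also picked up.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects (name-or-property, content) pairs of matching <meta> tags."""

    def __init__(self, pattern: str) -> None:
        super().__init__()
        self.pattern = pattern.lower()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Check name first, then property; stop at the first one that matches.
        for key in ("name", "property"):
            value = attributes.get(key) or ""
            if fnmatchcase(value.lower(), self.pattern):
                self.matches.append((value, attributes.get("content") or ""))
                break


def find_meta(page_html: str, pattern: str = "*description*"):
    """Return all meta tags whose name or property matches the wildcard."""
    collector = MetaCollector(pattern)
    collector.feed(page_html)
    collector.close()
    return collector.matches


# Hypothetical page header used only to exercise the sketch.
page = (
    "<head><title>T</title>"
    '<meta name="Description" content="A">'
    '<meta property="og:description" content="B">'
    '<meta name="Keywords" content="C">'
    "</head>"
)
print(find_meta(page))
print(find_meta(page, "keywords"))
```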