Meta name Description, Keywords...to extract

Green Thumbs
Posts: 4
Joined: Wed Jul 01, 2009 2:20 am

Post by Green Thumbs » Wed Jul 01, 2009 2:48 am

Hello,

iMacros is a good application!
So far I have been able to automate most of what I needed, but I am stuck on this one.

I need to extract the meta name tags in the web page's header: Description, Keywords, location, etc.
I was able to extract the title and the URL of the visited site and save them to a CSV file. So far so good for a newcomer!

Here is an example of the tags in the header:

<meta name="Description" content="Description I would like to extract">
<meta name="Keywords" content="Keywords I would like to extract">

I have tried several ways, but none has worked so far. Every attempt gave me either a "wrong format" error for the TAG command or the "#EANF#" error code.

I searched the forum and found some related posts, but none gave a solution or even said whether this is possible.

Please help!
Many thanks

Re: Meta name Description, Keywords...to extract

Post by Hannes, Tech Support » Thu Jul 02, 2009 10:19 am

Hi,

You will need a script here.

1) Extract the tags' code using EXTRACT=HTM (http://wiki.imacros.net/TAG#The_EXTRACT_Parameter).

2) Get the extraction via iimGetLastExtract() and perform the string manipulation with whatever your scripting/programming language provides.
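
The string manipulation in step 2 can be sketched in Python. This is only an illustration under stated assumptions: the raw string has already been obtained from iimGetLastExtract(), several extracted values arrive joined by the "[EXTRACT]" separator, and a failed extraction shows up as "#EANF#". The helper names split_extract and to_csv_line are hypothetical and not part of iMacros.

```python
import csv
import io

SEPARATOR = "[EXTRACT]"  # assumed separator between extracted values
NOT_FOUND = "#EANF#"     # marker for an extraction that matched nothing


def split_extract(raw):
    """Split a combined extract string; failed extractions become None."""
    return [None if part == NOT_FOUND else part for part in raw.split(SEPARATOR)]


def to_csv_line(values):
    """Format the values as one CSV line, writing missing ones as empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["" if value is None else value for value in values])
    return buffer.getvalue()


# Hypothetical raw string standing in for the result of iimGetLastExtract()
raw = "http://example.com/[EXTRACT]Example title[EXTRACT]#EANF#"
values = split_extract(raw)
print(values)
print(to_csv_line(values), end="")
```

Turning "#EANF#" into None keeps a failed extraction from being written to the CSV file as if it were real data.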

Re: Meta name Description, Keywords...to extract

Post by Green Thumbs » Fri Jul 03, 2009 1:14 am

Hello,

Was out for a few days, too nice to be inside!
Thanks, I will try this and let you know.

Merci

Re: Meta name Description, Keywords...to extract

Post by Green Thumbs » Fri Jul 03, 2009 3:23 pm

OK... That's out of my league so far!
I am very limited in scripting/programming. So far I have learned iMacros by example and by trial and error, but I have run out of luck.

I have tried all permutations and still get either "SyntaxError: wrong format of TAG command" or #EANF# in the EXTRACT results.

Here is what I've got:

VERSION BUILD=4201129
TAB T=1
ADD !EXTRACT {{!URLCURRENT}}
TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT
TAG POS=1 TYPE=META:description ATTR=CONTENT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\Users\Marc\URLEXTRACT\ FILE=URL.csv

The last TAG line is the one that got me the furthest without a syntax error, but it does not get any results.

Just even a hint would be appreciated.

Merci (Thanks)

Re: Meta name Description, Keywords...to extract

Post by Hannes, Tech Support » Mon Jul 06, 2009 6:33 am

Green Thumbs wrote:

TAG POS=1 TYPE=META:description ATTR=CONTENT:* EXTRACT=TXT

The last TAG line is the one that got me the furthest without a syntax error, but it does not get any results.

The EXTRACT=TXT parameter tells the macro to extract the text that is enclosed by this HTML tag, so if the HTML read "<h1>heading</h1>", the string "heading" would be extracted. But what you need is part of the HTML tag itself. This data is only accessible through EXTRACT=HTM:

TAG POS=1 TYPE=META ATTR=CONTENT:Description* EXTRACT=HTM

However, this means that the full HTML tag is extracted, not just the part you need. That is why a script comes in handy: there you can "cut off" anything that you are not interested in.
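
As a rough sketch of that "cutting off" step, the Python below pulls the content attribute out of a tag extracted with EXTRACT=HTM. It assumes the extracted string is a single meta tag like the one in the first post; meta_content and MetaContentParser are hypothetical names, and the parsing uses Python's standard html.parser module rather than anything from iMacros.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Remembers the content attribute of the first <meta> tag it sees."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values are left as written.
        if tag == "meta" and self.content is None:
            self.content = dict(attrs).get("content")


def meta_content(extracted_html):
    """Return the content attribute of the extracted tag, or None if absent."""
    parser = MetaContentParser()
    parser.feed(extracted_html)
    parser.close()
    return parser.content


tag = '<meta name="Description" content="Description I would like to extract">'
print(meta_content(tag))  # prints: Description I would like to extract
```

Using an HTML parser instead of plain string slicing means the sketch still works if the attributes appear in a different order or in upper case.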
pinktoes
Posts: 3
Joined: Thu Oct 20, 2011 12:52 am

Re: Meta name Description, Keywords...to extract

Post by pinktoes » Fri Oct 21, 2011 7:51 pm

Green Thumbs-

Did you ever get this resolved?

I am encountering the very same thing:

TAG POS=1 TYPE=META:description ATTR=CONTENT:* EXTRACT=TXT

returns a value of: #EANF#


Any help is really appreciated!
~Lori

Re: Meta name Description, Keywords...to extract

Post by Green Thumbs » Sat Oct 22, 2011 9:25 pm

Hello Pinktoes,

No, unfortunately I did not get any further. I had to move on and put that aside... 3 years... might as well say that I forgot about it. It is still there, waiting...

If you do find a script to do this, let me know!

Good luck!
Jay527
Posts: 1
Joined: Thu Mar 28, 2019 9:55 pm

Re: Meta name Description, Keywords...to extract

Post by Jay527 » Thu Mar 28, 2019 9:59 pm

I registered in 2019 to post a reply to this thread.

For description, you can use:
TAG POS=1 TYPE=META ATTR=Property:*description* EXTRACT=HTM

For TITLE, the usual:
TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT

Thanks.
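
The macro above matches on the Property attribute, while the example in the first post uses name="Description", so a page may carry the description under either attribute. The following is a minimal Python sketch of a script-side lookup that accepts both. It assumes the HTML of one or more meta tags is already available as a string; find_description and MetaCollector are hypothetical names.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the attributes of every <meta> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def find_description(html):
    """Return the content of the first meta tag whose name or property
    contains "description" (case-insensitive), or None if there is none."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    for attrs in collector.metas:
        label = (attrs.get("name") or attrs.get("property") or "").lower()
        if "description" in label:
            return attrs.get("content")
    return None


html = (
    '<meta name="Keywords" content="Keywords I would like to extract">'
    '<meta property="og:description" content="Description I would like to extract">'
)
print(find_description(html))  # prints: Description I would like to extract
```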