IMDB Text Extraction Problem

Post by remisansfamille » Tue May 23, 2006 12:24 am

I want to extract data from the online IMDB database, but so far I cannot extract even simple text.

For example, take the S.W.A.T. page on IMDB.com at the following address:

http://www.imdb.com/title/tt0257076/combined

I want to extract the "Tagline:" value, which in this case is "Even cops dial 911". The Content Extraction Wizard suggests the following extraction expression:

EXTRACT POS=2 TYPE=TXT ATTR=<B<SP>class=ch>*

which only extracts the string "Tagline:". If I try the following expression, in order to take the text after the tag into account:

EXTRACT POS=1 TYPE=TXT ATTR=<B<SP>class=ch><SP>*

I get an "anchor not found" message. I get the same message even if I try to extract HTM instead of TXT. I am clearly missing something here. If anyone can help, thank you in advance.
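To make the goal concrete, here is a minimal sketch in Python (not iMacros) of the extraction I am after. It assumes the page marks up the label as <b class="ch">Tagline:</b> with the tagline as plain text right after the closing tag; that markup is my guess based on the ATTR the wizard suggested, not something I have confirmed. The names LabelTextParser and extract_after_label are made up for this illustration.

```python
from html.parser import HTMLParser


class LabelTextParser(HTMLParser):
    """Collect the text that directly follows <b class="ch">LABEL</b>.

    Hypothetical helper: the <b class="ch"> markup is an assumption
    about the page, inferred from the wizard's suggested ATTR.
    """

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_label_tag = False  # currently inside a <b class="ch"> element
        self.matched = False       # the label text was found inside that element
        self.capturing = False     # collecting the text after the matched label
        self.done = False          # value fully collected, ignore the rest
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.capturing:
            # The next tag after the label (for example <br>) ends the value.
            self.capturing = False
            self.done = True
        elif not self.done and tag == "b" and dict(attrs).get("class") == "ch":
            self.in_label_tag = True

    def handle_endtag(self, tag):
        if self.capturing:
            self.capturing = False
            self.done = True
        elif self.in_label_tag and tag == "b":
            self.in_label_tag = False
            # Start capturing only if this <b> held the wanted label.
            self.capturing = self.matched

    def handle_data(self, data):
        if self.in_label_tag:
            if data.strip() == self.label:
                self.matched = True
        elif self.capturing:
            self.parts.append(data)


def extract_after_label(html, label):
    """Return the text following the label, or None if the label is absent."""
    parser = LabelTextParser(label)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.matched else None


# Made-up fragment in the assumed shape, not copied from the real page.
sample = (
    '<b class="ch">Title:</b> S.W.A.T. <br>'
    '<b class="ch">Tagline:</b> Even cops dial 911 <br>'
)
print(extract_after_label(sample, "Tagline:"))  # Even cops dial 911
```

The point of the sketch is that the value is not inside the B element at all but in the text that comes after it, which may be why an expression anchored on the B tag only returns "Tagline:".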


All the best, remisansfamille.