Extracting between two comment tags

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.

Post by Sasha » Sun Dec 04, 2005 5:31 pm

Hey everyone, I'm having a little issue trying to extract a few articles from a website. I want to extract the HTML, including the <p> tags and so forth, from a certain location. The code for the article section looks like this:


<!-- google_ad_section_start -->

<p>Article here; it uses paragraph text</p> (this repeats multiple times, and it's different for each article)

<!-- google_ad_section_end -->

So in the end I would like to extract everything between the two google_ad_section comments, including the HTML, but I think that because there are line breaks it has trouble reading it or picking it up.

I tried

<!-- google_ad_section_start -->*<!-- google_ad_section_end -->

and a few other variations, but it doesn't seem to work. Any assistance would be greatly appreciated.
Tech Support
Posts: 4948
Joined: Tue Sep 20, 2005 7:25 pm

Post by Tech Support » Mon Dec 05, 2005 4:02 pm

In the EXTRACT command, you can use TYPE=HTM. This will preserve all HTML formatting. Does this solve this problem?

No not really

Post by Sasha » Mon Dec 05, 2005 11:14 pm

No, it really does not. I tried that many times, but it won't extract it because there are line breaks in between, so it won't read it. I think the program only reads the HTML if it is continuous on a single line, not when it is spread over multiple lines.
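
For what it's worth, the line-break behaviour described above is common to many pattern matchers: a wildcard often does not cross newlines unless told to. The sketch below is not iMacros syntax. It is plain Python, assuming the page source is already available as a string, and the `extract_between` helper and `sample` text are made up for illustration. It only shows the general idea of matching between the two comment markers across multiple lines.

```python
import re

# The two comment markers quoted in the original post.
START = "<!-- google_ad_section_start -->"
END = "<!-- google_ad_section_end -->"


def extract_between(html, start=START, end=END):
    """Return the raw HTML found between each start/end marker pair.

    Hypothetical helper, not part of iMacros. re.DOTALL lets '.' match
    newlines, so the match can span several lines of markup.
    """
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    return [chunk.strip() for chunk in pattern.findall(html)]


# Made-up sample page with the article spread over several lines.
sample = """<div>
<!-- google_ad_section_start -->

<p>First paragraph</p>
<p>Second paragraph</p>

<!-- google_ad_section_end -->
</div>"""

# Without DOTALL the wildcard stops at the first line break and finds nothing.
single_line_only = re.findall(re.escape(START) + r"(.*?)" + re.escape(END), sample)

print(single_line_only)
print(extract_between(sample))
```

The non-greedy `(.*?)` keeps each match from running past the first end marker, which matters if a page contains more than one marked section.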

Post by Tech Support » Tue Dec 06, 2005 3:07 pm

Can you please post a link to the complete web page? Or email the html code of the page to support2 AT iopus.com