Extracting text with no html TYPE

christinapitt76
Posts: 6
Joined: Wed Nov 11, 2015 6:58 pm

Extracting text with no html TYPE

Post by christinapitt76 » Wed Sep 28, 2016 8:15 pm

1. What version of iMacros are you using?
VERSION BUILD=8961227 RECORDER=FX


2. What operating system are you using? (please also specify language)
Windows 7 English

3. Which browser(s) are you using? (include version numbers)
Firefox 49.0.1

4. Do the included demo macros work ok?
Yes

5. If reporting a problem with the Scripting Interface, please also test if the included VBS sample scripts run ok.
N/A

6. Website:

https://www.thredup.com/product/women-cotton-talbots-blue-long-sleeve-button-down-shirt/17919214
iMacros code I am using now:

TAG POS=1 TYPE=H1 ATTR=CLASS:brand-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title<SP>item-size EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:price EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:compare-price EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:savings-percentage EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:final-sale EXTRACT=TXT
TAG POS=1 TYPE=STRONG ATTR=TXT:Description EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=STRONG ATTR=TXT:Measurements EXTRACT=TXT
>>> TAG POS=R1 TYPE=BR ATTR=TXT:* EXTRACT=HTM 
TAG POS=1 TYPE=STRONG ATTR=TXT:Materials EXTRACT=TXT
>>> TAG POS=R1 TYPE=TXT ATTR=TXT:* EXTRACT=HTM
TAG POS=1 TYPE=STRONG ATTR=TXT:Condition EXTRACT=TXT
>>> TAG POS=R1 TYPE=HTM ATTR=TXT:* EXTRACT=TXT

7. Do you encounter the same problem with the iMacros Browser, iMacros for Internet Explorer and iMacros for Firefox?
Yes

The problem I am having is grabbing the text for Measurements, Materials and Condition on the website, because it is not wrapped in its own span, div, etc.
The source code is:

<strong>Description</strong><ul><li>Long sleeve</li><li>Blue</li><li>Solid</li></ul></div><div><strong>Measurements</strong><br><!-- react-text: 906 -->44" Chest, <!-- /react-text --><!-- react-text: 907 -->25" Length<!-- /react-text --></div><div><strong>Materials</strong><br><!-- react-text: 911 -->100% Cotton<!-- /react-text --></div><div><strong>Condition</strong><br><!-- react-text: 915 -->This item is gently used with minor signs of wear (minor stain).<!-- /react-text --></div>   
The number after "<!-- react-text:" changes with every page, so I can't use it.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting text with no html TYPE

Post by chivracq » Wed Sep 28, 2016 11:20 pm

Hum, funny to see that you registered about 1 year ago on the Forum but only today need to open a Thread, ah-ah...!
At least you are using the Forum perfectly...! Compliment.
Hum..., and when mentioning your FCI, you can usually use this simplified Format:

iMacros for FF v8.9.6, FF49, Win7_Eng
OK, I tried to have a look at your Script and at this "very stupid" Site (starting with 6Mb per Page...!), but as soon as I start clicking anywhere on the Page to do some Recording, I get redirected to some "International" Page telling me that their "Service" is not available for my Country (= NL), based on my IP-Address. That is completely stupid, as "you" could be on holiday or simply traveling abroad and still want to access their Site from your Laptop or Smartphone (well, good luck with Data-Roaming with Pages of 6Mb per Page...!). I don't have the time to go fighting against their Cookies and 3Mb of JavaScript Scripts or to switch to some Proxy (for which Country are they meant btw, I can't tell directly from the ".com"...?), so could you upload an 'HTML Only' Saveas of the Page to your Thread (zipped, Max 256Kb)?
And do it maybe for 1 or 2 other Pages as well if you intend to reuse the same Script for other Pages, so that I can see the Differences between different Pages.
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
christinapitt76
Posts: 6
Joined: Wed Nov 11, 2015 6:58 pm

Re: Extracting text with no html TYPE

Post by christinapitt76 » Thu Sep 29, 2016 2:00 pm

Wow! Thanks for the super quick reply and compliment :D
I attached three pages in zip to this reply
Attachments
original.zip
requested save as pages
(71.94 KiB) Downloaded 276 times
christinapitt76
Posts: 6
Joined: Wed Nov 11, 2015 6:58 pm

Re: Extracting text with no html TYPE

Post by christinapitt76 » Mon Oct 10, 2016 2:16 pm

Any suggestions on how to do this?
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting text with no html TYPE

Post by chivracq » Mon Oct 10, 2016 4:21 pm

christinapitt76 wrote:Any suggestions on how to do this?
Oh yeah, sorry, I had tried to have a look at your Attachment when you uploaded it, but the Data you are interested in is not included in it; check your Saveas Offline before uploading it. I was also a bit busy last week, being away at some festival for 4 days, and new Threads came to the Forum...
The Data you want is "more or less" present as Vars in some JavaScript Script in the Source, reused to dynamically build the HTML Form. At least I had the feeling I recognized some Fields/Values, as I only had a few seconds to see what your Page looked like the first time I tried to have a look at it, before it got automatically redirected. But pff..., it would be a bit cumbersome to want to extract it this way...
Last edited by chivracq on Tue Oct 11, 2016 4:11 pm, edited 1 time in total.
christinapitt76
Posts: 6
Joined: Wed Nov 11, 2015 6:58 pm

Re: Extracting text with no html TYPE

Post by christinapitt76 » Tue Oct 11, 2016 2:46 pm

Sorry about that. I did a Save As from Firefox, my main browser. This new attachment is a Save As "HTML only" from Chrome. Hopefully this has what you need. And really, really, really, thank you for even responding!! You are much better than my other experience with a paid online software product (sobipro, if you are wondering).
Attachments
down-from-chrome.zip
requested save as html only, this time from chrome, not firefox
(76.83 KiB) Downloaded 269 times
iimfun
Posts: 239
Joined: Tue Jul 19, 2016 1:06 pm

Re: Extracting text with no html TYPE

Post by iimfun » Tue Oct 18, 2016 8:37 am

I don't know why nobody could help you so far. Try the following code of mine:

TAG POS=1 TYPE=H1 ATTR=CLASS:brand-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title<SP>item-size EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:price EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:compare-price EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:savings-percentage EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:final-sale EXTRACT=TXT
TAG POS=1 TYPE=STRONG ATTR=TXT:Description EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT

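' Save everything extracted so far, then clear !EXTRACT
SET tmpEXTRACT {{!EXTRACT}}
SET !EXTRACT NULL

' Anchor on the Description heading (no extraction) for the relative tags below
TAG POS=1 TYPE=STRONG ATTR=TXT:Description

' Measurements: extract the whole DIV text, then strip the label from it
SET eMes Measurements
TAG POS=R1 TYPE=DIV ATTR=TXT:{{eMes}}* EXTRACT=TXT
SET eMes EVAL("'{{!EXTRACT}}'.replace('{{eMes}}', '').trim();")
SET !EXTRACT NULL

' Materials: same approach
SET eMat Materials
TAG POS=R1 TYPE=DIV ATTR=TXT:{{eMat}}* EXTRACT=TXT
SET eMat EVAL("'{{!EXTRACT}}'.replace('{{eMat}}', '').trim();")
SET !EXTRACT NULL

' Condition: same approach
SET eCon Condition
TAG POS=R1 TYPE=DIV ATTR=TXT:{{eCon}}* EXTRACT=TXT
SET eCon EVAL("'{{!EXTRACT}}'.replace('{{eCon}}', '').trim();")

' Rebuild !EXTRACT: the earlier fields followed by the three cleaned values
SET !EXTRACT {{tmpEXTRACT}}[EXTRACT]{{eMes}}[EXTRACT]{{eMat}}[EXTRACT]{{eCon}}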
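
For anyone wondering why the changing react-text number does not matter here: the macro extracts the text of the whole DIV, and the comment markers are not part of that text, so only the label has to be removed afterwards. Below is a small JavaScript sketch of that idea (JavaScript because that is what EVAL runs). The helper names divText and stripLabel are made up for illustration, and the regex-based stripping is only a rough approximation of what EXTRACT=TXT returns, not how iMacros does it internally.

```javascript
// Rough stand-in for EXTRACT=TXT on a DIV: drop HTML comments (including the
// React markers, whatever their number is) and tags, and keep the text.
// Assumption: this only approximates iMacros' behaviour, for illustration.
function divText(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // remove comments such as <!-- react-text: 906 -->
    .replace(/<[^>]+>/g, "");        // remove remaining tags
}

// Same idea as the EVAL lines in the macro: remove the first occurrence of
// the label, then trim the surrounding whitespace.
function stripLabel(text, label) {
  return text.replace(label, "").trim();
}

// Sample markup taken from the page source quoted earlier in the thread.
const div =
  '<div><strong>Measurements</strong><br>' +
  '<!-- react-text: 906 -->44" Chest, <!-- /react-text -->' +
  '<!-- react-text: 907 -->25" Length<!-- /react-text --></div>';

const value = stripLabel(divText(div), "Measurements");
console.log(value);
```

The same two steps apply to the Materials and Condition blocks; only the label changes.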
christinapitt76
Posts: 6
Joined: Wed Nov 11, 2015 6:58 pm

Re: Extracting text with no html TYPE

Post by christinapitt76 » Tue Oct 18, 2016 6:46 pm

Thanks! That works perfectly!!