Extracting text with no html TYPE

by christinapitt76 on Wed Sep 28, 2016 1:15 pm

1. What version of iMacros are you using?
VERSION BUILD=8961227 RECORDER=FX


2. What operating system are you using? (please also specify language)
Windows 7 English

3. Which browser(s) are you using? (include version numbers)
Firefox 49.0.1

4. Do the included demo macros work ok?
Yes

5. If reporting a problem with the Scripting Interface, please also test if the included VBS sample scripts run ok.
N/A

6. Website:
Code:
https://www.thredup.com/product/women-cotton-talbots-blue-long-sleeve-button-down-shirt/17919214

iMacros code I am using now:
Code:
TAG POS=1 TYPE=H1 ATTR=CLASS:brand-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title<SP>item-size EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:price EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:compare-price EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:savings-percentage EXTRACT=txt
TAG POS=1 TYPE=DIV ATTR=CLASS:final-sale EXTRACT=TXT
TAG POS=1 TYPE=STRONG ATTR=TXT:Description EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=STRONG ATTR=TXT:Measurements EXTRACT=TXT
>>> TAG POS=R1 TYPE=BR ATTR=TXT:* EXTRACT=HTM
TAG POS=1 TYPE=STRONG ATTR=TXT:Materials EXTRACT=TXT
>>> TAG POS=R1 TYPE=TXT ATTR=TXT:* EXTRACT=HTM
TAG POS=1 TYPE=STRONG ATTR=TXT:Condition EXTRACT=TXT
>>> TAG POS=R1 TYPE=HTM ATTR=TXT:* EXTRACT=TXT



7. Do you encounter the same problem with the iMacros Browser, iMacros for Internet Explorer and iMacros for Firefox?
Yes

The problem I am having is grabbing the text for Measurements, Materials and Condition, because on this website it is not wrapped in a span, div, etc. of its own.
The source code is:
Code:
<strong>Description</strong><ul><li>Long sleeve</li><li>Blue</li><li>Solid</li></ul></div><div><strong>Measurements</strong><br><!-- react-text: 906 -->44" Chest, <!-- /react-text --><!-- react-text: 907 -->25" Length<!-- /react-text --></div><div><strong>Materials</strong><br><!-- react-text: 911 -->100% Cotton<!-- /react-text --></div><div><strong>Condition</strong><br><!-- react-text: 915 -->This item is gently used with minor signs of wear (minor stain).<!-- /react-text --></div>   


The number after
Code:
<!-- react-text:
changes with every page, so I can't use it.
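
Editor's note: the snippet above can be read as plain text that sits directly after each `<strong>LABEL</strong><br>`, interrupted only by comment markers. The following is a minimal sketch in plain JavaScript (not iMacros) of why the changing number does not have to matter: every comment is removed regardless of its number, and the text up to the next tag is kept. The helper names `stripReactComments` and `textAfterLabel` are hypothetical, and the sample string is copied from the source code posted above.

```javascript
// Sketch only: plain JavaScript, no iMacros involved.
// stripReactComments and textAfterLabel are hypothetical helper names.

// Remove every HTML comment, e.g. "<!-- react-text: 906 -->" and
// "<!-- /react-text -->", whatever number it contains.
function stripReactComments(html) {
  return html.replace(/<!--[\s\S]*?-->/g, "");
}

// Return the text that follows "<strong>LABEL</strong><br>" up to the next tag,
// or null when the label is not present.
function textAfterLabel(html, label) {
  const clean = stripReactComments(html);
  const marker = "<strong>" + label + "</strong><br>";
  const start = clean.indexOf(marker);
  if (start === -1) return null;
  const rest = clean.slice(start + marker.length);
  const end = rest.indexOf("<");
  return (end === -1 ? rest : rest.slice(0, end)).trim();
}

// Sample taken from the page source quoted in the post.
const sample =
  '<div><strong>Measurements</strong><br><!-- react-text: 906 -->44" Chest, ' +
  '<!-- /react-text --><!-- react-text: 907 -->25" Length<!-- /react-text --></div>' +
  '<div><strong>Materials</strong><br><!-- react-text: 911 -->100% Cotton<!-- /react-text --></div>';

console.log(textAfterLabel(sample, "Measurements"));
console.log(textAfterLabel(sample, "Materials"));
```

The same idea carries over to a macro: extract the enclosing DIV as a whole and remove the label afterwards, instead of trying to address the comment markers.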
christinapitt76
 
Posts: 6
Joined: Wed Nov 11, 2015 11:58 am

Re: Extracting text with no html TYPE

by chivracq on Wed Sep 28, 2016 4:20 pm

Hum, funny to see that you registered about 1 year ago on the Forum but only today need to open a Thread, ah-ah...!
At least you are using the Forum perfectly...! Compliments.
Hum..., and when mentioning your FCI, you can usually use this simplified Format:
Code:
iMacros for FF v8.9.6, FF49, Win7_Eng


OK, I tried to have a look at your Script and at this "very stupid" Site (starting with 6Mb per Page...!), but as soon as I start clicking anywhere on the Page to do some Recording, I get redirected to some "International" Page telling me that their "Service" is not available for my Country (= NL), based on my IP-Address. That is completely stupid, as "you" could be on holiday or simply traveling abroad and still want to access their Site from your Laptop or Smartphone (well, good luck with Data-Roaming with Pages of 6Mb per Page...!).
I don't have the time to go fighting against their Cookies and 3Mb of JavaScript Scripts, or to switch to some Proxy (which Country is the Site meant for btw, I can't tell directly from the ".com"...?), so could you upload an 'HTML Only' Saveas of the Page to your Thread (zipped, Max 256Kb)?
And maybe do it for 1 or 2 other Pages as well if you intend to reuse the same Script for other Pages, so that I can see the Differences between different Pages.
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 5846
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Extracting text with no html TYPE

by christinapitt76 on Thu Sep 29, 2016 7:00 am

Wow! Thanks for the super quick reply and the compliment :D
I attached three pages in a zip to this reply.
Attachments
original.zip
requested save as pages
(71.94 KiB)

Re: Extracting text with no html TYPE

by christinapitt76 on Mon Oct 10, 2016 7:16 am

Any suggestions on how to do this?

Re: Extracting text with no html TYPE

by chivracq on Mon Oct 10, 2016 9:21 am

Oh yeah, sorry, I had tried to have a look at your Attachment when you uploaded it, but the Data you are interested in is not included in it; check your Saveas Offline before uploading it. I was also a bit busy last week, being away at some festival for 4 days, and new Threads came to the Forum...
The Data you want is "more or less" present as Vars in some JavaScript Script in the Source, which is reused to dynamically build the HTML Form. At least I had the feeling I recognized some Fields/Values, as I only had a few seconds to see what your Page looked like the first time I tried to have a look at it, before it got automatically redirected. But pff..., it would be a bit cumbersome to extract it this way...
Last edited by chivracq on Tue Oct 11, 2016 9:11 am, edited 1 time in total.

Re: Extracting text with no html TYPE

by christinapitt76 on Tue Oct 11, 2016 7:46 am

Sorry about that. I did a "Save As" from Firefox, my main browser. This new attachment is a "Save As, HTML only" from Chrome. Hopefully it has what you need. And really, really, really thank you for even responding!! This is much better than my other experience with a paid online software product (SobiPro, if you are wondering).
Attachments
down-from-chrome.zip
requested save as html only, this time from chrome, not firefox
(76.83 KiB)

Re: Extracting text with no html TYPE

by iimfun on Tue Oct 18, 2016 1:37 am

I don't know why nobody could help you so far, so try the following code of mine:
Code:
TAG POS=1 TYPE=H1 ATTR=CLASS:brand-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title EXTRACT=TXT
TAG POS=1 TYPE=H2 ATTR=CLASS:item-title<SP>item-size EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:price EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=CLASS:compare-price EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:savings-percentage EXTRACT=txt
TAG POS=1 TYPE=DIV ATTR=CLASS:final-sale EXTRACT=TXT
TAG POS=1 TYPE=STRONG ATTR=TXT:Description EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:* EXTRACT=TXT

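' Save everything extracted so far, then clear !EXTRACT for the next fields
SET tmpEXTRACT {{!EXTRACT}}
SET !EXTRACT NULL

' Anchor on the Description heading; the POS=R1 tags below search forward from the previously tagged element
TAG POS=1 TYPE=STRONG ATTR=TXT:Description
' Extract the whole DIV whose text starts with the label, then strip the label in EVAL
SET eMes Measurements
TAG POS=R1 TYPE=DIV ATTR=TXT:{{eMes}}* EXTRACT=TXT
SET eMes EVAL("'{{!EXTRACT}}'.replace('{{eMes}}', '').trim();")
SET !EXTRACT NULL

SET eMat Materials
TAG POS=R1 TYPE=DIV ATTR=TXT:{{eMat}}* EXTRACT=TXT
SET eMat EVAL("'{{!EXTRACT}}'.replace('{{eMat}}', '').trim();")
SET !EXTRACT NULL

SET eCon Condition
TAG POS=R1 TYPE=DIV ATTR=TXT:{{eCon}}* EXTRACT=TXT
SET eCon EVAL("'{{!EXTRACT}}'.replace('{{eCon}}', '').trim();")

' Rebuild !EXTRACT: the earlier fields followed by the three cleaned values
SET !EXTRACT {{tmpEXTRACT}}[EXTRACT]{{eMes}}[EXTRACT]{{eMat}}[EXTRACT]{{eCon}}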
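
Editor's note: the three EVAL lines in the macro above all do the same thing, so it may help to see that step outside of iMacros. This is a minimal sketch in plain JavaScript, assuming the DIV's extracted text begins with the label followed by the value; the exact whitespace iMacros returns between them is an assumption here. The helper names `stripLabel` and `joinExtract` are hypothetical and only mirror the `.replace(label, '').trim()` expression and the `[EXTRACT]` separator used in the macro.

```javascript
// Sketch only: mirrors the EVAL expression from the macro in plain JavaScript.
// stripLabel and joinExtract are hypothetical helper names.

// Remove the first occurrence of the label and trim surrounding whitespace,
// like '...'.replace('Measurements', '').trim() in the macro.
function stripLabel(divText, label) {
  return divText.replace(label, "").trim();
}

// Join cleaned values with the [EXTRACT] separator, as the last macro line does.
function joinExtract(values) {
  return values.join("[EXTRACT]");
}

// Assumed shape of the extracted DIV texts (label, then the value).
const measurements = stripLabel('Measurements\n44" Chest, 25" Length', "Measurements");
const materials = stripLabel("Materials 100% Cotton", "Materials");

console.log(measurements);
console.log(joinExtract([measurements, materials]));
```

Note that `replace` with a plain string only removes the first occurrence, so a value that happens to repeat the label word later in the text is left intact.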
iimfun
 
Posts: 201
Joined: Tue Jul 19, 2016 6:06 am

Re: Extracting text with no html TYPE

by christinapitt76 on Tue Oct 18, 2016 11:46 am

Thanks! That works perfectly!!