Extract portion of data

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
azbob
Posts: 85
Joined: Mon Sep 21, 2009 6:16 pm

Extract portion of data

Post by azbob » Sun Aug 14, 2016 11:36 pm

Configuration: Surface Pro 2, Win 10, Firefox 47.0.1, iMacros Standard Edition (x86) Version 11.0.246.4051

Hi,
I am trying to extract just the data highlighted in the CMA example. I have tried both normal POS and relative positioning. Both work, but the problem is that each "extracts" all three categories and saves them as one set of data, i.e. 121119013434. I have tried many iterations of the extract statement to drill down to the specific data (1901), but to no avail. For what it is worth, I have attached the web page's HTML. Is there any way to accomplish this?
Thanks
Attachments
Web page HTML
Web page with data to extract
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract portion of data

Post by chivracq » Mon Aug 15, 2016 2:16 am

Yep, use 'EXTRACT=HTM' on your Cells: the extracted HTML will then contain the '<BR>' Tags, and you can use 'EVAL()' + 'split()' on "<BR>" to isolate the Data that you want to keep.

Post the URL or upload an HTML Saveas of the Page (zipped, Max 256Kb) if you can't work it out by yourself...
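
To illustrate the 'EXTRACT=HTM' + 'split()' suggestion above, here is a minimal JavaScript sketch of the kind of expression that would go inside 'EVAL()'. The cell markup is an assumption, guessed from the "121119013434" example (three values separated by '<br>' tags); the real page is not shown here, and the helper name brPart is purely illustrative.

```javascript
// Hypothetical cell content, as EXTRACT=HTM might return it for a
// Low/Avg/High cell. The real markup was not posted, so this is an assumption.
var cellHtml = '<td class="right overall detail">1211<br>1901<br>3434</td>';

// Return the n-th (0-based) <br>-separated value of a cell, with tags stripped.
// brPart is an illustrative helper, not an iMacros function.
function brPart(html, n) {
  var parts = html.split(/<br\s*\/?>/i);           // tolerate <BR>, <br/>, <br />
  if (n < 0 || n >= parts.length) return '';
  return parts[n].replace(/<[^>]*>/g, '').trim();  // drop leftover tags such as <td ...>
}

console.log(brPart(cellHtml, 1)); // the middle (Avg) value
```

Inside a macro the same idea shrinks to a one-liner in the 'EVAL()' string: split on '<br>' and keep element [1].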
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...

Re: Extract portion of data

Post by azbob » Fri Aug 19, 2016 4:11 am

Hi,
Followed your suggestions and it works.
BTW, is there a more elegant way to append data to the extract file before creating the csv than what I did in lines 14-18?

Code: Select all

VERSION BUILD=11.0.246.4051
SET !ERRORIGNORE YES
TAB T=1
TAB CLOSEALLOTHERS
SET !DATASOURCE C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
'SET !DATASOURCE_COLUMNS 24
'Average GLS SF and Average $/sf
TAG POS=1 TYPE=TH ATTR=TXT:Approx<SP>SQFT
TAG POS=R7 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
SET !EXTRACT EVAL("var a='{{!EXTRACT}}';var pos1=a.indexOf('<br>'); var pos2=a.lastIndexOf('<br>'); a=a.slice(pos1,pos2);a;")
TAG POS=1 TYPE=TH ATTR=TXT:Sold<SP>Price<SP>Per<SP>Approx<SP>SQFT 
TAG POS=R9 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
'Slice out avg SF from extract
SET !VAR1 EVAL("var a='{{!EXTRACT}}';var pos2=a.indexOf(']');a=a.slice(0,pos2+1);a;")
'Slice out AVG $per SF from extract
SET !VAR2 EVAL("var a='{{!EXTRACT}}';var pos1=a.indexOf('<br>'); var pos2=a.lastIndexOf('<br>'); a=a.slice(pos1,pos2);a;")
'Create new Extract
SET !EXTRACT EVAL("var a='{{!VAR1}}'.concat('{{!VAR2}}');a;")
'Total Comp sales
TAG POS=1 TYPE=TD ATTR=TXT:Total<SP>#<SP>of<SP>Comparable<SP>Sales<SP>(Settled) 
TAG POS=2 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
FILEDELETE NAME=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata FILE=Freddiedata.csv
Thanks for your help.

Re: Extract portion of data

Post by chivracq » Fri Aug 19, 2016 7:06 am

OK, I see, and hum..., using 'indexOf()' + 'slice()' is indeed another Solution, even if I find it a bit cumbersome compared to 'split()'. Your way of rebuilding the whole Content of the Extract with again 'indexOf()' + 'slice()' + 'concat()' is funny but fairly cumbersome as well. It is simpler to work on the raw '!EXTRACT': store each Value in its own Var, reset '!EXTRACT' to 'NULL' between 2 Extracts, and put the Values together again at the end using 'ADD'.
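
For comparison, here is a small JavaScript sketch of the two approaches side by side. It assumes the extracted value looks like '1211<br>1901<br>3434' (a guess based on the "121119013434" example, not the real page). Under that assumption, slicing from 'indexOf()' to 'lastIndexOf()' keeps the leading '<br>', while 'split()' returns the clean middle value in one step.

```javascript
// Hypothetical extract value (assumed shape; the real page was not posted).
var a = '1211<br>1901<br>3434';

// indexOf() + slice(), as in the first macro: the slice starts AT the first <br>,
// so the tag itself ends up in the result.
var pos1 = a.indexOf('<br>');
var pos2 = a.lastIndexOf('<br>');
var viaSlice = a.slice(pos1, pos2);

// Same idea, but skipping past the 4 characters of '<br>'.
var viaSliceFixed = a.slice(pos1 + '<br>'.length, pos2);

// split(): one call, then pick the element by index.
var viaSplit = a.split('<br>')[1];

console.log(viaSlice, viaSliceFixed, viaSplit);
```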

You didn't provide the URL or upload an HTML Saveas of the Page like I had suggested, but I can already guess a bit what the HTML Structure of the Page looks like from your Script, and I would come up with something like this:

Code: Select all

VERSION BUILD=11.0.246.4051
SET !ERRORIGNORE YES
TAB T=1
'TAB CLOSEALLOTHERS
SET !EXTRACT_TEST_POPUP NO

SET !DATASOURCE C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
'SET !DATASOURCE_COLUMNS 24

'Average Approx SQFT:
TAG POS=1 TYPE=TD ATTR=TXT:*Overall*
TAG POS=R1 TYPE=TD ATTR=TXT:*Low*Avg*High*
SET !EXTRACT NULL
TAG POS=R4 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
SET SQFT_Avg EVAL("var s='{{!EXTRACT}}'; var x=s.split('<br>'); x[1];")

'Average Sold Price per Approx SQFT:
TAG POS=1 TYPE=TD ATTR=TXT:*Overall*
TAG POS=R1 TYPE=TD ATTR=TXT:*Low*Avg*High*
SET !EXTRACT NULL
TAG POS=R6 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
SET SQFT_Price_Avg EVAL("var s='{{!EXTRACT}}'; var x=s.split('<br>'); x[1];")

'Create new Extract:
SET !EXTRACT {{SQFT_Avg}}
ADD !EXTRACT {{SQFT_Price_Avg}}
PROMPT SQFT_Avg:<SP>_{{SQFT_Avg}}_<BR>SQFT_Price_Avg:<SP>_{{SQFT_Price_Avg}}_<BR><BR>TEMP_EXTRACT:<BR>_{{!EXTRACT}}_

'Total Comp sales:
TAG POS=1 TYPE=TD ATTR=TXT:Total<SP>#<SP>of<SP>Comparable<SP>Sales<SP>(Settled) 
TAG POS=2 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT

FILEDELETE NAME=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata FILE=Freddiedata.csv
(Not tested obviously...!)
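
The reset-then-'ADD' pattern used in the macro above can be mimicked in plain JavaScript to see why the reset matters. This is only a toy model, not iMacros itself: it assumes that successive extractions are joined with the '[EXTRACT]' delimiter, and the second cell's values (100/150/200) are made up for illustration.

```javascript
// Toy model of the !EXTRACT accumulator. Assumption: successive extractions
// are joined with '[EXTRACT]'. This is a simulation, not iMacros itself.
var DELIM = '[EXTRACT]';

function makeExtract() {
  var value = null;
  return {
    // Like an EXTRACT=... or ADD !EXTRACT: append to whatever is already there.
    add: function (v) { value = (value === null) ? String(v) : value + DELIM + v; },
    // Like SET !EXTRACT NULL: forget everything extracted so far.
    reset: function () { value = null; },
    get: function () { return value === null ? '' : value; }
  };
}

var ex = makeExtract();

// Without a reset, the second cell piles onto the first.
ex.add('1211<br>1901<br>3434');
ex.add('100<br>150<br>200');   // made-up values
var piledUp = ex.get();

// With a reset before each extraction, each cell can be split on its own.
ex.reset();
ex.add('1211<br>1901<br>3434');
var sqftAvg = ex.get().split('<br>')[1];
ex.reset();
ex.add('100<br>150<br>200');
var priceAvg = ex.get().split('<br>')[1];

// Rebuild the final extract from the two stored values.
ex.reset();
ex.add(sqftAvg);
ex.add(priceAvg);
console.log(ex.get());
```

In this model, skipping the reset leaves both cells in one string, which is what made the 'indexOf()' + 'slice()' + 'concat()' rebuild necessary in the first place.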

Re: Extract portion of data

Post by azbob » Fri Aug 19, 2016 6:14 pm

Interesting... My ignorance of setting the extract to NULL forced me to massage the data with that convoluted code...

Made some changes and here is what works:

Code: Select all

VERSION BUILD=11.0.246.4051
SET !ERRORIGNORE YES
TAB T=1
TAB CLOSEALLOTHERS
SET !DATASOURCE C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
'SET !DATASOURCE_COLUMNS 24
'Average GLS SF and Average $/sf
TAG POS=1 TYPE=TH ATTR=TXT:Approx<SP>SQFT
SET !EXTRACT NULL
TAG POS=R7 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
'Slice out avg SF from extract
SET !VAR1 EVAL("var x='{{!EXTRACT}}'; var x=x.split('<br>'); x[1];")
TAG POS=1 TYPE=TH ATTR=TXT:Sold<SP>Price<SP>Per<SP>Approx<SP>SQFT
SET !EXTRACT NULL
TAG POS=R9 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
'Slice out AVG $ per SF from extract
SET !VAR2 EVAL("var x='{{!EXTRACT}}'; var x=x.split('<br>'); x[1];")
'Create new Extract
SET !EXTRACT {{!VAR1}}
ADD !EXTRACT {{!VAR2}}
'Total Comp sales
TAG POS=1 TYPE=TD ATTR=TXT:Total<SP>#<SP>of<SP>Comparable<SP>Sales<SP>(Settled)
TAG POS=2 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
FILEDELETE NAME=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata FILE=Freddiedata.csv

I knew there was a more straightforward way of handling it, I just couldn't find it.
Thanks for your help.

Re: Extract portion of data

Post by chivracq » Fri Aug 19, 2016 6:49 pm

Ah-ah...!, I had mentioned the 'split()' in my first Reply, and for manipulating '!EXTRACT', it comes automatically with a bit of Practice + some Wiki & Forum reading...

I guess my 'R4' + 'R6' were not finding the correct Cells. I find anchoring on the '*Overall*' + '*Low*Avg*High*' Cells more reliable than your 'R7' + 'R9' Relative to some Table Header, as yours won't work anymore if any Row gets added to the Table..., but I could only guess the 'Rn' Values without being able to play with the Page myself...

Re: Extract portion of data

Post by azbob » Fri Aug 19, 2016 6:55 pm

Good advice on the table header reference; I will incorporate it in the future.
Thanks again.

Re: Extract portion of data

Post by azbob » Fri Aug 19, 2016 7:05 pm

Oh I missed the R4, R6 bit.
I tried it and it worked!
Thanks again for the tip.