Extract portion of data


by azbob on Sun Aug 14, 2016 4:36 pm

Configuration: Surface Pro 2, Win 10, Firefox 47.0.1, iMacros Standard Edition (x86) Version 11.0.246.4051

Hi,
I am trying to extract just the data highlighted in the CMA example. I have tried both normal POS and relative POS. Both work, but the problem is that the macro "extracts" all three categories and saves them as one set of data, e.g. 121119013434. I have tried many iterations of the extract statement to drill down to the specific data (1901), but to no avail. For what it is worth, I have attached the web page's HTML. Is there any way to accomplish this?
Thanks
Attachments
Web page HTML.jpg
Web page HTML
CMA example.jpg
Web page with data to extract
azbob
 
Posts: 76
Joined: Mon Sep 21, 2009 11:16 am

Re: Extract portion of data

by chivracq on Sun Aug 14, 2016 7:16 pm

Yep, use 'EXTRACT=HTM' on your Cells: the extracted HTML will still contain the '<BR>' Tags, and you can then use 'EVAL()' + 'split()' on "<BR>" to isolate the Data that you want to keep.
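
A minimal sketch of the 'split()' idea in plain JavaScript (the language that 'EVAL()' runs). The Cell Content is an assumption built from the "121119013434" Example, read as 1211 / 1901 / 3434; the helper name 'splitCell' is made up for illustration and the real HTML of the Page may look different.

```javascript
// Hypothetical cell content as EXTRACT=HTM might return it (assumed, not from the real page).
const cellHtml = '<td class="right overall detail">1211<br>1901<br>3434</td>';

// Strip the surrounding <td> tags, then split on <br>
// (case-insensitive, also accepting <br/> and <br />).
function splitCell(html) {
  const inner = html.replace(/^<td[^>]*>/i, '').replace(/<\/td>$/i, '');
  return inner.split(/<br\s*\/?>/i).map(function (part) {
    return part.trim();
  });
}

const parts = splitCell(cellHtml);
console.log(parts[1]); // the middle of the three values
```

Inside a Macro the same Logic has to be written as one 'EVAL("...")' String, with the extracted Value injected through '{{!EXTRACT}}'.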

Post the URL or upload an HTML Saveas of the Page (zipped, Max 256Kb) if you can't work it out by yourself...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6479
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Extract portion of data

by azbob on Thu Aug 18, 2016 9:11 pm

Hi,
I followed your suggestions and it works.
BTW, is there a more elegant way to append data to the extract before creating the csv than what I did in lines 14-18?
Code:
VERSION BUILD=11.0.246.4051
SET !ERRORIGNORE YES
TAB T=1
TAB CLOSEALLOTHERS
SET !DATASOURCE C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
'SET !DATASOURCE_COLUMNS 24
'Average GLS SF and Average $/sf
TAG POS=1 TYPE=TH ATTR=TXT:Approx<SP>SQFT
TAG POS=R7 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
SET !EXTRACT EVAL("var a='{{!EXTRACT}}';var pos1=a.indexOf('<br>'); var pos2=a.lastIndexOf('<br>'); a=a.slice(pos1,pos2);a;")
TAG POS=1 TYPE=TH ATTR=TXT:Sold<SP>Price<SP>Per<SP>Approx<SP>SQFT
TAG POS=R9 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
'Slice out avg SF from extract
SET !VAR1 EVAL("var a='{{!EXTRACT}}';var pos2=a.indexOf(']');a=a.slice(0,pos2+1);a;")
'Slice out AVG $per SF from extract
SET !VAR2 EVAL("var a='{{!EXTRACT}}';var pos1=a.indexOf('<br>'); var pos2=a.lastIndexOf('<br>'); a=a.slice(pos1,pos2);a;")
'Create new Extract
SET !EXTRACT EVAL("var a='{{!VAR1}}'.concat('{{!VAR2}}');a;")
'Total Comp sales
TAG POS=1 TYPE=TD ATTR=TXT:Total<SP>#<SP>of<SP>Comparable<SP>Sales<SP>(Settled)
TAG POS=2 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
FILEDELETE NAME=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata FILE=Freddiedata.csv


Thanks for your help.

Re: Extract portion of data

by chivracq on Fri Aug 19, 2016 12:06 am

OK, I see, and hum..., using 'indexOf()' + 'slice()' is indeed another Solution, even if I find it a bit cumbersome compared to 'split()'. Your way of rebuilding the whole Content of the Extract, again with 'indexOf()' + 'slice()' and then 'concat()', is funny and fairly cumbersome as well. It is simpler to work on the raw '!EXTRACT': store each Value in some Var, reset '!EXTRACT' to 'NULL' between 2 Extracts, and put the Values together again using 'ADD'.
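
To compare the two Approaches side by side, here is a small sketch in plain JavaScript. The Input String is an assumption (three Values separated by '<br>'), and the function names are made up for illustration.

```javascript
// Assumed inner content of one cell; the real markup may differ.
const cell = '1211<br>1901<br>3434';
const tag = '<br>';

// indexOf() + slice(): two index lookups, and the start index
// has to skip past the first '<br>' itself.
function middleBySlice(s) {
  const start = s.indexOf(tag) + tag.length;
  const end = s.lastIndexOf(tag);
  return s.slice(start, end);
}

// split(): one call, then pick the wanted part by its index.
function middleBySplit(s) {
  return s.split(tag)[1];
}

// Slicing from indexOf() directly keeps the leading '<br>' on this assumed input.
const naive = cell.slice(cell.indexOf(tag), cell.lastIndexOf(tag));

console.log(middleBySlice(cell), middleBySplit(cell), naive);
```

With 'split()' the other two Values stay available as well (Index 0 and Index 2), which is handy if they are needed later.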

You didn't provide the URL or upload an HTML Saveas of the Page like I had suggested, but I can already guess a bit what the HTML Structure of the Page looks like from your Script, and I would come up with something like this:
Code:
VERSION BUILD=11.0.246.4051
SET !ERRORIGNORE YES
TAB T=1
'TAB CLOSEALLOTHERS
SET !EXTRACT_TEST_POPUP NO

SET !DATASOURCE C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
'SET !DATASOURCE_COLUMNS 24

'Average Approx SQFT:
TAG POS=1 TYPE=TD ATTR=TXT:*Overall*
TAG POS=R1 TYPE=TD ATTR=TXT:*Low*Avg*High*
SET !EXTRACT NULL
TAG POS=R4 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
SET SQFT_Avg EVAL("var s='{{!EXTRACT}}'; var x=s.split('<br>'); x[1];")

'Average Sold Price per Approx SQFT:
TAG POS=1 TYPE=TD ATTR=TXT:*Overall*
TAG POS=R1 TYPE=TD ATTR=TXT:*Low*Avg*High*
SET !EXTRACT NULL
TAG POS=R6 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
SET SQFT_Price_Avg EVAL("var s='{{!EXTRACT}}'; var x=s.split('<br>'); x[1];")

'Create new Extract:
SET !EXTRACT {{SQFT_Avg}}
ADD !EXTRACT {{SQFT_Price_Avg}}
PROMPT SQFT_Avg:<SP>_{{SQFT_Avg}}_<BR>SQFT_Price_Avg:<SP>_{{SQFT_Price_Avg}}_<BR><BR>TEMP_EXTRACT:<BR>_{{!EXTRACT}}_

'Total Comp sales:
TAG POS=1 TYPE=TD ATTR=TXT:Total<SP>#<SP>of<SP>Comparable<SP>Sales<SP>(Settled)
TAG POS=2 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT

FILEDELETE NAME=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata FILE=Freddiedata.csv
(Not tested obviously...!)
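
To show why the 'SET !EXTRACT NULL' Lines matter in the Macro above, here is a toy Model of the '!EXTRACT' Buffer in plain JavaScript. It is an Illustration only, not iMacros Code: the class 'ExtractBuffer' is made up, the second Cell uses invented Numbers, and it assumes that successive Extractions are joined with the '[EXTRACT]' Separator.

```javascript
// Toy model of the !EXTRACT buffer (illustration only, not part of iMacros).
// Assumption: each new extraction is appended after an '[EXTRACT]' separator.
class ExtractBuffer {
  constructor() {
    this.value = null;
  }
  // Models 'TAG ... EXTRACT=...' and 'ADD !EXTRACT ...': append to what is there.
  add(text) {
    this.value = this.value === null ? text : this.value + '[EXTRACT]' + text;
  }
  // Models 'SET !EXTRACT NULL'.
  reset() {
    this.value = null;
  }
  // Models 'SET !EXTRACT <value>': replace the whole content.
  set(text) {
    this.value = text;
  }
}

const buf = new ExtractBuffer();
buf.add('1211<br>1901<br>3434');              // first cell (assumed content)
const sqftAvg = buf.value.split('<br>')[1];
buf.reset();                                   // otherwise the next cell gets appended
buf.add('101<br>150<br>199');                 // second cell (made-up numbers)
const priceAvg = buf.value.split('<br>')[1];
buf.set(sqftAvg);                              // rebuild the extract from the two vars
buf.add(priceAvg);
console.log(buf.value);
```

Without the 'reset()' Call, the second 'split()' would run on both Cells at once and Index 1 would no longer be the wanted Value.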

Re: Extract portion of data

by azbob on Fri Aug 19, 2016 11:14 am

Interesting... My ignorance of setting !EXTRACT to NULL forced me to massage the data with that convoluted code...


Made some changes and here is what works:
VERSION BUILD=11.0.246.4051
SET !ERRORIGNORE YES
TAB T=1
TAB CLOSEALLOTHERS
SET !DATASOURCE C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
'SET !DATASOURCE_COLUMNS 24
'Average GLS SF and Average $/sf
TAG POS=1 TYPE=TH ATTR=TXT:Approx<SP>SQFT
SET !EXTRACT NULL
TAG POS=R7 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
'Slice out avg SF from extract
SET !VAR1 EVAL("var x='{{!EXTRACT}}'; var x=x.split('<br>'); x[1];")
TAG POS=1 TYPE=TH ATTR=TXT:Sold<SP>Price<SP>Per<SP>Approx<SP>SQFT
SET !EXTRACT NULL
TAG POS=R9 TYPE=TD ATTR=CLASS:right<SP>overall<SP>detail EXTRACT=HTM
'Slice out AVG $ per SF from extract
SET !VAR2 EVAL("var x='{{!EXTRACT}}'; var x=x.split('<br>'); x[1];")
'Create new Extract
SET !EXTRACT {{!VAR1}}
ADD !EXTRACT {{!VAR2}}
'Total Comp sales
TAG POS=1 TYPE=TD ATTR=TXT:Total<SP>#<SP>of<SP>Comparable<SP>Sales<SP>(Settled)
TAG POS=2 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:detail<SP>right EXTRACT=TXT
FILEDELETE NAME=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata\Freddiedata.csv
SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Public\Documents\iMacros\datasources\FreddieCMAdata FILE=Freddiedata.csv

I knew there was a more straightforward way of handling this; I just couldn't find it.
Thanks for your help.

Re: Extract portion of data

by chivracq on Fri Aug 19, 2016 11:49 am

Ah-ah...!, I had mentioned the 'split()' in my first Reply, and for manipulating '!EXTRACT', it comes automatically with a bit of Practice + some Wiki & Forum reading...

I guess my 'R4' + 'R6' were not finding the correct Cells. I find anchoring on the 'Overall' Row more reliable than your 'R7' + 'R9' Relative to some Table Header, as those won't work anymore if any Row gets added to the Table... But I could only guess the 'Rn' Values without being able to play with the Page myself...
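
A rough sketch of that Point in plain JavaScript: relative Positioning is modelled here as "the n-th matching Cell after an Anchor". The Table Layout, the Cell Texts and the function names are all made up for illustration; this is a simplified Model, not how iMacros is implemented.

```javascript
// Toy model of POS=Rn: return the n-th cell with the given class after the anchor.
function relative(cells, anchorText, n, cls) {
  const start = cells.findIndex(function (c) {
    return c.text === anchorText;
  });
  const matches = cells.slice(start + 1).filter(function (c) {
    return c.cls === cls;
  });
  return matches[n - 1];
}

function cell(text, cls) {
  return { text: text, cls: cls || '' };
}

// Made-up table, flattened in document order.
const table = [
  cell('Approx SQFT'),                 // table header (far anchor)
  cell('Sub A'), cell('900', 'detail'),
  cell('Overall'),                     // row label (near anchor)
  cell('1901', 'detail'),
];

// Same table with one extra row inserted above 'Overall'.
const grown = table
  .slice(0, 3)
  .concat([cell('Sub B'), cell('950', 'detail')], table.slice(3));

console.log(relative(table, 'Approx SQFT', 2, 'detail').text); // wanted cell, counted from the header
console.log(relative(grown, 'Approx SQFT', 2, 'detail').text); // same offset now lands on another cell
console.log(relative(grown, 'Overall', 1, 'detail').text);     // near anchor still reaches the wanted cell
```

The closer the Anchor sits to the wanted Cell, the fewer Changes to the Page can shift the 'Rn' Offset.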

Re: Extract portion of data

by azbob on Fri Aug 19, 2016 11:55 am

Good advice on the table header reference; I will incorporate it in the future.
Thanks again.

Re: Extract portion of data

by azbob on Fri Aug 19, 2016 12:05 pm

Oh I missed the R4, R6 bit.
I tried it and it worked!
Thanks again for the tip.