exclude extracted data from SAVEAS based on criteria ?

by peppe1 on Sat Mar 05, 2016 5:59 am

OS: Windows 8.1 64 bit
iMacros 10.0.2

Good day,

My datasource is a .csv that contains the terms I am using to search in a website. The results of searches are individual pages, depending upon the search term.

It happens that for some of the search terms there exist no individual pages and the website where I am searching shows a "0 items exist" message.

iMacros still extracts data according to its TAG positioning and the DOM structure of the "0 items exist" page, or returns #EANF# when it does not find a tag match on the page.
Then iMacros saves the extracted data as a row in the extraction .csv.


I wish to exclude from the SAVEAS TYPE=EXTRACT .csv the rows that belong to the "0 items exist" webpage (in other words, I want my extracted CSV to show only the complete data extracted from the website).

Question:
How do I prevent the rows from the "0 items exist" pages from appearing in the EXTRACT .csv?
What should the condition look like?

Thank you.
Last edited by peppe1 on Sun Mar 06, 2016 12:57 am, edited 1 time in total.

Re: exclude extracted data from SAVEAS based on criteria ?

by chivracq on Sat Mar 05, 2016 4:43 pm

OK, FCI now mentioned, previous Thread finished neatly, I can answer this one...!
Even if, like I mentioned at the end of your previous Thread, the Thread Title for the current Thread is probably a bit too long: it will probably get truncated in the Replies and won't allow you in the end, if it (hopefully...!) gets solved, to add some '[Solved]' to your Original Thread Title.
Mini Problem in this one is that you've spelled the 'SAVEAS' Command incorrectly... :roll: It is one Word...! Meaning that some other User searching the Forum with a similar Qt using the Keyword 'SAVEAS' won't find your Thread/Post because of the Typo. So it would be nice if you corrected that...! (Because the only other place where you use 'SAVEAS' (correctly this time!) is in your mini-Script, but the Forum Search Engine strangely doesn't look into Text contained between '[CODE]' Tags... :shock:)

>>>

OK, concerning your Question, what you are looking for is a "Conditional SAVEAS". Search my Posts on those Terms: I think I have already posted a Solution (or at least given some Explanation for a Solution) in one or maybe more Thread(s). That was quite a while ago, so I don't remember exactly...

I'll let you search the Forum, see how far you get and post your Results (together with (a) Link(s) to the Thread(s) you used as a Ref), and rephrase/summarize what you understand on how to achieve it. I'll then guide you further if you get stuck... I use this "Conditional SAVEAS" myself in one of my Macros, but I think I have already posted it on the Forum...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: exclude extracted data from SAVEAS based on criteria ?

by peppe1 on Sun Mar 06, 2016 1:19 am

hi,

the closest match I could find is here: viewtopic.php?f=7&t=22650&p=57515&hilit=Conditional+SAVEAS#p57515

Basically, it is about setting a variable from the extraction using an EVAL() with a conditional if {} statement nested inside of it.

Code: Select all
VERSION BUILD=8601111 RECORDER=FX
TAB T=1
SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP YES

URL GOTO=http://shop2.gzanders.com/mb-930-jm-pro-series-10-shot-24-vr-matte-black-synthetic.html
TAG POS=7 TYPE=SPAN FORM=ACTION:http://shop2.gzanders.com/checkout/cart/add/uenc/aHR0cDovL3Nob3AyLmd6YW5kZXJzLm* ATTR=* EXTRACT=TXT

SET !VAR1 EVAL("if (\"{{!EXTRACT}}\" == \"Out of stock\") {var s = \"\";} else {var s = \"C:\\\Windows\\\Media\\\chimes.wav\";} s;")
'PROMPT !EXTRACT:<SP>{{!EXTRACT}}<br>!VAR1:<SP>{{!VAR1}}

URL GOTO=file://{{!VAR1}}

URL GOTO=http://shop2.gzanders.com/mb-930-jm-pro-series-9-shot-22-vr-matte-black-synth.html
TAG POS=7 TYPE=SPAN FORM=ACTION:http://shop2.gzanders.com/checkout/cart/add/uenc/aHR0cDovL3Nob3AyLmd6YW5kZXJzLm* ATTR=* EXTRACT=TXT

SET !VAR1 EVAL("if (\"{{!EXTRACT}}\" == \"Out of stock\") {var s = \"\";} else {var s = \"C:\\\Windows\\\Media\\\chimes.wav\";} s;")
'PROMPT !EXTRACT:<SP>{{!EXTRACT}}<br>!VAR1:<SP>{{!VAR1}}

URL GOTO=file://{{!VAR1}}



My problems arise from multiple angles. To pinpoint:

1. I am extracting multiple elements from one page during a single run of the loop over the DATASOURCE.
2. (Good thing) Only one element (the H1) is relevant for checking the condition before the SAVEAS.
3. If that condition evaluates to true (I mean the text is "0 products exist"), I need to stop extraction from that page, jump to the next string in the DATASOURCE csv and perform a new search.

Dear Master wizard, help me.

Later edit:
I found a closer example in the Eval.iim
Code: Select all
SET !VAR2 EVAL("var s=\"{{!EXTRACT}}\", d = parseFloat(s); if(d > 99 && d < 101) d; else MacroError(\"Value is not in the set range\")")

How can I switch MacroError for some GOTO to a script line number, or something like that?

Thank you very much.

Re: exclude extracted data from SAVEAS based on criteria ?

by chivracq on Mon Mar 07, 2016 5:40 am

OK..., the Thread you found was/is for a Conditional Sound, but the Principle is more or less the same as what you'll need, at least to make your Script in pure '.iim' without using a '.js' Script (a '.js' Script being anyway the "Standard" way if you want to include some Conditional Behaviour in your Script).

I managed to locate another Thread which is a bit closer, from 2 years ago as well, but I didn't give many Details:
chivracq wrote:
sleepysentry wrote:...
5. If <div class="messageText">No matches found</div> appears on the page, add TICKET_NUMBER to a list of failed searches
...

...
5- Use 'SAVEAS' and 'EVAL' for a Conditional Save to a local .TXT File based on 'TAG' + 'EXTRACT' or use a .js Script for the Conditional part.
...

In a Nutshell, the Principle for a Conditional Sound is to use 'EVAL()' to compute a String that will be the Path to a Sound File (or an empty String for no Sound), and that String is then reused and played with 'URL GOTO'.
For a Conditional 'SAVEAS', the Principle is the same: the String that you compute with 'EVAL()' will be the File Name (with Extension, "Products.csv" for example) for the Records you want to keep, or an Empty String or a different Temp File Name for the 'Product not found' Pages (a Temp File that you'll be able to delete from time to time). That Computed String is then used for the 'FILE' Parameter of the 'SAVEAS'.

peppe1 wrote:
3. If that condition evaluates to true (I mean the text is "0 products exist"), I need to stop extraction from that page, jump to the next string in the DATASOURCE csv and perform a new search.

Hum, there are 3 ways to get to the Result that you want...:
1- Like you mention, "stop" the Extraction using 'EVAL()' + 'MacroError()' like you found, but you'll then need a main '.js' Script to handle the Looping.
2- Keep doing the "Fake" Extraction even if there is nothing to extract, and the Conditional SAVEAS with my Method will save or not save the EXTRACT, or save it to 2 different Files. If you are going to get several/many "#EANF#", you'll need to use a short '!TIMEOUT_STEP' (0 or 1) to speed up the fake Extractions, together with '!ERRORIGNORE'.
3- Once you've identified the first '#EANF#' or "0 products exist", it is possible to use 'EVAL()' to compute a '1' or a '0' to use for 'TAG POS=n' in all the Extracts you do after that: 'TAG POS=0' won't do/extract anything...

I would go (and I already use it in one of my Macros...!) for Method 2, but that's also because I never use '.js' Scripts...
I actually don't remember if I use an Empty String or a String for a Temp File (to delete from time to time) for the Records I don't want to keep. I haven't checked that folder for months, I might have some huge Temp File to delete..., ah-ah...!, oops...!
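
A minimal, untested sketch of what Method 2 (the Conditional 'SAVEAS') could look like in pure '.iim'. Everything specific in it is an assumption and not taken from the real Macro: the Datasource name "terms.csv", the search URL, the second 'TAG' line, the File Names "Products.csv" / "NotFound.csv", and the exact "0 items exist" Text in the H1.

```
' Sketch only: URL, file names and the 2nd TAG line are hypothetical placeholders.
SET !ERRORIGNORE YES
SET !TIMEOUT_STEP 1
SET !DATASOURCE terms.csv
SET !DATASOURCE_LINE {{!LOOP}}
' Hypothetical search URL built from the first Datasource column
URL GOTO=http://www.example.com/search?q={{!COL1}}
' 1) Extract only the H1 first and decide from it where this row will be saved
TAG POS=1 TYPE=H1 ATTR=TXT:* EXTRACT=TXT
SET !VAR1 EVAL("var s = \"{{!EXTRACT}}\"; var f; if (s == \"#EANF#\" || s.indexOf(\"0 items exist\") != -1) {f = \"NotFound.csv\";} else {f = \"Products.csv\";} f;")
' 2) Continue with the other extractions (placeholder selector)
TAG POS=1 TYPE=SPAN ATTR=CLASS:price EXTRACT=TXT
' 3) The computed name is used for the FILE parameter
SAVEAS TYPE=EXTRACT FOLDER=* FILE={{!VAR1}}
```

Played in Loop mode, each Loop reads one line from the Datasource. The "not found" Rows end up in "NotFound.csv", which can be deleted from time to time. The 'EVAL()' has to run right after the first 'TAG', while '!EXTRACT' still holds only the H1 Text. If that Text can contain double quotes or line breaks, it would need escaping before being pasted into the 'EVAL()' String.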

Re: exclude extracted data from SAVEAS based on criteria ?

by peppe1 on Tue Mar 08, 2016 6:21 am

Thank you very much.

I just noticed that there exists a third case too, when the search shows multiple possible results. I have been thinking about it, and I do not have any encompassing condition to test on the H1 to catch this third case, other than maybe counting the number of characters (but this is limited).

I would need to dig deep into JavaScript and some advanced Regex-fu, for which I do not have much time available. It'd be good, but I gotta make it work.

I will extract the complete data exactly as it comes out of iMacros and handle it via VBA/Excel afterwards. It gets simpler this way: a macro for the .csv import, filtering on columns, copy/paste to additional worksheets, eventually adding some language descriptive headings.

Once more, thank you for all the valuable hints.

I will post the VBA solution here, in case anyone is interested.

All the best,
George

