szechuansauce wrote:Hello,
I'm using iMacros build 9030808 for Firefox 53.0.2 (32-bit) on Windows 7.
I've been using iMacros to scrape social media profiles for certain pieces of information. Because information on the platform is self-reported, certain pieces of info are often missing, resulting in the tag failing. Ultimately, this means a lot of work for me on the back end sorting the data so that each piece of information is in the correct column. Instead, I'd like to extract a blank each time the program fails to find a piece of information.
I've looked all over the forums, but have yet to find a solution for this. This thread appears to come close, but was never resolved:
http://forum.imacros.net/viewtopic.php?f=7&t=25503
I imagine the solution would involve the EVAL function, but I don't have any experience with JavaScript. Is there an obvious way to do this? I've pasted an example script below for context:
Code: Select all
VERSION BUILD=9030808 RECORDER=FX
TAB T=1
TAB CLOSEALLOTHERS
SET !ERRORIGNORE YES
SET !DATASOURCE Leadership.csv
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!loop}}
URL GOTO={{!COL1}}
WAIT SECONDS=5
TAG POS=1 TYPE=h1 ATTR=CLASS:* EXTRACT=TXT
TAG POS=1 TYPE=h3 ATTR=class:"Sans-17px-black-85%-semibold" EXTRACT=TXT
TAG POS=1 TYPE=span ATTR=class:"pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%" EXTRACT=TXT
TAG POS=1 TYPE=h4 ATTR=class:"pv-entity__date-range Sans-**px-black-**%" EXTRACT=TXT
TAG POS=1 TYPE=h4 ATTR=class:"pv-entity__location Sans-**px-black-**% block" EXTRACT=TXT
TAG POS=2 TYPE=h3 ATTR=class:"Sans-17px-black-85%-semibold" EXTRACT=TXT
TAG POS=2 TYPE=span ATTR=class:"pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%" EXTRACT=TXT
TAG POS=2 TYPE=h4 ATTR=class:"pv-entity__date-range Sans-**px-black-**%" EXTRACT=TXT
TAG POS=2 TYPE=h4 ATTR=class:"pv-entity__location Sans-**px-black-**% block" EXTRACT=TXT
TAG POS=1 TYPE=h3 ATTR=class:"pv-entity__school-name Sans-17px-black-85%-semibold" EXTRACT=TXT
TAG POS=1 TYPE=p ATTR=class:"pv-education-entity__secondary-title pv-entity__degree-name pv-entity__secondary-title Sans-**px-black-**%" EXTRACT=TXT
TAG POS=1 TYPE=p ATTR=class:"pv-education-entity__secondary-title pv-entity__fos pv-entity__secondary-title Sans-**px-black-**%" EXTRACT=TXT
TAG POS=1 TYPE=p ATTR=class:"pv-education-entity__date pv-entity__dates Sans-**px-black-**%" EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Leadership.csv
WAIT SECONDS=5
Thanks!
Yep, it's a pity indeed that the User from the Thread you mention never gave their FCI (you are already doing a better "job" on that part, very good!) and didn't bother to follow up. The Solution in that "Case" was simply to use '!TIMEOUT_STEP' with a short Value of '1' or '0' to shorten the Tag Waiting Time. This would be the Solution for you as well, at least partially...
Only partially, because that other Thread was from Dec. 2015, when I guess v8.9.2, v8.9.5 or v8.9.7 for FF were the "current" Versions, while you are now (May 2017) using v9.0.3 for FF.
Some "fundamental" Behaviour of the 'EXTRACT' Functionality changed in v9.0.3 compared to v8.9.x: when '!ERRORIGNORE' is enabled like in your Script, 'EXTRACT' will "skip" an HTML Element that is not found, instead of storing "#EANF#" like previous Versions did. I'm not sure if this is an intentional Change or a Bug in v9.0.3, as it is not mentioned in the Release Notes for v9.0.3; there are a few Threads about this "Feature" on the Forum...
You have a few Options to choose from:
1- Option 1 is to go back to v8.9.7 for FF. v9.0.3 is a bit Buggy and limited anyway compared to v8.9.x, and v8.9.7 is the stable Version at this moment and still works on FF53. (Make sure to disable Automatic Updates for iMacros if you downgrade to v8.9.7, or iMacros will want to update itself again to v9.0.3, ah-ah...!) With v8.9.7, you will get the "old" Behaviour of always getting "#EANF#" when a Field is not found, whether '!ERRORIGNORE' is enabled or not...
2- Option 2 is (in your current v9.0.3 for FF Version) to disable '!ERRORIGNORE' before doing your Extracts. The Extracts will then return '#EANF#' when a Field is not found, which preserves your Table Structure in your 'SAVEAS'.
(In case you were wondering: a 'TAG' Statement that uses 'EXTRACT' will never trigger a RuntimeError when the Field is not found, so disabling '!ERRORIGNORE' will not stop your Script...)
3- Option 3, if you are not "happy" with the '#EANF#' Values, is to "transform" them into an empty String or any String you would prefer, using a fairly simple 'EVAL()' Statement...
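To make it a bit more visual why a skipped Field breaks the Columns, here is a small stand-alone JavaScript Sketch (not part of the Macro). It assumes that iMacros joins the extracted Values in '!EXTRACT' with the '[EXTRACT]' Separator and that each Value ends up in one CSV Column; the Function 'toColumns' and the sample Data are hypothetical and only there for Illustration:

```javascript
// Assumption: extracted values are joined with "[EXTRACT]" and each one becomes a CSV column.
// toColumns() is a hypothetical helper, not an iMacros function.
function toColumns(extractString) {
  return extractString.split("[EXTRACT]");
}

// Hypothetical profile with three fields (name, title, company) where the title is missing.
// Field skipped (behaviour described above for v9.0.3 with !ERRORIGNORE YES):
const skipped = "Jane Doe[EXTRACT]Acme Corp";
// Placeholder kept (behaviour described above for v8.9.x, or v9.0.3 with !ERRORIGNORE NO):
const withEanf = "Jane Doe[EXTRACT]#EANF#[EXTRACT]Acme Corp";

const skippedCols = toColumns(skipped);
const eanfCols = toColumns(withEanf);

// With the skipped field, the company slides into the title column (index 1).
console.log(skippedCols);
// With "#EANF#", the company stays in its own column (index 2).
console.log(eanfCols);
```

With the Placeholder, every Row keeps the same Number of Columns, so no Sorting is needed afterwards.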
I modified your Script a bit to include Options 2 + 3 (with an easy Config-Switch at the beginning of the Script for the String you would like):
Code: Select all
VERSION BUILD=9030808 RECORDER=FX
TAB T=1
TAB CLOSEALLOTHERS
SET !ERRORIGNORE YES
SET !TIMEOUT_STEP 0
'Easy Access:
SET EANF_String ""
SET !DATASOURCE Leadership.csv
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!loop}}
URL GOTO={{!COL1}}
WAIT SECONDS=5
'Disable '!ERRORIGNORE' to prevent iMacros from skipping Fields not found in v9.0.3 for FF:
SET !ERRORIGNORE NO
TAG POS=1 TYPE=h1 ATTR=CLASS:* EXTRACT=TXT
TAG POS=1 TYPE=h3 ATTR=class:"Sans-17px-black-85%-semibold" EXTRACT=TXT
TAG POS=1 TYPE=span ATTR=class:"pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%" EXTRACT=TXT
TAG POS=1 TYPE=h4 ATTR=class:"pv-entity__date-range Sans-**px-black-**%" EXTRACT=TXT
TAG POS=1 TYPE=h4 ATTR=class:"pv-entity__location Sans-**px-black-**% block" EXTRACT=TXT
TAG POS=2 TYPE=h3 ATTR=class:"Sans-17px-black-85%-semibold" EXTRACT=TXT
TAG POS=2 TYPE=span ATTR=class:"pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%" EXTRACT=TXT
TAG POS=2 TYPE=h4 ATTR=class:"pv-entity__date-range Sans-**px-black-**%" EXTRACT=TXT
TAG POS=2 TYPE=h4 ATTR=class:"pv-entity__location Sans-**px-black-**% block" EXTRACT=TXT
TAG POS=1 TYPE=h3 ATTR=class:"pv-entity__school-name Sans-17px-black-85%-semibold" EXTRACT=TXT
TAG POS=1 TYPE=p ATTR=class:"pv-education-entity__secondary-title pv-entity__degree-name pv-entity__secondary-title Sans-**px-black-**%" EXTRACT=TXT
TAG POS=1 TYPE=p ATTR=class:"pv-education-entity__secondary-title pv-entity__fos pv-entity__secondary-title Sans-**px-black-**%" EXTRACT=TXT
TAG POS=1 TYPE=p ATTR=class:"pv-education-entity__date pv-entity__dates Sans-**px-black-**%" EXTRACT=TXT
'Re-enable '!ERRORIGNORE':
SET !ERRORIGNORE YES
'Replace '#EANF#' Values with 'EANF_String' defined at beginning of Script:
SET Extracted_Data EVAL("var s='{{!EXTRACT}}'; var eanf='{{EANF_String}}'; var z; z=s.split('#EANF#').join(eanf); z;")
'>
'Debug:
PROMPT Original_EXTRACT:<BR>{{!EXTRACT}}<BR><BR>Cleaned_EXTRACT:<BR>{{Extracted_Data}}
SET !EXTRACT {{Extracted_Data}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Leadership.csv
WAIT SECONDS=5
(Tested on iMacros for FF v8.8.2, Pale Moon v26.3.3 (=FF47), Win10-x64.)
The 'split().join()' Syntax I used is similar to a Global 'replace()', which needs a 'REGEX'. I don't like (= don't master!) 'REGEX', so I prefer this "Trick" with 'split()' + 'join()', which I find easier to use and which even works directly with Special Characters that would require some Escaping in a 'REGEX'...
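For Comparison, here is a minimal stand-alone JavaScript Sketch of both Approaches side by side. The sample String is hypothetical (it only imitates what '{{!EXTRACT}}' might contain), and the second Part uses a made-up String with Dots to show the Escaping Point:

```javascript
// Hypothetical sample of what {{!EXTRACT}} might hold after three extractions.
const extracted = "Jane Doe[EXTRACT]#EANF#[EXTRACT]#EANF#";
const replacement = "";

// split() cuts the string at every literal "#EANF#", join() glues the pieces
// back together with the replacement in between.
const viaSplitJoin = extracted.split("#EANF#").join(replacement);

// Same result with a global regex replace ("#" needs no escaping in a regex).
const viaRegex = extracted.replace(/#EANF#/g, replacement);

console.log(viaSplitJoin);
console.log(viaRegex);

// Special characters: "." is taken literally by split(), but in a regex an
// unescaped "." matches any character and has to be written as "\.".
const dotted = "a.b.c";
const dotsViaSplitJoin = dotted.split(".").join("-");
const dotsViaUnescapedRegex = dotted.replace(/./g, "-");
const dotsViaEscapedRegex = dotted.replace(/\./g, "-");

console.log(dotsViaSplitJoin, dotsViaUnescapedRegex, dotsViaEscapedRegex);
```

Both Approaches give the same Result for '#EANF#'; the Difference only shows up when the String to replace contains Characters that have a special Meaning in a 'REGEX'.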

- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...