Extract a json page

chivracq
Posts: 9588
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 10:04 pm

But hum..., OK, I'm able to reproduce with v10.0.2 'Free' for FF on FF v83.0 'Portable' (well the stupid Browser first needed to update itself x2, from v72.0.1 => v72.0.2 => v83.0...!), the Page gets displayed like I described for FF55, with the 3 Tabs, but the Script then simply hangs on any of the 'TAG' Statements, also without an 'EXTRACT' Command, and also with '!ERRORIGNORE' and the "Ignore Unsupported Commands" Setting both activated... The Script simply hangs... Oops...! :oops:

Well, yep indeed, this is supposed to be "by Design", don't ask me...!, v10.0.2 for FF has/had a Check on the URL-Type and only allows a mini-Set of "supported" HTML Types for URL's/Webpages, and apparently, raw '.json' Pages are not "in the List", ah-ah...! :(
The same happens also when you have any 'EVAL()' Statement and the Script happens to land on "such a page", also when you get a Connection or Server Not Found Error...
I flagged that as a Blocking Bug during the Beta Testing Phase, about 2 years ago now, I think, but nobody has ever "complained" about it, so tja...!, fair enough, ah-ah...! :|

Then OK, that means it won't work at all using iMacros for FF v10.0.2 'Free'/'PE'. :( :shock:
... But it seems to work in CR using iMacros for CR v10.1.0 ('Free') like I tested previously... :idea:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
Rehabman2020
Posts: 13
Joined: Fri Dec 04, 2020 1:27 pm

Re: Extract a json page

Post by Rehabman2020 » Sat Dec 05, 2020 3:16 am

Thanks. I'll try it in CR tomorrow.
Rehabman2020

Re: Extract a json page

Post by Rehabman2020 » Sat Dec 05, 2020 8:58 pm

I am unable to get the Chrome version operating. Chrome runs fine, but there is no sidebar, and no extensions are listed for iMacros.
I went to my license area and downloaded and installed the "File Access for iMacros Extensions".
Ran that file.
I still cannot get iMacros to show as a sidebar in Chrome. I cannot get the scripting in Enterprise Edition 12+ to work in Chrome either; it just does not call it and locks up.

I can get it to work in Firefox, but of course the code you shared will not work.

I am unable to try the newest code in Chrome, because I can't get -CR to work.

I'll have to call someone on Monday, I guess.
chivracq

Re: Extract a json page

Post by chivracq » Sat Dec 05, 2020 9:15 pm

Rehabman2020 wrote:
Sat Dec 05, 2020 8:58 pm
I am unable to get the Chrome version operating. Chrome runs fine, but there is no sidebar, and no extensions are listed for iMacros.
I went to my license area and downloaded and installed the "File Access for iMacros Extensions".
Ran that file.
I still cannot get iMacros to show as a sidebar in Chrome. I cannot get the scripting in Enterprise Edition 12+ to work in Chrome either; it just does not call it and locks up.

I can get it to work in Firefox, but of course the code you shared will not work.

I am unable to try the newest code in Chrome, because I can't get -CR to work.

I'll have to call someone on Monday, I guess.

Hum, I'm not very "knowledgeable" about Installation Issues, this is more @TechSup's "Area of Expertise", ah-ah...!
=> I would say, open a separate/stand-alone Thread about that part in the 'Installation and Licensing' Sub-Forum, => @TechSup will get some Automatic Notification about your Thread and will be able to help you "directly", on Monday also probably... (Although they sometimes react also even during the WE, oh-oh...! 8) )

Make sure to select the "correct" Sub-Forum, to give your Thread a Descriptive Title, and to mention your FCI... :idea:
(And you may want to link to this current Thread maybe... But I was going to "ping" them anyway, about my Post where I mention that the 'TAG' Mode hangs in v10.0.2 for FF "(more or less) by Design" on a '.json' Page...)
Rehabman2020

Re: Extract a json page

Post by Rehabman2020 » Sat Dec 05, 2020 9:36 pm

:D
Thank you
Rehabman2020

Re: Extract a json page

Post by Rehabman2020 » Sat Dec 05, 2020 11:23 pm

Hey chivracq!
I went to YouTube and found out how to add the iMacros extension to Chrome. (https://www.youtube.com/watch?v=G0vYG41 ... ingChannel)

I downloaded and installed it, pasted in your code and voilà, it works. You are a master, thank you. I will have more questions in the future.

On Monday I will install it in the scripting version and make it work, then on to my next issue.

FYI, you have to go to the Chrome Web Store,
download the iMacros extension
and install it,
go to Chrome and add the extension,
then you have to enable the extension.

I could not find this information at iMacros/Progress.
Rehabman2020

Re: Extract a json page

Post by Rehabman2020 » Sun Dec 06, 2020 7:48 pm

Well, I snuck back to my office computer to try it out.
When I run the code from iMacros for Chrome, the page text comes up in the prompt.
Now I am trying to get the code to work in the script (MS Access VBA); here is what I have:


customCode = "VERSION BUILD=8820413 RECORDER=CR" & vbNewLine & _
"'TAB T=1" & vbNewLine & _
"URL GOTO=https://us-street.api.smartystreets.com ... ch=invalid" & vbNewLine & _
"TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT" & vbNewLine & _
"SET !VAR1 EVAL(" & Chr(34) & "var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\" & Chr(34) & "Active\" & Chr(34) & ":'); y=x[1]; z=y.split('\" & Chr(34) & "'); z[1];" & Chr(34) & ") " & vbNewLine & _
"PROMPT Active_Status:<SP>_{{!VAR1}}_" & vbNewLine & _
"ADD !EXTRACT Active_Status:<SP>_{{!VAR1}}"


I get error -1001, and the values of !VAR1 and !EXTRACT are empty.
Can you see where I am going wrong?
chivracq

Re: Extract a json page

Post by chivracq » Sun Dec 06, 2020 9:15 pm

Rehabman2020 wrote:
Sun Dec 06, 2020 7:48 pm
Well, I snuck back to my office computer to try it out.
When I run the code from iMacros for Chrome, the page text comes up in the prompt.
Now I am trying to get the code to work in the script (MS Access VBA); here is what I have:

Code:

  customCode = "VERSION BUILD=8820413 RECORDER=CR" & vbNewLine & _
                    "'TAB T=1" & vbNewLine & _
                    "URL GOTO=https://us-street.api.smartystreets.com/street-address?auth-id=334384c1-46b9-f327-cd99-992659366164&auth-token=rPhLq2qs2I7pFR625Cy2&candidates=10&street=880%20duke%20rd&city=columbus&state=oh&zipcode=43213&match=invalid" & vbNewLine & _
                    "TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT" & vbNewLine & _
                    "SET !VAR1 EVAL(" & Chr(34) & "var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\" & Chr(34) & "Active\" & Chr(34) & ":'); y=x[1]; z=y.split('\" & Chr(34) & "'); z[1];" & Chr(34) & ") " & vbNewLine & _
                    "PROMPT Active_Status:<SP>_{{!VAR1}}_" & vbNewLine & _   
                   "ADD !EXTRACT Active_Status:<SP>_{{!VAR1}}"

I get error -1001, and the values of !VAR1 and !EXTRACT are empty.
Can you see where I am going wrong?

Poufff...!, wouff-wouff...! Well, your 'EVAL()' Line already contains Single + Double Quotes + Escape Chars that probably need to be escaped (again) or converted to some "Chr(n)" Char like you seem to have done already for 1 Char with "Chr(34)", I'm not sure what this one is supposed to do, 5 times in the 'EVAL()' Statement...
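For reference, VBA's Chr(34) returns the double-quote character ("), and inside a VBA string literal a backslash is just an ordinary character. A minimal sketch that replays the VBA concatenation from the post in JavaScript (String.fromCharCode(34) standing in for Chr(34), "+" for "&") and compares the result with the EVAL line used in the pure '.iim' Script. The names Chr, vbaEvalLine and iimEvalLine are hypothetical, chosen for illustration only.

```javascript
// Stand-in for VBA's Chr(): Chr(34) is the double-quote character.
const Chr = (n) => String.fromCharCode(n);

// Replay of the VBA EVAL line: "&" becomes "+". A literal backslash from the
// VBA source has to be written "\\" in JavaScript to produce a single "\".
const vbaEvalLine =
  "SET !VAR1 EVAL(" + Chr(34) +
  "var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\\" + Chr(34) +
  "Active\\" + Chr(34) + ":'); y=x[1]; z=y.split('\\" + Chr(34) +
  "'); z[1];" + Chr(34) + ") ";

// The same line as written in the '.iim' Script (String.raw keeps \" as typed).
const iimEvalLine = String.raw`SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")`;

console.log(vbaEvalLine === iimEvalLine); // false
console.log(vbaEvalLine.trimEnd().replace("Active", "active") === iimEvalLine); // true
```

Under these assumptions, the Chr(34) concatenation yields the intended macro text apart from a trailing space and the capitalisation of the searched key ('Active' vs 'active'). Since split() matches case exactly, if the page only contains the lowercase key (as the '.iim' version expects), x[1] would be undefined and the EVAL() would fail on y.split().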

What might help already would be to use Single Quotes for your "customCode" String to build the Macro on the fly, at least for that Line about 'EVAL()'. Alternating Single and Double Quotes often removes the Need to escape those Chars, like I do in 'EVAL()' Statements in pure '.iim', except inside 'split()' where Double Quotes still need to be escaped and your extracted Data used some Double Quotes and I also included them in the String I used for the 'split()'...
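In a '.js' host Script, one alternative to alternating quotes (a swapped-in technique, not the one described above) is a template literal tagged with String.raw, which holds the EVAL line exactly as typed in the '.iim' file, backslashes included. A minimal sketch; buildMacro and its url parameter are hypothetical, and the example.com URL is a placeholder. This does not carry over to VBA, where a string literal can only be delimited by double quotes (a single quote starts a comment) and an embedded double quote is written as "" or Chr(34).

```javascript
// Hypothetical helper: assemble an on-the-fly macro, one statement per line.
function buildMacro(url) {
  const lines = [
    "VERSION BUILD=8820413 RECORDER=CR",
    "TAB T=1",
    "URL GOTO=" + url,
    "TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT",
    // String.raw: no re-escaping needed, the line is copied verbatim from the '.iim'.
    String.raw`SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")`,
    "PROMPT Active_Status:<SP>_{{!VAR1}}_",
  ];
  return lines.join("\n");
}

// Placeholder URL, for illustration only.
console.log(buildMacro("https://example.com/street-address"));
```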

But hum..., 'EVAL()' is "meant" to be able to use JS inside a pure '.iim' Script, I consider it pretty cumbersome and rather "Bad Practice" to use 'EVAL()' in an on-the-fly Macro generated from a '.js' or '.vba' Script, you could "better" only pass the raw '!EXTRACT' to your '.vba' Script and do that same Data Manipulation directly from VBA, you don't need 'EVAL()' for that... :idea:
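Following that suggestion, a minimal sketch of the same Data isolation done in the calling Script instead of inside 'EVAL()', written in JavaScript (a VBA version would use Split() the same way). activeStatusBySplit mirrors the 'EVAL()' logic; activeStatusByParse is a swapped-in alternative that parses the JSON instead of splitting text. Both names are hypothetical, and the sample string is made up; the real response structure is not shown in this thread.

```javascript
// Mirror of the EVAL() logic: take the text after "active": and return
// what sits between the next pair of double quotes.
function activeStatusBySplit(text) {
  const parts = text.split('"active":'); // case-sensitive match
  if (parts.length < 2) return null;     // key not found
  return parts[1].split('"')[1];
}

// Alternative: parse the JSON and return the first "active" value found
// at any depth (arrays are walked too, since they are objects).
function activeStatusByParse(text) {
  const find = (node) => {
    if (node === null || typeof node !== "object") return undefined;
    if (Object.prototype.hasOwnProperty.call(node, "active")) return node.active;
    for (const child of Object.values(node)) {
      const hit = find(child);
      if (hit !== undefined) return hit;
    }
    return undefined;
  };
  return find(JSON.parse(text));
}

// Made-up sample, for illustration only.
const sample = '[{"analysis":{"dpv_match_code":"Y","active":"Y"}}]';
console.log(activeStatusBySplit(sample)); // Y
console.log(activeStatusByParse(sample)); // Y
```

The split version returns null when the key is missing; the 'EVAL()' version has no such check, so there y is undefined and y.split() throws.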
chivracq

Re: Extract a json page

Post by chivracq » Sun Dec 06, 2020 9:25 pm

Posting this part in a separate Post...: :wink:
chivracq wrote:
Fri Dec 04, 2020 10:04 pm
But hum..., OK, I'm able to reproduce with v10.0.2 'Free' for FF on FF v83.0 'Portable' (well the stupid Browser first needed to update itself x2, from v72.0.1 => v72.0.2 => v83.0...!), the Page gets displayed like I described for FF55, with the 3 Tabs, but the Script then simply hangs on any of the 'TAG' Statements, also without an 'EXTRACT' Command, and also with '!ERRORIGNORE' and the "Ignore Unsupported Commands" Setting both activated... The Script simply hangs... Oops...! :oops:

Well, yep indeed, this is supposed to be "(more or less) by Design", don't ask me...!, v10.0.2 for FF has/had a Check on the URL-Type and only allows a mini-Set of "supported" HTML Types for URL's/Webpages, and apparently, raw '.json' Pages are not "in the List", ah-ah...! :(
The same happens also when you have any 'EVAL()' Statement and the Script happens to land on "such a page", also when you get a Connection or Server Not Found Error...
I flagged that as a Blocking Bug during the Beta Testing Phase, about 2 years ago now, I think, but nobody has ever "complained" about it, so tja...!, fair enough, ah-ah...! :|

And in the "meantime", I guess I was "a bit bored", ah-ah...!, but I found a Workaround to get the Func from the previous Script I had posted, (=> to extract the '.json' Page and isolate any Data), to work in v10.0.2 for FF... :D , but hum..., it's pretty cumbersome, and actually a whole "succession" of Workaround on Workaround on Workaround, ah-ah...!! :shock: :twisted:

Alright, the (main) Principle is fairly simple, but ah-ah...!, you can better "fasten your seat-belt", ah-ah...!: :twisted:

1- Save the '.json' Page locally using the 'SAVEAS TYPE=HTM' or '=TXT' Command.
OK, I couldn't test in v10.0.2 for FF 'Free', as the 'SAVEAS' Command requires the 'PE' Version.
But using v8.9.7 on FF v55.0.3, the Page then gets saved in the Default 'Downloads' Folder as "street-address.htm" or "street-address.txt". The File Extension gets (luckily) automatically added by iMacros.

2- Then in a New Tab, open that saved '.htm' or '.txt' File using the 'URL GOTO' Command and run the rest of the Script for the Extraction on the Local Page.
And yep indeed in v10.0.2, the Extraction then works fine, with all 4 'TYPE=*/HTML/BODY/PRE' I had previously mentioned, and for both '.htm' and '.txt'. Woaw...!, very good...!

BUT...!, we then encounter the "Blocking Bug" I had already mentioned about 'EVAL()' hanging when the Script happens to be on a "privileged" URL for the '.htm' File opened locally. Grrr...! BUT it works with the '.txt' File. No Block... Then OK, we "focus" on the '.txt' and forget about the '.htm' one.
Even if I actually have a Workaround for that stupid Bug ("by Design") with 'EVAL()', but it requires a 3rd Tab, etc..., we simply take the '.txt' File as we have the "luxury" to have 2 Options...

The Script will hang on 'EVAL()' on the Local '.htm' File. We at least get some "decent" Runtime Error if trying to run the 'EVAL()' on the original (online) '.json' Page:
"MacroError: Missing host permission for the tab, line: 18"

3- To open the just previously saved File (as '.txt' thus) on TAB_1, I had some "nice Idea" with '!FOLDER_DOWNLOAD' to dynamically reconstruct the Path to that File to reuse with 'URL GOTO', but I couldn't test because again, '!FOLDER_DOWNLOAD' is only supported in the 'PE' Version even if I tried to hard-code it. Nope, not supported in the 'Free' Version, grrr...!

4- Then the "easiest" part, I/you would think, simply open that Local ("street-address.txt") File using the 'URL GOTO=file:///' Protocol, with the Path hard-coded in my Test... BUT NOPE..., the "file:///" Protocol/URL is apparently also a "privileged" URL and v10.0.2 for FF refuses to open it, without any "Explanation"/Error, the Script simply hangs, ah-ah...!! Re-grrr...!!

5- Then OK, then what "works", Workaround on Workaround on Workaround I said, ah-ah...!, is to have a TAB_2 already opened (manually) on that Local '.txt' Page (from some previous manual Save, or a 'SAVEAS' from iMacros), and a 'REFRESH' Command will be able to (re)load the newly saved Version of the Page/File (with the same Name thus), and the Script is then able to proceed and finish...

And that would give stg like...:

Code:

VERSION BUILD=8820413 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES
TAB T=1
'URL GOTO=https://us-street.api.smartystreets.com/street-address?auth-id=334384c1-46b9-f327-cd99-992659366164&auth-token=rPhLq2qs2I7pFR625Cy2&candidates=10&street=880%20duke%20rd&city=columbus&state=oh&zipcode=43213&match=invalid

'Save '.json' Page locally:
'(Both 'SAVEAS' + '!FOLDER_DOWNLOAD' only supported in v10.0.2 for FF 'PE', not supported in the 'Free' Version.)
'SAVEAS TYPE=TXT FOLDER=* FILE=+{{!NOW:yyyymmdd_hhhnn}}
'SAVEAS TYPE=TXT FOLDER=* FILE=*
'SAVEAS TYPE=HTM FOLDER=* FILE=*

'They all work, but need to use '.txt' File:
SAVEAS TYPE=TXT FOLDER=* FILE=*

'file:///D:/TEMP/iMacros/Current/iMacros/Downloads/street-address.txt
SET !VAR2 D:/TEMP/iMacros/Current/iMacros/Downloads/
'SET !FOLDER_DOWNLOAD D:/TEMP/iMacros/Current/iMacros/Downloads/
'PROMPT _{{!FOLDER_DOWNLOAD}}_
'PAUSE

'TAB OPEN
TAB T=2
'URL GOTO=file:///{{!FOLDER_DOWNLOAD}}/street-address.txt
'URL GOTO=file:///{{!VAR2}}street-address.txt
'URL GOTO=file:///D:/TEMP/iMacros/Current/iMacros/Downloads/street-address.txt
'URL GOTO=file:///D:/TEMP/iMacros/Current/iMacros/Downloads/street-address.htm

'URL GOTO' doesn't work in v10.0.2 for FF with 'file:///' Protocol/URL.
'But 'REFRESH' works if the Page/File is already loaded in 'TAB_2':
REFRESH
PAUSE

TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT
SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
PROMPT Active_Status:<SP>_{{!VAR1}}_
PAUSE

'SET Active_Status EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
'PROMPT Active_Status:<SP>_{{Active_Status}}_

'4 Types we can use:
TAG POS=1 TYPE=* ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=HTML ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=BODY ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT

(Tested on a "Mix" of [v10.0.2 for FF 'Free' + FF83] and [v8.9.7 for FF + FF v55.0.3], on Win10_Pro_x64.)
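Steps 1 and 3 can be sketched for a host Script that knows the download folder. With FILE=*, the saved File here was named after the last segment of the URL path ("street-address"), with the extension added by iMacros; savedFileName assumes that rule holds in general, although it was only observed for this one URL. fileUrl joins a folder and a file name into a 'file:///' URL without doubling the slash (compare the '{{!FOLDER_DOWNLOAD}}/' and '{{!VAR2}}' variants in the Script). Both names are hypothetical, and as step 4 explains, v10.0.2 for FF still refuses to open such a URL with 'URL GOTO'.

```javascript
// Hypothetical: predict the name SAVEAS ... FILE=* gave here, i.e. the last
// URL path segment plus the extension for the chosen TYPE (assumed rule).
function savedFileName(pageUrl, type) {
  const ext = { TXT: ".txt", HTM: ".htm" }[type];
  if (!ext) throw new Error("unsupported TYPE: " + type);
  const segments = new URL(pageUrl).pathname.split("/").filter(Boolean);
  return segments[segments.length - 1] + ext;
}

// Hypothetical: join a folder path and a file name into a file:/// URL,
// turning Windows backslashes into slashes and dropping trailing slashes
// so there is exactly one "/" between folder and file.
// (Characters such as spaces are not percent-encoded here.)
function fileUrl(folder, fileName) {
  const dir = folder.replace(/\\/g, "/").replace(/\/+$/, "");
  return "file:///" + dir + "/" + fileName;
}

const name = savedFileName(
  "https://us-street.api.smartystreets.com/street-address?candidates=10&match=invalid",
  "TXT"
);
console.log(fileUrl("D:/TEMP/iMacros/Current/iMacros/Downloads/", name));
// file:///D:/TEMP/iMacros/Current/iMacros/Downloads/street-address.txt
```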
Rehabman2020

Re: Extract a json page

Post by Rehabman2020 » Mon Dec 07, 2020 2:34 pm

Chivracq,
Thanks for all your work. I finally got the script version to run, and able to collect the info i needed.

The 1st major problem was that it had to run in iMacros for Chrome, which I had a very difficult time figuring out how to set up.
The 2nd problem was converting the manual iMacros code to the script version.

Both solved.

Thank you very much!
Tom, Tech Support
Posts: 3637
Joined: Mon May 31, 2010 4:59 pm

Re: Extract a json page

Post by Tom, Tech Support » Tue Dec 08, 2020 6:54 pm

Rehabman2020 wrote:
Sat Dec 05, 2020 11:23 pm
FYI, you have to go to the Chrome Web Store,
download the iMacros extension
and install it,
go to Chrome and add the extension,
then you have to enable the extension.

I could not find this information at iMacros/Progress.

https://wiki.imacros.net/iMacros_for_Chrome#Installation

https://wiki.imacros.net/Webextensions#Installation

Rehabman2020 wrote:
Mon Dec 07, 2020 2:34 pm
2nd problem was converting the manual iMacros code to the script version.

It is not necessary to embed the macro code directly inside your VBA code. You could have simply called the external macro file directly using iimPlay("<filename>").
Regards,

Tom, iMacros Support