Extract a json page

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
Forum rules
Before asking a question or reporting an issue:
1. Please review the list of FAQ's.
2. Use the search box (at the top of each forum page) to see if a similar problem or question has already been addressed.
3. Try searching the iMacros Wiki - it contains the complete iMacros reference as well as plenty of samples and tutorials.
4. We can respond much faster to your posts if you include the following information: CLICK HERE FOR IMPORTANT INFORMATION TO INCLUDE IN YOUR POST
Rehabman2020
Posts: 14
Joined: Fri Dec 04, 2020 1:27 pm

Extract a json page

Post by Rehabman2020 » Fri Dec 04, 2020 2:04 pm

I am able to get to the proper page using an api, I need to extract a couple elements on the page or the whole page as text, (not a screenshot, which is .png). Its hard to show how this is done, unless you have your own api for smartystreets.com. I've searched this site and others for a solution. Am I missing something simple? Is there somewhere i can find out how to do this?

BTW i have been using IMacros over 20 years( when it was IMM , iopus, before it was ipswitch, and now progress.) I use it in conjunction with MS Access, so i use the most current scripting edition.

Thank You in advance for any help!
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 2:12 pm

Rehabman2020 wrote:
Fri Dec 04, 2020 2:04 pm
I am able to get to the proper page using an api, I need to extract a couple elements on the page or the whole page as text, (not a screenshot, which is .png). Its hard to show how this is done, unless you have your own api for smartystreets.com. I've searched this site and others for a solution. Am I missing something simple? Is there somewhere i can find out how to do this?

BTW i have been using IMacros over 20 years( when it was IMM , iopus, before it was ipswitch, and now progress.) I use it in conjunction with MS Access, so i use

Code: Select all

the most current scripting edition.
Thank You in advance for any help!

(F)CIM...! :mrgreen: (Read my Sig...)
=> "... so I use the most current scripting edition..." is not "good enough", simply mention all exact Versions about your FCI...
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 3:53 pm

And...!?, it shouldn't be that "complicated" to mention 3 Versions about your FCI, I would think...? :o

Alright, "hurry up"...!, I already have 5 Solutions... Everything I "try" works directly, except one "thing", but I still got it to work after some minor-Modif... :P
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
Rehabman2020
Posts: 14
Joined: Fri Dec 04, 2020 1:27 pm

Re: Extract a json page

Post by Rehabman2020 » Fri Dec 04, 2020 3:57 pm

Sorry, i am useing Imacros Enterprise v12.6.502.4525 on a windows 10 desktop
also using ms access current version on office 365
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 4:10 pm

Rehabman2020 wrote:
Fri Dec 04, 2020 3:57 pm
Sorry, i am useing

Code: Select all

Imacros Enterprise v12.6.502.4525 on a windows 10 desktop
also using ms access current version on office 365

OK, good, FCI mentioned, even if 'Full'/'Trial' is still missing, but OK, that's "good enough"... :D

Then OK, you didn't post or upload any '.json' Example, so I used the "playlist.json" Example installed with the Demo ('.iim') Macros for iMB v12.6 'Trial'. and all those 5 Implementations work directly to extract the whole Content of the '.json' File which actually "behaves" exactly like a (Local) '.TXT' File.

First 4 using the 'TAG' Mode:

Code: Select all

VERSION BUILD=8820413 RECORDER=FX
TAB T=1
'URL GOTO=file:///C:/Users/JFC/Documents/iMacros/Macros/Demo/playlist.json

'Recorded:
'TAG POS=1 TYPE=PRE ATTR=TXT:[<SP>{<SP>"macro":<SP>"Demo\\FillForm.iim",<SP>"loop":<SP>1*

TAG POS=1 TYPE=* ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=HTML ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=BODY ATTR=TXT:* EXTRACT=TXT
And then using the 'EVENT' Mode:

Code: Select all

VERSION BUILD=8820413 RECORDER=FX
TAB T=1
'URL GOTO=file:///C:/Users/JFC/Documents/iMacros/Macros/Demo/playlist.json

'Recorded:
'EVENT TYPE=CLICK SELECTOR="HTML>BODY>PRE" BUTTON=0
'EVENTS TYPE=KEYPRESS SELECTOR="HTML>BODY" CHARS="aa" MODIFIERS="Ctrl"

'Hum..., not working...!?:
'EVENTS TYPE=KEYPRESS SELECTOR=* CHARS="aa" MODIFIERS=Ctrl

'Working:
EVENT TYPE=KEYPRESS SELECTOR=* CHAR="a" MODIFIERS="Ctrl"
EVENT TYPE=KEYPRESS SELECTOR=* CHAR="c" MODIFIERS="Ctrl"
PROMPT _{{!CLIPBOARD}}_
I'm "not sure" why combining the 'Ctrl^a' + 'Ctrl^c' in just one Statement didn't work, I had to "separate" them...

(Tested on iMacros for FF v8.8.2, PM v26.3.3, Win10_Pro_x64.)
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 4:19 pm

Well, meant as 'EDIT', but posting in a new Reply...:

But the 'Select All' + 'Copy' in just one Statement actually works...!, I had a Typo in it, ah-ah...! (=> Was "aa" while it should be "ac"...!) :oops:

Code: Select all

EVENTS TYPE=KEYPRESS SELECTOR=* CHARS="ac" MODIFIERS=Ctrl
PROMPT _{{!CLIPBOARD}}_
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
Rehabman2020
Posts: 14
Joined: Fri Dec 04, 2020 1:27 pm

Re: Extract a json page

Post by Rehabman2020 » Fri Dec 04, 2020 4:25 pm

OK, thanks
their url will not cooperate with imacros, have to use the firefox version.

here is what shows up on the page when I call it. if i could get all this text if can decipher the extract in MS Access VBA with no problem

[{"input_index":0,"candidate_index":0,"delivery_line_1":"880 Duke Rd","last_line":"Columbus OH 43213-2477","delivery_point_barcode":"432132477809","components":{"primary_number":"880","street_name":"Duke","street_suffix":"Rd","city_name":"Columbus","default_city_name":"Columbus","state_abbreviation":"OH","zipcode":"43213","plus4_code":"2477","delivery_point":"80","delivery_point_check_digit":"9"},"metadata":{"record_type":"S","zip_type":"Standard","county_fips":"39049","county_name":"Franklin","carrier_route":"C021","congressional_district":"03","rdi":"Residential","elot_sequence":"0241","elot_sort":"D","latitude":39.959460,"longitude":-82.864880,"precision":"Zip9","time_zone":"Eastern","utc_offset":-5,"dst":true},"analysis":{"dpv_match_code":"Y","dpv_footnotes":"AABB","dpv_cmra":"N","dpv_vacant":"N","active":"Y"}}]
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 4:41 pm

Rehabman2020 wrote:
Fri Dec 04, 2020 4:25 pm
OK, thanks
their url will not cooperate with imacros, have to use the firefox version.

here is what shows up on the page when I call it. if i could get all this text if can decipher the extract in MS Access VBA with no problem

Code: Select all

[{"input_index":0,"candidate_index":0,"delivery_line_1":"880 Duke Rd","last_line":"Columbus OH 43213-2477","delivery_point_barcode":"432132477809","components":{"primary_number":"880","street_name":"Duke","street_suffix":"Rd","city_name":"Columbus","default_city_name":"Columbus","state_abbreviation":"OH","zipcode":"43213","plus4_code":"2477","delivery_point":"80","delivery_point_check_digit":"9"},"metadata":{"record_type":"S","zip_type":"Standard","county_fips":"39049","county_name":"Franklin","carrier_route":"C021","congressional_district":"03","rdi":"Residential","elot_sequence":"0241","elot_sort":"D","latitude":39.959460,"longitude":-82.864880,"precision":"Zip9","time_zone":"Eastern","utc_offset":-5,"dst":true},"analysis":{"dpv_match_code":"Y","dpv_footnotes":"AABB","dpv_cmra":"N","dpv_vacant":"N","active":"Y"}}]

Hum..., could be because "New in iMB v12.6", that Version now supports Playlists, and those Playlists are '.json' Files and I reckon, the '.json' File Extension has possibly been associated with the iMacros Browser...

But once you manage to extract the whole Content of the File, you'll be able to re-split the whole Data using 'REGEX', or I prefer 'split()'...

When using FF, if you had to install / have to use iMacros for FF v10.0.2 'PE'/'Free', that Version doesn't support the 'EVENT' Mode, so you'll have to use one of the 4 Implementations using the 'TAG' Mode... :idea:
'TYPE=PRE' will give the "cleanest" Extract, but it might not always work..., but 'TYPE=HTML' or '=BODY' should always work...
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
Rehabman2020
Posts: 14
Joined: Fri Dec 04, 2020 1:27 pm

Re: Extract a json page

Post by Rehabman2020 » Fri Dec 04, 2020 5:33 pm

I cannot get any of these to work either.

IM-Browser wont cooperate with it.

FF browser will go to the page, display the data, but can figure how to extract it..
is there anyone that can show us how to do this on the smartystreets.com website?.
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 5:59 pm

Rehabman2020 wrote:
Fri Dec 04, 2020 5:33 pm
I cannot get any of these to work either.

IM-Browser wont cooperate with it.

FF browser will go to the page, display the data, but can figure how to extract it..
is there anyone that can show us how to do this on the smartystreets.com website?.

Yeah well, then post a concrete/precise URL, with some Steps to reproduce and what you want (to extract) exactly..., "is there anyone that can show us how to do this on the smartystreets.com website?" is "too vague" for me... :idea:

I see they have a "Demo", can you give the "Steps" to follow from/on this Page, if that's the "correct" one...?
And give some exact/valid Input Data, I am from NL (Europe), I don't know anything about US Postal/Address System... :idea:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
Rehabman2020
Posts: 14
Joined: Fri Dec 04, 2020 1:27 pm

Re: Extract a json page

Post by Rehabman2020 » Fri Dec 04, 2020 7:01 pm

Here is the url with the sample api

URL GOTO=https://us-street.api.smartystreets.com ... ch=invalid

when you run it, the api returns this in the browser(this is from Chrome):


[{"input_index":0,"candidate_index":0,"delivery_line_1":"880 Duke Rd","last_line":"Columbus OH 43213-2477","delivery_point_barcode":"432132477809","components":{"primary_number":"880","street_name":"Duke","street_suffix":"Rd","city_name":"Columbus","default_city_name":"Columbus","state_abbreviation":"OH","zipcode":"43213","plus4_code":"2477","delivery_point":"80","delivery_point_check_digit":"9"},"metadata":{"record_type":"S","zip_type":"Standard","county_fips":"39049","county_name":"Franklin","carrier_route":"C021","congressional_district":"03","rdi":"Residential","elot_sequence":"0241","elot_sort":"D","latitude":39.959460,"longitude":-82.864880,"precision":"Zip9","time_zone":"Eastern","utc_offset":-5,"dst":true},"analysis":{"dpv_match_code":"Y","dpv_footnotes":"AABB","dpv_cmra":"N","dpv_vacant":"N","active":"Y"}}]


I need to extract everything, (then i can parse out the items i want using vba)

or

collect maybe just collect 1 item for instance at the end of the line is "active": "Y" ,


does this make sense?
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 8:42 pm

Rehabman2020 wrote:
Fri Dec 04, 2020 7:01 pm
Here is the url with the sample api

URL GOTO=https://us-street.api.smartystreets.com ... ch=invalid

when you run it, the api returns this in the browser(this is from Chrome):

Code: Select all

[{"input_index":0,"candidate_index":0,"delivery_line_1":"880 Duke Rd","last_line":"Columbus OH 43213-2477","delivery_point_barcode":"432132477809","components":{"primary_number":"880","street_name":"Duke","street_suffix":"Rd","city_name":"Columbus","default_city_name":"Columbus","state_abbreviation":"OH","zipcode":"43213","plus4_code":"2477","delivery_point":"80","delivery_point_check_digit":"9"},"metadata":{"record_type":"S","zip_type":"Standard","county_fips":"39049","county_name":"Franklin","carrier_route":"C021","congressional_district":"03","rdi":"Residential","elot_sequence":"0241","elot_sort":"D","latitude":39.959460,"longitude":-82.864880,"precision":"Zip9","time_zone":"Eastern","utc_offset":-5,"dst":true},"analysis":{"dpv_match_code":"Y","dpv_footnotes":"AABB","dpv_cmra":"N","dpv_vacant":"N",[b][b]"active":[/b]"Y[/b]"}}]

I need to extract everything, (then i can parse out the items i want using vba)

or

collect maybe just collect 1 item for instance at the end of the line is "active": "Y" ,


does this make sense?

Alright, everything works fine for me, ah-ah...! :P

First I tried to open your URL in PM26, my Default Browser, but hum, I would only get some "Unauthorized (1587658904)" (without the Double Quotes), that I would then be able to extract using all 4 Statements I had posted using the 'TAG' Mode and the 4 different 'TYPE' Values.

Then I opened the URL in CR (v76), and ah...!!, I do get the whole JSON Content..., that I also can extract with all 4 Statements.

I tried also in FF (v55.0.3) and I also get the whole JSON Content, x3 actually, as FF adds some Wrapper and Formatting to very neatly display the "Page" in 3 Tabs: 'JSON' + 'Raw Data' + 'Headers', and all 3 Tabs with some Sub-Menu and different Options/Buttons (Copy/Save/Print/etc...), but again, the 4 Statements all work fine, the one with 'TYPE=PRE' is the "cleanest" as the 3 other ones with 'TYPE=*/HTML/BODY' also extract that extra Formatting.

And then using 'EVAL()', piece of cake to extract the "active" Value with for example:

Code: Select all

VERSION BUILD=8820413 RECORDER=FX
TAB T=1
'URL GOTO=https://us-street.api.smartystreets.com/street-address?auth-id=334384c1-46b9-f327-cd99-992659366164&auth-token=rPhLq2qs2I7pFR625Cy2&candidates=10&street=880%20duke%20rd&city=columbus&state=oh&zipcode=43213&match=invalid

TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT
SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
PROMPT Active_Status:<SP>_{{!VAR1}}_
PAUSE

'SET Active_Status EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
'PROMPT Active_Status:<SP>_{{Active_Status}}_
PAUSE

TAG POS=1 TYPE=* ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=HTML ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=BODY ATTR=TXT:* EXTRACT=TXT
=> Will return in the 'PROMPT':

Code: Select all

Active_Status: _Y_
Tested in:

Code: Select all

- iMacros for FF v8.9.7, FF v55.0.3, Win10_x64.
- iMacros for CR v10.1.0 'Free', CR76, Win10_x64.
(I had to use '!VAR1' in CR because I "only" have the 'Free' Version, for which User-Defined Vars are not supported...)
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
Rehabman2020
Posts: 14
Joined: Fri Dec 04, 2020 1:27 pm

Re: Extract a json page

Post by Rehabman2020 » Fri Dec 04, 2020 9:21 pm

i still cannot get this one to work.
in the "apparently free" fx version, it hangs up on the "TAG POS..." command

in my enterprise edition(VERSION BUILD=12.6.505.4525) it wont even run, just skips the code at the url goto line




VERSION BUILD=8820413 RECORDER=FX
TAB T=1
'URL GOTO=https://us-street.api.smartystreets.com ... ch=invalid

TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT
SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
PROMPT Active_Status:<SP>_{{!VAR1}}_
PAUSE

'SET Active_Status EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
'PROMPT Active_Status:<SP>_{{Active_Status}}_
PAUSE
chivracq
Posts: 9683
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a json page

Post by chivracq » Fri Dec 04, 2020 9:25 pm

Rehabman2020 wrote:
Fri Dec 04, 2020 9:21 pm
i still cannot get this one to work.
in the "apparently free" fx version, it hangs up on the "TAG POS..." command

in my enterprise edition(VERSION BUILD=12.6.505.4525) it wont even run, just skips the code at the url goto line

Code: Select all

VERSION BUILD=8820413 RECORDER=FX
TAB T=1
'URL GOTO=https://us-street.api.smartystreets.com/street-address?auth-id=334384c1-46b9-f327-cd99-992659366164&auth-token=rPhLq2qs2I7pFR625Cy2&candidates=10&street=880%20duke%20rd&city=columbus&state=oh&zipcode=43213&match=invalid

TAG POS=1 TYPE=PRE ATTR=TXT:* EXTRACT=TXT
SET !VAR1 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
PROMPT Active_Status:<SP>_{{!VAR1}}_
PAUSE

'SET Active_Status EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('\"active\":'); y=x[1]; z=y.split('\"'); z[1];")
'PROMPT Active_Status:<SP>_{{Active_Status}}_
PAUSE

Yeah well, the 'URL GOTO' Line in my Script(s) is always commented out, you need to reactivate it, or to load the Page "manually", ah-ah...! :wink:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
Post Reply