Extract a URL from a link and then download PDF it links to?

iMacrosScripter
Posts: 8
Joined: Thu Apr 27, 2017 3:10 pm

Extract a URL from a link and then download PDF it links to?

Post by iMacrosScripter » Fri May 05, 2017 6:23 pm

iMacros: VERSION BUILD=8970419
OS: Windows 7
Browser: Firefox 47
Demos Work: Yes
VBS Scripting: Yes
URL: N/A
IMacros work on other version: N/A

I am trying to extract a URL from a link, and then tell iMacros to "download"/"save target as" the URL it extracts, and download the file from that URL.
Specifically, I am trying to retrieve the second link (the PDF file), named "*":
Image

When I use the "record" feature in iMacros and click on the link, it records the click as:

Code: Select all

TAG POS=1 TYPE=A ATTR=ID:URL$1
TAB T=2
(The second line reflects the PDF opening in a second tab.)

But when I mouse over the link, it shows the "name" of the link as "URLID" and the actual URL as "https://*/*.PDF".

The following code:

Code: Select all

TAG POS=1 TYPE=A ATTR=ID:URL$1 EXTRACT=HREF
lets me extract the actual URL as "https://*/*.PDF".

But what code do I use to tell iMacros to:
1) take the actual URL it extracted, and
2) "download"/"save target as" that URL, so that the PDF it points to (the URL ends in ".PDF") is actually downloaded as a file?

EDIT 1: Updated to edit out some extraneous information.
EDIT 2: Chivracq, I have updated my original post to remove some extraneous information. I would appreciate it if you would edit your reply to requote the updated post, so that the removed information is gone from the quote as well. Once your reply is edited, I can post the solution I found thanks to you, so others can benefit.
Last edited by iMacrosScripter on Wed Feb 28, 2018 12:25 pm, edited 5 times in total.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a URL from a link and then download PDF it links

Post by chivracq » Fri May 05, 2017 8:50 pm

iMacrosScripter wrote: (quote of the edited original post above, including EDIT 1 and EDIT 2)
EDIT 2018-03-01: Quote(s) adapted to reflect EDITs in OP... (To remove all URL's from Text and Screenshot...)

>>>

Original Reply:

I would think the 'SAVETARGETAS' 'EVENT' Parameter is what you want..., try this:

Code: Select all

ONDOWNLOAD FOLDER=* FILE=* WAIT=YES
TAG POS=1 TYPE=A ATTR=ID:URL$1 CONTENT=EVENT:SAVETARGETAS
Another Solution could be to use the 'SAVEITEM' Command in your 2nd Tab:

Code: Select all

TAG POS=1 TYPE=A ATTR=ID:URL$1
TAB T=2
ONDOWNLOAD FOLDER=* FILE=* WAIT=YES
SAVEITEM
WAIT SECONDS=1
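A third possibility, closer to the literal question, is to reuse the extracted HREF rather than clicking the link. The lines below are only a sketch and were not tested in this thread. They assume the link still has the ID 'URL$1', that Firefox is configured to offer PDFs for download instead of opening them, and '!VAR1' is just an arbitrary choice of built-in variable to hold the URL:

```
' Sketch only, untested here: extract the link target, then navigate to it
' Suppress the Extract pop-up while the macro runs
SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=A ATTR=ID:URL$1 EXTRACT=HREF
' Copy the extracted URL into a variable, then clear the extraction buffer
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL
' Arm the download handler before the navigation that triggers the download
ONDOWNLOAD FOLDER=* FILE=* WAIT=YES
URL GOTO={{!VAR1}}
```

If the browser renders the PDF inline instead of offering it for download, this approach has nothing for 'ONDOWNLOAD' to catch, so the two variants above may still be the better choice.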
>

(No need to open Duplicate Threads btw when you want to start a Thread, but I guess this was an involuntary Mistake... (I've deleted your Duplicate...))
Last edited by chivracq on Thu Mar 01, 2018 6:44 pm, edited 1 time in total.
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
iMacrosScripter
Posts: 8
Joined: Thu Apr 27, 2017 3:10 pm

Re: Extract a URL from a link and then download PDF it links

Post by iMacrosScripter » Thu Mar 01, 2018 2:02 pm

EDIT: Chivracq, Please edit your above reply to requote my updated edited original post as I have removed the extraneous information that needed to be removed.

The solution that solved this problem is listed below with thanks to Chivracq.

The key insight was that Firefox will not show a "save download" dialog unless the preference in Options>Applications is set to "Always ask". With any other setting, the PDF was not downloaded and instead opened directly in Firefox:
Firefox Options>Applications
Image

Script

Code: Select all

VERSION BUILD=8970419 RECORDER=FX
TAB T=1
URL GOTO=*URL*
TAG POS=1 TYPE=A ATTR=ID:*FOLDER1*
TAG POS=1 TYPE=A ATTR=TXT:*SUBFOLDER1*
FRAME NAME="*FrameName*"

FRAME F=0
TAG POS=1 TYPE=A ATTR=ID:*FOLDER1*
TAG POS=2 TYPE=A ATTR=TXT:*SUBFOLDER1*
FRAME NAME="*FrameName*"
WAIT SECONDS=1
TAG POS=1 TYPE=A ATTR=ID:*LinkName1*
WAIT SECONDS=2
TAG POS=1 TYPE=A ATTR=ID:*SubLinkName1*
WAIT SECONDS=2
ONDOWNLOAD FOLDER=* FILE=*filename* WAIT=YES
TAG POS=1 TYPE=A ATTR=ID:URL$1
WAIT SECONDS=2
Last edited by iMacrosScripter on Thu Mar 01, 2018 6:57 pm, edited 1 time in total.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract a URL from a link and then download PDF it links

Post by chivracq » Thu Mar 01, 2018 6:53 pm

iMacrosScripter wrote: EDIT: Chivracq, Please edit your above reply to requote my updated edited original post as I have removed the extraneous information that needed to be removed.
Yep, done..., even if that was a bit of a hassle..., and a bit useless in my Opinion as you keep posting your Screenshots on some external Image Hosting Server where your original Screenshot can still be found + about 300 Backups if they make 1 Backup per day, ah-ah...! :roll:
And your Data was "safe", anybody trying to access your '.PDF' Report would first need to log in to that PeopleSoft Server...
iMacrosScripter wrote: (quote of the solution and final script posted above)
OK, good, and Thanks for sharing the Solution and your final Script (10 months later, ah-ah...!), and yep, the "Always ask" is indeed mentioned in the FAQ... :D