Extract a URL from a link and then download the PDF it links to?


by iMacrosScripter on Fri May 05, 2017 11:23 am

iMacros: VERSION BUILD=8970419
OS: Windows 7
Browser: Firefox 47
Demos Work: Yes
VBS Scripting: Yes
URL: N/A
IMacros work on other version: N/A

I am trying to extract a URL from a link and then have iMacros "download"/"save target as" that URL, so that the file it points to is saved.
Specifically, I am trying to retrieve the second link (the PDF file), named "*" (screenshot not preserved).

When I use the "record" feature in iMacros and click on the link, it records the click as:
Code: Select all
TAG POS=1 TYPE=A ATTR=ID:URL$1
TAB T=2

(The second line switches to the second tab, where the PDF opens in the browser instead of downloading.)

But when I mouse over the link, it shows the "name" of the link as "URLID" and the actual URL as "https://*/*.PDF".

Using the code:
Code: Select all
TAG POS=1 TYPE=A ATTR=ID:URL$1 EXTRACT=HREF


Allows me to extract the actual URL as: "https://*/*.PDF"

But what code do I use to tell iMacros to:
1) take the URL it extracts, and
2) "download"/"save target as" that URL, so that the PDF it points to (the URL ends in ".PDF") is actually saved as a file?

EDIT 1: Updated to edit out some extraneous information.
EDIT 2: Chivracq, I updated my original post to remove some extraneous information. I would appreciate it if you would edit your reply to requote the updated post, so that the removed information no longer appears there. Once that is done, I can post the solution I found thanks to you, so others can benefit.
Last edited by iMacrosScripter on Wed Feb 28, 2018 5:25 am, edited 5 times in total.

Re: Extract a URL from a link and then download PDF it links

by chivracq on Fri May 05, 2017 1:50 pm

iMacrosScripter wrote: (original post quoted in full; see above)

EDIT 2018-03-01: Quote(s) adapted to reflect EDITs in OP... (To remove all URL's from Text and Screenshot...)

>>>

Original Reply:

I would think the 'SAVETARGETAS' 'EVENT' Parameter is what you want..., try this:
Code: Select all
ONDOWNLOAD FOLDER=* FILE=* WAIT=YES
TAG POS=1 TYPE=A ATTR=ID:URL$1 CONTENT=EVENT:SAVETARGETAS
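
As far as I know, 'FOLDER=*' means the default download folder and 'FILE=*' keeps the original file name. A minimal sketch with explicit values, where the folder 'C:\Reports' and the 'report_' prefix are only hypothetical examples to adapt, and '!NOW' is the built-in date/time variable:
Code: Select all
' Hypothetical target folder and file name; adjust both to your setup
ONDOWNLOAD FOLDER=C:\Reports FILE=report_{{!NOW:yyyymmdd}}.pdf WAIT=YES
' Same "save target as" event on the PDF link as above
TAG POS=1 TYPE=A ATTR=ID:URL$1 CONTENT=EVENT:SAVETARGETAS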


Another Solution could be to use the 'SAVEITEM' Command in your 2nd Tab:
Code: Select all
TAG POS=1 TYPE=A ATTR=ID:URL$1
TAB T=2
ONDOWNLOAD FOLDER=* FILE=* WAIT=YES
SAVEITEM
WAIT SECONDS=1
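
For completeness, a sketch of the literal "extract, then download" route asked about in the Question. It assumes the extracted HREF is an absolute URL and that navigating to it triggers a Download rather than the in-browser PDF Viewer, which depends on the Browser Settings. '!VAR1' is just an arbitrary choice of built-in Variable:
Code: Select all
' Extract the HREF of the PDF link into !EXTRACT
TAG POS=1 TYPE=A ATTR=ID:URL$1 EXTRACT=HREF
' Copy it to a variable and clear the extraction buffer
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL
' Define the download target, then navigate to the extracted URL
ONDOWNLOAD FOLDER=* FILE=* WAIT=YES
URL GOTO={{!VAR1}}
WAIT SECONDS=1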


>

(No need to open Duplicate Threads btw when you want to start a Thread, but I guess this was an involuntary Mistake... (I've deleted your Duplicate...))
Last edited by chivracq on Thu Mar 01, 2018 11:44 am, edited 1 time in total.
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

Re: Extract a URL from a link and then download PDF it links

by iMacrosScripter on Thu Mar 01, 2018 7:02 am

EDIT: Chivracq, please edit your reply above to requote my edited original post, as I have removed the extraneous information from it.

The solution that solved this problem is listed below with thanks to Chivracq.

The key insight was that Firefox will not show the "save to" download dialog unless the action for the file type under Options>Applications is set to "Always ask". With any other setting, the PDF opened directly in Firefox instead of downloading (screenshot of Firefox Options>Applications not preserved).

Script
Code: Select all
VERSION BUILD=8970419 RECORDER=FX
TAB T=1
URL GOTO=*URL*
' Navigate to the folder and subfolder that contain the report
TAG POS=1 TYPE=A ATTR=ID:*FOLDER1*
TAG POS=1 TYPE=A ATTR=TXT:*SUBFOLDER1*
FRAME NAME="*FrameName*"

FRAME F=0
TAG POS=1 TYPE=A ATTR=ID:*FOLDER1*
TAG POS=2 TYPE=A ATTR=TXT:*SUBFOLDER1*
FRAME NAME="*FrameName*"
WAIT SECONDS=1
' Open the link and sub-link that lead to the PDF
TAG POS=1 TYPE=A ATTR=ID:*LinkName1*
WAIT SECONDS=2
TAG POS=1 TYPE=A ATTR=ID:*SubLinkName1*
WAIT SECONDS=2
' Set the download target before clicking the PDF link
ONDOWNLOAD FOLDER=* FILE=*filename* WAIT=YES
' With "Always ask" set in Firefox, this click downloads the PDF
TAG POS=1 TYPE=A ATTR=ID:URL$1
WAIT SECONDS=2
Last edited by iMacrosScripter on Thu Mar 01, 2018 11:57 am, edited 1 time in total.

Re: Extract a URL from a link and then download PDF it links

by chivracq on Thu Mar 01, 2018 11:53 am

iMacrosScripter wrote: EDIT: Chivracq, Please edit your above reply to requote my updated edited original post as I have removed the extraneous information that needed to be removed.

Yep, done..., even if that was a bit of a hassle..., and a bit useless in my Opinion as you keep posting your Screenshots on some external Image Hosting Server where your original Screenshot can still be found + about 300 Backups if they make 1 Backup per day, ah-ah...! :roll:
And your Data was "safe", anybody trying to access your '.PDF' Report would first need to log in to that PeopleSoft Server...

iMacrosScripter wrote: (solution and final script quoted in full; see the previous post)

OK, good, and Thanks for sharing the Solution and your final Script (10 months later, ah-ah...!), and yep, the "Always ask" is indeed mentioned in the FAQ... :D