SAVE PDF - Download not finished

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
COSMOS
Posts: 20
Joined: Thu Jul 28, 2016 5:11 am

SAVE PDF - Download not finished

Post by COSMOS » Thu Sep 01, 2016 6:12 am

Hey guys,

I am currently using iMacros V11.1.495.5175, Firefox 48, Windows 7 German.

I have a problem with a website where I am trying to download a PDF and the saved file is incomplete.
Instead of a PDF file I get a file of type "Datei" (German for "file", i.e. no recognized extension), and its size is smaller than it should be (e.g. 200 KB downloaded out of 344 KB).

My code is:

// XPath of the PDF's <a> tag (i is the list item index, defined elsewhere)
var xpath = "//ul[@class='techRefContentList']/li[" + i + "]/div/a";
// FOLDER=* is the default download folder. I use it because I know there are some issues with a custom folder (please tell me otherwise, I really need a custom folder)
var pdf_extract_macro = "CODE: ONDOWNLOAD FOLDER=* FILE=myfile.pdf WAIT=YES" + "\n";
// Save the PDF when the download begins
pdf_extract_macro += "TAG XPATH=" + '"' + xpath + '"' + " CONTENT=EVENT:SAVEITEM" + "\n";
iimPlay(pdf_extract_macro);

When I look in the download folder, I find a "Datei" file that I can't use, so I'm guessing the download stops midway.
The same code runs on two other websites and the PDF downloads fine on those.
On this particular site the download link is not in the href but in an onaction property. I don't know if that is the reason why the download is stopping.

What could I do to make it wait until the download has finished?

WAIT #ONDOWNLOADCOMPLETE# is not supported.
I was thinking of using !DOWNLOADED_SIZE to check whether the download is finished, but I don't know how to get the full size of the PDF to compare against.

I'm kind of stuck right now.

My workaround was to set the browser to save the PDF to the download folder when the download begins and just TAG the <a>.
This works, but I can't set the file's name, and it's a pain to go through all the PDFs to find the one I'm looking for.

Any help is greatly appreciated guys.

Thank you !
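
For readers following along, here is a minimal sketch of how the macro string from the post above could be assembled by a small helper. `buildDownloadMacro` and `imEscape` are hypothetical names, not part of iMacros; the sketch uses `EVENT:SAVETARGETAS` as suggested later in the thread, and it replaces spaces with `<SP>`, the iMacros notation for a space inside a parameter value. Whether a custom `FOLDER` works reliably with this iMacros/Firefox combination is not confirmed anywhere in the thread, so the example sticks to `FOLDER=*`.

```javascript
// Hypothetical helper: iMacros parameter values cannot contain raw spaces,
// so spaces are written as <SP>.
function imEscape(value) {
    return String(value).replace(/ /g, "<SP>");
}

// Hypothetical helper: builds the two-line macro from the post.
// ONDOWNLOAD has to come before the TAG that triggers the download.
function buildDownloadMacro(xpath, folder, fileName) {
    return "CODE:" + [
        "ONDOWNLOAD FOLDER=" + imEscape(folder) + " FILE=" + imEscape(fileName) + " WAIT=YES",
        "TAG XPATH=\"" + xpath + "\" CONTENT=EVENT:SAVETARGETAS"
    ].join("\n") + "\n";
}

var macro = buildDownloadMacro(
    "//ul[@class='techRefContentList']/li[1]/div/a",
    "*",            // default download folder
    "myfile.pdf"
);

// iimPlay only exists inside the iMacros scripting environment.
if (typeof iimPlay === "function") {
    iimPlay(macro);
}
```

Keeping the string building in one function makes it easier to swap the event type or the folder while testing, without touching the rest of the script.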
iimfun
Posts: 239
Joined: Tue Jul 19, 2016 1:06 pm

Re: SAVE PDF - Download not finished

Post by iimfun » Thu Sep 01, 2016 7:27 am

Perhaps you should try the SAVETARGETAS event ?
COSMOS
Posts: 20
Joined: Thu Jul 28, 2016 5:11 am

Re: SAVE PDF - Download not finished

Post by COSMOS » Thu Sep 01, 2016 7:50 am

I have tried with SAVEITEM and SAVETARGETAS. Both do the same thing and I end up with an incomplete file.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: SAVE PDF - Download not finished

Post by chivracq » Thu Sep 01, 2016 3:40 pm

COSMOS wrote:
I have a problem with a website...
Sorry, mention the Site or explain why you don't post the URL, I don't dig in such Threads anymore, sorry but I didn't read further, but good luck with @iimfun, you're lucky, he's good...! :roll:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
COSMOS
Posts: 20
Joined: Thu Jul 28, 2016 5:11 am

Re: SAVE PDF - Download not finished

Post by COSMOS » Fri Sep 02, 2016 5:26 am

I didn't post the site because I didn't think it was relevant.
I made this thread in case someone else has had a similar problem.
I don't expect people to do the work for me, that's my job :P ... I only need some advice.
The website is http://de.rs-online.com/
You can visit any article page and inspect the source code for the PDF link.
I'm interested to know whether it's a matter of using WAIT SECONDS= with a higher value than I have at the moment (I tried 30 seconds and got the same result), or whether it's something else. Something like this happened to me on another site when I was trying to save a PDF and used png instead by mistake.
That is not the case now. I have tried everything and I get the same result.
What strikes me is that this code works on two other sites that respond the same way as this one when clicking the PDF download link.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: SAVE PDF - Download not finished

Post by chivracq » Fri Sep 02, 2016 2:13 pm

Mentioning the Site/URL is always relevant if we want to do any Testing... And hum, I can't find any PDF Download, but never mind...

But hum, from reading your OP, isn't '!DOWNLOADED_SIZE' what you are looking for...?
Ah hum..., but I guess you won't know in advance the exact Size of the File you want to download, so that won't be very useful, I'm afraid...
COSMOS
Posts: 20
Joined: Thu Jul 28, 2016 5:11 am

Re: SAVE PDF - Download not finished

Post by COSMOS » Mon Sep 05, 2016 10:00 am

Exactly! I'm glad we are on the same page (literally :P).
I used !DOWNLOADED_SIZE when extracting the RoHS files because I was sometimes getting size 0, and that worked like a charm.
But in this case I indeed don't have anything to compare the downloaded size with.
I'm sure there must be a logical explanation for why it stops mid-download, and something I could do to prevent it.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: SAVE PDF - Download not finished

Post by chivracq » Mon Sep 05, 2016 10:51 pm

Yep-yep-yep...! And hum..., is there no Info on the Page about the (approx) Size of the File to download that could help you, either in plain Text on the Page or in some 'ALT/TITLE' Attribute that you could extract and then compare with the Value in '!DOWNLOADED_SIZE'? If that Info is not available from the Site, you could ask the Site Admin/Owner to provide it; it is common Practice anyway to provide that Info and/or the 'CHECKSUM'. And if you expect, let's say, that your PDFs are usually between 150-200Kb, the "Margin" for faulty Downloads would already shrink a lot.
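
The size comparison described above could look roughly like this. It is a sketch under assumptions: `parseSizeToBytes` and `isSizePlausible` are hypothetical helpers, the page is assumed to show the size as text such as "344 KB", a comma is treated as a decimal separator (German style), and the 5% tolerance is an arbitrary choice because displayed sizes are usually rounded.

```javascript
// Hypothetical helper: turns text like "344 KB" or "1,5 MB" into a byte count.
// Assumes a comma is a decimal separator, not a thousands separator.
function parseSizeToBytes(text) {
    var match = /([\d.,]+)\s*(bytes|kb|mb|gb|b)\b/i.exec(String(text));
    if (!match) {
        return null;
    }
    var number = parseFloat(match[1].replace(",", "."));
    if (isNaN(number)) {
        return null;
    }
    var factors = { b: 1, bytes: 1, kb: 1024, mb: 1024 * 1024, gb: 1024 * 1024 * 1024 };
    return Math.round(number * factors[match[2].toLowerCase()]);
}

// Hypothetical helper: true when the downloaded size is within `tolerance`
// (a fraction, e.g. 0.05 for 5%) of the size advertised on the page.
function isSizePlausible(downloadedBytes, expectedBytes, tolerance) {
    if (!(downloadedBytes > 0) || !(expectedBytes > 0)) {
        return false;
    }
    return Math.abs(downloadedBytes - expectedBytes) <= expectedBytes * tolerance;
}

// Numbers from the first post: 200 KB arrived out of 344 KB.
var expectedBytes = parseSizeToBytes("344 KB");
var truncatedOk = isSizePlausible(200 * 1024, expectedBytes, 0.05);
```

In a real script the first argument would come from `!DOWNLOADED_SIZE` and the size text from an extraction on the article page, if the site shows it at all.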

Maybe there is a way to use the 'CHECKSUM' Parameter from the 'ONDOWNLOAD' Command...
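
As a rough illustration of that idea: to the best of my knowledge the `ONDOWNLOAD` syntax is `CHECKSUM=MD5:hash` or `CHECKSUM=SHA1:hash` together with `WAIT=YES`, but this should be verified against the iMacros command reference before relying on it. The approach only helps if the site publishes the hash in advance. `buildChecksumDownloadLine` is a hypothetical helper and the hash in the example is a placeholder, not the hash of any real file from the thread.

```javascript
// Hypothetical helper: builds an ONDOWNLOAD line with a CHECKSUM parameter.
// Assumption: the syntax is CHECKSUM=MD5:<hex> or CHECKSUM=SHA1:<hex>.
function buildChecksumDownloadLine(folder, fileName, algorithm, hash) {
    var algo = String(algorithm).toUpperCase();
    if (algo !== "MD5" && algo !== "SHA1") {
        throw new Error("Unsupported checksum algorithm: " + algorithm);
    }
    if (!/^[0-9a-f]+$/i.test(String(hash))) {
        throw new Error("Checksum must be a hexadecimal string");
    }
    return "ONDOWNLOAD FOLDER=" + folder + " FILE=" + fileName +
        " WAIT=YES CHECKSUM=" + algo + ":" + hash;
}

// Placeholder hash for illustration only.
var checksumLine = buildChecksumDownloadLine(
    "*", "myfile.pdf", "md5", "d41d8cd98f00b204e9800998ecf8427e"
);
```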

Another Method you could use would be to do the Download twice (in 2 different Locations) and to compare the 2 '!DOWNLOADED_SIZE' of the 2 Downloads.
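
A sketch of the download-twice idea, with assumptions stated: `buildSizedDownloadMacro`, `downloadAndGetSize` and `sizesAgree` are hypothetical helpers, the size is read in the same macro through `SET !EXTRACT {{!DOWNLOADED_SIZE}}`, and `iimGetLastExtract()` is assumed to be available in the scripting environment. Note the limit of the method: two different sizes prove that at least one download is broken, but two equal sizes do not prove the file is complete, since both attempts could be cut off at the same point.

```javascript
// Hypothetical helper: download macro that also extracts the downloaded size.
function buildSizedDownloadMacro(xpath, folder, fileName) {
    return "CODE:" + [
        "ONDOWNLOAD FOLDER=" + folder + " FILE=" + fileName + " WAIT=YES",
        "TAG XPATH=\"" + xpath + "\" CONTENT=EVENT:SAVETARGETAS",
        "SET !EXTRACT {{!DOWNLOADED_SIZE}}"
    ].join("\n") + "\n";
}

// Hypothetical helper: true only when both sizes are the same positive number.
function sizesAgree(first, second) {
    var a = parseInt(first, 10);
    var b = parseInt(second, 10);
    return a > 0 && a === b; // NaN and 0 both fail the first test
}

// Only meaningful inside iMacros, where iimPlay and iimGetLastExtract exist.
function downloadAndGetSize(xpath, folder, fileName) {
    iimPlay(buildSizedDownloadMacro(xpath, folder, fileName));
    return iimGetLastExtract();
}

if (typeof iimPlay === "function" && typeof iimGetLastExtract === "function") {
    var linkXpath = "//ul[@class='techRefContentList']/li[1]/div/a";
    var sizeOne = downloadAndGetSize(linkXpath, "*", "check_1.pdf");
    var sizeTwo = downloadAndGetSize(linkXpath, "*", "check_2.pdf");
    if (!sizesAgree(sizeOne, sizeTwo)) {
        // At least one of the two downloads is incomplete; retry or log it here.
    }
}
```

The example uses two file names in the default folder rather than two folders, only to avoid depending on custom folders.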

Another Method could be, using some Browser Plugin/Add-on in FF to try to open your just downloaded PDF in a second TAB and try extracting some Content, if the File didn't download correctly, the Open File in TAB_2 and/or the Extract won't work.
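
A related but different check, swapped in here for the open-in-a-tab idea: a PDF normally starts with the marker `%PDF-` and ends with `%%EOF`, so a file that was cut off midway will usually be missing the end marker. The sketch below is only a heuristic, `looksLikeCompletePdf` is a hypothetical helper, and how the file's bytes are read depends on the environment (this is not something the iMacros macro language provides), so the function simply takes an array of byte values.

```javascript
// Converts a slice of a byte array to an ASCII string.
function bytesToAscii(bytes, start, end) {
    var out = "";
    var from = Math.max(0, start);
    var to = Math.min(bytes.length, end);
    for (var i = from; i < to; i++) {
        out += String.fromCharCode(bytes[i]);
    }
    return out;
}

// Hypothetical helper: heuristic completeness check for a PDF.
// Looks for "%PDF-" at the very start and "%%EOF" in the last 1024 bytes.
function looksLikeCompletePdf(bytes) {
    var head = bytesToAscii(bytes, 0, 5);
    var tail = bytesToAscii(bytes, bytes.length - 1024, bytes.length);
    return head === "%PDF-" && tail.indexOf("%%EOF") !== -1;
}
```

A file that passes is not guaranteed to be valid, but a file that fails is almost certainly truncated or not a PDF at all, which matches the "Datei" symptom from the first post.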
COSMOS
Posts: 20
Joined: Thu Jul 28, 2016 5:11 am

Re: SAVE PDF - Download not finished

Post by COSMOS » Wed Sep 07, 2016 1:02 pm

The only method that I found to work was to just TAG the PDF's specific <a> tag and to set the browser to save files to the downloads folder.
The only problem with this is that I can't set the file name, even with ONDOWNLOAD set up. The file is saved under a weird name like 090908870, and I can't tell which article it came from. Is there a way to do this?

Here is an example:

This is one of the links that I'm trying to download the PDF from: http://docs-europe.electrocomponents.co ... 02e7ef.pdf
If anyone manages to save this PDF in the iMacros downloads folder and manages to change its name, please enlighten me as to how you did it.

Thank you !
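
The thread ends without a confirmed answer to the naming question. For anyone trying it, the sketch below shows one way to derive a file name from the article number and to read back the name the file was actually saved under, so each article can at least be paired with its file. `buildPdfFileName` and `buildNamedDownloadMacro` are hypothetical helpers. It assumes the download passes through iMacros' own download handling; if the browser saves the file by itself, as in the workaround above, `ONDOWNLOAD FILE=` and `!DOWNLOADED_FILE_NAME` may simply not apply.

```javascript
// Hypothetical helper: turns an article number into a safe Windows file name.
function buildPdfFileName(articleId) {
    var base = String(articleId).replace(/\.pdf$/i, "");
    // Replace whitespace and characters that Windows forbids in file names.
    base = base.replace(/[\\\/:*?"<>|\s]+/g, "_");
    return base + ".pdf";
}

// Hypothetical helper: download macro that names the file after the article
// and extracts the saved file name for logging.
function buildNamedDownloadMacro(xpath, articleId) {
    return "CODE:" + [
        "ONDOWNLOAD FOLDER=* FILE=" + buildPdfFileName(articleId) + " WAIT=YES",
        "TAG XPATH=\"" + xpath + "\" CONTENT=EVENT:SAVETARGETAS",
        "SET !EXTRACT {{!DOWNLOADED_FILE_NAME}}"
    ].join("\n") + "\n";
}

// "123-4567" is a made-up article number used only for illustration.
var namedMacro = buildNamedDownloadMacro("//ul[@class='techRefContentList']/li[1]/div/a", "123-4567");
```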