SAVE PDF - Download not finished


by COSMOS on Wed Aug 31, 2016 11:12 pm

Hey guys,

I am currently using iMacros V11.1.495.5175, Firefox 48, Windows 7 German.

I have a problem with a website where I am trying to download a PDF: the saved file is incomplete.
Instead of a PDF file, I get a file whose type Windows shows as "Datei" (German for "File", i.e. no .pdf extension), and its size is smaller than it should be (e.g. 200 KB downloaded out of 344 KB).

My code is:

var xpath = "//ul[@class='techRefContentList']/li[" + i + "]/div/a"; // XPath of the <a> tag that links to the PDF (i is the loop index)
var pdf_extract_macro = "CODE: ONDOWNLOAD FOLDER=* FILE=myfile.pdf WAIT=YES" + "\n"; // FOLDER=* = default download folder
pdf_extract_macro += "TAG XPATH=" + '"' + xpath + '"' + " CONTENT=EVENT:SAVEITEM" + "\n"; // save the PDF when the download begins
iimPlay(pdf_extract_macro);

I set FOLDER=* (the default download folder) because I know there are some issues when I try to set a specific folder. Please tell me otherwise, I really need a specific folder.

When I look in the download folder, I find a "Datei" file that I can't use, so I'm guessing the download stops midway.
The same code runs on two other websites, and the PDF downloads fine on those.
On this particular site the download link is not in the href but in an onaction property. I don't know if that is why the download stops.

What can I do to make it wait until the download has finished?

WAIT #ONDOWNLOADCOMPLETE# is not supported.
I was thinking of using !DOWNLOADED_SIZE to check whether the download has finished, but I don't know how to get the full size of the PDF to compare it with.

I'm kind of stuck right now.

My workaround was to set the browser to save PDFs to the download folder automatically when the download begins, and just TAG the <a>.
This works, but I can't set the file name, and it's a pain to go through all the PDFs to find the one I'm looking for.
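A minimal sketch of giving each download its own name, assuming the loop index `i` (or an article number scraped from the page) is enough to identify the article. `safeFileName`, `buildSaveMacro` and `articleCount` are hypothetical; `FOLDER=`, `FILE=` and `WAIT=YES` are the ONDOWNLOAD parameters from the code above, and writing spaces as `<SP>` assumes iMacros' convention for spaces in parameter values. It only helps where ONDOWNLOAD is actually applied to the download.

```javascript
// Hypothetical helper: turn an article id into a file name Windows accepts.
function safeFileName(articleId) {
  return String(articleId)
    .replace(/[\\\/:*?"<>|]/g, "_") // characters not allowed in Windows file names
    .replace(/\s+/g, "_") + ".pdf"; // avoid spaces in the FILE= value
}

// Hypothetical helper: build the iMacros code for one PDF download.
// folder: "*" (default download folder) or a path.
function buildSaveMacro(xpath, articleId, folder) {
  // Assumption: spaces inside iMacros parameter values are written as <SP>.
  var folderParam = (folder || "*").replace(/ /g, "<SP>");
  return "CODE:" +
    "ONDOWNLOAD FOLDER=" + folderParam + " FILE=" + safeFileName(articleId) + " WAIT=YES\n" +
    'TAG XPATH="' + xpath + '" CONTENT=EVENT:SAVEITEM\n';
}

// Only play the macros when running inside iMacros.
if (typeof iimPlay === "function") {
  var articleCount = 3; // hypothetical: number of <li> entries on the page
  for (var i = 1; i <= articleCount; i++) {
    var xpath = "//ul[@class='techRefContentList']/li[" + i + "]/div/a";
    iimPlay(buildSaveMacro(xpath, "article_" + i, "*"));
  }
}
```

Keeping the string building in a plain function makes it easy to print and inspect the generated macro before handing it to iimPlay.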

Any help is greatly appreciated guys.

Thank you!
COSMOS
 
Posts: 20
Joined: Wed Jul 27, 2016 10:11 pm

Re: SAVE PDF - Download not finished

by iimfun on Thu Sep 01, 2016 12:27 am

Perhaps you should try the SAVETARGETAS event?
iimfun
 
Posts: 239
Joined: Tue Jul 19, 2016 6:06 am

Re: SAVE PDF - Download not finished

by COSMOS on Thu Sep 01, 2016 12:50 am

I have tried both SAVEITEM and SAVETARGETAS. Both do the same thing, and I end up with an incomplete file.

Re: SAVE PDF - Download not finished

by chivracq on Thu Sep 01, 2016 8:40 am

COSMOS wrote:I have a problem with a website...

Sorry, please mention the Site or explain why you don't post the URL. I don't dig into such Threads anymore, so I didn't read further, but good luck with @iimfun, you're lucky, he's good...! :roll:
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6479
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: SAVE PDF - Download not finished

by COSMOS on Thu Sep 01, 2016 10:26 pm

I didn't post the site because I didn't think it was relevant.
I made this thread in case someone else has had a similar problem.
I don't expect people to do the work for me (that's my job :P), I only need some advice.
The website is http://de.rs-online.com/
You can visit any article page and inspect the source code for the PDF link.
I'd like to know whether it just needs a longer WAIT SECONDS= than I have at the moment (I tried 30 seconds with the same result) or whether it's something else. This happened to me once on another site when I was trying to save a PDF and had used png instead by mistake.
That is not the case now. I have tried everything and I get the same result.
What strikes me is that this code works on 2 other sites that respond the same way as this one when the PDF download link is clicked.

Re: SAVE PDF - Download not finished

by chivracq on Fri Sep 02, 2016 7:13 am

COSMOS wrote:I didn't post the site because I didn't think it was relevant.

Mentioning the Site/URL is always relevant if we want to do any Testing... And hum, I can't find any PDF Download, but never mind...

But hum, from reading your OP, isn't '!DOWNLOADED_SIZE' what you are looking for...?
Ah hum..., but I guess you won't know in advance the exact Size of the File you want to download, so that won't be very useful, I'm afraid...

Re: SAVE PDF - Download not finished

by COSMOS on Mon Sep 05, 2016 3:00 am

chivracq wrote:Ah hum..., but I guess you won't know in advance the exact Size of the File you want to download, so that won't be very useful, I'm afraid...

Exactly! I'm glad we're on the same page (literally :P).
I used !DOWNLOADED_SIZE when extracting the RoHS because I was sometimes getting size 0, and that worked like a charm.
But in this case, indeed, I don't have anything to compare the downloaded size with.
I'm sure there must be a logical explanation for why it stops mid-download and for what I could do to prevent it.

Re: SAVE PDF - Download not finished

by chivracq on Mon Sep 05, 2016 3:51 pm

COSMOS wrote:But in this case, indeed I don't have anything to compare the downloaded size with.

Yep-yep-yep...! And hum..., is there no Info on the Page about the (approx.) Size of the File to download that could help you, either in plain Text or in some 'ALT'/'TITLE' Attribute that you could extract and then compare with the Value in '!DOWNLOADED_SIZE'? If that Info is not available on the Site, ask the Site Admin/Owner to provide it; it's common Practice anyway to provide that Info and/or the 'CHECKSUM'. And if you expect, let's say, that your PDFs are usually between 150-200Kb, the "Margin" for faulty Downloads would already shrink a lot.
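A minimal sketch of that size comparison, assuming the page shows a size such as "344 KB" that can be EXTRACTed, and assuming '!DOWNLOADED_SIZE' reports bytes (worth checking in the iMacros docs for your version). `parseSizeText`, `downloadLooksComplete` and the 10% tolerance are hypothetical choices; the min/max range stands in for the "150-200Kb" fallback when no size is shown.

```javascript
// Convert size text such as "344 KB", "1,5 MB" or "200kb" to bytes.
// Assumes 1 KB = 1024 bytes and a comma as decimal separator (German pages).
// Returns NaN if no size is found.
function parseSizeText(text) {
  var m = /([\d.,]+)\s*(bytes?|b|kb|mb)\b/i.exec(String(text));
  if (!m) return NaN;
  var value = parseFloat(m[1].replace(",", "."));
  var unit = m[2].toLowerCase();
  var factor = unit === "mb" ? 1024 * 1024 : unit === "kb" ? 1024 : 1;
  return Math.round(value * factor);
}

// Decide whether a finished download looks complete.
// downloadedBytes: the value of !DOWNLOADED_SIZE (assumed to be in bytes).
// expected: { bytes: n } from the page, or { minBytes: a, maxBytes: b } as a fallback range.
// tolerance: slack for sizes rounded on the page (hypothetical default 10%).
function downloadLooksComplete(downloadedBytes, expected, tolerance) {
  var tol = tolerance === undefined ? 0.1 : tolerance;
  if (!(downloadedBytes > 0)) return false; // 0 or NaN: nothing usable was saved
  if (expected.bytes > 0) {
    return downloadedBytes >= expected.bytes * (1 - tol);
  }
  return downloadedBytes >= expected.minBytes && downloadedBytes <= expected.maxBytes;
}
```

With the numbers from the first post, 200 KB against a page value of "344 KB" would be reported as incomplete, while a file a few bytes short of a rounded "344 KB" would still pass.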

Maybe there is a way to use the 'CHECKSUM' Parameter from the 'ONDOWNLOAD' Command...
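If the site did publish a hash, it could go straight into the ONDOWNLOAD line. A sketch, assuming the `CHECKSUM=MD5:<hash>` / `CHECKSUM=SHA1:<hash>` form described in the iMacros ONDOWNLOAD documentation (verify against your version; as I understand it, iMacros then stops with an error when the file doesn't match). `buildOnDownload` is a hypothetical helper, and it only helps when the correct hash is known before downloading.

```javascript
// Hypothetical helper: build an ONDOWNLOAD line, adding CHECKSUM only when a hash is known.
// algo: "MD5" or "SHA1"; hash: hex digest published by the site (assumption).
function buildOnDownload(fileName, algo, hash) {
  var line = "ONDOWNLOAD FOLDER=* FILE=" + fileName + " WAIT=YES";
  if (algo && hash) {
    // MD5 digests are 32 hex characters, SHA1 digests 40.
    var expectedLength = { MD5: 32, SHA1: 40 }[algo.toUpperCase()];
    if (!expectedLength || !new RegExp("^[0-9a-f]{" + expectedLength + "}$", "i").test(hash)) {
      throw new Error("Unexpected " + algo + " hash: " + hash);
    }
    line += " CHECKSUM=" + algo.toUpperCase() + ":" + hash;
  }
  return line;
}
```

Validating the digest length up front catches a badly scraped hash before it turns into a confusing checksum error at download time.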

Another Method you could use would be to do the Download twice (in 2 different Locations) and compare the 2 '!DOWNLOADED_SIZE' Values of the 2 Downloads.
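A sketch of the download-twice idea as a retry loop. How '!DOWNLOADED_SIZE' is read back into the script depends on the iMacros version, so it is left to a hypothetical `downloadOnce(attempt)` callback that plays the download macro (ideally with a different FILE= per attempt) and returns the size; `downloadUntilStable` and `maxAttempts` are also hypothetical.

```javascript
// Retries until two consecutive downloads report the same non-zero size,
// or gives up after maxAttempts.
// downloadOnce(attempt): hypothetical callback returning !DOWNLOADED_SIZE as a number.
function downloadUntilStable(downloadOnce, maxAttempts) {
  var previous = null;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    var size = Number(downloadOnce(attempt));
    if (size > 0 && size === previous) {
      return { ok: true, size: size, attempts: attempt };
    }
    previous = size > 0 ? size : null; // a 0-byte result never counts as a match
  }
  return { ok: false, size: previous, attempts: maxAttempts };
}
```

One limitation: if the server cuts the file off at the same byte count every time, two downloads will agree on the wrong size, so this is best combined with a size range or a content check.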

Another Method could be to use some Browser Plugin/Add-on in FF to open your just-downloaded PDF in a second TAB and try extracting some Content: if the File didn't download correctly, opening it in TAB 2 and/or the Extract won't work.
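A different way to test the file, swapped in for the second-TAB idea: check its structure directly with Node.js outside the browser. This is a heuristic sketch, assuming a complete PDF starts with "%PDF-" and ends with an "%%EOF" marker (possibly followed by a line break); `looksLikeCompletePdf` is hypothetical, and it can give false negatives for files with extra bytes after the marker.

```javascript
// Heuristic check for a truncated PDF (Node.js, outside iMacros).
// A complete PDF starts with "%PDF-" and ends with an "%%EOF" marker,
// optionally followed by whitespace such as a final line break.
function looksLikeCompletePdf(bytes) {
  // bytes: a Node.js Buffer, e.g. from fs.readFileSync(filePath)
  if (bytes.length < 8) return false;
  if (bytes.subarray(0, 5).toString("latin1") !== "%PDF-") return false;
  // Only the last 1 KB is searched for the final %%EOF marker.
  var tail = bytes.subarray(Math.max(0, bytes.length - 1024)).toString("latin1");
  return /%%EOF\s*$/.test(tail);
}

// Usage (the path is hypothetical):
// var fs = require("fs");
// console.log(looksLikeCompletePdf(fs.readFileSync("C:\\Downloads\\myfile.pdf")));
```

Because it reads the bytes rather than the extension, it works the same on the "Datei" files without a .pdf extension.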

Re: SAVE PDF - Download not finished

by COSMOS on Wed Sep 07, 2016 6:02 am

The only method I found to work was to just TAG that PDF-specific <a> tag and set the browser to save files to the downloads folder.
The only problem with this is that I can't set the file name, even with ONDOWNLOAD set up. It saves the file under a weird name like 090908870, so I can't tell which article it came from. Is there a way to set the name?

Here is an example:

This is one of the links that I'm trying to download the pdf from: http://docs-europe.electrocomponents.co ... 02e7ef.pdf
If anyone manages to save this PDF in the iMacros downloads folder and change its name, please enlighten me as to how you did it.
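Since FILE= isn't applied in the TAG-only workaround, one possible approach is to rename the file after it lands. A sketch under assumptions: it runs with Node.js outside iMacros, one download at a time, after the download has finished, and the article number for `newName` comes from the page. `renameNewestDownload` and the example folder are hypothetical; treating ".part" files as "Firefox is still downloading" is also an assumption.

```javascript
var fs = require("fs");
var path = require("path");

// Rename the most recently modified file in downloadDir to newName.
// Returns the new path, or null if nothing was renamed.
function renameNewestDownload(downloadDir, newName) {
  var names = fs.readdirSync(downloadDir);
  // Firefox keeps unfinished downloads in ".part" files: don't rename anything yet.
  if (names.some(function (name) { return /\.part$/i.test(name); })) return null;
  var newest = null;
  names.forEach(function (name) {
    var full = path.join(downloadDir, name);
    var stat = fs.statSync(full);
    if (!stat.isFile()) return; // skip sub-folders
    if (!newest || stat.mtimeMs > newest.mtimeMs) {
      newest = { path: full, mtimeMs: stat.mtimeMs };
    }
  });
  if (!newest) return null; // empty folder
  var target = path.join(downloadDir, newName);
  fs.renameSync(newest.path, target);
  return target;
}

// Usage (folder and name are hypothetical):
// renameNewestDownload("C:\\Users\\me\\Downloads", "article_123456.pdf");
```

Because it simply picks the newest file, it should not run while another download is writing to the same folder.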

Thank you!

