Download pdf/audio/video from page, click next page do same

by Program on Wed Jan 10, 2018 12:04 pm

Dear friends, I hope I can find some help here. This is my first time with iMacros (I know algorithms, but I've never used iMacros).

I am enrolled on a website with videos and study material in PDF, and I want to download them so I can watch and study offline.

So, I'm logged in to the site and I go to the course I want to download, link:
https: //www.****.com/course/cod/sc%6CTU1EaL7Z%9W/v/OQlrb2uhlnS%9W/c/KvcHnMerTdz%9W

I used the recording function of iMacros for Chrome to save each action, and repeated the actions on the following pages. I'll attach the script so you can understand.

On the page, I download the video with the Adobe HDS / HLS Video Saver add-on (without iMacros I can download any m3u8 with this add-on). The link to open the add-on is:
chrome-extension://pibndofbpkoaipoidbkephfhhnapkccn/download.html

Is it possible to record actions in another add-on, even if this add-on is just a new tab with specific functions?
On this page, clicking the link inside the add-on fetches the m3u8 file; the add-on joins the parts and downloads the video as a single file.
Then I click Clear and close the tab.

I go back to the first page and click to download the PDF booklet, then the PDF slides, and after that the audio file.

Then I click on the next page, which has the same structure with a new video and other materials, and I repeat the process on this new page.

However, the script is static. I want to make it dynamic, so that it keeps going through each page by clicking the "next page" button and downloads the materials on each one.

Another detail: the static script has some problems in this part:
URL GOTO=chrome-extension://pibndofbpkoaipoidbkephfhhnapkccn/download.html

iMacros adds "http://" and drops the ":" after "chrome-extension", so in the new tab it tries to open this instead:
http://chrome-extension//pibndofbpkoaip ... nload.html

1 - How do I open exactly "chrome-extension://pibndofbpkoaipoidbkephfhhnapkccn/download.html" in the new tab?

2 - Inside this add-on page, which is a new tab, iMacros does not seem to have recorded the click on the link that makes the add-on download the m3u8 and merge all the parts into a single file.
So, since the add-on page does not open (see the link above), I cannot download the m3u8 video. Is it possible to use this add-on with iMacros? If not, are there any other suggestions for downloading m3u8 with iMacros?

3 - After iMacros downloads the contents of the first page, it goes to the next page via the "next page" button, but instead of downloading the contents of the second page it downloads the contents of the first page again.
How do I make it follow the button and download the contents of the next page, not the first?

That is, I want to make this dynamic. I recorded the procedure 3 times so that the static macro is easy to understand and can be turned into a dynamic one.
So I want a dynamic macro that always clicks "next page" automatically, regardless of how many pages there are, and downloads the contents of each one.
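
One way to get this behaviour, sketched under assumptions: keep only the steps for a single page in the macro and run it with the Play Loop button, with Max set to the number of lessons. The TAG lines are taken from the recorded script; the 3-second wait is a guess. Leaving URL GOTO out of the looped macro matters, because a macro that starts with URL GOTO to the first lesson goes back there on every pass, which may be what happens in question 3.

```imacros
' One pass handles one lesson page.
' No URL GOTO here: each pass starts on the page the previous pass opened.
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>Booklet<SP>pdf
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>slide<SP>lesson
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>audiolesson
TAG POS=1 TYPE=A ATTR=TXT:Download<SP>lesson<SP>audio
TAG POS=1 TYPE=BUTTON ATTR=TXT:Close
' Move to the next lesson, then give the page time to load (3 s is a guess)
TAG POS=1 TYPE=A ATTR=TXT:Next<SP>page<SP>lesson
WAIT SECONDS=3
```

Open the first lesson by hand (or with a separate one-line macro) before starting the loop.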

In short, for the time being this static script downloads the PDFs and the audio normally. The only problems are the ones I mentioned: it does not download the contents of the next page, and I cannot download the m3u8 video with iMacros using the HLS add-on.
Script:
VERSION BUILD=1001 RECORDER=CR
URL GOTO=https://www.****.com.br/course/cod/sc%6CTU1EaL7Z%9W/v/OQlrb2uhlnS%9W/c/KvcHnMerTdz%9W
TAB OPEN
TAB T=2
URL GOTO=chrome://newtab/
URL GOTO=chrome-extension://pibndofbpkoaipoidbkephfhhnapkccn/download.html
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>Booklet<SP>pdf
TAB T=2
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>slide<SP>lesson
TAB T=2
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>audiolesson
TAG POS=1 TYPE=A ATTR=TXT:Download<SP>lesson<SP>audio
TAB T=2
TAB T=1
TAG POS=1 TYPE=BUTTON ATTR=TXT:Close
TAG POS=1 TYPE=A ATTR=TXT:Next<SP>page<SP>lesson
TAB OPEN
TAB T=2
URL GOTO=chrome://newtab/
URL GOTO=chrome-extension://pibndofbpkoaipoidbkephfhhnapkccn/download.html
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>Booklet<SP>pdf
TAB T=2
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>slide<SP>lesson
TAB T=2
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>audiolesson
TAG POS=1 TYPE=A ATTR=TXT:Download<SP>lesson<SP>audio
TAB T=2
TAB T=1
TAG POS=1 TYPE=BUTTON ATTR=TXT:Close
TAG POS=1 TYPE=A ATTR=TXT:Next<SP>page<SP>lesson
TAB OPEN
TAB T=2
URL GOTO=chrome://newtab/
URL GOTO=chrome-extension://pibndofbpkoaipoidbkephfhhnapkccn/download.html
TAB T=1
TAG POS=1 TYPE=B ATTR=TXT:Booklet
TAB T=2
TAB T=1
TAG POS=1 TYPE=SMALL ATTR=TXT:Download<SP>slide<SP>lesson
TAB T=2
TAB T=1
TAG POS=1 TYPE=A ATTR=TXT:Audiolesson<SP>Download<SP>audiolesson
TAG POS=1 TYPE=A ATTR=TXT:Download<SP>lesson<SP>audio
TAB T=2
TAB T=1
TAG POS=1 TYPE=BUTTON ATTR=TXT:Close
Program
 
Posts: 5
Joined: Wed Jan 10, 2018 9:05 am

Re: Download pdf/audio/video from page, click next page do s

by chivracq on Wed Jan 10, 2018 12:58 pm

Program wrote: Dear friends, I hope I can find some help here. This is my first time with iMacros (I know algorithms, but I've never used iMacros).


Hum..., nice to see a "High Quality" Post from time to time, you've put some nice Effort in your Post, I'm "impressed", ah-ah...! :D

But...: FCIM...! :mrgreen: (Read my Sig...)
=> iMacros for CR v10.0...?, CR v...?, OS...?

Make sure to use the ']CODE[' Meta-Tags around your Script like I did in my Quote to make the Thread easier to read... :idea:

OK, concerning your Qt(s), I'm not sure I follow everything, as I don't know the Add-on you are using and can't follow exactly all the Steps you take. But concerning the "chrome-extension" and "http" part, have a look at the following related Thread from about 1 year ago:
- [SOLVED] URL Goto appends HTTP:// to URL, can it be stopped?
I think in your Case you are encountering the same Pb as in this other Thread: the Dash ("-") in "chrome-extension" was forcing the "http://" on iMacros for FF, and I reckon iMacros for CR will act the same way...

The User in that Thread went for a Solution involving modifying the Add-on (for FF), which is quite a "drastic" Workaround and pretty High Level, but I had mentioned several other possible Solutions/Workarounds that might work in your Case, even if most were rather meant for iMacros for FF.
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6957
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Download pdf/audio/video from page, click next page do s

by chivracq on Mon Jan 15, 2018 5:21 pm

Hum..., a bit less "impressed" by the (lack of) Follow-up, 5 or 6 days later... I guess mentioning 3 Versions about your Environment was probably a bit too complicated, ah-ah...!, OK, never mind... :(

For those interested, parallel Thread on SOF:
- Download pdf/audio/video from page, click next page and do same dynamically
(No Replies/Answers, except one Comment about some incorrect Meta-Tag... which got lucky to get some Follow-up... :wink: )

Re: Download pdf/audio/video from page, click next page do s

by Program on Tue Jan 16, 2018 10:38 am

Dear chivracq and everyone on the forum,

Sorry, I had a problem and am only back now. Thanks for everything.

iMacros version: 10.0.1
Chrome: Version 63.0.3239.132
OS: Windows 10 64-bit

Over the last few days I kept trying other approaches, so it took me a long time to get back. I hope my question helps other people.
Sorry if my post is long; I always try to be very detailed when asking questions online, so that everyone can benefit from the knowledge that only a forum can bring through mutual help.

I decided to use another approach (so forget the first post; this one is similar but simpler) and will no longer use the HLS add-on that I mentioned in the first post.

I tried again, and here is a screenshot to show better how the site and its buttons look:
https://drive.google.com/file/d/1KbQ0Is ... sp=sharing

Code: Select all
TAG POS=1 TYPE=B ATTR=TXT:Lesson<SP>01<SP>-<SP>Adminis
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor-superior<SP>fa-file-text&&TXT:
TAB T=2
TAB T=1
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor-superior<SP>fa-desktop&&TXT:
TAB T=2
TAB T=1
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor-superior<SP>fa-volume-up&&TXT:
TAG POS=1 TYPE=A ATTR=TXT:Download<SP>lesson<SP>audio
TAB T=2
TAB T=1
TAG POS=1 TYPE=BUTTON ATTR=TXT:Close
TAB OPEN
TAB T=2
REFRESH
TAB T=1
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-arrow-right&&TXT:


I put the macro above in a loop and it works OK, with just a few problems that I describe below.

The first command (step 1 in the screenshot):
Code: Select all
TAG POS=1 TYPE=B ATTR=TXT:Lesson<SP>01<SP>-<SP>Adminis


Here I select the title of the video and press Ctrl+C to copy the name. I would like to save that name somewhere, for example in a CSV file.
Question 1: How can I do this?
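
A minimal sketch of one way to capture the title, assuming the lesson titles all start with "Lesson" so that a * wildcard can stand in for the rest of the recorded text. EXTRACT=TXT puts the element's text into !EXTRACT; the PROMPT line only displays it for checking and can be removed once it works. Writing the value to a file is then a job for SAVEAS TYPE=EXTRACT.

```imacros
' Do not show the extraction dialog on every pass
SET !EXTRACT_TEST_POPUP NO
' Same element as the recorded command, with a wildcard instead of the fixed title
TAG POS=1 TYPE=B ATTR=TXT:Lesson* EXTRACT=TXT
' Display the extracted title to check it (debugging aid only)
PROMPT {{!EXTRACT}}
```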

With these commands I click the download buttons for the PDF lesson, the PDF slides and the audio (step 2 in the screenshot):
Code: Select all
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor-superior<SP>fa-file-text&&TXT:

and
Code: Select all
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor-superior<SP>fa-desktop&&TXT:

and
Code: Select all
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor-superior<SP>fa-volume-up&&TXT:
TAG POS=1 TYPE=A ATTR=TXT:Baixar<SP>aula<SP>em<SP>áudio
TAB T=2
TAB T=1
TAG POS=1 TYPE=BUTTON ATTR=TXT:Fechar


But there is a problem: I only recorded the actions, and not all pages have these 3 buttons; some have only two.
Some pages do not have "fa-file-text", only "fa-desktop" and "fa-volume-up", so at some point the script fails with:

"RuntimeError: element I specified by CLASS:fa<SP>fa-4x<SP>cor-superior<SP>fa-file-text&&TXT: was not found, line: 2"

So I need a conditional: when the script does not find a specific button, it should move on to the next command.
Question 2: How do I do this?


Before step 3, I try to open the source code of the page to search for a word and copy the entire line that contains it.
I tried to record this action, but iMacros did not register the Ctrl+U shortcut that opens the source view.
iMacros only recorded:
Code: Select all
TAB OPEN
TAB T=2
REFRESH


When opening the page:
view-source:https://www.****.com/course/cod/sc%6CTU1EaL7Z%9W/v/OQlrb2uhlnS%9W/c/KvcHnMerTdz%9W

I press "ctrl + f" and search for the word:
"m3u8"

And then I find the line:

file: 'https://*****/04918i/t/0y018376t3q37c1m09e8265q476mjy143/t814e3rv4g75w28j34mv14bbyb8624kq/mp4:t814e3rv4g75w28j34mv14bbyb8624kq.mp4/chunklist.m3u8',

I want to copy this whole line and save it in the CSV (before step 3), in a column next to the title on the same row, and then on the next page repeat the process and save the title and m3u8 on the row below.
Question 3: How can I do this? Which command?

With the command below I click the button that moves to the next page (then the loop repeats the process).
(Step 3 in the screenshot):

Code: Select all
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-arrow-right&&TXT:


Almost everything works: I can download the PDF lesson, the PDF slides and the audio across multiple pages in a loop, until I reach a page that lacks one of these buttons; so, as I said above, I need a conditional.
I also still cannot copy the page title, nor the line in the page source that contains the word m3u8.


Thanks in advance
Last edited by Program on Thu Jan 18, 2018 9:36 am, edited 1 time in total.

Re: Download pdf/audio/video from page, click next page do s

by Program on Wed Jan 17, 2018 5:39 pm

Dear all,

I continued my research and resolved all my questions. Now I can do what I described in the previous post, and I will explain each step of my solution.

The only problem I still have is this:
RuntimeError: SAVEAS requires File IO interface installed, line: 10

I'm using the iMacros Chrome add-on from here:
https://chrome.google.com/webstore/deta ... joopmnlemp

I already installed this:
http://download.imacros.net/iMacros-for ... up-Win.exe

But it is still not working.
Because of this, I don't know whether my CSV saves each value in its own column and, in the loop, each page in its own row.
Can anyone help me with this final question?

To help others I will share my script and explain everything:

Code: Select all
SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP NO
TAG SELECTOR="#play-aula>DIV>H3>B" EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SAVEAS TYPE=EXTRACT FOLDER=C:\test\extract FILE=test.csv
SET !EXTRACT NULL
SEARCH SOURCE=REGEXP:"(http.*?\.m3u8[^&",]+)" EXTRACT=""
SAVEAS TYPE=EXTRACT FOLDER=C:\test\extract FILE=test.csv
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor<SP>fa-file-text&&TXT:
TAB T=2
TAB T=1
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor<SP>fa-desktop&&TXT:
TAB T=2
TAB T=1
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor<SP>fa-volume-up&&TXT:
TAG POS=1 TYPE=A ATTR=TXT:Down<SP>lesson<SP>audio
TAB T=2
TAB T=1
TAG POS=1 TYPE=BUTTON ATTR=TXT:Fechar
TAB T=2
TAB T=1
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-arrow-right&&TXT:


When the next page does not have the same PDF download buttons as the previous page - SOLUTION:
SET !ERRORIGNORE YES
This ignores the errors and goes on to the next command.
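
A sketch of how this can be scoped, with values that are assumptions: !TIMEOUT_STEP shortens the time iMacros waits for a missing element before giving up (1 second here), and !ERRORIGNORE is switched back to NO after the optional buttons so that genuine errors later in the macro are still reported.

```imacros
' Optional buttons: continue with the next command if one is missing
SET !ERRORIGNORE YES
' Wait at most 1 second per element before giving up (assumed value, tune as needed)
SET !TIMEOUT_STEP 1
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor<SP>fa-file-text&&TXT:
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor<SP>fa-desktop&&TXT:
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-4x<SP>cor<SP>fa-volume-up&&TXT:
' From here on, stop on errors again
SET !ERRORIGNORE NO
TAG POS=1 TYPE=I ATTR=CLASS:fa<SP>fa-arrow-right&&TXT:
```

With errors reported again for the last command, a loop would stop when the "next page" arrow is missing on the final page instead of silently repeating.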

To get the title I tried the other recording mode, so I mix the two ways of recording. This TAG SELECTOR command works 100% for me:
TAG SELECTOR="#play-aula>DIV>H3>B" EXTRACT=TXT

To save it in a variable I use this:
SET !VAR1 {{!EXTRACT}}
and to save it in the CSV, I think I must use this (but I have the problem I described above):
SAVEAS TYPE=EXTRACT FOLDER=C:\test\extract FILE=test.csv

Afterwards I clear the variable:
SET !EXTRACT NULL

and to get the info from the page source I use this command:
SEARCH SOURCE=REGEXP:"(http.*?\.m3u8[^&",]+)" EXTRACT=""
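
On the sample line from the previous post, the URL is followed by an apostrophe, and the class [^&",]+ does not exclude apostrophes, so the match would end in chunklist.m3u8' with the quote attached. A hedged alternative, assuming the URL always ends in m3u8 and sits inside single quotes as in that sample: stop at the first apostrophe and return the capture group with EXTRACT="$1". The pattern deliberately avoids backslashes and inner double quotes, so nothing needs escaping inside the quoted string.

```imacros
' Capture from http(s):// up to the m3u8 ending, never crossing an apostrophe
' (assumes the source line looks like: file: 'https://.../chunklist.m3u8',)
SEARCH SOURCE=REGEXP:"(https?://[^']+m3u8)" EXTRACT="$1"
```

If the real URLs carry a query string after m3u8, this variant would cut it off, so the pattern would need adjusting.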

If anyone needs more info on how to use REGEXP, I use the site regex101.com to try out many options. After that I try to save to the CSV:
SAVEAS TYPE=EXTRACT FOLDER=C:\test\extract FILE=test.csv

I don't know whether iMacros will automatically save this in another column of the same row, because I still need to solve the problem of creating the CSV with iMacros.
Does anyone know how to solve this?
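
As far as I know, each SAVEAS TYPE=EXTRACT appends one row to the file, with every value currently held in !EXTRACT as its own column, and then empties !EXTRACT. Under that assumption, two SAVEAS calls per page give two rows rather than two columns. A sketch that extracts both values first and saves once per loop pass, so that title and URL land in the same row (the SEARCH pattern is a simplified, hypothetical variant of the one above, and the File IO interface still has to be working for SAVEAS to run):

```imacros
SET !EXTRACT_TEST_POPUP NO
' First value: lesson title
TAG SELECTOR="#play-aula>DIV>H3>B" EXTRACT=TXT
' Second value: m3u8 URL from the page source, added to !EXTRACT after the title
SEARCH SOURCE=REGEXP:"(https?://[^']+m3u8)" EXTRACT="$1"
' One SAVEAS per page: one CSV row holding both values
SAVEAS TYPE=EXTRACT FOLDER=C:\test\extract FILE=test.csv
```

Run in a loop, each pass should then add one new row below the previous one.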

Thanks in advance