Scraping dynamic list of links with download filename

Discussions and Tech Support related to automating the iMacros Browser or Internet Explorer from any scripting and programming language, such as VBS (WSH), VBA, VB, Perl, Delphi, C# or C++.

by thinkbui on Wed Jan 03, 2018 1:29 pm

This might be a dumb question, but is iMacros capable of saving all links of a particular class on a page using the filenames specified in their "download" attribute? Basically, I'm using an eBay image scraping tool to generate a list of links to each of a listing's images at maximum resolution, but because of CORS limitations, I must right-click each link to get the save dialog box to use the filename in the "download" attribute. This is fine if there are only a few images, but it gets tedious when a listing has more than a dozen images, so I'm hoping to use a macro to automate this. Is iMacros what I want? How would I go about ensuring that the specified filenames are used?

The HTML code looks like this:

<a href="https://i.ebayimg.com/images/g/l5kAAOSwa81aA2vI/s-l1600.jpg" class="img_link" id="img_link_0" download="322871381840 s-l1600 0.jpg">322871381840 s-l1600 0.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/KBgAAOSwRbtaA2sm/s-l1600.jpg" class="img_link" id="img_link_1" download="322871381840 s-l1600 1.jpg">322871381840 s-l1600 1.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/CgwAAOSwQcJaA2yl/s-l1600.jpg" class="img_link" id="img_link_2" download="322871381840 s-l1600 2.jpg">322871381840 s-l1600 2.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/gSQAAOSwY~1aA2pA/s-l1600.jpg" class="img_link" id="img_link_3" download="322871381840 s-l1600 3.jpg">322871381840 s-l1600 3.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/xE8AAOSwVtZaA2pD/s-l1600.jpg" class="img_link" id="img_link_4" download="322871381840 s-l1600 4.jpg">322871381840 s-l1600 4.jpg</a>
<br>
.
.
.
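For reference, collecting the (link, filename) pairs from markup like the snippet above takes only a few lines outside the browser. The sketch below is plain Python using the standard-library `html.parser`, not iMacros code; it assumes the links look like the ones shown (class `img_link`, filename in `download`), and `ImgLinkParser` and `extract_img_links` are hypothetical names.

```python
from html.parser import HTMLParser


class ImgLinkParser(HTMLParser):
    """Collect (href, download) pairs from <a> tags carrying a given class."""

    def __init__(self, target_class="img_link"):
        super().__init__()
        self.target_class = target_class
        self.links = []  # (href, download filename or None), in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attr_map.get("class") or "").split()
        if self.target_class in classes and attr_map.get("href"):
            # download is None when the attribute is missing or has no value.
            self.links.append((attr_map["href"], attr_map.get("download")))


def extract_img_links(html, target_class="img_link"):
    """Return every matching link as an (href, download) tuple."""
    parser = ImgLinkParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.links
```

Only attributes are read, so the link text (which here happens to repeat the filename) is ignored, and links without the target class are skipped.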
thinkbui
 
Posts: 2
Joined: Wed Jan 03, 2018 12:47 pm

Re: Scraping dynamic list of links with download filename

by chivracq on Thu Jan 04, 2018 4:20 am

Try to select the correct Sub-Forum when you open a Thread: your Post has nothing to do with the "Scripting Interface" (and you probably don't know what it is). The correct Sub-Forum would have been 'Data Extraction', or 'General' when in doubt... (But no need to duplicate it now; your Thread will get moved to the 'Data Extraction' Sub-Forum one day...)

"Dumb Question"...? Well, a little bit indeed: if you found this Forum, that means you already have/know the Answer...
=> Yep, typical Scenario and Use Case for iMacros... => The Answer is YES...! :D
iMacros can extract absolutely anything contained in the HTML Source Code of any Web-Page. If you can open that Page in one of the 4 Browsers supported by iMacros, then you can extract (any part of) its Content...

OK, I was going to ask a "stupid Question" myself, as I'd never heard of "CORS (Limitations)", but OK, I googled it... :wink:
I'd never come across the term "CORS" (Cross-Origin Resource Sharing); I've always seen this referred to as "Cross Domain" or "3rd Party" Elements/Objects/Components/Scripts/Frameworks...

And iMacros could also do what your "eBay image scraping tool" is doing, btw... :idea:
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6564
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Scraping dynamic list of links with download filename

by thinkbui on Thu Jan 04, 2018 11:10 am

I suppose I could have been more specific. Because of CORS, when I click one of the links, the browser (Chrome, in my case) tries to download the image as "s-l1600.jpg", since the image is on the remote domain "i.ebayimg.com" while the page itself is on "www.ebay.com". I want it saved as "[item number] s-l1600 [i].jpg", as specified by the "download" attribute. I can force that filename by right-clicking and selecting "Save Link As...", so my question was: if I were to, say, use the SAVETARGETAS event, would the save dialog use "s-l1600.jpg" or "[item number] s-l1600 [i].jpg"? That's why I thought this sub-forum was a better place for this discussion, since it relates to script behavior.

From what I've read so far, I know I can write a script that right-clicks each link by class name in a loop, but before I spend the time learning a new language, I want to be sure the filenames will be what I want. Will it do that by default, or would I need to write the script to extract that attribute on a separate line and paste it into the save dialog?
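The filename problem described above can be illustrated with a simplified model: the `download` name is used only when the link has the same origin as the page; otherwise the name falls back to the last path segment of the URL. Real browsers also weigh the server's `Content-Disposition` header and other rules, so treat this as an assumption-laden Python sketch, not Chrome's actual logic. `same_origin`, `likely_saved_name` and `fallback_collisions` are hypothetical helpers, and the page URL `https://www.ebay.com/` is only illustrative.

```python
from collections import Counter
from urllib.parse import unquote, urlparse


def same_origin(url_a, url_b):
    """Compare scheme, host and port (simplified: an explicit default port counts as different)."""
    a, b = urlparse(url_a), urlparse(url_b)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)


def last_segment(url):
    """Last path segment of the URL, percent-decoded."""
    return unquote(urlparse(url).path.rsplit("/", 1)[-1])


def likely_saved_name(page_url, href, download):
    """Simplified model: honor `download` only for same-origin links."""
    if download and same_origin(page_url, href):
        return download
    return last_segment(href)


def fallback_collisions(page_url, links):
    """Names that more than one (href, download) pair would be saved under."""
    counts = Counter(likely_saved_name(page_url, href, dl) for href, dl in links)
    return {name: n for name, n in counts.items() if n > 1}
```

Under this model, every image linked from `www.ebay.com` falls back to the same `s-l1600.jpg`, which is why the per-image names in `download` matter at all.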

On a side note, I do see value in extending the iMacros script to do what my other tool already does. Right now I just have that in a language I'm more familiar with (JS), mainly because I need to rewrite the URLs to point to the highest-resolution copy of each image (e.g. "../s-l500.jpg" vs. "../s-l1600.jpg"). As I get more comfortable with iMacros, I'd want to include that in the iMacros script too, but one step at a time.
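The URL rewrite mentioned above (pointing "../s-l500.jpg" at "../s-l1600.jpg") is a single regular-expression substitution. A Python sketch, assuming the size token always has the form `s-l<digits>` as the final path segment, directly before the extension and with no query string; `to_max_resolution` is a hypothetical name, and 1600 comes from the post, not from any documented eBay rule.

```python
import re

# Assumption: the size token is the final path segment, e.g. ".../s-l500.jpg".
# URLs that don't match (or that carry a query string) are returned unchanged.
_SIZE_TOKEN = re.compile(r"/s-l\d+(\.[A-Za-z0-9]+)$")


def to_max_resolution(url, size=1600):
    """Swap the s-l<N> size token for s-l<size>, keeping the extension."""
    return _SIZE_TOKEN.sub(lambda m: f"/s-l{size}{m.group(1)}", url)
```

Already-maximal URLs pass through unchanged, so the function can be applied to every link without checking first.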

Re: Scraping dynamic list of links with download filename

by chivracq on Thu Jan 04, 2018 11:54 am

If you don't like the Default Filename that iMacros will use for 'SAVETARGETAS', you can control it using the 'ONDOWNLOAD' Command...
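Put together, the pattern is one 'ONDOWNLOAD' line before each 'SAVETARGETAS' click, with FILE= set to that link's "download" value. The Python below only writes such macro text; it does not run iMacros. The macro syntax it emits is an assumption to verify against the iMacros reference for your version: in particular, writing spaces as `<SP>`, the `WAIT=YES` parameter, how `FILE=` treats a name that already has an extension, and selecting links with `ATTR=ID:`. `build_macro` is a hypothetical helper.

```python
def build_macro(links, folder="*"):
    """Emit iMacros lines for (element_id, filename) pairs.

    Assumed syntax (verify against your iMacros version's reference):
    ONDOWNLOAD must precede the command that triggers the download, and
    spaces inside parameter values are written as <SP>.
    """
    lines = []
    for element_id, filename in links:
        safe_name = filename.replace(" ", "<SP>")
        # Set the target name first, then trigger the download for this link.
        lines.append(f"ONDOWNLOAD FOLDER={folder} FILE={safe_name} WAIT=YES")
        lines.append(
            f"TAG POS=1 TYPE=A ATTR=ID:{element_id} CONTENT=EVENT:SAVETARGETAS"
        )
    return "\n".join(lines) + "\n"
```

The (id, filename) pairs would come from each link's `id` and `download` attributes (`img_link_0` and "322871381840 s-l1600 0.jpg", and so on), giving each image its own name instead of the shared "s-l1600.jpg".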