Scraping dynamic list of links with download filename

iMacros EOL - Attention!

The renewal maintenance has officially ended for Progress iMacros effective November 20, 2023 and all versions of iMacros are now considered EOL (End-of-Life). The iMacros products will no longer be supported by Progress (aside from customer license issues), and these forums will also no longer be moderated from the Progress side.

thinkbui
Posts: 2
Joined: Wed Jan 03, 2018 7:47 pm

Scraping dynamic list of links with download filename

Post by thinkbui » Wed Jan 03, 2018 8:29 pm

This might be a dumb question, but is iMacros capable of saving all links of a particular class on a page using the filenames specified in the "download" attribute? Basically, I'm using an eBay image scraping tool to generate a list of links to each of a listing's images at maximum resolution, but because of CORS limitations, I must right-click each link and go through the save dialog box to get the filename in the "download" attribute. This is fine if there are only a few images, but it gets tedious if a listing has more than a dozen images, so I'm hoping to use a macro to automate this. Is iMacros what I want? How would I go about ensuring that the specified filenames are used?

The HTML code looks like this:

<a href="https://i.ebayimg.com/images/g/l5kAAOSw ... -l1600.jpg" class="img_link" id="img_link_0" download="322871381840 s-l1600 0.jpg">322871381840 s-l1600 0.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/KBgAAOSw ... -l1600.jpg" class="img_link" id="img_link_1" download="322871381840 s-l1600 1.jpg">322871381840 s-l1600 1.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/CgwAAOSw ... -l1600.jpg" class="img_link" id="img_link_2" download="322871381840 s-l1600 2.jpg">322871381840 s-l1600 2.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/gSQAAOSw ... -l1600.jpg" class="img_link" id="img_link_3" download="322871381840 s-l1600 3.jpg">322871381840 s-l1600 3.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/xE8AAOSw ... -l1600.jpg" class="img_link" id="img_link_4" download="322871381840 s-l1600 4.jpg">322871381840 s-l1600 4.jpg</a>
<br>
.
.
.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Scraping dynamic list of links with download filename

Post by chivracq » Thu Jan 04, 2018 11:20 am

thinkbui wrote:
The HTML code looks like this:

<a href="https://i.ebayimg.com/images/g/l5kAAOSwa81aA2vI/s-l1600.jpg" class="img_link" id="img_link_0" download="322871381840 s-l1600 0.jpg">322871381840 s-l1600 0.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/KBgAAOSwRbtaA2sm/s-l1600.jpg" class="img_link" id="img_link_1" download="322871381840 s-l1600 1.jpg">322871381840 s-l1600 1.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/CgwAAOSwQcJaA2yl/s-l1600.jpg" class="img_link" id="img_link_2" download="322871381840 s-l1600 2.jpg">322871381840 s-l1600 2.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/gSQAAOSwY~1aA2pA/s-l1600.jpg" class="img_link" id="img_link_3" download="322871381840 s-l1600 3.jpg">322871381840 s-l1600 3.jpg</a>
<br>
<a href="https://i.ebayimg.com/images/g/xE8AAOSwVtZaA2pD/s-l1600.jpg" class="img_link" id="img_link_4" download="322871381840 s-l1600 4.jpg">322871381840 s-l1600 4.jpg</a>
<br>
.
.
.
Try to select the correct Sub-Forum when you open a Thread; your Post has nothing to do with the "Scripting Interface" (and you probably don't know what it is). The correct Sub-Forum would have been the 'Data Extraction' one, or the 'General' one when in doubt... (But no need to duplicate now..., your Thread will one day get moved to the 'Data Extraction' Sub-Forum...)

"Dumb Question"...?, well a little bit indeed, as if you found this Forum, then that means you already have/know the Answer...
=> Yep, typical Scenario and Usecase for iMacros... => The Answer is YES...! :D
iMacros can extract absolutely anything contained in the HTML Source Code of any Web-Page. If you can open that Page in one of the 4 Browsers supported by iMacros, then you can extract (any part of) its Content...

OK, I was going to ask a "stupid Qt" myself, as I've never heard of "CORS (Limitations)", but OK, I googled it... :wink:
Never heard this "CORS" Term, always seen it referred to as "Cross Domain", or "3rd Party" Elements/Objects/Components/Scripts/Frameworks...

And what your "eBay image scraping tool" is doing could iMacros do as well btw... :idea:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...

Re: Scraping dynamic list of links with download filename

Post by thinkbui » Thu Jan 04, 2018 6:10 pm

I suppose I could have been more specific. Because of CORS, when I click on one of the links, the browser (in my case Chrome) tries to download the image as "s-l1600.jpg", since the remote domain is "i.ebayimg.com" while the domain of the page itself is "www.ebay.com". I want it to save as "[item number] s-l1600 [index].jpg", as specified by the "download" attribute. I can force it to use that filename by right-clicking and selecting "Save Link As...", so my question was: if I were to use, say, the SAVETARGETAS event, would the save dialog window use "s-l1600.jpg" or "[item number] s-l1600 [index].jpg"? That's why I thought this sub-forum was the better place for this discussion, since it relates to script behavior. Based on what I've been reading so far, I know that I can write a script that right-clicks on each link by class name in a loop, but before I spend the time learning a new language, I want to be sure that the filenames will be what I want them to be. Will it do that by default, or would I need to write the script to extract that attribute on a separate line and paste it into the save dialog window?

On a side note, I do see value in expanding the iMacros script to do what my other tool is already doing. Right now I just have it in a language more familiar to me (JS), mainly because I need to modify the URLs to point to the highest-resolution copy of each image (e.g. "../s-l500.jpg" vs. "../s-l1600.jpg"). As I get more comfortable with iMacros, I would want to include that in the iMacros script too, but one step at a time.
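
For illustration, the URL rewrite mentioned above might look roughly like this in JS. This is only a sketch: the function name toMaxResolution is hypothetical, and it assumes that every image URL ends in "s-l<number>.<extension>" and that "s-l1600" is the largest size available, as in the links shown in this thread.

```javascript
// Hypothetical helper: rewrite an eBay image URL to request the s-l1600 variant.
// Assumption: the path ends in "s-l<digits>.<extension>".
// URLs that do not match that pattern are returned unchanged.
function toMaxResolution(url) {
  return url.replace(/\/s-l\d+(\.\w+)$/, "/s-l1600$1");
}

console.log(toMaxResolution("https://i.ebayimg.com/images/g/l5kAAOSwa81aA2vI/s-l500.jpg"));
```

A URL that is already at "s-l1600" passes through unchanged, so the function can be applied to every link without checking first.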

Re: Scraping dynamic list of links with download filename

Post by chivracq » Thu Jan 04, 2018 6:54 pm

If you don't like the Default Filename that iMacros will use for 'SAVETARGETAS', you can control it using the 'ONDOWNLOAD' Command...
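
As a rough sketch of that suggestion (not a tested Macro), the JS below generates one 'ONDOWNLOAD' + 'TAG ... CONTENT=EVENT:SAVETARGETAS' pair per link, using the "download" attribute as the Filename. The function name buildDownloadMacro and the { id, filename } input shape are hypothetical. It assumes 'FOLDER=*' (the default download folder) is acceptable, and it writes Spaces as <SP>, which is how iMacros expects them inside parameter values. Whether your iMacros Edition and Browser honour both Commands is something to verify on your side...

```javascript
// Hypothetical helper: build iMacros macro text that saves each link under
// the name given in its "download" attribute.
// `links` is an array of { id, filename } objects, which could be collected
// in the page with something like:
//   Array.from(document.querySelectorAll("a.img_link"),
//              a => ({ id: a.id, filename: a.getAttribute("download") }))
function buildDownloadMacro(links) {
  const lines = [];
  for (const { id, filename } of links) {
    // iMacros writes spaces inside parameter values as <SP>.
    const file = filename.replace(/ /g, "<SP>");
    // ONDOWNLOAD has to appear before the command that triggers the download.
    lines.push(`ONDOWNLOAD FOLDER=* FILE=${file} WAIT=YES`);
    lines.push(`TAG POS=1 TYPE=A ATTR=ID:${id} CONTENT=EVENT:SAVETARGETAS`);
  }
  return lines.join("\n");
}

console.log(buildDownloadMacro([
  { id: "img_link_0", filename: "322871381840 s-l1600 0.jpg" },
  { id: "img_link_1", filename: "322871381840 s-l1600 1.jpg" },
]));
```

The generated text can be pasted into a Macro file; addressing each link by its ID avoids depending on the order of the links on the Page.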