Help: Lazy Load Image URL Extraction

icaro9
Posts: 1
Joined: Fri Apr 06, 2018 10:06 pm

Help: Lazy Load Image URL Extraction

Post by icaro9 » Fri Apr 06, 2018 10:43 pm

Hi,
I'm using Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823.
Please help me extract the URLs from the images in a list of objects; each image uses a "lazy load" scheme.
The HTML code is as follows:
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
    <div class="td-module-image">
        <div class="entry-thumb99">
            <div class="td-module-thumb">
                <a title="AAAAAAAAA" rel="bookmark" href="BBBBBBBBB">
                    <img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="CCCCCCCCC" alt="" src="https://www.com/Ineedthis.jpg"
...
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
    <div class="td-module-image">
        <div class="entry-thumb99">
            <div class="td-module-thumb">
                <a title="DDDDDDDDD" rel="bookmark" href="EEEEEEEEE">
                    <img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="FFFFFFFFFF" alt="" src="https://www.com/Ineedthis2.jpg"
...

My goal is to extract the image URLs (https://www.com/Ineedthis.jpg, https://www.com/Ineedthis2.jpg, and so on), but the commands I have tried so far have not worked:
TAG POS=1 TYPE=IMG ATTR=CLASS:entry-thumb<SP>sld EXTRACT=HREF returns a URL, but from a random object in the list (not the first one)
TAG POS=1 TYPE=IMG ATTR=HREF:https://www.com*.jpg EXTRACT=HREF returns an #EANF# error

Thanks in advance
FCI: Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Help: Lazy Load Image URL Extraction

Post by chivracq » Sat Apr 07, 2018 1:41 am

icaro9 wrote: [...] FCI: Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823
Yes, good that you mention your "FCI", but it is incomplete: you give the browser versions (PM 26.5 + FF 56.0) and iMB v10.0 (not a very "standard" version, but OK). Then again, if you have been using iMB 10, released about 5 years ago, you should be answering all the threads on the forum by now, and this is a simple extraction question... :?
=> You mention your browser versions, but not which version of iMacros for FF you have on each browser.
Even if that won't play a role here, OK, you did your best already, so you get a "better" answer, ah-ah!

[...] But the URLs you provided are not for your real site, so I have nothing to check or "play with", and I have no time now for the more "elaborate" thinking. I don't like to work from an HTML source excerpt: I would need to reconstruct a "real" HTML page to do my testing. The 'POS=n' approach won't work for you, and we would end up two pages later in the thread with you posting the URL of your site or L&P, or uploading the HTML page. So please do "the necessary" directly, ah-ah! (Sorry, the "ah-ah" is about a user who writes "do the necessary" every time they start a new thread, a formulation that annoys me each time.)

But OK, that was my 15 min available for the whole weekend; next chance in 2 or 3 days... :P
I wanted to be "nice" to you because you had mentioned your FCI and I thought I could have a quick look, but no, too much info is still missing.
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
thecoder2012
Posts: 446
Joined: Sat Aug 15, 2015 5:14 pm
Location: Internet

Re: Help: Lazy Load Image URL Extraction

Post by thecoder2012 » Thu May 03, 2018 2:34 am

icaro9 wrote:I'm using Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823.
Example with iMacros 8.9.7, Win8.1 and Waterfox 55:

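' Suppress the extraction popup
SET !EXTRACT_TEST_POPUP NO
' Extract the first element's HTML, then clear the extraction buffer
TAG POS=1 TYPE=* ATTR=* EXTRACT=HTM
SET !EXTRACT NULL
' Add a temporary textarea (id "xyz") to the page
URL GOTO=javascript:var<SP>input<SP>=<SP>document.createElement('textarea');input.id="xyz";document.body.appendChild(input);document.focus();
' Write the src of every matching image into the textarea, one URL per line
URL GOTO=javascript:var<SP>newx<SP>=<SP>"";var<SP>imgs<SP>=<SP>document.getElementsByClassName("entry-thumb<SP>sld<SP>initial<SP>lazyloaded");for(var<SP>i=0;i<imgs.length;i++){newx<SP>+=imgs[i].src+"\n";}document.getElementById("xyz").value=newx;document.focus();
' Read the collected URLs back into !EXTRACT
TAG POS=1 TYPE=TEXTAREA ATTR=ID:xyz EXTRACT=TXT
' Remove the temporary textarea again
URL GOTO=javascript:var<SP>node=document.getElementById('xyz');node.parentNode.removeChild(node);document.focus();
' Show the result, then clear the extraction buffer
PROMPT {{!EXTRACT}}
SET !EXTRACT NULL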
HTML-Code:

<html>
<body>
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
    <div class="td-module-image">
        <div class="entry-thumb99">
            <div class="td-module-thumb">
                <a title="AAAAAAAAA" rel="bookmark" href="BBBBBBBBB">
                    <img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="CCCCCCCCC" alt="" src="https://www.com/Ineedthis.jpg">
                </a>
            </div>
        </div>
    </div>
</div>

<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
    <div class="td-module-image">
        <div class="entry-thumb99">
            <div class="td-module-thumb">
                <a title="DDDDDDDDD" rel="bookmark" href="EEEEEEEEE">
                    <img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="FFFFFFFFFF" alt="" src="https://www.com/Ineedthis2.jpg">
                </a>
            </div>
        </div>
    </div>
</div>
</body>
</html>
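
For anyone who wants to check the extraction logic outside iMacros, here is a minimal JavaScript sketch of what the second URL GOTO line in the macro does: it collects the src of every element that carries all of the given classes. The function name collectImageUrls and its doc parameter are assumptions made for illustration and are not part of the macro. Unlike the macro, it skips elements without a src and joins the URLs with newlines instead of appending a newline after each one.

```javascript
// Sketch only: collectImageUrls and its parameters are hypothetical names,
// not part of iMacros or of the macro above.
// `doc` is anything that offers getElementsByClassName; in a browser, pass `document`.
// `classNames` is a space-separated class list; an element must carry all of them.
function collectImageUrls(doc, classNames) {
  var imgs = doc.getElementsByClassName(classNames);
  var urls = [];
  for (var i = 0; i < imgs.length; i++) {
    // Skip elements that have no src value
    if (imgs[i].src) {
      urls.push(imgs[i].src);
    }
  }
  // One URL per line, without a trailing newline
  return urls.join("\n");
}

// In a browser console on the test page:
//   collectImageUrls(document, "entry-thumb sld initial lazyloaded");
```

Taking doc as a parameter instead of using the global document keeps the function testable without a browser; the macro itself cannot do that, because each URL GOTO=javascript: line runs directly in the page.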