Help: Lazy Load Image URL Extraction

icaro9
Posts: 1
Joined: Fri Apr 06, 2018 10:06 pm

Help: Lazy Load Image URL Extraction

Post by icaro9 » Fri Apr 06, 2018 10:43 pm

Hi,
I'm using Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823.
Please help me extract the URLs of the images in a list of objects; each image uses a "lazy load" scheme.
The HTML code is as follows:
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
    <div class="td-module-image">
        <div class="entry-thumb99">
            <div class="td-module-thumb">
                <a title="AAAAAAAAA" rel="bookmark" href="BBBBBBBBB">
                    <img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="CCCCCCCCC" alt="" src="https://www.com/Ineedthis.jpg"
...
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
    <div class="td-module-image">
        <div class="entry-thumb99">
            <div class="td-module-thumb">
                <a title="DDDDDDDDD" rel="bookmark" href="EEEEEEEEE">
                    <img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="FFFFFFFFFF" alt="" src="https://www.com/Ineedthis2.jpg"
...

My goal is to extract the image URLs https://www.com/Ineedthis.jpg, https://www.com/Ineedthis2.jpg and so on, but the commands I have tried so far have not worked:
TAG POS=1 TYPE=IMG ATTR=CLASS:entry-thumb<SP>sld EXTRACT=HREF returns a URL, but from a random object in the list (not the first one)
TAG POS=1 TYPE=IMG ATTR=HREF:https://www.com*.jpg EXTRACT=HREF returns an #EANF# error

Thanks in advance
FCI: Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823
chivracq
Posts: 7722
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Help: Lazy Load Image URL Extraction

Post by chivracq » Sat Apr 07, 2018 1:41 am

Yes, good-good-good about the "FCI", but hum: you mention the browser versions (PM 26.5 + FF 56.0) and iMB v10.0 (not a very "standard" version, but OK). Then I had to read your question... [...] OK, read it: a simple extraction question. But if you've been using iMB 10, released about 5 years ago, you should be answering all the threads on the forum by now... :?
=> OK, you mention your browser versions, but not which version of iMacros for FF you have on each browser...
Even if that won't play a role, OK, you did your best already, so I give you a "better" answer, ah-ah...!

[...] Yeah, but the URLs you provided are not for your real site, so I can't have a look. Without a site to check and "play with", pfff, I have no time now for the more "elaborate" thinking, which I actually nearly never do anymore. I don't like to "work" from an HTML source excerpt: I would need to reconstruct a "real" HTML page to do my testing. The 'POS=n' won't work for you, and we will end up 2 pages later (in the thread) with you posting the URL of your site or L&P, or uploading the HTML page. So, pfff, do "the necessary" directly, ah-ah...! (Sorry, the "ah-ah" is for some Indian user who always writes "do the necessary" each time they start a new thread, and that formulation gets me completely "pissed" each time...)

But OK, that was my 15 min available for the whole weekend; next chance in 2 or 3 days... :P
I wanted to be "nice" to you because you had mentioned your FCI and I thought I could have a quick look, but nope, too much info still missing...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
thecoder2012
Posts: 248
Joined: Sat Aug 15, 2015 5:14 pm
Location: Internet

Re: Help: Lazy Load Image URL Extraction

Post by thecoder2012 » Thu May 03, 2018 2:34 am

icaro9 wrote:I'm using Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823.
Example with iMacros 8.9.7, Win8.1 and Waterfox 55:

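' Suppress the extraction popup
SET !EXTRACT_TEST_POPUP NO
' Extract the HTML of the first element, then discard the result
TAG POS=1 TYPE=* ATTR=* EXTRACT=HTM
SET !EXTRACT NULL
' Add a temporary textarea with id "xyz" to the page
URL GOTO=javascript:var<SP>input<SP>=<SP>document.createElement('textarea');input.id="xyz";document.body.appendChild(input);document.focus();
' Write the src of every matching image, one per line, into the textarea
URL GOTO=javascript:var<SP>newx<SP>=<SP>"";var<SP>imgs<SP>=<SP>document.getElementsByClassName("entry-thumb<SP>sld<SP>initial<SP>lazyloaded");for(var<SP>i=0;i<imgs.length;i++){newx<SP>+=imgs[i].src+"\n";}document.getElementById("xyz").value=newx;document.focus();
' Extract the collected URLs from the textarea
TAG POS=1 TYPE=TEXTAREA ATTR=ID:xyz EXTRACT=TXT
' Remove the temporary textarea again
URL GOTO=javascript:var<SP>node=document.getElementById('xyz');node.parentNode.removeChild(node);document.focus();
' Show the extracted URLs, then clear !EXTRACT
PROMPT {{!EXTRACT}}
SET !EXTRACT NULL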
HTML-Code:

<html>
<body>
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
<div class="td-module-image">
<div class="entry-thumb99">
<div class="td-module-thumb">
<a title="AAAAAAAAA" rel="bookmark" href="BBBBBBBBB">
<img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="CCCCCCCCC" alt="" src="https://www.com/Ineedthis.jpg">

<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
<div class="td-module-image">
<div class="entry-thumb99">
<div class="td-module-thumb">
<a title="DDDDDDDDD" rel="bookmark" href="EEEEEEEEE">
<img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="FFFFFFFFFF" alt="" src="https://www.com/Ineedthis2.jpg">
</body>
</html>
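
For anyone trying to follow the macro, the JavaScript packed into its second URL GOTO line boils down to a simple loop: find every element that carries the four classes and join the src values with newlines. Below is a rough sketch of that logic in plain JavaScript, not a drop-in replacement for the macro. The names collectImageUrls and fakeDocument are hypothetical and only there so the snippet can run outside a browser. The check that skips images with an empty src is an assumption added here and is not in the macro.

```javascript
// Sketch: collect the src URL of every element that carries all given classes.
// `doc` is any object exposing getElementsByClassName, e.g. the page's `document`.
// collectImageUrls is a hypothetical helper name, not part of iMacros.
function collectImageUrls(doc, classNames) {
  const nodes = doc.getElementsByClassName(classNames);
  const urls = [];
  for (let i = 0; i < nodes.length; i++) {
    // Assumption: skip elements whose src is still empty (not in the macro).
    if (nodes[i].src) {
      urls.push(nodes[i].src);
    }
  }
  return urls;
}

// Hypothetical stand-in for the browser document so the sketch runs anywhere.
const fakeDocument = {
  getElementsByClassName: function () {
    return [
      { src: "https://www.com/Ineedthis.jpg" },
      { src: "https://www.com/Ineedthis2.jpg" },
      { src: "" },
    ];
  },
};

const urls = collectImageUrls(fakeDocument, "entry-thumb sld initial lazyloaded");
console.log(urls.join("\n"));
```

In a real page you would pass `document` instead of fakeDocument. Note that getElementsByClassName with several space-separated names only returns elements that have all of those classes, so images that have not yet received the "lazyloaded" class would not be matched.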