Help: Lazy Load Image URL Extraction

icaro9
Posts: 1
Joined: Fri Apr 06, 2018 10:06 pm

Help: Lazy Load Image URL Extraction

Post by icaro9 » Fri Apr 06, 2018 10:43 pm

Hi,
I'm using Pale Moon 26.5.0 (x86) and Firefox 56.0 (64-bit) on Win 7 x64 with iMacros 10.0.2.2823.
Please help me extract the image URLs from a list of objects; each image is loaded through a "lazy load" scheme.
The HTML code is as follows:
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
<div class="td-module-image">
<div class="entry-thumb99">
<div class="td-module-thumb">
<a title="AAAAAAAAA" rel="bookmark" href="BBBBBBBBB">
<img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="CCCCCCCCC" alt="" src="https://www.com/Ineedthis.jpg"
...
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
<div class="td-module-image">
<div class="entry-thumb99">
<div class="td-module-thumb">
<a title="DDDDDDDDD" rel="bookmark" href="EEEEEEEEE">
<img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="FFFFFFFFFF" alt="" src="https://www.com/Ineedthis2.jpg"
...

My goal is to extract the image URLs https://www.com/Ineedthis.jpg, https://www.com/Ineedthis2.jpg and so on, but the commands I have tried have not worked so far:
TAG POS=1 TYPE=IMG ATTR=CLASS:entry-thumb<SP>sld EXTRACT=HREF
returns a URL, but from a random object in the list (not the first one).
TAG POS=1 TYPE=IMG ATTR=HREF:https://www.com*.jpg EXTRACT=HREF
returns an #EANF# error.

Thanks in advance
FCI: Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823
chivracq
Posts: 8144
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Help: Lazy Load Image URL Extraction

Post by chivracq » Sat Apr 07, 2018 1:41 am

Yes, good-good-good about the "FCI", but it is not complete: you mention the Browser Versions (PM 26.5 + FF 56.0) and iMB v10.0 (not a very "standard" Version, but OK). Hum, so then I had to read your Qt... OK, read it, a simple Extraction Qt. But if you've been using iMB 10, released about 5y ago, you should be answering all Threads on the Forum by now... :?
=> You mention your Browser Versions, but not which Version of iMacros for FF you have on each Browser...
Even if that won't play a Role here, OK, you did your best already, so I wanted to give you a "better" Answer, ah-ah...!

[...] Yeah, but the URLs you provided are not for your real Site, so I can't have a look. Without a Site to check and "play with", I have no time now for the more "elaborate" Thinking, which I nearly never do anymore anyway: I don't like to "work" from an HTML Source Excerpt, as I would need to reconstruct a "real" HTML Page to do my Testing. The 'POS=n' won't work for you, and we would end up 2 Pages later (in the Thread) with you posting the URL of your Site, or the L&P, or uploading the HTML Page. So, pfff..., do "the necessary" directly, ah-ah...! (Sorry, the "ah-ah" is about a User who writes "do the necessary" each time they start a new Thread, a Formulation that annoys me every time...)

But OK, that was my 15min available for the whole WE, next chance in 2 or 3 days... :P
I wanted to be "nice" to you because you had mentioned your FCI and I thought I could have a look quickly, but nope, too much Info still missing...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
thecoder2012
Posts: 248
Joined: Sat Aug 15, 2015 5:14 pm
Location: Internet
Contact:

Re: Help: Lazy Load Image URL Extraction

Post by thecoder2012 » Thu May 03, 2018 2:34 am

icaro9 wrote: I'm using Palemoon 26.5.0 (x86) & Firefox 56.0 (64-bit) + Win 7 x64 + iMacro 10.0.2.2823.
Example with iMacros 8.9.7, Win8.1 and Waterfox 55:

Code: Select all

' Suppress the extraction popup
SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=* ATTR=* EXTRACT=HTM
SET !EXTRACT NULL
' Add a temporary textarea (id "xyz") to hold the collected URLs; void(0) keeps the javascript: URL from replacing the page
URL GOTO=javascript:var<SP>input<SP>=<SP>document.createElement('textarea');input.id="xyz";document.body.appendChild(input);void(0);
' Write the src of every matching image into the textarea, one per line
URL GOTO=javascript:var<SP>newx<SP>=<SP>"";var<SP>imgs<SP>=<SP>document.getElementsByClassName("entry-thumb<SP>sld<SP>initial<SP>lazyloaded");for(var<SP>i=0;i<imgs.length;i++){newx<SP>+=imgs[i].src+"\n";}document.getElementById("xyz").value=newx;void(0);
' Read the collected URLs back into !EXTRACT
TAG POS=1 TYPE=TEXTAREA ATTR=ID:xyz EXTRACT=TXT
' Remove the temporary textarea again
URL GOTO=javascript:var<SP>node=document.getElementById('xyz');node.parentNode.removeChild(node);void(0);
PROMPT {{!EXTRACT}}
SET !EXTRACT NULL
HTML-Code:

Code: Select all

<html>
<body>
<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
<div class="td-module-image">
<div class="entry-thumb99">
<div class="td-module-thumb">
<a title="AAAAAAAAA" rel="bookmark" href="BBBBBBBBB">
<img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="CCCCCCCCC" alt="" src="https://www.com/Ineedthis.jpg">
</a>
</div>
</div>
</div>
</div>

<div class="td_module_10 td_module_wrap td-animation-stack" itemtype="https://schema.org/Article" itemscope="">
<div class="td-module-image">
<div class="entry-thumb99">
<div class="td-module-thumb">
<a title="DDDDDDDDD" rel="bookmark" href="EEEEEEEEE">
<img class="entry-thumb sld initial lazyloaded" width="320" height="180" title="FFFFFFFFFF" alt="" src="https://www.com/Ineedthis2.jpg">
</a>
</div>
</div>
</div>
</div>
</body>
</html>
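The JavaScript packed into the URL GOTO lines of the macro is hard to read because of the <SP> escapes. A minimal sketch of the same collection step as plain JavaScript may make it easier to follow. The names `collectImageUrls` and `fakeDoc` are hypothetical and only for illustration; they are not part of iMacros or of the macro itself. `fakeDoc` stands in for the browser's `document` so the snippet can run outside a page; in the browser you would pass `document` instead.

```javascript
// Collect the src of every element that carries all of the given class names.
// `doc` is anything exposing getElementsByClassName (the real `document` in a browser).
function collectImageUrls(doc, classNames) {
  var imgs = doc.getElementsByClassName(classNames);
  var urls = [];
  for (var i = 0; i < imgs.length; i++) {
    // Skip elements that have no src value
    if (imgs[i].src) {
      urls.push(imgs[i].src);
    }
  }
  return urls;
}

// Hypothetical stand-in for `document`, returning the two images of the test page.
var fakeDoc = {
  getElementsByClassName: function (classNames) {
    return [
      { src: "https://www.com/Ineedthis.jpg" },
      { src: "https://www.com/Ineedthis2.jpg" }
    ];
  }
};

// One URL per line, as the macro writes them into the textarea
console.log(collectImageUrls(fakeDoc, "entry-thumb sld initial lazyloaded").join("\n"));
```

Passing several space-separated class names to `getElementsByClassName` matches only elements that have all of them, which is why the full class string from the page is used.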