Get image info

Chilly_Bang
Posts: 29
Joined: Tue Jan 27, 2015 9:13 am

Re: Get image info

Post by Chilly_Bang » Tue May 15, 2018 9:06 am

Hi guys, @thecoder2012 & @chivracq!

I'm now running this macro for the purposes I had planned, and it rocks, but... its results are a bit more detailed than I can handle. The thing is: it also picks up images and image info that are not "normal" images but are embedded as data:image...base64 URIs.

Could you give me a hint on how to exclude such images from scraping?

BTW: I had to replace the comma separator with tabs, because some URLs contain unencoded commas, which were breaking my table.

Thanks!
FCI: Win 7 x64 + Win10 x64 + FF 45.9.0 + iMacros for FF 9.0.3
chivracq
Posts: 8413
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Get image info

Post by chivracq » Wed May 16, 2018 12:24 am

Chilly_Bang wrote:The thing is: it also picks up images and image info that are not "normal" images but are embedded as data:image...base64 URIs.

Could you give me a hint on how to exclude such images from scraping?
Yeah, let's wait and see if @thecoder2012 reacts; you were using his Solution anyway... I had tested it, but it didn't work on PM (v24.6.6) (+v8.8.2 for FF): Problem with the 'TEXTAREA' Field. I didn't investigate much further, as it was working on FF v55.0.3 (+v8.9.7 for FF), and I would eventually have had another Solution (based on a similar Trick, posted by @iimfun a few months earlier), which was working in PM24, but I never posted my "Results"..., and I guess I would have to start all over again, more or less from the beginning...

But I guess the first part of the "Trick" still works; you only get "too much" Data in the Extract..., which you can always "clean" in 'EVAL()' before saving it locally... There are different Techniques for that. I'll dig into it again if @thecoder2012 doesn't show up, but I will need the concrete Full Script, the URL(s) where that "Problem" happens, and exactly which Data you want to keep and which to discard..., and I usually need at least 3 different Examples to find a "generic" Solution...
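The filtering such a cleanup step would perform can be sketched in plain JavaScript, whether it ends up in EVAL() or in a separate post-processing script. The function name dropDataImageRows() and its parameters are hypothetical; the sketch assumes one image per row and that the image URL sits in a known column (column 0 for the src,width,height,naturalWidth,naturalHeight layout, column 1 when the page URL comes first):

```javascript
// Drop rows whose image-URL field is an inline data:image URI.
// sep    - field separator used when the rows were built ("," or "\t")
// column - zero-based index of the image URL in each row
//          (0 for src-first rows, 1 when the page URL comes first)
// Blank lines are dropped as well; surviving rows are re-joined with CRLF.
function dropDataImageRows(text, sep = ",", column = 0) {
  return text
    .split(/\r?\n/)
    .filter(line => {
      if (line === "") return false;            // skip blank lines
      const field = line.split(sep)[column] || "";
      return !/^data:image/i.test(field);       // keep only non-data: images
    })
    .join("\r\n");
}
```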
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Wed May 16, 2018 8:19 am

Well, for those who need a quick and dirty solution: open the result file in Notepad++ and
- search for the string data:image,
- bookmark the lines with matches,
- delete the bookmarked lines.

For others who are looking for an updated solution: let's hope TheCoder2012 will come by :)
thecoder2012
Posts: 309
Joined: Sat Aug 15, 2015 5:14 pm
Location: Internet

Re: Get image info

Post by thecoder2012 » Tue May 22, 2018 4:47 am

chivracq wrote:and I would eventually have had another Solution (based on a similar Trick, posted by @iimfun a few months earlier), which was working in PM24, but I never posted my "Results"...,
Link? Thread?
Chilly_Bang wrote:its results are a bit more detailed than I can handle.
It's JavaScript. Anything is possible.
Chilly_Bang wrote:The thing is: it also picks up images and image info that are not "normal" images but are embedded as data:image...base64 URIs.
Why exclude images with base64 ("data:image")?
You can convert every image to base64, and base64 back to an image, with JavaScript.
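A minimal sketch of both directions for the base64 case, assuming Node.js (it relies on Node's Buffer; in a browser, atob()/btoa() would play that role). The names decodeDataImage() and encodeDataImage() are hypothetical, and only data:image URIs carrying ";base64" are handled:

```javascript
// Split a base64 data:image URI into its MIME type and raw bytes.
// Returns null for anything else (http(s) URLs, URL-encoded SVG data URIs).
function decodeDataImage(uri) {
  // data:<image/type>[;param=value...];base64,<payload>
  const m = /^data:(image\/[\w.+-]+)(?:;[^,;]*)*;base64,(.*)$/is.exec(uri);
  if (!m) return null;
  return { mime: m[1].toLowerCase(), bytes: Buffer.from(m[2], "base64") };
}

// The reverse direction: raw bytes plus MIME type back into a data:image URI.
function encodeDataImage(mime, bytes) {
  return "data:" + mime + ";base64," + Buffer.from(bytes).toString("base64");
}
```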
Chilly_Bang wrote:BTW: I had to replace the comma separator with tabs, because some URLs contain unencoded commas, which were breaking my table.
Sure. You can use any separator.
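Besides switching to tabs, the commas can be kept if each field is quoted the way RFC 4180 CSV does: a field containing a comma, a double quote or a line break is wrapped in double quotes, and embedded double quotes are doubled. A minimal sketch; the helper names csvField() and csvRow() are hypothetical, not part of iMacros, and inside a URL GOTO=javascript: line every space would still have to be written as <SP>, as in the snippets in this thread:

```javascript
// Quote one CSV field per RFC 4180: fields containing a comma, a double
// quote, CR or LF are wrapped in double quotes, and embedded double
// quotes are doubled. Numbers are converted to strings first.
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Join values into one CSV line terminated by CRLF,
// the same line ending the snippets in this thread use.
function csvRow(values) {
  return values.map(csvField).join(",") + "\r\n";
}

// Example: a URL with an unencoded comma stays in a single column.
const row = csvRow(["https://example.com/a,b.png", 120, 80, 240, 160]);
// row === '"https://example.com/a,b.png",120,80,240,160\r\n'
```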
Chilly_Bang wrote:Could you give me a hint on how to exclude such images from scraping?
As an .iim file, without base64 images (tested with iMacros 8.9.7 and Waterfox 55):

Code:

URL GOTO=https://www.google.de/
URL GOTO=javascript:(function(){let<SP>di<SP>=<SP>window.document.images;let<SP>csv<SP>=<SP>"";for(let<SP>i=0,l=di.length,img=di[i];i<l;++i,img=di[i]){if(!img.src.match(/^data:image/i)){let<SP>w=img.width,h=img.height,nw=img.naturalWidth,nh=img.naturalHeight;csv<SP>+=<SP>img.src+","+w+","+h+","+nw+","+nh+"\r\n";}};window.document.body.innerHTML<SP>=<SP>"<textarea<SP>id=csv<SP>rows=10<SP>cols=150<SP>wrap=off>"+csv+"</textarea>"})();
TAG POS=1 TYPE=TEXTAREA ATTR=ID:csv EXTRACT=TXT
PROMPT {{!EXTRACT}}
As a .js script for iMacros 8.9.7, without base64 images:

Code:

iimPlayCode("URL GOTO=https://www.google.de/");
let di = window.document.images;
let csv = "";
for(let i=0,l=di.length,img=di[i];i<l;++i,img=di[i]){
	if(!img.src.match(/^data:image/i)){
		let w=img.width,h=img.height,nw=img.naturalWidth,nh=img.naturalHeight;
		csv += img.src+","+w+","+h+","+nw+","+nh+'\r\n';
	}
};
window.document.body.innerHTML = "<textarea id=csv rows=10 cols=150 wrap=off>"+csv+"</textarea>";
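The same loop can be pulled out into a plain function, so it can be checked outside the browser and the separator becomes a parameter (for example "\t", as Chilly_Bang switched to). This is a sketch with a hypothetical name, imagesToRows(); it accepts any array-like of objects carrying the src, width, height, naturalWidth and naturalHeight properties that <img> elements have, so in the page it would be called as imagesToRows(document.images, "\t"):

```javascript
// Build one row per image: src, width, height, naturalWidth, naturalHeight.
// Inline data:image URIs are skipped, as in the snippet above.
function imagesToRows(images, sep = ",") {
  let out = "";
  for (const img of Array.from(images)) {
    if (/^data:image/i.test(img.src)) continue; // skip base64/inline images
    out += [img.src, img.width, img.height, img.naturalWidth, img.naturalHeight]
      .join(sep) + "\r\n";
  }
  return out;
}
```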
Join 9kw.eu Captcha Service now and let your iMacros continue downloads and scripts while you sleep. - Custom iMacros? Contact me! :idea:
Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Thu May 09, 2019 2:04 pm

While using this macro extensively, I also realized that some SVG images aren't scraped. Try the macro on the page "https:// www. flyer alarm.com/at/callback" (remove the spaces from the URL): regardless of the waiting time and loading status, only the first four images are scraped ("first" meaning the order in which they appear in the developer tools), and all further images are... skipped? out of scope? I don't know. base64 images are skipped too, but that was the plan, so it is expected behavior. But what happens to the other skipped images?
chivracq

Re: Get image info

Post by chivracq » Thu May 09, 2019 3:16 pm

Chilly_Bang wrote:
Thu May 09, 2019 2:04 pm
While using this macro extensively, I also realized that some SVG images aren't scraped. Try the macro on the page "https:// www. flyer alarm.com/at/callback" (remove the spaces from the URL): regardless of the waiting time and loading status, only the first four images are scraped ("first" meaning the order in which they appear in the developer tools), and all further images are... skipped? out of scope? I don't know. base64 images are skipped too, but that was the plan, so it is expected behavior. But what happens to the other skipped images?
I don't really get why you need to half-mask the URL for that Site; that's not very "practical" if "we" want to have a look... :roll:, and this Site is very sloooooow to load...:
https://www.flyeralarm.com/at/callback

But OK, my 2ct: from a Look at the Source of that Page, I guess 'window.document.images' correctly sees only 4x HTML Elements of Type=Image, i.e. '<img>' Elements (2x '.svg' + 1x '.png' + 1x '.gif' => Tot=4); all other "Images" are of Type=Icon or Type=Background.
=> You probably need to add a 2nd and a 3rd Block of Code (or include them in the Count) if you want to catch those 2 other Types as well, I guess...
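For the "Background" type, the computed background-image value of an element is a string such as url("https://example.com/a.png"), linear-gradient(red, blue), and it can list several layers. A sketch with a hypothetical name, cssImageUrls(), that pulls out every url(...) rather than only the first one and, by default, skips data: URIs. In the page it would be fed with getComputedStyle(node).getPropertyValue("background-image") for each element from document.querySelectorAll("*"); the "Icon" type is not covered here:

```javascript
// Return every URL referenced by url(...) in a CSS value such as the
// computed background-image, e.g. 'url("a.png"), url(b.svg)'.
// Handles double-quoted, single-quoted and unquoted forms.
function cssImageUrls(value, skipDataUris = true) {
  const re = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/gi;
  const urls = [];
  for (const m of value.matchAll(re)) {
    const url = m[1] ?? m[2] ?? m[3];          // whichever form matched
    if (!url) continue;                         // ignore url("")
    if (skipDataUris && /^data:/i.test(url)) continue;
    urls.push(url);
  }
  return urls;
}
```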
Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Thu May 09, 2019 3:39 pm

With the following functions I successfully get all background images and their width and height into the console:

Code:

function getBgImgs (doc) {
  const srcChecker = /url\(\s*?['"]?\s*?(\S+?)\s*?["']?\s*?\)/i
  return Array.from(
    Array.from(doc.querySelectorAll('*'))
      .reduce((collection, node) => {
        let prop = window.getComputedStyle(node, null)
          .getPropertyValue('background-image')
        // match `url(...)`
        let match = srcChecker.exec(prop)
        if (match) {
          collection.add(match[1])
        }
        return collection
      }, new Set())
  )
}

getBgImgs(document)

function loadImg (src, timeout = 500) {
  var imgPromise = new Promise((resolve, reject) => {
    let img = new Image()
    img.onload = () => {
      resolve({
        src: src,
        width: img.naturalWidth,
        height: img.naturalHeight
      })
    }
    img.onerror = reject
    img.src = src
  })
  var timer = new Promise((resolve, reject) => {
    setTimeout(reject, timeout)
  })
  return Promise.race([imgPromise, timer])
}

function loadImgAll (imgList, timeout = 500) {
  return new Promise((resolve, reject) => {
    Promise.all(
      imgList
        .map(src => loadImg(src, timeout))
        .map(p => p.catch(e => false))
    ).then(results => resolve(results.filter(r => r)))
  })
}

loadImgAll(getBgImgs(document)).then(imgs => console.log(imgs))
But I fail, as always, at the same place: I can't build this function into the iMacros code so that it works as a second code block and extracts background images with their width, height, naturalWidth and naturalHeight. Could somebody maybe help me with this? The macro I use is:

Code:

SET !ERRORIGNORE YES
TAB T=1
SET !DATASOURCE urls.csv 
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO={{!COL1}}
WAIT SECONDS=4
URL GOTO=javascript:(function(){let<SP>di<SP>=<SP>window.document.images;let<SP>csv<SP>=<SP>"";for(let<SP>i=0,l=di.length,img=di[i];i<l;++i,img=di[i]){let<SP>w=img.width,h=img.height,nw=img.naturalWidth,nh=img.naturalHeight;csv<SP>+=<SP>window.location.href+","+img.src+","+w+","+h+","+nw+","+nh+"\r\n";};window.document.body.innerHTML<SP>=<SP>"<textarea<SP>id=csv<SP>rows=10<SP>cols=150<SP>wrap=off>"+csv+"</textarea>"})();
TAG POS=1 TYPE=TEXTAREA ATTR=ID:csv EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=image-data.csv
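A likely sticking point (an assumption, not verified with iMacros for FF 9.0.3) is that loadImgAll() is asynchronous: the URL GOTO=javascript: line returns before the background images have loaded, so the textarea would have to be written inside the .then() callback, and the TAG POS=1 TYPE=TEXTAREA ATTR=ID:csv EXTRACT=TXT line might need a longer WAIT in front of it. Every space in that line also has to become <SP>, as in the macro above. The formatting half of the callback can be sketched as a plain function; bgRows() is a hypothetical name, and it expects the { src, width, height } objects that loadImgAll() resolves with:

```javascript
// Turn loadImgAll() results ({ src, width, height }, where width/height are
// the image's natural size) into rows with the macro's column layout:
// page URL, image URL, width, height, naturalWidth, naturalHeight.
// A background image has no single <img> box, so its natural size is
// written into both the width/height and the natural columns (assumption).
// Inline data:image URIs are skipped, matching the earlier filtering.
function bgRows(pageUrl, results, sep = ",") {
  return results
    .filter(r => !/^data:image/i.test(r.src))
    .map(r => [pageUrl, r.src, r.width, r.height, r.width, r.height].join(sep) + "\r\n")
    .join("");
}

// Sketch of the in-page use (untested inside iMacros), reusing getBgImgs()
// and loadImgAll() from the post above; csv is the string already built
// by the document.images loop. The textarea is only written once the
// background images have loaded:
//   loadImgAll(getBgImgs(document)).then(imgs => {
//     csv += bgRows(window.location.href, imgs);
//     document.body.innerHTML =
//       "<textarea id=csv rows=10 cols=150 wrap=off>" + csv + "</textarea>";
//   });
```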
chivracq

Re: Get image info

Post by chivracq » Fri May 10, 2019 4:12 am

Chilly_Bang wrote:
Thu May 09, 2019 3:39 pm
With the following functions I successfully get all background images and their width and height into the console:
Okay-Okay, ouf-ouf...! OK, good-good, I see you reacted to my 'Image-Background' Feedback, but without mentioning me, so I don't know if you just found it out by yourself or if my last Reply was a bit "useful"...!? The Script you posted looks very nice and pretty Advanced to me, but I'm not sure I can "debug" it mentally, or do some Testing on the Page, provided you don't half-obfuscate the URL again...

So, was your last Post a Reply/Reaction to my Post...? And if not, didn't you notice I had posted a Reply in the "meantime", or are you too "stupid" to see when other Users are trying to help you...!? @OP = @[Chilly_Bang]...!? :?:

Pfff, to be honest I'm not really motivated to go digging through that whole Script, as I didn't get any Feedback on my previous Reply, and I'm not very Guru-Guru-Advanced in '.js' iMacros Scripts. Yeah, I can do it if I'm "a bit motivated", but hum... I'd have to start here more or less from the Beginning..., so I'll first let @thecoder2012 react..., hopefully!? The Code/Script you (@OP) used was his, not mine. And yep, I'm "clever" enough to follow and understand any Script in any Language, but hum-hum-hum...!?, I only go digging when there is no other "Option"..., and hum, if I "can do it", then ah-ah...!, "ANYBODY can do it", re-ah-ah...! :P
=> @thecoder2012..., grrr...! I hope you are going to react, ah-ah...! :wink:
chivracq

Re: Get image info

Post by chivracq » Fri May 10, 2019 4:31 am

Yeah, meaning I'll just be "watching" this Thread in case "anything" "interesting" comes out, ah-ah...! :wink:

Ah hum..., checking the TimeStamps, I realize you posted your previous extended Reply about 30 min after mine, and I know writing an "extended Reply" can take some time [a lot of time...!], so OK, I now have a "milder" Expectation..., but hum, you (@OP) should still have noticed that I had posted just before you..., especially as your Reply was related to mine... :wink:
Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Fri May 10, 2019 8:32 am

chivracq wrote:if my last Reply was a bit "useful"
Surely: I only came up with the idea of (how) to fetch background images after you pointed me to it.

This script is very simple to check: throw it into the console and you'll see nice output.
chivracq

Re: Get image info

Post by chivracq » Fri May 10, 2019 9:23 am

Chilly_Bang wrote:
Fri May 10, 2019 8:32 am
if my last Reply was a bit "useful"
Surely: I only came up with the idea of (how) to fetch background images after you pointed me to it.

This script is very simple to check: throw it into the console and you'll see nice output.
Ah OK, you had posted about 30 min after me, so I wasn't sure if your Script was "related" to my "2ct", as you didn't mention any "Relationship" with my previous Post... OK then, I'm impressed... But, pfff..., I guess you are already much more advanced than me in that JS and iMacros Functionality, as I never use any '.js' Scripts, so I'll really first let @thecoder2012 react, if he hopefully sees the last Replies in your Thread; that's much more his "Cup of Tea" than mine, ah-ah...! I never use iMacros "that way" myself, so... pfff..., I would have to start more or less from the Beginning... I'm already "impressed" my 2ct were apparently a bit "relevant", ah-ah...! :o
Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Fri May 10, 2019 10:02 am

It has plagued me my whole life: I can't get working code parts to work together.
But sure, your 2ct shed substantial light on the thing: I hadn't had it in scope that there are images which come from different sources.
Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Fri May 17, 2019 1:52 pm

Oh-oh, thecoder2012 hasn't shown up yet... :( And how about you, @chivracq: do your mood and social life allow you to look a bit deeper into it?
thecoder2012

Re: Get image info

Post by thecoder2012 » Thu Jul 04, 2019 10:54 pm

Chilly_Bang wrote:
Thu May 09, 2019 3:39 pm
Could somebody maybe help me with this?
You could try other approaches than "querySelectorAll('*')":
a) getElementsByTagName
b) addEventListener
c) XPCOM (e.g. Monitoring_HTTP_activity)
d) iMacros + GreaseMonkey/Tampermonkey
e) iMacros Control via the Scripting Interface
chivracq wrote:
Fri May 10, 2019 4:12 am
=> @thecoder2012..., grrr...! I hope you are going to react, ah-ah...! :wink:
My time is limited. :wink:
Chilly_Bang wrote:
Fri May 10, 2019 8:32 am
This script is very simple to check
It's not simple enough for iMacros. (Is the source "get-all-images-in-dom-including-background-en"?) :roll:
Chilly_Bang wrote:
Fri May 17, 2019 1:52 pm
Oh-oh, thecoder2012 hasn't shown up yet...
I'm here. Really! :)
Crosspost https://stackoverflow.com/questions/561 ... th-imacros ?