Get image info

Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Tue May 15, 2018 9:06 am

Hi guys, @thecoder2012 & chivracq!

I'm now running this macro for the purposes I had planned. It rocks, but it returns more detail than I can handle. The thing is: it also picks up images (and their info) that are not "normal" images but are embedded as data:image...base64.

Could you give me a hint on how to exclude such images from scraping?

BTW: I had to replace the comma separator with tabs, because some URLs contain unencoded commas, which were breaking my table.

Thanks!
FCI: Win 7 x64 + FF 52.7.3 + iMacros for FF 9.0.3
chivracq

Re: Get image info

Post by chivracq » Wed May 16, 2018 12:24 am

Chilly_Bang wrote:Could you give me a hint how to exclude such images from scraping?
Yeah, let's wait and see if @thecoder2012 reacts, since you were using his solution anyway. I had tested it, but it didn't work on PM (v24.6.6) (+ v8.8.2 for FF): there was a problem with the 'TEXTAREA' field. I didn't investigate much further, as it was working on FF v55.0.3 (+ v8.9.7 for FF). I eventually had another solution (based on a similar trick, posted by @iimfun a few months earlier), which was working in PM24, but I never posted my "results"..., and I guess I would have to start all over again, more or less from the beginning...

But I guess the first part of the "trick" still works: you only get "too much" data in the extract, which you can always "clean" in 'EVAL()' before saving it locally. There are different techniques for that. I'll dig into it again if @thecoder2012 doesn't show up, but I will need the full script, the URL(s) where that problem happens, and exactly which data you want to keep and which to discard. I usually need at least 3 different examples to find a "generic" solution...
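A rough sketch of the "clean it in EVAL() before saving" idea: since EVAL() evaluates JavaScript, the cleanup can be a single regular-expression replace over the extracted text. The name `stripDataImageRows` is hypothetical, and the sketch assumes (as in the CSV format used in this thread) one row per line with the image `src` as the first field, so inline images produce rows starting with `data:image`.

```javascript
// Hypothetical helper: removes every row whose first field (the image src)
// is an inline data:image URI. Assumes one row per line, src first.
function stripDataImageRows(extract) {
  // With the "m" flag, ^ matches at each line start; the trailing (\r?\n|$)
  // also removes the line break of each dropped row. "i" mirrors /^data:image/i.
  return extract.replace(/^data:image[^\r\n]*(\r?\n|$)/gim, "");
}

const sample =
  "https://example.com/a.png,10,10,10,10\r\n" +
  "data:image/gif;base64,R0lGOD,1,1,1,1\r\n" +
  "https://example.com/b.jpg,20,20,40,40\r\n";

console.log(stripDataImageRows(sample));
```

Because it is one expression, the same `replace(...)` call is the kind of thing that could go inside EVAL(); how you pass the extract into it depends on your macro.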
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
Chilly_Bang

Re: Get image info

Post by Chilly_Bang » Wed May 16, 2018 8:19 am

Well, for those who need a quick and dirty solution: open the output file in Notepad++ and
- search for the string (e.g. data:image),
- bookmark the lines with matches,
- delete the bookmarked lines.
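The same three Notepad++ steps (search, bookmark matching lines, delete bookmarked lines) can be scripted. A minimal sketch, not taken from the thread: `removeMatchingLines` is a hypothetical name, and the match is a plain case-sensitive substring test, like a normal (non-regex) search with "Match case" on. Output lines are rejoined with `\r\n`.

```javascript
// Hypothetical sketch of "search, bookmark, delete bookmarked lines".
function removeMatchingLines(text, needle) {
  const lines = text.split(/\r?\n/);
  // "Bookmark" step: the lines that contain the search string.
  const bookmarked = lines.filter((line) => line.includes(needle));
  // "Delete bookmarked lines" step: keep only lines without a match.
  const kept = lines.filter((line) => !line.includes(needle));
  return { text: kept.join("\r\n"), removed: bookmarked.length };
}

// Tab-separated sample rows, matching the tab separator used in this thread.
const result = removeMatchingLines(
  "https://example.com/a.png\t10\t10\ndata:image/png;base64,iVBO\t1\t1\nhttps://example.com/b.jpg\t20\t20",
  "data:image"
);
console.log(result.removed); // 1
console.log(result.text);
```

Unlike an anchored regex, this drops a line if the string appears anywhere in it, which is also what the Notepad++ search does.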

For others looking for this updated solution: let's hope TheCoder2012 will come by :)
thecoder2012

Re: Get image info

Post by thecoder2012 » Tue May 22, 2018 4:47 am

chivracq wrote:and I would eventually have had another Solution (based on a similar Trick, posted by @iimfun a few months earlier), which was working in PM24, but I never posted my "Results"...,
Link? Thread?
Chilly_Bang wrote:it is a bit more detailed as i can handle its results.
It's JavaScript, so anything is possible.
Chilly_Bang wrote:The thing is: it gets even images and image info, which are not "normal" images, but are as data:image...base64 implemented.
Why not keep images with base64 ("data:image")?
You can convert any image to base64, and base64 back to an image, with JavaScript.
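To illustrate the base64 point: an inline image is a `data:` URI whose payload after the comma is the base64-encoded file, so it can be decoded to bytes and encoded back. A minimal sketch using the standard `atob`/`btoa` functions (available in browsers and in Node.js 16+); `parseDataImage` and `toDataImage` are hypothetical names, and only base64 payloads are handled.

```javascript
// Hypothetical helpers: decode a base64 data:image URI to bytes and back.
function parseDataImage(uri) {
  const m = /^data:(image\/[^;,]+);base64,(.*)$/i.exec(uri);
  if (!m) throw new Error("not a base64 data:image URI");
  const binary = atob(m[2]); // "binary string": one char per byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return { mime: m[1], bytes };
}

function toDataImage(mime, bytes) {
  // Rebuild the binary string that btoa expects.
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return "data:" + mime + ";base64," + btoa(binary);
}

// The payload here is only the 8-byte PNG file signature, not a full image.
const img = parseDataImage("data:image/png;base64,iVBORw0KGgo=");
console.log(img.mime, Array.from(img.bytes));
console.log(toDataImage(img.mime, img.bytes));
```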
Chilly_Bang wrote:BTW: i was forced to replace comma to tabs as separator, because some urls have not encoded commas, which were breaking my table.
Sure. You can use any separator.
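Tabs work as long as no URL contains a tab. Another option, sketched here as an assumption rather than what was used in the thread, is to keep commas and quote fields the way RFC 4180 CSV does: wrap a field in double quotes when it contains the separator, a double quote or a line break, and double any embedded quotes. `csvField` and `csvRow` are hypothetical helper names.

```javascript
// Hypothetical RFC 4180-style quoting, so a URL with a raw comma
// stays in one column. Works with any separator.
function csvField(value, sep = ",") {
  const s = String(value);
  const needsQuotes = s.includes(sep) || /["\r\n]/.test(s);
  return needsQuotes ? '"' + s.replace(/"/g, '""') + '"' : s;
}

function csvRow(values, sep = ",") {
  return values.map((v) => csvField(v, sep)).join(sep);
}

console.log(csvRow(["https://example.com/img,1.png", 10, 20, 10, 20]));
// "https://example.com/img,1.png",10,20,10,20
```

A spreadsheet that follows RFC 4180 should then read the quoted URL as a single column.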
Chilly_Bang wrote:Could you give me a hint how to exclude such images from scraping?
As an IIM file, without base64 images (tested with iMacros 8.9.7 and Waterfox 55):

Code:

' Load the test page
URL GOTO=https://www.google.de/
' Replace the page with a TEXTAREA holding one CSV row (src,width,height,naturalWidth,naturalHeight) per image, skipping data:image sources
URL GOTO=javascript:(function(){let<SP>di<SP>=<SP>window.document.images;let<SP>csv<SP>=<SP>"";for(let<SP>i=0,l=di.length,img=di[i];i<l;++i,img=di[i]){if(!img.src.match(/^data:image/i)){let<SP>w=img.width,h=img.height,nw=img.naturalWidth,nh=img.naturalHeight;csv<SP>+=<SP>img.src+","+w+","+h+","+nw+","+nh+"\r\n";}};window.document.body.innerHTML<SP>=<SP>"<textarea<SP>id=csv<SP>rows=10<SP>cols=150<SP>wrap=off>"+csv+"</textarea>"})();
' Read the CSV back from the TEXTAREA and display it
TAG POS=1 TYPE=TEXTAREA ATTR=ID:csv EXTRACT=TXT
PROMPT {{!EXTRACT}}
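The long `URL GOTO=javascript:...` line is essentially the readable script squeezed onto one line, with every space written as `<SP>` (iMacros' placeholder for a space inside a parameter value). A rough sketch of that transformation; `toIimJsLine` is a hypothetical name, and it assumes the script ends every statement with `;` or `}` and has no `//` comments, because all line breaks are collapsed into spaces.

```javascript
// Hypothetical helper: turn a readable script body into one iMacros
// "URL GOTO=javascript:..." line. Assumes explicit semicolons and
// no // comments, since line breaks are collapsed.
function toIimJsLine(source) {
  const oneLine = source
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join(" ");
  // Every space (including spaces inside string literals) becomes <SP>.
  return "URL GOTO=javascript:(function(){" + oneLine.replace(/ /g, "<SP>") + "})();";
}

console.log(toIimJsLine(`
let csv = "";
for (const img of document.images) {
  csv += img.src + "\\r\\n";
}
`));
```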
For iMacros 8.9.7, as JavaScript, without base64 images:

Code:

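iimPlayCode("URL GOTO=https://www.google.de/");
// All <img> elements of the current page
const di = window.document.images;
let csv = "";
for (let i = 0, l = di.length, img = di[i]; i < l; ++i, img = di[i]) {
	// Skip inline images whose src is a data:image (base64) URI
	if (!/^data:image/i.test(img.src)) {
		// Displayed size and intrinsic (natural) size of the image
		const w = img.width, h = img.height, nw = img.naturalWidth, nh = img.naturalHeight;
		csv += img.src + "," + w + "," + h + "," + nw + "," + nh + "\r\n";
	}
}
// Replace the page body with a TEXTAREA holding the CSV, as in the IIM version
window.document.body.innerHTML = "<textarea id=csv rows=10 cols=150 wrap=off>" + csv + "</textarea>";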
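For reuse or testing outside the page, the filtering loop can be separated from the DOM. A minimal sketch, assuming the caller passes anything array-like whose items have the same `src`/`width`/`height`/`naturalWidth`/`naturalHeight` properties as `document.images` entries; `imagesToRows` and its `sep` parameter are hypothetical, with tab as the default to match the tab-separated output mentioned earlier in the thread. Rows are joined with `\r\n` (no trailing line break).

```javascript
// Hypothetical DOM-free version of the loop: same filter (skip data:image
// sources, case-insensitive) and the same five columns per row.
function imagesToRows(images, sep = "\t") {
  return Array.from(images)
    .filter((img) => !/^data:image/i.test(img.src))
    .map((img) =>
      [img.src, img.width, img.height, img.naturalWidth, img.naturalHeight].join(sep)
    )
    .join("\r\n");
}

// Plain objects standing in for <img> elements (made-up sample values).
const fakeImages = [
  { src: "https://example.com/logo.png", width: 272, height: 92, naturalWidth: 544, naturalHeight: 184 },
  { src: "data:image/gif;base64,R0lGODlh", width: 1, height: 1, naturalWidth: 1, naturalHeight: 1 },
];
console.log(imagesToRows(fakeImages));
```

In the page itself the call would be `imagesToRows(window.document.images)`.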