How to extract results from haveibeenpwned.com

by Steverob1066 on Tue Sep 13, 2016 5:42 am

I am trying to extract the output of queries submitted to haveibeenpwned.com. (Great website BTW!)

You enter an email address and it returns a result basically saying "Good News" or "Oh No". I just want to know whether the email address gets the "Good News" answer. If it gets an "Oh No" answer, I want iMacros to return something else, like #EANF (I can sort the emails later).

This ought to be easy, but the HTML contains both answers. Whether the page is showing "Good News" or "Oh No", iMacros can see both, because both are present in the HTML and only one is displayed.

How can I distinguish between the displayed answer and the hidden one, and capture just the displayed answer?

Here's a simple example:
TAB T=1
'URL GOTO=https://haveibeenpwned.com/ - just open this page and leave it open
SET !ERRORIGNORE YES
SET !TIMEOUT_STEP 3
'This is my list of emails to enter on the page one by one in a loop
SET !DATASOURCE emails.csv

'The next bit enters the email address into the form on the page
TAG POS=1 TYPE=INPUT:EMAIL FORM=ACTION:/ ATTR=ID:Account CONTENT={{!col1}}
TAG POS=1 TYPE=BUTTON FORM=ACTION:/ ATTR=ID:searchPwnage
wait seconds=2

'The next bit just extracts the answer "Good news" - if this isn't present I would expect #EANF - but not so because it's there on the page but hidden
TAG POS=1 TYPE=H2 ATTR=TXT:Good* extract=txt

ADD !EXTRACT {{!col1}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=extractPwned.csv
wait seconds=5
Steverob1066
 
Posts: 15
Joined: Wed Jun 19, 2013 10:05 am

Re: How to extract results from haveibeenpwned.com

by chivracq on Tue Sep 13, 2016 9:13 am

Hum, useless (= non-descriptive) thread title (all threads on this sub-forum are related to "problems extracting from a webpage"...!), which I don't read...

+ CIM...! (Read my sig. I've always asked you to mention your FCI when you open a thread, since many commands are not implemented (or get broken) for all browsers/versions...)

And hum, not too impressed by your use of the forum, as all 8 of your previous threads are still waiting for some follow-up from your side, except maybe this one where, as the unique exception, you shared your solution, even if there were a few replies in it and you could maybe have followed up a bit further...

Sorry, but I only help users who use the forum reasonably correctly, so I guess I will pass on this one:
=> Correct sub-forum + no spam (= no duplicates) + descriptive thread title + FCI mentioned + URL & script when possible, where they get stuck and what they've tried + neat follow-up on all (previous) threads (= bump their thread after a while if nobody reacted, or share their solution if they manage to solve the problem by themselves, as that could help other users, and of course a mini-thanks if somebody reacted/answered a thread)...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6479
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: How to extract results from haveibeenpwned.com

by iimfun on Wed Sep 14, 2016 7:25 am

Try replacing the line
Code:
TAG POS=1 TYPE=H2 ATTR=TXT:Good* extract=txt

with
Code:
SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=DIV ATTR=CLASS:"*panel-collapse collapse in" EXTRACT=TXT
SET !EXTRACT EVAL("('!{{!EXTRACT}}'.match(/Good news/)) ? 'Good news' : 'Oh no';")
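
The expression inside EVAL is ordinary JavaScript, so its logic can be tried outside iMacros. Below is a minimal sketch, assuming the extracted DIV text contains the phrase "Good news" only when that panel is the one being displayed. `classifyResult` is a hypothetical helper name (not part of iMacros), and the sample strings are invented for illustration.

```javascript
// Hypothetical helper mirroring the ternary inside the EVAL above.
// `extracted` stands in for the text iMacros places in {{!EXTRACT}}.
function classifyResult(extracted) {
  // String.prototype.match returns null when the pattern is absent,
  // so the ternary falls through to 'Oh no'. The match is case-sensitive.
  return String(extracted).match(/Good news/) ? 'Good news' : 'Oh no';
}

// Invented sample inputs, not real site output.
console.log(classifyResult('Good news - no pwnage found!')); // Good news
console.log(classifyResult('Oh no - pwned!'));               // Oh no
console.log(classifyResult('#EANF#'));                       // Oh no
```

Note that anything not containing the phrase ends up as 'Oh no', including #EANF# when the DIV is not found, so a page that failed to load would look the same as a breached address in the output file.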
iimfun
 
Posts: 239
Joined: Tue Jul 19, 2016 6:06 am

Re: How to extract results from haveibeenpwned.com

by Steverob1066 on Sun Sep 18, 2016 1:42 pm

@iimfun
Thanks - that seems to work - very helpful!

@chivracq
Yeah - nice one, teacher.

Re: How to extract results from haveibeenpwned.com

by chivracq on Sun Sep 18, 2016 3:14 pm

Yep, glad to help. Nice to see that, for the first time, you managed to finish a thread a bit neatly, with a thanks for whoever helped you...

Re: How to extract results from haveibeenpwned.com

by Steverob1066 on Mon Sep 19, 2016 5:34 am

@chivracq Not trying to have the last word: my main lesson from using these forums is that I have not been ticking the "alert by email" box when posting. This means I don't know when people comment, so I don't go back to check, because after a few days the problem has probably been replaced by another, more urgent one. I won't make that mistake here again. Feel free to tell me I'm an idiot for not doing that in the first place, but I already know that.
