Extracting some lines from a website

by silviuactiv on Fri Jan 29, 2016 5:28 am

Hello guys.
I need some help with a script. I want to extract these messages from this site.

I want to extract the email and the message and put them in a CSV file, so that each line looks like this: "email@dasdasda.com E-mail address is valid". Or, if the email is invalid, like this: "email@dasdasda.com E-mail is invalid".

I tried to extract the text and save it to a file like this:
Code: Select all
TAG POS=1 TYPE=TD ATTR=TD:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=tanana

but it didn't work.
Any suggestions? Thanks, and have a great day, you all.


Image
silviuactiv
 
Posts: 6
Joined: Fri Jan 29, 2016 4:35 am

by chivracq on Fri Jan 29, 2016 9:45 am

1- CIM...!

2- And can't you post the URL of the Site for me to have a look...?

3- And where does the E-mail Address in the Input Field come from...? Do you type it manually, does it come from a .CSV, do you already have it stored in some Variable in your Script...?

4- Without the URL of your Site, one Solution that will probably work is to try to tag any Element containing "E-mail address is valid"...
And using 'EVAL()', you output the Text that you want depending on Found or Not Found (= "#EANF#"), something like:
Code: Select all
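' Messages to write to the .CSV, depending on the Result
SET E-mail_Valid "E-mail address is valid."
SET E-mail_Invalid "E-mail address is invalid."
' Alternative (commented out): take the Address from Col 1 of a .CSV Datasource
'SET E-mail_Address {{!COL1}}
SET E-mail_Address wefghsdec@sdfgtnvft.com

SET !EXTRACT NULL
' Extracts "#EANF#" if no Cell contains "E-mail address is valid"
TAG POS=1 TYPE=TD ATTR=TXT:*E-mail<SP>address<SP>is<SP>valid* EXTRACT=TXT
SET Validation EVAL("var s='{{!EXTRACT}}'; var emv='{{E-mail_Valid}}'; var emi='{{E-mail_Invalid}}'; var x; if(s!='#EANF#'){x=emv;} else {x=emi;}; x;")
' Save the Address and the Validation as 2 separate Cols
SET !EXTRACT {{E-mail_Address}}
ADD !EXTRACT {{Validation}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=tanana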
(Not tested obviously...!)

And I think it's better to save the E-mail Address and the Validation in 2 separate Cols in your .CSV in case you'll need to reuse that E-mail Address for some other Script, otherwise, you'll first need to isolate the E-mail Address from a Phrase like "asdfasf@dfhrt.com is valid."
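The decision made inside the 'EVAL()' above is plain JavaScript, so it can be tried outside iMacros before wiring it into the macro. A minimal sketch, assuming the two message strings from the script; the helper names `pickValidation`, `csvField` and `toCsvRow` are made up for illustration, and the quoting rule is the usual CSV convention, not anything confirmed for the file iMacros writes:

```javascript
// Hypothetical helper names; only the "#EANF#" marker and the two
// message strings come from the macro above.
const EMAIL_VALID = "E-mail address is valid.";
const EMAIL_INVALID = "E-mail address is invalid.";

// Mirrors the EVAL(): anything other than "#EANF#" means the TAG found
// the "is valid" cell, so the address counts as valid.
function pickValidation(extracted) {
  return extracted !== "#EANF#" ? EMAIL_VALID : EMAIL_INVALID;
}

// Quote one field the usual CSV way (embedded quotes are doubled).
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Address and validation go into two separate columns.
function toCsvRow(email, extracted) {
  return [email, pickValidation(extracted)].map(csvField).join(",");
}

console.log(toCsvRow("wefghsdec@sdfgtnvft.com", "#EANF#"));
```

Keeping the address in its own column means it can be reused later without having to cut it out of a longer phrase.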
Last edited by chivracq on Sat Jan 30, 2016 12:59 pm, edited 1 time in total.
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6479
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

by silviuactiv on Sat Jan 30, 2016 5:41 am

First of all, thank you so much for your response. My knowledge of iMacros and programming is very low.
I was afraid that if I shared the site, my topic would probably be deleted and I would be banned.

I really don't know how to make a database with all those e-mails and have the script grab them from the database, insert them on this website and validate them, so since I don't know how to do that, I made this.
The website is in the code below:

Code: Select all
URL GOTO=http://mailtester.com/testmail.php
TAG POS=1 TYPE=INPUT:TEXT FORM=ACTION:testmail.php ATTR=NAME:email CONTENT=asddaav@gmail.com
TAG POS=1 TYPE=DIV ATTR=ID:maincontainer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ACTION:testmail.php ATTR=*
TAG POS=1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=tanana


and I was thinking of multiplying all these lines in Excel for more e-mails.

By the way, I don't necessarily need the part with

Code: Select all
SET E-mail_Valid "E-mail address is valid."
SET E-mail_Invalid "E-mail address is invalid."


So basically the script must do this: verify an e-mail (this part works so far), grab the message displayed by the website saying whether the mail is valid or invalid, insert that message in column 1 of a CSV / Excel file, then grab the verified e-mail and insert it in column 2. In the end I can open the CSV file, sort the rows by valid or invalid, and delete the invalid ones.

Thanks a lot for your patience and your support. Sadly I can't offer money for your help, but I can do a little work for you if you help me with this. I will stay online and refresh this page for the next few hours; maybe you are around and we can talk. We can move this conversation to private messages so we won't flood the forum. Thanks again, buddy.
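The clean-up step described above (message in column 1, e-mail in column 2, drop the invalid ones) could also be done by a few lines of JavaScript instead of sorting by hand. This is only a sketch under assumptions: the function names are made up, the sample rows are invented, and it assumes each line has the simple form "message","email" with no embedded quotes or commas.

```javascript
// Assumption: each line looks like "message","email" with simple quoted
// fields (no embedded quotes or commas), message first as described above.
function parseRow(line) {
  return line.trim().replace(/^"|"$/g, "").split('","');
}

// Keep only the e-mails whose message says the address is valid.
// "is invalid" does not contain "is valid", so a substring test is enough.
function keepValid(lines) {
  return lines
    .filter((line) => line.trim() !== "")
    .map(parseRow)
    .filter(([message]) => message.includes("is valid"))
    .map(([, email]) => email);
}

// Invented sample rows, for illustration only.
const sample = [
  '"E-mail address is valid","asddaav@gmail.com"',
  '"E-mail is invalid","email@dasdasda.com"',
];
console.log(keepValid(sample));
```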

by chivracq on Sat Jan 30, 2016 1:14 pm

No Problem obviously to post a URL, as long as it is related to your Question/Problem...

But "CIM" again for me to follow up..., read my Sig, I normally don't even react to Threads when FCI is not mentioned..., even if I can see from your Printscreen that you are on Chrome (CR) on Win10, I guess...

by silviuactiv on Mon Feb 01, 2016 1:23 am

Hey. Sorry, and thanks for your patience. I thought CIM was a greeting.

iMacros version: "VERSION BUILD=8340723 RECORDER=CR"
Browser: Google Chrome, latest version
OS: Windows 7 Professional 64-bit

Today I will be online for 8 hours, starting now.

by silviuactiv on Mon Feb 01, 2016 8:29 am

Did I miss something?

by silviuactiv on Wed Feb 03, 2016 6:47 am

FCI:
iMacros:
Code: Select all
VERSION BUILD=8340723 RECORDER=CR
URL GOTO=http://mailtester.com/testmail.php
TAG POS=1 TYPE=INPUT:TEXT FORM=ACTION:testmail.php ATTR=NAME:email CONTENT=asddaav@gmail.com
TAG POS=1 TYPE=DIV ATTR=ID:maincontainer
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ACTION:testmail.php ATTR=*
TAG POS=1 TYPE=TD ATTR=TXT:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=tanana


Browser: Chrome Version 48.0.2564.97. Website code (source of the result page):
Code: Select all
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
   <title>MailTester.com</title>
   <meta name="description" content="Check if an e-mail address is valid or not. Find out why a mail bounces. Get technical information about a mail account and it's mail (SMTP) server."/>
   <meta name="keywords" content="e-mail,check,test,verify,lookup,smtp,mail,email,protocol,vrfy"/>
   <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
   <meta http-equiv="pragma" content="no-cache"/>
   <meta http-equiv="cache-control" content="no-cache"/>
   <meta http-equiv="expires" content="0"/>
   <meta http-equiv="expires" content="-1"/>
      <meta name="google-site-verification" content="vkbLZEPHahtiU971aJpwixbk-pPOR1NTw3Nw8F8DDFE"/>
   <link rel="stylesheet" type="text/css" href="stylesheet.css" title="CSS"/>
   <link rel="search" type="application/opensearchdescription+xml" title="MailTester.com e-mail address verification" href="opensearch.xml"/>
<!-- Begin Cookie Consent plugin by Silktide - http://silktide.com/cookieconsent -->
<script type="text/javascript">
    window.cookieconsent_options = {"message":"This website uses cookies to ensure you get the best experience on our website","dismiss":"Got it!","learnMore":"More info","link":"","theme":"dark-floating"};
</script>
<script type="text/javascript" src="//s3.amazonaws.com/cc.silktide.com/cookieconsent.latest.min.js"></script>
<!-- End Cookie Consent plugin -->
</head>
<body id="mainbody">

<div id="maincontainer">

<div id="banner">
      <div id="sponsor">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- mailtester_banner_728x90_v1 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:728px;height:90px"
     data-ad-client="ca-pub-4825003612245471"
     data-ad-slot="2368896146"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
      </div>
      <img alt="MailTester.com" src="images/logo.gif" width="380" height="45" class="logo"/>
<span class="languages">
</span>

</div>

<div id="filltopleft"></div>
<div id="filltopmiddle"></div>
<div id="filltopright"></div>
<div id="fillleft"></div>
<div id="fillright"></div>
<div id="fillbottomleft"></div>
<div id="fillbottommiddle"></div>
<div id="fillbottomright"></div>


<div id="content">
      <div style="float: right">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- mailtester_banner_300x600_v1 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:300px;height:600px"
     data-ad-client="ca-pub-4825003612245471"
     data-ad-slot="4762129349"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
      </div>

<h1>E-mail address verification</h1>

<form method="post" action="testmail.php">
<input type="hidden" name="lang" value="en"/><table>
   <tr>
      <th>E-mail address</th>
      <td><input type="text" name="email" value="silviu.activ@gmail.com" size="48" autofocus="autofocus"/></td>
   </tr>
   <tr>
      <th>&nbsp;</th>
      <td><input type="submit" value="Check address" class="Button"/></td>
   </tr>
</table>
</form>

<!--  -->
         <table cellspacing="0" cellpadding="0" border="0">
            <tr valign="top">
               <td colspan="3" align="right" bgcolor="#00DD00" class="FixedWidth" style="border-bottom: solid 1px #000000">silviu.activ</td>
               <td align="center" bgcolor="#00DD00" class="FixedWidth">@</td>
               <td colspan="5" align="left" bgcolor="#00DD00" class="FixedWidth" style="border-bottom: solid 1px #000000">gmail.com</td>
               <td colspan="2"></td>
            </tr>
<tr valign="top"><td rowspan="4"><img src="images/trsp_pix.gif" width="49" height="1"/></td>
<td rowspan="3" bgcolor="#000000"><img src="images/trsp_pix.gif" width="1" height="15"/></td>
<td rowspan="3"><img src="images/trsp_pix.gif" width="49" height="1"/></td>
<td><img src="images/trsp_pix.gif" width="8" height="1"/></td>
<td rowspan="3"><img src="images/trsp_pix.gif" width="25" height="1"/></td>
<td rowspan="2" bgcolor="#000000"><img src="images/trsp_pix.gif" width="1" height="15"/></td>
<td rowspan="2"><img src="images/trsp_pix.gif" width="25" height="1"/></td>
<td rowspan="1" bgcolor="#000000"><img src="images/trsp_pix.gif" width="1" height="15"/></td>
<td rowspan="1"><img src="images/trsp_pix.gif" width="25" height="1"/></td>
<td><img src="images/trsp_pix.gif" width="20" height="20"/></td>
<td></td></tr>
<tr valign="top"><td><img src="images/trsp_pix.gif" width="8" height="1"/></td>
<td><img src="images/arrow_vert.gif" width="1" height="15"/></td>
<td colspan="1"><img src="images/arrow_hor.gif" width="25" height="15"/></td>
<td><img src="images/arrow_head.gif" width="20" height="15"/></td>
<td bgcolor="#00DD00">Mail servers found for domain:<br />
- alt3.gmail-smtp-in.l.google.com (priority 30, ip address: 74.125.204.27)<br />
- alt4.gmail-smtp-in.l.google.com (priority 40, ip address: 173.194.72.26)<br />
- gmail-smtp-in.l.google.com (priority 5, ip address: 74.125.71.26)<br />
- alt1.gmail-smtp-in.l.google.com (priority 10, ip address: 64.233.164.27)<br />
- alt2.gmail-smtp-in.l.google.com (priority 20, ip address: 74.125.68.27)<br />
Using mail server with lowest priority number:<br />
- gmail-smtp-in.l.google.com (priority 5, ip address: 74.125.71.26)</td></tr>
<tr valign="top"><td><img src="images/trsp_pix.gif" width="8" height="1"/></td>
<td><img src="images/arrow_vert.gif" width="1" height="15"/></td>
<td colspan="3"><img src="images/arrow_hor.gif" width="51" height="15"/></td>
<td><img src="images/arrow_head.gif" width="20" height="15"/></td>
<td bgcolor="#00DD00">Mailserver identification:<br />
mx.google.com ESMTP r2si10082928wjz.116 - gsmtp</td></tr>
<tr valign="top"><td><img src="images/arrow_vert.gif" width="1" height="15"/></td>
<td colspan="1"><img src="images/arrow_hor.gif" width="49" height="15"/></td>
<td colspan="6"><img src="images/arrow_hor.gif" width="85" height="15"/></td>
<td><img src="images/arrow_head.gif" width="20" height="15"/></td>
<td bgcolor="#00DD00">E-mail address is valid</td></tr>
         </table>
      <br/>
      <br/>
      <br/>
      <br/>
      <br/>
      <br/>
      <div style="float: left">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- mailtester_banner_336x280_v1 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:336px;height:280px"
     data-ad-client="ca-pub-4825003612245471"
     data-ad-slot="5601564146"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
      </div>


</div>


<div id="navigation">
<ul>
<li><a href="testmail.php" class="current"><img alt="Test Mail" src="images/generated/testmail_hi.en.gif"/></a></li>
<li><a href="showpage.php?name=download"><img alt="Download" src="images/generated/download.en.gif"/></a></li>
<li><a href="faq.php"><img alt="FAQ" src="images/generated/faq.en.gif"/></a></li>
<li><a href="glossary.php"><img alt="Glossary" src="images/generated/glossary.en.gif"/></a></li>
<li><a href="links.php"><img alt="Links" src="images/generated/links.en.gif"/></a></li>
<li><a href="showpage.php?name=contact"><img alt="Contact" src="images/generated/contact.en.gif"/></a></li>
</ul>
</div>

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-10924943-1");
pageTracker._trackPageview();
} catch(err) {}</script>

</div>

</body>
</html>


OS: Windows 7 Professional 64-bit

I hope I didn't do anything wrong here...
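In the page source posted above, the first <td> on the page wraps the input field, while the verdict ("E-mail address is valid") sits in the last cell of the result table, so taking the first cell with any text is unlikely to return the message. A minimal JavaScript sketch of picking the verdict out of the cell texts follows. The names `VERDICT` and `findVerdict` are made up, the cell list is abbreviated from the posted source, and the wording of the "invalid" message is an assumption, since the thread never shows an invalid result page.

```javascript
// Assumption: cellTexts holds the text of the <td> cells of the result
// page in document order. The exact "invalid" wording is not shown in the
// thread, so the pattern accepts both phrasings mentioned in it.
const VERDICT = /^E-mail (address )?is (in)?valid\.?$/;

// Returns the verdict text, or "#EANF#" (the marker iMacros uses when an
// extraction finds nothing) if no cell matches.
function findVerdict(cellTexts) {
  const hit = cellTexts.map((t) => t.trim()).find((t) => VERDICT.test(t));
  return hit === undefined ? "#EANF#" : hit;
}

// Abbreviated from the posted source: the form cell comes first,
// the verdict cell last.
const cells = [
  "",
  "silviu.activ",
  "@",
  "gmail.com",
  "Mailserver identification:\nmx.google.com ESMTP r2si10082928wjz.116 - gsmtp",
  "E-mail address is valid",
];
console.log(findVerdict(cells));
```

Matching on the message text, as in the earlier TAG line with ATTR=TXT:*E-mail<SP>address<SP>is<SP>valid*, avoids depending on the position of the cell in the table.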