Extract email but only if not a duplicate in my CSV file?


by myima on Thu Aug 24, 2017 4:36 pm

Hi, I am going nuts trying to figure out how to extract an email from a website, but only if that email isn't already in my CSV file. If the email is already there, can I stop the macro and have it restart from the beginning?

To extract and save I'm using this, and it works:
Code:
TAG POS=1 TYPE=P ATTR=CLASS:"tel-number" EXTRACT=TXT

SAVEAS TYPE=EXTRACT FOLDER=/Users/Me FILE=cl-num.csv


How would I go about checking if the email is already in my CSV? And if it is, how do I make the macro restart?

Thanks everyone :)


myima
 
Posts: 8
Joined: Fri Jan 08, 2016 2:51 am

Re: Extract email but only if not a duplicate in my CSV file

by chivracq on Thu Aug 24, 2017 7:21 pm


CIM...! :mrgreen:
"CIM" for this current Thread and for your previous one as well, which is probably the Reason why I never reacted to your previous Thread. Some other User did try to help you in that previous Thread and you never bothered to follow up..., well, 1.5 years ago... So first follow up on both Threads with the missing Info, and you may bump this one in 1.5 years if you are still looking for a Solution... :idea:

And pfff, if you are a bit clever and don't want to wait for 1.5 years, you can search my Posts, I've already provided several different Solutions for your Question... :idea:
(But you'll still need to handle your 2 Threads a bit correctly for me to follow up and want to help you in the "Future", ah-ah...!)
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6490
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Extract email but only if not a duplicate in my CSV file

by myima on Thu Aug 24, 2017 8:43 pm

Sorry for not following up. This forum won't let me sign up for notifications, so I had no idea anyone had replied. (PS: I just found the subscribe button after looking everywhere. It's very hard to see the black letters over a dark blue footer.) Thanks for following up on the old one. I actually ended up doing just what you suggested and used the experimental method, and it worked :)

As for this one, what information am I missing? Please understand I'm an iMacros noob, but I have spent hours learning and figured out everything so far on my own.

Re: Extract email but only if not a duplicate in my CSV file

by myima on Thu Aug 24, 2017 10:10 pm

My config:
iMacros 8.9.7
Firefox 55.0.2 (64-bit)
OS X 10.12.6, but I can use Win7 in Parallels too

Re: Extract email but only if not a duplicate in my CSV file

by chivracq on Thu Aug 24, 2017 11:11 pm


Yep, sorry, but I don't react to Threads when FCI is not mentioned, and I won't ask a next time...
OK, previous Thread a bit correctly finished, even if FCI was not mentioned...

Current one, hum, first good to know that v8.9.7 for FF still works on FF55 v55.0.2, I was still waiting to update from v54.0.1 to v55.0.2, ah-ah...! You are the first one to mention this FCI...!

Oh..., but yep indeed, you have some "Notify me when a reply is posted" Option in your Thread, and even if anything "goes wrong", you don't need to wait for 1.5 years to check some previous Thread only because you have some new Question and somebody reminds you of your previous Thread, sorry, but that doesn't motivate much to want to help you again... Just saying... :idea:

>>>

OK, looking at your Case, but hum, ... "ATTR=CLASS:"tel-number"" doesn't really rhyme with "email"... Euh...!?!? :?
Site-URL not posted => I cannot check...
Sorry but if you want some Help from me, you need to post some "real" Info, I get completely "pissed off" by fake Info and Marketing Bullshit... :idea:

But anyway, like already mentioned in my previous Post, I've already posted several Methods to tackle your Qt..., I expect a little bit of "Research" now from your Side..., I won't be writing your Script... :roll:
(Yep, sorry, I'm not always very-very "friendly", but... I only help Users who really try their best and really-really get stuck... :oops: )

Re: Extract email but only if not a duplicate in my CSV file

by myima on Fri Aug 25, 2017 12:00 am

I like 8.9.7 as it has some features I use a lot. I've also set my FF so it doesn't update on its own.

I was testing extracting both emails and phone numbers from Craigslist ads. I can extract both up until I start getting captchas. I've been doing research for the past 2 days. Just last night I was up doing this from 10pm to 9am, believe it or not. I think I've read just about every post on the internet; some are similar, but I still wasn't able to implement it. I got a lot done but still can't get past avoiding duplicate emails/numbers, probably because it requires JavaScript and I don't know the language. I'm trying not to contact the same user selling what I'm looking for on CL more than once. Now, I could simply clean my CSV file manually after I get some emails/numbers extracted, but that's not what I'm trying to accomplish.

I would love to make this macro completely autonomous:
search-->if find-->extract-->contact-->wait sec...repeat
if duplicate found-->NOT extract-->restart macro



Re: Extract email but only if not a duplicate in my CSV file

by chivracq on Fri Aug 25, 2017 12:35 am


Hum... still not completely convinced to help you, first I don't help for Captchas which are meant as anti-Web-Automation Measure, I'm of course able to bypass them all, but I respect the "Idea" for "normal Users"...

And then mail Users, grrr..., this is Spam for me, sorry...! :shock:
I don't do Spam/Like/Follow/Comment/Games/Votes/Hacking/DDoS, oops...! 8)

But your Case is fairly easy, a simple Extract + 'EVAL()', ah-ah...!
As I said, I've already posted several Solutions for your Scenario... :idea:
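
As a rough illustration of the "Extract + EVAL()" idea (this is not one of the Solutions referred to above, which are not reproduced in this Thread): EVAL() in iMacros evaluates a JavaScript expression, so the duplicate test itself can be written as a small piece of JavaScript over the CSV content and the extracted value. The sketch below is plain JavaScript so it can be tried outside iMacros. `isDuplicate`, `csvText` and `extracted` are hypothetical names, the one-value-per-line layout with optional double quotes is an assumption about the saved file, and how the CSV content gets into a variable inside a macro is left out.

```javascript
// Hypothetical helper: true when `extracted` already appears as a full line of `csvText`.
// Assumes one value per line; values may or may not be wrapped in double quotes.
function isDuplicate(csvText, extracted) {
  // Normalise a value: trim whitespace, drop one pair of surrounding quotes, ignore case.
  const clean = (s) => s.trim().replace(/^"|"$/g, "").toLowerCase();
  const target = clean(extracted);
  // An empty extract is never treated as a duplicate of a blank line.
  return target !== "" && csvText.split(/\r?\n/).map(clean).includes(target);
}

const csvText = '"alice@example.com"\n"bob@example.com"\n';
console.log(isDuplicate(csvText, "Bob@Example.com"));   // true
console.log(isDuplicate(csvText, "carol@example.com")); // false
```

The result of such a test can then decide whether the value gets saved or the macro starts over.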

Re: Extract email but only if not a duplicate in my CSV file

by myima on Mon Aug 28, 2017 1:21 pm

I'm really not doing any spamming. I'm just automating the search for an item on Craigslist so that when one does come up for sale I contact the seller automatically.

Do you mind letting me know what I should search for in your posts? You have over 6000 posts... I'm willing to read all topics dealing with this.

Re: Extract email but only if not a duplicate in my CSV file

by chivracq on Tue Aug 29, 2017 6:26 am


Oh well, then look at the following Thread for example, where I ended up writing a complete Script for the User as I found the Case "interesting", but there are other similar Threads on the Forum as well and a few other Methods that I probably have mentioned in that Thread as well...
The Link will take you to the Post with the final Script, but you'll need to read the whole Thread which is a bit long (4 Pages :oops: ) to understand a bit the whole Idea..., and to adapt it a bit to your Needs...:
- Re: exclude content of a txt/csv file when running script

Good luck and post your final Script once you've gotten it to work, to make this Thread a bit useful for other Users as well... 8)

EDIT: I had forgotten to place the Link to the Thread... :oops:
Last edited by chivracq on Wed Aug 30, 2017 2:32 am, edited 1 time in total.

Re: Extract email but only if not a duplicate in my CSV file

by myima on Tue Aug 29, 2017 7:29 pm

I think you forgot to add the link to the thread you were referring to. And yes if I figure it out I will post my end results :)


Re: Extract email but only if not a duplicate in my CSV file

by chivracq on Wed Aug 30, 2017 2:36 am


Hum..., that's indeed a judicious Observation, ah-ah...! Oops, sorry about that... :oops:
Previous Post edited and Link added... :wink:

Re: Extract email but only if not a duplicate in my CSV file

by chivracq on Wed Aug 30, 2017 4:20 am

Hum..., and like I said, there are quite a few similar/relevant Threads already on the Forum, here are another 2 I just came across...:
- Re: Checking for a word in .csv file
- Re: Get number of lines from CSV and use as variable?
(For the 2nd Thread, it doesn't sound like it but it's the same Method...)

Oh...!, and here, another one...!:
- Re: If equals info in excel cell then...

Re: Extract email but only if not a duplicate in my CSV file

by almodoen on Tue Sep 05, 2017 5:39 pm


You have to check every line in your CSV file and see whether it equals the extracted email; if no line matches, add the email to the CSV file.

I can write the loop for you; you have to use iMacros via JavaScript.
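
A minimal sketch of the check described above, in plain JavaScript so it runs outside iMacros: loop over the lines, compare each one to the extracted value, and only append when nothing matches. `addIfNew` and its parameters are hypothetical names. Reading and writing the actual file, and feeding in the value extracted by the macro, are not shown.

```javascript
// Hypothetical helper: returns the (possibly extended) list of lines
// and a flag saying whether the extracted value was added.
function addIfNew(lines, extracted) {
  const value = extracted.trim();
  if (value === "") return { lines, added: false }; // nothing extracted, nothing to add
  for (const line of lines) {
    if (line.trim() === value) {
      // Duplicate found: the caller can skip saving and restart the macro.
      return { lines, added: false };
    }
  }
  // No line matched: append the new value without mutating the input array.
  return { lines: [...lines, value], added: true };
}

const rows = ["555-0100", "555-0101"];
let result = addIfNew(rows, "555-0101");
console.log(result.added); // false
result = addIfNew(rows, "555-0199");
console.log(result.added, result.lines.length); // true 3
```

The loop visits every row for each new value, so it gets slower as the file grows.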
almodoen
 
Posts: 10
Joined: Tue Sep 05, 2017 2:26 pm

Re: Extract email but only if not a duplicate in my CSV file

by chivracq on Tue Sep 05, 2017 5:56 pm

almodoen wrote:You have to check every line in your CSV file and see whether it equals the extracted email; if no line matches, add the email to the CSV file.

I can write the loop for you; you have to use iMacros via JavaScript.

"... you have to use imacros via Javascript...", hum..., that's not completely correct..., and not the "best" Solution anyway... (unless you only have 10-20 Rows in your '.CSV'...)
In one of the Threads I've placed a direct Link to in my previous Post, I posted a Method in pure '.iim' that can handle 1000 Rows in a '.CSV' in 0.5 Sec while a Solution with looping through the '.CSV' from a '.js' Script will need 2-3 Min to handle the same 1000 Rows...

