Extract email but only if not a duplicate in my CVS file?

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
Forum rules
iMacros EOL - Attention!

The renewal maintenance has officially ended for Progress iMacros effective November 20, 2023 and all versions of iMacros are now considered EOL (End-of-Life). The iMacros products will no longer be supported by Progress (aside from customer license issues), and these forums will also no longer be moderated from the Progress side.

Thank you again for your business and support.

Sincerely,
The Progress Team

Before asking a question or reporting an issue:
1. Please review the list of FAQs.
2. Use the search box (at the top of each forum page) to see if a similar problem or question has already been addressed.
3. Try searching the iMacros Wiki - it contains the complete iMacros reference as well as plenty of samples and tutorials.
4. We can respond much faster to your posts if you include the following information: CLICK HERE FOR IMPORTANT INFORMATION TO INCLUDE IN YOUR POST
myima
Posts: 8
Joined: Fri Jan 08, 2016 9:51 am

Extract email but only if not a duplicate in my CVS file?

Post by myima » Thu Aug 24, 2017 11:36 pm

Hi, I am going nuts trying to figure out how to extract an email address from a website, but only if that email isn't already in my CSV file. If the email is already there, can I stop the iMacro and have it restart from the beginning?

To extract and save, I'm using this, and it works:


TAG POS=1 TYPE=P ATTR=CLASS:"tel-number" EXTRACT=TXT

SAVEAS TYPE=EXTRACT FOLDER=/Users/Me FILE=cl-num.csv
How would I go about checking if the email is already in my CSV? And if it is, how do I make the iMacro restart?

Thanks everyone :)


chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extract email but only if not a duplicate in my CVS file

Post by chivracq » Fri Aug 25, 2017 2:21 am

myima wrote:Hi, I am going nuts trying to figure out how to extract an email address from a website, but only if that email isn't already in my CSV file. If the email is already there, can I stop the iMacro and have it restart from the beginning?

To extract and save, I'm using this, and it works:


TAG POS=1 TYPE=P ATTR=CLASS:"tel-number" EXTRACT=TXT

SAVEAS TYPE=EXTRACT FOLDER=/Users/Me FILE=cl-num.csv
How would I go about checking if the email is already in my CSV? And if it is, how do I make the iMacro restart?

Thanks everyone :).
CIM...! :mrgreen:
"CIM" for this current Thread and for your previous one as well, probably Reason why I never reacted to your previous Thread, but some other User tried to help you in your previous Thread and you never bothered to follow up..., well 1,5 years ago..., then follow up on both Threads with missing Info and you may bump this one in 1,5 years if you are still looking for a Solution... :idea:

And pfff, if you are a bit clever and don't want to wait for 1,5 years, you can search my Posts, I've already provided several different Solutions for your Qt... :idea:
(But you'll still need to handle your 2 Threads a bit correctly for me to follow up and want to help you in the "Future", ah-ah...!)
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
myima

Re: Extract email but only if not a duplicate in my CVS file

Post by myima » Fri Aug 25, 2017 3:43 am

Sorry for not following up. This forum won't let me sign up for notifications, so I had no idea anyone replied (PS: I just found the subscribe button; I looked everywhere. It's very hard to see the black letters over a dark blue footer). Thanks for following up on the old one. I actually ended up doing just what you suggested, used the experimental method, and it worked :)

As for this one, what information am I missing? Please understand I'm an iMacro noob but I have spent hours learning and figured out all the stuff so far on my own.
myima

Re: Extract email but only if not a duplicate in my CVS file

Post by myima » Fri Aug 25, 2017 5:10 am

My config:
iMacros 8.9.7
Firefox 55.0.2 (64-bit)
OS X 10.12.6, but I can use Parallels Win7 too
chivracq

Re: Extract email but only if not a duplicate in my CVS file

Post by chivracq » Fri Aug 25, 2017 6:11 am

myima wrote:Sorry for not following up. This forum won't let me sign up for notifications, so I had no idea anyone replied (PS: I just found the subscribe button; I looked everywhere. It's very hard to see the black letters over a dark blue footer). Thanks for following up on the old one. I actually ended up doing just what you suggested, used the experimental method, and it worked :)

As for this one, what information am I missing? Please understand I'm an iMacro noob but I have spent hours learning and figured out all the stuff so far on my own.
myima wrote:My config:


iMacros 8.9.7
Firefox 55.0.2 (64-bit)
OS X 10.12.6, but I can use Parallels Win7 too
Yep, sorry, but I don't react to Threads when FCI is not mentioned, and I won't ask next time...
OK, previous Thread a bit correctly finished, even if FCI was not mentioned...

Current one, hum, first good to know that v8.9.7 for FF still works on FF55 v55.0.2, I was still waiting to update from v54.0.1 to v55.0.2, ah-ah...! You are the first one to mention this FCI...!

Oh..., but yep indeed, you have some "Notify me when a reply is posted" Option in your Thread, and even if anything "goes wrong", you don't need to wait for 1,5 years to check some previous Thread only because you have some new Qt and somebody reminds you of your previous Thread, sorry, but that doesn't motivate much to want to help you again... Just saying... :idea:

>>>

OK, looking at your Case, but hum, ... "ATTR=CLASS:"tel-number"" doesn't really rhyme with "email"... Euh...!?!? :?
Site-URL not posted => I cannot check...
Sorry but if you want some Help from me, you need to post some "real" Info, I get completely "pissed off" by fake Info and Marketing Bullshit... :idea:

But anyway, like already mentioned in my previous Post, I've already posted several Methods to tackle your Qt..., I expect a little bit of "Research" now from your Side..., I won't be writing your Script... :roll:
(Yep, sorry, I'm not always very-very "friendly", but... I only help Users who really try their best and really-really get stuck... :oops: )
myima

Re: Extract email but only if not a duplicate in my CVS file

Post by myima » Fri Aug 25, 2017 7:00 am

I like 8.9.7 as it has some features I use a lot. I also set my FF to update only manually.

I was testing extracting both emails and phone numbers from Craigslist ads. I can extract both up until I start getting captchas. I've been doing research for the past 2 days. Just last night I was up doing this from 10pm to 9am, believe it or not. I think I've read just about every post on the internet; some are similar, but I still wasn't able to implement it. I got a lot done but still can't get past avoiding duplicate emails/numbers, probably because it requires JavaScript and I don't know the language. I'm trying not to contact the same user selling what I'm looking for on CL more than once. I could simply clean my CSV file manually after extracting some emails/numbers, but that's not what I'm trying to accomplish.

I would love to make this macro completely autonomous:
search-->if find-->extract-->contact-->wait sec...repeat
if duplicate found-->NOT extract-->restart macro
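The loop described above can be sketched in plain JavaScript, the language iMacros '.js' Scripts are written in. This is a hypothetical sketch, not an iMacros API: `runLoop` and `normalize` are made-up names, `extractedValues` stands in for whatever each search/extract run returns, and `alreadyInCsv` stands in for the rows already saved. Reading the CSV, contacting the seller, and waiting are left out, and treating "Bob@x.com" and "bob@x.com" as the same contact is an assumption.

```javascript
// Hypothetical sketch of: search --> extract --> duplicate? skip : contact --> repeat.
// Not an iMacros API: the inputs stand in for what the macro extracts and saves.

// Normalise a value so "Bob@X.com " and "bob@x.com" compare as equal (assumption).
function normalize(value) {
  return value.trim().toLowerCase();
}

// extractedValues: values returned by successive search/extract runs.
// alreadyInCsv:    values already saved in the CSV from earlier runs.
// Returns the values that would be contacted (and appended to the CSV).
function runLoop(extractedValues, alreadyInCsv) {
  const seen = new Set(alreadyInCsv.map(normalize));
  const toContact = [];
  for (const value of extractedValues) {
    const key = normalize(value);
    if (seen.has(key)) {
      continue; // duplicate: do not save or contact, go back to searching
    }
    seen.add(key); // remember it, as appending it to the CSV would
    toContact.push(value.trim());
  }
  return toContact;
}
```

For example, `runLoop(["a@x.com", "b@x.com", "A@x.com ", " c@x.com"], ["b@x.com"])` returns `["a@x.com", "c@x.com"]`: "b@x.com" was already saved, and "A@x.com " repeats an address seen earlier in the same run.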


chivracq

Re: Extract email but only if not a duplicate in my CVS file

Post by chivracq » Fri Aug 25, 2017 7:35 am

myima wrote:I like 8.9.7 as it has some features I use a lot. I also set my FF to update only manually.

I was testing extracting both emails and phone numbers from Craigslist ads.
I can extract both up until I start getting captchas. I've been doing research for the past 2 days.

Just last night I was up doing this from 10pm to 9am, believe it or not.
I think I've read just about every post on the internet; some are similar, but I still wasn't able to implement it. I got a lot done but still can't get past avoiding duplicate emails/numbers, probably because it requires JavaScript and I don't know the language. I'm trying not to contact the same user selling what I'm looking for on CL more than once. I could simply clean my CSV file manually after extracting some emails/numbers, but that's not what I'm trying to accomplish.

I would love to make this macro completely autonomous:
search-->if find-->extract-->contact-->wait sec...repeat
if duplicate found-->NOT extract-->restart macro.
Hum... still not completely convinced I want to help you. First, I don't help with Captchas, which are meant as an anti-Web-Automation Measure; I'm of course able to bypass them all, but I respect the "Idea" for "normal Users"...

And then mail Users, grrr..., this is Spam for me, sorry...! :shock:
I don't do Spam/Like/Follow/Comment/Games/Votes/Hacking/DDos, oops...! 8)

But your Case is fairly easy, a simple Extract + 'EVAL()', ah-ah...!
As I said, I've already posted several Solutions for your Scenario... :idea:
myima

Re: Extract email but only if not a duplicate in my CVS file

Post by myima » Mon Aug 28, 2017 8:21 pm

I'm really not doing any spamming; I'm just automating the search for an item on Craigslist, so when one does come out for sale, I contact the seller automatically.

Do you mind letting me know what I should search for among your posts? You have over 6000 posts... I'm willing to read all topics dealing with this.
chivracq

Re: Extract email but only if not a duplicate in my CVS file

Post by chivracq » Tue Aug 29, 2017 1:26 pm

myima wrote:I'm really not doing any spamming; I'm just automating the search for an item on Craigslist, so when one does come out for sale, I contact the seller automatically.

Do you mind letting me know what I should search for under your posts? You have over 6000 posts... I'm willing to read all topics dealing with this
Oh well, then look at the following Thread for example, where I ended up writing a complete Script for the User as I found the Case "interesting", but there are other similar Threads on the Forum as well and a few other Methods that I probably have mentioned in that Thread as well...
The Link will take you to the Post with the final Script, but you'll need to read the whole Thread which is a bit long (4 Pages :oops: ) to understand a bit the whole Idea..., and to adapt it a bit to your Needs...:
- Re: exclude content of a txt/csv file when running script

Good luck and post your final Script once you've gotten it to work, to make this Thread a bit useful for other Users as well... 8)

EDIT: I had forgotten to place the Link to the Thread... :oops:
Last edited by chivracq on Wed Aug 30, 2017 9:32 am, edited 1 time in total.
myima

Re: Extract email but only if not a duplicate in my CVS file

Post by myima » Wed Aug 30, 2017 2:29 am

I think you forgot to add the link to the thread you were referring to. And yes if I figure it out I will post my end results :)

chivracq

Re: Extract email but only if not a duplicate in my CVS file

Post by chivracq » Wed Aug 30, 2017 9:36 am

myima wrote:I think you forgot to add the link to the thread you were referring to. And yes if I figure it out I will post my end results. :)
Hum..., that's indeed a judicious Observation, ah-ah...! Oops, sorry about that... :oops:
Previous Post edited and Link added... :wink:
chivracq

Re: Extract email but only if not a duplicate in my CVS file

Post by chivracq » Wed Aug 30, 2017 11:20 am

Hum..., and like I said, there are quite a few similar/relevant Threads already on the Forum; here are another 2 I just came across...:
- Re: Checking for a word in .csv file
- Re: Get number of lines from CSV and use as variable?
(For the 2nd Thread, it doesn't sound like it but it's the same Method...)

Oh...!, and here, another one...!:
- Re: If equals info in excel cell then...
almodoen
Posts: 11
Joined: Tue Sep 05, 2017 9:26 pm

Re: Extract email but only if not a duplicate in my CVS file

Post by almodoen » Wed Sep 06, 2017 12:39 am

myima wrote:Hi, I am going nuts trying to figure out how to extract an email address from a website, but only if that email isn't already in my CSV file. If the email is already there, can I stop the iMacro and have it restart from the beginning?

To extract and save, I'm using this, and it works:


TAG POS=1 TYPE=P ATTR=CLASS:"tel-number" EXTRACT=TXT

SAVEAS TYPE=EXTRACT FOLDER=/Users/Me FILE=cl-num.csv
How would I go about checking if the email is already in my CSV? And if it is, how do I make the iMacro restart?

Thanks everyone :)



You have to check every line in your CSV file and see if it equals the extracted email; if not, then add it to the CSV file.

I can make a loop for you; you have to use iMacros via JavaScript...
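The row-by-row check described above could look roughly like this minimal JavaScript sketch. It works on the CSV content as a plain string; how that text is read from and written back to the file is left out, because that part depends on the iMacros version and setup. `isAlreadySaved` and `appendIfNew` are hypothetical names, and the layout (comma-separated rows, email in the first column, no commas inside quoted fields) is an assumption.

```javascript
// Minimal sketch of the row-by-row check: compare each CSV row with the
// extracted email and stop at the first match. Assumptions: comma-separated
// rows, email in the first column, no commas inside quoted fields.

function isAlreadySaved(csvText, email) {
  const target = email.trim().toLowerCase();
  for (const row of csvText.split(/\r?\n/)) {
    // First column, with surrounding whitespace and optional quotes removed.
    const firstColumn = row.split(",")[0].trim().replace(/^"(.*)"$/, "$1");
    if (firstColumn.toLowerCase() === target) {
      return true; // duplicate found: the caller should skip/restart
    }
  }
  return false;
}

// Returns the CSV text with the email appended as a new row, or unchanged
// if the email is already present.
function appendIfNew(csvText, email) {
  if (isAlreadySaved(csvText, email)) {
    return csvText;
  }
  const needsNewline = csvText !== "" && !csvText.endsWith("\n");
  return csvText + (needsNewline ? "\n" : "") + email.trim() + "\n";
}
```

For example, `appendIfNew("b@x.com", "c@x.com")` returns `"b@x.com\nc@x.com\n"`, while `appendIfNew("b@x.com\n", "b@x.com")` returns the text unchanged.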
chivracq

Re: Extract email but only if not a duplicate in my CVS file

Post by chivracq » Wed Sep 06, 2017 12:56 am

almodoen wrote:
You have to check every line in your CSV file and see if it equals the extracted email; if not, then add it to the CSV file.

I can make a loop for you; you have to use iMacros via JavaScript...
"... you have to use imacros via Javascript...", hum..., that's not completely correct..., and not the "best" Solution anyway... (unless you only have 10-20 Rows in your '.CSV'...)
In one of the Threads I've placed a direct Link to in my previous Post, I posted a Method in pure '.iim' that can handle 1000 Rows in a '.CSV' in 0.5 Sec while a Solution with looping through the '.CSV' from a '.js' Script will need 2-3 Min to handle the same 1000 Rows...
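The pure '.iim' method from the linked Threads is not reproduced here, so the following is only an assumption about the general idea: treat the whole file as one string and search it once instead of comparing row by row, which is the kind of test a single EVAL() expression over the file text could perform. The sketch is JavaScript, `containsEntry` is a hypothetical name, and it makes no claim about the timings quoted above, which are chivracq's and refer to iMacros.

```javascript
// Sketch (assumption): check the whole file text in one search instead of
// looping row by row. Assumes the email is in the first column.
function containsEntry(fileText, email) {
  // Normalise Windows line endings, drop quotes, and ignore case.
  const body = fileText.replace(/\r\n/g, "\n").replace(/"/g, "").toLowerCase();
  // Wrap the text in newlines so the first and last rows are delimited too.
  const haystack = "\n" + body + "\n";
  const e = email.trim().toLowerCase();
  // Match the email as a whole first column: either the entire row,
  // or followed by a comma when the row has more columns.
  return haystack.includes("\n" + e + "\n") || haystack.includes("\n" + e + ",");
}
```

The newline delimiters matter: a bare `includes("bob@x.com")` on the whole text would also match inside "jimbob@x.com", so `containsEntry("alice@x.com\njimbob@x.com\n", "bob@x.com")` is `false`, while `containsEntry("alice@x.com\r\nBob@x.com,2017\r\n", "bob@x.com")` is `true`.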