How to scrape text that spans multiple lines?

Post by bfdxi » Wed Jun 19, 2013 12:17 pm

I am trying to scrape data from bestbuy.com with this command:

macro2 += "TAG POS="+ i +" TYPE=H3 ATTR=itemprop:name* EXTRACT=TXT" + jsLF;


And this is the page source:

<h3 id="name_1218849839811" itemprop="name">




<a rel="product" href="/site/Insignia%26%23153%3B+-+32%26%2334%3B+Class+(31-1/2%26%2334%3B+Diag.)+-+LED+-+720p+-+60Hz+-+HDTV/7521045.p?id=1218849839811&skuId=7521045">
Insignia™ - 32" Class (31-1/2" Diag.) - LED - 720p - 60Hz - HDTV</a></h3>


However, when I open the output file and look at this column, I get the following data:

"





Dynex™ - 32"" Class (31-1/2"" Diag.) - LCD - 720p - 60Hz - HDTV"


I want to know how to scrape the text without the quotation marks and the line breaks.

Thank you.
Post by FoxDot » Thu Jun 20, 2013 1:26 pm

Simple:

Code:

iimGetLastExtract().replace(/\s\s/g,"");
The Fox Will Find You!
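A note on the pattern above: /\s\s/g removes whitespace two characters at a time, so a run with an odd number of line breaks leaves one behind. A minimal sketch of a more forgiving variant, assuming the goal is a single clean line per field (collapseWhitespace is a hypothetical helper name, not an iMacros function, and the sample string is modelled on the extract shown in the question):

```javascript
// Hypothetical helper (not part of iMacros): collapse every run of
// whitespace (spaces, tabs, line breaks) into one space, then trim the ends.
function collapseWhitespace(text) {
    return text.replace(/\s+/g, " ").trim();
}

// Sample value modelled on the extract in the question: five line breaks
// followed by the product title.
var raw = "\n\n\n\n\nDynex\u2122 - 32\" Class (31-1/2\" Diag.) - LCD - 720p - 60Hz - HDTV";

// Pairwise removal leaves the fifth line break in place.
var pairwise = raw.replace(/\s\s/g, "");

// Collapsing whole runs removes all leading line breaks and keeps the
// single spaces between words.
var cleaned = collapseWhitespace(raw);
```

In an iMacros script this would be called as collapseWhitespace(iimGetLastExtract()). The surrounding and doubled quotation marks in the saved file look like ordinary CSV quoting of a field that contains quotes and line breaks, not scraped characters, so whitespace cleaning alone will not remove the doubling.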
Post by bfdxi » Fri Jun 21, 2013 3:30 am

Thanks for your help.

I tried to add this function to my script:

Code:

var GetAmazon;
var getRaw;
var jsLF = "\n";

macro2 = "CODE:";
macro2 += "SET !ERRORIGNORE YES" + jsLF;
macro2 += "SET !ERRORCONTINUE YES" + jsLF;
macro2 += "SET !EXTRACT_TEST_POPUP NO" + jsLF;

pagenumber = 1000;
startpage = 1;

while (startpage < pagenumber) {
    for (i = 1; i <= 50; i++) {
        macro2 += "TAG POS=" + i + " TYPE=IMG ATTR=class:thumb* EXTRACT=HREF" + jsLF;
        macro2 += "TAG POS=" + i + " TYPE=H3 ATTR=itemprop:name* EXTRACT=TXT" + jsLF;
        macro2 += "TAG POS=" + i + " TYPE=A ATTR=itemprop:url* EXTRACT=HREF" + jsLF;
        macro2 += "SAVEAS TYPE=EXTRACT FOLDER=* FILE=test.csv" + jsLF;
    }
    macro2 += "TAG POS=1 TYPE=A ATTR=TXT:Next<SP>Page" + jsLF;
    getRaw = iimGetLastExtract().replace(/\s\s/g, "");
    GetAmazon = iimPlay(macro2);
}
macro2 += "CLEAR" + jsLF;
And this is the page I want to fetch data from:

Code:

http://www.bestbuy.com/site/olstemplatemapper.jsp?_dyncharset=ISO-8859-1&_dynSessConf=5242784900518619985&id=pcat17080&type=page&lcn=TV+%26+Home+Theater&sc=TVVideoSP&st=processingtime%3A%3E1900-01-01&usc=abcat0100000&cp=1&sp=-bestsellingsort+skuid&nrp=60&qp=cabcat0100000%23%230%23%23wv~~cabcat0101000%23%23-1%23%23wv~~q466173746c696d69747067735f323236~~nf519||24323030202d20243234392e3939&add_to_pkg=false&pagetype=listing&gf=y
However, when I run the script, the result is the same: the title in COL2, which I fetch from the H3 with itemprop=name, still contains multiple line breaks.
Post by FoxDot » Tue Jun 25, 2013 10:02 am

You're using it in the wrong place: you're saving your data to *.csv before calling the replace function.

You should extract first:

Code:

macro2 += "TAG POS="+ i +" TYPE=IMG ATTR=class:thumb* EXTRACT=HREF" + jsLF;
macro2 += "TAG POS="+ i +" TYPE=H3 ATTR=itemprop:name* EXTRACT=TXT" + jsLF;
macro2 += "TAG POS="+ i +" TYPE=A ATTR=itemprop:url* EXTRACT=HREF" + jsLF;
iimPlay(macro2);
Then save:

Code:

getRaw = iimGetLastExtract().replace(/\s\s/g,"");
iimSet("save", getRaw);
iimPlay("CODE:SET !EXTRACT {{save}}\nSAVEAS TYPE=EXTRACT FOLDER=* FILE=test.csv");
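The order matters here: extract, clean, then save. When several TAG commands run in one macro, iimGetLastExtract() returns all values in one string. The sketch below assumes they are joined by the literal marker [EXTRACT], which is my understanding of the iMacros scripting interface, and cleans each field separately so the column boundaries survive. cleanExtract is a hypothetical helper name, and the example.com URLs are placeholders.

```javascript
// Assumption: iMacros joins multiple extracted values with this marker.
var EXTRACT_DELIMITER = "[EXTRACT]";

// Hypothetical helper: normalise whitespace inside each extracted field
// while keeping the delimiter between fields intact.
function cleanExtract(rawExtract) {
    return rawExtract
        .split(EXTRACT_DELIMITER)
        .map(function (field) {
            // Collapse runs of whitespace to one space and trim the ends.
            return field.replace(/\s+/g, " ").trim();
        })
        .join(EXTRACT_DELIMITER);
}

// Placeholder row: image URL, title with leading line breaks, product URL.
var sample = "http://example.com/a.jpg[EXTRACT]\n\n\nDynex - 32\" Class[EXTRACT]http://example.com/p";
var cleanedRow = cleanExtract(sample);
```

In the save step above this would stand in for the .replace(...) call, for example getRaw = cleanExtract(iimGetLastExtract());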
Post by bfdxi » Mon Jul 01, 2013 9:35 am

Can you help me insert it into my .js for hxxp://www.homedepot.com/Appliances-Refrigerat ... mPrice=80-?

Code:

var GetCode;

var jsLF = "\n";

macro2 = "CODE:";
macro2 += "SET !ERRORIGNORE YES" + jsLF;
macro2 += "SET !ERRORCONTINUE YES" + jsLF;
macro2 += "SET !EXTRACT_TEST_POPUP NO" + jsLF;

pagenumber = 1000;
startpage = 1;

while (startpage < pagenumber) {
    for (i = 1; i <= 24; i++) {
        macro2 += "TAG POS=" + i + " TYPE=IMG ATTR=SRC:http://www.homedepot.com/catalog/productImages/*.jpg EXTRACT=TXT" + jsLF;
        macro2 += "TAG POS=" + i + " TYPE=A ATTR=CLASS:item_description* EXTRACT=HREF" + jsLF;
        macro2 += "TAG POS=" + i + " TYPE=IMG ATTR=SRC:http://www.homedepot.com/catalog/productImages/*.jpg EXTRACT=HREF" + jsLF;
        macro2 += "SAVEAS TYPE=EXTRACT FOLDER=* FILE=refrigeration.csv" + jsLF;
    }
    macro2 += "TAG POS=1 TYPE=IMG ATTR=SRC:http://www.homedepot.com/static/images/layout/triangle-green-right.gif" + jsLF;
    GetCode = iimPlay(macro2);
}
macro2 += "CLEAR" + jsLF;
I tried to insert your code:
getRaw = iimGetLastExtract().replace(/\s\s/g,"");
iimSet("save", getRaw);
iimPlay("CODE:SET !EXTRACT {{save}}\nSAVEAS TYPE=EXTRACT FOLDER=* FILE=test.csv");
below the macro2 += "SAVEAS TYPE=EXTRACT FOLDER=* FILE=refrigeration.csv" + jsLF; line, but when I open the file, nothing has been saved in the first column.

Thanks very much.
Post by FoxDot » Tue Jul 02, 2013 10:22 am

Yes, the first column is empty. What do you expect to get when you extract with this code?

Code:

macro2 += "TAG POS="+ i +" TYPE=IMG ATTR=SRC:http://www.homedepot.com/catalog/productImages/*.jpg EXTRACT=TXT" + jsLF;
An IMG tag doesn't have any text to extract; that's why your first column is empty.
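Since TXT on an IMG yields nothing, the first column needs a different source. A minimal sketch, assuming the text wanted in the first column is the link text of the item_description anchor (that is a guess about the page, not something confirmed in this thread). buildRowMacro is a hypothetical helper name, and the selectors are copied from the script in the question:

```javascript
var jsLF = "\n";

// Hypothetical helper: build the three TAG lines for result row i.
// Assumption: the product name is the text of the item_description link,
// so TXT is taken from the A tag instead of the IMG tag.
function buildRowMacro(i) {
    var link = "TYPE=A ATTR=CLASS:item_description*";
    var img = "TYPE=IMG ATTR=SRC:http://www.homedepot.com/catalog/productImages/*.jpg";
    var macro = "";
    macro += "TAG POS=" + i + " " + link + " EXTRACT=TXT" + jsLF;  // column 1: link text
    macro += "TAG POS=" + i + " " + link + " EXTRACT=HREF" + jsLF; // column 2: product URL
    macro += "TAG POS=" + i + " " + img + " EXTRACT=HREF" + jsLF;  // column 3: image URL
    return macro;
}

var rowMacro = buildRowMacro(1);
```

The returned string can be appended to the "CODE:" macro in place of the three TAG lines inside the for loop.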
Post by bfdxi » Tue Jul 02, 2013 12:14 pm

Never mind, thanks for your suggestion :)