Extract data from 'itemprop' with iMacros

gman
Posts: 2
Joined: Thu Oct 30, 2014 7:22 am

Post by gman » Thu Oct 30, 2014 7:27 am

Can someone please help me with this? I wrote an iMacros script to get data from a website, and it works well.
I am using iMacros for Firefox 8.8.2.
The only thing I cannot get right is the extraction of 'itemprop' values. Below is a piece of the page's HTML:

Code:

<div id="data_profile_middle160">
<h3><span>Information</span></h3>
<div><strong>Location:</strong>
<span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<a href="http://thedjlist.com/world/Netherlands/Amsterdam/" itemprop="addressLocality">Amsterdam</a>,
<a href="http://thedjlist.com/world/netherlands/" itemprop="addressCountry">Netherlands</a>

The iMacros script I am using is:

Code:

VERSION BUILD=6000328
TAB T=1

SET !EXTRACT_TEST_POPUP NO

SET !DATASOURCE sites.csv
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO={{!COL1}}
TAG POS=1 TYPE=DIV ATTR=ID:dj_name&&TXT:* EXTRACT=TXT  
TAG POS=1 TYPE=A ATTR=ITEMPROP:addressLocality&&TXT:* EXTRACT=TXT    
ADD !EXTRACT {{!URLCURRENT}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=+{{!NOW:ddmmyyyy}}.csv
WAIT SECONDS=1

The output I get is this:

Code:

"AFROJACK  Send Message","#EANF#","http://thedjlist.com/djs/AFROJACK/"

The URL is: http://thedjlist.com/djs/AFROJACK/

I'd like to extract the value 'Amsterdam' from the element whose itemprop is 'addressLocality'. The page is full of itemprop attributes, so I really need to know how to get their values.

Thanks in advance for your help.
Snayler
Posts: 12
Joined: Wed Mar 26, 2014 2:12 pm

Re: Extract data from 'itemprop' with iMacros

Post by Snayler » Thu Oct 30, 2014 1:42 pm

I guess the only way to do this is by using javascript and regular expressions on the source code of the page. EVAL seems to do the trick.
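
For readers who do want to try the regular-expression route, here is a minimal sketch in TypeScript. It is not taken from this thread and is not wired into iMacros; it only illustrates the idea of pulling the text that directly follows a tag carrying a given itemprop value. The function name `extractItemprop` and the `sampleHtml` string are made up for illustration (the markup is copied from the first post). A regex like this is fragile: it assumes quoted attribute values and no nested tags before the text.

```typescript
// Hypothetical helper (not part of iMacros): returns the text that directly
// follows each opening tag whose itemprop attribute equals `prop`.
function extractItemprop(html: string, prop: string): string[] {
  // Escape regex metacharacters so `prop` is matched literally.
  const escaped = prop.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match an opening tag containing itemprop="prop", then capture the text
  // up to the next "<". Assumes the attribute value is quoted.
  const pattern = new RegExp(
    `<[a-zA-Z][^>]*\\sitemprop\\s*=\\s*["']${escaped}["'][^>]*>([^<]*)`,
    "g"
  );
  const values: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    values.push(match[1].trim());
  }
  return values;
}

// Sample markup copied from the first post (truncated there as well).
const sampleHtml = `
<div id="data_profile_middle160">
<h3><span>Information</span></h3>
<div><strong>Location:</strong>
<span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<a href="http://thedjlist.com/world/Netherlands/Amsterdam/" itemprop="addressLocality">Amsterdam</a>,
<a href="http://thedjlist.com/world/netherlands/" itemprop="addressCountry">Netherlands</a>
`;

console.log(extractItemprop(sampleHtml, "addressLocality"));
console.log(extractItemprop(sampleHtml, "addressCountry"));
```

On the sample markup, the first call yields a one-element array containing "Amsterdam" and the second a one-element array containing "Netherlands". For anything beyond a quick check, a real HTML parser would be a safer choice than a regex.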
gman

Re: Extract data from 'itemprop' with iMacros

Post by gman » Thu Oct 30, 2014 2:48 pm

Snayler wrote: I guess the only way to do this is by using javascript and regular expressions on the source code of the page. EVAL seems to do the trick.

Snayler, thanks for the info. Alas, I am not a programmer at all, so I don't have the knowledge to use JavaScript.
Snayler

Re: Extract data from 'itemprop' with iMacros

Post by Snayler » Thu Oct 30, 2014 3:38 pm

Now that I've understood your problem and checked the URL, I have to say: forget javascript, there's an easier way. First, you might want to replace

Code:

TAG POS=1 TYPE=DIV ATTR=ID:dj_name&&TXT:* EXTRACT=TXT

with

Code:

TAG POS=1 TYPE=SPAN ATTR=ITEMPROP:name EXTRACT=TXT

to avoid picking up the "Send Message" text.

As for the location, I had trouble locating the element you're looking for, so I found another way to extract it, from the ALT attribute of an image:

Code:

TAG POS=1 TYPE=IMG ATTR=SRC:http://i0.thedjlist.com/*.gif EXTRACT=ALT

With these two changes, your code should work.