Having issue with extracting information by following links

by Loxigan on Wed Nov 09, 2016 11:12 am

iMacros version: 10
OS: Windows 10
The sample scripts work, and I am using Visual Studio to code the scripts.

This is what I am trying to do. It is for personal use, so that I can choose the best dentist for my needs. For each dentist, I want to search Google and extract their healthgrades.com and vitals.com ratings. For healthgrades, the rating is sometimes shown right on the Google results page, and sometimes I have to click the result link and scrape the rating from the healthgrades site itself. For vitals.com, I always have to click through to get the rating.

The issue I'm having is this: I can follow the correct links to vitals.com and healthgrades.com and extract the info, but when I issue the BACK command to return to the Google results page and continue, the extract variable seems to be completely cleared, and only the last extraction ends up in the CSV file. Is iMacros capable of extracting info by following a multi-level link structure? In other words, is it possible to retain the extracted data when you navigate into and out of links in the same macro?

Thanks.

Macro to extract the dentist's name and other important data from the search results page of dentaquest.com:

Code:
VERSION BUILD=10022823
TAB OPEN
TAB T=1

TAG POS={{I}} TYPE=TD ATTR=* EXTRACT=TXT
SET name EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

SET mypos {{I}}
ADD mypos 2

TAG POS={{mypos}} TYPE=TD ATTR=* EXTRACT=TXT
SET denttype EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

ADD mypos 2
TAG POS={{mypos}} TYPE=TD ATTR=* EXTRACT=TXT
SET address EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

ADD mypos 1
TAG POS={{mypos}} TYPE=TD ATTR=* EXTRACT=TXT
SET phone EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

ADD !EXTRACT {{name}}
ADD !EXTRACT {{denttype}}
ADD !EXTRACT {{address}}
ADD !EXTRACT {{phone}}


Macro to extract info from the Google search results and from the pages behind the relevant result links:

Code:
VERSION BUILD=10022823
TAB T=2
SET !TIMEOUT_PAGE 3

URL GOTO = https://www.google.com/#q={{name}}

WAIT SECONDS=1

'set initial anchor
TAG POS=1 TYPE=A ATTR=HREF:*healthgrades* EXTRACT=TXT
SET !EXTRACT NULL

'set search boundary
TAG POS=R1 TYPE=SPAN ATTR=CLASS:st EXTRACT=TXT
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
SET !EXTRACT NULL

'reset anchor
TAG POS=1 TYPE=A ATTR=HREF:*healthgrades* EXTRACT=TXT
SET !EXTRACT NULL

'extract ratings for healthgrades
TAG POS=R1 TYPE=DIV ATTR=CLASS:f<SP>slp EXTRACT=TXT
SET ratings EVAL("var extract = \"{{!EXTRACT}}\"; extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

'follow Vitals link and extract
TAG POS=1 TYPE=A ATTR=TXT:*Vitals*

'extract number of rating stars
TAG POS=1 TYPE=SPAN ATTR=CLASS:score<SP>overview-number EXTRACT=TXT
SET vitalstars EVAL("var extract = \"{{!EXTRACT}}\"; extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

'extract number of votes
TAG POS=1 TYPE=A ATTR=HREF:*reviews* EXTRACT=TXT
SET vitalnum EVAL("var extract = \"{{!EXTRACT}}\"; extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

WAIT SECONDS=1

BACK

ADD !EXTRACT {{name}}
ADD !EXTRACT {{denttype}}
ADD !EXTRACT {{address}}
ADD !EXTRACT {{phone}}
ADD !EXTRACT {{vitalstars}}
ADD !EXTRACT {{vitalnum}}

SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Helix\Documents\iMacros\Macros FILE=test.csv

TAB T=1


C# code:

Code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace dentamacro
{
    class Program
    {
        static void Main(string[] args)
        {
            var i = 5;

            int timeout = 1;
            // Add a reference to the iMacros Scripting Interface COM library in your project to be able to control iMacros
            // If you target .NET 4.0, set "Embed Interop Types" to false, in the reference to the iMacros Interface COM
            iMacros.Status status;
            iMacros.App app = new iMacros.App();

            app.iimOpen("-V7", false, timeout);

            while (i < 87)
            {
                app.iimSet("I", i.ToString());
                app.iimPlay("C:\\Users\\Helix\\Documents\\iMacros\\Macros\\extractname.iim");
               
                var name =  app.iimGetLastExtract(1);
                var denttype = app.iimGetLastExtract(2);
                var address = app.iimGetLastExtract(3);
                var phone = app.iimGetLastExtract(4);

                if (name == "" || name == "#EANF#")
                {
                    break;
                }

                app.iimSet("name", name);
                app.iimSet("denttype", denttype);
                app.iimSet("address", address);
                app.iimSet("phone", phone);

                Console.WriteLine(denttype);

                app.iimPlay("C:\\Users\\Helix\\Documents\\iMacros\\Macros\\searchname.iim");

                i = i + 9;

            }


        }
    }
}
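
One possible way to avoid depending on !EXTRACT surviving the navigation is to collect every value on the C# side and build the CSV row there. The sketch below is only an illustration under assumptions: it assumes the extraction buffer is reset each time a new macro starts playing, and that the second macro is changed to hand back its values through extraction rather than through SAVEAS. The names CsvRow, Clean, Quote and Build are hypothetical helpers, not part of the iMacros Scripting Interface, and the sample values are placeholders for what iimGetLastExtract would return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the iMacros API): builds one CSV line
// from values gathered across several iimPlay calls.
static class CsvRow
{
    // Treat null and the "#EANF#" marker (the value the macros above check
    // for when nothing was extracted) as an empty field.
    public static string Clean(string value)
    {
        if (value == null || value == "#EANF#") return "";
        return value.Trim();
    }

    // Quote a field and double any embedded quotes, so that commas inside
    // a name or an address stay in one column.
    public static string Quote(string value)
    {
        return "\"" + Clean(value).Replace("\"", "\"\"") + "\"";
    }

    // Join all fields into a single CSV line.
    public static string Build(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(f => Quote(f)));
    }
}

class CsvRowDemo
{
    static void Main()
    {
        // Placeholder values standing in for the results of both macros
        // (name, denttype, address, phone, vitalstars, vitalnum).
        var fields = new List<string>
        {
            "Jane Doe, DDS", "General Dentist", "1 Main St", "555-0100", "4.5", "#EANF#"
        };

        Console.WriteLine(CsvRow.Build(fields));
        // prints "Jane Doe, DDS","General Dentist","1 Main St","555-0100","4.5",""
    }
}
```

In the loop above, the row could be built after the second iimPlay call and appended to the file with File.AppendAllText, so that nothing has to be carried inside the macro across the BACK command.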