Having an issue extracting information by following links

Post by Loxigan » Wed Nov 09, 2016 6:12 pm

iMacros version: 10
OS: Windows 10
The sample scripts work, and I am using Visual Studio to write the scripts.

This is for personal use, so that I can choose the best dentist for my needs. For each dentist, I want to search Google and extract their Healthgrades and Vitals.com ratings. For Healthgrades, the rating is sometimes shown right on the Google search results page; other times I have to click the result link and scrape the rating from the Healthgrades site itself. For Vitals.com, I always have to click the link and get the ratings from vitals.com.

The issue I'm having is this: I can follow the correct links to vitals.com and healthgrades.com and extract the info, but for some reason, when I issue the BACK command to navigate back to the previous Google results page and continue, the extract variable seems to be completely cleared, and only the last extraction ends up in the CSV file. Is iMacros capable of extracting info by following a multi-level link structure? In other words, is it possible to retain the extracted data while navigating into and out of links in the same macro?

Thanks.

Macro to extract the dentist's name and other important data from the search results page of dentaquest.com:

VERSION BUILD=10022823
TAB OPEN 
TAB T=1

TAG POS={{I}} TYPE=TD ATTR=* EXTRACT=TXT
SET name EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

SET mypos {{I}}
ADD mypos 2

TAG POS={{mypos}} TYPE=TD ATTR=* EXTRACT=TXT
SET denttype EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

ADD mypos 2
TAG POS={{mypos}} TYPE=TD ATTR=* EXTRACT=TXT
SET address EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

ADD mypos 1
TAG POS={{mypos}} TYPE=TD ATTR=* EXTRACT=TXT
SET phone EVAL("var extract = \"{{!EXTRACT}}\"; if (extract == \"#EANF#\") MacroError(\"No more listings on this page\"); else extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

ADD !EXTRACT {{name}}
ADD !EXTRACT {{denttype}}
ADD !EXTRACT {{address}}
ADD !EXTRACT {{phone}}
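
The position arithmetic in this macro, together with the C# host loop (which starts `I` at 5, adds 9 per listing, and stops before 87), can be restated as a small C# 9 top-level program to make the `TD` cell numbers explicit. It is only an illustration and assumes every listing spans exactly 9 `TD` cells in the same order, which is what the fixed step of 9 implies; `FieldPositions` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: TD positions read by the first macro for one listing.
// name at I, type at I+2 (ADD mypos 2), address at I+4 (+2 more), phone at I+5 (+1 more).
static (int Name, int Type, int Address, int Phone) FieldPositions(int i) =>
    (i, i + 2, i + 4, i + 5);

// Same loop bounds as the C# host: start at 5, step by 9, stop before 87.
int listings = 0;
for (int i = 5; i < 87; i += 9)
{
    var p = FieldPositions(i);
    Console.WriteLine($"I={i}: name={p.Name} type={p.Type} address={p.Address} phone={p.Phone}");
    listings++;
}
Console.WriteLine($"{listings} listings per page"); // prints "10 listings per page"
```

If a listing on the page has a different number of cells, every later position shifts, so the extracted fields would no longer line up with the intended columns.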


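The `EVAL` expression repeated in each `SET` line of the macro above packs three steps into one line. The following rough C# 9 equivalent spells them out for readability only; `CleanExtract` is a hypothetical name, an exception stands in for `MacroError`, and it assumes .NET's `Regex` treats the pattern `^\s*|,\s*$` the same way JavaScript does for single-line values like these.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical C# restatement of the EVAL cleanup used in the SET lines.
static string CleanExtract(string extract)
{
    // "#EANF#" means the TAG found no element; the macro calls MacroError here,
    // which this sketch models as an exception.
    if (extract == "#EANF#")
        throw new InvalidOperationException("No more listings on this page");

    // JavaScript's replace with a string pattern removes only the first match and
    // String.Replace removes all; for a single marker the result is the same.
    string withoutMarker = extract.Replace("#EANF#", "");

    // Same pattern as /^\s*|,\s*$/g: strip leading whitespace and a trailing
    // comma together with any whitespace after it.
    return Regex.Replace(withoutMarker, @"^\s*|,\s*$", "");
}

string cleaned = CleanExtract("   Jane Doe, DDS,  ");
Console.WriteLine("[" + cleaned + "]"); // prints [Jane Doe, DDS]
```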
Macro to extract info from the Google search results and by following the relevant result links:

VERSION BUILD=10022823
TAB T=2
SET !TIMEOUT_PAGE 3

URL GOTO=https://www.google.com/#q={{name}}

WAIT SECONDS=1

'set initial anchor
TAG POS=1 TYPE=A ATTR=HREF:*healthgrades* EXTRACT=TXT
SET !EXTRACT NULL

'set search boundary
TAG POS=R1 TYPE=SPAN ATTR=CLASS:st EXTRACT=TXT
SET !ENDOFPAGE {{!TAGSOURCEINDEX}}
SET !EXTRACT NULL

'reset anchor
TAG POS=1 TYPE=A ATTR=HREF:*healthgrades* EXTRACT=TXT
SET !EXTRACT NULL

'extract ratings for healthgrades
TAG POS=R1 TYPE=DIV ATTR=CLASS:f<SP>slp EXTRACT=TXT
SET ratings EVAL("var extract = \"{{!EXTRACT}}\"; extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

'follow the Vitals link and extract its ratings
TAG POS=1 TYPE=A ATTR=TXT:*Vitals*

'extract number of rating stars
TAG POS=1 TYPE=SPAN ATTR=CLASS:score<SP>overview-number EXTRACT=TXT
SET vitalstars EVAL("var extract = \"{{!EXTRACT}}\"; extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

'extract number of votes
TAG POS=1 TYPE=A ATTR=HREF:*reviews* EXTRACT=TXT
SET vitalnum EVAL("var extract = \"{{!EXTRACT}}\"; extract.replace(\"#EANF#\", \"\").replace(/^\\s*|,\\s*$/g, \"\");")
SET !EXTRACT NULL

WAIT SECONDS=1

BACK

ADD !EXTRACT {{name}}
ADD !EXTRACT {{denttype}}
ADD !EXTRACT {{address}}
ADD !EXTRACT {{phone}}
ADD !EXTRACT {{ratings}}
ADD !EXTRACT {{vitalstars}}
ADD !EXTRACT {{vitalnum}}

SAVEAS TYPE=EXTRACT FOLDER=C:\Users\Helix\Documents\iMacros\Macros FILE=test.csv

TAB T=1
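
The second macro drops `{{name}}` straight into the Google URL. If a name contains spaces, commas, or `&`, one option, sketched here as an assumption rather than a required step, is to percent-encode it in the C# host with the standard `Uri.EscapeDataString` and pass it in a separate, hypothetical macro variable (called `query` here), so that `{{name}}` stays readable in the CSV. `SearchUrl` is likewise a hypothetical helper in a C# 9 top-level program.

```csharp
using System;

// Hypothetical helper: build the search URL with the name percent-encoded.
// Uri.EscapeDataString percent-encodes characters such as spaces and commas.
static string SearchUrl(string name) =>
    "https://www.google.com/#q=" + Uri.EscapeDataString(name);

string url = SearchUrl("Jane Doe, DDS");
Console.WriteLine(url);

// In the host loop this might look like (hypothetical "query" variable):
//   app.iimSet("query", Uri.EscapeDataString(name));
// with the macro line changed to:
//   URL GOTO=https://www.google.com/#q={{query}}
```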

C# code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace dentamacro
{
    class Program
    {
        static void Main(string[] args)
        {
            var i = 5;

            int timeout = 1;
            // Add a reference to the iMacros Scripting Interface COM component in your project so you can control iMacros
            // If you target .NET 4.0, set "Embed Interop Types" to false in the reference to the iMacros Interface COM
            iMacros.Status status;
            iMacros.App app = new iMacros.App();

            app.iimOpen("-V7", false, timeout);

            while (i < 87)
            {
                app.iimSet("I", i.ToString());
                app.iimPlay("C:\\Users\\Helix\\Documents\\iMacros\\Macros\\extractname.iim");
                
                var name =  app.iimGetLastExtract(1);
                var denttype = app.iimGetLastExtract(2);
                var address = app.iimGetLastExtract(3);
                var phone = app.iimGetLastExtract(4);

                if (name == "" || name == "#EANF#")
                {
                    break;
                }

                app.iimSet("name", name);
                app.iimSet("denttype", denttype);
                app.iimSet("address", address);
                app.iimSet("phone", phone);

                Console.WriteLine(denttype);

                app.iimPlay("C:\\Users\\Helix\\Documents\\iMacros\\Macros\\searchname.iim");

                i = i + 9;

            }

            // Close the iMacros instance once all listings are processed
            app.iimClose();
        }
    }
}
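
On the question of retaining data across `BACK`: since the C# host already reads the first macro's fields with `iimGetLastExtract`, another approach is to keep each dentist's values in C# and write the CSV line from there, rather than depending on `!EXTRACT` after navigation. This assumes the second macro's values can likewise be read back with `iimGetLastExtract`. Below is a minimal C# 9 sketch of the row-writing part only, assuming the values are already in strings; `CsvField` and `CsvRow` are hypothetical helpers, the quoting follows the RFC 4180 convention (fields containing a comma, double quote, or line break are wrapped in quotes, with inner quotes doubled), and the example values are made up.

```csharp
using System;
using System.Linq;

// Hypothetical helper: quote one field if it contains a comma, a double quote,
// or a line break; inner double quotes are doubled.
static string CsvField(string value)
{
    bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
}

// Hypothetical helper: join all fields for one dentist into a single CSV line.
static string CsvRow(params string[] fields) =>
    string.Join(",", fields.Select(CsvField));

// Made-up values standing in for name, type, address, phone, Vitals stars, vote count.
string row = CsvRow("Jane Doe, DDS", "General Dentist", "123 Main St", "555-0100", "4.5", "12");
Console.WriteLine(row);

// In the real loop, one line per dentist could then be appended, e.g.:
// System.IO.File.AppendAllText("test.csv", row + Environment.NewLine);
```

Building the row in the host also means a missing value (an empty string) still occupies its column, so later columns stay aligned.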