Extracting Text and Images from a Website

by Ratfink on Sun Mar 27, 2016 5:47 pm

Hi All,
I am working on getting data from a website, http://www.midfifty.com.
Specifically, I need to get all the information for all the products: text and images, including pricing.
I have listed all the pertinent info below, along with the code so far. It works fine; however, I can't figure out the following:

    How to loop to the next product/item.
    How to loop to the next page to get the next page of items, and so on.
    The best way to get the product images, and how to attach them, refer to them, or save them alongside the extracted text in the CSV file.

I need to be able to get all the product data, with images, on all the pages. I can figure out how to get the info for one product, but not all of them. I can also get one image, but not all of them. How would I capture the images and link them to the part number text captured in the CSV file?

Thank you in advance, I'm struggling a bit.


iMacros Version is 10.0.2.2823
Operating System is Windows Server 2012R2, English
Browser is IE 11, with latest updates.
iMacros Demos OK
iMacros Scripting Interface is OK
URL: http://www.midfifty.com
iMacro Code is listed below
VERSION BUILD=10022823
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://midfifty.com/store.php
TAG POS=1 TYPE=A ATTR=TXT:57-60<SP>Parts
TAG POS=1 TYPE=DIV ATTR=TXT:Bed<SP>Insulators,<SP>Hold<SP>Down<SP>Pads
TAG POS=1 TYPE=DIV ATTR=CLASS:large EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:price<SP>bld<SP>lrgr EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:bld<SP>cntr EXTRACT=TXT
TAG POS=25 TYPE=TD ATTR=* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:cntr EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:price<SP>cntr<SP>lrgr<SP>bld EXTRACT=TXT
TAG POS=2 TYPE=DIV ATTR=CLASS:cntr EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Extract_{{!NOW:ddmmyy_hhnnss}}.csv
Ratfink
 
Posts: 5
Joined: Mon Jan 18, 2016 11:52 pm

Re: Extracting Text and Images from a Website

by IrishMacro on Thu Apr 07, 2016 1:44 am

To save an image, try SAVEPICTUREAS: http://wiki.imacros.net/SAVEPICTUREAS

Before you save the picture, extract its URL to a CSV file that you can open in Excel. Clicking the image while recording creates the TAG line for you:

TAG POS=1 TYPE=IMG ATTR=SRC:http://m.midfifty.com/partpics/147-*jpg EXTRACT=HREF
SAVEAS TYPE=EXTRACT FOLDER=* FILE=resultsfile.csv
TAG POS=1 TYPE=IMG ATTR=SRC:http://m.midfifty.com/partpics/147-*jpg CONTENT=EVENT:SAVEPICTUREAS
Look for the saved image in the Downloads folder under your iMacros folder.
Firefox free plugin, last version
Win7
IrishMacro
 
Posts: 135
Joined: Wed Nov 03, 2010 5:27 am

Re: Extracting Text and Images from a Website

by NoraChoi on Tue May 31, 2016 7:38 pm

Hi Ratfink,

I've opened the site, and if you just need the hrefs of all the images, this tool may help you: http://www.octoparse.com/?fo

You can use it to extract all the information for all the products and the related text, including pricing.

I recommend checking out the tutorials (http://www.octoparse.com/Tutorial), as they might spark some ideas for you.
NoraChoi
 
Posts: 6
Joined: Wed May 25, 2016 11:14 pm

Re: Extracting Text and Images from a Website

by janib4all on Tue Jun 07, 2016 1:28 pm

Check out the JS tutorials. Octoparse and other visual grabbing tools are out there, but sometimes you need more control over the scraping. In that case, use iMacros or Python instead.
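For the Python route, here is a minimal sketch using only the standard library. It takes the HTML of one page, collects the product image URLs, and writes them to CSV next to the image file name, so each row can be matched to a saved picture. The `/partpics/` marker is taken from the IMG SRC pattern quoted earlier in this thread. The class and function names (`ImageCollector`, `image_rows`, `write_csv`) and the sample markup are made up for illustration, and the real page structure has not been verified. Fetching the pages and following the "next page" links are left out.

```python
import csv
import io
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img> whose URL contains a marker."""

    def __init__(self, base_url, marker="/partpics/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # assumption: product images live under this path
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative src values against the page URL.
            absolute = urljoin(self.base_url, src)
            if self.marker in absolute:
                self.urls.append(absolute)


def image_rows(html, base_url):
    """Return (image_url, file_name) pairs for one page of HTML."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return [
        (url, posixpath.basename(urlparse(url).path))
        for url in collector.urls
    ]


def write_csv(rows, out):
    """Write the pairs as CSV, one row per image, with a header line."""
    writer = csv.writer(out)
    writer.writerow(["image_url", "file_name"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample markup, not copied from the real site.
    sample = (
        '<div class="large">Example part</div>'
        '<img src="/partpics/147-001.jpg">'
        '<img src="/images/logo.gif">'
    )
    buffer = io.StringIO()
    write_csv(image_rows(sample, "http://m.midfifty.com/"), buffer)
    print(buffer.getvalue())
```

If the image file names turn out to start with the part number, the `file_name` column can serve as the link between the extracted text and the saved pictures. That is a guess based on the `147-*jpg` pattern above and needs checking against the site.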
janib4all
 
Posts: 132
Joined: Tue Jul 20, 2010 11:44 pm
Location: Karachi, Sindh, Pakistan

