Save extracted DIV to new page/file

mattiewae2
Posts: 1
Joined: Sun Dec 30, 2012 2:32 pm

Save extracted DIV to new page/file

Post by mattiewae2 » Sun Dec 30, 2012 2:43 pm

Hi!

http://www.vitisvitae.be/

From this page I would like to extract the div element content.

Code:

VERSION BUILD=7601105 RECORDER=FX
VERSION BUILD=6600525     
TAB T=1        
SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=DIV ATTR=ID:content EXTRACT=HTM
SAVEAS TYPE=EXTRACT FOLDER=* FILE=+{{!URLCURRENT}}.htm
This version only gives me text in a CSV file, but I need the HTML of the extracted div.

Code:

VERSION BUILD=7601105 RECORDER=FX
TAB T=1
TAG POS=1 TYPE=DIV ATTR=ID:content
SAVEAS TYPE=HTM FOLDER=* FILE=+{{!URLCURRENT}}
In this version I can select the div element I want to extract, but it saves the whole page.
I only want to save the div element.

It would be really nice if someone could point me in the right direction.

Thanks!
Lantus
Posts: 3
Joined: Tue Jan 19, 2016 5:40 pm

Re: Save extracted DIV to new page/file

Post by Lantus » Mon Dec 17, 2018 11:47 pm

There is a way to do it with a JavaScript macro. Something like:

Code:

function writeFile(path,string,exact){//<versao>1.1</versao>
    //http://stackoverflow.com/questions/14677247/imacro-setting-variable-saveas-csv
    //import FileUtils.jsm
    Components.utils.import("resource://gre/modules/FileUtils.jsm");
    //declare file
    var file = new FileUtils.File(path);

    //declare file path
    file.initWithPath(path);

    //if it exists move on if not create it
    if (!file.exists()){
    	file.create(file.NORMAL_FILE_TYPE, 0666);
    }

    var charset = 'UTF-8';
    var fileStream = Components.classes['@mozilla.org/network/file-output-stream;1']
    .createInstance(Components.interfaces.nsIFileOutputStream);
    fileStream.init(file, 18, 0x200, false);
    var converterStream = Components
    .classes['@mozilla.org/intl/converter-output-stream;1']
    .createInstance(Components.interfaces.nsIConverterOutputStream);
    converterStream.init(fileStream, charset, string.length,
    Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);

    //write file to location
    if(!exact) string = "\r\n"+string;
    converterStream.writeString(string); 
    converterStream.close();
    fileStream.close();
 } 

var string = window.content.document.getElementById("content").innerHTML; //or outerHTML 
var path = 'C:\\content.html'; //backslashes must be escaped inside a JavaScript string
var exact = true;
writeFile(path,string,exact);

The result won't be a complete HTML document, because the opening and closing html, head and body tags are missing. But you can add them to the string before calling writeFile:

Code:

string = '<html><head></head><body>'+string+'</body></html>';
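
If you want something a bit more complete than the one-liner above, the wrapping step could be pulled into a small helper. This is only a sketch: wrapFragment and its title parameter are names made up for illustration (they are not part of iMacros), and the UTF-8 meta tag is an assumption chosen to match the charset that writeFile uses.

```javascript
// Hypothetical helper, not part of iMacros: wraps an HTML fragment in a
// minimal document skeleton so the saved file opens as a standalone page.
function wrapFragment(fragment, title) {
    return '<!DOCTYPE html>\n' +
        '<html><head><meta charset="UTF-8"><title>' + title + '</title></head>' +
        '<body>' + fragment + '</body></html>';
}

// Example with a literal fragment; in the macro the fragment would be the
// innerHTML or outerHTML string read from the page.
var page = wrapFragment('<div id="content"><p>Hello</p></div>', 'Extracted content');
```

In the macro you would then pass wrapFragment(string, 'some title') to writeFile instead of the bare string.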

The question is very old and probably solved, but I'm leaving this here for people with the same issue.