Looping and Extracting

Discussions and Tech Support specific to the iMacros Firefox add-on.
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Looping and Extracting

Post by Hello 71 » Tue Mar 04, 2008 1:22 am

I imagine something with JavaScript to loop through the page: something that extracts everything enclosed in <td class="nowrap"> ... </td>, where the cell content is either

*4 alphabetic characters, a hyphen, 6 numeric characters, another hyphen, then 4 more alphabetic characters*, OR
*4 characters*, only if #1 is totally unsupported,

and then writes the results to a text file for direct copying.
So, in steps: extract, then paste to a text file, then, if possible, call a JavaScript function embedded in the page.
Last edited by Hello 71 on Wed Mar 05, 2008 1:43 am, edited 1 time in total.
Tech Support
Posts: 4947
Joined: Tue Sep 20, 2005 7:25 pm

Post by Tech Support » Tue Mar 04, 2008 10:43 pm

Can you please post the URL of the web page? This would allow us to quickly re-create the issue on our test systems.
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Login URL

Post by Hello 71 » Wed Mar 05, 2008 1:37 am

"HTTP error 200: The page you have reached requires authentication."
However, here's a macro to access a temporary account: http://tinyurl.com/28xewv
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Goes in Data Extraction and Web Screen Scraping?

Post by Hello 71 » Fri Mar 07, 2008 2:52 pm

Does anyone think this should go in "Data Extraction and Web Screen Scraping"? (Because this post is about "Data Extraction")
joe_brown
Posts: 33
Joined: Mon Aug 06, 2007 8:49 pm

Post by joe_brown » Sat Mar 08, 2008 6:44 pm

The URL you've provided (http://tinyurl.com/28xewv) is useless, since it requires your master-password to be entered to be able to decrypt the user-password needed to login to that site.
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Sorry.

Post by Hello 71 » Sat Mar 08, 2008 6:49 pm

Sorry. Better one: http://tinyurl.com/32u3tk
joe_brown
Posts: 33
Joined: Mon Aug 06, 2007 8:49 pm

Post by joe_brown » Sat Mar 08, 2008 7:05 pm

Where on that site do you have the table you want to extract the data from?

Your second iMacro just does the login.
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Whoops (Again)

Post by Hello 71 » Sat Mar 08, 2008 7:15 pm

I forgot that this new account (that I just made) doesn't have any.

Relevant Code:

</tr><tr>
		<td class="nowrap">AAAA-123456-BBBB</td><td>No</td><td>0</td><td class="nowrap">29-01-2008</td>
	</tr><tr>
		<td class="nowrap">BBST-765075-HHXN</td><td>Yes</td><td>5</td><td class="nowrap">22-02-2008</td>
	</tr><tr>
		<td class="nowrap">BFXJ-493794-NVHM</td><td>No</td><td>0</td><td class="nowrap">08-03-2008</td>
	</tr><tr>
		<td class="nowrap">BFXJ-493794-NVHW</td><td>Yes</td><td>5</td><td class="nowrap">08-03-2008</td>
	</tr><tr>
		<td class="nowrap">BLUL-472445-BWRW</td><td>No</td><td>0</td><td class="nowrap">27-09-2007</td>
	</tr><tr>
		<td class="nowrap">BLVK-782412-HMCT</td><td>Yes</td><td>5</td><td class="nowrap">03-10-2007</td>
	</tr><tr>
		<td class="nowrap">BLVL-472445-BWRW</td><td>Yes</td><td>5</td><td class="nowrap">28-09-2007</td>
	</tr><tr>
		<td class="nowrap">BPVR-026530-BCNZ</td><td>Yes</td><td>5</td><td class="nowrap">14-02-2008</td>
	</tr>
So, extract these (example) values:

AAAA-123456-BBBB,BBST-765075-HHXN,BFXJ-493794-NVHM,BFXJ-493794-NVHW,BLUL-472445-BWRW...
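
A minimal sketch of the extraction described above, assuming the page source is already available as a string. The names PIN_CELL, extractPins, sample and pinList are made up for illustration and are not part of iMacros. It matches only cells whose content has the 4 letters, 6 digits, 4 letters shape, so the date cells that share the same class are skipped.

```javascript
// Pattern from the thread: 4 letters, hyphen, 6 digits, hyphen, 4 letters,
// wrapped in <td class="nowrap">...</td>. (Hypothetical helper, not an iMacros API.)
var PIN_CELL = /<td class="nowrap">\s*([A-Za-z]{4}-\d{6}-[A-Za-z]{4})\s*<\/td>/g;

// Returns every matching cell value found in the given HTML string.
function extractPins(html) {
  var pins = [];
  var match;
  PIN_CELL.lastIndex = 0; // the regex is global, so reset it before each scan
  while ((match = PIN_CELL.exec(html)) !== null) {
    pins.push(match[1]);
  }
  return pins;
}

// Two rows copied from the sample markup above.
var sample =
  '<tr>\n' +
  '<td class="nowrap">AAAA-123456-BBBB</td><td>No</td><td>0</td><td class="nowrap">29-01-2008</td>\n' +
  '</tr><tr>\n' +
  '<td class="nowrap">BBST-765075-HHXN</td><td>Yes</td><td>5</td><td class="nowrap">22-02-2008</td>\n' +
  '</tr>';

// Comma-separated list in the shape requested above.
var pinList = extractPins(sample).join(',');
```

A regular expression is enough here only because the markup is this regular; it is not a general HTML parser.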
joe_brown
Posts: 33
Joined: Mon Aug 06, 2007 8:49 pm

Post by joe_brown » Sat Mar 08, 2008 10:04 pm

Here is a generic piece of code to extract just one column from a table into a file (extract_table_col.js)

Test-case:
----------
$imacros_dir\extract_table_col.js

// --- user setup ---
var myurl='file:///D:/Hello71.htm',  // The url for the page with the table to extract (local files take three slashes: file:///)
    out_file='D:/extracted_table.csv',  // Name of the Output-file for the extraction
    column=1,  // which column from the table should be extracted
    table_ident_tag='TAG POS=1 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT',  // table-identifier (in case of multiple tables on the page allows to choose the right one)
    out_delimiter=',';  // output delimiter. For all-in-1-line use ',' For each-on-its-line use "\n", etc ..

// --- system setup ---
var nl="\n",  // NewLine
    RE001=/^["](.*)["]$/,  // regexp to remove the first and last '"'
    extract,rows,rows_count,entry,colext=column-1;  // Local declarations to avoid conflicts with iMacros JS-code

// --- functions ---

// filewrite
function fwrite(
 fname,  // [string] filename to write to
 str,  // [string] string to write
 append  // [bool] true=append, false=overwrite
){
  var fos,e;
  try{
    fos=new java.io.FileOutputStream(fname,append);
    fos.write(java.lang.String(''+str).getBytes());
    fos.close();
    fos=null;
  }catch(e){
    alert(''+e);
    return 1;
  }
  return 0;
};

// --- main ---

iimDisplay('Start.');
iimPlay('CODE:'+
        'URL GOTO='+myurl+nl+
        table_ident_tag
       );
extract=iimGetLastExtract(1);
rows=extract.split("\n");extract=null;
rows_count=rows.length;
fwrite(out_file,'',false);  // truncating the output-file
for(var i=0;i<rows_count;i++){
  entry=RE001.exec(rows[i].split(',')[colext]);
  if(entry && entry.length>=2) fwrite(out_file,entry[1]+out_delimiter,true);
}
iimDisplay('Done.'+nl+'See output-file'+nl+out_file);

D:\Hello71.htm

<html>
<table>
<tr>
      <td class="nowrap">AAAA-123456-BBBB</td><td>No</td><td>0</td><td class="nowrap">29-01-2008</td>
   </tr><tr>
      <td class="nowrap">BBST-765075-HHXN</td><td>Yes</td><td>5</td><td class="nowrap">22-02-2008</td>
   </tr><tr>
      <td class="nowrap">BFXJ-493794-NVHM</td><td>No</td><td>0</td><td class="nowrap">08-03-2008</td>
   </tr><tr>
      <td class="nowrap">BFXJ-493794-NVHW</td><td>Yes</td><td>5</td><td class="nowrap">08-03-2008</td>
   </tr><tr>
      <td class="nowrap">BLUL-472445-BWRW</td><td>No</td><td>0</td><td class="nowrap">27-09-2007</td>
   </tr><tr>
      <td class="nowrap">BLVK-782412-HMCT</td><td>Yes</td><td>5</td><td class="nowrap">03-10-2007</td>
   </tr><tr>
      <td class="nowrap">BLVL-472445-BWRW</td><td>Yes</td><td>5</td><td class="nowrap">28-09-2007</td>
   </tr><tr>
      <td class="nowrap">BPVR-026530-BCNZ</td><td>Yes</td><td>5</td><td class="nowrap">14-02-2008</td>
   </tr>
</table>
</html>
[Refresh Macro List], Run the extract_table_col.js iMacro
----------
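
For readers without iMacros at hand, here is a rough standalone sketch of the column-picking step from the script above. It assumes the table extract arrives as text with one row per line and every cell wrapped in double quotes, which is what the script's regexp implies; the names extractColumn, sampleExtract and firstColumn are invented for this example.

```javascript
// Picks one column (1-based, like the `column` setting above) out of a
// quoted, comma-separated extract. Assumed input shape: "a","b"\n"c","d"
function extractColumn(extractText, column) {
  var quoted = /^"(.*)"$/; // strips the surrounding double quotes
  var rows = extractText.split('\n');
  var values = [];
  for (var i = 0; i < rows.length; i++) {
    var cell = rows[i].split(',')[column - 1];
    var m = (cell === undefined) ? null : quoted.exec(cell);
    if (m) values.push(m[1]); // rows without that column are skipped
  }
  return values;
}

// Hand-written stand-in for what the extract of the test table might look like.
var sampleExtract =
  '"AAAA-123456-BBBB","No","0","29-01-2008"\n' +
  '"BBST-765075-HHXN","Yes","5","22-02-2008"';

var firstColumn = extractColumn(sampleExtract, 1);
```

Like the original, this splits on every comma, so a cell that itself contains a comma would be cut in two.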

Cheers,
Joe
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Two issues

Post by Hello 71 » Sat Mar 08, 2008 10:38 pm

Not sure if this has anything to do with your code, but why is the delimiter ignored when I use \n? Also, could you add 2 regular expressions so that "PIN Code" and "[a number]" aren't included in the text? And would it be possible to call a JavaScript function embedded in the web page, __doPostBack, with the target 'ctl00$content$dgPINS$ctl12$ctl[NN]' (where [NN] is a number that starts at 01, goes up by 1s to 10, then goes right back to 01 and up by 1s to 10 again), then extract the PINs on that page, then do it all over again? So, in steps:
  1. Original code
  2. 2 regexps to remove "PIN Code" and "[a number]" from the text file
  3. Call __doPostBack with the target 'ctl00$content$dgPINS$ctl12$ctl[NN]', [NN] cycling from 01 to 10 as described above
  4. Increment a set variable which is appended to the embedded JavaScript function's target
  5. Start all over again, until #3 goes back up to 10
  6. Hopefully, play a ding sound
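
Steps 2 to 4 could be sketched roughly as follows. This is only an illustration: buildPostBackTargets and keepPinEntries are hypothetical helper names, the sample entries are invented, and the '\r\n' join is a guess at the "\n is ignored" symptom (older Windows editors such as Notepad do not show a bare "\n" as a line break), which the thread does not confirm.

```javascript
// Step 3/4: build the zero-padded targets ...ctl01 to ...ctl10 for __doPostBack.
function buildPostBackTargets(prefix, count) {
  var targets = [];
  for (var n = 1; n <= count; n++) {
    targets.push(prefix + (n < 10 ? '0' + n : '' + n));
  }
  return targets;
}

// Step 2: drop the "PIN Code" header and entries that are only a number.
function keepPinEntries(entries) {
  var header = /^PIN Code$/i;
  var numberOnly = /^\d+$/;
  var kept = [];
  for (var i = 0; i < entries.length; i++) {
    if (!header.test(entries[i]) && !numberOnly.test(entries[i])) {
      kept.push(entries[i]);
    }
  }
  return kept;
}

var targets = buildPostBackTargets('ctl00$content$dgPINS$ctl12$ctl', 10);

// Invented sample of what one extracted column might contain.
var cleaned = keepPinEntries(['PIN Code', 'AAAA-123456-BBBB', '2', 'BBST-765075-HHXN']);

// Windows-style line endings, in case the viewer ignores a bare "\n".
var lines = cleaned.join('\r\n');
```

In an iMacros script, each target might then be handed to the page through a `URL GOTO=javascript:...` line inside the loop. Whether __doPostBack needs further arguments depends on the page, which the thread does not show.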
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Move up

Post by Hello 71 » Sun Mar 09, 2008 8:54 pm

Moving up...
joe_brown
Posts: 33
Joined: Mon Aug 06, 2007 8:49 pm

Post by joe_brown » Sun Mar 09, 2008 9:15 pm

You don't have a specific problem with iMacros - you are just trying to get someone else to write the code for you.

People who are able to write code mostly do it for money (unless they are doing it for themselves, of course). So if you can't write that code yourself, I'd suggest you find someone who is able and willing to do it for the amount of money you are willing to invest in the project.

I've given you my 3 hours (the first working attempt took me 15 minutes, but it wasn't elegant or generic; making it so took the remaining time), and that's all I'm ready to give you.

There is nothing in your request that would be impossible to code, though I admit it may be tricky (not impossible) to call the foreign JS function.

Good luck.

Cheers,
Joe
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Ah...

Post by Hello 71 » Sun Mar 09, 2008 11:46 pm

Ah... I'll probably get someone pretty easily. (I hope.) And thanks. Hopefully, the person I get will get the point from your code.
killer_mpg
Posts: 9
Joined: Wed Mar 12, 2008 12:56 am

Re: Looping and Extracting

Post by killer_mpg » Sat Mar 15, 2008 6:58 am

I was curious about the call to

fos=new java.io.FileOutputStream(fname,append);

and I tried to run this, but it could not find 'java'.

I thought that Java and JavaScript were different and that you could not call Java stuff from JavaScript.
Can anyone clear this up?
Hello 71
Posts: 96
Joined: Mon Feb 18, 2008 12:06 am
Location: Toronto, ON, Canada

Re: Looping and Extracting

Post by Hello 71 » Sat Mar 15, 2008 3:00 pm

Sort of. ECMAScript itself doesn't define any Java classes, so in general you cannot. I think this works here because Mozilla-based browsers like Firefox include LiveConnect, a bridge that exposes Java packages to JavaScript (through globals such as java) when a Java plug-in is installed. So, in one phrase, yes and no. :|
Yours probably doesn't work because Java isn't installed or enabled in your browser, or because your browser is more [secure?] than IE, or even Firefox. Firefox just throws an exception saying "Exception.Security.SecurityException". (Exception.Security.SecurityException? Shouldn't it be Exception.SecurityException?)
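
A small sketch of how a script could check for the Java bridge before touching java.io, instead of failing with "java is not defined". The helper name hasJavaBridge is made up for this example, and the check only looks for a `java` global; it says nothing about whether security settings will then allow file access.

```javascript
// Returns true when the given scope object exposes a `java` global,
// which is how the Java bridge shows up to scripts. (Hypothetical helper.)
function hasJavaBridge(scope) {
  return scope !== null &&
         scope !== undefined &&
         typeof scope.java !== 'undefined' &&
         scope.java !== null;
}

// Grab the global object without assuming a browser `window`.
var globalScope = (function () { return this; })() || {};

// Decide up front which path the script should take.
var statusMessage = hasJavaBridge(globalScope)
  ? 'Java bridge found: java.io calls may work (security settings permitting).'
  : 'No Java bridge: java.io.FileOutputStream is not available here.';
```

With a check like this, the fwrite helper from earlier in the thread could report a clear message rather than throwing when 'java' is missing.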