Is a text present somewhere (anywhere) on the page

tata668
Posts: 42
Joined: Sun Jun 14, 2009 2:34 am

Post by tata668 » Wed Nov 04, 2009 4:02 am

I just want to know whether a particular text is present somewhere on the page, without knowing which HTML tags wrap it.

I currently use:

Code:

TAG POS=1 TYPE=* ATTR=TXT:*Hello<SP>World* EXTRACT=TXT
But this code doesn't always work. Sometimes it returns "#EANF#" even though the text IS present somewhere on the page. When I use a more specific "TYPE" (e.g. TYPE=DIV), it works.

What is the correct way to check whether a text is present on the page without knowing exactly where?

Thanks in advance.
Hannes, Tech Support

Re: Is a text present somewhere (anywhere) on the page

Post by Hannes, Tech Support » Wed Nov 04, 2009 9:16 am

Thanks for reporting!

There's nothing wrong with your macro, I'm afraid. So this might be a bug. Is there a site where we can reproduce this issue?
tata668

Re: Is a text present somewhere (anywhere) on the page

Post by tata668 » Wed Nov 04, 2009 3:44 pm

Here you go:

Code:


TAG POS=1 TYPE=* ATTR=TXT:*The<SP>confirmation<SP>code<SP>you<SP>entered<SP>was<SP>incorrect.* EXTRACT=TXT
TAG POS=1 TYPE=* ATTR=TXT:*Please<SP>note<SP>that<SP>you<SP>will<SP>need<SP>to<SP>enter* EXTRACT=TXT
TAG POS=1 TYPE=* ATTR=TXT:*Length<SP>must<SP>be<SP>between* EXTRACT=TXT
None of those extracts work (against the code posted below).

Please save the page to make your tests because I'll remove it in a couple of days.

Thanks for the help.
Last edited by tata668 on Thu Nov 05, 2009 2:02 pm, edited 2 times in total.
tata668

Re: Is a text present somewhere (anywhere) on the page

Post by tata668 » Wed Nov 04, 2009 4:12 pm

I'm still doing tests, but I think that using "BODY" as the type to look for works:

Code:

TAG POS=1 TYPE=BODY ATTR=TXT:*Hello<SP>World* EXTRACT=TXT
I'm not sure it will always work, though. I understand that it won't work if the text is in the head of the page, but will it work for any text in the body?

Would "HTML" work as the type?

Edit: HTML seems to work too...

For now, I'll use BODY to search for text anywhere on the page:

Code:

TAG POS=1 TYPE=BODY ATTR=TXT:*Hello<SP>World* EXTRACT=TXT
I'd still like to know whether you are aware of any situation in which this code could fail, though.
Hannes, Tech Support

Re: Is a text present somewhere (anywhere) on the page

Post by Hannes, Tech Support » Thu Nov 05, 2009 8:39 am

tata668 wrote: Please save the page to make your tests because I'll remove it in a couple of days.
Thanks!
I will try to recreate this issue, but first let me post the HTML source code for the record:

Code:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" xml:lang="en-gb" lang="en-gb"><head>




<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="content-style-type" content="text/css">
<meta http-equiv="content-language" content="en-gb">
<meta http-equiv="imagetoolbar" content="no">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="copyright" content="2000, 2002, 2005, 2007 phpBB Group">
<meta name="keywords" content="">
<meta name="description" content="">
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7">
<title>yourdomain.com • User Control Panel • Register</title>

<!--
	phpBB style name: prosilver
	Based on style:   prosilver (this is the default phpBB3 style)
	Original author:  Tom Beddard ( http://www.subBlue.com/ )
	Modified by:      
	
	NOTE: This page was generated by phpBB, the free open-source bulletin board package.
	      The phpBB Group is not responsible for the content of this page and forum. For more information
	      about phpBB please visit http://www.phpbb.com
-->

<script type="text/javascript">
// <![CDATA[
	var jump_page = 'Enter the page number you wish to go to:';
	var on_page = '';
	var per_page = '';
	var base_url = '';
	var style_cookie = 'phpBBstyle';
	var style_cookie_settings = '; path=/; domain=.logisphere.com';
	var onload_functions = new Array();
	var onunload_functions = new Array();

	
	/**
	* Find a member
	*/
	function find_username(url)
	{
		popup(url, 760, 570, '_usersearch');
		return false;
	}

	/**
	* New function for handling multiple calls to window.onload and window.unload by pentapenguin
	*/
	window.onload = function()
	{
		for (var i = 0; i < onload_functions.length; i++)
		{
			eval(onload_functions[i]);
		}
	}

	window.onunload = function()
	{
		for (var i = 0; i < onunload_functions.length; i++)
		{
			eval(onunload_functions[i]);
		}
	}

// ]]>
</script>
<script type="text/javascript" src="wildcard.in.type.fails.to.extract_files/styleswitcher.js"></script>
<script type="text/javascript" src="wildcard.in.type.fails.to.extract_files/forum_fn.js"></script>

<link href="wildcard.in.type.fails.to.extract_files/print.css" rel="stylesheet" type="text/css" media="print" title="printonly">
<link href="wildcard.in.type.fails.to.extract_files/style.css" rel="stylesheet" type="text/css" media="screen, projection">

<link href="wildcard.in.type.fails.to.extract_files/normal.css" rel="stylesheet" type="text/css" title="A">
<link href="wildcard.in.type.fails.to.extract_files/medium.css" rel="alternate stylesheet" type="text/css" title="A+">
<link href="wildcard.in.type.fails.to.extract_files/large.css" rel="alternate stylesheet" type="text/css" title="A++">


</head><body linkifytime="13" linkified="0" linkifying="false" id="phpbb" class="section-ucp ltr">

<div id="wrap">
	<a id="top" name="top" accesskey="t"></a>
	<div id="page-header">
		<div class="headerbar">
			<div class="inner"><span class="corners-top"><span></span></span>

			<div id="site-description">
				<a href="http://www.logisphere.com/testforum/index.php" title="Board index" id="logo"><img src="wildcard.in.type.fails.to.extract_files/site_logo.gif" alt="" title="" height="52" width="139"></a>
				<h1>yourdomain.com</h1>
				<p>A short text to describe your forum</p>
				<p class="skiplink"><a href="#start_here">Skip to content</a></p>
			</div>

					<div id="search-box">
				<form action="./search.php" method="post" id="search">
				<fieldset>
					<input name="keywords" id="keywords" maxlength="128" title="Search for keywords" class="inputbox search" value="Search…" onclick="if(this.value=='Search…')this.value='';" onblur="if(this.value=='')this.value='Search…';" type="text"> 
					<input class="button2" value="Search" type="submit"><br>
					<a href="http://www.logisphere.com/testforum/search.php" title="View the advanced search options">Advanced search</a> 				</fieldset>
				</form>
			</div>
		
			<span class="corners-bottom"><span></span></span></div>
		</div>

		<div class="navbar">
			<div class="inner"><span class="corners-top"><span></span></span>

			<ul class="linklist navlinks">
				<li class="icon-home"><a href="http://www.logisphere.com/testforum/index.php" accesskey="h">Board index</a> </li>

				<li class="rightside"><a href="#" onclick="fontsizeup(); return false;" onkeypress="fontsizeup(); return false;" class="fontsize" title="Change font size">Change font size</a></li>

							</ul>

			
			<ul class="linklist rightside">
				<li class="icon-faq"><a href="http://www.logisphere.com/testforum/faq.php" title="Frequently Asked Questions">FAQ</a></li>
				<li class="icon-register"><a href="http://www.logisphere.com/testforum/ucp.php?mode=register">Register</a></li>					<li class="icon-logout"><a href="http://www.logisphere.com/testforum/ucp.php?mode=login" title="Login" accesskey="l">Login</a></li>
							</ul>

			<span class="corners-bottom"><span></span></span></div>
		</div>

	</div>

	<a name="start_here"></a>
	<div id="page-body">
		
		 
<script type="text/javascript">
// <![CDATA[
	/**
	* Change language
	*/
	function change_language(lang_iso)
	{
		document.forms['register'].change_lang.value = lang_iso;
		document.forms['register'].submit.click();
	}

// ]]>
</script>

<form method="post" action="./ucp.php?mode=register" id="register">

<div class="panel">
	<div class="inner"><span class="corners-top"><span></span></span>

	<h2>yourdomain.com - Registration</h2>

	<fieldset class="fields2">
	<dl><dd class="error">The confirm code you entered is too long.<br>The confirmation code you entered was incorrect.</dd></dl>		<dl><dd><strong>Please
note that you will need to enter a valid e-mail address before your
account is activated. You will receive an e-mail at the address you
provide that contains an account activation link.</strong></dd></dl>
	
	<dl>
		<dt><label for="username">Username:</label><br><span>Length must be between 3 and 20 characters.</span></dt>
		<dd><input tabindex="1" name="username" id="username" size="25" class="inputbox autowidth" title="Username" type="text"></dd>
	</dl>
	<dl>
		<dt><label for="email">E-mail address:</label></dt>
		<dd><input tabindex="2" name="email" id="email" size="25" maxlength="100" class="inputbox autowidth" title="E-mail address" type="text"></dd>
	</dl>
	<dl>
		<dt><label for="email_confirm">Confirm e-mail address:</label></dt>
		<dd><input tabindex="3" name="email_confirm" id="email_confirm" size="25" maxlength="100" class="inputbox autowidth" title="Confirm e-mail address" type="text"></dd>
	</dl>
	<dl>
		<dt><label for="new_password">Password:</label><br><span>Must be between 6 and 30 characters.</span></dt>
		<dd><input tabindex="4" name="new_password" id="new_password" size="25" value="" class="inputbox autowidth" title="New password" type="password"></dd>
	</dl>
	<dl>
		<dt><label for="password_confirm">Confirm password:</label></dt>
		<dd><input tabindex="5" name="password_confirm" id="password_confirm" size="25" value="" class="inputbox autowidth" title="Confirm password" type="password"></dd>
	</dl>

	<hr>

	<dl>
		<dt><label for="lang">Language:</label></dt>
		<dd><select name="lang" id="lang" onchange="change_language(this.value); return false;" title="Language"><option value="en" selected="selected">British English</option></select></dd>
	</dl>
	<dl>
		<dt><label for="tz">Timezone:</label></dt>
		<dd><select name="tz" id="tz" class="autowidth"><option title="[UTC - 12] Baker Island Time" value="-12">[UTC - 12] Baker Island Time</option><option title="[UTC - 11] Niue Time, Samoa Standard Time" value="-11">[UTC - 11] Niue Time, Samoa Standard Time</option><option title="[UTC - 10] Hawaii-Aleutian Standard Time, Cook Island Time" value="-10">[UTC - 10] Hawaii-Aleutian Standard Time, Cook Island Time</option><option title="[UTC - 9:30] Marquesas Islands Time" value="-9.5">[UTC - 9:30] Marquesas Islands Time</option><option title="[UTC - 9] Alaska Standard Time, Gambier Island Time" value="-9">[UTC - 9] Alaska Standard Time, Gambier Island Time</option><option title="[UTC - 8] Pacific Standard Time" value="-8">[UTC - 8] Pacific Standard Time</option><option title="[UTC - 7] Mountain Standard Time" value="-7">[UTC - 7] Mountain Standard Time</option><option title="[UTC - 6] Central Standard Time" value="-6">[UTC - 6] Central Standard Time</option><option title="[UTC - 5] Eastern Standard Time" value="-5">[UTC - 5] Eastern Standard Time</option><option title="[UTC - 4:30] Venezuelan Standard Time" value="-4.5">[UTC - 4:30] Venezuelan Standard Time</option><option title="[UTC - 4] Atlantic Standard Time" value="-4">[UTC - 4] Atlantic Standard Time</option><option title="[UTC - 3:30] Newfoundland Standard Time" value="-3.5">[UTC - 3:30] Newfoundland Standard Time</option><option title="[UTC - 3] Amazon Standard Time, Central Greenland Time" value="-3">[UTC - 3] Amazon Standard Time, Central Greenland Time</option><option title="[UTC - 2] Fernando de Noronha Time, South Georgia & the South Sandwich Islands Time" value="-2">[UTC - 2] Fernando de Noronha Time, South Georgia & the South Sandwich Islands Time</option><option title="[UTC - 1] Azores Standard Time, Cape Verde Time, Eastern Greenland Time" value="-1">[UTC - 1] Azores Standard Time, Cape Verde Time, Eastern Greenland Time</option><option title="[UTC] Western European Time, Greenwich Mean Time" value="0" 
selected="selected">[UTC] Western European Time, Greenwich Mean Time</option><option title="[UTC + 1] Central European Time, West African Time" value="1">[UTC + 1] Central European Time, West African Time</option><option title="[UTC + 2] Eastern European Time, Central African Time" value="2">[UTC + 2] Eastern European Time, Central African Time</option><option title="[UTC + 3] Moscow Standard Time, Eastern African Time" value="3">[UTC + 3] Moscow Standard Time, Eastern African Time</option><option title="[UTC + 3:30] Iran Standard Time" value="3.5">[UTC + 3:30] Iran Standard Time</option><option title="[UTC + 4] Gulf Standard Time, Samara Standard Time" value="4">[UTC + 4] Gulf Standard Time, Samara Standard Time</option><option title="[UTC + 4:30] Afghanistan Time" value="4.5">[UTC + 4:30] Afghanistan Time</option><option title="[UTC + 5] Pakistan Standard Time, Yekaterinburg Standard Time" value="5">[UTC + 5] Pakistan Standard Time, Yekaterinburg Standard Time</option><option title="[UTC + 5:30] Indian Standard Time, Sri Lanka Time" value="5.5">[UTC + 5:30] Indian Standard Time, Sri Lanka Time</option><option title="[UTC + 5:45] Nepal Time" value="5.75">[UTC + 5:45] Nepal Time</option><option title="[UTC + 6] Bangladesh Time, Bhutan Time, Novosibirsk Standard Time" value="6">[UTC + 6] Bangladesh Time, Bhutan Time, Novosibirsk Standard Time</option><option title="[UTC + 6:30] Cocos Islands Time, Myanmar Time" value="6.5">[UTC + 6:30] Cocos Islands Time, Myanmar Time</option><option title="[UTC + 7] Indochina Time, Krasnoyarsk Standard Time" value="7">[UTC + 7] Indochina Time, Krasnoyarsk Standard Time</option><option title="[UTC + 8] Chinese Standard Time, Australian Western Standard Time, Irkutsk Standard Time" value="8">[UTC + 8] Chinese Standard Time, Australian Western Standard Time, Irkutsk Standard Time</option><option title="[UTC + 8:45] Southeastern Western Australia Standard Time" value="8.75">[UTC + 8:45] Southeastern Western Australia Standard 
Time</option><option title="[UTC + 9] Japan Standard Time, Korea Standard Time, Chita Standard Time" value="9">[UTC + 9] Japan Standard Time, Korea Standard Time, Chita Standard Time</option><option title="[UTC + 9:30] Australian Central Standard Time" value="9.5">[UTC + 9:30] Australian Central Standard Time</option><option title="[UTC + 10] Australian Eastern Standard Time, Vladivostok Standard Time" value="10">[UTC + 10] Australian Eastern Standard Time, Vladivostok Standard Time</option><option title="[UTC + 10:30] Lord Howe Standard Time" value="10.5">[UTC + 10:30] Lord Howe Standard Time</option><option title="[UTC + 11] Solomon Island Time, Magadan Standard Time" value="11">[UTC + 11] Solomon Island Time, Magadan Standard Time</option><option title="[UTC + 11:30] Norfolk Island Time" value="11.5">[UTC + 11:30] Norfolk Island Time</option><option title="[UTC + 12] New Zealand Time, Fiji Time, Kamchatka Standard Time" value="12">[UTC + 12] New Zealand Time, Fiji Time, Kamchatka Standard Time</option><option title="[UTC + 12:45] Chatham Islands Time" value="12.75">[UTC + 12:45] Chatham Islands Time</option><option title="[UTC + 13] Tonga Time, Phoenix Islands Time" value="13">[UTC + 13] Tonga Time, Phoenix Islands Time</option><option title="[UTC + 14] Line Island Time" value="14">[UTC + 14] Line Island Time</option></select></dd>
	</dl>

		</fieldset>

	<span class="corners-bottom"><span></span></span></div>
</div>

<div class="panel">
	<div class="inner"><span class="corners-top"><span></span></span>

	<h3>Confirmation of registration</h3>
	<p>To
prevent automated registrations the board requires you to enter a
confirmation code. The code is displayed in the image you should see
below. If you are visually impaired or cannot otherwise read this code
please contact the <a href="mailto:ogregras@gmail.com">Board Administrator</a>.</p>

	<fieldset class="fields2">
	<dl>
		<dt><label for="confirm_code">Confirmation code:</label></dt>
		<dd><img src="wildcard.in.type.fails.to.extract_files/ucp.png" alt="" title=""></dd>
		<dd><input name="confirm_code" id="confirm_code" size="8" maxlength="8" class="inputbox narrow" title="Confirmation code" type="text"></dd>
		<dd>Enter
the code exactly as it appears. All letters are case insensitive, there
is no zero. If you cannot read the code you can request a new one by
clicking the button.</dd>
		<dd><input value="Refresh confirmation code" class="button2" type="submit"></dd> 	</dl>
	</fieldset>

	<span class="corners-bottom"><span></span></span></div>
</div>

<div class="panel">
	<div class="inner"><span class="corners-top"><span></span></span>

	<fieldset class="submit-buttons">
		<input name="agreed" value="true" type="hidden">
<input name="change_lang" value="0" type="hidden">
<input name="confirm_id" value="c2676449466176765eb026ca1e749aa0" type="hidden">		<input value="Reset" name="reset" class="button2" type="reset">&nbsp;
		<input name="submit" id="submit" value="Submit" class="button1" type="submit">
		<input name="creation_time" value="1257348434" type="hidden">
<input name="form_token" value="1d8561de6f7b8a679b3dd99e0f530fcaaf0cdc1d" type="hidden">
	</fieldset>

	<span class="corners-bottom"><span></span></span></div>
</div>
</form>

</div>

<div id="page-footer">

	<div class="navbar">
		<div class="inner"><span class="corners-top"><span></span></span>

		<ul class="linklist">
			<li class="icon-home"><a href="http://www.logisphere.com/testforum/index.php" accesskey="h">Board index</a></li>
							<li class="rightside"><a href="http://www.logisphere.com/testforum/memberlist.php?mode=leaders">The team</a> • <a href="http://www.logisphere.com/testforum/ucp.php?mode=delete_cookies">Delete all board cookies</a> • All times are UTC </li>
		</ul>

		<span class="corners-bottom"><span></span></span></div>
	</div>
	
<!--
	We request you retain the full copyright notice below including the link to www.phpbb.com.
	This not only gives respect to the large amount of time given freely by the developers
	but also helps build interest, traffic and use of phpBB3. If you (honestly) cannot retain
	the full copyright we ask you at least leave in place the "Powered by phpBB" line, with
	"phpBB" linked to www.phpbb.com. If you refuse to include even this then support on our
	forums may be affected.

	The phpBB Group : 2006
//-->

	<div class="copyright">Powered by <a href="http://www.phpbb.com/">phpBB</a> © 2000, 2002, 2005, 2007 phpBB Group
			</div>
</div>

</div>

<div>
	<a id="bottom" name="bottom" accesskey="z"></a>
	<img src="wildcard.in.type.fails.to.extract_files/cron.gif" alt="cron" height="1" width="1"></div>

</body></html>
Hannes, Tech Support

Re: Is a text present somewhere (anywhere) on the page

Post by Hannes, Tech Support » Thu Nov 05, 2009 9:39 am

Running iMacros v6.85 (beta), your extraction commands from above (with type=*) do work. Judging from the result, they are matched against the full site (<html>).

If you'd like to try, here's the current beta: http://www.iopus.com/imacros/support/b685.htm
tata668

Re: Is a text present somewhere (anywhere) on the page

Post by tata668 » Thu Nov 05, 2009 2:00 pm

Hannes, iOpus wrote:Running iMacros v6.85 (beta), your extraction commands from above (with type=*) do work. Judging from the result, they are matched against the full site (<html>).

If you'd like to try, here's the current beta: http://www.iopus.com/imacros/support/b685.htm
I'll use TYPE=BODY for now and wait for the next release.

Thanks for the help!
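
For anyone who wants to double-check a saved copy of a page outside iMacros, here is a minimal Python sketch of the same idea as the TYPE=BODY workaround: collect the text inside <body>, collapse whitespace, and look for the phrase. It only uses the standard library. The names BodyText and page_contains and the sample markup are made up for this sketch, and it says nothing about how iMacros itself matches TXT attributes. Collapsing whitespace matters for the page posted above, because "Please" and "note" are separated by a line break in its source.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect text found inside <body>, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is inside <body> and not inside a skipped tag.
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)


def page_contains(html, needle):
    """Return True if needle occurs in the body text, ignoring whitespace runs."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace (including line breaks) to one space.
    haystack = " ".join("".join(parser.chunks).split())
    return " ".join(needle.split()) in haystack


# Hypothetical, shortened markup modelled on the page posted in this thread.
sample = """<html><head><title>Length must be between</title></head><body>
<dl><dd class="error">The confirm code you entered is too long.<br>The confirmation code you entered was incorrect.</dd></dl>
<dl><dd><strong>Please
note that you will need to enter a valid e-mail address</strong></dd></dl>
<script>var x = 'hidden text';</script>
</body></html>"""

print(page_contains(sample, "Please note that you will need to enter"))
```

Like TYPE=BODY, this sketch ignores anything in the head of the page, so a phrase that only appears in the <title> is reported as absent.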