Extraction issue of html attribute value

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
Forum rules
Before asking a question or reporting an issue:
1. Please review the list of FAQs.
2. Use the search box (at the top of each forum page) to see if a similar problem or question has already been addressed.
3. Try searching the iMacros Wiki - it contains the complete iMacros reference as well as plenty of samples and tutorials.
4. We can respond much faster to your posts if you include the following information: CLICK HERE FOR IMPORTANT INFORMATION TO INCLUDE IN YOUR POST
breezyguy
Posts: 7
Joined: Mon Sep 14, 2015 9:45 am

Extraction issue of html attribute value

Post by breezyguy » Sun Mar 22, 2020 12:49 pm

Hi everyone, and especially chivracq :)

I am trying to extract an attribute value from the attached HTML page. However, the iMacros code below only extracts the last part of the HTML code and writes it to a single column in the output. Could anyone tell me what is wrong with my code?

Image

Code:

SET !ERRORIGNORE YES
SET !TIMEOUT_STEP 0
TAB T=1
SET !TIMEOUT_PAGE 15
SET !DATASOURCE input.csv
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
SET !DATASOURCE_COLUMNS 1

URL GOTO={{!COL1}}
WAIT SECONDS=2
ADD !EXTRACT {{!URLCURRENT}}
TAG POS=1 TYPE=DIV ATTR=CLASS:color-swatch-div* EXTRACT=HTM
SET ATTR EVAL("'{{!EXTRACT}}'.match(/aria-label=[\"'](.+?)[\"']/)[1];")
SET !EXTRACT NULL
ADD !EXTRACT {{ATTR}}
WAIT SECONDS=2

TAG POS=2 TYPE=DIV ATTR=CLASS:color-swatch-div* EXTRACT=HTM
SET ATTR EVAL("'{{!EXTRACT}}'.match(/aria-label=[\"'](.+?)[\"']/)[1];")
SET !EXTRACT NULL
ADD !EXTRACT {{ATTR}}
WAIT SECONDS=2

TAG POS=3 TYPE=DIV ATTR=CLASS:color-swatch-div* EXTRACT=HTM
SET ATTR EVAL("'{{!EXTRACT}}'.match(/aria-label=[\"'](.+?)[\"']/)[1];")
SET !EXTRACT NULL
ADD !EXTRACT {{ATTR}}
WAIT SECONDS=2

SAVEAS TYPE=EXTRACT FOLDER=* FILE=out.csv
Last edited by breezyguy on Sun Mar 22, 2020 8:39 pm, edited 3 times in total.
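
The EVAL calls in the macro above run ordinary JavaScript, so the regular expression can be tried on its own outside iMacros. The sketch below assumes a made-up HTML snippet (the real page markup only appears in the screenshot) and a hypothetical helper name, `extractAriaLabel`. It also guards against `match` returning `null`, which the one-liner in the macro does not.

```javascript
// Hypothetical sample markup; the real page's HTML is not shown in the thread.
const sampleHtml =
  '<div class="color-swatch-div selected" aria-label="Navy Blue"></div>';

// Return the aria-label value, or an empty string when the attribute is absent.
// String.prototype.match returns null when nothing matches, so indexing [1]
// directly (as the macro does) would throw in that case.
function extractAriaLabel(html) {
  const m = html.match(/aria-label=["'](.+?)["']/);
  return m ? m[1] : "";
}

console.log(extractAriaLabel(sampleHtml));    // Navy Blue
console.log(extractAriaLabel("<div></div>")); // prints an empty line
```

Note that the lazy `.+?` stops at the first quote character of either kind, so a label that itself contains an apostrophe would be cut short. Also, the macro pastes `{{!EXTRACT}}` into a single-quoted JavaScript literal, so extracted HTML containing an apostrophe or a line break would presumably make that EVAL fail.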
chivracq
Posts: 9425
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: iMacros: content extraction issue

Post by chivracq » Sun Mar 22, 2020 1:23 pm

breezyguy wrote:
Sun Mar 22, 2020 12:49 pm
I am trying to extract an attr. value from the attached html page. However, the imacros code below only extracts the last part of the html code and writes it to only one column in the output. Could anyone tell me what is wrong with my code?

Code:

https://hizliresim.com/SMSnnv][img]https://i.hizliresim.com/SMSnnv.png

Grrr....!, and you can't say 5 years later, "I didn't know about it"...! :roll:

=> I will answer your Thread once you have corrected all the "faulty parts" that don't comply with the Forum Rules, and once the Forum Admin has moved your Thread to the correct Sub-Forum (= 'Data Extraction'), I can't do it myself... (But no need to open a Duplicate now, this is SPAM, and a "Show-Killer" for me to help...)

>>>

Hum..., and the 'FCI' part also applies to your parallel Thread on SOF:
- iMacros: content extraction issue

The Thread Title is more or less OK for SOF but not for this Forum, and the 'External Pix Hosting Site' Rule for the Screenshot doesn't apply to SOF...
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
breezyguy
Posts: 7
Joined: Mon Sep 14, 2015 9:45 am

Re: iMacros: content extraction issue

Post by breezyguy » Sun Mar 22, 2020 8:58 pm

chivracq wrote:
Sun Mar 22, 2020 1:23 pm

Grrr....!, and you can't say 5 years later, "I didn't know about it"...! :roll:

I am not using this program to become a developer. I asked one question in 2015, and now I needed iMacros again and asked my second question.
I searched the internet but did not find a solution.
chivracq wrote:
Sun Mar 22, 2020 1:23 pm
=> I will answer your Thread once you have corrected all the "faulty parts" that don't comply with the Forum Rules, and once the Forum Admin has moved your Thread to the correct Sub-Forum (= 'Data Extraction'), I can't do it myself... (But no need to open a Duplicate now, this is SPAM, and a "Show-Killer" for me to help...)
I have tried to correct all the faulty parts out of respect for the Forum rules, not to get your answer.
>>>
chivracq wrote:
Sun Mar 22, 2020 1:23 pm
Hum..., and the 'FCI' part also applies to your parallel Thread on SOF:
- iMacros: content extraction issue
Don't worry about the other forum, Mr. chivracq.
chivracq
Posts: 9425
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: iMacros: content extraction issue

Post by chivracq » Mon Mar 23, 2020 3:20 am

Hum, not sure if you are serious :? , but OK, never mind, good luck with other (Advanced) Users then... :|

The Solution to your "Issue" is pretty simple anyway, if you take 10 min to understand how the 'EXTRACT' Mechanism works, which is a bit "surprising" for somebody who's been using iMacros for 5 years now... :o
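
As a rough illustration of the 'EXTRACT' Mechanism mentioned here, the following is a toy model in JavaScript, not iMacros' actual implementation. It assumes that `TAG ... EXTRACT=...` and `ADD !EXTRACT` both append to one buffer, that `SET !EXTRACT NULL` empties the whole buffer, and that `SAVEAS TYPE=EXTRACT` writes the buffer as one CSV line. The class, function names and sample snippets are all hypothetical.

```javascript
// Toy model only: names and behaviour are simplified assumptions, not iMacros internals.
class ExtractBuffer {
  constructor() {
    this.values = []; // stands in for !EXTRACT
    this.rows = [];   // stands in for the lines written to out.csv
  }
  add(value) { this.values.push(value); } // TAG ... EXTRACT=... or ADD !EXTRACT
  clear() { this.values = []; }           // SET !EXTRACT NULL
  save() {                                // SAVEAS TYPE=EXTRACT
    this.rows.push(this.values.join(","));
    this.clear();
  }
}

// Pull the aria-label value out of extracted HTML (empty string if absent).
function ariaLabel(html) {
  const m = html.match(/aria-label=["'](.+?)["']/);
  return m ? m[1] : "";
}

// Pattern used in the macro of this thread: clear, then add, once per element.
function clearEachTime(snippets) {
  const buf = new ExtractBuffer();
  for (const html of snippets) {
    buf.add(html);                               // TAG POS=n ... EXTRACT=HTM
    const attr = ariaLabel(buf.values.join("")); // SET ATTR EVAL(...)
    buf.clear();                                 // also drops the previous pass's value
    buf.add(attr);                               // ADD !EXTRACT {{ATTR}}
  }
  buf.save();
  return buf.rows;
}

// Alternative: keep each parsed value aside, add them all after the last clear.
function stashThenAdd(snippets) {
  const buf = new ExtractBuffer();
  const stash = [];
  for (const html of snippets) {
    buf.add(html);
    stash.push(ariaLabel(buf.values.join("")));
    buf.clear();
  }
  for (const attr of stash) buf.add(attr);
  buf.save();
  return buf.rows;
}

// Made-up snippets standing in for the three color-swatch DIVs.
const demoSnippets = ["Red", "Green", "Blue"].map(
  (c) => '<div class="color-swatch-div" aria-label="' + c + '"></div>'
);
console.log(clearEachTime(demoSnippets)); // one row holding only "Blue"
console.log(stashThenAdd(demoSnippets));  // one row: "Red,Green,Blue"
```

In macro terms, the second pattern would correspond to storing each EVAL result in its own variable (for instance the built-in !VAR1 to !VAR3, assuming they are available in the version used) and issuing the ADD !EXTRACT lines only once, after the last SET !EXTRACT NULL.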
chivracq
Posts: 9425
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extraction issue of html attribute value

Post by chivracq » Mon Mar 23, 2020 6:18 pm

Alright, I see some "Progress" concerning the "Quality" of this Thread, before I "eventually" answer it...: :D

- Thread Title is now correct. Good...! :D
Even if you kept using the "old"/SOF Thread Title in your last Reply, but I guess from now on, the "new" Thread Title should be used for all Posts...

- The Forum Admin has moved the Thread from the 'iMacros for FF' to the "correct" 'Data Extraction' Sub-Forum. Thanks to him...! 8)

>>>

Items/Conditions left to comply with the Forum Rules:
- Upload your Screenshot directly to the Forum, and not to some External Pix Hosting Site. It's all explained in the Forum Rules, if you bother to read them, in case you wonder why...

And hum, this Rule actually also seems to apply to 'SOF', as I notice some "other" User re-uploaded your Screenshot to 'imgur', where they (= 'SOF') seem to have a dedicated "/stack/" Sub-Domain. That answers in a way a Qt I had been asking myself, about how they would handle it if 'imgur' ever went dark or commercial one day. They have already taken care of the "commercial" Case, and for the "dark" Case, they probably have some Admin Rights on that Sub-Domain and can probably easily make a regular Backup of their whole Folder on that Server, which is probably just a few GB of Data for the whole 'SOF' Site...

Code:

https://i.stack.imgur.com/30nia.png
They are then a bit more "clever" than I thought, ah-ah...! :o Even if that "Rule" is not clearly mentioned as a "Rule"/"Obligation" in their buggy FAQ (well, the whole Site is completely buggy anyway, and poorly designed... :shock: ) about the 'Images' Section, which doesn't even use an 'imgur' URL for their Examples...:
Images

Images can be added primarily by using the editor toolbar button. This brings up a special interface that allows you to upload an image online (through the imgur hosting service) through us - even from your clipboard. Alternatively, it can be input similarly to adding a link:

HTML <img src="http://example.com/img.jpg">
Markdown ![sample image](http://example.com/img.jpg)
>>>

Items/Conditions left for me to answer a/this Thread:
- Mention your FCI, just like I had already asked you 5 years ago in your previous Thread... :roll:

- Also mention your FCI in your parallel Thread on 'SOF' like I've asked you...
And yes, I do "worry about the other forum", perfectly fine to open parallel Threads on different Tech Forums, but you need to provide the same Quality in all of them, or I don't answer... Simple...!

(And deleting your Thread on 'SOF' at some point, or once you have gotten your Answer/Script working, like many if not most first-time Posters do on that Forum, is also a "Show-Killer" for me to ever help them, just saying... :!: )
breezyguy
Posts: 7
Joined: Mon Sep 14, 2015 9:45 am

Re: Extraction issue of html attribute value

Post by breezyguy » Mon Mar 23, 2020 9:21 pm

chivracq wrote:
Mon Mar 23, 2020 6:18 pm
Items/Conditions left for me to answer a/this Thread:
- Mention your FCI, just like I had already asked you 5 years ago in your previous Thread... :roll:
My FCI is still the same as what I told you five years ago :D
chivracq wrote:
Mon Mar 23, 2020 6:18 pm
- Also mention your FCI in your parallel Thread on 'SOF' like I've asked you...
And yes, I do "worry about the other forum", perfectly fine to open parallel Threads on different Tech Forums, but you need to provide the same Quality in all of them, or I don't answer... Simple...!
I thank you for the answer to my previous question, not for this one. Also, thank you, Mr. chivracq, for giving (Beginner) users a chance to answer my simple question. :arrow: