Extracting a table when two columns share same element name

tennisdude
Posts: 7
Joined: Mon Jan 30, 2017 9:48 pm

Post by tennisdude » Thu Mar 30, 2017 4:52 pm

Hi,

I'm trying to extract a table with 7 columns of data.

Two of the columns share the same element name, "CLASS:player-name", and I have to apply a regular expression to those cells to remove additional data I do not need.

When I loop on the POS of each element, the script uses the same player name for both columns in the CSV file.

I've tried to get the script to use relative extraction and have had no luck.

Could some of the iMacros wizards help me or point me in the right direction?

Cheers
TD

Code: Select all


VERSION BUILD=10022823
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=https://matchstat.com/tennis/all-upcoming-matches

TAG POS=3 TYPE=TD ATTR=CLASS:round EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT

TAG POS=5 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")

TAG POS=6 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")

TAG POS=3 TYPE=TD ATTR=CLASS:odds-td<SP>odds-0 EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:odds-td<SP>odds-1 EXTRACT=TXT
TAG POS=3 TYPE=TD ATTR=CLASS:h2h EXTRACT=TXT

SAVEAS TYPE=EXTRACT FOLDER=* FILE=T1.csv

chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Post by chivracq » Fri Mar 31, 2017 1:05 am

FCIM...! :mrgreen: (Always mention your FCI when you open a Thread, read my Sig... That's mostly why I never reacted to either of your 2 previous Threads, even if one mentioned IE: many Commands are not implemented for all Browsers/Versions or behave differently...)
=> FCI:

Code: Select all

iMacros v10.0.2, iMB/IE...?, if IE v...?, OS...?
(If you could mention that Info for your 2 previous Threads as well, where you could also post some Update btw: either you still have the Pb, or you managed to solve those 2 Threads and you are expected to share your Sol. I won't follow up otherwise... and I won't ask again in some future Thread(s)...)
I normally don't even read Threads when FCI is not clearly mentioned (preferably at the top of the Opening Post in a Thread)... :idea:

But OK, I had a look at your Case as you luckily provided Script + URL, and hum..., no Difficulty at all to use Relative Positioning. I don't know what you tried, but there is no Glitch at all with R-Positioning for the 2nd Player based on the 1st Player, or even for the whole Row based on the first Element (the "SF"/"QF"/etc...), like for example:

Code: Select all

VERSION BUILD=8820413 RECORDER=FX
'SET !EXTRACT_TEST_POPUP NO
TAB T=1

'URL GOTO=https://matchstat.com/tennis/all-upcoming-matches

TAG POS=5 TYPE=TD ATTR=CLASS:round EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT

TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
PROMPT _{{!EXTRACT}}_

TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
PROMPT _{{!EXTRACT}}_

TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-0 EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-1 EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:h2h EXTRACT=TXT
PROMPT _{{!EXTRACT}}_

PAUSE
SAVEAS TYPE=EXTRACT FOLDER=* FILE=T1.csv
(Tested on iMacros for FF v8.8.2, Pale Moon v26.3.3 (=FF47), Win10-x64.)

If you want to loop your Script, you only need to "play" with '!LOOP' for the first ('round') Element which seems to follow 'TAG POS=3/5/7/9/etc'... =>:

Code: Select all

' Start value of the loop counter (only applied on the first loop)
SET !LOOP 1
' My_Loop = 2 * !LOOP + 1  =>  3, 5, 7, 9, ...
SET My_Loop {{!LOOP}}
ADD My_Loop {{!LOOP}}
ADD My_Loop 1

TAG POS={{My_Loop}} TYPE=TD ATTR=CLASS:round EXTRACT=TXT
TAG POS=R1 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT

'/etc... all other Extracts with 'R1' as well...
Hum..., and all your Data Manipulation directly on '!EXTRACT' and especially on the complete '!EXTRACT' to clean up a bit the 2 Names is not Best Practice in my Opinion and could be "dangerous" and unreliable, it's better to use Temp_Vars for each Extract and to "reconstruct" the '!EXTRACT' before doing the 'SAVEAS'...
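A minimal sketch of that temp-variable approach, assuming iMacros for Firefox with user-defined variables (as with 'My_Loop' above). The variable names 'Round', 'Player_Name1' and 'Player_Name2' and the adjusted regular expression are assumptions for illustration, only three of the seven columns are shown, and nothing here was run against the live page:

```
' Sketch only: variable names are hypothetical, untested against the live page
' Clear !EXTRACT before each TAG so it holds exactly one value at a time
SET !EXTRACT NULL
TAG POS={{My_Loop}} TYPE=TD ATTR=CLASS:round EXTRACT=TXT
SET Round {{!EXTRACT}}

SET !EXTRACT NULL
TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
' Clean the single extracted value and keep the result in its own variable
SET Player_Name1 EVAL("var s=\"{{!EXTRACT}}\"; s.replace(/\\s*\\(.*$/, '');")

SET !EXTRACT NULL
TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET Player_Name2 EVAL("var s=\"{{!EXTRACT}}\"; s.replace(/\\s*\\(.*$/, '');")

' Rebuild !EXTRACT from the temp variables just before saving
SET !EXTRACT NULL
ADD !EXTRACT {{Round}}
ADD !EXTRACT {{Player_Name1}}
ADD !EXTRACT {{Player_Name2}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=T1.csv
```

The pattern used here drops everything from the first "(" onwards and does not require a space in front of it, whereas the pattern in the original Script only matches when a space precedes the "(". Whether that difference matters depends on the exact text the page returns for each cell, so it is worth checking with 'PROMPT' first.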
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...

Post by tennisdude » Wed Jul 05, 2017 7:05 am

Hi Chivracq,

Sorry about the long delay in getting back to you.
I had given up on this project, but after having success with other scripts I decided to give it another try.

I'm using version 9.0.3 for FF running on 64-bit Windows 7.

I've edited the script like you suggested, but I cannot get the regular expression to run on EXTRACT, and some of the player-name extracts are duplicated within one column. It only happens on some rows, so it's really strange behavior.

Does anyone have any suggestions on how I can make this code produce the desired output?

Here's my code.

Code: Select all


VERSION BUILD=9030808 RECORDER=FX
TAB T=1
TAB CLOSEALLOTHERS

URL GOTO=https://matchstat.com/tennis/all-upcoming-matches

SET {{!LOOP}} 1
SET My_Loop {{!LOOP}}
ADD My_Loop {{!LOOP}}

TAG POS={{My_Loop}} TYPE=TD ATTR=CLASS:round EXTRACT=TXT
SET Round !EXTRACT

TAG POS=R1 TYPE=TD ATTR=CLASS:event-name EXTRACT=TXT
SET Event_name !EXTRACT

TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
SET Player_Name1 !EXTRACT

TAG POS=R1 TYPE=TD ATTR=CLASS:player-name EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/ \\(.+/gm, '');")
SET Player_Name2 !EXTRACT

TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-0 EXTRACT=TXT
SET Odds1 !EXTRACT

TAG POS=R1 TYPE=TD ATTR=CLASS:odds-td<SP>odds-1 EXTRACT=TXT
SET Odds2 !EXTRACT

TAG POS=R1 TYPE=TD ATTR=CLASS:h2h EXTRACT=TXT
SET H2H !EXTRACT

ADD Round !EXTRACT
ADD Event_name !EXTRACT
ADD Player_Name1 !EXTRACT
ADD Player_Name2 !EXTRACT
ADD Odds1 !EXTRACT
ADD Odds2 !EXTRACT
ADD H2H !EXTRACT

SAVEAS TYPE=EXTRACT FOLDER=* FILE=T1.csv

The output:

Code: Select all

R64	Wimbledon	P Lorenzi / A Mannarino(ITA/FRA)	S Clayton / J O'Mara(GBR/GBR)	2.4	1.55	
R64	Wimbledon	T Bellucci / R Dutra Silva(BRA/BRA)	F Martin / D Nestor(FRA/CAN)F Martin / D Nestor(FRA/CAN)13	6	1.12	
R64	Wimbledon	J Brunstrom / A Siljestrom(SWE/SWE)	V Troicki / N Zimonjic(SRB/SRB)			
R64	Wimbledon	J Erlich / T Huey(ISR/PHL)	I Dodig / M Granollers(CRO/ESP)I Dodig / M Granollers(CRO/ESP)6	3.4	1.3	
R64	Wimbledon	P Petzschner / A Peya(GER/AUT)	R Haase / D Inglot(NLD/)	2	1.77	
R64	Wimbledon	D Marcan / T Weissborn(CRO/AUT)	F Mergea / A Qureshi(ROU/PAK)F Mergea / A Qureshi(ROU/PAK)14	6.05	1.12	
R64	Wimbledon	A Behar / A Bury(URU/BLR)	M Daniell / M Demoliner(NZL/BRA)			
R64	Wimbledon	D Brown / M Zverev(GER/GER)	R Bopanna / E Roger-Vasselin(IND/FRA)R Bopanna / E Roger-Vasselin(IND/FRA)8	2.8	1.42	
R64	Wimbledon	P Raja / D Sharan(IND/IND)	K Edmund / J Sousa(GBR/PRT)	1.33	3.2	
R64	Wimbledon	C Berlocq / A Ramos-Vinolas(ARG/ESP)	J Cabal / R Farah(COL/COL)J Cabal / R Farah(COL/COL)12			
R64	Wimbledon	G Muller / S Querrey(LUX/USA)	N Mektic / F Skugor(CRO/CRO)	1.4	2.9	
R64	Wimbledon	R Jebavy / J Vesely(CZE/CZE)	J Murray / B Soares(/BRA)J Murray / B Soares(/BRA)3	8	1.07	
R64	Wimbledon	M Matkowski / M Mirnyi(POL/BLR)	C Hsieh / M Schnur(/USA)			
R64	Wimbledon	M Reid / J Smith(AUS/AUS)	F Lopez / M Lopez(ESP/ESP)F Lopez / M Lopez(ESP/ESP)11			
R64	Wimbledon	M Baghdatis / M Jaziri(CYP/TUN)	S Darcis / B Paire(BEL/)	1.75	2.05	
R64	Wimbledon	S Gonzalez / D Young(MEX/USA)	P Herbert / N Mahut(FRA/FRA)P Herbert / N Mahut(FRA/FRA)2			
R64	Wimbledon	Jiri Vesely(CZE)	Fabio Fognini(ITA)Fabio Fognini(ITA)28	2.1	1.72	H2H 0-2
R64	Wimbledon	Jerzy Janowicz(POL)	Lucas Pouille(FRA)Lucas Pouille(FRA)14	3	1.4	H2H 0-1
R64	Wimbledon	Nikoloz Basilashvili(GEO)	Sam Querrey(USA)Sam Querrey(USA)24	5	1.16	H2H 0-0
R64	Wimbledon	Ruben Bemelmans(BEL)	Daniil Medvedev(RUS)	5.5	1.14	H2H 0-0

Post by chivracq » Thu Jul 06, 2017 2:27 pm

Hum..., 3 months later indeed, never too late I guess... 8)

OK, so you are now on FF, your Thread started about iMB/IE, OK, and we finally have your FCI, even if your FF Version is still missing..., I guess it's too complicated to mention 3 Versions...!? :shock:
=> I reckon you'll probably be on FF54 + iMacros for FF v9.0.3 + Win7-x64.

Oh...!, but you missed the following part in my previous Reply...:
(If you could mention that Info for your 2 previous Threads as well, where you could also post some Update btw: either you still have the Pb, or you managed to solve those 2 Threads and you are expected to share your Sol. I won't follow up otherwise... and I won't ask again in some future Thread(s)...)
I normally don't even read Threads when FCI is not clearly mentioned (preferably at the top of the Opening Post in a Thread)... :idea:
=> Sorry, but I don't follow up... :roll:
You need to use the Forum "a bit correctly" if you want me to help you..., and that applies to all your Threads...:
- Thread 1: Cannot connect VB script to Access .accdb
- Thread 2: Re: Can't connect to Database

But hum, for this one, from a quick look at your Script: you could read a bit of Documentation (reading a few Pages of Forum Threads is good Practice as well if you want to understand a little how iMacros works, + look at the Demo-Macros...). You haven't understood how to use Variables with iMacros; there is a complete Wiki Page dedicated to Variables... (and '!EXTRACT' is a Var as well btw...). And use 'PROMPT' to follow/debug your Vars... :idea:
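For illustration, a short sketch of the variable syntax in question. It assumes the usual iMacros rule that a variable's value is only substituted when its name is wrapped in double curly braces, and that 'ADD' takes the variable to change as its first argument. 'Round' is the variable name from the Script above, 'POS=3' is just the first row used earlier in this Thread, and none of this was run against the live page:

```
SET !EXTRACT NULL
TAG POS=3 TYPE=TD ATTR=CLASS:round EXTRACT=TXT

' Without braces the text is taken literally, so the next line would store
' the string "!EXTRACT" instead of the extracted value:
'SET Round !EXTRACT

' With braces the current content of !EXTRACT is copied into Round:
SET Round {{!EXTRACT}}
PROMPT Round=_{{Round}}_

' The target comes first: this appends the value of Round to !EXTRACT.
' The reverse ('ADD Round !EXTRACT') changes Round and leaves !EXTRACT alone.
SET !EXTRACT NULL
ADD !EXTRACT {{Round}}
PROMPT Extract=_{{!EXTRACT}}_
```

Placing a 'PROMPT' after each step like this shows whether a variable holds the extracted text or only a literal name before anything is written with 'SAVEAS'.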

Post by tennisdude » Thu Jul 06, 2017 3:40 pm

Hi Chivracq,


Yes, the versions you mentioned are correct. I'll do my best to use a post template for future threads to benefit the forum.
I've used PROMPT on the extract and it shows the correct elements, so I'm not sure why it fails a few rows down in this case.
Is there another tag I could use to target the column of data I want more directly?
This script is needed to trigger another script that gathers stats after this one collects the daily matchups.
I have the other script working, but I need this one to work first.

Once I get this script working, I'll be able to concentrate on the other scripts for the DB.


Thanks,
t

Post by tennisdude » Fri Jul 07, 2017 9:40 pm

I'm a little stuck at the moment.

Can anyone point me in the right direction?