Amazon Order History--can it be scraped?

Discussions and Tech Support related to the iMacros Firefox Add-on.

by macropolis on Tue Sep 09, 2008 2:35 pm

Hi, I'm reasonably skilled at VBA and have studied iMacros and have used it for an iterative purpose, but I'm unclear how to get started on this:

I want to end up with a table of my Amazon.com purchases; from there, I'll filter and sort to create reports on purchases I've made for an advocacy (read: volunteer) project. I've made probably 150 or so purchases over the past few years. I think that such a report is available for "commercial" type accounts. Mine isn't such an account, it's simply a regular consumer account. I've asked customer service a couple of times, and have been told that there's no download available, at least for regular accounts.

After logging into Amazon, I can pull up order history information, but there are various challenges. These include:

There is no "all orders" capability. Using a drop down, I can choose from:
Open and recently placed orders
Orders placed in the past 6 months
Orders placed in 2008
Orders placed in 2007
Orders placed in 2006
(etc., until...)
Orders placed in 1998

A comprehensive set presumably would be to use "Orders placed in 2008" through "Orders placed in 1998"

This project started in 2003, so I "only" need 2003 through 2008.

Another challenge is that the orders for a given year don't necessarily appear on a single page. For example, for 2008, the bottom of the page says:
"More Orders: 1 | 2 | 3"
And in that sentence, "2" is a link, and "3" is a link. ("1" is not a link, presumably because it presently is being displayed.)

Each page is a report of "orders". Let's call this an "Orders Report Page" and the entire set of pages for a given year the "Orders report".

The number of orders on an Orders Report Page varies, because a given order can have more than one item.

In the Orders Report Page itself, no price is shown. Additionally, the amount of info about the title and author varies; it appears, for example, that if there's a long title, the author info isn't shown.

For each order, there is a "View Order" button. Clicking it brings one to what I'll call the Order Report.

As mentioned, an order can have more than one item.

Within the Order Report, each item is listed. I'm unsure whether long orders end up having multiple pages.

Within the Order Report, items with long titles have only a truncated title. The title text is itself a link to the item's "regular" page (containing a picture, reviews, links to ordering from 3rd party sellers, links to other editions, etc.).

Within the Order Report, each item's author is shown--um, that is, if there is an author. It appears that if an item isn't a book or the like, then there is no "By: " and no author listed. Multiple authors might be listed.


My goal is to end up with the following fields:

Title
Author(s)
Purchase date
Purchase price
(ideally) published date
(ideally) shipping cost. (Note: for part of the period, I had Amazon's "Prime" service, under which I paid an annual fee and got two-day shipping at no charge on items I bought from Amazon itself rather than from 3rd parties.)
(ideally) link to the item's regular page

Nice-to-haves would include shipping costs. Oh, and the editorial reviews, and the top-rated reader-written reviews.
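The field list above can be written down as a record type before any scraping is attempted. Here is a minimal Python sketch under stated assumptions: the class name `PurchasedItem`, its field names, and the helper `items_to_csv` are hypothetical, and prices are kept as text on the assumption that they will be cleaned up later in a spreadsheet.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class PurchasedItem:
    # Hypothetical field names mirroring the list in the post.
    title: str
    authors: str                          # several authors joined into one string; "" if none
    purchase_date: str                    # e.g. "2008-09-09"
    purchase_price: str                   # kept as text, e.g. "25.99"
    published_date: Optional[str] = None  # the "ideally" fields may be missing
    shipping_cost: Optional[str] = None
    item_url: Optional[str] = None


def items_to_csv(items: List[PurchasedItem]) -> str:
    """Serialize the items to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(PurchasedItem)],
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        # Missing optional values (None) are written as empty cells.
        writer.writerow(asdict(item))
    return buffer.getvalue()
```

The resulting CSV text can be saved to a file and opened in Excel for the filtering and sorting described above.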

This all seems so complicated that I thought I'd start with an inquiry post, hoping either that someone already has this running or that I'd get feedback on whether it would be faster simply to "scrape" this info manually than to build an iMacro.

This is my first post here. I'd welcome any feedback (including negative, of course).

Thanks!
macropolis
 

Re: Amazon Order History--can it be scraped?

by craterbaiter on Sun Nov 09, 2008 9:35 am

Hi,

I have exactly the same requirement, with Amazon accounts in the USA and UK going back to 1999.

Copy-pasting into Excel is a tedious process.

Has there been any more on the subject?

I have also asked Amazon a few times, and the responses have been long-winded but unhelpful.

Your advice would be appreciated.
craterbaiter
 

Re: Amazon Order History--can it be scraped?

by Tech Support on Fri Nov 14, 2008 1:47 am

iMacros can be used to web-scrape Amazon.com. Creating an extraction macro is quite simple; for example, this macro extracts the price:

[Attachment: amazon extract.png]

Code:
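' Open the product page
URL GOTO=http://www.amazon.com/Testing-Applications-Web-Planning-Internet-Based/dp/0471201006/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1226653950&sr=8-1
' Anchor on the first table cell whose text is "Price:"
TAG POS=1 TYPE=TD ATTR=TXT:Price:
' Extract the text of the first B element after that anchor (relative position R1)
TAG POS=R1 TYPE=B ATTR=TXT:* EXTRACT=TXT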


If your task is more complicated, e.g. if you have many if/then conditions, you will want to use the iMacros Scripting Edition and control iMacros from VBS or any other scripting language. You can also start iMacros from within Excel using VBA.

With scripting you can also automatically click the "Next" links (or 1,2,3... links). Here is a tutorial about this: http://wiki.imacros.net/VBS_looping
Tech Support
 

Re: Amazon Order History--can it be scraped?

by Gadrin on Fri Nov 14, 2008 7:45 am

Except that's not the order history.

The order history requires a login to your account, then you need to find the correct year, and then it's a multi-page lookup.
There's a summary page first with all your orders (it too may be multiple pages).

Then you have to go into each order, grab the order # and then do a Detail Lookup of that order (that's from the original poster).
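The control flow described above (pick a year, walk its summary pages, gather the order numbers for the later detail lookups) can be sketched independently of any particular tool. This is a minimal Python sketch, assuming the page fetching itself is supplied by the caller; `collect_order_numbers`, `fetch_summary_page`, and the `max_pages` safety cap are hypothetical names, and nothing here talks to Amazon.

```python
from typing import Callable, Iterable, List, Tuple

# What the caller's fetcher returns for one (year, page number) pair:
# the order numbers found on that page, and whether a further page exists.
SummaryPage = Tuple[List[str], bool]


def collect_order_numbers(
    years: Iterable[int],
    fetch_summary_page: Callable[[int, int], SummaryPage],
    max_pages: int = 100,
) -> List[str]:
    """Walk every summary page of every year and gather unique order numbers."""
    seen = set()
    collected: List[str] = []
    for year in years:
        # max_pages guards against a fetcher that never reports the last page.
        for page in range(1, max_pages + 1):
            order_numbers, has_more = fetch_summary_page(year, page)
            for number in order_numbers:
                if number not in seen:
                    seen.add(number)
                    collected.append(number)
            if not has_more:
                break
    return collected
```

Keeping the fetcher as a parameter means the same loop could be driven by an iMacros script, a browser object model, or a plain HTTP client, and it can be exercised with canned pages before touching a real account.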


Thanks, Gadrin
Gadrin
 

Re: Amazon Order History--can it be scraped?

by Tech Support on Fri Nov 14, 2008 4:28 pm

Gadrin wrote:
"Except that's not the order history."

Sure, that is only an example. But extracting the order history works exactly the same way.

Gadrin wrote:
"then it's a multi-page lookup.
There's a summary page first with all your orders (it too may be multiple pages)."

I mentioned this in my post: With scripting you can also automatically click the "Next" links (or 1,2,3... links). Here is a tutorial about this: http://wiki.imacros.net/VBS_looping

I estimate that it takes between 3 and 5 hours to create this setup (including testing).

If you prefer a turnkey solution, we also offer a custom macro and script creation service. The pricing is based on the complexity of the project (which is 3-5 hours in this case). If you are interested in this option, please contact us at http://www.iopus.com/service/consulting/ and we will send you a quote with a fixed price in a few days.
Tech Support
 

Re: Amazon Order History--can it be scraped?

by Gadrin on Fri Nov 14, 2008 7:34 pm

Well, I'm not trying to butt heads with you guys :lol:

I wrote the MSIE DOM script in VBS while I was at the hospital, and it took me about 20 minutes. Basically, you point MSIE at the page after logging in and selecting the one you want, and it inventories the summaries for each order, pulls the Order# off each, and stores them in an array.

Next I'll have to configure the script to .navigate via the MSIE object model to each URL that contains the Order# and download those pages (or process them and store them, haven't decided yet).
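The "pull the Order# off each summary and store them in an array" step can be illustrated outside the browser as well. The following is a rough Python sketch rather than the VBS script described above; it assumes order numbers appear in the page text as three hyphen-separated digit groups (such as 102-1234567-7654321), which should be checked against a real summary page, and `extract_order_numbers` is a hypothetical helper name.

```python
import re
from typing import List

# Assumed format: 3 digits, 7 digits, 7 digits, separated by hyphens.
ORDER_NUMBER_RE = re.compile(r"\b\d{3}-\d{7}-\d{7}\b")


def extract_order_numbers(page_html: str) -> List[str]:
    """Return the unique order numbers in the order they first appear."""
    found: List[str] = []
    for match in ORDER_NUMBER_RE.finditer(page_html):
        number = match.group(0)
        # The same number often appears twice (link target and visible text).
        if number not in found:
            found.append(number)
    return found
```

A regular expression is used here only because the order number has a distinctive shape; walking the DOM, as the MSIE script does, is the sturdier choice if the surrounding markup matters.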

Might be about 2 hours total or so with testing, so comparable to your tool. I'm not sure whether I can do it all behind the scenes without using MSIE at all. I'm not sure about the FORM postings for logins and such. It might be possible using WinINet or HTTrack, which can do logins. I don't think Wget can handle it. AutoHotKey is another possibility. Whew!

I just saw that Firebug (an add-on for Firefox) has its own add-on called Firespider. It can grab the links on a page and then use XMLHTTP requests to tell you the status of the links. We're trying to see if the guy who wrote it is interested in doing things like exporting the results and/or the URLs to a CSV or text file.

It'd be very cool to, say, go into Firebug, pick a TAG or a group of tags, and have Firespider go download them or grab their SRC or HREFs.

Anyway, good luck.

Thanks, Gadrin
Gadrin
 

Re: Amazon Order History--can it be scraped?

by koffee on Tue Mar 03, 2009 2:10 pm

I just installed the Firefox iMacros plugin and it's pretty neat, but for this particular problem I would use the Amazon Associates Web Service to acquire most of the information needed rather than bother scraping all those pages from the website.

Page 272 of the Amazon Associates Web Service Developer Guide (API Version 2009-01-06) on the TransactionLookup operation states:

On the retail web site, the Transaction ID is called the Order Number. To find one, point your browser at http://www.amazon.com. Click on the following links: Your Account>Where's My Stuff? > Open and recently shipped orders. If you have not purchased anything recently, you can use the See More dropdown list to select, for example, Orders placed in an entire year. On the page that lists the transactions, use the Order Number for the Transaction ID.

So unfortunately there doesn't seem to be a way to get a list of all your purchases via the web service. Therefore I'd use Python and the mechanize module to log into an Amazon account on the website and navigate through the appropriate transaction year pages, collecting all the relevant TransactionIDs.

Given a TransactionID, use the TransactionLookup operation, via the Zolera SOAP Infrastructure (ZSI) module or a REST request (parsed with BeautifulSoup's BeautifulStoneSoup class or with ElementTree), to get the numerous transaction details you are interested in.
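To give a feel for the ElementTree half of that suggestion, here is a minimal Python sketch that pulls a few values out of an XML reply. The sample document below is hypothetical: its element names and nesting are assumptions made for illustration and should be replaced with the structure documented in the Developer Guide. Real responses may also carry an XML namespace, which this sketch does not handle. `parse_transaction` is likewise a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical sample reply; the real schema must be taken from the Developer Guide.
SAMPLE_RESPONSE = """\
<TransactionLookupResponse>
  <Transactions>
    <Transaction>
      <TransactionId>102-1234567-7654321</TransactionId>
      <TransactionDate>2008-09-09T14:35:00</TransactionDate>
      <Totals>
        <Total><Amount>2599</Amount><CurrencyCode>USD</CurrencyCode></Total>
        <ShippingCharge><Amount>399</Amount><CurrencyCode>USD</CurrencyCode></ShippingCharge>
      </Totals>
    </Transaction>
  </Transactions>
</TransactionLookupResponse>
"""


def parse_transaction(xml_text: str) -> Dict[str, Optional[str]]:
    """Extract a few fields from the first Transaction element, if there is one."""
    root = ET.fromstring(xml_text)
    transaction = root.find("./Transactions/Transaction")
    if transaction is None:
        return {}
    # findtext returns None for any element that is absent.
    return {
        "transaction_id": transaction.findtext("TransactionId"),
        "date": transaction.findtext("TransactionDate"),
        "total": transaction.findtext("Totals/Total/Amount"),
        "shipping": transaction.findtext("Totals/ShippingCharge/Amount"),
    }
```

Calling `parse_transaction(SAMPLE_RESPONSE)` yields a small dictionary per order, which can then be merged with whatever was collected from the order-history pages.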

It'd be interesting to see a comparison of implementation time, efficiency, accuracy, and maintenance costs for a custom Python solution vs. iMacros.
koffee
 

