The Roe

February 10, 2004 | View Comments (15) | Category: 9rules

Summary: 9rules' latest creation: The Roe Store

Since this is the month where I try new projects, I thought I would show you my latest creation. The Roe Store (named after the Roe, arguably the smallest river in the world) is an online book store built on Amazon Web Services (AWS) that focuses on "web" books. The only page that required a significant amount of time to build was the index page, and it still needs a lot of work. All the other pages are dynamically generated through an XML feed provided by Amazon. Amazon also hosts all the images, so bandwidth is kept to a minimum. This was more of a learning project for myself than something I expected to become big time, but I do plan on building more Amazon sites in the future that leverage this technology, so it was a great exercise.
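The basic page flow is simple: read the query string, request the XML feed from Amazon, and loop over the results. Here is a rough sketch of the idea (the feed URL, query parameters, and element names below are placeholders rather than Amazon's real request format, and SimpleXML needs PHP 5):

<?php
// Rough sketch of a dynamically generated browse page backed by an XML feed.
// The endpoint and element names are placeholders; the real AWS request
// format differs and requires your own developer token.
$browseNode = (int) $_GET['Browse'];
$feedUrl    = 'http://webservices.example.com/feed.xml?BrowseNode=' . $browseNode;

$xml = simplexml_load_file($feedUrl);
if ($xml === false) {
    die('Could not load the product feed.');
}

// Amazon hosts the cover images, so we only ever link to them.
foreach ($xml->Item as $item) {
    printf(
        '<p><img src="%s" alt="" /> <a href="%s">%s</a> by %s</p>',
        htmlspecialchars((string) $item->ImageUrl),
        htmlspecialchars((string) $item->DetailPageUrl),
        htmlspecialchars((string) $item->Title),
        htmlspecialchars((string) $item->Author)
    );
}
?>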

Things I Learned

Firstly, for any affiliate site to become a mini-success it has to be in the search engines. With AWS, all book links require query strings in the URL, so you might be left with something that resembles this:

store/browse.php?Browse=34564&Sort=Featured&Page=1

This is obviously not a search-engine-friendly URL, so the chances of Google or other search engines spidering it are minimal. We certainly can't have that, so I decided to dive into my .htaccess file and use some rewrite rules. Here is one example of a rewrite rule that I am using:

RewriteRule ^browse/([0-9]+)/([a-zA-Z]+)/([0-9]+)/index.php$ browse.php?Browse=$1&Sort=$2&Page=$3

This allows me to take the nasty URL created by the feed and turn it into this:

store/browse/34564/Featured/3/index.php

Now I have URLs that the search engines should like. This also effectively gives me around 3,000 indexable pages on the site, even though none of them exist as actual files; spiders follow the links around the site as if they did. If I placed all of Amazon's categories on the site, I could effectively have millions of pages indexed by Google over time.
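Of course, the rewrite rule only helps if every internal link is emitted in the friendly format in the first place (and if mod_rewrite is enabled with RewriteEngine On in the .htaccess file). A small helper keeps the two formats in sync; this is just a sketch of the idea, not my exact code:

<?php
// Sketch of a helper that builds the search-engine-friendly URL which the
// RewriteRule above maps back onto browse.php's query string.
function browse_url($browseNode, $sort = 'Featured', $page = 1)
{
    return sprintf('browse/%d/%s/%d/index.php',
        (int) $browseNode, urlencode($sort), (int) $page);
}

// Prints "browse/34564/Featured/3/index.php"
echo browse_url(34564, 'Featured', 3);
?>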

I have only scratched the surface of the features offered through AWS. As of right now I only have browse categories, search, and individual book pages enabled on the site. Over time I will be adding more once I get the hang of things, and the Roe Store will grow. This also gives the audience another way to contribute to this site.

On a side note, this will be the last gray site I do for some time. Gotta break out of that habit. The site is still in its early stages, though, so the colors may change.

Trackback URL: http://9rules.com/cgi-bin/mt/mt-tb.cgi/143

Comments

#1

Are there really any search engines of note that can't handle normal Amazon URLs? Google indexes Amazon quite nicely. AltaVista and the other fallen captains of the search industry were fine with them before they were crushed by Google. Is AllTheWeb OK with them? That's the only other one I can think of offhand that has a shred of market share, with Google powering Yahoo and AOL.

They're more 'people-friendly' than 'search-engine-friendly', really. But it's not terribly relevant. Looks decent, and I'm sure you picked up a lot of stuff on web services while doing it.

Now try one with weather.com using their XML feed to build a site with XSLT, and implement caching as they require. :-)

JC (http://thelionsweb.com/weblog)

#2

Nice little site there, Scrivs. I remember getting pretty excited about XML and web services at least a year ago (maybe longer), but I never did play with it much beyond my initial fiddling.

Also, not to go completely off topic, but where exactly is Weather.com's XML feed? All I have found is their stupid "add weather to your site" program. Do you have to sign up for that first, then hack the URL out of there?

Paul G (http://www.relativelyabsolute.com)

#3

I forgot to mention that I use caching as well, because Amazon limits AWS to roughly one request per second.
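Something along these lines is enough to stay under that limit (a stripped-down sketch of the idea, not the exact code the store uses):

<?php
// Sketch of file-based caching for AWS responses so the site never hits
// Amazon more often than necessary. The cache directory and one-hour
// lifetime are arbitrary choices for illustration.
function cached_aws_request($url, $cacheDir = './cache', $maxAge = 3600)
{
    $cacheFile = $cacheDir . '/' . md5($url) . '.xml';

    // Serve from cache while a fresh copy exists.
    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile);
    }

    // Otherwise hit Amazon once and store the raw XML for next time.
    $xml = file_get_contents($url);
    if ($xml !== false) {
        file_put_contents($cacheFile, $xml); // file_put_contents needs PHP 5
    }
    return $xml;
}
?>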

Scrivs (http://www.9rules.com/whitespace/)

#4

Cool. I too downloaded the AWS code some time ago, but have yet to do anything with it.

One suggestion that might be cool to implement on your site would be to somehow incorporate your own comments above the others.

Example:

Scrivs says:...

Other comments include:...

That way you personalize it a bit more.

Mark Fusco (http://www.lightpierce.com/ltshdw)

#5

It's not advertised so far as I can tell, but if you look at the source code or try to use something like a CF custom tag that's supposed to parse weather.com to grab weather info, you'll find this in a comment tag:

Want The Weather Channel data? Sign up for the Weather XML Data Feed online at http://www.weather.com/services/xmloap.html

JC (http://thelionsweb.com/weblog)

#6

Or incorporate your comments in a similar fashion to how Look and Feel Media did with their recommended book library.

Mark Fusco (http://www.lightpierce.com/ltshdw)

#7

That's a pretty good idea and shouldn't be that hard to implement. Looks like I am going back a little bit into programming, which is kind of a hassle for me since I am a perfectionist when it comes to my code. Always trying to refactor the damn stuff.

Scrivs (http://9rules.com/whitespace/)

#8

Yeh. I'm diving head first into getting more serious on the coding end of things. So serious, in fact, that I purchased Visual Studio .NET a couple of weeks ago.

Massive.

Mark Fusco (http://www.lightpierce.com/ltshdw)

#9

You should make sure HTML is escaped in the output.

For example, Zeldman's book features a comment that includes a [blink] tag, which ends up looking somewhat odd: "...dozen or so tags in its beginnings to the tag and others that..." with the last half of the sentence blinking...

;)

levin (http://levin.grundeis.net/)

#10

Yep, that's the problem with pulling data from an external source. Unless you make the effort to scrub it, you'll almost invariably get that one strange piece of bad data that completely hoses everything (or at the very least looks ugly). Possibly my least favorite programming job, but fulfilling when you get it right.

P.S. JC: Thanks for the link.

Paul G (http://relativelyabsolute.com/)

#11

htmlspecialchars() is your very good friend. Or a regex that just strips out anything inside HTML tags, I suppose.

And sure thing on the link. Have fun. It works quite well once it works at all. They don't provide any sample code, just a PDF with some very basic instruction, some of which doesn't work (nothing terribly important). If I hadn't had WebSphere Studio (which has an excellent XML/XSL debugger), I'd have been SOL.

JC (http://thelionsweb.com/weblog)

#12

What about this function? I know it's not that well known, but it's definitely very useful.

strip_tags
(PHP 3 >= 3.0.8, PHP 4)

strip_tags -- Strip HTML and PHP tags from a string

string strip_tags ( string str [, string allowable_tags] )
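A quick sketch of how it compares to htmlspecialchars() on a description with a stray blink tag in it (the example text is made up):

<?php
// Made-up example: a book description that came through the feed with a blink tag in it.
$description = 'A dozen or so tags in its beginnings to the <blink>blink</blink> tag and others.';

// htmlspecialchars() keeps the tag, but renders it harmlessly as visible text.
echo htmlspecialchars($description);
// ...to the &lt;blink&gt;blink&lt;/blink&gt; tag and others.

// strip_tags() removes the markup entirely.
echo strip_tags($description);
// ...to the blink tag and others.
?>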

dusoft (http://www.ambience.sk)

#13

Oooh, forgot that one. That's probably even better here.
That's why I like PHP so much... every time I find myself contemplating some complicated task, especially for string manipulation, it turns out there's a function that does it for me. And the reverse is my primary complaint with ColdFusion as well.

JC (http://thelionsweb.com/weblog)

#14

I will definitely look into implementing these later today. Thanks for the tips guys.

Scrivs (http://www.9rules.com/whitespace/)

#15

I can't remember exactly what the concern was, but I've read that strip_tags() is less than perfect. I'm working on a small program that needs to translate HTML into a custom-formatted plain-text version, and I ended up using preg_replace() with a fairly large array of regex replacements.
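Stripped way down, the idea looks something like this (the patterns here are just a toy subset for illustration, not my actual list):

<?php
// Toy subset of an HTML-to-plain-text conversion done as an ordered
// array of regex replacements.
$html = '<p>First paragraph<br />with a break.</p><ul><li>one</li><li>two</li></ul>';

$patterns = array(
    '/<br\s*\/?>/i' => "\n",   // line breaks
    '/<\/p>/i'      => "\n\n", // paragraph breaks
    '/<li[^>]*>/i'  => '* ',   // open list items as bullets
    '/<\/li>/i'     => "\n",   // close list items with a newline
    '/<[^>]+>/'     => ''      // drop any remaining tags last
);

$text = preg_replace(array_keys($patterns), array_values($patterns), $html);
echo trim(html_entity_decode($text));
?>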

There's some more about weather feeds over at the scientific flower garden, and in the comments:
http://dealmeida.net/blosxom/en/Programming/a_garden_of_cellular_automata

Justin (http://bluealpha.com)

Keep track of comments to all entries with the Comments Feed