Sunday, 24 April 2011

Amazon SimpleDB Developer Guide - unofficial errata etc


Updated 28 April 2011: my review of this book has now been published on Slashdot. They edited it down. Here's the complete review as submitted, complete with links to Amazon's current free-tier offer, and cloud computing cartoons!

These are my notes of errata, typos, queries/issues, and hoped-for improvements to the 2010 Packt book "Amazon SimpleDB Developer Guide" by Prabhakar Chaganti and Rich Helms - aka, all the mistakes I made or spotted, so you don’t have to!

I’ve no comments on the PHP code as I only tried the Java and Python, using Windows. Forgive the ugly "pre" blocks for some of the code, but that was the only way I could stop WordPress from turning normal quotes into the dreaded "smart" curly quotes that prevent code from running.

Page by page


p 5 - link at the bottom is wrong – extra slash, link doesn’t work in the PDF.

pp 25-26 - needs “keep with next” for pics and captions.

p 27 - link at the top doesn't work.

p 28 and throughout - they should have put "awsAccessId, awsSecretKey" in a different font to make it really obvious that you insert your actual keys there rather than, eg, thinking there'd be a prompt to enter your keys when you run the code. Going further, the book should have made it crystal clear that you need quotes around the keys - they're strings.

pp 28-31 – no typica imports were given. The book should provide them at least once; that would help a lot, especially given that this is a "getting started" book, because Eclipse suggests several options and it's not clear which is the correct one. From Chapter 2 onwards the minimum imports needed (some examples need more) are generally:
import com.xerox.amazonws.sdb.Domain;
import com.xerox.amazonws.sdb.SDBException;
import com.xerox.amazonws.sdb.SimpleDB;
(alternatively, the easiest if laziest solution is to import com.xerox.amazonws.sdb.*; )

Cf Chapters 9 and 10, eg p 194, which do give all the code, complete with all imports and even “main” – why the inconsistency? It would be easier for readers if the full code were provided in the early chapters. Contrast with this SimpleDB typica tutorial, which gives all imports (and makes it crystal clear that the keys go in as strings).

There are also inconsistencies in the Python code, eg p 211 gives the preliminary code to import boto and set up the connection etc, whereas some earlier chapters leave that out. All the Python code should be similarly complete, for the convenience of those readers who (as seems most likely) try different chapters at different times: don't assume readers will work through the whole book in a single sitting. In contrast, the Amazon Web Services toolkit for Eclipse took seconds to install, a few more seconds to enter my credentials, and the SimpleDB sample code given ran immediately.

p 38 – this chapter should explain installation for Windows too, ie open a command window in the boto-[whatever] folder, then run python setup.py install. Add environment variables for your keys as user variables in the normal way (eg through Computer Properties -> Advanced System Settings -> Advanced -> Environment Variables). This is a strange omission as it’s in an IBM developerWorks tutorial on SimpleDB/Python/boto written by one of this book's authors.
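A quick way to confirm that the variables are actually visible to Python is sketched below. It assumes the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the ones boto looks for); the helper name missing_aws_vars is my own and not from the book.

```python
import os

# The two environment variable names boto looks for when no keys are
# passed explicitly to the connection call.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(environ=None):
    """Return the names of the required variables that are unset or empty.

    Hypothetical helper for illustration; pass a dict to test it without
    touching the real environment.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("Both AWS key variables are visible to Python.")
```

On Windows, remember to open a new command window after adding the variables, since windows that were already open won't see them.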

p 40 – there should be a True at the bottom of the page for the output you get after creating a new item. Similarly with the top of p 41.

p 42 - the last one:
sdb_connection.get_attributes('prabhakar-dom-1',car1')

should be:
sdb_connection.get_attributes('prabhakar-dom-1','car1')

- ie there's a missing open quote.

p 59 – I don’t get Domain:Cars as the output in the penultimate line. Also, the code for creating the domain has a double underscore in the name cars__domain – but it needs to be single underscore ie cars_domain, or else copy/pasting the subsequent code (which uses cars_domain) won’t work.

p 60 – needs a space after the import ie it's import SPACE inspect. Also, pp.pprint(inspect.getmembers(cars_domain, inspect.ismethod)) won’t work because the name has a single underscore here, see p 59. And so on.
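For anyone unsure what that call is meant to do, here is a minimal sketch that runs without boto or AWS keys. FakeDomain is a hypothetical stand-in I made up purely for illustration; with the book's code you would pass the real domain object instead, under whichever name (single or double underscore) you created it with.

```python
import inspect
import pprint


class FakeDomain(object):
    """Hypothetical stand-in for a boto domain object, for illustration only."""

    def get_item(self, name):
        return None

    def put_attributes(self, name, attrs):
        return True


# Use exactly the same variable name here as when the object was created.
cars_domain = FakeDomain()

pp = pprint.PrettyPrinter(indent=2)
# getmembers returns (name, method) pairs; keep just the names for display.
method_names = [name for name, _ in inspect.getmembers(cars_domain, inspect.ismethod)]
pp.pprint(method_names)
```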

p 63 - The line
Domain domain = sdb.getDomain("songs");

should be
Domain domain = sdb.getDomain("Cars");

p 64 – pasting the Python code shown here won’t work unless the double underscore on p 59 is fixed, or you use a double underscore here instead, ie cars__domain

p 70 – “It makes sure you call save() to actually persist your additions to SimpleDB” is misleading and gives the impression that add_value automatically includes a save() - cf p.71, which reads (correctly) “You must once again call save() in order to persist the changes.” The p 70 sentence should read something like “After calling add_values, make sure that you also call save()...”

p 75 - cars __domain should be (see p 59) cars_domain
Missing code (this should be line 4):
myitem2 = cars_domain.get_item('Car 2')

p 78 – I get
u'dealer'

in the results of running the code, not
'dealer'

p 88 – Java code gives the body of the method provided for zeropadding; but readers may be more interested in the use of the method, eg
String encoded = DataUtils.encodeZeroPadding(234, 7);
or
int decoded = DataUtils.decodeZeroPaddingInt("0000234");
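The reason for the padding is that SimpleDB compares values as strings, so numbers only sort correctly if they all have the same width. Here is a rough Python sketch of the same idea; the function names are mine (they are not boto or typica calls) and it only handles non-negative integers.

```python
def encode_zero_padding(number, max_num_digits):
    """Left-pad a non-negative integer with zeros to a fixed width."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return str(number).zfill(max_num_digits)


def decode_zero_padding(value):
    """Turn a zero-padded string back into an int."""
    return int(value)


prices = [234, 25000, 9]
encoded = [encode_zero_padding(p, 7) for p in prices]

# Sorting the padded strings gives the same order as sorting the numbers.
assert sorted(encoded) == [encode_zero_padding(p, 7) for p in sorted(prices)]
```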

p 93 – again it would be more useful if rather than providing the method body this page provided code showing its use, like: Date aDate = new Date(); String encodedDate = DataUtils.encodeDate(aDate); System.out.println(encodedDate); - and similarly with the decodeDate() method.
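The same "make it sort as a string" idea applies to dates. The sketch below is in Python rather than Java, and the format string is my own assumption for illustration: it is not necessarily the format typica's DataUtils produces.

```python
from datetime import datetime, timezone

# Fixed-width, most-significant-field-first format (an assumption for this
# sketch, not necessarily what typica's DataUtils.encodeDate emits).
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def encode_date(moment):
    """Encode a timezone-aware datetime as a UTC string that sorts chronologically."""
    return moment.astimezone(timezone.utc).strftime(DATE_FORMAT)


def decode_date(text):
    """Parse a string produced by encode_date back into a UTC datetime."""
    return datetime.strptime(text, DATE_FORMAT).replace(tzinfo=timezone.utc)


earlier = datetime(2010, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
later = datetime(2011, 4, 24, 9, 30, 0, tzinfo=timezone.utc)

# String comparison agrees with chronological order.
assert encode_date(earlier) < encode_date(later)
```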

p 111 – the p 116 info on quoting should be given here, and in the main body of the text rather than a side “warning” - I personally find those warnings easily missed, possibly because they’re in a smaller font. Using the backtick ` (above the Tab key) instead of a single quote ‘ isn’t obvious, especially to someone typing out the code instead of copy/pasting, so it merits major highlighting. The Amazon guide is much clearer on when ` must be used. To emphasise, in SELECT queries you must use ` around the domain name if the name contains, eg, a hyphen or underscore, or else it won't work. (And you're not "escaping" here, you're quoting with a backtick.)
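If you build SELECT strings in code, one way to avoid the problem is to always backtick-quote names. This is only a sketch: quote_name and quote_value are hypothetical helpers of mine, not boto functions. They follow the quoting rules in Amazon's guide as I understand them (names in backticks, string values in single quotes, with an embedded quote character doubled).

```python
def quote_name(name):
    """Wrap a domain or attribute name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    """Wrap a string value in single quotes, doubling any single quote inside it."""
    return "'" + value.replace("'", "''") + "'"


# Domain and attribute names taken from the examples above in these notes.
query = "select * from {0} where {1} = {2}".format(
    quote_name("prabhakar-dom-1"),
    quote_name("Song"),
    quote_value("You're a Strange Animal"),
)
print(query)
```

Quoting every name is more than is strictly needed, but it means you never have to remember which characters trigger the requirement.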

p 115 – there's info missing about You’re a Strange Animal, whereas info about that item was added in p 108 and 110 - the example should be carried through in full ie:
>> 1045845425 {u'Genre': u'Rock', u'Rating': u'****', u'Song': u"You're a Strange Animal", u'Artist': u'Gowan', u'Year': u'1985'}

(cf pp 122, 124, 125, 129, 130, 131, 132, 134 which are consistent on that front).

p 136 – getAttributes in Java - this code won’t run, and I can’t find the getItemsAttributes() method in http://typica.s3.amazonaws.com/com/xerox/amazonws/sdb/Domain.html

p 146 – the download link for JetS3t is now http://jets3t.s3.amazonaws.com/downloads.html. And the info here is incomplete – “Add the jets3t-0.7.2.jar to your classpath” is not good enough. You also have to add commons-httpclient-*.jar (in the jets3t libs directory) to the classpath, or else it won’t work.

By the way, this isn't mentioned in the book but, when testing stuff on S3, a good way to check the results of running the code is to use JetS3t Cockpit (run the script in the JetS3t bin directory eg cockpit.bat if you're on Windows). And if you try the book's examples, you might want to use a different bucket name from packt_songs, or, alternatively, don't forget to delete that bucket when you're through. Bucket names are unique throughout the whole of AWS, not just to your account, so if you don't delete it, no other readers will be able to use the same bucket name.

p 148 – “We will use a MD5 hash that is generated from the name of the song, name of the artist, and year.” – but, the code given doesn’t in fact use the year.
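For comparison, a key that really does include the year might look something like the sketch below. This is not the book's code: the song_key name, the "|" separator and the UTF-8 encoding are all my own assumptions.

```python
import hashlib


def song_key(song, artist, year):
    """Build an MD5 hex digest from song name, artist and year.

    Hypothetical helper; the separator and encoding are illustrative choices.
    """
    joined = "|".join([song, artist, year])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


key_1985 = song_key("You're a Strange Animal", "Gowan", "1985")
key_1986 = song_key("You're a Strange Animal", "Gowan", "1986")

# Including the year means a re-release in another year gets its own key.
assert key_1985 != key_1986
```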

p 149 – why is the line with user key details commented out?

pp 149-151 – isn't there more efficient “for” code to do this, like the Python version on p 152, instead of going through each item individually?

p 151 – code is missing for “You’re a Strange Animal”.

p 154 – “/songs_folder” is used here, cf "/Users/prabhakar/Documents/SimpleDB Book/songs/” on p 160 – another inconsistency. More importantly, the code doesn’t run unless the mime.types file from the jets3t configs folder is added to the classpath (I copied it to a lib folder in my Eclipse SimpleDB project, then added that lib folder to the project’s build path as a Class folder in the project’s properties). Also, this code doesn’t allocate keys for the uploaded files using the relevant data from SimpleDB; the keys here are just the filenames. Either the book should provide code that uses SimpleDB data as the keys (as the Python code on p 159 does), or else it should explain clearly to readers that this can’t be done in Java.

p 160 – songs.select should be songs_domain.select in order to work with the previously-given code. Also, it wouldn't hurt to remind Windows users to escape the backslashes in the file/folder path, eg C:\\path to\\songs\\%s.mp3
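Two ways of writing a Windows path safely in Python are sketched here. The folder is just the example path from this note, and local_mp3_path is a hypothetical helper of mine, not something from the book. I've used ntpath (the Windows flavour of os.path) so the snippet behaves the same on any platform.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

songs_dir = r"C:\path to\songs"   # raw string: backslashes kept exactly as typed
same_dir = "C:\\path to\\songs"   # ordinary string: each backslash doubled

# Both spellings produce the identical string.
assert songs_dir == same_dir


def local_mp3_path(folder, key):
    """Build the local file name for a song key (hypothetical helper)."""
    return ntpath.join(folder, "%s.mp3" % key)


print(local_mp3_path(songs_dir, "1045845425"))
```

Without one of these, a folder such as C:\temp quietly turns into "C:", a tab character, then "emp".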

p 161 – why not use more efficient code with a “for” loop? In any event, this code wouldn’t run: first, “The method downloadObjects(S3Bucket, DownloadPackage[]) in the type S3ServiceSimpleMulti is not applicable for the arguments (S3Bucket, DownloadPackage[])”, then on casting downloadPackages to DownloadPackage[] and trying to run it: “Unable to determine S3 Object key name from signed URL: null”. There were also warnings of deprecated methods/types. Also, it’s not clear which local location the files get downloaded to, cf p 164 for Python which makes clear what the specified download directory will be. The info on http://www.ibm.com/developerworks/library/ar-cloudaws2/ with the comments in the code is clearer as to which code is meant to do what, and it would have helped if the code in the book had been similarly commented.

p 164 – see p 160 comment on “songs_domain” – occurs twice.

p 172 – “This sample will print the following values to the console:” – not exactly, the requestID will of course vary with the user.

pp 186-188 – memcached is also available for Windows http://www.splinedancer.com/memcached-win32/ - installation instructions are on that page, your directory structure may vary of course.

p 189 – why so specific on “Copy the JAR file named java_memcached-release_2.5.0.jar to a folder that is on your classpath.”? Why not just say, add it to your classpath? (adding it as an external jar also works, for instance). This page should also include instructions for memcached on Windows, as p 38 sort of does - ie download the python-memcached library, extract the files, run cmd, cd to the folder, use “python setup.py install”; start the server with the command “c:\path\to\memcached.exe -d start”.

p 190 – it can't be a bad idea to remind readers to start the memcached server running first, here.

p 192 – “mc = memcache.Client(['127.0.0.1:12312'])” – why is the port said to be 12312 here? Cf p 190 where it’s port 11211 for the Java. Only 11211 works for me, at least when using memcached for Windows with Python.
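If you're not sure which port your memcached server is listening on, a plain socket check settles it without needing the memcache library at all. This is a rough sketch; port_is_open is a hypothetical helper of mine, and it only tells you that something accepts connections on the port, not that it is memcached.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
    connection.close()
    return True


if __name__ == "__main__":
    # 11211 is memcached's default port; 12312 is the one printed on p 192.
    for port in (11211, 12312):
        print(port, port_is_open("127.0.0.1", port))
```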

pp 194-196 – the Java code didn’t work for me, it still keeps retrieving the data afresh from SimpleDB – even though the Java test on pp 190-191 showed that the memcached server is working fine, and the caching certainly works in Python (p 202).

p 201 – the code starting at the bottom of the page should be saved into a file called sdb_memcache.py – a big omission. Newbies – best to save the py files to the same folder as your Python installation eg the Lib subfolder; and NB you have to fix the indents if you copy/paste.

p 202 – if using memcached for Windows, it won’t work unless you use port number 11211 ie: sdb_mc = SDBMemcache("127.0.0.1","11211")

p 205 - "In this chapter, we will explore how to run parallel operations against SimpleDB using boto." - but it's not just using boto. The page number's missing from this page.

p 213 - "Here is a simple Python script that updates items by making three different calls to SimpleDB, but in a serial fashion, that is one call after another." - but, no script was actually given?? And why not give the code for "Running this through time"?
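To be clear, the following is not the book's missing script, just a sketch of the general serial-versus-parallel comparison the chapter is driving at. fake_update is a hypothetical stand-in that merely sleeps instead of calling SimpleDB, the item names are made up, and I've used concurrent.futures from the current Python standard library, which may well not be what the book uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_update(item_name):
    """Hypothetical stand-in for one SimpleDB call; it only sleeps briefly."""
    time.sleep(0.1)
    return item_name + " updated"


def run_serial(item_names):
    """Make the calls one after another."""
    return [fake_update(name) for name in item_names]


def run_parallel(item_names):
    """Make the calls at the same time, one thread per item."""
    with ThreadPoolExecutor(max_workers=len(item_names)) as pool:
        return list(pool.map(fake_update, item_names))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


if __name__ == "__main__":
    items = ["item1", "item2", "item3"]
    for runner in (run_serial, run_parallel):
        _, seconds = timed(runner, items)
        print(runner.__name__, round(seconds, 2))
```

Timing inside the script like this also gets round the lack of a "time" command on Windows.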

pp 213-216 - it would have been more helpful to give the explanations as comments against the relevant parts of the code, so that it's clear which bit of the code does what. That's a general point about the earlier Java code in this book too.

p 221 - to install eggs you have to first install easy_install. Although I'd already installed setuptools, I still had to download ez_setup.py for this to work.
