s3 – Software is too expensive to build cheaply….

I had all sorts of problems getting my head around how cache files work with Hadoop. Finally, I stumbled across the answer: when you add a cache file to the job (see HadoopMain#48), Hadoop copies it out to the task nodes, and the mapper can then read it as an ordinary local file (MyMapper#36).

When running in Elastic MapReduce, the cache file URI can point at a file in S3, using either s3://bucket/path or s3n://bucket/path. Those URI schemes may or may not work in other Hadoop implementations, but the general approach should carry over.
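
To make the "local file" part concrete, here is a minimal sketch of what the mapper side boils down to once Hadoop has handed you a local path. It is not the code from the gist: the class name CacheFileLookup, the method load, and the tab-separated key/value file format are all assumptions made up for illustration. How you obtain the local path depends on your Hadoop version, so that part is deliberately left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not taken from the gist. In a real job you would
// call load() once from the mapper's setup method and keep the map around.
class CacheFileLookup {

    // Reads a "key<TAB>value" text file from the local file system.
    // The file format is an assumption for this example.
    static Map<String, String> load(String localPath) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip blank lines and lines without a tab
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }
}
```

The point is that nothing here is Hadoop-specific: by the time the mapper runs, the cache file is just a file, so plain java.nio is enough to read it.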

See the gist at https://gist.github.com/twasink/8813628



Example Hadoop Job that reads a cache file loaded from S3
