XML, java.io, and the Composite Pattern

Tony Obermeit posted a question to the AJUG-QLD mailing list asking for help with a problem reading XML snippets. So, I thought I’d help him out, and this is the answer I gave.

*Update*: As pointed out by Pepijn Schmitz, Sun already provided the same solution: java.io.SequenceInputStream

*Another update*: I raised an RFE with Sun for the vararg support. Vote early, vote often!


Tony’s problem was that he had a large (1GB+) file full of XML snippets. A section of the file looked like this:

  Line one
  Line two
  Line three</message>
  You get the idea

The file is not valid XML, because there is no root element, just a bunch of elements one after another.

Some background: a SAX parser in Java needs an InputSource (org.xml.sax.InputSource). You can create an InputSource from a java.io.InputStream, a java.io.Reader, or a string holding a system identifier (essentially a URI, such as a file URI). There are also convenience methods on the javax.xml.parsers.SAXParser class, but these just delegate down to the InputSource.
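As a rough illustration of those three options, here is a minimal sketch. The class name InputSourceExamples, the sample document, and the file URI are all made up for the example; only the InputSource constructors come from the real API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

class InputSourceExamples {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE = "<doc>hello</doc>";

    // From a byte stream: the parser works out the character encoding itself.
    static InputSource fromStream() {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        return new InputSource(in);
    }

    // From a character stream: the bytes have already been decoded.
    static InputSource fromReader() {
        Reader reader = new StringReader(SAMPLE);
        return new InputSource(reader);
    }

    // From a string: this is a system identifier (a URI), not XML text.
    // The path below is an assumption; nothing is opened until parsing starts.
    static InputSource fromSystemId() {
        return new InputSource("file:///tmp/example.xml");
    }
}
```

The string form is the one that tends to surprise people: passing XML text to that constructor does not parse the text, it treats it as a location.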

So, to solve this problem, what we need is an InputStream that decorates the contents of another input stream with a preamble and a postamble: an opening root tag at the start, and the matching closing tag at the end.

This is what I came up with:

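import java.io.IOException;
import java.io.InputStream;

public class CompositeInputStream extends InputStream {
  private final InputStream[] streams;
  private int index = 0; // the stream currently being read

  public CompositeInputStream(InputStream... streams) {
    this.streams = streams;
  }
  // ... more code goes in here. (maybe)

  @Override
  public int read() throws IOException {
    // Loop rather than recurse, so a long run of empty streams can't overflow the stack.
    while (index < streams.length) {
      int value = streams[index].read();
      if (value >= 0) { return value; }
      index++; // end of this stream; move on to the next
    }
    return -1; // every stream is exhausted
  }

  @Override
  public void close() throws IOException {
    for (InputStream stream : streams) {
      stream.close();
    }
  }
}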

You could then compose an input stream for parsing like so:

InputStream toStream(String text) throws UnsupportedEncodingException {
  byte[] textBytes = text.getBytes("UTF-8");
  return new ByteArrayInputStream(textBytes);
}

InputStream preamble = toStream("");
InputStream body = new FileInputStream("myfile.xml");
InputStream postamble = toStream("");

InputStream xmlStream = new CompositeInputStream(preamble, body, postamble);
InputSource source = new InputSource(xmlStream);
// read it in…
xmlStream.close();

And there you go.
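For the "read it in" step, something along these lines should work. This is only a sketch under a few assumptions: the wrapper element is called root (a made-up name), the handler just counts message elements, and the JDK's own SequenceInputStream stands in for the composite stream so the example compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WrappedSnippetParser {
    static InputStream toStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Counts the message elements in a rootless run of XML snippets. */
    static int countMessages(InputStream snippets) throws Exception {
        // "root" is a hypothetical wrapper element name; any name would do.
        InputStream wrapped = new SequenceInputStream(
                toStream("<root>"),
                new SequenceInputStream(snippets, toStream("</root>")));
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("message".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(wrapped), handler);
        } finally {
            wrapped.close();
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String snippets = "<message>Line one</message>\n<message>Line two</message>\n";
        System.out.println(countMessages(toStream(snippets)));
    }
}
```

Because SAX streams through the input, the 1GB+ file never has to fit in memory; the handler sees one event at a time.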


For the record: using the SequenceInputStream that Pepijn pointed out would look like this:

InputStream preamble = toStream("");
InputStream body = new FileInputStream("myfile.xml");
InputStream postamble = toStream("");

InputStream xmlStream = new SequenceInputStream(preamble, new SequenceInputStream(body, postamble));

Perhaps Sun could update this to support varargs, to give the nicer constructor. Just for giggles, I’ve lodged this as a bug with Sun. I’ll update this again when I get the bug number.
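In the meantime, SequenceInputStream does have a second constructor that takes an Enumeration of streams, so a varargs-style helper is easy enough to sketch. The Streams class and its concat and readAll methods below are hypothetical names of my own, not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

class Streams {
    /** Hypothetical helper: concatenates any number of streams, in order. */
    static InputStream concat(InputStream... streams) {
        // Uses the SequenceInputStream(Enumeration) constructor.
        return new SequenceInputStream(Collections.enumeration(Arrays.asList(streams)));
    }

    /** Reads a stream to the end and decodes it as UTF-8 (demo use only). */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    static InputStream toStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // "root" is a made-up wrapper element name.
        InputStream xml = concat(toStream("<root>"), toStream("body"), toStream("</root>"));
        System.out.println(readAll(xml));
    }
}
```

This avoids the nested constructor calls, at the cost of a small wrapper you have to carry around yourself.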


Author: Robert Watkins

My name is Robert Watkins. I am a software developer and have been for over 18 years now. I currently work for people, but my opinions here are in no way endorsed by them (which is cool; their opinions aren’t endorsed by me either). My main professional interests are in Java development, using Agile methods, with a historical focus on building web based applications. I’m also a Mac-fan and love my iPhone, which I’m currently learning how to code for. I live and work in Brisbane, Australia, but I grew up in the Northern Territory, and still find Brisbane too cold (after 16 years here). I’m married, with two children and one cat. My politics are socialist in tendency, my religious affiliation is atheist (aka “none of the above”), my attitude is condescending and my moral standing is lying down.

4 thoughts on “XML, java.io, and the Composite Pattern”

  1. Actually, I vacillated back and forth between trying to decide if this was a Decorator implemented via composition, or a Composite that was intended to do decoration. I’m still not sure. In the long run, it doesn’t really matter.

    Composition is normally about aggregating behaviour; here, we’re aggregating data. That makes me wonder if perhaps it isn’t even a Chain-Of-Responsibility implementation… 🙂

  2. Well, whadda ya know… 🙂 I’ve never needed this functionality before, so I didn’t know that Sun had anticipated me on it.

    Sure enough:

    public class SequenceInputStream
    extends InputStream

    A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.

    Since:
    JDK1.0

    You learn something new every day.
