Tony Obermeit posted a question to the AJUG-QLD mailing list asking for help with a problem reading XML snippets. So, I thought I’d help him out, and this is the answer I gave.
*Update*: As pointed out by Pepijn Schmitz, Sun already provided the same solution: java.io.SequenceInputStream
*Another update*: I raised an RFE with Sun for the vararg support. Vote early, vote often!
Tony’s problem was that he had a large (1GB+) file full of XML snippets. A section of the file looked like this:
<message>Line one Line two Line three</message>
<message>You get the idea</message>
The file is not well-formed XML, because there is no single root element – just a bunch of elements one after another.
Some background: a SAX parser in Java needs to have an InputSource (org.xml.sax.InputSource). You can create an InputSource from a java.io.InputStream, a java.io.Reader, or a string holding a system identifier, which is essentially a URI (a file URI, for example). There are also convenience methods on the javax.xml.parsers.SAXParser class, but these just delegate down to the InputSource.
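To make the three options concrete, here is a minimal sketch of each way to build an InputSource. The class name, the sample document, and the file path are made up for illustration; only the InputSource constructors come from the SAX API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

class InputSourceExamples {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE = "<root><message>hello</message></root>";

    // From a byte stream: the parser works out the character encoding itself.
    static InputSource fromStream() {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        return new InputSource(in);
    }

    // From a character stream: the characters are already decoded.
    static InputSource fromReader() {
        Reader reader = new StringReader(SAMPLE);
        return new InputSource(reader);
    }

    // From a system identifier (a URI string). The path is an assumption;
    // the InputSource only stores it, nothing is opened here.
    static InputSource fromSystemId() {
        return new InputSource("file:///tmp/myfile.xml");
    }
}
```

The byte stream variant is the one that matters for Tony's problem, because it lets us hand the parser any InputStream we like.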
So, to solve this problem, what we need is an InputStream that decorates the contents of another input stream with a preamble and a postamble: the opening tag of a root element at the start, and the matching closing tag at the end.
This is what I came up with:
import java.io.IOException;
import java.io.InputStream;

public class CompositeInputStream extends InputStream {
    private final InputStream[] streams;
    private int index = 0; // the stream currently being read

    public CompositeInputStream(InputStream... streams) {
        this.streams = streams;
    }

    // ... more code goes in here. (maybe)

    @Override
    public int read() throws IOException {
        if (index == streams.length) {
            return -1; // every stream is exhausted
        }
        int value = streams[index].read();
        if (value < 0) {
            // end of this stream: move on to the next one
            index++;
            return read();
        }
        return value;
    }

    @Override
    public void close() throws IOException {
        for (InputStream stream : streams) {
            stream.close();
        }
    }
}
You could then compose an input stream for parsing like so:
InputStream toStream(String text) {
    // StandardCharsets (Java 7+) avoids the checked UnsupportedEncodingException
    byte[] textBytes = text.getBytes(StandardCharsets.UTF_8);
    return new ByteArrayInputStream(textBytes);
}

InputStream preamble = toStream("");   // the opening tag of the root element goes here
InputStream body = new FileInputStream("myfile.xml");
InputStream postamble = toStream("");  // the matching closing tag goes here
InputStream xmlStream = new CompositeInputStream(preamble, body, postamble);
InputSource source = new InputSource(xmlStream);
// read it in…
xmlStream.close();
And there you go.
For the record: using the SequenceInputStream that Pepijn pointed out would look like this:
InputStream preamble = toStream("");   // the opening tag of the root element goes here
InputStream body = new FileInputStream("myfile.xml");
InputStream postamble = toStream("");  // the matching closing tag goes here
InputStream xmlStream = new SequenceInputStream(preamble, new SequenceInputStream(body, postamble));
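For completeness, here is a rough end-to-end sketch of the "read it in" step: it wraps a rootless run of snippets and counts the message elements with a SAX handler. The root element name `messages`, the class name, and the in-memory sample body are my assumptions for the sake of the example; a real run would pass a FileInputStream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WrappedSnippetParser {
    static InputStream toStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Counts <message> elements in a rootless run of snippets by wrapping
    // the stream in a root element. The name "messages" is an assumption.
    static int countMessages(InputStream body) throws Exception {
        InputStream preamble = toStream("<messages>");
        InputStream postamble = toStream("</messages>");
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("message".equals(qName)) {
                    count[0]++;
                }
            }
        };
        // Closing the outer SequenceInputStream closes the wrapped streams too.
        try (InputStream xmlStream = new SequenceInputStream(
                preamble, new SequenceInputStream(body, postamble))) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(xmlStream), handler);
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the large file of snippets.
        InputStream body = toStream(
            "<message>Line one</message>\n<message>You get the idea</message>");
        System.out.println(countMessages(body));
    }
}
```

Because SAX streams through the input, nothing here needs to hold the whole file in memory, which is the point when the file is 1GB+.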
Perhaps Sun could update this to support varargs, to give the nicer constructor. Just for giggles, I’ve lodged this as a bug with Sun. I’ll update this again when I get the bug number.
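In the meantime, a small helper can give you the varargs call today, since SequenceInputStream also has a constructor that takes an Enumeration of streams. This is only a sketch; the `Streams` class and its method names are hypothetical, and `readAll` exists purely to demonstrate the result.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

class Streams {
    // Hypothetical helper: concatenates any number of streams, in argument order,
    // using the SequenceInputStream(Enumeration) constructor.
    static InputStream concat(InputStream... streams) {
        return new SequenceInputStream(Collections.enumeration(Arrays.asList(streams)));
    }

    static InputStream toStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads a stream to the end and decodes it as UTF-8 (for demonstration only).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With that in place, `Streams.concat(preamble, body, postamble)` reads much like the CompositeInputStream constructor, without nesting one SequenceInputStream inside another.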
It’s somewhat a Decorator pattern as well, since you’re “decorating” the content of the original input stream 🙂
Actually, I vacillated over whether this was a Decorator implemented via composition, or a Composite that was intended to do decoration. I’m still not sure. In the long run, it doesn’t really matter.
Composition is normally about aggregating behaviour; here, we’re aggregating data. That makes me wonder if perhaps it isn’t even a Chain-Of-Responsibility implementation… 🙂
This already exists in Java! It’s called the java.io.SequenceInputStream and it’s been there since Java 1.0. Your version has a nicer constructor though… 😉
Well, whadda ya know… 🙂 I’ve never needed this functionality before, so I didn’t know that Sun had anticipated me on it.
Sure enough:
public class SequenceInputStream
extends InputStream
A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.
Since:
JDK1.0
You learn something new every day.