Using Apache POI to create MS Office documents

A question was posted on LinkedIn that got me out of my shell and posting again. After all, I know all too well how frustrating it is to try to get something to work and not be able to find any meaningful examples on the web (YES... we have all grown fond of using Google to find examples and have gotten lazy about researching topics, but that's life...)

I decided to post some simple examples on how to create MS Word documents from scratch. This is all based on my postings on LinkedIn on this very topic. To read the thread, you may go here.

First question: What is Apache POI? In simple terms, it is a free, open-source software (FOSS) library that provides a Java application programming interface (API) for Microsoft document formats. It is developed by the Apache POI team of the Apache Software Foundation. My examples are based on version 3.8. The latest version is 3.9, and a beta of 3.10 is available.

Second question: What kind of Microsoft documents? Simply, MS Office documents (e.g. Word, Excel, PowerPoint, Publisher, Visio, Outlook messages, etc.).

My first posting (hopefully I will not get lazy and will post other examples) is for Microsoft Word documents. And here it goes... The first thing you should be aware of is that there are three different types of Microsoft Word documents. Each of these types is represented by a POI class:

  • XWPFDocument for Word 2007 and later
  • HWPFDocument for Word 97 - Word 2003
  • HWPFOldDocument for Word 95

I will be skipping over the Word 95 example. I really hope that there isn't someone out there with this particular need. But if you are out there and you run into this post, let me know and I will post an example just for you... How about that!?
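The three classes above map onto two container formats: .docx files (XWPFDocument) are ZIP packages, while .doc files (HWPFDocument, and Word 95 files for HWPFOldDocument) are OLE2 compound documents. If you receive a file and do not want to trust its extension, a minimal sketch like the one below can peek at the first bytes to decide which class to try. WordFormatSniffer and its methods are hypothetical names made up for this illustration (not part of POI), and the check only identifies the container: any ZIP file (an .xlsx, for instance) would also report OOXML, and OLE2 does not tell Word 97-2003 apart from Word 95. It uses InputStream.readNBytes, so it assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of POI): guesses which POI class fits a Word
 * file by looking at its first bytes instead of trusting the file extension.
 */
public class WordFormatSniffer
{
    public enum Format { OOXML, OLE2, UNKNOWN }

    // .docx files are ZIP packages, which start with "PK\3\4"
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // .doc files (Word 97-2003 and Word 95) are OLE2 compound documents
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    public static Format detect(byte[] header)
    {
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML;  // try XWPFDocument
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2;  // try HWPFDocument
        return Format.UNKNOWN;                                  // e.g. a zero-byte file
    }

    public static Format detect(Path file) throws IOException
    {
        try (InputStream in = Files.newInputStream(file))
        {
            byte[] header = new byte[8];
            int n = in.readNBytes(header, 0, header.length); // may be fewer than 8
            return detect(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix)
    {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```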

I will start with the easiest and probably the most relevant of all: Microsoft Word 2007 and later.

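Going back to basics, if you need to create a blank Word document (or any blank file, for that matter), all you need to do is create an output stream based on a file instance (e.g. "mydoc.docx") and close it. Strictly speaking, what you get is an empty, zero-byte file with a Word extension; a real .docx file is a ZIP package, so the file only becomes a valid Word document once POI writes content to it: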

    
    OutputStream ostream = new FileOutputStream(new File("mydoc.docx"));
    ostream.close();
    

File association determines which program is used to open your files. Files with ".doc" and ".docx" extensions are opened with MS Word by default. If the file extension is not associated correctly, it really doesn't matter what your program does to create the file; double-clicking it will not launch Word until you correct the association.

Also, remember that output streams are used to create and write to files, while input streams assume the file already exists. If it does not, FileInputStream throws a FileNotFoundException (you can't read from a file that does not exist).
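Both points can be checked with plain java.io, no POI required. In the sketch below (StreamBasics, createEmptyFile, and canOpenForReading are hypothetical names used only for illustration), opening and closing a FileOutputStream leaves a zero-byte file behind, and FileInputStream refuses a file that is not there by throwing FileNotFoundException.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamBasics
{
    /** Creates (or truncates) a file by opening and closing an output stream; returns its size. */
    public static long createEmptyFile(File file) throws IOException
    {
        OutputStream ostream = new FileOutputStream(file);
        ostream.close();
        return file.length(); // nothing was written, so this is 0
    }

    /** Returns true if the file can be opened for reading, false if it is missing. */
    public static boolean canOpenForReading(File file) throws IOException
    {
        try
        {
            InputStream istream = new FileInputStream(file);
            istream.close();
            return true;
        }
        catch (FileNotFoundException e)
        {
            // Thrown for a missing file (and also for a directory or an unreadable file)
            return false;
        }
    }
}
```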

The second part is to create the parts of the document (e.g. header, footer, paragraph, etc.) and add them to your document. For this, you are going to need the appropriate POI class (XWPFDocument or HWPFDocument), plus an output stream to write the finished document to.

    
    XWPFDocument docx = new XWPFDocument();
    

This is where the differences start. With HWPFDocument, you cannot create an instance of your document without reading an existing file (for example, through an input stream). For now, I am assuming you need to create a new Word 2007 or later file, since it is easier to explain.

Now that you have an instance of a Word document object, all you have to do is add the parts. This is where familiarity with the API is necessary. For now, I am going to create a simple paragraph with "Hello World!" in it. To create paragraphs, you need two objects: a paragraph container to hold the text, and a character run object to set the text and all text properties such as font family, size, color, etc.

To create a paragraph, use the document object to obtain a paragraph instance as follows:

    
    XWPFParagraph paragraph = docx.createParagraph();
    

Make sure you use the correct paragraph class for the document type you are using. These classes typically start with 'X' for all of the Office 2007 and later document types (XWPF for Word, XSSF for Excel, XSLF for PowerPoint). Once you have a paragraph instance, use the paragraph object to obtain an instance of a character run:

    
    XWPFRun charRun = paragraph.createRun();
    

Lastly, you have to set the text, write the document to the output stream, and close the stream:

    
    charRun.setText("Hello World!");
    docx.write(ostream);
    ostream.close();
    

The close() method flushes any buffered data to the file and then closes the stream. Therefore, it is redundant to call flush() right before close(). Use flush() only if you want to save what has been written so far but keep the stream open for further write operations.
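To see the difference between flush() and close() in action, the sketch below (FlushVersusClose is a hypothetical class written for this illustration) wraps the file stream in a BufferedOutputStream and records the file size after each step. It assumes it is handed a fresh file. With a plain, unbuffered FileOutputStream like the one in the examples here, flush() has nothing to do, because each write goes straight to the file.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class FlushVersusClose
{
    /**
     * Writes two chunks through a buffered stream and records the file size
     * before flush(), after flush(), and after close().
     */
    public static long[] sizesDuringWrite(File file) throws IOException
    {
        long[] sizes = new long[3];
        OutputStream ostream = new BufferedOutputStream(new FileOutputStream(file));
        ostream.write("Hello ".getBytes(StandardCharsets.US_ASCII));
        sizes[0] = file.length();   // the 6 bytes are still sitting in the buffer
        ostream.flush();
        sizes[1] = file.length();   // the bytes reached the file; the stream stays open
        ostream.write("World!".getBytes(StandardCharsets.US_ASCII));
        ostream.close();            // close() flushes the remaining bytes, then releases the file
        sizes[2] = file.length();
        return sizes;
    }
}
```

Called on a new temporary file, the three recorded sizes are 0, 6, and 12 bytes: nothing reaches the file until flush(), and close() pushes out the last six bytes without an explicit flush().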

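This is about the simplest example of how to create a Word document from scratch. In my case, with no formatting set, the text came out in Calibri, 11 points, left-justified. The complete program below also makes the text bold and sets the font family (Consolas) and the font size (16 points) on the run.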

All together...

    
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;
    import org.apache.poi.xwpf.usermodel.XWPFRun;
    
    /**
     * Creates Word 2007 and later documents
     * @author Hector Fontanez
     *
     */
    public class DocxCreator
    {
        public static void main(String[] args)
        {
            File file = new File("mydoc.docx");
            XWPFDocument docx;
            OutputStream ostream;
    
            try
            {
                docx = new XWPFDocument();
                XWPFParagraph paragraph = docx.createParagraph();
                XWPFRun charRun = paragraph.createRun();
                charRun.setBold(true);
                charRun.setFontFamily("Consolas");
                charRun.setFontSize(16);
                charRun.setText("Hello World!");
                ostream = new FileOutputStream(file);
                docx.write(ostream);
                ostream.close();
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        }
    }
    

If you need to create Word documents using HWPFDocument, there are a few things you have to be aware of. Most importantly, the HWPFDocument constructor requires a POIFSFileSystem object, a DirectoryNode object, or an InputStream object; unlike XWPFDocument, there is no public no-arg constructor. Furthermore, this class does not have the ability to create a Word document from scratch. Therefore, you must start with a blank document (".doc") or document template (".dot") file. The easiest approach is to create a blank document file and package it with your application (for example, in your JAR). The following steps create a blank Word document from an existing template file (the only way you can, as silly as it sounds):

    
    InputStream istream = new FileInputStream(new File("template.doc"));
    HWPFDocument doc = new HWPFDocument(istream);
    istream.close(); // This stream is no longer needed after doc is created
    OutputStream ostream = new FileOutputStream(new File("mydoc.doc"));
    doc.write(ostream);
    ostream.close();
    
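One catch with shipping the blank template inside your JAR: new FileInputStream(new File("template.doc")) looks for the file relative to the working directory, not inside the JAR. Files packaged in a JAR are read as classpath resources instead. A minimal sketch of a loader that tries the classpath first and falls back to the file system might look like this; TemplateLoader and openTemplate are hypothetical names, and it assumes the template sits at the root of the JAR.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class TemplateLoader
{
    /**
     * Opens a template packaged on the classpath (e.g. at the root of your JAR),
     * falling back to the file system. The caller must close the returned stream.
     */
    public static InputStream openTemplate(String name) throws IOException
    {
        // A leading '/' makes the resource path relative to the classpath root
        InputStream istream = TemplateLoader.class.getResourceAsStream("/" + name);
        if (istream != null)
        {
            return istream;
        }
        File file = new File(name);
        if (file.isFile())
        {
            return new FileInputStream(file);
        }
        throw new FileNotFoundException("Template not found on classpath or disk: " + name);
    }

    // Usage with the HWPF steps above (requires POI on the classpath):
    //   InputStream istream = TemplateLoader.openTemplate("template.doc");
    //   HWPFDocument doc = new HWPFDocument(istream);
    //   istream.close();
}
```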

Now it is a matter of obtaining the document parts and adding text to the document. I personally do not know how to change the font family, because I could not find a setter method for this property. So this example shows how to set some of the properties, but not the font family. You will have to research this in more detail if it is important to you.

The process is similar to before, except that you need an additional class, Range, to obtain a Paragraph instance. To get a Range object, ask the document object for it:

    
    Range range = doc.getRange();
    

Then get a Paragraph instance from the Range object you just obtained:

    
    Paragraph paragraph = range.getParagraph(0);
    

Alternatively, you can skip a step and do this:

    
    Paragraph paragraph = doc.getRange().getParagraph(0);
    

Even an "empty" document contains one (empty) paragraph, the final paragraph mark, so at this point zero is the only valid index; anything else will throw an index-out-of-bounds exception. Once you have more paragraphs in the document, you can insert text before or after any existing paragraph.

Lastly, you create a CharacterRun instance using the paragraph object and set all the properties (text, text size, color, etc.). An interesting point is that the font size argument is in half-points. Therefore, to set a font size of 16 points, you must pass a value of 32 (twice the point size). The following few lines set these properties. Remember to do this before calling the document's write method:

    
    CharacterRun charRun = paragraph.insertBefore("Hello World!");
    charRun.setBold(true);
    charRun.setFontSize(32);
    
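Because the half-point unit is easy to forget, a tiny helper can make the conversion explicit. This is a hypothetical sketch (FontSizes and toHalfPoints are not part of POI); rounding to the nearest half-point also lets you express sizes such as 10.5 points, which becomes 21.

```java
/**
 * Hypothetical helper (not part of POI): converts a size in points to the
 * half-point value passed to HWPF's CharacterRun.setFontSize(int).
 */
public class FontSizes
{
    public static int toHalfPoints(double points)
    {
        if (points <= 0)
        {
            throw new IllegalArgumentException("Font size must be positive: " + points);
        }
        // One point = two half-points; round so that 10.5 pt maps to 21
        return (int) Math.round(points * 2);
    }
}
```

With it, the call above would read charRun.setFontSize(FontSizes.toHalfPoints(16)), which passes 32. In the .docx program earlier, by contrast, XWPFRun.setFontSize was given 16 for a 16-point size.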

This example should create a Word 97-2003 document with a bold, 16-point phrase "Hello World!" in the first (and only) paragraph, using the default font family. I have never tried it, but I suspect that working with Word 95 documents (HWPFOldDocument) is not much different, although POI's support for that older format is geared toward reading, so writing may not be available. Here is what it looks like when put all together:

    
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.CharacterRun;
    import org.apache.poi.hwpf.usermodel.Paragraph;
    
    /**
     * Creates Word 97-2003 documents
     * @author Hector Fontanez
     *
     */
    public class DocCreator
    {
        public static void main(String[] args)
        {
            File file = new File("mydoc.doc");
            HWPFDocument doc;
            OutputStream ostream;
            InputStream istream;
    
            try
            {
                istream = new FileInputStream(new File("template.doc"));
                doc = new HWPFDocument(istream);
                istream.close(); // This stream is no longer needed after doc is created
    
                // Index 0 is the single (empty) paragraph of the blank template
                Paragraph paragraph = doc.getRange().getParagraph(0);
                CharacterRun charRun = paragraph.insertBefore("Hello World!");
                charRun.setBold(true);
                charRun.setFontSize(32); // half-points: 32 means 16 points
    
                ostream = new FileOutputStream(file);
                doc.write(ostream);
                ostream.close();
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        }
    }

I hope this blog has been of some help to someone out there. Again, if you want an example for Word 95, let me know by adding a comment to the blog. Also, comments about this topic or about what you would like to see in the future are always welcome.
