XML Parsing in Java part 1

>> 04 June 2010

Using the DOM parser provided with the Java SDK (nothing to install)

Structure of XML Documents
Any XML document has elements and text.

An XML document has three essential components:
1. Header
2. Root element
3. Child elements

The root element can have one or more child elements, and a child element can itself contain child elements, text, or both.

XML elements can have attributes.

XML Header
<?xml version="1.0" encoding="UTF-8"?>
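
The DOM parser described later in this article exposes the header values on the parsed document. The following is a minimal sketch, assuming the XML is held in a string; the class name XmlHeaderSketch and the method declaredVersion are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

// Hypothetical class and method names, used only for illustration.
class XmlHeaderSketch {

    // Parses the given XML text and returns the version declared in its header.
    static String declaredVersion(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return document.getXmlVersion();
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><templates/>";
        System.out.println("version " + declaredVersion(xml));
    }
}
```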

An example: an Eclipse template
The root element is <templates>, and every template is enclosed between <template> and </template>.
These <template> elements are children of the root. An element can have attributes (key/value pairs), such as autoinsert, and text (here, the content of the template).

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<templates>
    <template autoinsert="true" context="catchblock_context" deleted="false"     description="Code in new catch blocks" enabled="true"     id="org.eclipse.jdt.ui.text.codetemplates.catchblock" name="catchblock">
    // ${todo} Auto-generated catch block
    ${exception_var}.printStackTrace();</template>
</templates>

What is a parser?
A parser is a program that reads an XML file, verifies its structure, and gives access to its elements.

Java library parsers

  • DOM parser -> reads the whole document into a tree structure in memory
  • SAX parser -> reads the document sequentially and generates an event for each XML element it encounters
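
To make the difference concrete, here is a minimal sketch of the SAX style, where the parser calls a handler for each opening tag instead of building a tree. The class name SaxEventsSketch, the method elementNames, and the sample XML are assumptions made for illustration; the rest of this article uses the DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class and method names, used only for illustration.
class SaxEventsSketch {

    // Returns the element names in the order the SAX parser reports them.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            // Called once per opening tag; no tree is kept in memory.
                            names.add(qName);
                        }
                    });
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<templates><template name=\"catchblock\">text</template></templates>";
        System.out.println(elementNames(xml));
    }
}
```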

How to use the DOM parser included in the JDK

1-Get a DocumentBuilder object, using a DocumentBuilderFactory

  DocumentBuilderFactory documentBuilderFactory=DocumentBuilderFactory.newInstance();
  DocumentBuilder documentBuilder=documentBuilderFactory.newDocumentBuilder();

2-Pass a File object to the DocumentBuilder object to get a reference to the document (parse() also accepts a URI string, an InputStream or an InputSource):

  File file=new File("fileName");
  Document document=documentBuilder.parse(file);
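
When the XML does not come from a file, the same DocumentBuilder can read it from another source. The following is a minimal sketch, assuming the XML is already in a String; the class name ParseSourcesSketch and the method parseString are hypothetical names chosen for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class and method names, used only for illustration.
class ParseSourcesSketch {

    // Parses XML held in memory by wrapping it in an InputSource.
    static Document parseString(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Other overloads: parse(File), parse(InputStream), parse(String uri).
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parseString("<templates/>");
        System.out.println("root " + document.getDocumentElement().getTagName());
    }
}
```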

3-With a reference to the Document, you can get the following (as Node objects):
  -The root element
  -Child elements
  -Content and attributes of elements

An element has a name and can have attributes, text, and child elements.

4-Get the root
  Element root=document.getDocumentElement();

5-Get children
  //NodeList is a collection of Nodes
  NodeList children = root.getChildNodes();
  for (int i = 0; i < children.getLength(); i++)
  {
    Node child = children.item(i);
    //skip text nodes (such as the whitespace between tags); keep only elements
    if (child instanceof Element)
    {
      ...
      ...
    }//~if a child is element
  }//~children
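
The instanceof check matters because the whitespace between tags is reported as Text nodes. The loop can be wrapped in a small helper, sketched below; the class name ChildElementsSketch, the helpers parseRoot and childElements, and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class and method names, used only for illustration.
class ChildElementsSketch {

    // Parses XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns only the direct children that are elements, skipping text and comments.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<templates>\n  <template name=\"a\"/>\n  <template name=\"b\"/>\n</templates>";
        Element root = parseRoot(xml);
        // The raw node count also includes the whitespace Text nodes.
        System.out.println("all child nodes: " + root.getChildNodes().getLength());
        System.out.println("child elements: " + childElements(root).size());
    }
}
```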

6-Get attributes
  NamedNodeMap attributes = child.getAttributes();

  for (int j = 0; j < attributes.getLength(); j++)
  {
    //item(j) always returns a Node for a valid index, so no type check is needed
    Node attribute = attributes.item(j);
    String attributeName = attribute.getNodeName();
    String attributeValue = attribute.getNodeValue();

  }//~attributes
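
As a variation on the loop above, the attributes can be copied into a Map for easier lookup. This is a minimal sketch, not part of the original program; the class name AttributesSketch and the helpers parseRoot and attributesOf are hypothetical. A TreeMap is used because a NamedNodeMap does not guarantee any particular order.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class and method names, used only for illustration.
class AttributesSketch {

    // Parses XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Copies every attribute of the element into a map sorted by attribute name.
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int j = 0; j < attributes.getLength(); j++) {
            Node attribute = attributes.item(j);
            result.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element template = parseRoot("<template autoinsert=\"true\" name=\"catchblock\"/>");
        for (Map.Entry<String, String> entry : attributesOf(template).entrySet()) {
            System.out.println("att " + entry.getKey() + " = " + entry.getValue());
        }
    }
}
```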

7-More operations
  On an Element, you can:
  - Get the name of the element : getTagName()
  - Get an attribute node by name : getAttributeNode(String name)
  - Remove an attribute by name : removeAttribute(String name)
  See the Javadoc of org.w3c.dom.Element for more details
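
The three operations listed above can be tried on a single element, as in this minimal sketch. The class name ElementOperationsSketch, the helper parseRoot, and the sample XML are assumptions made for illustration; hasAttribute is an additional Element method used here only to check the result of the removal.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class and method names, used only for illustration.
class ElementOperationsSketch {

    // Parses XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element template = parseRoot("<template name=\"catchblock\" enabled=\"true\"/>");

        // Name of the element
        System.out.println("tag " + template.getTagName());

        // Attribute node looked up by name (null if the attribute is absent)
        Attr name = template.getAttributeNode("name");
        System.out.println("name " + name.getValue());

        // Remove an attribute from the in-memory tree; the source text is not modified
        template.removeAttribute("enabled");
        System.out.println("still has enabled? " + template.hasAttribute("enabled"));
    }
}
```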

Here is a demonstration program
package com.java_javafx.ka;

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/*
@author Kaesar ALNIJRES
*/

public class Test {
public void test()
{
//parse a file, get root element, children elements and their attributes

try {
    //prepare to read the xml document, by getting a DocumentBuilder
    DocumentBuilderFactory     documentBuilderFactory=DocumentBuilderFactory.newInstance();
    DocumentBuilder      documentBuilder=documentBuilderFactory.newDocumentBuilder();
    File file=new File("put_a_path_to_xml_file");
    Document document=documentBuilder.parse(file);
  
    //get the root element
    Element root=document.getDocumentElement();
  
    //get the root's name
    String string2=root.getTagName();
  
    System.out.println("root "+string2);
  
    //get a list of children
    NodeList children = root.getChildNodes();
    for (int i = 0; i < children.getLength(); i++)
    {
      
        Node child = children.item(i);
        //no white spaces
        if(child instanceof Element)
        {
            //cast a child to Element  
            Element element = (Element) child;
          
            //get the name of the child (element)         
            String childName=element.getTagName();
          
          
            System.out.println("current element "+childName);
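            //get the content of the current element; the first child is a
            //Text node only when the element starts with text
            Node firstChild = child.getFirstChild();
            String string3 = (firstChild instanceof Text) ? ((Text) firstChild).getData().trim() : "";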
          
          
            System.out.println("content "+string3);
            //Get attributes of the current child
            NamedNodeMap attributes=child.getAttributes();
          
            for(int j=0;j<attributes.getLength();j++)
            {
                //item(j) always returns a Node for a valid index, so no type check is needed
                Node attribute=attributes.item(j);
                String attributeName=attribute.getNodeName();
                String attributeValue=attribute.getNodeValue();
                          
                System.out.println("att "+attributeName+" = "+attributeValue);
            }//~attributes
          
        }//~if a child is element
    }//~children
  
} catch (ParserConfigurationException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (SAXException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
  

}
public static void main(String[] args) {
    Test t=new Test();
    t.test();
}
}

Eclipse templates ready to use
I've prepared a first version of an Eclipse template for parsing an XML file. You can find it in the download section of this site.
Import the XML file into Eclipse. Then, in a Java editor, type xml, press CTRL+SPACE, and choose one of the templates.
