XML Parsing in Java part 1
>> 04 June 2010
Using the DOM parser provided with the Java SDK (nothing to install)
Structure of XML Documents
Any XML document has elements and text.
Three essential components
1. Header
2. Root element
3. Child elements
The root element can have one or more child elements, and each child element can contain further child elements, text, or both.
XML elements can have attributes.
XML Header
<?xml version="1.0" encoding="UTF-8"?>
An example: an Eclipse template
The root element is <templates>, and every template is enclosed between <template> and </template>.
These are child elements of the root. Elements have attributes (key/value pairs), such as autoinsert, and text (the content of the template).
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<templates>
<template autoinsert="true" context="catchblock_context" deleted="false" description="Code in new catch blocks" enabled="true" id="org.eclipse.jdt.ui.text.codetemplates.catchblock" name="catchblock">
// ${todo} Auto-generated catch block
${exception_var}.printStackTrace();</template>
</templates>
What is a parser?
A parser is a program that reads an XML file, verifies its structure, and gives access to its elements.
Java library parsers
- DOM parser -> reads a document into a tree structure
- SAX parser -> produces events with XML elements
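To make the difference between the two parsers concrete, here is a minimal sketch of the event style used by SAX. It is only an illustration: the class name SaxEventsSketch and the method collectEvents are made up for this example, and the XML is passed in as a string instead of a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article
class SaxEventsSketch {

    // Returns the start/end element events in the order the SAX parser reports them
    static List<String> collectEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<templates><template name=\"catchblock\">text</template></templates>";
        for (String event : collectEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

Unlike DOM, nothing is kept in memory here unless the handler stores it, which is why the sketch collects the events in a list.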
How to use the DOM parser included in the JDK
1-Get a DocumentBuilder object, using a DocumentBuilderFactory
DocumentBuilderFactory documentBuilderFactory=DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder=documentBuilderFactory.newDocumentBuilder();
2-Pass a File object (you can also use a URL) to the DocumentBuilder object to get a reference to the document:
File file=new File("fileName");
Document document=documentBuilder.parse(file);
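A File is not the only possible source. As a rough sketch, the same DocumentBuilder can also read XML held in a string through an InputSource, which is handy for quick experiments. The class name ParseSourcesSketch and the method parseText are invented for this example, and the address in the comment is a placeholder.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article
class ParseSourcesSketch {

    // Parses XML held in memory instead of reading it from a file
    static Document parseText(String xml) throws Exception {
        DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return documentBuilder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document document = parseText("<templates><template name=\"catchblock\"/></templates>");
        System.out.println("root " + document.getDocumentElement().getTagName());

        // DocumentBuilder.parse(String) takes a URI, so a remote document could be read like this
        // (placeholder address, left commented out so the sketch runs offline):
        // Document remote = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        //         .parse("http://example.com/templates.xml");
    }
}
```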
3-With a reference to the Document, you can get the following (as Node objects):
-The root element
-Child elements
-Content and attributes of elements
Elements can have a name, attributes, text, etc.
4-Get the root
Element root=document.getDocumentElement();
5-Get children
//NodeList is a collection of Nodes
NodeList children = root.getChildNodes();
for (int i = 0; i < children.getLength(); i++)
{
Node child = children.item(i);
//skip whitespace-only text nodes (line breaks, indentation); keep only elements
if(child instanceof Element)
{
...
...
}//~if a child is element
}//~children
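The instanceof check matters because line breaks and indentation between tags also show up as child nodes (Text nodes). The following sketch, with the made-up names ChildElementsSketch, childElementNames and countChildNodes, compares the raw number of child nodes with the child elements that remain after filtering.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article
class ChildElementsSketch {

    private static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Counts every child node of the root, including whitespace-only text nodes
    static int countChildNodes(String xml) throws Exception {
        return parseRoot(xml).getChildNodes().getLength();
    }

    // Keeps only the children that are elements and returns their tag names
    static List<String> childElementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        NodeList children = parseRoot(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                names.add(((Element) child).getTagName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<templates>\n  <template/>\n  <template/>\n</templates>";
        System.out.println("all child nodes: " + countChildNodes(xml));
        System.out.println("child elements: " + childElementNames(xml));
    }
}
```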
6-Get attributes
NamedNodeMap attributes=child.getAttributes();
for(int j=0;j<attributes.getLength();j++)
{
//item(j) always returns a Node for a valid index, so no type check is needed
Node attribute=attributes.item(j);
String attributeName=attribute.getNodeName();
String attributeValue=attribute.getNodeValue();
}//~attributes
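As a small variation on the loop above, the attributes can be copied into a Map so they can be looked up by name later. This is only a sketch: AttributeMapSketch, attributesOf and rootAttributes are names invented here, and a TreeMap is used because the order of items in a NamedNodeMap is not guaranteed.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article
class AttributeMapSketch {

    // Copies the attributes of an element into a map sorted by attribute name
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int j = 0; j < attributes.getLength(); j++) {
            Node attribute = attributes.item(j);
            result.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        return result;
    }

    // Convenience for the demo: parses the text and reads the attributes of the root element
    static Map<String, String> rootAttributes(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return attributesOf(root);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<template autoinsert=\"true\" name=\"catchblock\"/>";
        System.out.println(rootAttributes(xml));
    }
}
```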
7-More operations
With an Element you can:
- Get the name of an element: getTagName()
- Get an attribute node by name: getAttributeNode(String name)
- Remove an attribute by name: removeAttribute(String name)
See the Javadoc of org.w3c.dom.Element for more details.
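The three operations listed above could be combined as in the sketch below. The class ElementOperationsSketch and the method describe are hypothetical. hasAttribute is an extra Element method, not mentioned in the list, used here only to confirm the removal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article
class ElementOperationsSketch {

    // Reads the tag name and the "name" attribute of the root, then removes that attribute
    static String describe(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();

        String tagName = root.getTagName();

        // getAttributeNode returns null when the attribute does not exist
        Attr nameAttribute = root.getAttributeNode("name");
        String nameValue = (nameAttribute == null) ? null : nameAttribute.getValue();

        // the removal only changes the in-memory tree, not the XML source
        root.removeAttribute("name");
        boolean stillThere = root.hasAttribute("name");

        return tagName + " name=" + nameValue + " afterRemoval=" + stillThere;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe("<template name=\"catchblock\"/>"));
    }
}
```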
Here is a demonstration program
package com.java_javafx.ka;
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;
/*
@author Kaesar ALNIJRES
*/
public class Test {
public void test()
{
//parse a file, get root element, children elements and their attributes
try {
//prepare to read the xml document, by getting a DocumentBuilder
DocumentBuilderFactory documentBuilderFactory=DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder=documentBuilderFactory.newDocumentBuilder();
File file=new File("put_a_path_to_xml_file");
Document document=documentBuilder.parse(file);
//get the root element
Element root=document.getDocumentElement();
//get the root's name
String string2=root.getTagName();
System.out.println("root "+string2);
//get a list of children
NodeList children = root.getChildNodes();
for (int i = 0; i < children.getLength(); i++)
{
Node child = children.item(i);
//skip whitespace-only text nodes; keep only elements
if(child instanceof Element)
{
//cast a child to Element
Element element = (Element) child;
//get the name of the child (element)
String childName=element.getTagName();
System.out.println("current element "+childName);
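//get the content of the current element
//the first child can be missing or not a Text node, so check before casting
Node firstChild = child.getFirstChild();
if(firstChild instanceof Text)
{
String string3=((Text) firstChild).getData().trim();
System.out.println("content "+string3);
}//~first child is text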
//Get attributes of the current child
NamedNodeMap attributes=child.getAttributes();
for(int j=0;j<attributes.getLength();j++)
{
//item(j) always returns a Node for a valid index, so no type check is needed
Node attribute=attributes.item(j);
String attributeName=attribute.getNodeName();
String attributeValue=attribute.getNodeValue();
System.out.println("att "+attributeName+" = "+attributeValue);
}//~attributes
}//~if a child is element
}//~children
} catch (ParserConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String[] args) {
Test t=new Test();
t.test();
}
}
Eclipse templates ready to use
I've prepared the very first version of an Eclipse template to parse an XML file. You can find it in the download section of this site.
Import the XML file into Eclipse. In a Java editor, type xml, press CTRL+SPACE, and choose one of the templates.