Extensible Markup Language (XML)



Introduction

XML aids the exchange of data by making it possible to describe data in a clear, unambiguous way. Both the sending and the receiving party use XML to understand the kind of data that has been sent. XML is a meta-language: a language used to define other languages. It can be used to define data structures, make these structures platform independent, and process XML-defined data automatically. XML is designed to separate syntax from semantics and so provide a common framework for structuring information (how a browser renders a document is defined entirely by stylesheets). It allows tailor-made markup for any application domain, and it is designed to support internationalization (Unicode) and platform independence. In short, XML is a framework for defining markup languages. There is no fixed collection of markup tags; we may define our own tags, tailored to our kind of information. Each XML language is targeted at its own application domain, but the languages share many features.

XML Tags

In XML, we can define our own tags. To use a tag, we have to define its meaning. This definition is stored in a DTD (Document Type Definition). You can define your own DTD or use an existing one. Defining a DTD actually means defining an XML language. XML tags are written like HTML tags: there is a start tag and a closing tag, as in <TAG>content</TAG>. The closing tag has a slash after the opening bracket, just like in HTML. A start tag, its content, and the matching closing tag together are called an element. The following rules apply to XML tags: tag names are case-sensitive, every start tag needs a matching closing tag, and elements must be properly nested.
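As a rough illustration of how a parser treats start and closing tags, the sketch below feeds three small documents to Python's standard xml.etree.ElementTree parser. The helper name is_well_formed and the sample car data are assumptions made for this example; they are not part of the original notes.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A start tag with a matching closing tag.
good = "<car><brand>Volvo</brand></car>"
# Tag names are case-sensitive, so </Brand> does not close <brand>.
bad_case = "<car><brand>Volvo</Brand></car>"
# Elements must be closed in the reverse order in which they were opened.
bad_nesting = "<car><brand>Volvo</car></brand>"

for sample in (good, bad_case, bad_nesting):
    print(sample, "->", is_well_formed(sample))
```

Only the first sample is accepted; the other two are rejected before any DTD comes into play, because they break the tag rules themselves.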

XML Page

The first line of an XML document is the XML declaration. It's a special kind of tag:
<?xml version="1.0"?>
Version 1.0 is the version of XML in common use. The XML declaration makes clear that the document is XML and which version is used. Every XML document must have exactly one root element. All other elements in the same document are descendants of this root element. The root element is the top level of the structure in an XML document.
[Figure: structure of a typical XML page]
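To make the declaration, root element, and child elements concrete, here is a minimal sketch that parses a small document with Python's standard xml.etree.ElementTree module. The car document is a made-up example, chosen to match the element names used in the DTD section.

```python
import xml.etree.ElementTree as ET

# XML declaration first, then a single root element with two children.
document = """<?xml version="1.0"?>
<car>
  <brand>Volvo</brand>
  <type>V40</type>
</car>"""

root = ET.fromstring(document)
print("root:", root.tag)

# Iterating over an element yields its direct children in document order.
children = [(child.tag, child.text) for child in root]
for tag, text in children:
    print("  child:", tag, "=", text)
```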

DTD

To be of practical use, an XML document needs to be valid, i.e. it must conform to the rules defined in a Document Type Definition (DTD). A DTD contains the rules for a particular type of XML document. It is actually the DTD that defines the language. A DTD describes elements using the following syntax:
The text <!ELEMENT, followed by the name of the element, followed by a description of the element's content and a closing >, e.g.
<!ELEMENT brand (#PCDATA)>
This declaration defines the XML tag <brand>. The description (#PCDATA) stands for parsed character data: text that the program reading the XML document parses (interprets), so that markup and entity references inside it are recognized. CDATA stands for character data, which is not parsed. It cannot be used as an element description; in a DTD it appears as an attribute type in <!ATTLIST> declarations, and in a document a <![CDATA[ ... ]]> section marks text that the parser passes through without interpreting it. An element that contains sub-elements is described thus:
<!ELEMENT car (brand, type)>
<!ELEMENT brand (#PCDATA)>
<!ELEMENT type (#PCDATA)>
This means that the element car has two sub-elements, brand followed by type. Each sub-element can contain characters. A DTD can be external or internal (included in the XML document itself).
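The sketch below shows an internal DTD in context, using Python's standard xml.parsers.expat module to list the element declarations it finds. Note the assumption built into this choice: Expat is a non-validating parser, so it reports the declarations but does not check the document against them; full validation would need a validating parser. The sample data and the handler name on_element_decl are made up for the example.

```python
import xml.parsers.expat

# The DTD is internal: it sits inside the DOCTYPE of the document itself.
document = """<?xml version="1.0"?>
<!DOCTYPE car [
  <!ELEMENT car (brand, type)>
  <!ELEMENT brand (#PCDATA)>
  <!ELEMENT type (#PCDATA)>
]>
<car><brand>Volvo</brand><type>V40</type></car>"""

declared = {}

def on_element_decl(name, model):
    # model is a (type, quantifier, name, children) tuple; each child is
    # a tuple of the same shape, with its element name at index 2.
    declared[name] = [child[2] for child in model[3]]

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(document, True)

for name, children in declared.items():
    print(name, "->", children)
```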

XML Doc Presentation

XML is about defining data. To display XML documents we can use CSS or XSL (eXtensible Stylesheet Language). XSL can convert XML documents into HTML, which makes them viewable in any browser.
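The idea of turning XML data into HTML can be sketched in a few lines. This is not XSL: Python's standard library has no XSLT processor, so the example hand-codes the same kind of mapping with xml.etree.ElementTree. The sample data and the choice of an HTML list as output are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

source = "<car><brand>Volvo</brand><type>V40</type></car>"
car = ET.fromstring(source)

# Build an HTML list with one item per child element of <car>.
html = ET.Element("html")
body = ET.SubElement(html, "body")
ul = ET.SubElement(body, "ul")
for child in car:
    li = ET.SubElement(ul, "li")
    li.text = f"{child.tag}: {child.text}"

# Serialize the new tree using HTML output rules.
page = ET.tostring(html, encoding="unicode", method="html")
print(page)
```

A real XSL stylesheet would express the same mapping declaratively, as templates matched against the source elements, instead of as a loop.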
reference: http://www.spiderpro.com/bu/buxmlm001_struc.html

Experiment

I used the XML::Parser::Expat Perl module to parse an XML document and extract the hyperlinks from it, and then used the HTTP protocol to fetch the data behind each link. The following Perl code does this.
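#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use XML::Parser::Expat;

# Address of the HTTP proxy that forwards the requests.
my $proxy_ip   = "144.16.67.8";
my $proxy_port = 8080;

my $parser = XML::Parser::Expat->new;
$parser->setHandlers('Start' => \&sh,
                     'End'   => \&eh,
                     'Char'  => \&ch);
open(my $fh, '<', "doc.xml") or die "Couldn't open doc.xml: $!";
$parser->parse($fh);
close($fh);

# Start-tag handler: treat every attribute value as a link and fetch it.
sub sh
{
    my ($p, $el, %atts) = @_;
    foreach my $key (keys %atts) {
        print $atts{$key}, "\n";
        fetch($atts{$key});
    }
}

# Request the first 2000 bytes of $url through the proxy and print the reply.
sub fetch
{
    my ($url) = @_;
    # Open one connection per request; the request asks for it to be closed.
    socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
        or die "could not create socket: $!";
    connect($sock, sockaddr_in($proxy_port, inet_aton($proxy_ip)))
        or die "could not connect: $!";
    print "Connected to $proxy_ip:$proxy_port\n";
    # The Host header is taken from the URL itself.
    my ($host) = $url =~ m{^\w+://([^/]+)};
    $host = "" unless defined $host;
    my $sendbuff = "GET $url HTTP/1.1\r\n"
                 . "Host: $host\r\n"
                 . "Range: bytes=0-1999\r\n"
                 . "Connection: close\r\n\r\n";
    print $sendbuff;
    defined(send($sock, $sendbuff, 0)) or die "could not send: $!";
    print "Data sent to proxy...\n";
    my $recvbuff = "";
    # recv returns undef on error, so test with defined rather than truth.
    defined(recv($sock, $recvbuff, 200000, 0)) or die "could not receive: $!";
    print "Data received...\n";
    print $recvbuff;
    close($sock);
}

# End-tag and character-data handlers are registered but do nothing here.
sub eh { my ($p, $el) = @_; }
sub ch { my ($p, $str) = @_; }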
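For comparison, the parsing half of the experiment can be sketched with Python's standard xml.parsers.expat module, which wraps the same Expat library. The sample document stands in for doc.xml, whose contents are not shown in the original, and the example.com and example.org URLs are placeholders. The sketch only builds the request text and does not open a network connection; build_request is a hypothetical helper, and the header layout is one reasonable well-formed version of a ranged GET sent through a proxy.

```python
import xml.parsers.expat
from urllib.parse import urlsplit

# Hypothetical stand-in for doc.xml; the original file is not shown.
document = """<?xml version="1.0"?>
<links>
  <link href="http://example.com/a.html"/>
  <link href="http://example.org/b.html"/>
</links>"""

links = []

def start_element(name, attrs):
    # Like the Perl handler, treat every attribute value as a link.
    links.extend(attrs.values())

def build_request(url):
    """Return a ranged HTTP/1.1 GET for url, using the absolute URL
    in the request line as a proxy expects."""
    host = urlsplit(url).netloc
    return (f"GET {url} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Range: bytes=0-1999\r\n"
            "Connection: close\r\n\r\n")

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(document, True)

request_texts = [build_request(url) for url in links]
for text in request_texts:
    print(text)
```

Keeping link extraction separate from the network code makes it possible to check what would be requested before anything is sent.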