XML stands for Extensible Markup Language. It was developed by the World Wide Web Consortium (W3C) to define a syntax for encoding documents that both humans and machines can read. An XML file contains markup tags, but they serve a different purpose than in HTML: in HTML the tags describe the structure of the page, whereas in XML they describe the meaning of the data contained in the file. R can read XML files once the "XML" package is installed in the R environment. The package is installed with the familiar install.packages command.
1. Install the package: install.packages("XML")
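To avoid reinstalling the package on every run, a script can check for it first. The snippet below is a minimal sketch of that pattern. The CRAN mirror URL is an assumption, chosen only so that the call also works in a non-interactive session; any mirror you normally use would do.

```r
# Install the XML package only if it is not already available.
# The repos value is an assumed mirror, not something the tutorial prescribes.
if (!requireNamespace("XML", quietly = TRUE)) {
  install.packages("XML", repos = "https://cloud.r-project.org")
}

# Attach the package so its functions can be called directly.
library("XML")
```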
2. Create an XML file: file.xml
<records>
  <employee_info>
    <id>1</id>
    <name>Shubham</name>
    <salary>623</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>2</id>
    <name>Nishka</name>
    <salary>552</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>3</id>
    <name>Gunjan</name>
    <salary>669</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>4</id>
    <name>Sumit</name>
    <salary>825</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>5</id>
    <name>Arpita</name>
    <salary>762</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>6</id>
    <name>Vaishali</name>
    <salary>882</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>7</id>
    <name>Anisha</name>
    <salary>783</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>8</id>
    <name>Ginni</name>
    <salary>964</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
</records>
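If you would rather generate the file from R than type it by hand, the XML package can also build a document. The following is a sketch under a few assumptions: the `employees` data frame holds only the first two records, and `out_file` points to a temporary directory rather than your working directory. Both names are illustrative and not part of the tutorial.

```r
library("XML")

# Hypothetical sample rows mirroring the first two records of file.xml.
employees <- data.frame(
  id = c(1, 2),
  name = c("Shubham", "Nishka"),
  salary = c(623, 552),
  date = c("1/1/2012", "1/1/2012"),
  dept = c("IT", "IT"),
  stringsAsFactors = FALSE
)

# Create an empty document with <records> as its root element.
doc <- newXMLDoc()
root <- newXMLNode("records", doc = doc)

# Add one <employee_info> element per row, with one child element per column.
for (i in seq_len(nrow(employees))) {
  rec <- newXMLNode("employee_info", parent = root)
  for (field in names(employees)) {
    newXMLNode(field, as.character(employees[i, field]), parent = rec)
  }
}

# Write the document to disk (assumed location: the session's temp directory).
out_file <- file.path(tempdir(), "file.xml")
saveXML(doc, file = out_file)
```

Pointing `out_file` at "file.xml" instead would place the file where the later steps expect it.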
3. Read the XML file
3.1. Read the file as a list
# Loading the package required to read XML files.
library("XML")
# Also loading the other required package.
library("methods")
# Giving the input file name to the function.
result <- xmlParse(file = "file.xml")
# Converting the parsed document into a nested R list.
xml_data <- xmlToList(result)
# Printing the list.
print(xml_data)
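Once the data is a list, individual values can be pulled out with ordinary list indexing. The sketch below illustrates this on a two-record excerpt that is parsed from an inline string so that it runs on its own; `xml_text`, `first_name`, `all_names`, and `salaries` are illustrative names.

```r
library("XML")

# Two-record excerpt of file.xml, kept inline so the example is self-contained.
xml_text <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)

# Parse the string instead of a file, then convert it to a list.
result <- xmlParse(xml_text, asText = TRUE)
xml_data <- xmlToList(result)

# Each record is itself a list whose elements are named after the child tags.
first_name <- xml_data[[1]]$name

# Collect one field across all records; values arrive as character strings.
all_names <- unname(sapply(xml_data, function(rec) rec$name))
salaries <- as.numeric(unname(sapply(xml_data, function(rec) rec$salary)))

print(first_name)
print(all_names)
print(salaries)
```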
3.2. Get the number of nodes in the XML file
# Loading the package required to read XML files.
library("XML")
# Also loading the other required package.
library("methods")
# Giving the input file name to the function.
result <- xmlParse(file = "file.xml")
# Converting the data into a list.
xml_data <- xmlToList(result)
# Printing the data.
print(xml_data)
# Extracting the root node from the XML file.
root_node <- xmlRoot(result)
# Finding the number of nodes in the root.
root_size <- xmlSize(root_node)
# Printing the result.
print(root_size)
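The same counting idea applies at any level of the tree, and an XPath query offers a cross-check. The following sketch uses a two-record inline excerpt so that it is self-contained; the variable names are illustrative.

```r
library("XML")

# Two-record excerpt of file.xml, kept inline so the example is self-contained.
xml_text <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)
result <- xmlParse(xml_text, asText = TRUE)
root_node <- xmlRoot(result)

# Name of the root element.
root_name <- xmlName(root_node)

# Number of direct children of the root, that is, the number of records.
record_count <- xmlSize(root_node)

# Number of fields inside the first record.
field_count <- xmlSize(root_node[[1]])

# Cross-check: count the <employee_info> elements with an XPath query.
xpath_count <- length(getNodeSet(result, "//employee_info"))

print(c(records = record_count, fields = field_count, xpath = xpath_count))
```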
3.3. Get the details of the first node in the XML file
# Loading the package required to read XML files.
library("XML")
# Also loading the other required package.
library("methods")
# Giving the input file name to the function.
result <- xmlParse(file = "file.xml")
# Extracting the root node from the XML file.
root_node <- xmlRoot(result)
# Printing the result.
print(root_node[1])
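Printing a node shows its markup. To work with the values instead, double brackets return the node itself, and `xmlSApply` with `xmlValue` collects the text of each child. This is a minimal sketch on an inline two-record excerpt; `first_record` and `fields` are illustrative names.

```r
library("XML")

# Two-record excerpt of file.xml, kept inline so the example is self-contained.
xml_text <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)
root_node <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# Double brackets return the first child node itself.
first_record <- root_node[[1]]
record_tag <- xmlName(first_record)

# Apply xmlValue to every child to get a character vector named by tag.
fields <- xmlSApply(first_record, xmlValue)

print(record_tag)
print(fields)
```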
3.4. Get individual elements of a node
# Loading the package required to read XML files.
library("XML")
# Also loading the other required package.
library("methods")
# Giving the input file name to the function.
result <- xmlParse(file = "file.xml")
# Extracting the root node from the XML file.
root_node <- xmlRoot(result)
# Getting the first element of the first node.
print(root_node[[1]][[1]])
# Getting the fourth element of the first node.
print(root_node[[1]][[4]])
# Getting the third element of the third node.
print(root_node[[3]][[3]])
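The positional lookups above print whole elements, tags included. When only the text is needed, `xmlValue` extracts it, and an XPath expression can address the same element by name rather than position. The sketch below assumes a three-record inline excerpt so that the third node exists; the variable names are illustrative.

```r
library("XML")

# Three-record excerpt of file.xml, kept inline so the example is self-contained.
xml_text <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>3</id><name>Gunjan</name><salary>669</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)
result <- xmlParse(xml_text, asText = TRUE)
root_node <- xmlRoot(result)

# Text of the first and fourth elements of the first node.
first_id <- xmlValue(root_node[[1]][[1]])
first_date <- xmlValue(root_node[[1]][[4]])

# Text of the third element of the third node.
third_salary <- xmlValue(root_node[[3]][[3]])

# The same value addressed by element name through XPath.
third_salary_xpath <- xpathSApply(
  result, "/records/employee_info[3]/salary", xmlValue
)

print(c(first_id, first_date, third_salary, third_salary_xpath))
```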
4. Convert the XML into a data frame
# Loading the package required to read XML files.
library("XML")
# Also loading the other required package.
library("methods")
# Giving the input file name to the function xmlToDataFrame.
data_frame <- xmlToDataFrame("file.xml")
# Printing the result.
print(data_frame)
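Every value in an XML file is text, so the columns of the resulting data frame usually need converting before any arithmetic. The sketch below shows one way to do that on an inline two-record excerpt. The date format "%d/%m/%Y" is an assumption, since 1/1/2012 reads the same as day/month or month/day, and `employees` and `mean_salary` are illustrative names.

```r
library("XML")

# Two-record excerpt of file.xml, kept inline so the example is self-contained.
xml_text <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)
result <- xmlParse(xml_text, asText = TRUE)

# Keep text columns as character vectors rather than factors.
employees <- xmlToDataFrame(result, stringsAsFactors = FALSE)

# Convert the text columns to appropriate types.
employees$id <- as.integer(employees$id)
employees$salary <- as.numeric(employees$salary)
# Assumed format: day/month/year.
employees$date <- as.Date(employees$date, format = "%d/%m/%Y")

# Numeric columns can now be used in calculations.
mean_salary <- mean(employees$salary)

str(employees)
print(mean_salary)
```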