Type | Read/write | Author | Availability |
Direct provider | Read | Finbourne | Provided with LUSID |
The Tools.Parse.Xml
provider enables you to write a Luminesce query that reads XML from one or more cells within a table.
Note: The LUSID user running the query must have sufficient access control permissions to use this provider. This should automatically be the case if you are the domain owner.
See also: Tools.Parse.Csv
Basic usage
@data =
select
'<filename>' as Filename,
'<?<XML-declaration>?>
<rows>
<row>
<<column-name>><some-data></<column-name>>
[<<column-name>>...]
</row>
</rows>' as Content;
@parsed =
use Tools.Parse.Xml with @input
--<optional-arguments>
enduse;
select * from @parsed;
Input tables
Tools.Parse.Xml
takes in one input table and outputs a table of data, see example 1.
Options
Tools.Parse.Xml
has options that enable you to refine a query.
An option takes the form --<option>=<value>
, for example --fileFilter=myFile.xml
. Note no spaces are allowed either side of the =
operator. If an option takes a boolean value, then specifying that option (for example --addFileName
) sets it to True; omitting the option specifies False.
The table below provides information on the most commonly used options for Tools.Parse.Xml
:
Option | Value | Status | Information |
--fileFilter | Regex string, for example MyFile.xml , MyFile.* or File[1-3] | Optional | Based on the Filename column, selects the rows which should be processed. If the Filename column is omitted, names files sequentially as File1 , File2 and so on. |
--addFileName | Boolean | Optional | Adds a column to the result set containing the file the row came from. |
--nodePath | String | Optional | XML Path query that selects the nodes to map to rows. Defaults to //* . |
To see a help screen of all available options, their data types, default values, and an explanation for each, run the following query using a suitable tool:
@x = use Tools.Parse.Xml
--help
enduse;
select * from @x;
Examples
Note: For more example Luminesce queries, visit our GitHub repo.
Example 1: Reading a single XML file from a table
@data =
select
'MyFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>some text</TextColumn1>
<TextColumn2>more text</TextColumn2>
<NumberColumn1>1</NumberColumn1>
<NumberColumn2>2</NumberColumn2>
</row>
</rows>' as Content
;
@parsed =
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row
enduse;
select * from @parsed;
The table of data returned looks like this:
Example 2: Reading multiple XMLs at the same time
In this example, the --addFileName
option is specified to ensure the first column in the table of results contains the filename corresponding to each row of data.
@data =
select
'MyFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>some text</TextColumn1>
<TextColumn2>more text</TextColumn2>
<NumberColumn1>1</NumberColumn1>
<NumberColumn2>2</NumberColumn2>
</row>
</rows>' as Content
union all
select
'MyOtherFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>text from my other file</TextColumn1>
<TextColumn2>more text from my other file</TextColumn2>
<NumberColumn1>10</NumberColumn1>
<NumberColumn2>20</NumberColumn2>
</row>
</rows>' as Content
union all
select
'MyThirdFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>some text from my third file</TextColumn1>
<TextColumn2>more text from my third file</TextColumn2>
<NumberColumn1>100</NumberColumn1>
<NumberColumn2>200</NumberColumn2>
</row>
</rows>' as Content
;
@parsed =
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row
enduse;
select * from @parsed;
The table of data returned looks like this:
Example 3: Returning only rows with a particular filename
In this example, the --fileFilter
option is specified to only select rows with a Filename
value that contains ReportFile
.
@data =
select
'ReportFile_1.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Jane Bloggs</EmployeeName>
<Id>26</Id>
</row>
</rows>' as Content
union all
select
'ReportFile_2.xml',
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Joe Bloggs</EmployeeName>
<Id>58</Id>
</row>
</rows>'
union all
select
'UnrelatedFile.xml',
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<UnrelatedColumn1>some unrelated text</UnrelatedColumn1>
<UnrelatedColumn2>some more text</UnrelatedColumn2>
</row>
</rows>'
;
@parsed =
use Tools.Parse.Xml with @data
--fileFilter=ReportFile.*
--nodePath=/rows/row
enduse;
select * from @parsed;
The table of data returned looks like this, with UnrelatedFile.Xml
being ignored for not matching the specified --fileFilter
value:
Example 4: Returning only particular rows without any filenames
In this example, the --fileFilter
option is specified to only select particular rows to process. As no Filename
column is specified in @data
, Tools.Parse.Xml
gives each row a sequential name which can then be referred to in the --fileFilter
option.
@data =
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Jane Bloggs</EmployeeName>
<Id>26</Id>
</row>
</rows>' as Content
union all
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Joe Bloggs</EmployeeName>
<Id>58</Id>
</row>
</rows>'
union all
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Jane Doe</EmployeeName>
<Id>73</Id>
</row>
</rows>'
union all
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>John Doe</EmployeeName>
<Id>107</Id>
</row>
</rows>'
;
@parsed =
use Tools.Parse.Xml with @data
--fileFilter=File[2,4]
--addFileName
--nodePath=/rows/row
enduse;
select * from @parsed;
The table of data returned looks like this, containing only the second and fourth rows from @data
: