Type | Read/write | Author | Availability |
Direct provider | Read | Finbourne | Provided with LUSID |
The Drive.Csv provider enables you to write a Luminesce query that extracts data from one or more CSV or similar pipe-delimited or row-based text files stored in Drive.
Note: The LUSID user running the query must have sufficient access control permissions to both use the provider and enumerate target files and folders in Drive. This should automatically be the case if you are the domain owner.
The query returns a table of data assembled from the contents of the file or files in the order they are read.
See also: Drive.Excel, Drive.Sqlite, Drive.Xml, Drive.RawText, Drive.File
Basic usage
<options>
enduse;
select * from @x
Options
Drive.Csv has options that enable you to filter or refine a query.
Note: The --file option is mandatory.
An option takes the form --<option>=<value>, for example --file=trade-file.csv. Note no spaces are allowed either side of the = operator.
If an option:
- takes a boolean value, then specifying that option (for example --noHeader) sets it to True; omitting the option specifies False
- takes multiple string values, then specify a comma-separated list, for example --names=My,Column,Names
To see a help screen of available options, their data types, default values, and an explanation for each, run the following query using a suitable tool:
enduse;
select * from @x
Examples
In the following examples, the select * from @x syntax at the end prints the table of data assembled by the query.
Note: For more examples, try the Luminesce Github repo.
Example 1: Extract data from a particular CSV file
--file=/trade-files/eod.csv
enduse;
select * from @x
Example 2: Extract data from a matching CSV file
In this example, @@today is a scalar variable that must resolve to exactly one column and one row of data.
@x = use Drive.Csv with @@today
--file=/trade-files/eod-{@@today}.csv
enduse;
select * from @x
Example 3: Extract data from multiple CSV files matched using a regular expression
If --file is a folder in Drive, you must specify the --folderFilter option with a regular expression to nominate one or more files to read.
In this example, data is extracted from three files eod-1.csv, eod-2.csv and eod-3.csv, but not from eod-4.csv. The --addFileName option adds an extra column to the table of results showing the source of each record.
--file=/trade-files
--folderFilter=eod-[1-3].csv
--addFileName
enduse;
select * from @x
In this example, data is extracted from any file with a File_ prefix, 8 digits, an underscore, and then 6 digits, for example File_01072021_123456.csv:
--file=/TestLocation/
--folderFilter=File_\d{8}_\d{6}.csv
enduse;
select * from @datatoload
Example 4: Extract data from multiple files stored in a ZIP archive
In this example, daily.zip is stored in the root Drive folder; data is extracted from any file within it that just has alphabetic characters in its file name.
--file=daily.zip
--zipFilter=[a-zA-Z].csv
--addFileName
enduse;
select * from @x
Example 5: Extract data from a CSV file with no header row, instead supplying column names
Note the number of column names specified must match the inbound number of columns.
--file=/trade-files/daily.csv
--noHeader
--names=Equity,Units,Cost
enduse;
select * from @x
Example 6: Extract encoded data from a CSV file
In this example, the --encoding
option is specified to extract UTF-8 encoded data. See the Drive.Csv
help screen to view all supported encoding systems.
--file=/trade-files/daily.csv
--encoding=utf-8
enduse;
select * from @x
Example 7: Infer data types of columns from the first 10 rows
In this example, the --skipInvalidRows option is specified to ignore cells that cannot be successfully converted.
--file=/trade-files/daily.csv
--inferTypeRowCount=10
--skipInvalidRows
enduse;
select * from @x