I've started learning Azure Synapse (and other Azure data engineering related stuff). I wanted to run a sample queries from Azure Synapse gallery, unfortunatelly in majority of cases I am getting errors:
"cannot be opened because it does not exist or it is used by another process"
It works when I am querying files on my storage account, it works when I am adding datasets from Synapse gallery and querying it.
It works when I am trying to query, for example file (from synapse gallery):
.csv
but it fails when I am trying to query files from
for example:
.csv
I am watching some tutorial, and on video it works without any other additional settings.
/* Explicitly specify schema */
select top 10 *
from openrowset(
bulk '.csv',
format = 'csv',
parser_version ='2.0',
firstrow = 2
) with (
date_rep date 1,
cases int 5,
geo_id varchar(6) 8
) as rows
/* Windows style new line */
SELECT *
FROM OPENROWSET(
BULK '.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
FIELDTERMINATOR =',',
ROWTERMINATOR = '
'
)
WITH (
[country_code] VARCHAR (5) COLLATE Latin1_General_BIN2,
[country_name] VARCHAR (100) COLLATE Latin1_General_BIN2,
[year] smallint,
[population] bigint
) AS [r]
WHERE
country_name = 'Luxembourg'
AND year = 2017;
1st is running ok, 2nd ends with error. Both from synapse gallery.
Any ideas how to run it and/or why is it not running? Thank you in advance
I've started learning Azure Synapse (and other Azure data engineering related stuff). I wanted to run a sample queries from Azure Synapse gallery, unfortunatelly in majority of cases I am getting errors:
"cannot be opened because it does not exist or it is used by another process"
It works when I am querying files on my storage account, it works when I am adding datasets from Synapse gallery and querying it.
It works when I am trying to query, for example file (from synapse gallery):
https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv
but it fails when I am trying to query files from
https://sqlondemandstorage.blob.core.windows.net
for example:
https://sqlondemandstorage.blob.core.windows.net/public-csv/population/population.csv
I am watching some tutorial, and on video it works without any other additional settings.
/* Explicitly specify schema */
select top 10 *
from openrowset(
bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',
format = 'csv',
parser_version ='2.0',
firstrow = 2
) with (
date_rep date 1,
cases int 5,
geo_id varchar(6) 8
) as rows
/* Windows style new line */
SELECT *
FROM OPENROWSET(
BULK 'https://sqlondemandstorage.blob.core.windows.net/public-csv/population/population.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
FIELDTERMINATOR =',',
ROWTERMINATOR = '
'
)
WITH (
[country_code] VARCHAR (5) COLLATE Latin1_General_BIN2,
[country_name] VARCHAR (100) COLLATE Latin1_General_BIN2,
[year] smallint,
[population] bigint
) AS [r]
WHERE
country_name = 'Luxembourg'
AND year = 2017;
1st is running ok, 2nd ends with error. Both from synapse gallery.
Any ideas how to run it and/or why is it not running? Thank you in advance
As you mentioned you are able to query the file from the ADLS.
The below are the steps that help you to query using the OPENROWSET
Function.
You need to create an external table on top of the file that is sitting in the ADLS.
Because at the time of external table creation you will be creating DATA SOURCE.
IF NO DATA SOURCE:
Using OPENROWSET
can read file contents directly from the URL specified in the BULK
option without requiring a DATA_SOURCE
.
USING DATA SOURCE:
OPENROWSET
with DATA_SOURCE
allows access to files stored in a specified storage account.
Know more about how to use OPENROWSET using serverless SQL pool in Azure Synapse Analytics
SELECT TOP 100 *
FROM OPENROWSET(
BULK 'new/ecdc_cases.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE
) AS [file];
Results:
Below is the script to create an external table:
CREATE DATABASE <Your Serverless DB NAME>
Use <Your Serverless DB NAME>
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<YOUR MASTER KEY PASSWORD>'
CREATE DATABASE SCOPED CREDENTIAL <YOUR DATABASE SCOPED CREDENTIALS>
WITH IDENTITY ='SHARED ACCESS SIGNATURE',
SECRET ='Your SAS Token'
GO;
CREATE EXTERNAL DATA SOURCE <YOUR DATA SOURCE NAME>
WITH( LOCATION = 'https://<YOUR STORAGE ACCOUNT>.dfs.core.windows.net/'
, CREDENTIAL = credentialobject20
);
CREATE EXTERNAL FILE FORMAT <YOUR FILE FORMAT>
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2)
)
CREATE EXTERNAL TABLE <YOUR EXTERNAL TABLE> (
[dateID] INT,
[product_id] varchar(500),
[location_id] varchar(500),
[visits] INT
)
WITH(
LOCATION ='</FILE_PATH/TO.CSV>',
DATA_SOURCE = <YOUR DATA SOURCE NAME>,
FILE_FORMAT = <YOUR FILE FORMAT>
)
GO
You can also refer to the Create and use native external tables using SQL pools in Azure Synapse Analytics
You can also create an external table using the file as well.
below are the steps:
Once you create the external table you will able to see the entire script which is auto genrated by the system and when you run the select statement it gives results.
Results: