Looking for a SQL parser for C#? dotnet?

If you dig around a little you’d quickly find that there isn’t any free SQL parser library around.

Commercial parsers for example General SQL Parser http://www.sqlparser.com costs around USD450+(But thing is, it supports all major database vendor specific SQL syntax)
Microsoft TSql100Parser (http://msdn.microsoft.com/en-us/library/microsoft.data.schema.scriptdom.sql.tsql100parser(v=vs.100).aspx) is nice but you’d need Microsoft.Data.Schema.ScriptDom.dll amd Microsoft.Data.Schema.ScriptDom.Sql.dll – unfortunately you need Visual Studio Team System 2008 Database Edition to have access to these two dll’s.
Offers from Canadian Developer Sergey Gorbenko via Code Project is free (http://www.codeproject.com/Articles/32524/SQL-Parser) but unfortunately only exposes OrderByClause and WhereClause

There are couple Open Source SQL Parsers, but nothing from dotnet.

Java

http://jsqlparser.sourceforge.net/

http://zql.sourceforge.net/

C++

https://parsersql.codeplex.com/

If you want a dotnet SQL parser, and you don’t want to pay for it (Or that you prefer open source), then your only options are Irony parser http://irony.codeplex.com/ (Generic language parser), with SqlGrammar (https://github.com/Alxandr/Irony/blob/master/Irony.Samples/SQL/SqlGrammar.cs)

Even legendary Scott Hanselman also written a piece on this http://www.hanselman.com/blog/TheWeeklySourceCode59AnOpenSourceTreasureIronyNETLanguageImplementationKit.aspx
However, it’s not without limitations. Here’s a simple test drive.

Example 1: Simple SELECT
SELECT Col1, Col2, Col3 FROM SomeTable WHERE SomeTable=1

From Visual Studio Watch Window:
“whereClauseOpt” = ExprTree.Root.ChildNodes[0].ChildNodes[5].Term.Name
SomeTable = ExprTree.Root.ChildNodes[0].ChildNodes[5].ChildNodes[1].ChildNodes[0].ChildNodes[0].Token.Text
“=” = ExprTree.Root.ChildNodes[0].ChildNodes[5].ChildNodes[1].ChildNodes[1].ChildNodes[0].Token.Text
1 = ExprTree.Root.ChildNodes[0].ChildNodes[5].ChildNodes[1].ChildNodes[2].Token.Value

Example 2: Alias handling
This is OK:
SELECT Col1, Col2, Col3 FROM SomeTable left join SomeOtherTable on SomeTable.Id=SomeOtherTable.Id WHERE SomeTable=1
This is also OK (Column alias):
SELECT Col1 as XXX, Col2 as YYY, Col3 as ZZZ FROM SomeTable left join SomeOtherTable on SomeTable.Id=SomeOtherTable.Id WHERE SomeTable=1
SELECT Col1 XXX, Col2 YYY, Col3 ZZZ FROM SomeTable left join SomeOtherTable on SomeTable.Id=SomeOtherTable.Id WHERE SomeTable=1

This failed (Column Alias) with ParserMessages ‘Syntax error, expected: ., AS, id_simple, ,, INTO, FROM, WHERE, GROUP, HAVING, ORDER, ), ;, CREATE, ALTER, DROP, SELECT, INSERT, UPDATE, DELETE, GO
SELECT Col1 ‘XXX’, Col2 ‘YYY’, Col3 ‘ZZZ’ FROM SomeTable left join SomeOtherTable on SomeTable.Id=SomeOtherTable.Id WHERE SomeTable=1

This also failed (Table Alias) with ParserMessages ‘Syntax error, expected: ., ), ,, INNER, LEFT, RIGHT, JOIN, WHERE, GROUP, HAVING, ORDER, ;, CREATE, ALTER, DROP, SELECT, INSERT, UPDATE, DELETE, GO, ON’
SELECT Col1, Col2, Col3 FROM SomeTable a left join SomeOtherTable b on a.Id=b.Id WHERE SomeTable=1

Example 3. Subquery in SELECT list
The following failed with ParserMessages ‘Syntax error, expected: INTO, FROM, WHERE, GROUP, HAVING, ORDER, ), ;, CREATE, ALTER, DROP, SELECT, INSERT, UPDATE, DELETE, GO’

SELECT *, (select productName from Product where Orders.ProductId=Product.ProductId) Last_OrderDate FROM Orders

Example 4. Subquery in FROM clause
The following failed with ParserMessages ‘Syntax error, expected: id_simple’

SELECT ProductId, ProductName FROM (select top 10 ProductId, ProductName from Products) as SomeTable

Example 5. Subquery in WHERE clause
This following statement was parsed successfully!

SELECT OrderId, ProductName FROM Orders WHERE CustomerId in (SELECT CustomerId FROM BannedCustomers)

Example 6. Stored Procedures and UDF’s calls?
They all failed with ParserMessages ‘Syntax error, expected: CREATE, ALTER, DROP, SELECT, INSERT, UPDATE, DELETE, GO’

EXEC sp_SomeStoredProc
EXEC SomeDB..sp_SomeStoredProc

Also tried function/UDF calls:
EXEC @ret = dbo.ufnGetSalesOrderStatusText @Status= 5

Example 7. Declare variables?
Failed with ParserMessages ‘Syntax error, expected: CREATE, ALTER, DROP, SELECT, INSERT, UPDATE, DELETE, GO’

DECLARE @ret nvarchar(15)= NULL;

Example 8. CREATE PROCEDURE|FUNCTION|VIEW
The following failed with ParserMessages ‘Syntax error, expected: TABLE, UNIQUE, INDEX’

CREATE PROCEDURE spSelectStudent (@Course INTEGER, @Grade INTEGER) as BEGIN SELECT * FROM Courses WHERE Std_Course=@Course AND Std_Grade <= @Grade END
CREATE PROC spSelectStudent (@Course INTEGER, @Grade INTEGER) as BEGIN SELECT * FROM Courses WHERE Std_Course=@Course AND Std_Grade <= @Grade END
CREATE FUNCTION ISOweek (@DATE datetime) RETURNS int AS BEGIN DECLARE @ISOweek int; SET @ISOweek= DATEPART(wk,@DATE)+1; RETURN(@ISOweek);RETURN(@ISOweek); END
CREATE VIEW view_name AS SELECT Col1, Col2, Col3 FROM table_name WHERE Id<100

Example 9. USE SomeDatabase GO
Failed with ParserMessages ‘Syntax error, expected: CREATE, ALTER, DROP, SELECT, INSERT, UPDATE, DELETE, GO’

Quick scans at SqlGrammar following constructs also not supported:
a. CURSOR
b. IF-ELSE
c. TRY-CATCH

Example 10. And, of course, if your SQL is dynamically concatenated…

declare @stmt as varchar(255)

declare @SomeTable as varchar(25)

set @SomeTable = “TABLE_ABC”

set @stmt = “SELECT * FROM ” + @SomeTable

 

Almost forgot to mention … Irony is great. Except you need to drill quite deep into the object’s attribute hierarchy to find the Select|Where|GroupByClause’s …

This said, you need to do some digging into the object’s attribute hierarchy, even if you pay license fee for a commercial SQL parser such as http://www.sqlparser.com/.

Lets say you want to parse this SQL statement and identify Source tables and the one Destination table.
insert into Table1 select Col1, Col2, Col3 from Table2 a inner join (select Col3 from Table3 left join Table4 on Table3.DT=Table4.DT where Table4.Col4='US') b on a.ID=b.ID

With General SQL Parser, table nodes are of type TLz_Attr, with Location = TLzOwnerLocation.eltable. However, there’s nothing in the object’s attribute hierarchy with which you can distinguish Destination tables from Source table. You need to hard code the logic in your code: First table is the one Destination table.

Adding to the challenge – imagine your objective, in additional to extracting Source tables/Destination table, that you also need to track which field(s) is coming from which Source table?

2 comments

Leave a reply to Dan Kelly Cancel reply