I transpile AutoHotkey v2 code to C# using ANTLR4 and Roslyn. Below is an example that uses only a few grammar elements, described by these rules:

- `singleExpression := singleExpression`, for example `a := 1` or `a := b := 1`. Whitespace is optional, and newlines are allowed on both sides of the assignment operator, so `a:=` followed by `1` on the next line is valid. `(a := 1) := 2` causes `2` to be assigned to `a`.
- `a := 1 2` concatenates the digits to `12` and assigns the result to `a`. A line break is allowed only if the concatenation is inside parentheses or brackets. `a := 1` followed by `hello` on the next line is treated as two statements: an assignment of `1` to `a` and a call of the function `hello`. `a := (1` followed by `2)` on the next line is treated as one statement, and `a` is assigned `12`. Explicit concatenation is also possible with the `.` operator, in which case there must be whitespace or a newline on both sides of it. If there isn't whitespace on both sides, it's an object member access.
- Hotkey: a key followed by `::`, then either another hotkey (on a separate line) or a statement. For example, `a::b := 1` means "create a hotkey for the key `a` which then assigns `1` to variable `b`". `a::MsgBox` triggers a function call of `MsgBox`.
- Remap: a key followed by `::`, then another key. For example, `a::b` creates functionality where pressing `a` sends `b` instead. A remap takes priority over a hotkey, so if the second key identifier matches a key name it's considered a remap, otherwise a hotkey. `a::MsgBox` is a hotkey only because a key named `MsgBox` doesn't exist.

I'm trying to make the grammar performant. The expression statement `a := 1` repeated 300,000 times is parsed and executed by AutoHotkey in under 2 seconds, whereas the following simplified grammar takes about 5 seconds in C# just to parse. I'd consider a parse time under 10 seconds acceptable.

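To reproduce the measurement, the test input can be generated with a few lines of C#. This is a minimal sketch meant to run as a standalone program: the file name `test.txt` matches the path the example program reads, while the class `TestFileGenerator` and the helper `BuildInput` are hypothetical names introduced only for illustration.

```csharp
using System.IO;
using System.Text;

namespace AntlrCSharp
{
    // Hypothetical helper, not part of the original project.
    static class TestFileGenerator
    {
        // Builds `count` copies of `statement`, each terminated by a newline,
        // because the grammar expects an EOL after every statement.
        public static string BuildInput(string statement, int count)
        {
            var sb = new StringBuilder((statement.Length + 1) * count);
            for (int i = 0; i < count; i++)
            {
                sb.Append(statement).Append('\n');
            }
            return sb.ToString();
        }

        private static void Main()
        {
            // 300,000 assignment statements, one per line.
            File.WriteAllText("test.txt", BuildInput("a := 1", 300_000));
        }
    }
}
```

Every line ends with a newline on purpose: `sourceElement` requires `statement EOL`, so a final statement without a trailing line break would not match.
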
Simple.g4:

    grammar Simple;

    options {
        caseInsensitive = true;
    }

    program: sourceElements EOF;

    sourceElements: sourceElement+;

    sourceElement
        : statement EOL
        | hotkey EOL
        | remap EOL
        | EOL+
        ;

    hotkey
        : HotkeyTrigger WS? statement
        ;

    remap
        : RemapKey
        ;

    statement
        : expressionStatement
        | functionStatement
        ;

    expressionStatement
        : singleExpression (s? ',' s? singleExpression)*
        ;

    singleExpression
        : singleExpression WS singleExpression
        | singleExpression s '.' s singleExpression
        | <assoc = right> singleExpression WS? ':=' WS? singleExpression
        | primaryExpression
        ;

    primaryExpression
        : Identifier
        | primaryExpression ('.' primaryExpression)+ // Member access
        | DecimalLiteral
        | '(' singleExpression ')'
        ;

    functionStatement
        : primaryExpression
        | primaryExpression WS (singleExpression (WS? ',' WS? singleExpression?)*)
        ;

    s: (WS | EOL)+;

    RemapKey            : HotkeyCharacter '::' HotkeyCharacter;
    HotkeyTrigger       : HotkeyCharacter '::';
    OpenParen           : '(';
    CloseParen          : ')';
    Comma               : ',';
    Dot                 : '.';
    Assign              : ':=';
    DecimalLiteral      : '0' | [1-9] [0-9_]*;
    Identifier          : IdentifierStart IdentifierPart*;
    WS                  : [\t ]+;
    EOL                 : [\r\n]+;
    UnexpectedCharacter : . ;

    fragment IdentifierPart : IdentifierStart | [\p{Mn}] | [\p{Nd}] | [\p{Pc}] | '\u200C' | '\u200D';
    fragment IdentifierStart: [\p{L}] | [$_];
    fragment HotkeyCharacter
        : 'F1'
        | 'Enter'
        | ~[`\r\n ]
        ;

Example C#:

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Text;
    using Antlr4.Runtime;
    using Antlr4.Runtime.Atn;

    namespace AntlrCSharp
    {
        class Program
        {
            private static void Main(string[] args)
            {
                try
                {
                    StringBuilder text = new StringBuilder();
                    string filePath = @"test.txt";
                    try
                    {
                        string fileContent = File.ReadAllText(filePath);
                        text.Append(fileContent);
                    }
                    catch (FileNotFoundException)
                    {
                        Console.WriteLine($"The file at {filePath} was not found.");
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine($"An error occurred: {ex.Message}");
                    }
                    StartSimpleParser(text);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Error: " + ex);
                }
            }

            public static void StartSimpleParser(StringBuilder text)
            {
                Console.WriteLine("Start");
                AntlrInputStream inputStream = new AntlrInputStream(text.ToString());
                SimpleLexer simpleLexer = new SimpleLexer(inputStream);
                CommonTokenStream commonTokenStream = new CommonTokenStream(simpleLexer);
                SimpleParser simpleParser = new SimpleParser(commonTokenStream);

                /*
                // Token dump for debugging. GetAllTokens() consumes the lexer,
                // so call simpleLexer.Reset() before parsing if this is enabled.
                foreach (var token in simpleLexer.GetAllTokens())
                {
                    Console.WriteLine($"Token: {simpleLexer.Vocabulary.GetSymbolicName(token.Type)}, Text: '{token.Text}'" + (token.Channel == SimpleLexer.Hidden ? " (hidden)" : ""));
                }
                */

                // Stop at the first syntax error and report ambiguity diagnostics.
                simpleParser.ErrorHandler = new BailErrorStrategy();
                simpleParser.AddErrorListener(new DiagnosticErrorListener());
                simpleParser.Interpreter.PredictionMode = PredictionMode.LL_EXACT_AMBIG_DETECTION;

                SimpleParser.ProgramContext programContext = simpleParser.program();
                Console.WriteLine("Parsed");

                // MainVisitor is defined elsewhere in the project (not shown here).
                MainVisitor visitor = new MainVisitor();
                visitor.Visit(programContext);
                Console.WriteLine("End");
            }
        }
    }

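The roughly 5 seconds quoted earlier covers only the parse step, not the visitor. A minimal sketch of how such a measurement can be taken with `Stopwatch` follows; the `Timing` class and its `Measure` helper are hypothetical, and the sleeping lambda is only a stand-in for the call being timed (in `StartSimpleParser` that would be `simpleParser.program()`).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

namespace AntlrCSharp
{
    // Hypothetical timing helper, not part of the original program.
    static class Timing
    {
        // Runs `action` once and returns the elapsed wall-clock time.
        public static TimeSpan Measure(Action action)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        private static void Main()
        {
            // Stand-in workload; replace the lambda body with the call to time.
            TimeSpan elapsed = Measure(() => Thread.Sleep(50));
            Console.WriteLine($"Elapsed: {elapsed.TotalMilliseconds:F0} ms");
        }
    }
}
```

Timing the parse call on its own keeps file I/O, lexer construction, and the visitor out of the number, so it stays comparable to the figure above.
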
This grammar has a few problems:

- `singleExpression s? ':=' s? singleExpression` (the assignment alternative with newlines allowed around `:=`) causes a `reportAttemptingFullContext` error with `LL_EXACT_AMBIG_DETECTION`.
- The `RemapKey` definition `HotkeyCharacter '::' HotkeyCharacter` lexes the whole remap as a single token, which means I have to parse it separately later in the visitor.

How do I resolve these issues?
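
To make the second problem concrete: because a remap arrives as one piece of text, the visitor has to take it apart again and apply the remap-versus-hotkey rule from the description. The following is a minimal sketch of that step, meant to run as a standalone program. The class `KeyRules`, its methods, the `TriggerKind` enum, and the tiny `KeyNames` set are all hypothetical stand-ins, not the real key table or the actual visitor code.

```csharp
using System;
using System.Collections.Generic;

namespace AntlrCSharp
{
    enum TriggerKind { Remap, Hotkey }

    // Hypothetical helper; names and the key table are illustrative only.
    static class KeyRules
    {
        // Tiny stand-in for the real list of key names (case-insensitive).
        private static readonly HashSet<string> KeyNames =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "b", "F1", "Enter" };

        // Splits text such as "a::b" into the parts before and after "::".
        // The search starts at index 1 so that a source key of ":" still works.
        public static (string Source, string Rest) Split(string text)
        {
            int separator = text.Length > 1
                ? text.IndexOf("::", 1, StringComparison.Ordinal)
                : -1;
            if (separator < 0)
            {
                throw new FormatException($"No '::' separator in '{text}'.");
            }
            return (text.Substring(0, separator), text.Substring(separator + 2));
        }

        // A remap wins when the text after "::" is a known key name;
        // anything else is treated as a hotkey followed by a statement.
        public static TriggerKind Classify(string text)
        {
            var (_, rest) = Split(text);
            return KeyNames.Contains(rest.Trim()) ? TriggerKind.Remap : TriggerKind.Hotkey;
        }

        private static void Main()
        {
            Console.WriteLine(Classify("a::b"));       // Remap
            Console.WriteLine(Classify("a::MsgBox"));  // Hotkey
            Console.WriteLine(Classify("a::b := 1"));  // Hotkey
        }
    }
}
```

This string-level split is exactly the duplicated work I would like the grammar to make unnecessary.
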