February 8, 2013 in VBA
A frequent question for most database platforms is how to use regular expressions in SQL queries. Regular expressions are an awesome tool for working with text and are built in to just about every widely used programming language, but are not standardized in SQL. Sometimes relational databases are employed to work on chunks of text and having regular expressions available is mighty helpful.
Oracle supports regular expressions natively as does Teradata 14 (first released in February 2012), but most other database platforms require some sort of extensibility mechanism. For SQL Server you have to build CLR wrappers, in DB2 and any version of Teradata before 14.0 you’d have to build a wrapper around Java. In Microsoft Access you also need a wrapper, but it’s really easy to build using the VBScript Regular Expressions engine. In this article I’ll show how to do it.
VBScript Regular Expressions
Describing regular expressions is outside the scope of this post. If you’re not already familiar with them, two great resources are the Regular Expressions documentation for the .NET Framework and the site Regular-Expressions.info. However, you also need to know that the VBScript regular expressions engine does not support all the features most other engines do. So you when you’re constructing your expressions make sure you reference the VBScript-specific regular expression documentation. It can be hard to find when searching MSDN (it no longer has a place in the main library hierarchy), so you may want to bookmark the page for safe keeping.
Bookmark this page: Introduction to Regular Expressions (Scripting)
Ok, let’s do some coding. With a database open in Microsoft Access, open up the VB IDE and add a new standard module. I’ve named mine
RegExpUtil. Now go to Tools > References and check Microsoft VBScript Regular Expressions 5.5, or the latest version available on your system. Now we’ll build VBA wrappers around each of the methods implemented by the RegExp object.
The RegExp object has three properties that affect how regular expressions are applied:
- IgnoreCase: I trust this is self-explanatory
- Global: Determines whether or not to find all possible matches in the input string. If Global is set to false, only the first match will be found or replaced, as applicable.
- MultiLine: Determines whether matches can span accross line breaks.
In each of the VBA methods below, I’ll include optional arguments corresponding to these three properties. Reflecting the fact the SQL is generally case-insensitive by default, I set the default value of IgnoreCase to true. Also, “Global” is a reserved word in VBA, so I use the name “MatchGlobal” for the argument instead.
Let’s start with the simplest function: Test, which just returns a Boolean indicating whether or not a match was found. This is equivalent to the .NET Regex.IsMatch method. In this case I don’t provide a parameter for setting Global, because that doesn’t make any sense for Test. By definition as soon as one match is encountered Test should exit and return true.
Public Function RxTest( _ ByVal SourceString As String, _ ByVal Pattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True) As Boolean With New RegExp .MultiLine = MultiLine .IgnoreCase = IgnoreCase .Global = False .Pattern = Pattern RxTest = .Test(SourceString) End With End Function
The logical next step is to actually extract the match. In this function we are just returning the first match found, so we still don’t provide a parameter to set Global. Now one consideration is what to return if there is no match at all. In languages where strings are value types (like classic VB and VBA) you would normally return an empty string. However, my target for these functions is integrating with SQL, where null semantics are the norm. Put simply, I want a column to be null if there is no match, not populated with a zero-length string, which is not the same thing. In order to be able to return null we have to declare the return type of the function as a Variant (or don’t specify a return type at all, which is the same thing in VBA). Note also that the MatchCollection object returned by the RegExp.Execute Method has a 0-based index, unlike most VBA and COM collections, which are 1-based.
Public Function RxMatch( _ ByVal SourceString As String, _ ByVal Pattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True) As Variant Dim oMatches As MatchCollection With New RegExp .MultiLine = MultiLine .IgnoreCase = IgnoreCase .Global = False .Pattern = Pattern Set oMatches = .Execute(SourceString) If oMatches.Count > 0 Then RxMatch = oMatches(0).Value Else RxMatch = Null End If End With End Function
Sometimes you want all the matches. This function is less likely to be used from SQL, where there is no concept of an array or collection of values, but I use it all the time in other VBA code. The option here is whether to return a Collection or an Array. They both have their own strengths, but I’ll go with an Array. Specifically, I’ll return a Variant array because 0-length statically typed arrays are rather tricky in VBA. Also in VBA we can’t actually declare a function return type as an array so we’ll once again have to declare it as a Variant. Finally, this time we will actually provide a Global parameter. The default is set to true because normally if you’re asking for Matches (plural) you do in fact want more than the first match.
Public Function RxMatches( _ ByVal SourceString As String, _ ByVal Pattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True, _ Optional ByVal MatchGlobal As Boolean = True) As Variant Dim oMatch As Match Dim arrMatches Dim lngCount As Long ' Initialize to an empty array arrMatches = Array() With New RegExp .MultiLine = MultiLine .IgnoreCase = IgnoreCase .Global = MatchGlobal .Pattern = Pattern For Each oMatch In .Execute(SourceString) ReDim Preserve arrMatches(lngCount) arrMatches(lngCount) = oMatch.Value lngCount = lngCount + 1 Next End With RxMatches = arrMatches End Function
Finally, we need a Replace function. This is where the Global option has the most impact. The return type is a String because in the event there is no match we’ll just get the entire input string intact, not an empty string. This is also the only place we’ll actually call the RegExp.Replace method. For those like myself who enjoy the named submatches provided by many regular expressions I must disappoint you: VBScript RegExp only supports numbered sub expressions. When you have nested parenthesis this can get tricky, so make sure you test, test, test. Of course you already do that, right?
Public Function RxReplace( _ ByVal SourceString As String, _ ByVal Pattern As String, _ ByVal ReplacePattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True, _ Optional ByVal MatchGlobal As Boolean = True) As String With New RegExp .MultiLine = MultiLine .IgnoreCase = IgnoreCase .Global = MatchGlobal .Pattern = Pattern RxReplace = .Replace(SourceString, ReplacePattern) End With End Function
Using the VBScript Regular Expressions engine, you can easily incorporate regular expressions into Access VBA and SQL functions. You have to decide whether you want to favor VBA semantics and return empty strings or SQL semantics and return nulls in the case that there is no match. VBScript Regular Expressions don’t support all the options many other regex engines do, so be sure to check the documentation on MSDN (link above).