May 6, 2013 in VBA
Language designers and runtime authors are always telling us to ignore implementation details. For the most part, including for VBA, that’s good advice. But sometimes you need or want know how things really work under the covers. In VBA this often arises when importing windows API functions which require pointers. To really understand pointers in VBA you first have to understand variables in VBA. In this article I’ll delve into the gritty implementation details of variables in VBA.
Language vs runtime: An aside on terminology
The VBA language and the VB runtime are not technically the same thing. The language is the syntax described by the VBA language specification, while the runtime is a bundle of utility functions that make implementing the language a lot easier. The runtime is a specific COM dll: msvbvmXX.dll for stand-alone classic VB applications, VBEX.dll for Office. You could use the functions in the VB runtime library from any COM-aware language. The actual implementation of VBA is in the compiler included with the last version of Visual Studio before .NET, and with Microsoft Office. These compilers in turn use the VB runtimes for a lot of the implementation work. For the purposes of this article I’ll treat the language, the compiler, and the runtime as the same thing. The fact is there isn’t any widely distributed implementation of the VBA language apart from Microsoft’s VB compiler and runtime.
VBA is a COM Automation wrapper
The VBA language and its runtime are built on and for COM Automation. As a result, the implementation details of VBA’s data types closely follow the types and interfaces defined for Automation. Some key examples include Automation’s BSTR, CURRENCY, DECIMAL, SAFEARRAY, and VARIANT types.
VBA Variables in general
I don’t work for Microsoft, and I haven’t tried to decompile the VB runtime. So this is really just speculation. But just as an orbital model of the atom works pretty well for a lot of chemistry even if it’s not technically correct, I can say that the following describes how variables work in VBA. Put another way: variables in VBA behave as if all the following were true.
The variable table
I think of the VB runtime as maintaining a table of variables. Each variable contains at least three pieces of information. While it is not possible to directly read this table (if it exists) such as with RtlMoveMemory, it is possible to get two of the three pieces of information using functions provided by VBA:
|Attribute||Obtained from function||Description|
|Scope||n/a||The scope of the variable, used to determine when to automatically erase it.|
|Data Type||VarType||An enumeration value from VbVarType, which tells the runtime how to interpret the data stored by the variable. The values of VbVarType are in fact a subset of the COM VARENUM enumeration.|
|Address||VarPtr||The memory address where the content of the variable is stored.|
Variable content <> variable value
In the above table, I indicate that the
VarPtr function returns the memory address for the variable’s content. This is not necessarily the variable’s value in the semantic sense. For example, after the following code executes, the value of the variable
myVar is the string “Hello”. But if you read the memory at the address returned by
VarPtr(myVar), you would not get the start of the string “Hello.” What you would get instead is described in detail in the rest of this article.
Dim myVar As Variant myVar = "Hello"
VBA Data Types, by Implementation
One of the purposes of a runtime is to simplify and hide complicated implementation details, like how variables actually work. At a VBA language or semantic level, there are only two basic types of variables : values and references. But from an implementation perspective there are really six types of variables. The rest of this article is my description of these six different variable implementation types.
For each category of variable, I list the following information:
|Var Types||The built-in variable types that are in this category, per the VbVarType enumeration|
|Contents of variable||If you directly read the memory at the address returned by
|Description||General information about the category.|
|On Assignment||Exactly what happens when you assign a value to a variable in this category|
|On Erase||Exactly what happens when the variable goes out of scope and is “erased” by the runtime|
Short value type
|Var Types||Bool, Byte, Currency, Date, Integer, Long, LongPtr, LongLong, Single, Double|
|Contents of variable||The literal value|
|Description||All of these numeric value types fit in 4 or 8 bytes (one or two 32-bit words, or a single 64-bit word). The actual numeric values of these variables are stored at the address returned by
|On Assignment||The VB runtime copies the literal value from the source variable’s content address to the destination variable’s content address.|
|On Erase||Deallocate the small chunk of memory at the variable’s content address|
|Contents of variable||The address of the 5th byte of a BSTR structure.|
|Description||If you follow a string variable’s pointer you won’t actually get the start of a string. Instead, you’ll get another pointer. If you follow that pointer you’ll get the 5th byte of a BSTR structure, a COM Unicode string buffer. A BSTR starts with a 4-byte integer which specifies the length of the string, and is terminated by two null characters. But per the BSTR spec the pointer value points to the start of the actual string, in fact the beginning of the 5th byte of the BSTR structure. This allows BSTRs to be used like pointers to normal C-style null-terminated strings. But a COM consumer can back up 4 bytes, read the length, and know exactly how long the string is without having to scan for the null terminator. This also allows COM strings to include null characters within the string.|
|On Assignment||The VB runtime truly creates a new copy of a string upon assignment. Both a new pointer (for the string variable) and a new BSTR structure (for the string’s value) are allocated. Then the contents of the source BSTR are copied to the destination BSTR, and the address of the new BSTR is written to the string variable’s content address.|
|On Erase||Deallocate the memory occupied by the target BSTR structure, and deallocate the pointer that constitutes the string variable itself.|
|Var Types||Object, (and all COM object types)|
|Contents of variable||The address of a COM object — that is, an IUknown interface|
|Description||The Object type is the core connection between VBA and COM as a whole. The VBA Object type itself does not expose the three methods of the actual IUnknown interface to VBA programmers, but it uses them. Whenever the
|On Assignment||The VB runtime calls the AddRef method on the target object, then writes the pointer to the object’s IUnknown interface to the variable’s content address.|
|On Erase||Call the target object’s Release method, then deallocate the pointer that constitutes the object variable itself.|
|Var Types||All array types|
|Contents of variable||The address of the start of a SAFEARRAY header record.|
|Description||The VB runtime uses COM safearrays to implement its array types. Safearrays come with a whole suite a manipulation functions, but all of these are hidden from the VBA programmer. Instead the runtime invokes them all behind the scenes to carry out the actions required by the VB program. Most of the time these actions don’t even seem like method invocations to the VBA programmer. For example, consider the following snippet:
Dim myArray() As Long ReDim myArray(1 to 2, -3 to 3) myArray(1,-1) = 5
To the VBA programmer, no methods have been invoked. But behind the scenes the VB runtime has been busy. The ReDim statement invokes all of the following:
Then, the assignment statement actually invokes the SafeArrayPutElement method, which in turns calls the SafeArrayLock method before assigning the element and the SafeArrayUnlock method afterwords. Oh, and by the way if the element that was overwritten was an Object (IUnknown) then that object’s Release method will first be called; if the overwritten element was a String (BSTR) or Variant (VARIANT) it will also be cleaned up and the memory released.
Three pointers for an array
Arrays in VBA are built from at least 3 pointers. First, calling
|On Assignment||Although arrays are technically reference types (array variables contain a pointer), the VBA language semantically treats arrays as value types. That means assigning one array to another causes the entire array content to be copied. Behind the scenes that means a new SAFEARRAY struct must be created, a new buffer allocated for the array elements, and all the elements copied. The Automation API SafeArrayCopy method does all of this and returns a pointer to the new SAFEARRAY struct. The VBA runtime then simply writes this pointer to the content of the new array variable.|
|On Erase||If the variable is a dynamic length array that has not been initialized, nothing needs to be done beyond reclaiming the 4 or 8 bytes of the array variable itself. If the array has been initialized, the SafeArrayDestroy method is called to clean up all the individual elements contained in the array, and then the SAFEARRAY structure. Finally the runtime reclaims the pointer that constitutes the array variable itself.|
|Contents of variable||The address of the start of the VARIANT structure.|
|Description||The COM VARIANT is a wrapper type the can contain or point to any other type, except user-defined types declared with the
The second half of the VARIANT holds the variable’s actual contents. For data types that require less than 8 bytes, the Variant puts the content at the start of these 8 bytes and leaves the rest empty. In other words, a Variant/Byte will put the actual Byte value in the first byte and least the remaining 7 bytes blank. A 16-bit Integer will occupy the first 2 bytes and leave the remaining 6 bytes empty. A 32-bit Long, a Single, and a pointer type (array, object, or string) on a 32-bit platform will occupy the first 4 bytes and leave the remaining 4 bytes empty. Doubles, Dates, and pointers on 64-bit platforms of course use all 8 bytes.
There is a special type of Variant, too: a ByRef Variant. ByRef Variants are only created by the runtime when a non-Variant variable is passed as a method parameter that was declared ByRef as a Variant type. In this case, the VARENUM value at the start of the variant will include the VT_BYREF flag (0x4000), and the content of the variant will always be a pointer. The pointer points to another VBA variable; it is the exact same pointer as returned by calling
|On Assignment||As with arrays, the VB runtime invokes a number of COM methods behind the scenes to work with Variant variables (see Variant Manipulation Functions). First, a VARIANT structure must be allocated using the VariantInit method. Next, the runtime assigns the specific value of the variant following all the steps described for the other types here, and places the result in the second half of the VARIANT structure. The VARIANT structure allocated is the content of the VBA variable; the
|On Erase||Unless the Variant has the VT_BYREF flag set, the runtime first erases the variant’s inner content by calling the VariantClear method, which in turn takes all the steps to erase the specific object or data contained in the Variant. Then the runtime deallocates the VARIANT structure. The content of a Variant variable is the VARIANT structure, so nothing more needs to be done.|
User Defined Type
|Var Types||All user defined types|
|Contents of variable||The contents of the user-defined type structure.|
|Description||Most languages support compound data types. C, C++, and C# call them
|On Assignment||The VB runtime allocates a new chunk of memory big enough to hold the new UDT value, and copies all the data to the new memory.
|On Erase||Simply deallocate the memory used by the UDT.|