VBA Internals: What’s in a variable

May 6, 2013 in VBA

Language designers and runtime authors are always telling us to ignore implementation details. For the most part, including for VBA, that’s good advice. But sometimes you need or want know how things really work under the covers. In VBA this often arises when importing windows API functions which require pointers. To really understand pointers in VBA you first have to understand variables in VBA. In this article I’ll delve into the gritty implementation details of variables in VBA.

VBA background

Language vs runtime: An aside on terminology

The VBA language and the VB runtime are not technically the same thing. The language is the syntax described by the VBA language specification, while the runtime is a bundle of utility functions that make implementing the language a lot easier. The runtime is a specific COM dll: msvbvmXX.dll for stand-alone classic VB applications, VBEX.dll for Office. You could use the functions in the VB runtime library from any COM-aware language. The actual implementation of VBA is in the compiler included with the last version of Visual Studio before .NET, and with Microsoft Office. These compilers in turn use the VB runtimes for a lot of the implementation work. For the purposes of this article I’ll treat the language, the compiler, and the runtime as the same thing. The fact is there isn’t any widely distributed implementation of the VBA language apart from Microsoft’s VB compiler and runtime.

VBA is a COM Automation wrapper

The VBA language and its runtime are built on and for COM Automation. As a result, the implementation details of VBA’s data types closely follow the types and interfaces defined for Automation. Some key examples include Automation’s BSTR, CURRENCY, DECIMAL, SAFEARRAY, and VARIANT types.

VBA Variables in general

I don’t work for Microsoft, and I haven’t tried to decompile the VB runtime. So this is really just speculation. But just as an orbital model of the atom works pretty well for a lot of chemistry even if it’s not technically correct, I can say that the following describes how variables work in VBA. Put another way: variables in VBA behave as if all the following were true.

The variable table

I think of the VB runtime as maintaining a table of variables. Each variable contains at least three pieces of information. While it is not possible to directly read this table (if it exists) such as with RtlMoveMemory, it is possible to get two of the three pieces of information using functions provided by VBA:

Attribute Obtained from function Description
Scope n/a The scope of the variable, used to determine when to automatically erase it.
Data Type VarType An enumeration value from VbVarType, which tells the runtime how to interpret the data stored by the variable. The values of VbVarType are in fact a subset of the COM VARENUM enumeration.
Address VarPtr The memory address where the content of the variable is stored.
Variable content <> variable value

In the above table, I indicate that the VarPtr function returns the memory address for the variable’s content. This is not necessarily the variable’s value in the semantic sense. For example, after the following code executes, the value of the variable myVar is the string “Hello”. But if you read the memory at the address returned by VarPtr(myVar), you would not get the start of the string “Hello.” What you would get instead is described in detail in the rest of this article.

Dim myVar As Variant
myVar = "Hello"

VBA Data Types, by Implementation

One of the purposes of a runtime is to simplify and hide complicated implementation details, like how variables actually work. At a VBA language or semantic level, there are only two basic types of variables : values and references. But from an implementation perspective there are really six types of variables. The rest of this article is my description of these six different variable implementation types.

Key

For each category of variable, I list the following information:

Var Types The built-in variable types that are in this category, per the VbVarType enumeration
Contents of variable If you directly read the memory at the address returned by VarPtr, this is what you would get
Description General information about the category.
On Assignment Exactly what happens when you assign a value to a variable in this category
On Erase Exactly what happens when the variable goes out of scope and is “erased” by the runtime
Short value type
Var Types Bool, Byte, Currency, Date, Integer, Long, LongPtr, LongLong, Single, Double
Contents of variable The literal value
Description All of these numeric value types fit in 4 or 8 bytes (one or two 32-bit words, or a single 64-bit word). The actual numeric values of these variables are stored at the address returned by VarPtr. The length and binary format of the data is of course determined by the specific data type.
On Assignment The VB runtime copies the literal value from the source variable’s content address to the destination variable’s content address.
On Erase Deallocate the small chunk of memory at the variable’s content address
String
Var Types String
Contents of variable The address of the 5th byte of a BSTR structure.
Description If you follow a string variable’s pointer you won’t actually get the start of a string. Instead, you’ll get another pointer. If you follow that pointer you’ll get the 5th byte of a BSTR structure, a COM Unicode string buffer. A BSTR starts with a 4-byte integer which specifies the length of the string, and is terminated by two null characters. But per the BSTR spec the pointer value points to the start of the actual string, in fact the beginning of the 5th byte of the BSTR structure. This allows BSTRs to be used like pointers to normal C-style null-terminated strings. But a COM consumer can back up 4 bytes, read the length, and know exactly how long the string is without having to scan for the null terminator. This also allows COM strings to include null characters within the string.
On Assignment The VB runtime truly creates a new copy of a string upon assignment. Both a new pointer (for the string variable) and a new BSTR structure (for the string’s value) are allocated. Then the contents of the source BSTR are copied to the destination BSTR, and the address of the new BSTR is written to the string variable’s content address.
On Erase Deallocate the memory occupied by the target BSTR structure, and deallocate the pointer that constitutes the string variable itself.
COM Object
Var Types Object, (and all COM object types)
Contents of variable The address of a COM object — that is, an IUknown interface
Description The Object type is the core connection between VBA and COM as a whole. The VBA Object type itself does not expose the three methods of the actual IUnknown interface to VBA programmers, but it uses them. Whenever the Set operator is invoked to assign a reference to an Object variable, the runtime calls the AddRef method to increment the target object’s reference count. When a Object variable releases its reference either by going out of scope or by explicitly setting it to Nothing, the runtime calls the target object’s Release method to decrement the object’s reference count. Finally, the runtime (late-binding) member resolution provided by VBA when using a variable declared as either Variant or Object is implemented by using the QueryInterface method.
On Assignment The VB runtime calls the AddRef method on the target object, then writes the pointer to the object’s IUnknown interface to the variable’s content address.
On Erase Call the target object’s Release method, then deallocate the pointer that constitutes the object variable itself.
Array
Var Types All array types
Contents of variable The address of the start of a SAFEARRAY header record.
Description The VB runtime uses COM safearrays to implement its array types. Safearrays come with a whole suite a manipulation functions, but all of these are hidden from the VBA programmer. Instead the runtime invokes them all behind the scenes to carry out the actions required by the VB program. Most of the time these actions don’t even seem like method invocations to the VBA programmer. For example, consider the following snippet:

Dim myArray() As Long 
ReDim myArray(1 to 2, -3 to 3)
myArray(1,-1) = 5

To the VBA programmer, no methods have been invoked. But behind the scenes the VB runtime has been busy. The ReDim statement invokes all of the following:

Then, the assignment statement actually invokes the SafeArrayPutElement method, which in turns calls the SafeArrayLock method before assigning the element and the SafeArrayUnlock method afterwords. Oh, and by the way if the element that was overwritten was an Object (IUnknown) then that object’s Release method will first be called; if the overwritten element was a String (BSTR) or Variant (VARIANT) it will also be cleaned up and the memory released.

Three pointers for an array

Arrays in VBA are built from at least 3 pointers. First, calling VarPtrArray() on a array variable (see VBA Internals: Getting Pointers) will get you the address to the variable content. If you directly read the variable content you will get another pointer — to the start of a SAFEARRAY struct. Finally, if you directly read from byte offset 12 (the pvData field) of the SAFEARRAY, you will get a pointer to the start of the array elements data.

On Assignment Although arrays are technically reference types (array variables contain a pointer), the VBA language semantically treats arrays as value types. That means assigning one array to another causes the entire array content to be copied. Behind the scenes that means a new SAFEARRAY struct must be created, a new buffer allocated for the array elements, and all the elements copied. The Automation API SafeArrayCopy method does all of this and returns a pointer to the new SAFEARRAY struct. The VBA runtime then simply writes this pointer to the content of the new array variable.
On Erase If the variable is a dynamic length array that has not been initialized, nothing needs to be done beyond reclaiming the 4 or 8 bytes of the array variable itself. If the array has been initialized, the SafeArrayDestroy method is called to clean up all the individual elements contained in the array, and then the SAFEARRAY structure. Finally the runtime reclaims the pointer that constitutes the array variable itself.
Variant
Var Types Variant
Contents of variable The address of the start of the VARIANT structure.
Description The COM VARIANT is a wrapper type the can contain or point to any other type, except user-defined types declared with the Type keyword. A VARIANT is always 16 bytes long, though not all of those bytes are used. The first two bytes indicate the data type that the variant currently holds, in the form of a 16-bit VARENUM enumeration value. In VBA, this will always be one of the VbVarType constants, which are a subset of the full COM VARENUM enumeration. Except for the DECIMAL subtype, the next 6 bytes are empty.

The second half of the VARIANT holds the variable’s actual contents. For data types that require less than 8 bytes, the Variant puts the content at the start of these 8 bytes and leaves the rest empty. In other words, a Variant/Byte will put the actual Byte value in the first byte and least the remaining 7 bytes blank. A 16-bit Integer will occupy the first 2 bytes and leave the remaining 6 bytes empty. A 32-bit Long, a Single, and a pointer type (array, object, or string) on a 32-bit platform will occupy the first 4 bytes and leave the remaining 4 bytes empty. Doubles, Dates, and pointers on 64-bit platforms of course use all 8 bytes.

ByRef Variants

There is a special type of Variant, too: a ByRef Variant. ByRef Variants are only created by the runtime when a non-Variant variable is passed as a method parameter that was declared ByRef as a Variant type. In this case, the VARENUM value at the start of the variant will include the VT_BYREF flag (0x4000), and the content of the variant will always be a pointer. The pointer points to another VBA variable; it is the exact same pointer as returned by calling VarPtr on that source variable.

On Assignment As with arrays, the VB runtime invokes a number of COM methods behind the scenes to work with Variant variables (see Variant Manipulation Functions). First, a VARIANT structure must be allocated using the VariantInit method. Next, the runtime assigns the specific value of the variant following all the steps described for the other types here, and places the result in the second half of the VARIANT structure. The VARIANT structure allocated is the content of the VBA variable; the VarPtr function returns the address directly to the VARIANT structure.
On Erase Unless the Variant has the VT_BYREF flag set, the runtime first erases the variant’s inner content by calling the VariantClear method, which in turn takes all the steps to erase the specific object or data contained in the Variant. Then the runtime deallocates the VARIANT structure. The content of a Variant variable is the VARIANT structure, so nothing more needs to be done.
User Defined Type
Var Types All user defined types
Contents of variable The contents of the user-defined type structure.
Description Most languages support compound data types. C, C++, and C# call them structs. In COM, they’re called User Defined Types, or UDTs for short, and are declared with the Type statement. Being a COM-based language, Visual Basic (and VBA) uses the term UDTs as well. UDTs are if limited use in VBA, as they cannot be stored in Variant variables. This a major drawback because so many of VBA’s built-in functions operate on Variants — including the type description functions VarType and TypeName. COM itself now provides ways to store UDTs in COM VARIANTS, but these depend on the UDT being defined in a compiled IDL file, which VBA projects do not have.
On Assignment The VB runtime allocates a new chunk of memory big enough to hold the new UDT value, and copies all the data to the new memory. VarPtr returns the address directly to the start of the allocated memory.
On Erase Simply deallocate the memory used by the UDT.