VBA Internals: String Variables and Pointers in Depth

May 31, 2013 in VBA

This is the next installment of a series of deep-dives into the structure and implementation of variables in Visual Basic for Applications. For the previous posts, see the following:

In this post, I will cover the details of string variables and pointers. See Scalar Variables and Pointers in Depth for additional background and for the code of the utility functions HexPtr and Mem_ReadHex.

Pointers and memory for string variables

Even though string variables are treated semantically as value types, they are implemented as reference types. The contents of a string variable are actually a pointer to another memory location where the characters themselves are stored. In VBA we can either get the address of the variable itself using VarPtr, or go straight to the start of the character buffer using StrPtr. For a variable declared as String, then, reading the memory at the address returned by VarPtr should yield the same pointer value that StrPtr returns.
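To make the two levels of indirection concrete, here is a toy model in Python rather than VBA. A bytearray stands in for memory, and VAR_ADDR, BUF_ADDR, store_string and read_ptr are hypothetical names chosen for illustration. The model assumes 4-byte pointers, as in the 32-bit output later in this post.

```python
import struct

# Toy model, not VBA: a flat bytearray stands in for process memory, and the
# addresses are hypothetical offsets into it. Pointers are 4 bytes (32-bit).
memory = bytearray(64)
VAR_ADDR = 0x00  # hypothetical address of the String variable (cf. VarPtr)
BUF_ADDR = 0x14  # hypothetical address of the first character (cf. StrPtr)


def store_string(text: str) -> None:
    """Lay out a BSTR around BUF_ADDR and store its pointer in the variable slot.

    Only suitable for short strings: the toy memory is 64 bytes.
    """
    chars = text.encode("utf-16-le")
    end = BUF_ADDR + len(chars)
    struct.pack_into("<I", memory, BUF_ADDR - 4, len(chars))  # byte-length prefix
    memory[BUF_ADDR:end] = chars                               # character data
    memory[end:end + 2] = b"\x00\x00"                          # null terminator
    struct.pack_into("<I", memory, VAR_ADDR, BUF_ADDR)         # variable holds only a pointer


def read_ptr(addr: int) -> int:
    """Dereference a little-endian 32-bit pointer stored at addr."""
    return struct.unpack_from("<I", memory, addr)[0]


store_string("Hello")
var_ptr = VAR_ADDR  # plays the role of VarPtr(strVar)
str_ptr = BUF_ADDR  # plays the role of StrPtr(strVar)
print(hex(read_ptr(var_ptr)), hex(str_ptr))  # 0x14 0x14
```

Dereferencing the variable's address gives back the character-buffer address, which is the relationship between VarPtr and StrPtr described above.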

Strings are BSTR structures

As noted in VBA Internals: What’s in a variable, strings in VBA are implemented using the COM BSTR structure. A BSTR actually begins with an unsigned 32-bit integer that gives the length of the character buffer. Note that this length is in bytes, not characters, and it does not include the two bytes of the terminating null character. However, the BSTR specification requires that a BSTR be passed around as a pointer to the start of the character buffer itself (rather than to the preceding length field), so that a BSTR can be passed directly to functions expecting a pointer to a null-terminated wide-character (UTF-16) string. To read the length field directly, then, we need to take the pointer returned by StrPtr and back up 4 bytes.
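The layout can be sketched in Python by building the bytes of a BSTR by hand. This only models the format described above: bstr_bytes is a hypothetical helper, nothing here calls the real SysAllocString, and UTF-16LE encoding stands in for the way VBA stores the characters.

```python
import struct


def bstr_bytes(text: str) -> bytes:
    """Model a BSTR's bytes, starting at the 4-byte length prefix."""
    chars = text.encode("utf-16-le")        # 2 bytes per UTF-16 code unit
    prefix = struct.pack("<I", len(chars))  # length in bytes, not characters
    return prefix + chars + b"\x00\x00"     # terminator is not counted in the prefix


raw = bstr_bytes("Hello")
print(raw.hex().upper())  # 0A000000480065006C006C006F000000
print(raw[4:])            # what a BSTR pointer points at: characters, then the null
```

Slicing off the first 4 bytes leaves exactly the null-terminated wide string, which is why the same pointer works for both BSTR-aware and C-style callers.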

In the example below, I show the full BSTR structure by getting the length in bytes of the character buffer with LenB, backing up 4 bytes to include the length field, and reading a total of 6 extra bytes to cover both the length field at the start and the null character at the end.
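The read window is simple pointer arithmetic. A Python sketch with a hypothetical helper name, plugging in the StrPtr value from the output below and LenB("Hello") = 10:

```python
def bstr_read_window(str_ptr: int, len_b: int):
    """Return (start_address, byte_count) covering a whole BSTR."""
    start = str_ptr - 4    # back up over the 4-byte length prefix
    count = len_b + 4 + 2  # prefix + character bytes + 2-byte null terminator
    return start, count


start, count = bstr_read_window(0x08353AE4, 10)  # StrPtr and LenB values for "Hello"
print(hex(start), count)  # 0x8353ae0 16
```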

Code
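Sub StringPointerExample()
    ' Relies on Mem_Copy, PTR_LENGTH, HexPtr and Mem_ReadHex, which are
    ' defined outside this routine.
    Dim strVar As String, ptrVar As LongPtr, ptrBSTR As LongPtr
    
    strVar = "Hello"
    ptrVar = VarPtr(strVar)                     ' address of the variable itself
    Mem_Copy ptrBSTR, ByVal ptrVar, PTR_LENGTH  ' copy out the pointer stored in the variable
    
    ' Variable address, followed by its raw contents
    Debug.Print "ptrVar  : 0x"; HexPtr(ptrVar); _
                       " : 0x"; Mem_ReadHex(ptrVar, PTR_LENGTH)
    ' The pointer read from the variable should match StrPtr
    Debug.Print "ptrBSTR : 0x"; HexPtr(ptrBSTR)
    Debug.Print "StrPtr(): 0x"; HexPtr(StrPtr(strVar))
    ' Back up 4 bytes for the length prefix; 6 extra bytes = prefix + null terminator
    Debug.Print "Memory  : 0x"; Mem_ReadHex(ptrBSTR - 4, LenB(strVar) + 6)
    
End Sub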
Output
ptrVar  : 0x0039F4F0 : 0xE43A3508
ptrBSTR : 0x08353AE4
StrPtr(): 0x08353AE4
Memory  : 0x0A000000480065006C006C006F000000

Explanation

The variable table in this case is pretty simple:

Variables
Name      Type      Address
strVar    String    0x0039F4F0

The functions used and the memory layout revealed take a little more explaining. First, when we directly read the memory at the address returned by VarPtr, we get the bytes of the pointer to the character buffer. Because my machine is little-endian, the raw bytes appear in reverse order: E4 3A 35 08 represents the value 0x08353AE4. Copying those bytes into ptrBSTR with Mem_Copy and printing it with HexPtr shows the pointer in its normal order, and calling StrPtr returns exactly the same value.
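Reversing the bytes by hand is the same as interpreting them as a little-endian integer. A small Python sketch, with a hypothetical helper name, applied to the raw bytes printed for ptrVar in the output above:

```python
def ptr_from_le_bytes(raw: bytes) -> int:
    """Interpret raw memory bytes as a little-endian unsigned pointer value."""
    return int.from_bytes(raw, byteorder="little", signed=False)


ptr = ptr_from_le_bytes(bytes.fromhex("E43A3508"))  # bytes read at VarPtr(strVar)
print(f"0x{ptr:08X}")  # 0x08353AE4
```

The helper does not assume a width, so an 8-byte pointer from a 64-bit host would be decoded the same way.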

Finally, we display the bytes of the BSTR itself. It starts with the 4-byte length field. Again, this is little-endian, so we have to reverse the bytes to interpret it correctly. When we do, we indeed see a value of 10, for the 10 bytes of the 5-character UTF-16 string “Hello”. Next is the character buffer. The characters appear in the expected order, but the two bytes within each 16-bit code unit are once again little-endian. Finally, there is a two-byte null character at the end.

*VarPtr (memory at the address of strVar)
Address      +0 +1 +2 +3
0x0039F4F0   E4 3A 35 08
= 0x08353AE4

*StrPtr (memory from StrPtr - 4 through the null terminator)
Address      +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
0x08353AE0   0A 00 00 00 48 00 65 00 6C 00 6C 00 6F 00 00 00

Length prefix   0A 00 00 00 = 0x0000000A = 10 (decimal)
Chars           48 00 = 0x0048 = H
                65 00 = 0x0065 = e
                6C 00 = 0x006C = l
                6C 00 = 0x006C = l
                6F 00 = 0x006F = o
Null term       00 00
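The same annotation can be produced mechanically. This Python sketch (parse_bstr is a hypothetical helper, not part of any API) takes a dump that starts at the length prefix, like the Memory line of the output, and splits it into the byte count and the decoded text, checking for the terminator along the way.

```python
import struct


def parse_bstr(dump: bytes):
    """Split a BSTR dump (starting at the length prefix) into (byte_length, text).

    Raises ValueError if the 2-byte null terminator is missing.
    """
    (byte_len,) = struct.unpack_from("<I", dump, 0)   # 4-byte little-endian prefix
    chars = dump[4:4 + byte_len]                      # character data, byte_len bytes
    terminator = dump[4 + byte_len:4 + byte_len + 2]  # should be the 2-byte null
    if terminator != b"\x00\x00":
        raise ValueError("missing 2-byte null terminator")
    return byte_len, chars.decode("utf-16-le")        # each code unit is little-endian


dump = bytes.fromhex("0A000000480065006C006C006F000000")
print(parse_bstr(dump))  # (10, 'Hello')
```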