VBA Internals: Array Variables and Pointers in Depth

June 5, 2013 in VBA

This is the next installment of a series of deep-dives into the structure and implementation of variables in Visual Basic for Applications.

In this post, I will cover the details of array variables and pointers. See Scalar Variables and Pointers in Depth for additional background and for the code for the utility functions HexPtr and Mem_ReadHex.

Pointers and memory for array variables

Like strings, arrays in VBA are treated semantically like value types but are implemented as reference types. Also like strings, arrays in VBA are implemented using a COM automation structure. For arrays the supporting COM type is the safe array, which comes with a large group of utility functions.
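As a small preview of those utility functions, the sketch below asks two of them, SafeArrayGetDim and SafeArrayGetElemsize from oleaut32.dll, about an ordinary VBA array. The Declare statements are my own transcription of the C prototypes, and the sub name SafeArrayUtilityExample is hypothetical. The way the SAFEARRAY pointer is obtained (VarPtrArray plus CopyMemory) is explained later in this post, so treat this as an illustration under those assumptions rather than a reference implementation.

```vba
Option Explicit

Private Declare PtrSafe Function VarPtrArray Lib "VBE7" Alias _
    "VarPtr" (ByRef Var() As Any) As LongPtr
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

' Assumed declarations for two of the oleaut32 safe array utilities
Private Declare PtrSafe Function SafeArrayGetDim Lib "oleaut32" _
    (ByVal psa As LongPtr) As Long
Private Declare PtrSafe Function SafeArrayGetElemsize Lib "oleaut32" _
    (ByVal psa As LongPtr) As Long

Sub SafeArrayUtilityExample()
    Dim aLongs() As Long
    Dim ptrToSafeArray As LongPtr

    ReDim aLongs(3 To 12)

    ' The array variable holds a pointer to the SAFEARRAY header
    CopyMemory ptrToSafeArray, ByVal VarPtrArray(aLongs), LenB(ptrToSafeArray)

    ' Let COM report what the header says instead of reading it by hand
    Debug.Print "Dimensions   :"; SafeArrayGetDim(ptrToSafeArray)
    Debug.Print "Element size :"; SafeArrayGetElemsize(ptrToSafeArray)
End Sub
```

For a one-dimensional array of Longs this should report 1 dimension and an element size of 4 bytes.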

The SAFEARRAY structure

To be specific, safe arrays are implemented with a SAFEARRAY structure (which itself contains one or more SAFEARRAYBOUND structures) and a data vector. The SAFEARRAY is a short structure which contains information about the array data, but no actual array content. When I write SAFEARRAY in all capitals I am referring specifically to the SAFEARRAY structure, which is just the “header” or metadata for the safe array object as a whole. When I write “safe array” in lower case I am referring to the array object as a whole.

SAFEARRAYs are somewhat tricky because their size depends both on processor architecture (32- or 64-bit) and on the number of dimensions in the array. The VBA declaration for a 1-dimensional SAFEARRAY structure looks like this:

Type SAFEARRAYBOUND
    cElements    As Long
    lLbound      As Long
End Type

Type SAFEARRAY_VECTOR
    cDims        As Integer
    fFeatures    As Integer
    cbElements   As Long
    cLocks       As Long
    pvData       As LongPtr
    rgsabound(0) As SAFEARRAYBOUND
End Type

Note that the second-to-last field of the SAFEARRAY is a LongPtr, which is 4 bytes on a 32-bit system but 8 bytes on a 64-bit system. The last field is a fixed-size array of SAFEARRAYBOUND structures. The real COM type can have any number of SAFEARRAYBOUND elements, one for each dimension of the safe array. To embed that array directly in a user-defined type in VBA, however, the number of elements must be fixed. This is because fixed-size arrays inside user-defined types are not “real” arrays. That is, they are not implemented as safe arrays, but are instead simple value vectors like C-style arrays. (A dynamic array member would be stored as a pointer to a separate safe array, which would not match the layout of the header.) The size of a UDT must be fixed at compile time, so the size of any array stored inline in a UDT must also be fixed. Which is all to say, if you want to examine the SAFEARRAY headers of arrays with more than one dimension, you will either have to do some manual pointer arithmetic or declare multiple versions of a SAFEARRAY UDT, each with a different number of elements in the rgsabound field.
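The "manual pointer arithmetic" option can be sketched as follows. This is a minimal illustration, not a definitive implementation: HeaderBounds and BOUNDS_OFFSET are hypothetical names of my own, and the offsets (16 bytes on 32-bit, 24 bytes on 64-bit, the latter including 4 bytes of alignment padding before pvData) assume the standard SAFEARRAY layout. The helper reads cDims from the start of the header and then copies that many SAFEARRAYBOUND entries in one go.

```vba
Option Explicit

Private Declare PtrSafe Function VarPtrArray Lib "VBE7" Alias _
    "VarPtr" (ByRef Var() As Any) As LongPtr
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

' Assumed offset of rgsabound from the start of the SAFEARRAY header
#If Win64 Then
    Private Const BOUNDS_OFFSET As Long = 24
#Else
    Private Const BOUNDS_OFFSET As Long = 16
#End If

Private Type SAFEARRAYBOUND
    cElements    As Long
    lLbound      As Long
End Type

' Hypothetical helper: returns every SAFEARRAYBOUND stored in a header,
' however many dimensions the safe array has
Private Function HeaderBounds(ByVal ptrToSafeArray As LongPtr) As SAFEARRAYBOUND()
    Dim cDims As Integer
    Dim bounds() As SAFEARRAYBOUND

    If ptrToSafeArray = 0 Then Exit Function    ' array not allocated yet

    ' cDims is the first 2 bytes of the header
    CopyMemory cDims, ByVal ptrToSafeArray, 2
    If cDims < 1 Then Exit Function

    ' Each SAFEARRAYBOUND is 8 bytes and they are stored back to back
    ReDim bounds(0 To cDims - 1)
    CopyMemory bounds(0), ByVal ptrToSafeArray + BOUNDS_OFFSET, CLng(cDims) * 8
    HeaderBounds = bounds
End Function

Sub HeaderBoundsExample()
    Dim aGrid() As Long
    Dim ptrToSafeArray As LongPtr
    Dim bounds() As SAFEARRAYBOUND
    Dim i As Long

    ReDim aGrid(1 To 2, 5 To 9)
    CopyMemory ptrToSafeArray, ByVal VarPtrArray(aGrid), LenB(ptrToSafeArray)

    bounds = HeaderBounds(ptrToSafeArray)
    For i = LBound(bounds) To UBound(bounds)
        Debug.Print "rgsabound[" & i & "] : cElements ="; bounds(i).cElements; _
                    ", lLbound ="; bounds(i).lLbound
    Next
End Sub
```

The order in which the dimensions appear in rgsabound is worth checking against LBound and UBound on your own machine rather than assuming it matches the order of the ReDim statement.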

VBA Arrays: Pointers to pointers to pointers

Technically, a SAFEARRAY header can be pointed at any vector of actual data just by changing the value of its pvData field. But VBA adds yet another layer of indirection: the content of an array variable is not a SAFEARRAY header itself, but a pointer to a SAFEARRAY header. This is distinct from UDT variables, which contain the entire structure in the variable’s content. Put another way, calling VarPtr on a UDT-typed variable will get you the address of the start of the UDT structure, but calling VarPtr on an array variable will get you the address of yet another pointer.
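The difference between the two kinds of variable can be checked directly. The sketch below is only an illustration: POINT_PAIR, PVDATA_OFFSET and the sub name are hypothetical, and the pvData offsets (12 bytes on 32-bit, 16 on 64-bit) assume the standard SAFEARRAY layout. It confirms that a UDT variable's address is the address of its first field, while an array variable needs two hops before you reach the first element.

```vba
Option Explicit

Private Declare PtrSafe Function VarPtrArray Lib "VBE7" Alias _
    "VarPtr" (ByRef Var() As Any) As LongPtr
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

' Assumed offset of pvData from the start of the SAFEARRAY header
#If Win64 Then
    Private Const PVDATA_OFFSET As Long = 16
#Else
    Private Const PVDATA_OFFSET As Long = 12
#End If

' Hypothetical UDT used only for comparison
Private Type POINT_PAIR
    x As Long
    y As Long
End Type

Sub CompareUdtAndArrayPointers()
    Dim uPoint As POINT_PAIR
    Dim aLongs() As Long
    Dim ptrToSafeArray As LongPtr
    Dim ptrToArrayData As LongPtr

    ReDim aLongs(3 To 12)

    ' UDT: the variable *is* the structure, so no hops are needed
    Debug.Print "UDT address = address of first field : "; _
                (VarPtr(uPoint) = VarPtr(uPoint.x))

    ' Array, hop 1: the variable holds a pointer to the SAFEARRAY header
    CopyMemory ptrToSafeArray, ByVal VarPtrArray(aLongs), LenB(ptrToSafeArray)

    ' Array, hop 2: the header's pvData field points to the data vector
    CopyMemory ptrToArrayData, ByVal ptrToSafeArray + PVDATA_OFFSET, _
               LenB(ptrToArrayData)

    Debug.Print "pvData = address of first element    : "; _
                (ptrToArrayData = VarPtr(aLongs(3)))
End Sub
```

Both comparisons are expected to print True.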

I think diagrams make this all a lot clearer, so here we go with the sample code and diagrams for a simple array of Longs. Note that in order to get the pointer to an actual array variable, you need to manually declare a different signature for the VarPtr function, traditionally aliased as “VarPtrArray”. See Getting Pointers for more details.

Code
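' CopyMemory and PTR_LENGTH are used below but are not declared in this
' listing. The declarations here are assumed; omit them if your module
' already has equivalents alongside HexPtr and Mem_ReadHex.
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

#If Win64 Then
    Private Const PTR_LENGTH As Long = 8
#Else
    Private Const PTR_LENGTH As Long = 4
#End If

Private Declare PtrSafe Function VarPtrArray Lib "VBE7" Alias _
    "VarPtr" (ByRef Var() As Any) As LongPtr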

Private Type SAFEARRAYBOUND
    cElements    As Long
    lLbound      As Long
End Type

Private Type SAFEARRAY_VECTOR
    cDims        As Integer
    fFeatures    As Integer
    cbElements   As Long
    cLocks       As Long
    pvData       As LongPtr
    rgsabound(0) As SAFEARRAYBOUND
End Type

Sub ArrayPtrExample()
    Dim aLongs() As Long, i As Long
    Dim ptrToArrayVar As LongPtr
    Dim ptrToSafeArray As LongPtr
    Dim ptrToArrayData As LongPtr
    Dim ptrCursor As LongPtr
    Dim lngValue As Long
    Dim uSAFEARRAY As SAFEARRAY_VECTOR
    
    ReDim aLongs(3 To 12)
    For i = 3 To 12
        ' Triangular number of i: 1 + 2 + ... + i
        aLongs(i) = i * (i + 1) \ 2
    Next
    
    ' Get pointer to array *variable*
    ptrToArrayVar = VarPtrArray(aLongs)
    
    ' Get the pointer to the *SAFEARRAY* by directly
    ' reading the variable's address
    CopyMemory ptrToSafeArray, ByVal ptrToArrayVar, PTR_LENGTH
    
    ' Read the SAFEARRAY struct
    CopyMemory uSAFEARRAY, ByVal ptrToSafeArray, LenB(uSAFEARRAY)
    
    ' Get the pointer to the actual vector of longs
    ptrToArrayData = uSAFEARRAY.pvData
    
    Debug.Print " ptrToArrayVar  : 0x"; HexPtr(ptrToArrayVar)
    Debug.Print "*ptrToArrayVar  : 0x"; Mem_ReadHex(ptrToArrayVar, PTR_LENGTH)
    Debug.Print " ptrToSafeArray : 0x"; HexPtr(ptrToSafeArray)
    Debug.Print "*ptrToSafeArray : 0x"; Mem_ReadHex(ptrToSafeArray, LenB(uSAFEARRAY))
    Debug.Print " ptrToArrayData : 0x"; HexPtr(uSAFEARRAY.pvData)
    Debug.Print "*ptrToArrayData : 0x"; Mem_ReadHex(uSAFEARRAY.pvData, 40)
      
    ' Demonstrate pointer arithmetic on value vector
    ptrCursor = ptrToArrayData
    For i = 0 To 9
        ' Fetch the Long value
        CopyMemory lngValue, ByVal ptrCursor, 4
        ' Print the pointer and its dereferenced value
        Debug.Print "ptrToArrayData[" & i & "] : 0x"; HexPtr(ptrCursor); _
                    " : 0x"; Hex$(lngValue); " = "; lngValue
        ' Increment the pointer
        ptrCursor = ptrCursor + 4
    Next
End Sub
Output
 ptrToArrayVar  : 0x0036EEF0
*ptrToArrayVar  : 0x80DB4600
 ptrToSafeArray : 0x0046DB80
*ptrToSafeArray : 0x010080000400000000000000A0DB4600
                    0A00000003000000
 ptrToArrayData : 0x0046DBA0
*ptrToArrayData : 0x060000000A0000000F00000015000000
                    1C000000240000002D00000037000000
                    420000004E000000
ptrToArrayData[0] : 0x0046DBA0 : 0x6 =  6 
ptrToArrayData[1] : 0x0046DBA4 : 0xA =  10 
ptrToArrayData[2] : 0x0046DBA8 : 0xF =  15 
ptrToArrayData[3] : 0x0046DBAC : 0x15 =  21 
ptrToArrayData[4] : 0x0046DBB0 : 0x1C =  28 
ptrToArrayData[5] : 0x0046DBB4 : 0x24 =  36 
ptrToArrayData[6] : 0x0046DBB8 : 0x2D =  45 
ptrToArrayData[7] : 0x0046DBBC : 0x37 =  55 
ptrToArrayData[8] : 0x0046DBC0 : 0x42 =  66 
ptrToArrayData[9] : 0x0046DBC4 : 0x4E =  78 
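
The pointer arithmetic in the loop above generalizes to any index: the address of element i is pvData plus (i minus lLbound) times cbElements. The sketch below wraps that formula in a hypothetical helper, ElementAddress, which is my own name and not part of any library. It assumes a 1-dimensional safe array and that LenB of the UDT matches the real header size on your architecture.

```vba
Option Explicit

Private Declare PtrSafe Function VarPtrArray Lib "VBE7" Alias _
    "VarPtr" (ByRef Var() As Any) As LongPtr
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

Private Type SAFEARRAYBOUND
    cElements    As Long
    lLbound      As Long
End Type

Private Type SAFEARRAY_VECTOR
    cDims        As Integer
    fFeatures    As Integer
    cbElements   As Long
    cLocks       As Long
    pvData       As LongPtr
    rgsabound(0) As SAFEARRAYBOUND
End Type

' Hypothetical helper: address of element "index" of a 1-dimensional safe array
Private Function ElementAddress(ByRef header As SAFEARRAY_VECTOR, _
                                ByVal index As Long) As LongPtr
    Dim offset As Long

    ' Translate the VBA index into a zero-based position in the data vector
    offset = index - header.rgsabound(0).lLbound
    If offset < 0 Or offset >= header.rgsabound(0).cElements Then
        Err.Raise 9    ' Subscript out of range
    End If

    ElementAddress = header.pvData + CLngPtr(offset) * header.cbElements
End Function

Sub ElementAddressExample()
    Dim aLongs() As Long
    Dim ptrToSafeArray As LongPtr
    Dim uSAFEARRAY As SAFEARRAY_VECTOR
    Dim lngValue As Long

    ReDim aLongs(3 To 12)
    aLongs(7) = 28

    ' Variable -> SAFEARRAY pointer -> copy of the header
    CopyMemory ptrToSafeArray, ByVal VarPtrArray(aLongs), LenB(ptrToSafeArray)
    CopyMemory uSAFEARRAY, ByVal ptrToSafeArray, LenB(uSAFEARRAY)

    ' Dereference the computed address and compare with normal indexing
    CopyMemory lngValue, ByVal ElementAddress(uSAFEARRAY, 7), 4
    Debug.Print lngValue; aLongs(7)
End Sub
```

The example prints the same value twice, once read through the computed address and once through ordinary indexing.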

Explanation

The variable table in this case is simple:

Variables
Name Type Address
aLongs Long() 0x0036EEF0

But we have to take a number of hops to get from the variable to the actual Long values that make up the array. Along the way, we get the content of the SAFEARRAY structure. Here’s how it all maps out in detail. As always, the byte order and pointer size depend on the architecture; in my case it’s 32-bit Office on a little-endian Intel processor:

*VarPtrArray
Address      0  1  2  3
0x0036EEFx   80 DB 46 00
= 0x0046DB80

*pSAFEARRAY
Address      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0x0046DB8x   01 00 80 00 04 00 00 00 00 00 00 00 A0 DB 46 00
0x0046DB9x   0A 00 00 00 03 00 00 00

Field        Bytes         Value                     Meaning
cDims        01 00         0x0001                    1 dimension
fFeatures    80 00         0x0080                    FADF_HAVEVARTYPE [1]
cbElements   04 00 00 00   0x00000004                4 bytes per element
cLocks       00 00 00 00   0x00000000                No locks on array
pvData       A0 DB 46 00   0x0046DBA0                Pointer to data vector
rgsabound[0] (SAFEARRAYBOUND):
cElements    0A 00 00 00   0x0000000A = 10 decimal   10 elements in bound
lLbound      03 00 00 00   0x00000003                Lower bound is 3

*pvData
Address      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0x0046DBAx   06 00 00 00 0A 00 00 00 0F 00 00 00 15 00 00 00
0x0046DBBx   1C 00 00 00 24 00 00 00 2D 00 00 00 37 00 00 00
0x0046DBCx   42 00 00 00 4E 00 00 00
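
The fFeatures field is a set of bit flags, so a raw value such as the 80 00 bytes in the output above (0x0080 once the little-endian bytes are swapped) is easier to read once decoded. The sketch below is a hypothetical helper, DescribeFeatures, of my own. The flag values are the FADF_ constants from the Windows SDK as I understand them; check them against the SAFEARRAY documentation before relying on them.

```vba
Option Explicit

' Hypothetical helper: turns an fFeatures value into a list of FADF_ flag names
Private Function DescribeFeatures(ByVal fFeatures As Integer) As String
    Dim names As Variant, masks As Variant
    Dim i As Long, result As String

    ' Flag names and their bit masks, in matching order
    names = Array("FADF_AUTO", "FADF_STATIC", "FADF_EMBEDDED", _
                  "FADF_FIXEDSIZE", "FADF_RECORD", "FADF_HAVEIID", _
                  "FADF_HAVEVARTYPE", "FADF_BSTR", "FADF_UNKNOWN", _
                  "FADF_DISPATCH", "FADF_VARIANT")
    masks = Array(&H1, &H2, &H4, &H10, &H20, &H40, _
                  &H80, &H100, &H200, &H400, &H800)

    For i = LBound(masks) To UBound(masks)
        ' Test each bit and collect the names of the ones that are set
        If (fFeatures And masks(i)) <> 0 Then
            If Len(result) > 0 Then result = result & " Or "
            result = result & names(i)
        End If
    Next

    If Len(result) = 0 Then result = "(none)"
    DescribeFeatures = result
End Function

Sub DescribeFeaturesExample()
    ' The value read from the header in the example above
    Debug.Print DescribeFeatures(&H80)
    ' A combined value, to show how several flags are reported
    Debug.Print DescribeFeatures(&H92)
End Sub
```

The first call reports FADF_HAVEVARTYPE, and the second reports FADF_STATIC Or FADF_FIXEDSIZE Or FADF_HAVEVARTYPE.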

[1] See the fFeatures section in SAFEARRAY structure.