With the iPhone 3GS, the iOS platform gained programmable shader support through the PowerVR SGX 535 GPU. Along with an increase from 2 to 8 texture units, programmable shaders give artists and developers nearly limitless creative freedom: any material they can describe in GLSL can be used in their 3D scenes. The unsung hero of these upgraded GPUs, however, is their “real” Vertex Buffer Object (VBO) support.
Vertex Buffer Objects allow vertex-level data to be moved from slower main memory into memory the GPU can access more rapidly. Instead of making a separate GL call to render each polygon, a VBO can be initialized once when a model is loaded; a single GL call then instructs the GPU to iterate through the vertices and render the polygons itself. Even though the newer iOS platforms have a UMM (Unified Memory Model), where the CPU and GPU share the same RAM chips and bus, utilizing VBOs frees the ARM CPU core during rendering for other game-pipeline tasks such as physics calculation, AI, and audio.
Although the VBO function calls don’t fail on earlier iOS devices, they are implemented in Apple’s OpenGLES framework as a simple loop of immediate-mode calls that runs on the CPU. With the iPhone 3GS and its PowerVR SGX 535 GPU, real hardware support for VBOs is now available. Initial tests of the VBOs’ performance advantage gave mixed results. With small meshes, the frame rate increased by 10x; larger models, however, performed the same with and without VBOs. Apple’s Instruments utility showed that the ARM CPU was the bottleneck, at 100% usage, on the larger meshes. Stepping into the framework code with GDB revealed that Apple’s code was falling back to an immediate-mode rendering loop any time there were more than 65535 vertices in the VBO.
This is not a memory limitation, but rather a consequence of a 16-bit VBO index register in the PowerVR SGX 535 silicon. I was able to attain the same 10x frame-rate increase with the larger models by splitting the vertices across multiple VBOs of no more than 65535 vertices each.
I tested this limitation across several iOS devices:
| Device | VBO support |
|---|---|
| iPhone 3G | VBOs are only emulated on the CPU, with no performance advantage to utilizing them. |
| iPhone 3GS | VBOs are supported in hardware, up to 65535 vertices, regardless of the number of vertex attributes per vertex. Multiple VBOs can be created to exceed this limit. |
| iPhone 4, iPad 1 | The A4 chip still has the 65535-vertex limit, but doubles vertex throughput. |
| iPad 2 | The upgraded GPU in the iPad 2’s A5 chip still has the 65535-vertex limit, but even greater gains are possible by splitting the VBOs. |
I used the OpenGLES Performance Detective to measure the benefit of splitting the large VBOs into multiple smaller VBOs using a real-world model. The scene has 3,484,923 vertices and 1,161,641 textured triangles, and applies only back-face culling. One VBO is created for each object, with 4 vertex attributes containing 11 GL_FLOAT values per vertex. Some objects in the scene have > 65k vertices while others have < 65k vertices.
| Device | Large VBOs | Split VBOs |
|---|---|---|
| iPhone 4 | 1.8 fps | 6.3 fps |
| iPad 2 | 2.7 fps | 17.0 fps |
Thanks to hardware VBO support, it is now possible to increase the scale and complexity of these scenes while achieving frame rates that vastly surpass those of non-VBO-enabled rendering engines.