Alain Galvan · 9/8/2019 8:30 PM
An overview on how to program a Hello Triangle Vulkan application from the ground up. Learn the core data structures and modern GPU execution model needed to render raster based graphics.
Tags: blog, vulkan, hello world, triangle, introduction, beginner, opengl
Vulkan is a new low level Graphics API released February 2016 by the Khronos Group that maps directly to the design of modern GPUs.
Vulkan is used by Game Developers, Rendering Engineers and Scientists looking to do real-time rendering, raytracing, data visualization, GPGPU computations, machine learning, physics simulations, etc.
Graphics Processing Units (GPUs) were originally simple Application Specific Integrated Circuits (ASICs), but since then they have become programmable computational units of their own with a focus on throughput over latency [Fatahalian 2018]. Older APIs like OpenGL or DirectX 9 and below were designed for hardware that's drastically changed since the early 90s when they were first released, so Vulkan was designed from scratch to match the way GPUs are engineered today.
Currently Vulkan 1.x supports the following platforms:
🖼️ Windows
🐧 Linux
🤖 Android
Apple macOS, iOS, and iPadOS support Vulkan through MoltenVK, a Vulkan-to-Metal compatibility layer licensed under Apache 2.0:
🍎 Mac OS
📱 iOS / iPad OS
Vulkan also runs on other, perhaps surprising, platforms such as TVs, game consoles, and cloud gaming services:
🎮 Nintendo Switch
📺 NVIDIA Shield
🌐 Google Stadia
And languages such as:
C - Through the official bindings for Vulkan, as C is Vulkan's official language.
C++ - Through Vulkan-Hpp the official Vulkan C++ library.
Rust - Through Vulkano, an intuitive Rust wrapper with a heavy focus on compile time safety.
JavaScript - Through Node Vulkan, node.js bindings for native web applications.
Python - Through pyVulkan, a Python FFI to the C implementation of Vulkan.
I've prepared a GitHub repo with everything we need to get started. We're going to walk through a Hello Triangle app in modern C++17: a program that creates a triangle, processes it with a shader, and displays it in a window.
First install:
A Text Editor such as Visual Studio Code.
An IDE such as Visual Studio, XCode, or a compiler such as GCC.
Then type the following in your terminal.
# 🐑 Clone the repo
git clone https://github.com/alaingalvan/vulkan-seed --recurse-submodules
# 💿 go inside the folder
cd vulkan-seed
# 👯 If you forget to `recurse-submodules` you can always run:
git submodule update --init
# 🖼️ To build your Visual Studio solution on Windows x64
cmake -B build -A x64
# 🍎 To build your XCode project on Mac OS / iOS
cmake -B build -G Xcode
# 🐧 To build your .make file on Linux
cmake -B build
# 🔨 Build on any platform:
cmake --build build
Refer to this blog post on designing C++ libraries and apps for more details on CMake, Git Submodules, etc.
As your project becomes more complex, you'll want to separate files and organize your application into something more akin to a game or renderer. Check out this post on game engine architecture and this one on real-time renderer architecture for more details.
├─ 📂 external/ # 👶 Dependencies
│ ├─ 📁 crosswindow/ # 🖼️ OS Windows
│ ├─ 📁 crosswindow-graphics/ # 🎨 Vulkan Surface Creation
│ └─ 📁 glm/ # ➕ Linear Algebra
├─ 📂 src/ # 🌟 Source Files
│ ├─ 📄 Utils.h # ⚙️ Utilities (Load Files, Check Shaders, etc.)
│ ├─ 📄 Renderer.h # 🔺 Triangle Draw Code
│ ├─ 📄 Renderer.cpp # -
│ └─ 📄 Main.cpp # 🏁 Application Main
├─ 📄 .gitignore # 👁️ Ignore certain files in git repo
├─ 📄 CMakeLists.txt # 🔨 Build Script
├─ 📄 license.md # ⚖️ Your License (Unlicense)
└─ 📃 readme.md # 📖 Read Me!
CrossWindow - A cross-platform system abstraction library written in C++ for managing windows and performing OS tasks.
CrossWindow-Graphics - A library to simplify creating a Vulkan Surface with CrossWindow.
Vulkan SDK - The official Vulkan SDK distributed by LunarG. This should be installed separately.
GLM - A C++ library that allows users to write GLSL-like C++ code, with types for vectors, matrices, etc.
We'll be writing our application using Vulkan's C++ API through vulkan.hpp, a type safe abstraction of vulkan.h.
In this application we will need to do the following:
Initialize the API - Create a Vulkan Instance to access inner functions of the Vulkan API. Pick the best Physical Device from every device that supports Vulkan on your machine. Create a Logical Device, Surface, Queue, Command Pool, Semaphores, and Fences.
Create Commands - Describe everything that'll be rendered on the current frame in your command buffers.
Initialize Resources - Create a Descriptor Pool, Descriptor Set Layout, Pipeline Layout, Vertex Buffer/Index Buffer and send it to GPU Accessible Memory, describe our Input Attributes, create a Uniform Buffer, Render Pass, Frame Buffers, Shader Modules, and Pipeline State.
Setup Commands - For each command buffer, record the commands that set the GPU state and render the triangle.
Render - Use an Update Loop to switch between different frames in your swapchain as well as to poll input devices/window events.
Destroy - Release all data structures once the application is asked to close.
The following will explain snippets that can be found in the Github repo, with certain parts omitted, and member variables (mMemberVariable) declared inline without the m prefix so their type is easier to see and the examples here can work on their own.
We're using CrossWindow to handle cross platform window creation, so creating a window and updating it is very easy:
#include "CrossWindow/CrossWindow.h"
#include "Renderer.h"
#include <iostream>
void xmain(int argc, const char** argv)
{
// 🖼 Create Window
xwin::WindowDesc wdesc;
wdesc.title = "Vulkan Seed";
wdesc.name = "MainWindow";
wdesc.visible = true;
wdesc.width = 640;
wdesc.height = 640;
wdesc.fullscreen = false;
xwin::Window window;
xwin::EventQueue eventQueue;
if (!window.create(wdesc, eventQueue))
{ return; };
// 🌋 Create a renderer
Renderer renderer(window);
// 🏁 Engine loop
bool isRunning = true;
while (isRunning)
{
bool shouldRender = true;
// ♻️ Update the event queue
eventQueue.update();
// 🎈 Iterate through that queue:
while (!eventQueue.empty())
{
//Update Events
const xwin::Event& event = eventQueue.front();
// 💗 On Resize:
if (event.type == xwin::EventType::Resize)
{
const xwin::ResizeData data = event.data.resize;
renderer.resize(data.width, data.height);
shouldRender = false;
}
// ❌ On Close:
if (event.type == xwin::EventType::Close)
{
window.close();
shouldRender = false;
isRunning = false;
}
eventQueue.pop();
}
// ✨ Update Visuals
if (shouldRender)
{
renderer.render();
}
}
}
As an alternative to CrossWindow, you could use another library like GLFW, SFML, SDL, Qt, or interface directly with your OS windowing API.
Similar to the OpenGL context, a Vulkan application begins when you create an instance. This instance must be loaded with some information about the program such as its name, engine, and minimum Vulkan version, as well as any extensions and layers you want to load.
void findBestExtensions(const std::vector<vk::ExtensionProperties>& installed,
const std::vector<const char*>& wanted,
std::vector<const char*>& out)
{
for (const char* const& w : wanted)
{
for (vk::ExtensionProperties const& i : installed)
{
if (std::string(i.extensionName).compare(w) == 0)
{
out.emplace_back(w);
break;
}
}
}
}
void findBestLayers(const std::vector<vk::LayerProperties>& installed,
const std::vector<const char*>& wanted,
std::vector<const char*>& out)
{
for (const char* const& w : wanted)
{
for (vk::LayerProperties const& i : installed)
{
if (std::string(i.layerName).compare(w) == 0)
{
out.emplace_back(w);
break;
}
}
}
}
uint32_t getQueueIndex(vk::PhysicalDevice& physicalDevice,
vk::QueueFlagBits flags)
{
std::vector<vk::QueueFamilyProperties> queueProps =
physicalDevice.getQueueFamilyProperties();
for (size_t i = 0; i < queueProps.size(); ++i)
{
if (queueProps[i].queueFlags & flags)
{
return static_cast<uint32_t>(i);
}
}
// Default queue index
return 0;
}
uint32_t getMemoryTypeIndex(vk::PhysicalDevice& physicalDevice,
uint32_t typeBits,
vk::MemoryPropertyFlags properties)
{
auto gpuMemoryProps = physicalDevice.getMemoryProperties();
for (uint32_t i = 0; i < gpuMemoryProps.memoryTypeCount; i++)
{
if ((typeBits & 1) == 1)
{
if ((gpuMemoryProps.memoryTypes[i].propertyFlags & properties) ==
properties)
{
return i;
}
}
typeBits >>= 1;
}
return 0;
};
Extension - Anything that adds extra functionality to Vulkan, such as support for Win32 windows, or enabling drawing onto a target.
Layer - Middleware between existing Vulkan functionality, such as checking for errors. Layers can range from runtime debugging checks like LunarG's Standard Validation layers to hooks into the Steam renderer so your game behaves better when you press Shift + Tab to open the Steam overlay.
You'll want to begin by determining which extensions/layers you want, and compare that with which are available to you by Vulkan.
// 👋 Declare handles
vk::Instance instance;
// 🔍 Find the best Instance Extensions
std::vector<vk::ExtensionProperties> installedExtensions = vk::enumerateInstanceExtensionProperties();
std::vector<const char*> wantedExtensions =
{
VK_KHR_SURFACE_EXTENSION_NAME,
#ifdef VK_USE_PLATFORM_WIN32_KHR
VK_KHR_WIN32_SURFACE_EXTENSION_NAME
#elif VK_USE_PLATFORM_MACOS_MVK
VK_MVK_MACOS_SURFACE_EXTENSION_NAME
#elif VK_USE_PLATFORM_XCB_KHR
VK_KHR_XCB_SURFACE_EXTENSION_NAME
#elif VK_USE_PLATFORM_ANDROID_KHR
VK_KHR_ANDROID_SURFACE_EXTENSION_NAME
#elif VK_USE_PLATFORM_XLIB_KHR
VK_KHR_XLIB_SURFACE_EXTENSION_NAME
#elif VK_USE_PLATFORM_WAYLAND_KHR
VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME
#elif VK_USE_PLATFORM_MIR_KHR || VK_USE_PLATFORM_DISPLAY_KHR
VK_KHR_DISPLAY_EXTENSION_NAME
#elif VK_USE_PLATFORM_IOS_MVK
VK_MVK_IOS_SURFACE_EXTENSION_NAME
#endif
};
std::vector<const char*> extensions = {};
findBestExtensions(installedExtensions, wantedExtensions, extensions);
// 🔎 Find the best Instance Layers
std::vector<vk::LayerProperties> installedLayers =
vk::enumerateInstanceLayerProperties();
std::vector<const char*> wantedLayers = {
#ifdef _DEBUG
"VK_LAYER_LUNARG_standard_validation"
#endif
};
std::vector<const char*> layers = {};
findBestLayers(installedLayers, wantedLayers, layers);
// ⚪ Create an Instance
vk::ApplicationInfo appInfo;
appInfo = {.pApplicationName = "MyApp",
.applicationVersion = VK_MAKE_VERSION(1, 0, 0),
.pEngineName = "MyAppEngine",
.engineVersion = VK_MAKE_VERSION(1, 0, 0),
.apiVersion = VK_API_VERSION_1_2};
vk::InstanceCreateInfo ci = vk::InstanceCreateInfo(
vk::InstanceCreateFlags(), &appInfo, layers, extensions);
instance = vk::createInstance(ci);
In Vulkan, you have access to all enumerable devices that support it, and can query for information like their name, the number of heaps they support, their manufacturer, etc.
// 👋 Declare handles
vk::PhysicalDevice physicalDevice;
// 💡 Initialize Devices
std::vector<vk::PhysicalDevice> physicalDevices = instance.enumeratePhysicalDevices();
physicalDevice = physicalDevices[0];
This is useful for choosing the fastest device to use; you could also use the KHX_device_group extension presented at GDC 2017 to help with multi-GPU processing.
You can then create a logical device from a physical device handle. A logical device can be loaded with its own extensions/layers, and can be set up for graphics, GPGPU computation, sparse memory, and/or memory transfers by creating the appropriate queues for that device.
A logical device is your interface to the GPU, and allows you to allocate data and queue up tasks.
// 👋 Declare handles
uint32_t queueFamilyIndex;
vk::SurfaceKHR surface;
vk::Device device;
// 👪 Queue Family
queueFamilyIndex = getQueueIndex(physicalDevice, vk::QueueFlagBits::eGraphics);
// ⏹ Get Vulkan Surface with CrossWindowGraphics
surface = xgfx::getSurface(&window, instance);
if (!physicalDevice.getSurfaceSupportKHR(queueFamilyIndex, surface))
{
// Check if queueFamily supports this surface
return;
}
// 📦 Queue Creation
std::vector<vk::DeviceQueueCreateInfo> queueCreateInfos;
float queuePriority = 0.5f;
vk::DeviceQueueCreateInfo qcinfo;
qcinfo = {.queueFamilyIndex = queueFamilyIndex,
.queueCount = 1,
.pQueuePriorities = &queuePriority};
queueCreateInfos.emplace_back(qcinfo);
// 🎮 Logical Device
std::vector<vk::ExtensionProperties> installedDeviceExtensions =
physicalDevice.enumerateDeviceExtensionProperties();
std::vector<const char*> wantedDeviceExtensions = {
VK_KHR_SWAPCHAIN_EXTENSION_NAME
};
std::vector<const char*> deviceExtensions = {};
findBestExtensions(installedDeviceExtensions,
wantedDeviceExtensions,
deviceExtensions);
vk::DeviceCreateInfo dinfo({}, queueCreateInfos, {}, deviceExtensions); // device layers are deprecated, so none are passed
device = physicalDevice.createDevice(dinfo);
Once you have a logical device, you can access the queues you requested when you created it:
// 👋 Declare handles
vk::Queue queue;
// 📦 We only allocated one queue earlier,
// so there's only one available on index 0.
queue = device.getQueue(queueFamilyIndex, 0);
If your swapchain ever becomes out of date (for example after a resize, or if the window is minimized), swapchain operations will throw a vk::OutOfDateKHRError, requiring you to recreate the swapchain and the resources that depend on it.
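The seed project handles this inside its render loop; as a rough sketch (jumping ahead to the swapchain, semaphores, and setupSwapchain function created later in this post), handling that error while acquiring the next image could look like this:
// 🖼️ Acquire the next swapchain image, recreating swapchain resources
// if they have become out of date (e.g. after a resize or minimize).
uint32_t currentBuffer = 0;
try
{
    vk::ResultValue<uint32_t> acquired = device.acquireNextImageKHR(
        swapchain, UINT64_MAX, presentCompleteSemaphore);
    currentBuffer = acquired.value;
}
catch (vk::OutOfDateKHRError&)
{
    // ♻️ Wait for the GPU, then rebuild the swapchain-dependent resources.
    device.waitIdle();
    setupSwapchain(surfaceSize.width, surfaceSize.height);
}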
A command pool is a means of allocating command buffers. Any number of command buffers can be made from command pools, with you as the developer responsible for managing when and how they're created and what is loaded in each.
A command pool cannot be used from multiple threads at the same time, but you can create one for each thread and manage them on a per-thread level.
// 👋 Declare handles
vk::CommandPool commandPool;
// 🏊 Create a command pool
vk::CommandPoolCreateInfo commandPoolInfo = vk::CommandPoolCreateInfo(
vk::CommandPoolCreateFlags(vk::CommandPoolCreateFlagBits::eResetCommandBuffer),
queueFamilyIndex
);
commandPool = device.createCommandPool(commandPoolInfo);
// Later, once your ⛓️ vk::Swapchain has been created
// Lets allocate 1 command buffer for each swapchain image.
std::vector<vk::CommandBuffer> commandBuffers = device.allocateCommandBuffers(
vk::CommandBufferAllocateInfo(
commandPool,
vk::CommandBufferLevel::ePrimary,
swapchainBuffers.size()
)
);
A descriptor pool is a means of allocating Descriptor Sets, a set of data structures containing implementation-specific descriptions of resources. To make a descriptor pool, you need to describe exactly how many of each type of descriptor you need to allocate.
To do that you need to provide a collection of the size of each descriptor type.
// 👋 Declare handles
vk::DescriptorPool descriptorPool;
std::vector<vk::DescriptorPoolSize> dpsizes =
{
vk::DescriptorPoolSize(
vk::DescriptorType::eUniformBuffer,
1
)
};
// 🎱 Create Descriptor Pool
vk::DescriptorPoolCreateInfo dpci({}, 1, dpsizes);
descriptorPool = device.createDescriptorPool(dpci);
Like command buffers, we'll come back to descriptor sets later.
While these work well enough, using bindless resources is significantly easier; Matt Pettineo (@MyNameIsMJP) wrote a chapter in Ray Tracing Gems 2 about this.
Knowing what Color formats your GPU supports will play a crucial role in determining what you can display and what kind of buffers you can allocate.
// 👋 Declare handles
vk::SurfaceFormatKHR surfaceColorFormat;
vk::ColorSpaceKHR surfaceColorSpace;
vk::Format surfaceDepthFormat;
// 🔴🟢🔵 Check to see if we can display rgb colors.
std::vector<vk::SurfaceFormatKHR> surfaceFormats = physicalDevice.getSurfaceFormatsKHR(surface);
if (surfaceFormats.size() == 1 && surfaceFormats[0].format == vk::Format::eUndefined)
surfaceColorFormat = vk::Format::eB8G8R8A8Unorm;
else
surfaceColorFormat = surfaceFormats[0].format;
surfaceColorSpace = surfaceFormats[0].colorSpace;
// Since all depth formats may be optional, we need to find a suitable depth format to use
// Start with the highest precision packed format
std::vector<vk::Format> depthFormats =
{
vk::Format::eD32SfloatS8Uint,
vk::Format::eD32Sfloat,
vk::Format::eD24UnormS8Uint,
vk::Format::eD16UnormS8Uint,
vk::Format::eD16Unorm
};
for (vk::Format& format : depthFormats)
{
vk::FormatProperties depthFormatProperties = physicalDevice.getFormatProperties(format);
// Format must support depth stencil attachment for optimal tiling
if (depthFormatProperties.optimalTilingFeatures & vk::FormatFeatureFlagBits::eDepthStencilAttachment)
{
surfaceDepthFormat = format;
break;
}
}
A Swapchain is a structure that manages the allocation of frame buffers to be cycled through by your application. It's here that your application sets up V-Sync via double buffering or triple buffering.
One approach to setting this up is to take in a JSON file at the start of your application, say
config.json, which determines if you'll be using V-Sync, your screen resolution, and any other global data you want to configure.
// 👋 Declare handles
vk::Rect2D renderArea;
vk::Extent2D surfaceSize;
vk::Viewport viewport;
vk::SwapchainKHR swapchain;
void setupSwapchain(unsigned width, unsigned height)
{
// Setup viewports, vsync
vk::Extent2D swapchainSize = vk::Extent2D(width, height);
// All framebuffers / attachments will be the same size as the surface
vk::SurfaceCapabilitiesKHR surfaceCapabilities = physicalDevice.getSurfaceCapabilitiesKHR(surface);
if (!(surfaceCapabilities.currentExtent.width == -1 || surfaceCapabilities.currentExtent.height == -1)) {
swapchainSize = surfaceCapabilities.currentExtent;
renderArea = vk::Rect2D(vk::Offset2D(), swapchainSize);
viewport = vk::Viewport(0.0f, 0.0f, static_cast<float>(swapchainSize.width), static_cast<float>(swapchainSize.height), 0, 1.0f);
}
// VSync
std::vector<vk::PresentModeKHR> surfacePresentModes = physicalDevice.getSurfacePresentModesKHR(surface);
vk::PresentModeKHR presentMode = vk::PresentModeKHR::eImmediate;
for (vk::PresentModeKHR& pm : surfacePresentModes) {
if (pm == vk::PresentModeKHR::eMailbox) {
presentMode = vk::PresentModeKHR::eMailbox;
break;
}
}
// ⛓️ Create Swapchain, Images, Frame Buffers
device.waitIdle();
vk::SwapchainKHR oldSwapchain = swapchain;
// Some devices can support more than 2 buffers,
// but during my tests they would crash on fullscreen
// Tested on an NVIDIA 1080 and 165 Hz 2K display ~ @alainxyz
uint32_t backbufferCount = std::clamp(surfaceCapabilities.maxImageCount, 1U, 2U);
swapchain = device.createSwapchainKHR(
vk::SwapchainCreateInfoKHR(
vk::SwapchainCreateFlagsKHR(),
surface,
backbufferCount,
surfaceColorFormat,
surfaceColorSpace,
swapchainSize,
1,
vk::ImageUsageFlagBits::eColorAttachment,
vk::SharingMode::eExclusive,
1,
&queueFamilyIndex,
vk::SurfaceTransformFlagBitsKHR::eIdentity,
vk::CompositeAlphaFlagBitsKHR::eOpaque,
presentMode,
VK_TRUE,
oldSwapchain
)
);
surfaceSize = vk::Extent2D(std::clamp(swapchainSize.width, 1U, 8192U), std::clamp(swapchainSize.height, 1U, 8192U));
renderArea = vk::Rect2D(vk::Offset2D(), surfaceSize);
viewport = vk::Viewport(0.0f, 0.0f, static_cast<float>(surfaceSize.width), static_cast<float>(surfaceSize.height), 0, 1.0f);
// Destroy previous swapchain
if (oldSwapchain != vk::SwapchainKHR(nullptr))
{
device.destroySwapchainKHR(oldSwapchain);
}
// Resize swapchain buffers for use later
swapchainBuffers.resize(backbufferCount);
}
A View in Vulkan is a handle to a particular resource on a GPU, such as an Image or a Buffer, and provides information on how that resource should be processed.
// 👋 Declare handles
vk::ImageView depthImageView;
depthImageView = device.createImageView(
vk::ImageViewCreateInfo(
vk::ImageViewCreateFlags(),
depthImage,
vk::ImageViewType::e2D,
surfaceDepthFormat,
vk::ComponentMapping(),
vk::ImageSubresourceRange(
vk::ImageAspectFlagBits::eDepth | vk::ImageAspectFlagBits::eStencil,
0,
1,
0,
1
)
)
);
A render pass describes the attachments that are expected to be used when executing a graphics pipeline and their relationship with each other. This can be useful in tile-based rendering, where having that information in advance lets the driver better optimize cache flushes.
// 👋 Declare handles
vk::RenderPass renderPass;
void createRenderPass()
{
std::vector<vk::AttachmentDescription> attachmentDescriptions =
{
vk::AttachmentDescription(
vk::AttachmentDescriptionFlags(),
surfaceColorFormat,
vk::SampleCountFlagBits::e1,
vk::AttachmentLoadOp::eClear,
vk::AttachmentStoreOp::eStore,
vk::AttachmentLoadOp::eDontCare,
vk::AttachmentStoreOp::eDontCare,
vk::ImageLayout::eUndefined,
vk::ImageLayout::ePresentSrcKHR
),
vk::AttachmentDescription(
vk::AttachmentDescriptionFlags(),
surfaceDepthFormat,
vk::SampleCountFlagBits::e1,
vk::AttachmentLoadOp::eClear,
vk::AttachmentStoreOp::eDontCare,
vk::AttachmentLoadOp::eDontCare,
vk::AttachmentStoreOp::eDontCare,
vk::ImageLayout::eUndefined,
vk::ImageLayout::eDepthStencilAttachmentOptimal
)
};
std::vector<vk::AttachmentReference> colorReferences =
{
vk::AttachmentReference(0, vk::ImageLayout::eColorAttachmentOptimal)
};
std::vector<vk::AttachmentReference> depthReferences = {
vk::AttachmentReference(1, vk::ImageLayout::eDepthStencilAttachmentOptimal)
};
std::vector<vk::SubpassDescription> subpasses =
{
vk::SubpassDescription(
vk::SubpassDescriptionFlags(),
vk::PipelineBindPoint::eGraphics,
0,
nullptr,
static_cast<uint32_t>(colorReferences.size()),
colorReferences.data(),
nullptr,
depthReferences.data(),
0,
nullptr
)
};
std::vector<vk::SubpassDependency> dependencies =
{
vk::SubpassDependency(
~0U,
0,
vk::PipelineStageFlagBits::eBottomOfPipe,
vk::PipelineStageFlagBits::eColorAttachmentOutput,
vk::AccessFlagBits::eMemoryRead,
vk::AccessFlagBits::eColorAttachmentRead | vk::AccessFlagBits::eColorAttachmentWrite,
vk::DependencyFlagBits::eByRegion
),
vk::SubpassDependency(
0,
~0U,
vk::PipelineStageFlagBits::eColorAttachmentOutput,
vk::PipelineStageFlagBits::eBottomOfPipe,
vk::AccessFlagBits::eColorAttachmentRead | vk::AccessFlagBits::eColorAttachmentWrite,
vk::AccessFlagBits::eMemoryRead,
vk::DependencyFlagBits::eByRegion
)
};
renderPass = device.createRenderPass(
vk::RenderPassCreateInfo(
vk::RenderPassCreateFlags(),
static_cast<uint32_t>(attachmentDescriptions.size()),
attachmentDescriptions.data(),
static_cast<uint32_t>(subpasses.size()),
subpasses.data(),
static_cast<uint32_t>(dependencies.size()),
dependencies.data()
)
);
}
A frame buffer in Vulkan is a container of Image Views that are bound to a specific render pass.
// ⛓️ The swapchain handles allocating frame images.
std::vector<vk::Image> swapchainImages = device.getSwapchainImagesKHR(swapchain);
// ↘️ Create Depth Image Data
vk::Image depthImage = device.createImage(
vk::ImageCreateInfo(
vk::ImageCreateFlags(),
vk::ImageType::e2D,
surfaceDepthFormat,
vk::Extent3D(surfaceSize.width, surfaceSize.height, 1),
1,
1,
vk::SampleCountFlagBits::e1,
vk::ImageTiling::eOptimal,
vk::ImageUsageFlagBits::eDepthStencilAttachment | vk::ImageUsageFlagBits::eTransferSrc,
vk::SharingMode::eExclusive,
1,
&queueFamilyIndex,
vk::ImageLayout::eUndefined
)
);
// Search through GPU memory properties to see if this can be device local.
vk::MemoryRequirements depthMemoryReq = device.getImageMemoryRequirements(depthImage);
vk::DeviceMemory depthMemory = device.allocateMemory(vk::MemoryAllocateInfo(
depthMemoryReq.size,
getMemoryTypeIndex(physicalDevice, depthMemoryReq.memoryTypeBits,
vk::MemoryPropertyFlagBits::eDeviceLocal)));
device.bindImageMemory(
depthImage,
depthMemory,
0
);
vk::ImageView depthImageView = device.createImageView(
vk::ImageViewCreateInfo(
vk::ImageViewCreateFlags(),
depthImage,
vk::ImageViewType::e2D,
surfaceDepthFormat,
vk::ComponentMapping(),
vk::ImageSubresourceRange(
vk::ImageAspectFlagBits::eDepth | vk::ImageAspectFlagBits::eStencil,
0,
1,
0,
1
)
)
);
struct SwapChainBuffer {
vk::Image image;
std::array<vk::ImageView, 2> views;
vk::Framebuffer frameBuffer;
};
std::vector<SwapChainBuffer> swapchainBuffers;
swapchainBuffers.resize(swapchainImages.size());
for (int i = 0; i < swapchainImages.size(); i++)
{
swapchainBuffers[i].image = swapchainImages[i];
// 🌈 Color
swapchainBuffers[i].views[0] =
device.createImageView(
vk::ImageViewCreateInfo(
vk::ImageViewCreateFlags(),
swapchainImages[i],
vk::ImageViewType::e2D,
surfaceColorFormat,
vk::ComponentMapping(),
vk::ImageSubresourceRange(
vk::ImageAspectFlagBits::eColor,
0,
1,
0,
1
)
)
);
// ↘️ Depth
swapchainBuffers[i].views[1] = depthImageView;
swapchainBuffers[i].frameBuffer = device.createFramebuffer(
vk::FramebufferCreateInfo(
vk::FramebufferCreateFlags(),
renderPass,
swapchainBuffers[i].views.size(),
swapchainBuffers[i].views.data(),
surfaceSize.width,
surfaceSize.height,
1
)
);
}
You could say Pipeline Barriers are the most powerful part of the Vulkan API, since they allow granular control over preventing data races. ~ Charles Giessen (@charlesgiessen)
Vulkan was designed with concurrency in mind, and features three primitives for this: Semaphores, Fences, and Pipeline Barriers.
Semaphores coordinate operations within the GPU by introducing dependencies between operations.
// 🎌 Semaphore used to ensure that image presentation is complete before starting to submit again
vk::Semaphore presentCompleteSemaphore = device.createSemaphore(vk::SemaphoreCreateInfo());
// 🎌 Semaphore used to ensure that all commands submitted have finished before presenting the image to the queue
vk::Semaphore renderCompleteSemaphore = device.createSemaphore(vk::SemaphoreCreateInfo());
Fences are objects used to synchronize the CPU and GPU, allowing the CPU to be alerted when events such as resource loading have finished on the GPU.
// 🚧 Fence for command buffer completion
std::vector<vk::Fence> waitFences;
waitFences.resize(swapchainBuffers.size());
for (int i = 0; i < waitFences.size(); i++)
{
waitFences[i] = device.createFence(vk::FenceCreateInfo(vk::FenceCreateFlagBits::eSignaled));
}
Synchronization primitives can be used in a variety of queue operations:
// 💬 Usage in Command Buffer
vk::Result result;
vk::PipelineStageFlags waitDstStageMask = vk::PipelineStageFlagBits::eColorAttachmentOutput;
vk::SubmitInfo submitInfo(1, &presentCompleteSemaphore, &waitDstStageMask,
1, &commandBuffers[currentBuffer], 1,
&renderCompleteSemaphore);
result = queue.submit(1, &submitInfo, waitFences[currentBuffer]);
result = queue.presentKHR(
vk::PresentInfoKHR(
1,
&renderCompleteSemaphore,
1,
&swapchain,
&currentBuffer,
nullptr
)
);
Beyond these two primitive objects, there are also functions that help with synchronization, such as pipeline barriers, which offer granular control over synchronization within command buffers. We're not using any in this example, but keep them in mind for your applications!
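For illustration only (this snippet isn't part of the triangle sample, and image and cmd are hypothetical handles), a pipeline barrier that transitions an image into a transfer destination layout inside a command buffer might look like this:
// 🚧 Hypothetical example: transition `image` from undefined to a
// transfer destination layout before copying data into it.
vk::ImageMemoryBarrier barrier(
    vk::AccessFlags(),                    // srcAccessMask: nothing to wait on
    vk::AccessFlagBits::eTransferWrite,   // dstAccessMask: upcoming transfer writes
    vk::ImageLayout::eUndefined,          // oldLayout
    vk::ImageLayout::eTransferDstOptimal, // newLayout
    VK_QUEUE_FAMILY_IGNORED,              // srcQueueFamilyIndex
    VK_QUEUE_FAMILY_IGNORED,              // dstQueueFamilyIndex
    image,
    vk::ImageSubresourceRange(vk::ImageAspectFlagBits::eColor, 0, 1, 0, 1)
);
cmd.pipelineBarrier(
    vk::PipelineStageFlagBits::eTopOfPipe, // stages that must happen before
    vk::PipelineStageFlagBits::eTransfer,  // stages that must wait
    vk::DependencyFlags(),
    nullptr,                               // global memory barriers
    nullptr,                               // buffer memory barriers
    barrier                                // image memory barriers
);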
The fundamental problem of graphics is how to manage large sets of data. A vertex buffer is an array of rows of relevant vertex information, such as its position, normal, color, etc. Unlike OpenGL, which handles allocation and memory management for you, in Vulkan you must create the buffer, allocate its memory, bind the two together, and copy your vertex data into it yourself.
For buffers that you want to be GPU accessible only, you'll also need to copy that host-visible buffer into a GPU-exclusive (device-local) buffer.
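As a rough sketch (assuming the device, physicalDevice, and getMemoryTypeIndex helper from earlier, plus a hypothetical Vertex struct), a host-visible vertex buffer for the triangle could be created like so:
#include <cstring> // for memcpy

// 📐 A hypothetical interleaved vertex: position + color, matching the shader inputs.
struct Vertex
{
    float position[3];
    float color[3];
};

std::vector<Vertex> vertexData =
{
    {{ 1.0f,  1.0f, 0.0f}, {1.0f, 0.0f, 0.0f}},
    {{-1.0f,  1.0f, 0.0f}, {0.0f, 1.0f, 0.0f}},
    {{ 0.0f, -1.0f, 0.0f}, {0.0f, 0.0f, 1.0f}}
};
vk::DeviceSize bufferSize = vertexData.size() * sizeof(Vertex);

// 📦 Create the buffer object itself
vk::Buffer vertexBuffer = device.createBuffer(
    vk::BufferCreateInfo(
        vk::BufferCreateFlags(),
        bufferSize,
        vk::BufferUsageFlagBits::eVertexBuffer
    )
);

// 🧠 Allocate host-visible memory for it, then bind the two together
vk::MemoryRequirements memReqs = device.getBufferMemoryRequirements(vertexBuffer);
vk::DeviceMemory vertexMemory = device.allocateMemory(
    vk::MemoryAllocateInfo(
        memReqs.size,
        getMemoryTypeIndex(physicalDevice, memReqs.memoryTypeBits,
                           vk::MemoryPropertyFlagBits::eHostVisible |
                           vk::MemoryPropertyFlagBits::eHostCoherent)
    )
);
device.bindBufferMemory(vertexBuffer, vertexMemory, 0);

// ✍️ Map the memory and copy the vertex data into it
void* mapped = device.mapMemory(vertexMemory, 0, bufferSize);
memcpy(mapped, vertexData.data(), static_cast<size_t>(bufferSize));
device.unmapMemory(vertexMemory);
For a device-local copy you would create a second buffer with vk::BufferUsageFlagBits::eTransferDst | eVertexBuffer usage, back it with eDeviceLocal memory, and record a copyBuffer command to move the data across.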
Descriptor Sets describe the resources bound to the binding points in a shader (basically uniforms). They connect the binding points of a shader with the buffers and images used for those bindings.
Descriptor sets are composed of Descriptor Set Layouts, which are then composed of Descriptor Set Bindings, the individual bindings a uniform has. Often these are organized as different resource types.
In Vulkan, uniforms must be contiguous structs of data laid out in multiples of 128 bits (so SIMD-vector-sized blocks).
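For example, the UBO block used by the vertex shader later in this post could be mirrored on the CPU as a struct of GLM matrices (a sketch; the member names simply match that shader's uniform block):
#include <glm/glm.hpp>

// 🌐 CPU-side mirror of the shader's `UBO` block. Each glm::mat4 is 64 bytes,
// so every member already sits on a 16-byte (128-bit) boundary.
struct UniformBufferObject
{
    glm::mat4 projectionMatrix;
    glm::mat4 modelMatrix;
    glm::mat4 viewMatrix;
};

// The vk::DescriptorBufferInfo below would then point at a vk::Buffer of
// sizeof(UniformBufferObject) bytes created with
// vk::BufferUsageFlagBits::eUniformBuffer.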
In Facebook's React Fiber engine there's the idea of a frequently updated view and a not frequently updated view. Unreal Engine 4 shares this with two global uniform families for frequently (called variable parameters) and not frequently (constant parameters) updated uniforms. Descriptor Sets are where you would make this distinction in Vulkan.
// 👋 Declare handles
vk::DescriptorBufferInfo descriptor;
// Binding 0: Uniform buffer (Vertex shader)
std::vector<vk::DescriptorSetLayoutBinding> descriptorSetLayoutBindings =
{
vk::DescriptorSetLayoutBinding(
0,
vk::DescriptorType::eUniformBuffer,
1,
vk::ShaderStageFlagBits::eVertex,
nullptr
)
};
std::vector<vk::DescriptorSetLayout> descriptorSetLayouts = {
device.createDescriptorSetLayout(
vk::DescriptorSetLayoutCreateInfo(
vk::DescriptorSetLayoutCreateFlags(),
descriptorSetLayoutBindings.size(),
descriptorSetLayoutBindings.data()
)
)
};
std::vector<vk::DescriptorSet> descriptorSets = device.allocateDescriptorSets(
vk::DescriptorSetAllocateInfo(
descriptorPool,
descriptorSetLayouts.size(),
descriptorSetLayouts.data()
)
);
// 💪 Describe the descriptor set writes
std::vector<vk::WriteDescriptorSet> descriptorWrites =
{
vk::WriteDescriptorSet(
descriptorSets[0],
0,
0,
1,
vk::DescriptorType::eUniformBuffer,
nullptr,
&descriptor,
nullptr
)
};
// Update
device.updateDescriptorSets(descriptorWrites, nullptr);
// Bind at command buffer generation
cmd.bindDescriptorSets(
vk::PipelineBindPoint::eGraphics,
pipelineLayout,
0,
descriptorSets,
nullptr
);
Pipeline layouts are a collection of descriptor set layouts, the bindings to a shader program. In Vulkan, in order to bind a shader to a set of data, you need to describe how its inputs and outputs are organized.
Access to descriptor sets from a pipeline is accomplished through a pipeline layout. Zero or more descriptor set layouts and zero or more push constant ranges are combined to form a pipeline layout object which describes the complete set of resources that can be accessed by a pipeline.
A pipeline layout represents a sequence of descriptor sets with each having a specific layout. This sequence of layouts is used to determine the interface between shader stages and shader resources.
A Graphics Pipeline is created using a pipeline layout.
// 👋 Declare handles
vk::PipelineLayout pipelineLayout;
vk::PipelineLayoutCreateInfo plci = {{},descriptorSetLayouts, {}};
pipelineLayout = device.createPipelineLayout(plci);
// 💪 Usage
cmd.bindDescriptorSets(
vk::PipelineBindPoint::eGraphics,
pipelineLayout,
0,
descriptorSets,
nullptr
);
Pipelines are basically a mix of hardware and software functions that perform a particular task on the GPU. In Vulkan, a graphics pipeline bakes in most of the state used for a draw, including:
Color Blending - The function that controls how two objects draw on top of each other.
Depth Stencil - A extra piece of information that describes depth information.
Vertex Input - The actual vertex data you'll be using in your shader.
Shaders - What shaders will be loaded in.
And many more. Pipelines can even be cached! These pieces of state are grouped together because, in older graphics APIs, changing them at draw time could trigger shader recompilation.
// Create Graphics Pipeline
std::vector<char> vertShaderCode = readFile("assets/triangle.vert.spv");
std::vector<char> fragShaderCode = readFile("assets/triangle.frag.spv");
vertModule = device.createShaderModule(
vk::ShaderModuleCreateInfo(
vk::ShaderModuleCreateFlags(),
vertShaderCode.size(),
(uint32_t*)vertShaderCode.data()
)
);
fragModule = device.createShaderModule(
vk::ShaderModuleCreateInfo(
vk::ShaderModuleCreateFlags(),
fragShaderCode.size(),
(uint32_t*)fragShaderCode.data()
)
);
pipelineCache = device.createPipelineCache(vk::PipelineCacheCreateInfo());
std::vector<vk::PipelineShaderStageCreateInfo> pipelineShaderStages = {
vk::PipelineShaderStageCreateInfo(
vk::PipelineShaderStageCreateFlags(),
vk::ShaderStageFlagBits::eVertex,
vertModule,
"main",
nullptr
),
vk::PipelineShaderStageCreateInfo(
vk::PipelineShaderStageCreateFlags(),
vk::ShaderStageFlagBits::eFragment,
fragModule,
"main",
nullptr
)
};
vk::PipelineVertexInputStateCreateInfo pvi = vertices.inputState;
vk::PipelineInputAssemblyStateCreateInfo pia(
vk::PipelineInputAssemblyStateCreateFlags(),
vk::PrimitiveTopology::eTriangleList
);
vk::PipelineViewportStateCreateInfo pv(
vk::PipelineViewportStateCreateFlagBits(),
1,
&viewport,
1,
&renderArea
);
vk::PipelineRasterizationStateCreateInfo pr(
vk::PipelineRasterizationStateCreateFlags(),
VK_FALSE,
VK_FALSE,
vk::PolygonMode::eFill,
vk::CullModeFlagBits::eNone,
vk::FrontFace::eCounterClockwise,
VK_FALSE,
0,
0,
0,
1.0f
);
vk::PipelineMultisampleStateCreateInfo pm(
vk::PipelineMultisampleStateCreateFlags(),
vk::SampleCountFlagBits::e1
);
// Depth and Stencil state for primitive compare/test operations
vk::PipelineDepthStencilStateCreateInfo pds = vk::PipelineDepthStencilStateCreateInfo(
vk::PipelineDepthStencilStateCreateFlags(),
VK_TRUE,
VK_TRUE,
vk::CompareOp::eLessOrEqual,
VK_FALSE,
VK_FALSE,
vk::StencilOpState(),
vk::StencilOpState(),
0,
0
);
// Blend State - How two primitives should draw on top of each other.
std::vector<vk::PipelineColorBlendAttachmentState> colorBlendAttachments =
{
vk::PipelineColorBlendAttachmentState(
VK_FALSE,
vk::BlendFactor::eZero,
vk::BlendFactor::eOne,
vk::BlendOp::eAdd,
vk::BlendFactor::eZero,
vk::BlendFactor::eZero,
vk::BlendOp::eAdd,
vk::ColorComponentFlags(vk::ColorComponentFlagBits::eR | vk::ColorComponentFlagBits::eG | vk::ColorComponentFlagBits::eB | vk::ColorComponentFlagBits::eA)
)
};
vk::PipelineColorBlendStateCreateInfo pbs(
vk::PipelineColorBlendStateCreateFlags(),
0,
vk::LogicOp::eClear,
static_cast<uint32_t>(colorBlendAttachments.size()),
colorBlendAttachments.data()
);
std::vector<vk::DynamicState> dynamicStates =
{
vk::DynamicState::eViewport,
vk::DynamicState::eScissor
};
vk::PipelineDynamicStateCreateInfo pdy(
vk::PipelineDynamicStateCreateFlags(),
static_cast<uint32_t>(dynamicStates.size()),
dynamicStates.data()
);
pipeline = device.createGraphicsPipeline(
pipelineCache,
vk::GraphicsPipelineCreateInfo(
vk::PipelineCreateFlags(),
static_cast<uint32_t>(pipelineShaderStages.size()),
pipelineShaderStages.data(),
&pvi,
&pia,
nullptr,
&pv,
&pr,
&pm,
&pds,
&pbs,
&pdy,
pipelineLayout,
renderPass,
0
)
);
A pipeline cache stores previously created pipelines for reuse. Since pipeline creation is expensive and pipelines don't change often, a cache lets you quickly create similar pipelines later.
// 👋 Declare handles
vk::PipelineCache pipelineCache;
// 💵 Create Pipeline Cache
vk::PipelineCacheCreateInfo pcci;
pipelineCache = device.createPipelineCache(pcci);
You're even able to compile the pipeline cache down into a binary blob and write it to a file. This is part of the reason why DOOM 2016 takes a while to start up the first time it's run on Vulkan [Lottes 2016], and why DOOM Eternal downloads Vulkan binaries separately on Steam.
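A minimal sketch of serializing that cache (the file name here is arbitrary):
#include <fstream>

// 💾 Grab the driver's binary blob for this pipeline cache and write it to
// disk; a later run can feed it back in via vk::PipelineCacheCreateInfo's
// initialDataSize / pInitialData fields.
std::vector<uint8_t> cacheData = device.getPipelineCacheData(pipelineCache);
std::ofstream cacheFile("pipeline_cache.bin", std::ios::binary);
cacheFile.write(reinterpret_cast<const char*>(cacheData.data()),
                static_cast<std::streamsize>(cacheData.size()));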
Shaders must be passed to Vulkan as Standard Portable Intermediate Representation V or SPIR-V binary, so any compiler that can make SPIR-V is allowed. Shaders are pre-compiled, loaded into memory, transferred to a shader module, bundled in a set of pipelineShaderStages, which is then put into a graphics pipeline.
Shaders are compiled using the glslangValidator tool bundled with the Vulkan SDK provided by LunarG.
glslangValidator -V shader.vert -o shader.vert.spv
glslangValidator -V shader.frag -o shader.frag.spv
Vulkan's GLSL code is the same as OpenGL 4.5:
// Vertex Shader
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable
// Uniforms now come in the form of input layouts
// Each location has a 128 bit alignment,
// so matrices/arrays mean larger strides in location.
layout (location = 0) in vec3 inPos;
layout (location = 1) in vec3 inColor;
layout (binding = 0) uniform UBO
{
mat4 projectionMatrix;
mat4 modelMatrix;
mat4 viewMatrix;
} ubo;
layout (location = 0) out vec3 outColor;
out gl_PerVertex
{
vec4 gl_Position;
};
void main()
{
outColor = inColor;
gl_Position = ubo.projectionMatrix * ubo.viewMatrix * ubo.modelMatrix * vec4(inPos.xyz, 1.0);
}
// Fragment Shader
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable
layout (location = 0) in vec3 inColor;
layout (location = 0) out vec4 outFragColor;
void main()
{
outFragColor = vec4(inColor, 1.0);
}
Your shaders can be pre-compiled at build time with build scripts.
Compiled shaders are loaded into Shader Modules, attached to a graphics pipeline, and then executed by a command buffer.
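The repo keeps its file loader in Utils.h (the readFile call used when building the graphics pipeline above); a minimal sketch of such a helper might look like this:
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// 📄 Read a whole binary file (e.g. a compiled .spv shader) into memory.
inline std::vector<char> readFile(const std::string& path)
{
    std::ifstream file(path, std::ios::ate | std::ios::binary);
    if (!file.is_open())
    {
        throw std::runtime_error("Failed to open file: " + path);
    }
    size_t fileSize = static_cast<size_t>(file.tellg());
    std::vector<char> buffer(fileSize);
    file.seekg(0);
    file.read(buffer.data(), static_cast<std::streamsize>(fileSize));
    return buffer;
}
With the SPIR-V loaded into memory, the shader modules below can be created from it: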
// 📈 Create your shader module handles
vk::ShaderModule vertModule = device.createShaderModule(
vk::ShaderModuleCreateInfo(
vk::ShaderModuleCreateFlags(),
vertexShader.size(),
reinterpret_cast<const uint32_t*>(vertexShader.data())
)
);
vk::ShaderModule fragModule = device.createShaderModule(
vk::ShaderModuleCreateInfo(
vk::ShaderModuleCreateFlags(),
fragShader.size(),
reinterpret_cast<const uint32_t*>(fragShader.data())
)
);
A command buffer is a container of GPU commands; this is where you would see commands similar to OpenGL's state commands:
bindPipeline
bindVertexBuffers
bindIndexBuffer
setViewport
setScissor
blitImage
A common pattern for building a command buffer is to begin the command buffer, begin a render pass, bind your pipeline, descriptor sets, and vertex/index buffers, draw, end the render pass, and end the command buffer, as in the sketch below.
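As a rough sketch (reusing handles created throughout this post: the render pass, framebuffers, viewport, scissor, pipeline, descriptor sets, and the vertex buffer sketched earlier), recording the command buffers could look like this:
// ✏️ Record one command buffer per swapchain image.
std::array<vk::ClearValue, 2> clearValues =
{
    vk::ClearColorValue(std::array<float, 4>{0.2f, 0.2f, 0.2f, 1.0f}),
    vk::ClearDepthStencilValue(1.0f, 0)
};

for (size_t i = 0; i < commandBuffers.size(); ++i)
{
    vk::CommandBuffer& cmd = commandBuffers[i];
    // 1. Begin the command buffer
    cmd.begin(vk::CommandBufferBeginInfo());
    // 2. Begin the render pass, targeting this swapchain image's framebuffer
    cmd.beginRenderPass(
        vk::RenderPassBeginInfo(
            renderPass,
            swapchainBuffers[i].frameBuffer,
            renderArea,
            static_cast<uint32_t>(clearValues.size()),
            clearValues.data()
        ),
        vk::SubpassContents::eInline
    );
    // 3. Bind state: dynamic viewport/scissor, pipeline, descriptors, vertex data
    cmd.setViewport(0, viewport);
    cmd.setScissor(0, renderArea);
    cmd.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline);
    cmd.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, pipelineLayout, 0,
                           descriptorSets, nullptr);
    vk::DeviceSize offset = 0;
    cmd.bindVertexBuffers(0, vertexBuffer, offset);
    // 4. Draw the triangle (3 vertices, 1 instance)
    cmd.draw(3, 1, 0, 0);
    // 5. End the render pass and the command buffer
    cmd.endRenderPass();
    cmd.end();
}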
Separate command pools allow multiple threads to generate command buffers in parallel, so you could allocate a thread for each core on the CPU and split rendering tasks across them. This could be used to distribute rendering of individual objects, deferred rendering passes, physics calculations with compute buffers, etc. A rough sketch of this per-thread pool pattern follows below.
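A hypothetical sketch of that pattern, using secondary command buffers (none of this is in the seed repo):
#include <thread>
#include <vector>

// 🧵 One command pool and one secondary command buffer per worker thread.
unsigned threadCount = std::thread::hardware_concurrency();
std::vector<vk::CommandPool> threadPools(threadCount);
std::vector<vk::CommandBuffer> threadCmds(threadCount);
std::vector<std::thread> workers;

for (unsigned t = 0; t < threadCount; ++t)
{
    // Pools and buffers are created up front on the main thread.
    threadPools[t] = device.createCommandPool(
        vk::CommandPoolCreateInfo(vk::CommandPoolCreateFlags(), queueFamilyIndex));
    threadCmds[t] = device.allocateCommandBuffers(
        vk::CommandBufferAllocateInfo(
            threadPools[t], vk::CommandBufferLevel::eSecondary, 1))[0];
}

for (unsigned t = 0; t < threadCount; ++t)
{
    workers.emplace_back([&, t]()
    {
        // Each thread records into a buffer from its own pool,
        // so no locking around the pool is required.
        vk::CommandBufferInheritanceInfo inheritance(renderPass, 0, vk::Framebuffer());
        threadCmds[t].begin(vk::CommandBufferBeginInfo(
            vk::CommandBufferUsageFlagBits::eRenderPassContinue, &inheritance));
        // ... record this thread's share of the draw calls here ...
        threadCmds[t].end();
    });
}
for (std::thread& w : workers) { w.join(); }

// ▶️ Later, inside a primary command buffer whose render pass was begun
// with vk::SubpassContents::eSecondaryCommandBuffers:
// primaryCommandBuffer.executeCommands(threadCmds);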
Vulkan is a pretty complicated API to wrap your head around, and while this post attempts to make it simple, there's still a lot to bear in mind that other graphics APIs deal with for you. Aspects of the API like memory management, queue indices, and descriptor sets don't exist in other APIs, but exist here to make Vulkan much faster at the cost of added complexity in your renderer.
The Khronos Vulkan Specification page serves as a great start to all things Vulkan.
Sascha Willems (@SaschaWillems2) maintains a very well architected and readable Vulkan examples page here.
Alexander Overvoorde (@Overv) wrote the Vulkan Tutorial, a comprehensive overview of the Vulkan API that goes further into detail than this post.
Baldur Karlsson (@baldurk) wrote Vulkan in 30 minutes, a similar tutorial to this one introducing the API.
V. Blanco's series of articles on Vulkan.
The Graphics Virtual Meetup provided video overviews of a variety of Vulkan tutorials introducing the API.
VKGuide.dev is a comprehensive guide on writing Vulkan applications.
Arseny Kapoulkine (@zeuxcg) wrote an article on how to write an efficient vulkan renderer that goes over the mental model you should have when authoring your renderer.
You'll find all the source code described in this post in the Github repo here.
[Fatahalian 2018]
[Lottes 2016]