Implementing Nanite in Unity

2026-01-20 19:14:06  Source: 侑虎科技UWA


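[USparkle Column] If you have outstanding skills, love to "do a bit of research", and enjoy sharing as well as learning from others, we look forward to you joining us, so that sparks of wisdom can collide and knowledge can keep being passed on!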

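This is UWA's 1939th article. Thanks to the author 傻頭傻腦亞古獸 for contributing it. You are welcome to forward and share it; please do not reprint it without the author's permission. If you have any unique insights or findings, you are also welcome to contact us to discuss them. (QQ group: 793972859)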

Author homepage:

https://www.zhihu.com/people/tian-cai-ya-gu-shou

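I. Preface

1. Introduction

Nanite is UE5's virtualized geometry system, whose main purpose is rendering high-polygon models efficiently. Nanite automatically generates a LOD structure for a model. Unlike traditional LODs, Nanite's LOD is not per model but per local region within the model, so artists no longer need to worry about authoring or processing LODs. It also benefits from efficient GPU-driven culling and a single draw call.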

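2. Key Techniques

Nanite combines several techniques to render efficiently:

1. Cluster Rendering: triangles are organized into clusters, which allows more efficient culling.

2. Auto LOD: the model is partitioned and simplified with graph partitioning to build LODs, and the data is organized into a BVH so that LODs can be selected efficiently in parallel at runtime. LODs built this way transition very smoothly.

3. GPU Driven Pipeline: drawing is driven by the GPU, which reduces CPU overhead.

4. Occlusion Culling: finer-grained occlusion culling that removes triangles that are not visible.

5. Hardware/Software Rasterization: small triangles are very unfriendly to hardware rasterization, so they are rasterized in software with compute shaders for better efficiency.

6. Visibility Buffer: a visibility buffer reduces overdraw, further improving GPU efficiency.

7. Streaming: only the data relevant to what is visible is loaded, which reduces the memory pressure of geometry.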

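3. Results in This Article

Because the Nanite system is very large and involves many engineering details, this article simplifies or skips some parts and implements only the core, so it differs somewhat from the UE5 version.

The figures below show the result of this implementation. In the first view each colored patch is a triangle, in the second each patch is a cluster; both LOD switching and camera culling are very smooth.

Colored patches represent triangles

Colored patches represent clusters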

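II. Implementation

1. Clusterize

The first step, done offline, splits the complex ultra-high-detail mesh efficiently and sensibly into smaller, more manageable clusters of at most 128 triangles each. This is not a simple cut: the goal is to minimize the number of edges shared between clusters (the cut size) while keeping the cluster sizes roughly balanced.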
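To make the cut size concrete, here is a minimal C++ sketch (C++ matching the native DLL side of this project). `countCutEdges` is a hypothetical helper, not part of METIS or meshoptimizer: given a triangle list and a cluster id per triangle, it counts the edges whose adjacent triangles fall into different clusters, which is the quantity a good clusterization keeps small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Counts edges shared by triangles that were assigned to different clusters (the "cut size").
// indices: triangle list, 3 entries per triangle; clusterOf[t]: cluster id of triangle t.
int countCutEdges(const std::vector<uint32_t>& indices, const std::vector<int>& clusterOf)
{
    assert(indices.size() % 3 == 0 && clusterOf.size() == indices.size() / 3);

    // Undirected edge (smaller index, larger index) -> clusters of all triangles using it.
    std::map<std::pair<uint32_t, uint32_t>, std::vector<int>> edgeClusters;
    for (std::size_t t = 0; t < clusterOf.size(); ++t)
        for (int e = 0; e < 3; ++e)
        {
            uint32_t a = indices[t * 3 + e];
            uint32_t b = indices[t * 3 + (e + 1) % 3];
            edgeClusters[{std::min(a, b), std::max(a, b)}].push_back(clusterOf[t]);
        }

    int cut = 0;
    for (const auto& entry : edgeClusters)
    {
        const std::vector<int>& ids = entry.second;
        // The edge is cut when its triangles do not all belong to the same cluster.
        if (std::any_of(ids.begin(), ids.end(), [&](int id) { return id != ids[0]; }))
            ++cut;
    }
    return cut;
}
```

For example, a quad split into two triangles has one interior edge, so putting the two triangles into different clusters gives a cut of 1, and putting them into the same cluster gives 0.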

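UE uses the METIS library for partitioning:

https://github.com/KarypisLab/METIS

For the implementation, refer to the UE5 source:

UnrealEngine-release\Engine\Source\Developer\NaniteBuilder\Private\NaniteBuilder.cpp

This article uses meshoptimizer to split the mesh into clusters and to partition them. The library also offers other features, such as overdraw optimization and shadow (depth-only) index buffer generation:

https://github.com/zeux/meshoptimizer

We create a C++ project that builds a DLL and wrap a few main functions so Unity can call them. There is not much code, so translating it to C# and using it directly would also work.

The functions are:

meshopt_buildMeshlets (build clusters)
meshopt_partitionClusters (partition clusters into groups)
meshopt_buildMeshletsBound (compute the maximum number of clusters)
meshopt_computeSphereBounds (merge bounding spheres)

Calling these functions from C#: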

```csharp
unsafe static List<Cluster> clusterize(Vector3[] vertices, int[] indices)
{
    const int max_vertices = 192; // TODO: depends on kClusterSize, also may want to dial down for mesh shaders
    const int max_triangles = kClusterSize; // 128
    const int min_triangles = (kClusterSize / 3) & ~3;
    const float split_factor = 2.0f;
    const float fill_weight = 0.75f; // unused in this function
    int max_meshlets = BuildMeshletsBound(indices.Length, max_vertices, max_triangles); // meshopt_buildMeshletsBound
    var meshlets = new Meshlet[max_meshlets * 2];
    var meshlet_vertices = new int[max_meshlets * max_vertices];
    var meshlet_triangles = new byte[max_meshlets * max_triangles * 3];
    var meshlet_count = BuildMeshletFlex(meshlets, meshlet_vertices, meshlet_triangles, indices, indices.Length,
        vertices, vertices.Length, sizeof(float) * 3, max_vertices, min_triangles, max_triangles,
        0.0f /* cone_weight */, split_factor); // meshopt_buildMeshletsFlex
    List<Cluster> clusters = new List<Cluster>(meshlet_count);

    for (int i = 0; i < meshlet_count; i++)
    {
        ref Meshlet meshlet = ref meshlets[i];
        // meshopt_optimizeMeshlet: reorders the meshlet's triangles/vertices in place for locality
        fixed (int* ptr = &meshlet_vertices[meshlet.vertex_offset])
        {
            fixed (byte* ptr2 = &meshlet_triangles[meshlet.triangle_offset])
            {
                OptimizeMeshlet(ptr, ptr2, (int)meshlet.triangle_count, (int)meshlet.vertex_count);
            }
        }

        // convert the meshlet's local byte indices back to global vertex indices
        Cluster cluster = new Cluster();
        cluster.indices = new int[meshlet.triangle_count * 3];
        for (int j = 0; j < meshlet.triangle_count * 3; ++j)
            cluster.indices[j] =
                meshlet_vertices[meshlet.vertex_offset + meshlet_triangles[meshlet.triangle_offset + j]];

        cluster.parent.error = float.MaxValue; // no parent group yet
        clusters.Add(cluster);
    }

    return clusters;
}
```

So the meshopt_buildMeshlets family (its Flex variant above) directly gives us the indices of every cluster.
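The index conversion at the end of `clusterize` is easy to get wrong, so here it is isolated as a small C++ sketch. `MeshletRange` mirrors the four fields of meshoptimizer's `meshopt_Meshlet` (`vertex_offset`, `triangle_offset`, `vertex_count`, `triangle_count`); `expandMeshlet` is a hypothetical helper that maps each local byte index through `meshlet_vertices` back to the original vertex index, the same way the C# loop does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same four fields as meshopt_Meshlet: offsets into the shared meshlet arrays plus counts.
struct MeshletRange
{
    uint32_t vertex_offset;
    uint32_t triangle_offset;
    uint32_t vertex_count;
    uint32_t triangle_count;
};

// Expands one meshlet into a plain triangle list of global vertex indices:
// local byte index -> meshlet_vertices[vertex_offset + local] -> original vertex index.
std::vector<uint32_t> expandMeshlet(const MeshletRange& m,
                                    const std::vector<uint32_t>& meshletVertices,
                                    const std::vector<uint8_t>& meshletTriangles)
{
    std::vector<uint32_t> out;
    out.reserve(m.triangle_count * 3);
    for (uint32_t j = 0; j < m.triangle_count * 3; ++j)
    {
        uint8_t local = meshletTriangles[m.triangle_offset + j];
        assert(local < m.vertex_count); // local indices address only this meshlet's vertices
        out.push_back(meshletVertices[m.vertex_offset + local]);
    }
    return out;
}
```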

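2. Build DAG

With these clusters we can build the "LODs" by repeating one operation in a loop: group -> merge -> simplify -> clusterize, as shown in the figure below:

The process is much like building mipmaps: clusters are merged and simplified level by level, and an error value (Err) and bounds are recorded for LOD selection at runtime. These merged nodes are called Cluster Groups. The result is a DAG (Directed Acyclic Graph) structure.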
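The body of `boundsMerge` is not shown in the article. Below is a minimal C++ sketch of what such a merge has to guarantee, using a hypothetical `mergeLODBounds`: the merged sphere encloses every child sphere, and the merged error is at least every child error, so a coarser level never looks more accurate than the level below it. The radius-weighted center is an assumption; any center works as long as the radius still covers every child. In the C# code the simplification error is then added on top (`groupb.error += error`).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct LODBounds
{
    float center[3];
    float radius;
    float error;
};

// Merges child bounds so that the result is conservative: its sphere contains every
// child sphere and its error is the maximum child error (error monotonicity).
LODBounds mergeLODBounds(const std::vector<LODBounds>& children)
{
    assert(!children.empty());
    LODBounds result{{0.f, 0.f, 0.f}, 0.f, 0.f};

    // Center: radius-weighted average of the child centers.
    float weight = 0.f;
    for (const LODBounds& c : children)
    {
        for (int k = 0; k < 3; ++k)
            result.center[k] += c.center[k] * c.radius;
        weight += c.radius;
    }
    for (int k = 0; k < 3; ++k)
        result.center[k] = weight > 0.f ? result.center[k] / weight : children[0].center[k];

    // Radius: far enough to enclose each child sphere; error: the largest child error.
    for (const LODBounds& c : children)
    {
        float dx = c.center[0] - result.center[0];
        float dy = c.center[1] - result.center[1];
        float dz = c.center[2] - result.center[2];
        result.radius = std::max(result.radius, std::sqrt(dx * dx + dy * dy + dz * dz) + c.radius);
        result.error = std::max(result.error, c.error);
    }
    return result;
}
```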

```csharp
public struct ClusterGroup
{
    public List<int> Children;   // indices into NaniteSubMesh.clusterList
    public Vector3 Bounds;       // bounding-sphere center
    public float radius;
    public Vector3 LODBounds;
    public float MinLODError;
    public float MaxParentLODError;
    public int MipLevel;
}

public class NaniteSubMesh
{
    public List<ClusterGroup> clusterGroupList;
    public List<Cluster> clusterList;
    public int maxMipLevel;
}
```

```csharp
static NaniteSubMesh Nanite(Vector3[] vertices, Vector3[] normals, int[] indices)
{
    NaniteSubMesh res = new NaniteSubMesh();
    List<ClusterGroup> clusterGroupList = new List<ClusterGroup>();
    var clusters = clusterize(vertices, indices);
    res.clusterList = clusters;
    res.clusterGroupList = clusterGroupList;
    res.maxMipLevel = 0;
    for (int i = 0; i < clusters.Count; ++i)
    {
        var c = clusters[i];
        c.self = Bounds(vertices, clusters[i].indices, 0f); // mip 0: the cluster's own bounds, zero error
        c.mip = 0;
        clusters[i] = c;
    }

    List<int> pending = new List<int>(clusters.Count);
    // identity remap: vertices are not welded by position here, so vertices split at
    // UV/normal seams count as different vertices in partition() and lockBoundary()
    int[] remap = new int[vertices.Length];
    for (int i = 0; i < remap.Length; ++i)
        remap[i] = i;
    for (int i = 0; i < clusters.Count; ++i)
        pending.Add(i);

    int curMip = 1;
    byte[] locks = new byte[vertices.Length];
    while (pending.Count > 1)
    {
        // group: partition the pending clusters into groups of neighboring clusters
        List<List<int>> groups = partition(clusters, pending, remap, vertices);
        if (kUseLocks)
            lockBoundary(locks, groups, clusters, remap);
        pending.Clear();
        List<int> retry = new List<int>(); // collected but never re-queued in this port
        int triangles = 0;
        int stuck_triangles = 0;
        for (int i = 0; i < groups.Count; ++i)
        {
            var curGroupClusters = groups[i];
            if (curGroupClusters.Count == 0)
                continue; // metis shortcut

            // merge: concatenate the triangles of all clusters in the group
            List<int> merged = new List<int>(vertices.Length);
            for (int j = 0; j < curGroupClusters.Count; ++j)
                merged.AddRange(clusters[curGroupClusters[j]].indices);

            LODBounds groupb = boundsMerge(clusters, curGroupClusters);
            ClusterGroup clusterGroup = new ClusterGroup();
            clusterGroup.Bounds = groupb.center;
            clusterGroup.MaxParentLODError = groupb.error;
            clusterGroup.radius = groupb.radius;
            clusterGroup.Children = new List<int>(curGroupClusters.Count);
            clusterGroup.MipLevel = curMip - 1;
            for (int j = 0; j < curGroupClusters.Count; ++j)
                clusterGroup.Children.Add(curGroupClusters[j]);
            clusterGroupList.Add(clusterGroup);

            // simplify: aim to reduce group size in half
            int target_size = (merged.Count / 3) / 2 * 3;
            float error = 0f;
            var simplified = simplify(vertices, normals, merged.ToArray(), kUseLocks ? locks : null, target_size,
                ref error);
            if (simplified.Count > merged.Count * kSimplifyThreshold)
            {
                stuck_triangles += merged.Count / 3;
                for (int j = 0; j < curGroupClusters.Count; ++j)
                    retry.Add(curGroupClusters[j]);

                // simplification is stuck; abandon the merge (these clusters keep parent.error = float.MaxValue)
                continue;
            }

            // enforce bounds and error monotonicity
            // note: it is incorrect to use the precise bounds of the merged or simplified mesh, because this may violate monotonicity

            // clusterize: split the simplified group back into clusters
            var split = clusterize(vertices, simplified.ToArray());
            groupb.error += error; // this may overestimate the error, but we are starting from the simplified mesh so this is a little more correct

            // update parent bounds and error for all clusters in the group
            // note that all clusters in the group need to switch simultaneously so they have the same bounds
            for (int j = 0; j < curGroupClusters.Count; ++j)
            {
                int clusterIndex = curGroupClusters[j];
                var t = clusters[clusterIndex];
                t.parent = groupb;
                clusters[clusterIndex] = t;
            }

            // the new, coarser clusters use the group bounds as their own bounds
            for (int j = 0; j < split.Count; ++j)
            {
                var sj = split[j];
                sj.self = groupb;
                sj.mip = curMip;
                clusters.Add(sj);
                pending.Add(clusters.Count - 1);
                triangles += sj.indices.Length / 3;
            }
        }

        curMip++;
    }

    // the last remaining cluster forms the root group
    if (pending.Count == 1)
    {
        var c = clusters[pending[0]];
        ClusterGroup clusterGroup = new ClusterGroup();
        clusterGroup.Bounds = c.self.center;
        clusterGroup.MaxParentLODError = c.self.error;
        clusterGroup.radius = c.self.radius;
        clusterGroup.Children = new List<int>(1);
        clusterGroup.MipLevel = curMip - 1;
        clusterGroup.Children.Add(pending[0]);
        clusterGroupList.Add(clusterGroup);
    }

    res.maxMipLevel = curMip - 1;
    return res;
}
```

```csharp
static void lockBoundary(byte[] locks, List<List<int>> groups, List<Cluster> clusters, int[] remap)
{
    // for each remapped vertex, keep track of index of the group it's in (or -2 if it's in multiple groups)
    int[] groupmap = new int[locks.Length];
    for (int i = 0; i < groupmap.Length; ++i)
        groupmap[i] = -1;

    for (int i = 0; i < groups.Count; ++i)
    {
        var c = groups[i];
        for (int j = 0; j < c.Count; ++j)
        {
            var indices = clusters[c[j]].indices;
            for (int k = 0; k < indices.Length; ++k)
            {
                var v = indices[k];
                var r = remap[v];

                if (groupmap[r] == -1 || groupmap[r] == i)
                    groupmap[r] = i;
                else
                    groupmap[r] = -2;
            }
        }
    }

    // note: we need to consistently lock all vertices with the same position to avoid holes
    for (int i = 0; i < locks.Length; ++i)
    {
        var r = remap[i];
        locks[i] = (byte)((groupmap[r] == -2) ? 1 : 0);
    }
}
```

This gives us a series of clusters for each mip level.
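How many mip levels to expect can be estimated with simple arithmetic. The C++ sketch below assumes an idealized run in which every pass halves the triangle count (the `target_size` in the loop above) and stops once the remaining triangles fit in one 128-triangle cluster. `estimateMipLevels` is a hypothetical helper; real meshes can need more passes because simplification does not always reach its target.

```cpp
#include <cassert>
#include <cstdint>

// Idealized mip count: mip 0 holds the original clusters, and each
// group -> merge -> simplify -> clusterize pass halves the triangle count
// until everything fits into a single cluster.
int estimateMipLevels(uint64_t triangles, uint64_t clusterSize = 128)
{
    int levels = 1; // mip 0
    while (triangles > clusterSize)
    {
        triangles /= 2;
        ++levels;
    }
    return levels;
}
```

For a one-million-triangle mesh this gives 14 levels, i.e. roughly log2(1,000,000 / 128) ≈ 13 halvings on top of mip 0.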

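3. Acceleration Structure

Even after grouping triangles into clusters, there are still too many clusters for a brute-force compute-shader pass over all of them to be efficient. Nanite therefore builds a BVH over the ClusterGroups as an acceleration structure, and traverses and filters it with persistent threads.

For the persistent-threads BVH traversal, see the UE5 source if you are interested: Shaders\Private\Nanite\NaniteClusterCulling.usf

UE5 also has a path that does not use persistent threads, and that path is generally the default.

UE5 source excerpt

In my opinion, the persistent-threads approach is a bit brute-force and heavy for traversing this kind of BVH on the GPU, so I simplified it: several clusters are merged into one culling unit (a Part); parts are culled in parallel first, then the clusters inside the surviving parts are culled in parallel. This two-level structure serves as a simple substitute for persistent threads.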
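The two-level Part/Cluster scheme can be sketched on the CPU in C++. Everything here is a simplified, hypothetical stand-in: `Part`, `Sphere` and `Plane` replace the GPU structs, only a frustum test is performed, and on the GPU each of the two passes would be one compute dispatch writing a compacted list rather than a loop.

```cpp
#include <cassert>
#include <vector>

struct Sphere { float x, y, z, r; };
struct Part { int clusterStart, clusterCount; Sphere bounds; };
struct Plane { float nx, ny, nz, d; }; // inward-facing unit normal: inside when n·p + d >= 0

// A sphere is rejected when it lies entirely behind any plane.
bool sphereVisible(const Sphere& s, const std::vector<Plane>& planes)
{
    for (const Plane& p : planes)
        if (p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d < -s.r)
            return false;
    return true;
}

// Pass 1 tests every part; pass 2 tests only the clusters of the parts that survived,
// so a culled part rejects all of its clusters with a single test.
std::vector<int> cullTwoLevel(const std::vector<Part>& parts, const std::vector<Sphere>& clusterSpheres,
                              const std::vector<Plane>& planes)
{
    std::vector<int> visibleParts;
    for (int i = 0; i < (int)parts.size(); ++i)
        if (sphereVisible(parts[i].bounds, planes))
            visibleParts.push_back(i);

    std::vector<int> visibleClusters;
    for (int pi : visibleParts)
        for (int c = 0; c < parts[pi].clusterCount; ++c)
        {
            int cluster = parts[pi].clusterStart + c;
            if (sphereVisible(clusterSpheres[cluster], planes))
                visibleClusters.push_back(cluster);
        }
    return visibleClusters;
}
```

In the article's data a part also carries `MaxParentLODError`, which presumably lets a whole part be skipped when even its largest parent error is already acceptable; that extra test is omitted here.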

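The parts are then organized into Pages for chunked loading. Material handling also differs: in UE5 each cluster records a MaterialRange, whereas for simplicity this implementation builds a separate set of clusters for each SubMesh.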
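Grouping parts into pages is a ceiling division plus a remainder for the last page. In this C++ sketch (hypothetical `splitIntoPages`), the last page's size is computed as `partCount - first` rather than `partCount % maxPartsPerPage`, because the modulo form returns 0 when the part count is an exact multiple of the page size.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Splits partCount parts into pages of at most maxPartsPerPage parts.
// Returns one (firstPart, partCountInPage) pair per page.
std::vector<std::pair<int, int>> splitIntoPages(int partCount, int maxPartsPerPage)
{
    assert(maxPartsPerPage > 0 && partCount >= 0);
    int pageCount = (partCount + maxPartsPerPage - 1) / maxPartsPerPage; // ceiling division
    std::vector<std::pair<int, int>> pages;
    for (int i = 0; i < pageCount; ++i)
    {
        int first = i * maxPartsPerPage;
        // every page is full except possibly the last, which gets the remainder
        pages.emplace_back(first, std::min(maxPartsPerPage, partCount - first));
    }
    return pages;
}
```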

The code is as follows:

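```csharp
[Serializable]
public struct NaniteCluster
{
    public int indiceIndex;      // first index of this cluster in the page's index array
    public int indiceCount;
    public float selfErrer;      // own LOD error
    public float parentErrer;    // error of the group that replaces this cluster
    public Vector4 selfSphere;   // xyz = center, w = radius
    public Vector4 parentSphere;
    public int subMeshID;
    public int vertexOffset;     // filled at runtime when pages are packed into GPU buffers
}

[Serializable]
public struct NaniteClusterGroup
{
    public int ClusterStart;
    public int ClusterCount;
    public Vector3 Bounds;
    public float radius;
    public Vector3 LODBounds;
    public float MinLODError;
    public float MaxParentLODError;
    public int MipLevel;
}

[Serializable]
public struct NaniteMeshPart
{
    public int ClusterStart;        // page-local index of the part's first cluster
    public int ClusterCount;
    public Vector4 selfSphere;      // bounds enclosing all clusters of the part
    public float MaxParentLODError; // largest parentErrer among the part's clusters
}

public class NaniteSubMesh
{
    public List<ClusterGroup> clusterGroupList;
    public List<Cluster> clusterList;
    public int maxMipLevel;
}

public class BuildPart
{
    public List<int> clusterList; // indices into the sub-mesh's clusterList
    public int mip;
    public int subMesh;
}
```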
```csharp
public static void BuildNaniteMesh(Mesh mesh)
{
    var vertices = mesh.vertices;
    var normals = mesh.normals;
    var uvs = mesh.uv;

    int subMeshCount = mesh.subMeshCount;
    int totalClusterCount = 0;
    int totalIndexCount = 0;
    List<NaniteSubMesh> subMeshList = new List<NaniteSubMesh>();
    for (int i = 0; i < subMeshCount; i++)
    {
        var triangles = mesh.GetTriangles(i);
        var subMesh = Nanite(vertices, normals, triangles);
        subMeshList.Add(subMesh);
        totalClusterCount += subMesh.clusterList.Count;
    }

    List<BuildPart> buildPartsList = new List<BuildPart>(totalClusterCount);
    int MAX_PART_PERPAGE = 128;
    int MAX_CLUSTER_PERPART = 8;

    // Pack clusters into parts: walk the groups in order and start a new part when the current
    // one is full or the mip level changes, so a part never mixes mips or sub-meshes.
    for (int subMeshIndex = 0; subMeshIndex < subMeshList.Count; subMeshIndex++)
    {
        var subMesh = subMeshList[subMeshIndex];
        List<Cluster> clusters = subMesh.clusterList;
        var groupsList = subMesh.clusterGroupList;
        BuildPart buildPart = null;
        for (int i = 0; i < groupsList.Count; i++)
        {
            var gIndex = i; // sortGroups[i].OldIndex;
            var g = groupsList[gIndex];
            var childs = g.Children;
            for (int c = 0; c < childs.Count; c++)
            {
                int cIndex = childs[c];
                int cMip = clusters[cIndex].mip;
                totalIndexCount += clusters[cIndex].indices.Length;
                // start a new part
                if (buildPart == null || buildPart.clusterList.Count >= MAX_CLUSTER_PERPART ||
                    buildPart.mip != cMip)
                {
                    buildPart = new BuildPart();
                    buildPart.clusterList = new List<int>(MAX_CLUSTER_PERPART);
                    buildPart.mip = cMip;
                    buildPart.subMesh = subMeshIndex;
                    buildPartsList.Add(buildPart);
                }

                buildPart.clusterList.Add(cIndex);
            }
        }
    }

    int buildPartCount = buildPartsList.Count;
    NaniteMeshPage[] pageArray = new NaniteMeshPage[(buildPartCount + (MAX_PART_PERPAGE - 1)) / MAX_PART_PERPAGE]; // ceil
    List<int> tempIndiceList = new List<int>(totalIndexCount);
    List<int> mipLists = new List<int>(totalClusterCount);
    int partIndex = 0;
    for (int i = 0; i < pageArray.Length; i++)
    {
        // create a new page
        var p = ScriptableObject.CreateInstance<NaniteMeshPage>();
        pageArray[i] = p;
        tempIndiceList.Clear();
        mipLists.Clear(); // per page, parallel to p.clusterArray
        // the last page holds the remainder (a full page when the count divides evenly)
        int partCount = Mathf.Min(MAX_PART_PERPAGE, buildPartCount - i * MAX_PART_PERPAGE);
        p.parts = new NaniteScene.NaniteMeshPart[partCount];
        List<NaniteScene.NaniteCluster> pageClusters =
            new List<NaniteScene.NaniteCluster>(partCount * MAX_CLUSTER_PERPART);
        for (int j = 0; j < partCount; j++)
        {
            var buildPart = buildPartsList[partIndex];
            var buildPartCluster = buildPart.clusterList;
            // create the part
            var part = new NaniteScene.NaniteMeshPart();
            part.ClusterStart = pageClusters.Count; // page-local index
            part.ClusterCount = buildPartCluster.Count;
            int subMeshID = buildPart.subMesh;
            float maxParentErr = 0f;
            var clusters = subMeshList[subMeshID].clusterList;
            for (int c = 0; c < buildPartCluster.Count; c++)
            {
                var cluster = clusters[buildPartCluster[c]];
                mipLists.Add(cluster.mip);
                // create the cluster
                NaniteScene.NaniteCluster naniteCluster = new NaniteScene.NaniteCluster();
                naniteCluster.indiceIndex = tempIndiceList.Count;
                naniteCluster.indiceCount = cluster.indices.Length;
                naniteCluster.parentErrer = cluster.parent.error;
                naniteCluster.parentSphere = new Vector4(cluster.parent.center.x, cluster.parent.center.y, cluster.parent.center.z, cluster.parent.radius);
                naniteCluster.selfErrer = cluster.self.error;
                naniteCluster.selfSphere = new Vector4(cluster.self.center.x, cluster.self.center.y, cluster.self.center.z, cluster.self.radius);
                naniteCluster.subMeshID = subMeshID;
                tempIndiceList.AddRange(cluster.indices);
                maxParentErr = Mathf.Max(naniteCluster.parentErrer, maxParentErr);
                pageClusters.Add(naniteCluster);
            }

            LODBounds partBounds = boundsMerge(clusters, buildPartCluster, true);
            part.selfSphere = new Vector4(partBounds.center.x, partBounds.center.y, partBounds.center.z, partBounds.radius);
            part.MaxParentLODError = maxParentErr;
            p.parts[j] = part;
            partIndex++;
        }
        p.clusterArray = pageClusters.ToArray();
        p.indiceArray = tempIndiceList.ToArray();
        p.clusterMip = mipLists.ToArray();
    }

    string fileName = AssetDatabase.GetAssetPath(mesh);
    string extension = Path.GetExtension(fileName);
    fileName = fileName.Replace(extension, "");
    // build the per-page vertex data
    int totalVerts = 0;
    for (int i = 0; i < pageArray.Length; i++)
    {
        var page = pageArray[i];
        var clusterArray = page.clusterArray;
        var indiceArray = page.indiceArray;
        Dictionary<int, int> indicesMap = new Dictionary<int, int>(); // global vertex -> page-local vertex
        List<Vector3> tempVerts = new List<Vector3>(vertices.Length);
        List<Vector3> tempNormals = new List<Vector3>(vertices.Length); // collected, not written to vertexData
        List<Vector2> tempUVs = new List<Vector2>(vertices.Length);
        for (int c = 0; c < clusterArray.Length; c++)
        {
            ref var cluster = ref clusterArray[c];
            var indexStart = cluster.indiceIndex;
            var indexEnd = indexStart + cluster.indiceCount;
            for (int index = indexStart; index < indexEnd; index++)
            {
                int vertIndex = indiceArray[index];
                if (!indicesMap.TryGetValue(vertIndex, out int newIndex))
                {
                    newIndex = tempVerts.Count;
                    indicesMap.Add(vertIndex, newIndex);
                    tempVerts.Add(vertices[vertIndex]);
                    tempNormals.Add(normals[vertIndex]);
                    tempUVs.Add(uvs.Length == 0 ? Vector2.zero : uvs[vertIndex]);
                }

                indiceArray[index] = newIndex; // rewrite to the page-local index
            }
        }

        page.vertexStride = 5; // pos3 + uv2
        page.vertexData = new float[tempVerts.Count * page.vertexStride];
        page.vertexCount = tempVerts.Count;
        for (int v = 0; v < tempVerts.Count; v++)
        {
            int vertexIndex = v * page.vertexStride;
            page.vertexData[vertexIndex + 0] = tempVerts[v].x;
            page.vertexData[vertexIndex + 1] = tempVerts[v].y;
            page.vertexData[vertexIndex + 2] = tempVerts[v].z;
            page.vertexData[vertexIndex + 3] = tempUVs[v].x;
            page.vertexData[vertexIndex + 4] = tempUVs[v].y;
        }
        totalVerts += tempVerts.Count;
        string newPath = fileName + "_p" + i + ".asset";
        AssetDatabase.CreateAsset(page, newPath);
    }
    AssetDatabase.Refresh();

    Debug.Log("mesh Vertex:" + vertices.Length + " mesh Nanite:" + totalVerts + " cluster:" + totalClusterCount + " part:" + buildPartCount + " page:" + pageArray.Length);
    NaniteMesh naniteMesh = ScriptableObject.CreateInstance<NaniteMesh>();
    naniteMesh.subMeshCount = subMeshCount;
    naniteMesh.pageArray = new NaniteMeshPage[pageArray.Length];
    for (int i = 0; i < pageArray.Length; i++)
    {
        string newPath = fileName + "_p" + i + ".asset";
        naniteMesh.pageArray[i] = AssetDatabase.LoadAssetAtPath<NaniteMeshPage>(newPath);
    }

    var meshBound = mesh.bounds;
    naniteMesh.boundingSphere = meshBound.center; // Vector3 -> Vector4, w = 0
    naniteMesh.boundingSphere.w = meshBound.extents.magnitude;
    string meshExt = "_mesh.asset";
    AssetDatabase.CreateAsset(naniteMesh, fileName + meshExt);
    AssetDatabase.Refresh();
}
```

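At this point the offline part is essentially done, and we have a Nanite asset. UE5 of course does much more, such as building a BVH, encoding and compression, page partitioning and vertex attribute optimization; I consider these engineering details.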
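The per-page vertex pass in `BuildNaniteMesh` (the `indicesMap` loop) is a standard index compaction: vertices are renumbered in first-use order, so each page stores only the vertices it references. Here it is as a self-contained C++ sketch with a hypothetical `remapToPage` helper.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct PageVertices
{
    std::vector<uint32_t> localIndices; // same length as the input index list
    std::vector<uint32_t> sourceVertex; // page-local vertex -> global vertex (for copying attributes)
};

// Renumbers global vertex indices into a compact page-local range, in first-use order.
PageVertices remapToPage(const std::vector<uint32_t>& globalIndices)
{
    PageVertices page;
    std::unordered_map<uint32_t, uint32_t> globalToLocal;
    page.localIndices.reserve(globalIndices.size());
    for (uint32_t g : globalIndices)
    {
        auto it = globalToLocal.find(g);
        if (it == globalToLocal.end())
        {
            // first time this vertex is referenced: give it the next local slot
            uint32_t local = static_cast<uint32_t>(page.sourceVertex.size());
            it = globalToLocal.emplace(g, local).first;
            page.sourceVertex.push_back(g);
        }
        page.localIndices.push_back(it->second);
    }
    return page;
}
```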

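4. Runtime Resources

On to the runtime part: we need to load the Nanite mesh. For convenience, the asset is simply referenced from a script here, and the loading code is skipped.

The assets, objects and material information are combined and uploaded into GPU buffers. This is done quite informally here, again for simplicity; the page data could also be copied into the GPU buffers with a compute shader.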
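When pages are concatenated into the global buffers, every page-local offset has to be rebased; that is what the `indicesOffset`, `clusterOffset` and `vertOffset` bookkeeping in `UpdateRenderList` does. The C++ sketch below reproduces only that bookkeeping, with minimal hypothetical stand-ins for the C# structs.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-ins for NaniteCluster / NaniteMeshPart / NaniteMeshPage fields used during upload.
struct Cluster { int indexStart, indexCount, vertexOffset; };
struct Part { int clusterStart, clusterCount; };
struct Page
{
    std::vector<Part> parts;       // clusterStart is page-local
    std::vector<Cluster> clusters; // indexStart is page-local
    std::vector<int> indices;      // page-local vertex indices
    int vertexCount;
};

struct GlobalBuffers
{
    std::vector<Part> parts;
    std::vector<Cluster> clusters;
    std::vector<int> indices;
    int vertexCount = 0;
};

// Appends one page, turning its local offsets into offsets into the global arrays.
void appendPage(GlobalBuffers& g, const Page& page)
{
    const int indexOffset = (int)g.indices.size();
    const int clusterOffset = (int)g.clusters.size();
    for (Cluster c : page.clusters) // copies: the source page is left untouched
    {
        c.indexStart += indexOffset;
        c.vertexOffset = g.vertexCount; // added to each page-local index when fetching vertices
        g.clusters.push_back(c);
    }
    for (Part p : page.parts)
    {
        p.clusterStart += clusterOffset;
        g.parts.push_back(p);
    }
    g.indices.insert(g.indices.end(), page.indices.begin(), page.indices.end());
    g.vertexCount += page.vertexCount;
}
```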

```csharp
// NaniteRenderer is a hypothetical name: the element type of this list was lost from the original
// listing. It is assumed to expose naniteMesh, materials, transform and transformChanged.
public static List<NaniteRenderer> renderers = new List<NaniteRenderer>();
private static SceneObject[] gpuObjects = new SceneObject[2048];

// cluster -> part -> page
public struct SceneObject
{
    public int naniteMeshID;          // index into naniteMeshes / naniteResBuffer
    public Matrix4x4 localToWorldMatrix;
    public int materialIDOffset;      // first entry of this object's materials in materialIndexBuffer
}

public struct NaniteRes
{
    public Vector4 boundingSphere;    // xyz = center, w = radius (mesh space)
    public int partIndex;             // first part of this mesh in partsBuffer
    public int partCount;
}

unsafe static void UpdateRenderList()
{
    if (renderers.Count == 0)
        return;

    // object update
    if (renderers.Count > gpuObjects.Length)
        gpuObjects = new SceneObject[Mathf.NextPowerOfTwo(renderers.Count)];

    objectCount = 0;
    maxPartCount = 0;
    maxClusterCount = 0; // reset as well, otherwise it keeps growing on every rebuild
    naniteMeshes.Clear();
    materialList.Clear();
    List<int> materialIndices = new List<int>();
    for (int i = 0; i < renderers.Count; i++)
    {
        var renderer = renderers[i];
        var nMesh = renderer.naniteMesh;
        // upper bounds for the dispatch sizes: parts/clusters summed over all objects
        foreach (var p in nMesh.pageArray)
        {
            maxPartCount += p.parts.Length;
            maxClusterCount += p.clusterArray.Length;
        }

        SceneObject obj = new SceneObject();
        obj.localToWorldMatrix = renderer.transform.localToWorldMatrix;
        // mesh index: each distinct NaniteMesh is uploaded once
        int index = naniteMeshes.IndexOf(nMesh);
        if (index < 0)
        {
            index = naniteMeshes.Count;
            naniteMeshes.Add(nMesh);
        }
        obj.naniteMeshID = index;
        // material indices
        obj.materialIDOffset = materialIndices.Count;
        for (int m = 0; m < renderer.materials.Length; m++)
        {
            var mat = renderer.materials[m];
            int matIndex = materialList.IndexOf(mat);
            if (matIndex < 0)
            {
                matIndex = materialList.Count;
                materialList.Add(mat);
            }
            materialIndices.Add(matIndex);
        }
        gpuObjects[i] = obj;
        renderer.transformChanged = false;
        objectCount++;
    }

    if (candidateClusterBuffer != null)
        candidateClusterBuffer.Dispose();
    candidateClusterBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, maxClusterCount * 2, sizeof(int));

    if (visibleClusterBuffer != null)
        visibleClusterBuffer.Dispose();
    visibleClusterBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, maxClusterCount * 2, sizeof(int));

    if (objectsBuffer != null)
        objectsBuffer.Dispose();
    objectsBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, objectCount, sizeof(SceneObject));
    objectsBuffer.SetData(gpuObjects, 0, 0, objectCount);

    if (visObjectsBuffer != null)
        visObjectsBuffer.Dispose();
    visObjectsBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, objectCount, sizeof(int));

    int vertCount = 0;
    List<NaniteCluster> tempClusters = new List<NaniteCluster>(2048);
    List<NaniteMeshPart> tempParts = new List<NaniteMeshPart>(2048);
    List<NaniteRes> naniteRes = new List<NaniteRes>(2048);
    List<int> tempIndices = new List<int>(2048 * 100);
    List<float> vertexDataList = new List<float>();
    // load pages
    for (int nID = 0; nID < naniteMeshes.Count; nID++)
    {
        NaniteRes res = new NaniteRes();
        var nMesh = naniteMeshes[nID];
        // append this mesh's pages to the arrays that go to the GPU
        var pages = nMesh.pageArray;
        res.partIndex = tempParts.Count;
        res.partCount = 0;
        res.boundingSphere = nMesh.boundingSphere;
        for (int p = 0; p < pages.Length; p++)
        {
            var page = pages[p];
            var parts = page.parts;
            int vertOffset = vertCount;
            int indicesOffset = tempIndices.Count;
            int clusterOffset = tempClusters.Count;

            // add all clusters, rebasing page-local offsets to global ones
            var clusters = page.clusterArray;
            for (int c = 0; c < clusters.Length; c++)
            {
                var cluster = clusters[c]; // struct copy: the page asset is not modified
                cluster.indiceIndex += indicesOffset;
                cluster.vertexOffset = vertOffset;
                tempClusters.Add(cluster);
            }

            // add all parts
            for (int partIndex = 0; partIndex < parts.Length; partIndex++)
            {
                var part = parts[partIndex];
                part.ClusterStart += clusterOffset;
                tempParts.Add(part);
                res.partCount++;
            }

            // add the page's index and vertex data
            tempIndices.AddRange(page.indiceArray);
            vertexDataList.AddRange(page.vertexData);
            vertCount += page.vertexCount;
        }
        naniteRes.Add(res);
    }

    // TODO: update the buffers on the GPU instead of recreating them
    if (naniteResBuffer != null)
        naniteResBuffer.Dispose();
    naniteResBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, naniteRes.Count, sizeof(NaniteRes));
    naniteResBuffer.SetData(naniteRes);

    if (partsBuffer != null)
        partsBuffer.Dispose();
    partsBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, tempParts.Count, sizeof(NaniteMeshPart));
    partsBuffer.SetData(tempParts);

    if (clusterBuffer != null)
        clusterBuffer.Dispose();
    clusterBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, tempClusters.Count, sizeof(NaniteCluster));
    clusterBuffer.SetData(tempClusters);

    if (indiceseBuffer != null)
        indiceseBuffer.Dispose();
    indiceseBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Raw, tempIndices.Count, sizeof(int));
    indiceseBuffer.SetData(tempIndices);

    if (materialIndexBuffer != null)
        materialIndexBuffer.Dispose();
    materialIndexBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, materialIndices.Count, sizeof(int));
    materialIndexBuffer.SetData(materialIndices);

    if (vertexDataBuffer != null)
        vertexDataBuffer.Dispose();
    vertexDataBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Raw, vertexDataList.Count, sizeof(float));
    vertexDataBuffer.SetData(vertexDataList);
}

// input object ID =>
public unsafe static void UpdateNaniteScene()
{
    if (renderListDirty)
    {
        UpdateRenderList();
        // UpdateRenderListGPU();
        renderListDirty = false;
    }

    // pick up transform changes
    for (int i = 0; i < renderers.Count; i++)
    {
        var renderer = renderers[i];
        if (renderer.transformChanged)
        {
            gpuObjects[i].localToWorldMatrix = renderer.transform.localToWorldMatrix;
            renderer.transformChanged = false;
            transformDirty = true;
        }
    }

    if (objectsBuffer != null && transformDirty)
    {
        objectsBuffer.SetData(gpuObjects, 0, 0, objectCount);
        transformDirty = false; // otherwise the buffer is re-uploaded every frame after the first change
    }
}
```

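5. Culling

By now the clusters have been flattened into arrays offline, so they can be culled in parallel. The clever part is that each cluster records both its parent's error and its own error, so given an error threshold, each cluster can decide on its own whether to be culled, independently of its parents and children.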
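The per-cluster test can be written down explicitly. This C++ sketch assumes the formulation commonly used by Nanite-style renderers: project each error bound to pixels using the distance to the nearest point of its bounding sphere, then keep a cluster if its own projected error is within the threshold while its parent's is not. `projScale`, `zNear`, the pixel threshold and the helper names are assumptions, since the article's culling shader is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct LODSphere { float x, y, z, radius, error; };

// World-space error -> pixels, measured at the sphere's closest point (clamped to zNear).
// projScale = screenHeight / (2 * tan(fovY / 2)).
float projectedErrorPixels(const LODSphere& s, const float camera[3], float projScale, float zNear)
{
    float dx = s.x - camera[0], dy = s.y - camera[1], dz = s.z - camera[2];
    float distance = std::max(std::sqrt(dx * dx + dy * dy + dz * dz) - s.radius, zNear);
    return s.error / distance * projScale;
}

// The cluster is drawn when it is accurate enough itself but its parent group
// (the coarser geometry that would replace it) is not. No neighbor data is needed,
// so one GPU thread per cluster can evaluate this independently.
bool inLODCut(const LODSphere& self, const LODSphere& parent, const float camera[3],
              float projScale, float zNear, float thresholdPixels)
{
    return projectedErrorPixels(self, camera, projScale, zNear) <= thresholdPixels &&
           projectedErrorPixels(parent, camera, projScale, zNear) > thresholdPixels;
}
```

Because a parent's sphere encloses its children's and parent errors only grow, a cluster and the clusters that replace it never both pass. Clusters whose `parent.error` is `float.MaxValue` (the root, or groups whose simplification got stuck) always pass the parent half of the test.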

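The culling compute shader is first dispatched from the CPU. Because the maximum number of parts/clusters across all objects is already known when the data is organized, that number is used directly for the dispatch.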
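Dispatching by the maximum count is a ceiling division by the thread-group size, with threads past the real count returning early in the shader. A tiny C++ sketch, assuming a group size such as 64 (the article's `numthreads` value is not shown):

```cpp
#include <cassert>
#include <cstdint>

// Number of thread groups needed so that groupCount * groupSize >= itemCount.
uint32_t threadGroupCount(uint32_t itemCount, uint32_t groupSize)
{
    assert(groupSize > 0);
    return (itemCount + groupSize - 1) / groupSize;
}
```

On D3D11-class hardware a single dispatch dimension is limited to 65535 thread groups, so very large counts need to be spread over a second dimension.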

Object culling:

For each object, find the parts of its NaniteMesh and cull them:

Cluster culling:
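The object, part and cluster tests all come down to a bounding sphere against the frustum planes, after moving the sphere to world space with `localToWorldMatrix`. The C++ sketch below assumes column-major storage (`m[column][row]`) and inward-facing normalized planes, and handles non-uniform scale conservatively by scaling the radius with the largest axis scale; these choices are assumptions, not the article's shader code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; }; // as a sphere: xyz = center, w = radius
struct Mat4 { float m[4][4]; };    // column-major: m[column][row]

// Moves a local-space bounding sphere to world space.
Vec4 transformSphere(const Mat4& M, const Vec4& s)
{
    Vec4 out;
    out.x = M.m[0][0] * s.x + M.m[1][0] * s.y + M.m[2][0] * s.z + M.m[3][0];
    out.y = M.m[0][1] * s.x + M.m[1][1] * s.y + M.m[2][1] * s.z + M.m[3][1];
    out.z = M.m[0][2] * s.x + M.m[1][2] * s.y + M.m[2][2] * s.z + M.m[3][2];
    // radius scales with the longest basis vector, which stays conservative under non-uniform scale
    float maxScaleSq = 0.f;
    for (int c = 0; c < 3; ++c)
        maxScaleSq = std::max(maxScaleSq, M.m[c][0] * M.m[c][0] + M.m[c][1] * M.m[c][1] + M.m[c][2] * M.m[c][2]);
    out.w = s.w * std::sqrt(maxScaleSq);
    return out;
}

// planes[i] = (normal.xyz, d): a point p is inside when dot(normal, p) + d >= 0.
bool sphereInFrustum(const Vec4& s, const Vec4 planes[6])
{
    for (int i = 0; i < 6; ++i)
        if (planes[i].x * s.x + planes[i].y * s.y + planes[i].z * s.z + planes[i].w < -s.w)
            return false;
    return true;
}
```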

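6. Software Rasterization

Omitted.

7. VisibilityBuffer

The VBuffer is mainly used to reduce overdraw: the shader directly outputs the InstanceID, ClusterID and material ID, and these are then used to fetch the vertex data for shading.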
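The article leaves the exact packing open. One possible layout, shown as a C++ sketch, packs a visible-cluster index (from which the instance and cluster can be looked up) and a triangle index into a single 32-bit value; since a cluster holds at most 128 triangles, 7 bits suffice for the triangle. The 25/7 split, the empty-pixel value and the helper names are assumptions, and in this layout the material ID would be fetched from the cluster (for example through its `subMeshID`) instead of being stored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical payload: [31..7] visible-cluster index, [6..0] triangle index within the cluster.
constexpr uint32_t kTriangleBits = 7; // 2^7 = 128 triangles per cluster
constexpr uint32_t kTriangleMask = (1u << kTriangleBits) - 1;
constexpr uint32_t kEmptyPixel = 0xFFFFFFFFu; // clear value meaning "no geometry"

uint32_t packVisibility(uint32_t visibleClusterIndex, uint32_t triangleIndex)
{
    assert(triangleIndex <= kTriangleMask);
    // keep the largest cluster index free so no real pixel can equal kEmptyPixel
    assert(visibleClusterIndex < (1u << (32 - kTriangleBits)) - 1);
    return (visibleClusterIndex << kTriangleBits) | triangleIndex;
}

void unpackVisibility(uint32_t packed, uint32_t& visibleClusterIndex, uint32_t& triangleIndex)
{
    visibleClusterIndex = packed >> kTriangleBits;
    triangleIndex = packed & kTriangleMask;
}
```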

Thanks to the GPU-driven pipeline, a single DrawProceduralIndirect call can draw every object:

One DrawProceduralIndirect drawing multiple objects
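One common way to cover all visible clusters with a single `DrawProceduralIndirect` (an assumption, since the article's draw code is not shown) is one instance per visible cluster with 128 × 3 vertices per instance. The vertex shader maps `SV_VertexID` to a triangle and corner and collapses triangles past the cluster's real triangle count. The four-uint argument layout below is what the non-indexed procedural indirect draw reads; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Non-indexed indirect arguments, in buffer order.
struct DrawArgs { uint32_t vertexCountPerInstance, instanceCount, startVertex, startInstance; };

constexpr uint32_t kMaxTrianglesPerCluster = 128;

// One instance per visible cluster, each sized for a full cluster.
DrawArgs makeClusterDrawArgs(uint32_t visibleClusterCount)
{
    return DrawArgs{kMaxTrianglesPerCluster * 3, visibleClusterCount, 0, 0};
}

// What the vertex shader would derive from SV_VertexID for its cluster (selected by SV_InstanceID).
// Invalid slots belong to triangles past the cluster's end; the shader would output a
// degenerate position for them so the rasterizer drops the triangle.
struct VertexSlot { uint32_t triangle, corner; bool valid; };

VertexSlot clusterVertexSlot(uint32_t vertexId, uint32_t clusterTriangleCount)
{
    VertexSlot s{vertexId / 3, vertexId % 3, false};
    s.valid = s.triangle < clusterTriangleCount;
    return s;
}
```

The trade-off of this scheme is that small clusters still launch a full cluster's worth of vertex invocations; compacting to per-triangle work or using mesh shaders avoids that.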

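Which attributes the VBuffer stores, and at how many bits, are engineering details not examined here.

8. Shading

With the VBuffer in place, drawing has to be done per material. The original (UE5) approach splits the screen into tiles by material ID and combines them into indirect draws of quads.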
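Tile classification can be sketched on the CPU: build one material bitmask per screen tile, then each material's shading pass only draws quads over the tiles whose mask contains that material. This C++ sketch assumes at most 32 materials, -1 for background pixels and a square tile size; it illustrates the idea rather than UE5's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per material, one mask per tile (row-major tile order).
std::vector<uint32_t> classifyTiles(const std::vector<int>& materialIds, int width, int height, int tileSize)
{
    int tilesX = (width + tileSize - 1) / tileSize;
    int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<uint32_t> masks(tilesX * tilesY, 0u);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            int id = materialIds[y * width + x];
            if (id < 0)
                continue; // background: nothing to shade
            assert(id < 32);
            masks[(y / tileSize) * tilesX + (x / tileSize)] |= 1u << id;
        }
    return masks;
}

// Tiles that material `id` must shade; on the GPU this list would feed an indirect quad draw.
std::vector<int> tilesForMaterial(const std::vector<uint32_t>& masks, int id)
{
    std::vector<int> tiles;
    for (int t = 0; t < (int)masks.size(); ++t)
        if (masks[t] & (1u << id))
            tiles.push_back(t);
    return tiles;
}
```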

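Note that the UVs interpolated here from the VBuffer's barycentrics cannot be used to sample textures directly, because the hardware DDX/DDY values are wrong: neighboring pixels in a 2x2 quad may belong to different triangles, so the derivatives jump at triangle edges. The derivatives therefore have to be recomputed, with the code below, and textures sampled with SampleGrad(samplerName, coord2, dpdx, dpdy).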

```hlsl
// Hashes an ID into a pseudo-random color, used for the triangle/cluster debug views
uint MurmurMix(uint Hash)
{
    Hash ^= Hash >> 16;
    Hash *= 0x85ebca6b;
    Hash ^= Hash >> 13;
    Hash *= 0xc2b2ae35;
    Hash ^= Hash >> 16;
    return Hash;
}

float3 IntToColor(uint Index)
{
    uint Hash = MurmurMix(Index);

    float3 Color = float3
    (
        (Hash >> 0) & 255,
        (Hash >> 8) & 255,
        (Hash >> 16) & 255
    );

    return Color * (1.0f / 255.0f);
}

struct FBarycentrics
{
    float3 Value;    // barycentrics at the pixel
    float3 Value_dx; // screen-space x derivative of the barycentrics
    float3 Value_dy; // screen-space y derivative of the barycentrics
};

// Interpolates a float2 attribute (e.g. UV) and its screen-space derivatives;
// Value_dx / Value_dy are what SampleGrad expects as ddx / ddy
float2 Lerp(float2 Value0, float2 Value1, float2 Value2, FBarycentrics Barycentrics,
            out float2 Value_dx, out float2 Value_dy)
{
    float2 Value = Value0 * Barycentrics.Value.x + Value1 * Barycentrics.Value.y + Value2 * Barycentrics.Value.z;
    Value_dx = Value0 * Barycentrics.Value_dx.x + Value1 * Barycentrics.Value_dx.y + Value2 * Barycentrics.Value_dx.z;
    Value_dy = Value0 * Barycentrics.Value_dy.x + Value1 * Barycentrics.Value_dy.y + Value2 * Barycentrics.Value_dy.z;

    return Value;
}

/** Calculates perspective correct barycentric coordinates and partial derivatives using screen derivatives. */
FBarycentrics CalculateTriangleBarycentrics(float2 PixelClip, float4 PointClip0, float4 PointClip1,
                                            float4 PointClip2, float2 ViewInvSize)
{
    FBarycentrics Barycentrics;
    PixelClip.y = 1 - PixelClip.y;        // screen UV (y down) -> y up
    PixelClip.xy = PixelClip.xy * 2 - 1;  // [0, 1] -> NDC [-1, 1]
    const float3 RcpW = rcp(float3(PointClip0.w, PointClip1.w, PointClip2.w));
    const float3 Pos0 = PointClip0.xyz * RcpW.x;
    const float3 Pos1 = PointClip1.xyz * RcpW.y;
    const float3 Pos2 = PointClip2.xyz * RcpW.z;

    const float3 Pos120X = float3(Pos1.x, Pos2.x, Pos0.x);
    co...
```
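The derivative computation above is cut off in the source. As a way to check results on the CPU, here is a C++ sketch that produces the same quantities with a different, simpler technique: finite differences, re-evaluating the perspective-correct barycentrics one pixel to the right and one pixel down, instead of the analytic derivatives of `CalculateTriangleBarycentrics`. `interpolateUV` and `perspectiveBarycentrics` are hypothetical helpers; pixel steps are given in NDC units (2 / resolution), and the y step is negated because screen y points down while NDC y points up, mirroring `PixelClip.y = 1 - PixelClip.y`.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct ClipPos { float x, y, z, w; };
using Bary = std::array<float, 3>;

// Perspective-correct barycentrics of the NDC point (px, py) for a triangle in clip space.
Bary perspectiveBarycentrics(float px, float py, const ClipPos v[3])
{
    float sx[3], sy[3];
    for (int i = 0; i < 3; ++i) { sx[i] = v[i].x / v[i].w; sy[i] = v[i].y / v[i].w; }
    // Screen-space (affine) barycentrics via Cramer's rule on p - s0 = l1 (s1 - s0) + l2 (s2 - s0).
    float area = (sx[1] - sx[0]) * (sy[2] - sy[0]) - (sx[2] - sx[0]) * (sy[1] - sy[0]);
    float l1 = ((px - sx[0]) * (sy[2] - sy[0]) - (sx[2] - sx[0]) * (py - sy[0])) / area;
    float l2 = ((sx[1] - sx[0]) * (py - sy[0]) - (px - sx[0]) * (sy[1] - sy[0])) / area;
    float l0 = 1.f - l1 - l2;
    // Undo the perspective divide: weight by 1/w, then renormalize.
    float b0 = l0 / v[0].w, b1 = l1 / v[1].w, b2 = l2 / v[2].w;
    float sum = b0 + b1 + b2;
    return {b0 / sum, b1 / sum, b2 / sum};
}

struct UVGrad { float u, v, dudx, dvdx, dudy, dvdy; };

// UV at the pixel plus one-pixel finite-difference derivatives (the ddx/ddy to pass to SampleGrad).
UVGrad interpolateUV(float px, float py, const ClipPos v[3], const float uv[3][2],
                     float pixelSizeNdcX, float pixelSizeNdcY)
{
    auto lerpUV = [&](const Bary& b, int k) { return uv[0][k] * b[0] + uv[1][k] * b[1] + uv[2][k] * b[2]; };
    Bary b = perspectiveBarycentrics(px, py, v);
    Bary bx = perspectiveBarycentrics(px + pixelSizeNdcX, py, v);
    Bary by = perspectiveBarycentrics(px, py - pixelSizeNdcY, v); // one pixel down on screen
    UVGrad r;
    r.u = lerpUV(b, 0);
    r.v = lerpUV(b, 1);
    r.dudx = lerpUV(bx, 0) - r.u;
    r.dvdx = lerpUV(bx, 1) - r.v;
    r.dudy = lerpUV(by, 0) - r.u;
    r.dvdy = lerpUV(by, 1) - r.v;
    return r;
}
```

Finite differences cost two extra barycentric evaluations per pixel but are easy to verify; the analytic version in the HLSL avoids that cost.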


Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.