How to do profiling in jPCT-AE?

Started by kiffa, August 12, 2013, 10:13:27 AM


kiffa

My goal is to do some performance profiling, and I have a few questions:

1, Is there a method to print the exact number of draw calls per frame?

2, How can I print the total number of triangles (vertices) in the world?

3, How can I print the total number of triangles (vertices) that were actually rendered (i.e. not culled / in the display list / in the visible list ...)?

4, How can I print the time consumed by the GPU and the CPU per frame?

EgonOlsen


  • Not built in. You could get all the objects from World, attach an IRenderHook and count the calls to repeatRendering(). That would be the number of draw calls.
  • You would have to add up the triangles of all objects in the world.
  • You can't. The engine does some coarse culling, but the GPU does additional culling and there's no way to detect culled triangles.
  • You could measure the time it takes to render a frame, but you can't split it into CPU/GPU. Both can work in parallel and there's no way to detect how much time the GPU has consumed.
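
As a rough illustration of the last point, the time per frame can be measured by wrapping the render call in a wall-clock timer. This is only a sketch in plain Java: FrameTimer and its methods are hypothetical names, not part of jPCT-AE, and the Runnable stands in for whatever the render loop does per frame. It measures elapsed wall-clock time only, so it cannot separate CPU from GPU work.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not a jPCT-AE class: accumulates wall-clock frame times.
final class FrameTimer {
    private final LongSupplier clock; // returns the current time in nanoseconds
    private long frames;
    private long totalNanos;
    private long maxNanos;

    FrameTimer() {
        this(System::nanoTime);
    }

    // The clock is injectable so the timer can be checked with a fake clock.
    FrameTimer(LongSupplier clock) {
        this.clock = clock;
    }

    // Runs one frame and records how long it took.
    void time(Runnable frame) {
        long start = clock.getAsLong();
        frame.run();
        long elapsed = clock.getAsLong() - start;
        frames++;
        totalNanos += elapsed;
        maxNanos = Math.max(maxNanos, elapsed);
    }

    long frames() {
        return frames;
    }

    double averageMillis() {
        return frames == 0 ? 0.0 : totalNanos / 1e6 / frames;
    }

    double maxMillis() {
        return maxNanos / 1e6;
    }

    public static void main(String[] args) {
        FrameTimer timer = new FrameTimer();
        for (int i = 0; i < 5; i++) {
            // Placeholder work; in a real app this would be the per-frame render code.
            timer.time(() -> Math.sqrt(12345.678));
        }
        System.out.printf("frames=%d avg=%.3f ms max=%.3f ms%n",
                timer.frames(), timer.averageMillis(), timer.maxMillis());
    }
}
```

Averaging over many frames is usually more useful than printing a single frame time, since individual frames fluctuate.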

kiffa

Thanks, I did some work on this; see the picture below (memory profiling on the left, performance profiling on the right):



I also looked at the statistics view of Unity3D (http://docs.unity3d.com/Documentation/Manual/RenderingStatistics.html). I want to print some of these values but have no idea how:

1, VRAM usage: Approximate bounds of current video memory (VRAM) usage. This also shows how much video memory your graphics card has.

2, VBO total: The number of unique meshes (Vertex Buffer Objects or VBOs) that are uploaded to the graphics card. Each different model will cause a new VBO to be created. In some cases scaled objects will cause additional VBOs to be created. In the case of static batching, several different objects can potentially share the same VBO.

3, Onscreen objects (triangles/vertices).

And some more questions:

4, For a cloned Object3D (reusing the mesh), will object3d.getMesh().getTriangleCount() return the same number as for the original Object3D?

5, Can I get the time consumed by each draw call (or the sum)?

6, I have read the default shaders, and I found a "precision mediump float;" in defaultFragmentShader.src but a "precision highp float;" in defaultFragmentShaderTex0Amb.src. Could you explain the reason, and how big the performance difference between them is?

kiffa

7, There are no lights in my game, so I think I don't need to upload the normals as a VBO. Can I skip that? (I use custom shaders which have no "attribute normal".)

8, The Apple docs say:
  Avoid using the OpenGL ES GL_FIXED data type. It requires the same amount of memory as GL_FLOAT, but provides a smaller range of values. All iOS devices support hardware floating-point units, so floating point values can be processed more quickly. (http://developer.apple.com/library/ios/documentation/3ddrawing/conceptual/opengles_programmingguide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html)

  I'd like to know how this applies to jPCT-AE and Android.


EgonOlsen

I'm on low bandwidth and only mobile ATM. I'll try to give a detailed answer later....

EgonOlsen

1. No idea how to get that value. Memory is shared anyway; the GPU has no dedicated VRAM.

2. I'm not tracking that number. I might add it, but it will take some time because of the move and stuff...

3. IIRC, there's a wasVisible method in Object3D. You could base your own counter on that, but it doesn't take overdraw into account, i.e. an object hidden by a wall will still return true (which is actually correct, because it causes a draw call anyway).

4. It should, yes.

5. Roughly. You could measure the time between the calls of beforeRendering and afterRendering in an IRenderHook implementation. It will count some additional overhead, but it shouldn't really matter.

6. I'm using the lowest precision that is sufficient. I've never really seen a difference in performance, though, and GPUs are allowed to ignore the highp anyway.

7. You can't. Normals will always be created and uploaded. Not using them is a very special case, and it's not worth optimizing for, IMHO.

8. On Android, older versions have performance issues with float buffers, which is why jPCT uses fixed point for those buffers where it doesn't matter. If you don't want that, you can revert to floats... but I forgot the method's name... you should be able to find it yourself... ;)
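
To make the range trade-off from point 8 concrete, here is a small sketch of fixed-point conversion. It assumes the standard OpenGL ES GL_FIXED layout (signed 16.16); the thread does not say how jPCT-AE actually encodes its fixed-point buffers, and Fixed16 is a hypothetical helper, not an engine class.

```java
// Hypothetical helper illustrating signed 16.16 fixed point (the GL_FIXED layout).
final class Fixed16 {
    // 1.0 in 16.16 fixed point: 16 integer bits, 16 fractional bits.
    static final int ONE = 1 << 16;

    // Converts a float to 16.16; values outside the representable range saturate,
    // because Math.round(float) clamps to the int range.
    static int fromFloat(float value) {
        return Math.round(value * ONE);
    }

    // Converts a 16.16 value back to float.
    static float toFloat(int fixed) {
        return fixed / (float) ONE;
    }

    public static void main(String[] args) {
        System.out.println(toFloat(fromFloat(1.5f)));   // exactly representable
        System.out.println(toFloat(fromFloat(40000f))); // out of range, saturates near 32768
        System.out.println(1.0 / ONE);                  // smallest step between two values
    }
}
```

Both formats take 4 bytes per value, so the difference is range and precision, not memory: 16.16 covers roughly -32768 to 32768 in steps of 1/65536, which is usually enough for things like texture coordinates or normals.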

Yerst

@kiffa:
Could you tell me how you got your memory profiling values?
I'm only familiar with these three:

android.os.Debug.getNativeHeapSize()
android.os.Debug.getNativeHeapFreeSize()
android.os.Debug.getNativeHeapAllocatedSize()

How do you get values like the app memory limit?

kiffa

Quote from: Yerst on August 17, 2013, 12:56:36 PM
@kiffa:
Could you tell me how you got your memory profiling values?
How do you get values like the app memory limit?

1, For the memory limit, there are two ways to get its value (I think they are the same on Android, but I'm not sure):
a,
mActivityManager = (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
memProcessLimit = mActivityManager.getMemoryClass();

b,
Runtime.getRuntime().maxMemory();

2, For VM memory:
long total_mem = Runtime.getRuntime().totalMemory();
long free_mem = Runtime.getRuntime().freeMemory();
long vm_used_mem = total_mem - free_mem;

3, For the memory used by the app:
I simply add them up: app_used_memory = vm_used_mem + native_used_mem;
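
The Runtime-based part of the above can be collected into one small snapshot. This is a sketch in plain Java, so it leaves out the Android-only calls (ActivityManager.getMemoryClass() and the android.os.Debug native heap methods); MemorySnapshot is a hypothetical name. Note that the units differ: getMemoryClass() reports megabytes, while the Runtime methods report bytes.

```java
// Hypothetical helper: captures the VM heap figures from java.lang.Runtime.
final class MemorySnapshot {
    final long maxBytes;   // upper limit the VM will try to use
    final long totalBytes; // heap currently reserved by the VM
    final long freeBytes;  // unused part of the reserved heap

    private MemorySnapshot(long maxBytes, long totalBytes, long freeBytes) {
        this.maxBytes = maxBytes;
        this.totalBytes = totalBytes;
        this.freeBytes = freeBytes;
    }

    static MemorySnapshot capture() {
        Runtime rt = Runtime.getRuntime();
        return new MemorySnapshot(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }

    // Heap in use by the VM: reserved minus free.
    long usedBytes() {
        return totalBytes - freeBytes;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        MemorySnapshot s = MemorySnapshot.capture();
        System.out.printf("limit=%.1f MiB, used=%.1f MiB, reserved=%.1f MiB%n",
                toMiB(s.maxBytes), toMiB(s.usedBytes()), toMiB(s.totalBytes));
    }
}
```

On Android, the native part would be added on top of usedBytes(), for example from android.os.Debug.getNativeHeapAllocatedSize().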