State of OpenGL ES2.0 support

Started by EgonOlsen, May 13, 2011, 11:17:08 PM


EgonOlsen

I scrapped portal rendering for the Android version of jPCT. It was great for software rendering, where fillrate was very limited, but it doesn't play very nicely with GPUs. In addition, at least the tile-based renderers like the PowerVR and Adreno GPUs that are used in most mobile devices (i.e. everything that isn't Tegra) won't benefit from it anyway, because the way in which they work already reduces overdraw of opaque polygons to 0.

EgonOlsen

#31
Did some heavy debugging to make the default shaders work on Adreno GPUs (like the one the Nexus One uses). The shader compiler of these things doesn't seem to like any kind of loop inside the shader, as it creates either bogus results or kills the device. Actually, it doesn't seem to like anything inside the shader, because it's soooo slow in executing the default shaders. Performance of the 2.0 pipeline is around 1/10th of the performance of the 1.1 pipeline. Many thanks to raft for his patience... he had to execute billions of test cases for me to make this work. Looking at the performance, I'm not sure that it was worth it...

Edit: Managed to improve this...see below.

EgonOlsen

Current performance:

Quote
06-08 21:34:28.437: INFO/jPCT-AE(907): Double dragon: 12.77 fps
06-08 21:34:28.437: INFO/jPCT-AE(907): Flower power: 18.92 fps
06-08 21:34:28.437: INFO/jPCT-AE(907): Ninjas' garden: 17.70 fps
06-08 21:34:28.437: INFO/jPCT-AE(907): Emperor's new clothes: 42.39 fps
06-08 21:34:28.437: INFO/jPCT-AE(907): Magic island: 20.74 fps
06-08 21:34:28.437: INFO/jPCT-AE(907): TOTAL SCORE: 19483

I've split the fragment shader into three separate ones now to avoid dynamic branching in the shader in some cases.
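A rough sketch of the idea: rather than branching inside one big fragment shader for every pixel, the decision is made once per object on the Java side and a specialised shader is bound. The three variants, the selection criterion (number of texture stages) and the class name are all hypothetical and only illustrate the technique; they are not jPCT-AE's actual shaders or API.

```java
// Hypothetical sketch: choose one of three pre-built fragment shader variants
// on the CPU, so the shader itself needs no dynamic branch.
final class FragmentShaderSelector {

    // The three variants are invented for illustration only.
    enum Variant { UNTEXTURED, SINGLE_TEXTURE, MULTI_TEXTURE }

    // Decide once per object, based on how many texture stages it uses.
    static Variant select(int textureStages) {
        if (textureStages < 0) {
            throw new IllegalArgumentException("textureStages must be >= 0");
        }
        if (textureStages == 0) return Variant.UNTEXTURED;
        if (textureStages == 1) return Variant.SINGLE_TEXTURE;
        return Variant.MULTI_TEXTURE;
    }
}
```

The trade-off is more shader programs to maintain and switch between, in exchange for straight-line code in each of them.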

EgonOlsen

#33
More separation (of the vertex shaders this time), and I decided to use the unrolled loop that I was using for Adreno GPUs only for all GPUs... it's ugly looking (code-wise), but it's faster. Result:

Nexus S

2.0
Quote
06-08 22:43:40.511: INFO/jPCT-AE(1643): Double dragon: 40.11 fps
06-08 22:43:40.511: INFO/jPCT-AE(1643): Flower power: 19.39 fps
06-08 22:43:40.511: INFO/jPCT-AE(1643): Ninjas' garden: 17.35 fps
06-08 22:43:40.511: INFO/jPCT-AE(1643): Emperor's new clothes: 40.87 fps
06-08 22:43:40.511: INFO/jPCT-AE(1643): Magic island: 26.95 fps
06-08 22:43:40.511: INFO/jPCT-AE(1643): TOTAL SCORE: 25051

edit:

1.1 again for comparison
Quote
05-25 23:29:24.995: INFO/jPCT-AE(6997): Double dragon: 30.55 fps
05-25 23:29:24.995: INFO/jPCT-AE(6997): Flower power: 22.86 fps
05-25 23:29:24.995: INFO/jPCT-AE(6997): Ninjas' garden: 16.14 fps
05-25 23:29:24.995: INFO/jPCT-AE(6997): Emperor's new clothes: 43.58 fps
05-25 23:29:24.995: INFO/jPCT-AE(6997): Magic island: 37.33 fps
05-25 23:29:24.995: INFO/jPCT-AE(6997): TOTAL SCORE: 26053


Nexus One

2.0
Quote
I/jPCT-AE ( 2928): Double dragon: 4.43 fps
I/jPCT-AE ( 2928): Flower power: 8.54 fps
I/jPCT-AE ( 2928): Ninjas' garden: 2.40 fps
I/jPCT-AE ( 2928): Emperor's new clothes: 33.19 fps
I/jPCT-AE ( 2928): Magic island: 6.06 fps
I/jPCT-AE ( 2928): TOTAL SCORE: 9457

1.1
Quote
I/jPCT-AE (16793): Double dragon: 11.86 fps
I/jPCT-AE (16793): Flower power: 15.86 fps
I/jPCT-AE (16793): Ninjas' garden: 2.38 fps
I/jPCT-AE (16793): Emperor's new clothes: 36.23 fps
I/jPCT-AE (16793): Magic island: 15.63 fps
I/jPCT-AE (16793): TOTAL SCORE: 14192
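For readers unfamiliar with loop unrolling in shaders, here is a minimal sketch of what it amounts to. The post describes hand-written unrolled code; this sketch instead generates the repeated GLSL statements from a template in Java, which is a different way to get the same shape of source. The class name, the `%d` placeholder convention and the GLSL function `lightContribution` in the usage note are assumptions made for illustration.

```java
// Hypothetical sketch: emit GLSL statements with a fixed-count loop already
// unrolled, so the shader compiler never sees a "for" statement.
final class LoopUnroller {

    // bodyTemplate uses "%d" wherever the loop index would appear.
    static String unroll(String bodyTemplate, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        StringBuilder glsl = new StringBuilder();
        for (int i = 0; i < count; i++) {
            // Substitute the constant index into every %d placeholder.
            glsl.append(bodyTemplate.replace("%d", Integer.toString(i))).append('\n');
        }
        return glsl.toString();
    }
}
```

Calling `LoopUnroller.unroll("color += lightContribution(%d);", 3)` produces three statements with the indices 0, 1 and 2 written out as constants.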




EgonOlsen

#34
...and please don't ask why the Double dragon test on the Nexus S is actually faster than under 1.1... I've no idea...

Thomas.


EgonOlsen

Anybody listening with too much time and an NVIDIA Tegra-based device?

EgonOlsen

BLARGH... PowerVR's shader compiler isn't much better than Qualcomm's. On my phone, I get bogus results in one shader as soon as I store a value in a temporary variable. If I access it directly, everything is fine. In another, really simple shader, I had to add two totally pointless lines that assign vectors to variables that I never use, or otherwise the resulting shader doesn't render anything... I'm having fun...!

EgonOlsen

Performance after the last minor shader tweaks:

Quote
06-09 23:36:07.476: INFO/jPCT-AE(6662): Double dragon: 41.46 fps
06-09 23:36:07.476: INFO/jPCT-AE(6662): Flower power: 20.18 fps
06-09 23:36:07.476: INFO/jPCT-AE(6662): Ninjas' garden: 17.07 fps
06-09 23:36:07.476: INFO/jPCT-AE(6662): Emperor's new clothes: 43.15 fps
06-09 23:36:07.476: INFO/jPCT-AE(6662): Magic island: 27.40 fps
06-09 23:36:07.476: INFO/jPCT-AE(6662): TOTAL SCORE: 25845

I thought about making the shader generation totally dynamic, i.e. the engine would write the shader source code by itself based on the demands of the renderer...but the bugs in the shader compilers prevent me from doing this.

EgonOlsen

Some more shader tweakage...

Nexus S:
Quote
INFO/jPCT-AE(16223): Double dragon: 42.14 fps
INFO/jPCT-AE(16223): Flower power: 20.64 fps
INFO/jPCT-AE(16223): Ninjas' garden: 17.42 fps
INFO/jPCT-AE(16223): Emperor's new clothes: 41.82 fps
INFO/jPCT-AE(16223): Magic island: 31.43 fps
INFO/jPCT-AE(16223): TOTAL SCORE: 26571

Nexus One:
Quote
I/jPCT-AE ( 3817): Double dragon: 13.79 fps
I/jPCT-AE ( 3817): Flower power: 10.84 fps
I/jPCT-AE ( 3817): Ninjas' garden: 2.38 fps
I/jPCT-AE ( 3817): Emperor's new clothes: 35.26 fps
I/jPCT-AE ( 3817): Magic island: 10.27 fps
I/jPCT-AE ( 3817): TOTAL SCORE: 12561

Thomas.

#40
Oh, some tests are even faster than 1.1, good work :) So is the performance good enough for per-pixel lighting (spot lights) and bump maps? :)

EgonOlsen

#41
It depends. The current implementation doesn't do much in the fragment shader, which is the most expensive part (because it's executed for each pixel). If you put a dot product there for some per-pixel stuff, performance will suffer and there's nothing I can do about that. My guess is that it'll be OK to use it for some objects, but for lighting the whole scene per pixel, current hardware is too weak IMHO.
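To put a rough number on "executed for each pixel", here is a small sketch. It assumes an 800x480 screen and no overdraw, and it uses the standard Lambert diffuse term as the example of a dot product in the fragment shader. The class and method names are made up for illustration and the figures are not measurements from the engine.

```java
// Rough sketch of why per-pixel work is costly: it runs once per covered
// pixel rather than once per vertex. All numbers are illustrative assumptions.
final class PerPixelCost {

    // The diffuse (Lambert) term that per-pixel lighting evaluates per fragment;
    // both vectors are expected to be normalized.
    static float lambert(float[] normal, float[] toLight) {
        float dot = normal[0] * toLight[0]
                  + normal[1] * toLight[1]
                  + normal[2] * toLight[2];
        return Math.max(dot, 0f);
    }

    // Fragment shader runs per frame if every pixel is shaded exactly once.
    static long fragmentsPerFrame(int width, int height) {
        return (long) width * height;
    }
}
```

With those assumptions, `fragmentsPerFrame(800, 480)` is 384,000 evaluations per frame for a full-screen object, usually far more than the number of vertices in the mesh that covers it.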

EgonOlsen



Ported my simple normal mapping shader to ES 2.0... finally, I saw the point of the precision modifiers, because with lowp the result was totally screwed up!
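One plausible way lowp can wreck a lighting calculation, sketched in Java: GLSL ES 1.00 only guarantees lowp floats a range of roughly -2 to 2 with an absolute precision of 2^-8, so an unnormalized vector with larger components gets clamped before it is normalized. This emulates that minimum on the CPU; it is not a diagnosis of what actually went wrong in the shader above, and real GPUs may give lowp more range or precision. The class and method names are invented for the example.

```java
// Sketch: emulate the minimum lowp float guaranteed by GLSL ES 1.00
// (range about -2..2, absolute precision 2^-8). Real GPUs may offer more.
final class LowpEmulation {

    static float toLowp(float value) {
        // Clamp into the guaranteed range, then snap to steps of 1/256.
        float clamped = Math.max(-2f, Math.min(2f, value));
        return Math.round(clamped * 256f) / 256f;
    }

    // Normalize a non-zero 3-component vector after forcing each component
    // through the emulated lowp conversion.
    static float[] normalizeLowp(float[] v) {
        float x = toLowp(v[0]), y = toLowp(v[1]), z = toLowp(v[2]);
        float length = (float) Math.sqrt(x * x + y * y + z * z);
        return new float[] { x / length, y / length, z / length };
    }
}
```

For a light vector of (3, 4, 0), the correct direction is (0.6, 0.8, 0), but the emulated lowp version clamps both components to 2 and ends up pointing along the diagonal, at about (0.707, 0.707, 0).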

We are getting close to a first alpha release of this version...

EgonOlsen

Found and fixed a bug when combining blitting with custom shaders only. A simple example:



The funny thing with this test case is that performance hovers around 40 fps unless you start to use the touch screen... then it goes up to 56 fps (which is the device's limit). It seems as if the really small load that this test puts on the CPU makes the device reduce its clock speed or something...

EgonOlsen

Two light sources struggling for power:



This shows the same strange fps loss as the previous test case while the CPU is more or less idle...