Crash (native)

Started by AeroShark333, July 10, 2017, 07:29:06 PM


AeroShark333

Uhm, I'm not sure if the context is lost...
I never re-initialize the FrameBuffer (I just call #resize() whenever the device is rotated).

I already had Config.unloadImmediately set to true, which did not help... Unfortunately, setting it to false did not help either.
When I use only the FrameBuffer, without a render texture, RAM usage does not increase on screen rotations.

Is there perhaps a way to rotate render textures/NPOTTextures, since a screen rotation usually just swaps width and height? (The render texture would hold the same amount of data in either orientation.)
For now, my solution will probably be to create both render textures at startup (one for the default orientation and one for the rotated orientation) and just switch between them.
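A minimal sketch of the two-texture idea, written as plain Java so it stands alone. `OrientationTargets` and its factory are hypothetical names; in the renderer, the factory would be where each NPOTTexture is created and added to the TextureManager, once at startup. The trade-off is that both textures stay resident in GPU memory the whole time.

```java
import java.util.function.BiFunction;

/**
 * Hypothetical helper: holds one render target per orientation, so a rotation
 * only switches the active target instead of allocating a new one.
 * T stands in for the real texture type (e.g. an NPOTTexture).
 */
final class OrientationTargets<T> {
    private final T portrait;   // width < height
    private final T landscape;  // width > height

    /** Creates both targets up front; the factory receives (width, height). */
    OrientationTargets(int shortSide, int longSide, BiFunction<Integer, Integer, T> factory) {
        this.portrait = factory.apply(shortSide, longSide);
        this.landscape = factory.apply(longSide, shortSide);
    }

    /** Returns the pre-created target matching the current surface size. */
    T forSurface(int width, int height) {
        return width > height ? landscape : portrait;
    }
}
```

On each onSurfaceChanged(width, height), the renderer would then use forSurface(width, height) as the render target instead of removing and re-adding a texture.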

---

Another thing about my application:
I don't keep texture data for render textures, or for any other texture, in the VM.
TextureManager.getMemoryUsage() does indeed show that only 1024 bytes are stored in the VM (basically nothing).
However, "Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()" shows much more RAM usage in the VM, up to 300 MB (from jPCT, I assume). What could be using all that RAM?
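For reference, the measurement quoted above can be wrapped in a small helper. This is plain JDK code; the class name `HeapUsage` is made up for illustration. Keep in mind that it only covers the Java heap: memory held by the GPU driver for uploaded textures and other GL resources is not part of this number, so on Android the native side has to be checked separately (for example with android.os.Debug.getNativeHeapAllocatedSize(), which may still not include everything the driver allocates).

```java
/** Hypothetical helper that reports Java heap usage the way the post measures it. */
final class HeapUsage {
    private HeapUsage() {}

    /** Bytes currently in use on the Java heap (allocated heap minus free heap). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Converts bytes to MiB for logging. */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```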

EgonOlsen

To answer your last question: The textures and other native data are using that RAM.

About the render targets: As long as you don't create a new FrameBuffer, nothing happens to the textures if you rotate the device. However, I've no idea how that's supposed to work, because rotating the device should destroy the context (simply because width and height change), so you would need a new instance of the buffer. But anyway... as long as the buffer doesn't change, there's actually no need to unload the render target texture at all. Have you tried what happens if you don't do that at all?

AeroShark333

Quote from: EgonOlsen on February 07, 2018, 10:44:31 AM
To answer your last question: The textures and other native data are using that RAM.
Ah, understood.

Quote from: EgonOlsen on February 07, 2018, 10:44:31 AM
About the render targets: As long as you don't create a new FrameBuffer, nothing happens to the textures if you rotate the device.
Yes, they are fine when the device is rotated.

Quote from: EgonOlsen on February 07, 2018, 10:44:31 AM
However, I've no idea how that's supposed to work, because rotating the device should destroy the context (simply because width and height change), so you would need a new instance of the buffer.
Well, it is a live wallpaper, and live wallpapers can preserve the EGL context on pause (or something like that):
https://developer.android.com/reference/android/opengl/GLSurfaceView.html#setPreserveEGLContextOnPause(boolean)
So whenever the device is rotated, onSurfaceChanged(...) is called again, and here I can resize the already existing FrameBuffer to the new width and height.

Quote from: EgonOlsen on February 07, 2018, 10:44:31 AM
But anyway... as long as the buffer doesn't change, there's actually no need to unload the render target texture at all. Have you tried what happens if you don't do that at all?
I tried that, but then the render target doesn't have the right dimensions and I get weird results on screen.
Let's say I start the wallpaper in portrait mode.
I create a 1440 x 2560 FrameBuffer (width x height).
At the same time, I create a 1440 x 2560 NPOTTexture as the render texture.
All fine here.
If I now change the orientation to landscape, I resize the FrameBuffer to 2560 x 1440 (width x height); note that the values are now swapped.
But the render texture is still 1440 x 2560, so I remove it from the TextureManager and add a new 2560 x 1440 NPOTTexture.
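The replace-on-rotation flow above can be modelled in plain Java to make the sequence explicit. This is a hypothetical sketch: the map stands in for the TextureManager and an `int[]{width, height}` stands in for the NPOTTexture; it does not call any jPCT-AE API. It only reallocates when the surface size actually differs from the current target, so every orientation change costs exactly one new allocation, which is why a missing free shows up as memory growth per rotation.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the onSurfaceChanged() flow; no real texture objects involved. */
final class RenderTargetResizer {
    static final String KEY = "renderTarget"; // hypothetical texture name
    final Map<String, int[]> textures = new HashMap<>(); // stands in for the TextureManager
    int recreated = 0; // how many times a new target had to be allocated

    /** Called with the new surface size, after the FrameBuffer itself was resized. */
    void onSurfaceChanged(int width, int height) {
        int[] current = textures.get(KEY);
        if (current != null && current[0] == width && current[1] == height) {
            return; // sizes already match: keep the existing target
        }
        // In the real renderer, the old texture would be unloaded here (on the GL thread)
        // and removed before the replacement is added.
        textures.remove(KEY);
        textures.put(KEY, new int[] {width, height});
        recreated++;
    }
}
```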

Removing the texture without unloading it did not change anything; memory still fills up...

I actually have the feeling that unloading textures doesn't work completely...
Whenever I open a second live wallpaper instance (using the preview), it seems that its textures aren't unloaded when the preview instance is 'killed'.
It's as if the primary live wallpaper instance prevents the other instance's textures from being unloaded.
(Once all instances are killed, it's all gone.)

EgonOlsen

As mentioned, unloading texture data from the GPU's memory has to happen in the GL rendering thread with a valid context. Otherwise, there's nothing to unload it from. If the OS destroys a GL context, the graphics driver should actually free the associated memory. If that doesn't happen... well, I can't do anything about it then.
Have you tried this on another device with a different GPU? Does it behave the same?
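Since unloading only works on the GL thread with a current context, one common pattern is to queue such work from other threads and run it at the start of the next frame. A minimal sketch, assuming a renderer whose onDrawFrame() calls drain() first; `GlTaskQueue` is a hypothetical name, and on a plain GLSurfaceView the built-in queueEvent(Runnable) serves the same purpose. The posted Runnable would wrap whatever unload call is used for the texture. Note that if no further frame is drawn with that context, the queued tasks never run.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical sketch: defers GL-dependent work (such as unloading textures) to the
 * render thread. Any thread may post(); only the thread owning the GL context drains.
 */
final class GlTaskQueue {
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    /** Safe to call from any thread (UI thread, settings listener, ...). */
    void post(Runnable task) {
        pending.add(task);
    }

    /** Call on the GL thread while the context is current; returns how many tasks ran. */
    int drain() {
        int count = 0;
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
            count++;
        }
        return count;
    }
}
```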

AeroShark333

I tested render target textures with screen rotation on the following devices:

My device (ZTE Axon 7 @ Android 7.1.1 Custom ROM):
-> Low VM RAM (disregard what I said earlier about 300 MB of RAM usage in the VM... it's actually about 30 MB; I guess my eyes added a digit... derp)
-> No memory leak in the VM when rotating the screen (but there is one in native/video memory)
-> However, in Developer options -> Active services, the RAM usage keeps rising: 300+ MB here, and it can go much higher (1.8+ GB after many, many screen rotations, at which point the device kills other processes and might even reboot). So I suppose what's displayed there is not just the VM but native + video + VM memory.

Some old tablet (Difrnce DIT4350 @ Android 4.0.4)
-> Low VM RAM
-> No memory leak with rendertarget textures
-> Crashes after many screen rotations..?

Old tablet 2 (Asus TF101 @ Android 4.0.3)
-> Low VM RAM
-> Crashes after many screen rotations?

Emulator #1 (Android 4.1.1):
-> Low VM RAM
-> No memory leak with rendertarget textures in VM

Emulator #2 (Android 4.2.2):
-> Low VM RAM
-> No memory leak with rendertarget textures in VM

Emulator #3 (Android 7.1.0):
-> Low VM RAM
-> No memory leak with rendertarget textures in VM
-> In Developer options, VM + video + native memory usage doesn't get nearly as high here as on my own device... unloading seems to work just fine here
-> No memory leak with rendertarget textures in native+video


For clarity, here's also a test without render target textures on my main device:
-> Low VM RAM
-> No memory leak when rotating the screen
-> Memory leak in native + video RAM when the renderer is restarted (the renderer restarts when a setting is changed)


How the restart works:
-> A setting is changed
-> (I believe it still uses the same GL context as before)
-> Textures are removed from the TextureManager and unloaded (but I suppose no further draw call happens to actually unload them...)
-> The FrameBuffer is disposed
-> The reference to the renderer is dropped
-> A new FrameBuffer is created
Now that I think of it... I could reuse the FrameBuffer... :|
10 minutes later: reusing the same FrameBuffer, the memory leak remains...?

In the end, it seems that textures aren't unloaded until the whole wallpaper engine is killed.
(restarting doesn't kill the wallpaper engine...)

Starting to think it could be a driver issue on my device... Egh

I'll try to test it with other physical devices

AeroShark333

For now, I'll just assume my device is the only one with this unloading issue... Oh well, it's a custom ROM...

Anyway, about the SIGSEGV error that keeps happening at the first world draw call: I think it's caused by my custom shaders, though I don't really understand why...
On most high-end devices, there are no issues at all with my custom shaders.
And I don't understand why the default shaders always work just fine while they look more complicated than some of the custom shaders I'm using. Also, on devices where my custom shaders crash, it does NOT always crash, which completely blows my mind... It basically seems to happen at random.

Unlike the default shaders, my custom shaders use:
-> preprocessor directives (#...)
-> uniforms set in onDraw
-> functions defined within the shader
Though, I don't think these are really the issue...

While googling this issue, I did find some weird OpenGL ES shader crash reports from other people that were solved by workarounds such as changing the order of operands...???
Anyway, what are the do's and don'ts for writing a GLES shader? Or maybe: how did you manage to create the 'perfect?' default shaders that never seem to crash?
Does jPCT-AE perhaps treat custom shaders differently from default shaders?

Another thing that seems to reduce the probability of the random crash at the first draw call is using lower-polygon models...

Also... What could explain the randomness of the crashes, since they don't always happen? The positions of the Object3Ds are always different, but would that make a huge difference...? I'd think not, but apart from that nothing else is really different, and yet sometimes it renders and sometimes it crashes.
And once the first draw call has completed successfully (assuming all Object3Ds are visible), it won't crash in later world draw calls.

Another possible solution: increase the buffer size of the FrameBuffer even more? I thought Config.blittingMode = 8; did reduce the probability of crashing (maybe it has to do with the vertex upload buffer, I don't really know...)
Is there any way for me to determine what is actually causing the crash, i.e. which call in the jPCT-AE jar triggers the SIGSEGV?
That might help to solve the issue...

AeroShark333

Okay, never mind, it crashes with the default shaders too :| It just seems more likely with my custom shaders...
But increasing the buffer size to 1800 did help a little; is it possible to increase it even more?
Why is the default 600, anyway? And what unit is that 1800 in? Bytes, I assume?

Another thing that helped a lot is reducing the polygon count per mesh, but yeah... that reduces quality...

LOGS:
Working: https://pastebin.com/n08EE8VR
Not working: https://pastebin.com/AuQF8NTm

EgonOlsen

Which buffer size do you mean exactly? And which device are you using to test this?

AeroShark333

Quote from: EgonOlsen on March 05, 2018, 10:01:12 AM
Which buffer size do you mean exactly?
03-05 10:19:44.749: I/jPCT-AE(2711): Blitting buffer size: 600
^ this one

I tested with the default Config.blittingMode and with Config.blittingMode = 8;
I used higher-polygon models for this test.
Compat mode => Config.blittingMode = 8;
No compat mode => default Config.blittingMode

compat mode: 10 runs, 5 successes, 5 crashes
no compat mode: 10 runs, 1 success, 9 crashes
compat mode: 10 runs, 4 successes, 6 crashes
no compat mode: 10 runs, 0 successes, 10 crashes
compat mode: 10 runs, 5 successes, 5 crashes
no compat mode: 10 runs, 0 successes, 10 crashes

Totals: compat mode crashed in 16 of 30 runs; no compat mode crashed in 29 of 30 runs.

As you can see, compat mode drops the chance of crashing from ~96% (29 of 30 runs) to ~53% (16 of 30 runs).
Whether this is just a coincidence, I can't tell, but I tested it multiple times (switching between the two config values after every 10 runs).
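To put a number on the "coincidence" question, a 95% Wilson score interval can be computed for each mode's crash rate. This is a standard binomial-proportion interval, added here as a sketch (`CrashStats` is a hypothetical name); it assumes the 30 runs per mode are independent, which the test setup doesn't guarantee.

```java
/** Crash-rate arithmetic with a Wilson score interval for a binomial proportion. */
final class CrashStats {
    private CrashStats() {}

    /** Observed crash rate as a fraction. */
    static double rate(int crashes, int runs) {
        return (double) crashes / runs;
    }

    /** Wilson score interval {low, high}; z = 1.96 gives roughly 95% coverage. */
    static double[] wilson(int crashes, int runs, double z) {
        double p = rate(crashes, runs);
        double z2 = z * z;
        double denom = 1 + z2 / runs;
        double center = (p + z2 / (2.0 * runs)) / denom;
        double half = z * Math.sqrt(p * (1 - p) / runs + z2 / (4.0 * runs * runs)) / denom;
        return new double[] {center - half, center + half};
    }
}
```

For 16/30 (compat mode) the interval comes out at roughly 36% to 70%, and for 29/30 (no compat mode) at roughly 83% to 99%. The intervals don't overlap, which suggests the difference is unlikely to be coincidence alone, but it says nothing about why the blitting mode matters.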

Quote from: EgonOlsen on March 05, 2018, 10:01:12 AM
And which device are you using to test this?
I'm currently using a Genymotion emulator (Nexus 5, Android 5.0.1 build) to reproduce these crashes. (I can also reproduce the crash on older physical devices, just to make sure... Plus, many people have reported this SIGSEGV crash through the Google Play Store, and I think I can assume they are not using an emulator... ???)

EgonOlsen

No, they certainly aren't using an emulator, but... from my own experience, crashes happen mostly on Mali-based GPUs. Can you confirm this based on the Google Play stats?

AeroShark333

All devices with SIGSEGVs (for https://play.google.com/store/apps/details?id=com.aeroshark333.artofearthify):
Samsung Galaxy J1 Ace => Mali-400MP2 or Vivante GC7000 UL - J110L
Motorola Moto E4 (2nd Gen) => Mali-T720
Xiaomi Mi A1 => Adreno 506
Samsung Galaxy Note 3 => Adreno 330 - N9005, N9002 or Mali-T628 MP6 - N9000
Lenovo K5 => Adreno 405 or Mali-T860MP2
LGE LG Aristo => Adreno 308

Another app of mine with SIGSEGVs (for https://play.google.com/store/apps/details?id=com.aeroshark333.skinviewer):
Huawei MediaPad => Adreno 220
Samsung Galaxy S3 Neo => Adreno 305
Samsung Galaxy Note 2 => Mali-400MP4
LGE L20 => Mali-400
ZTE Lever Z936L => Adreno 306
Huawei Mate 9 => Mali-G71 MP8
HTC U11+ => Adreno 540
Motorola Moto X4 => Adreno 508
General Mobile GM6 => Mali-T720 MP2
OnePlus 3T => Adreno 530
Xiaomi Mi A1 => Adreno 506
Samsung Galaxy Tab E 8.0 => Adreno 306
Motorola Moto C Plus => Mali-T720MP2
Samsung Galaxy Tab A 10.1 (2016) => Mali-T830 MP2
Motorola Moto C => Mali-T720MP2
Samsung Galaxy Tab 3 Lite 7.0 => Vivante GC1000 (according to specs website but logs mention "libGLES_mali.so"...)
Samsung Galaxy On7 => Adreno 306
LGE Nexus 5X => Adreno 418
Huawei P8 Lite => Mali-T830MP2

So it's mostly (or only?) Adreno/Mali-based GPUs for me.

I found this while Googling around: https://stackoverflow.com/questions/30825386/android-opengl-fatal-signal-11-sigsegv-code-2
I tried the same code on the Nexus 5 emulator and got similar results:
=> size = 10000; would crash
=> size = 3000; would work
=> size = 5000; would crash
=> size = 3500; would crash sometimes?
When it worked, it showed nothing, but it kept 'drawing' without crashing.
Adding floatBuffer.rewind(); after filling it with values fixed the issue for any size (and it actually showed something when drawing... lol).
I'm not sure if this could be helpful but I sure found it interesting.
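The rewind() detail from that Stack Overflow thread is easy to see in plain Java NIO. Each relative put() advances the buffer's position, so after filling, the position sits at the end of the data; as far as I can tell, Android's GL bindings take the buffer's current position as the start of the data, so a non-rewound buffer points the driver past everything that was written. A sketch (`VertexBufferDemo` is a hypothetical name):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Shows the buffer state with and without rewind() after filling. */
final class VertexBufferDemo {
    private VertexBufferDemo() {}

    /** Allocates a direct, native-order FloatBuffer and fills it with `size` floats. */
    static FloatBuffer filled(int size, boolean rewind) {
        FloatBuffer fb = ByteBuffer.allocateDirect(size * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int i = 0; i < size; i++) {
            fb.put(i * 0.5f); // each relative put() advances the position by one
        }
        if (rewind) {
            fb.rewind(); // position back to 0; the limit stays at the capacity
        }
        return fb;
    }
}
```

Without the rewind, position() equals the size and remaining() is 0; with it, all values are readable from the start.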

EgonOlsen

That looks more like the normal distribution of GPUs in the Android market than anything else. The stackoverflow post doesn't help either. Of course, I'm rewinding the buffers. Otherwise, it wouldn't be able to render anything in the first place.

Personally, I've just accepted a certain "crash rate". It just happens on some devices, whether it's caused by driver errors or by some custom ROM with some great "optimization" in it that just doesn't work.

Any idea what "in the wild" crash rate we are talking about here? 50%, 1%, 0.1%... any clues?

And because fiddling around with the blitting config seems to change things for you: Are you actually blitting stuff? What happens if you don't?

AeroShark333

Quote from: EgonOlsen on March 06, 2018, 04:41:06 PM
Any idea what "in the wild" crash rate we are talking about here? 50%, 1%, 0.1%... any clues?
Well, it depends on the context...? A higher polygon count seems to make it more likely to crash, while lower-polygon models don't crash at all...

Quote from: EgonOlsen on March 06, 2018, 04:41:06 PM
And because fiddling around with the blitting config seems to change things for you: Are you actually blitting stuff? What happens if you don't?
Yes, I do blit things before the first world.draw() call:
=> Texture blits (some of these textures are used for the Object3D's, so the textures get uploaded to the GPU and removed from VM heap memory)
=> Loading screen: probably 50+ blits per frame of a 2x2 texture with a variable greyscale color

I tried commenting out the texture blits (so the textures stay in VM heap memory) => it still crashed.
But when I removed all blits (texture + loading screen) before the first draw call, it worked fine.

Also, with only the texture blits (no loading screen blits), it could still crash, but it was less likely.

Interesting results I guess...

I once again tested the likelihood of crashing (with higher-polygon models and with blitting before the first world.draw() call):
=> Config.blittingMode = 8
Runs: 25
Crashes: 17
Result: 68%, more than the ~50% of last time...
=> Default Config.blittingMode
Runs: 25
Crashes: 25
Result: About the same as last time (100% vs. 96%)

So could blitting anything before all the Object3D data is loaded cause this SIGSEGV issue?

PS: In the other app, I actually also blit something (a 2D background behind the 3D world) before calling the first world.draw()...

EgonOlsen

Blitting is just like rendering an object, except that the data of that "blitting object" is dynamic and changes every frame. That's not a problem per se; animated objects do exactly the same. A lot of apps using jPCT-AE do it all the time (including mine) without any problems. However, I'm well aware that it might cause trouble, which is why there are these config settings for it (which are just shots in the dark as well). The actual problem is that I've no idea why and when this happens. I've checked the code at least a dozen times because of this and it's just fine. It's the same code that desktop jPCT uses, and that doesn't have this problem. My current app blits stuff before doing anything else, and that is fine as well.

Your apps... are these all wallpapers or standard apps?

AeroShark333

Quote from: EgonOlsen on March 07, 2018, 07:42:51 AM
A lot of apps using jPCT-AE do it all the time (including mine) without any problems. However, I'm well aware that it might cause trouble, which is why there are these config settings for it (which are just shots in the dark as well). The actual problem is that I've no idea why and when this happens. I've checked the code at least a dozen times because of this and it's just fine.

Well yeah, I can't reproduce this SIGSEGV error on most of my devices either; it works just fine with the way I render everything now.
And I'm not sure whether blitting before uploading the Object3D data is actually the cause of all the SIGSEGV reports I got; maybe avoiding it would only solve the problem on the emulator.

Quote from: EgonOlsen on March 07, 2018, 07:42:51 AM
Your apps... are these all wallpapers or standard apps?
Mixed. However, I can run the renderer class in a regular Android Activity too, but that wouldn't really make much of a difference...?
The app I'm currently working on is the wallpaper app, and the other app is a 'standard app' I'd say.

My best workaround for now is to make sure the Object3Ds I'm using are uploaded to the GPU before blitting anything.
I'll try to implement that soon, so I can see whether people still get SIGSEGVs afterwards.
There's one problem, though... How can I set textures on my Object3Ds after building and compiling them?
If I remember correctly, doing it this way causes a delay before the textures are visibly applied.
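The workaround of drawing the world once before any blits could look like the following frame plan. It's a hypothetical sketch: the operation names mirror the usual jPCT-AE frame (clear, renderScene, draw, blits, display) but are only strings here, and it assumes, as in the post, that a completed first world draw means the object data has been uploaded. It costs one frame without the loading screen.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical frame planner that holds back all blits until the world was drawn once. */
final class FrameScheduler {
    private boolean worldDrawnOnce = false;

    /** Returns, in order, the operations a frame would perform. */
    List<String> planFrame(boolean hasBlits) {
        List<String> ops = new ArrayList<>();
        ops.add("clear");
        ops.add("renderScene");
        ops.add("draw");
        if (hasBlits && worldDrawnOnce) {
            ops.add("blit"); // only after a previous frame's world draw completed
        }
        ops.add("display");
        worldDrawnOnce = true; // assumes this frame's draw returned normally
        return ops;
    }
}
```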