Higher poly count without lower FPS?
- DaEngineer
- Posts: 213
- Joined: Fri May 28, 2010 2:30 pm
- Location: Germany
Higher poly count without lower FPS?
New games have amazingly high polycounts - hundreds of thousands of triangles - while remaining playable on mainstream PCs. The Unreal Engine 3 is a good example of this. My question - and I have absolutely no idea of the technical background, that's why I'm asking - is why these games look better than Quake 3, have far more polys and still run at the same FPS.
Why is a Quake 3 map with 200k triangles unplayable while modern games laugh at these numbers? I'm really interested in the WHY, and in whether this can be changed and adjusted for modern hardware. Being limited to a maximum of around 40k polys can be a pain in the ass when you want to add tons of detail and notice that you've already reached the limit.
Last edited by DaEngineer on Sat Apr 30, 2011 7:54 am, edited 1 time in total.
Website: https://victorkarp.com
LvLWorld: https://lvlworld.com/author/DaEngineer
YouTube: https://youtube.com/@victorkarp
Re: Higher poly count without lower FPS?
This is a huge topic so I'll just scratch the surface a bit.
Modern-day engines are designed to take advantage of modern video cards, which are built with features in mind that hadn't even been imagined when Q3 was in production. Batching, increased bandwidth and multiple pipelines are things that new engines can take advantage of and old ones can't.
Q3's BSP and portal system set the rudimentary standard for most engine visibility systems, but it's outdated by now. New engines use more advanced portal systems and, in many cases, ray-traced visibility queries. Basically, newer engines are better at determining what should and shouldn't be sent to the GPU for best performance.
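For reference, the core of Q3-style PVS culling boils down to a precomputed bit test per cluster pair. A rough sketch of the idea in C (illustrative struct and function names, not id's actual source):

[code]
/* Minimal sketch of the idea behind Q3-style PVS culling (not id's actual code).
   Each BSP leaf belongs to a "cluster"; the map compiler precomputes, for every
   cluster, a bit vector of which other clusters are potentially visible. */
#include <stdint.h>

typedef struct {
    int     numClusters;
    int     bytesPerCluster;     /* (numClusters + 7) / 8, padded */
    uint8_t *bits;               /* numClusters rows of bytesPerCluster bytes */
} visData_t;

/* Returns nonzero if 'testCluster' may be visible from 'cameraCluster'. */
static int ClusterVisible(const visData_t *vis, int cameraCluster, int testCluster)
{
    if (cameraCluster < 0 || testCluster < 0)
        return 1;   /* outside the BSP (e.g. noclip): draw everything */

    const uint8_t *row = vis->bits + cameraCluster * vis->bytesPerCluster;
    return row[testCluster >> 3] & (1 << (testCluster & 7));
}
[/code]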
Other features like model instancing, displacement maps and tessellation can dynamically generate, or appear to generate, more detail depending on conditions. It makes Q3's model and patch LODs look prehistoric.
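As a point of comparison, here's roughly what hardware instancing looks like on the GL side in a modern renderer: one draw call for any number of copies, something Q3's renderer has no equivalent for. A hedged sketch with made-up names, assuming an OpenGL 3.x context:

[code]
/* Hedged sketch of hardware instancing (OpenGL 3.x style); function and
   parameter names are illustrative, not from any real engine. */
#include <GL/glew.h>

void DrawMeshInstanced(GLuint vao, GLsizei indexCount, GLsizei numInstances)
{
    /* Per-instance data (e.g. a transform) lives in vertex attributes flagged
       with glVertexAttribDivisor(attrib, 1) at setup time, so the GPU advances
       them once per instance instead of once per vertex. */
    glBindVertexArray(vao);
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0, numInstances);
    glBindVertexArray(0);
}
[/code]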
There's probably loads more stuff; I'm sure one of the programming types who browse here can contribute more details.
[size=85][url=http://gtkradiant.com]GtkRadiant[/url] | [url=http://q3map2.robotrenegade.com]Q3Map2[/url] | [url=http://q3map2.robotrenegade.com/docs/shader_manual/]Shader Manual[/url][/size]
-
- Posts: 4022
- Joined: Sat Mar 12, 2005 6:24 pm
Re: Higher poly count without lower FPS?
Programming type here. obsidian is spot on. The major pain point is GPU bandwidth. Modern engines upload their data to the GPU once and then try very hard to keep it there. Q3, on the other hand, uploads the scene again and again, once per frame.
"Well, fix it then," you say. Unfortunately, that again-and-again approach is intrinsic to the Q3 engine. Fixing it means rewriting it.
"Well, fix it then," you say. Unfortunately, that again-and-again approach is intrinsic to the Q3 engine. Fixing it means rewriting it.
-
- Posts: 449
- Joined: Sat Nov 06, 2010 2:33 am
Re: Higher poly count without lower FPS?
I wonder how fast Quake 3 COULD run if it were rewritten to take advantage of all of these things. However, you'd probably do just as well to export all of the Quake 3 maps to ASE objects and convert them into whatever Unreal Engine 3 uses (if you hate end-user license agreements and fair use). Get your high-performance giggle on.
A different part of the puzzle is that Quake 3 was written alongside some old version of OpenGL. Your modern-day super-powered sick mad props dumb bomb diggity dual GTX 590 hot-glued to an HD6990 certainly supports, and accelerates the bejesus out of, those older OpenGL commands. However, as Misantropia and obsidian correctly noted, if the newer commands aren't there in the engine, your GTX 590 that's been cleverly nailed to a pair of HD6990s won't force those old commands to do new and interesting things. Those older OpenGL commands only get you so far, even if you can overclock your 18 GPUs to 16,000 THz.
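For what it's worth, this is roughly how an engine of that era probes for newer GL features at startup (a simplified sketch, not Q3's actual code): anything it never asks for simply never gets used, no matter how capable the card is.

[code]
/* Hedged sketch of old-school GL extension probing; names are illustrative.
   Q3 does something along these lines for a handful of extensions
   (multitexture, compiled vertex arrays); features it never checks for
   simply never get a code path. */
#include <string.h>
#include <GL/gl.h>

static int GL_ExtensionSupported(const char *name)
{
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    return ext && strstr(ext, name) != NULL;   /* naive substring check, fine for a sketch */
}

void R_InitExtensions(void)
{
    int hasVBO = GL_ExtensionSupported("GL_ARB_vertex_buffer_object");
    /* A renderer without a VBO code path just falls through to the slow one. */
    (void)hasVBO;
}
[/code]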
The same thing is true for desktop applications as well as games. Your CPU has extensions it supports: 3DNow!, MMX, SSE, AltiVec, etc. (OK, maybe your CPU doesn't support AltiVec.) Or maybe you're sitting on a mountain of Cell Broadband Engine APUs. Your software will only take advantage of what it's programmed to use, unless your hardware is frighteningly crafty (there are exceptions to the rule I won't go over here). Microsoft Excel from... AGES ago... essentially required a math co-processor (there was a time when your CPU was nothing more than a series of logic units and I/O units), so you'd have this extra chip you bought that allowed it to do math. Real math. This made many programs happy. Today, your CPU has math units built right in, and you can run ancient Excel to your heart's content and crunch spreadsheets until your eyes fall out. If you were to run that same ancient version of Excel on a modern system, it'd work, just as Quake 3 does, and it'd be very, very fast - but once you started to hit hard-coded maximums, you'd see you have a reason to upgrade. You'd find that newer versions of Excel, which can make use of SSE, MMX and all that other stuff, run much more smoothly because they're built to use that much more of your hardware.
-
- Posts: 140
- Joined: Thu Sep 09, 2010 1:44 pm
Re: Higher poly count without lower FPS?
Any comparison of older engines to UE3 must take into consideration that UE3 has at least two attributes that provide smoke 'n' mirrors to make it seem as if it renders more than it does:
1) Deferred Shading
This is especially painful when comparing Doom 3 to UE3. Waiting until visibility and overdraw are eliminated before applying shaders is a HUGE savings on the GPU side.
2) Virtual Displacement Mapping
Otherwise known as "Parallax Mapping" this shader technigue gives traditional bump\normal maps a more volumetric look. Thus the apparent visual density in UE3 scenes is much higher than the actual polygons in use.
When you add those to the fact that Unreal engines have always been good with LOD strategies (for example: "detail textures") you are looking at scenes that fake a huge leap in poly-count in very clever ways.
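For the curious, the parallax offset mentioned under point 2 boils down to a tiny bit of math per pixel. Here it is written out as plain C purely for clarity (in a real engine this runs in a fragment shader; the scale/bias values are the usual artist-tweakable knobs):

[code]
/* Plain-C sketch of the classic parallax offset; in practice this is shader code. */
typedef struct { float x, y; } vec2;
typedef struct { float x, y, z; } vec3;

/* viewTS: normalized view direction in tangent space (z points away from the surface).
   height: heightmap sample at the original UV, in the range [0,1]. */
vec2 ParallaxOffsetUV(vec2 uv, vec3 viewTS, float height, float scale, float bias)
{
    float h = height * scale + bias;                   /* remap the height value        */
    vec2 shifted = { uv.x + viewTS.x / viewTS.z * h,   /* slide the texture lookup      */
                     uv.y + viewTS.y / viewTS.z * h }; /* along the view ray; grazing   */
    return shifted;                                    /* angles shift more, faking depth */
}
[/code]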
Re: Higher poly count without lower FPS?
Since when was parallax mapping a cheap effect? While it's not as expensive as rendering millions of polygons, it's totally circumstantial: it depends on the engine's bottleneck and on what's being drawn. Parallax Occlusion Mapping is some system-intensive shit; it doesn't have a minimal impact like bump mapping, and it's even worse when it's self-shadowed. That's why, to date, no actual Unreal Engine games really use it, even though it's there for use (I think at most it may be used on one or two surfaces, but not everywhere).
The actual reason why engines of today can render things much faster is a simple combination of A) faster/better hardware and B) faster/better shaders. Over the years, things that used to be, let's say, hard-coded are now done with programmable shaders in such a way that they're pretty cheap on the GPU side of things. An example of this is bloom: it used to be slow as shit to render a bloom pass, but these days we can do it pretty much as fast as not doing it at all.
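Just to illustrate why bloom got cheap: it's nothing more than a bright-pass, a blur and an additive combine, which on modern hardware is a few small fullscreen shader passes. Here's an illustrative CPU-side sketch of those same steps (simplified to a single-channel image and a crude horizontal box blur, purely to show the idea):

[code]
/* Illustrative sketch of what a bloom pass computes; a real implementation
   works on RGB, downsamples the bright-pass buffer and uses a separable
   Gaussian, all of which is omitted here for brevity. */
void BloomPass(float *img, float *tmp, int w, int h, float threshold)
{
    /* 1) bright pass: keep only the energy above the threshold */
    for (int i = 0; i < w * h; i++)
        tmp[i] = img[i] > threshold ? img[i] - threshold : 0.0f;

    /* 2) crude horizontal box blur of the bright pixels,
       3) added straight back onto the image */
    for (int y = 0; y < h; y++)
        for (int x = 1; x < w - 1; x++) {
            int i = y * w + x;
            img[i] += (tmp[i - 1] + tmp[i] + tmp[i + 1]) / 3.0f;
        }
}
[/code]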
While Unreal Engine certainly does a lot of "smoke and mirrors", don't think that other engines don't do the same, even Doom 3. An example of this is OverDose. Our shadow map resolutions are set up (by default) so that as you move away from the shadow, the shadow map drops in resolution... but you can't tell, because the resolution matches your current render resolution perfectly. In other words, it's not wasting any overdraw on what can't be seen anyway (and of course it can be set lower and include LOD levels/stages). Or our foliage system, which uses alpha testing and alpha-to-coverage to fade out foliage instances over X distance; but because each layer is different, we can step-fade the LOD levels so that you can't even tell the foliage is missing.
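The distance-based resolution idea can be sketched very simply. The following is NOT OverDose's actual code, just an illustration of the principle with made-up constants: a light that only covers a few pixels on screen gets a correspondingly small shadow map.

[code]
/* Illustrative only; constants and names are invented for the sketch. */
#include <math.h>

int PickShadowMapResolution(float distanceToLight, float lightRadius, int screenHeight)
{
    /* Rough estimate of how many pixels tall the light's area is on screen. */
    float projectedPixels = (lightRadius / fmaxf(distanceToLight, 1.0f)) * (float)screenHeight;

    /* Snap up to the next power of two, clamped to a sane range. */
    int res = 64;
    while (res < (int)projectedPixels && res < 2048)
        res *= 2;
    return res;
}
[/code]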
EVERY engine has these so-called smoke-and-mirrors effects. It's a way of life. Nobody renders an entire scene brute force the "correct" way... frame rates would be through the floor.
-
- Posts: 449
- Joined: Sat Nov 06, 2010 2:33 am
Re: Higher poly count without lower FPS?
Not to derail, but is there a study that shows where parallax mapping is beneficial over full-polygon scene rendering? Self-shadowed or not, I've always thought parallax mapping was so taxing that the line of positive returns would only come when you manage to squeeze the likeness of 500k polys into 5,000 polys... at which point, on modern hardware... why not just render the damn polygons?
Re: Higher poly count without lower FPS?
Pretty simple really, the answer is "it depends". Sometimes an actual polygon is much cheaper to render than using POM, but then, like I said, it depends on the engine's bottleneck. OD, for example, doesn't exactly crap out with polygons; we can chuck loads at it with no real impact. But Unreal Engine, because of the way it's lit and the way it's rendered, needs to use a lot fewer polygons. However, OD can run slower if we chuck more high-resolution media at it, while Unreal can chug along using as many 2048x2048 maps as you really want, because the engine is more streamlined that way.
Polygons will ALWAYS be better, because sometimes you just can't fake things cheaply. Normal mapping is awesome because it works amazingly well at faking shading and simple geo, but there will be times when no matter how high-res or perfect the render-to-texture is, it won't hide the fact it's still a low-poly object. Shadows, intersections, collision, etc. all depend on the mesh underneath.
So it really is a case of "the answer depends on what the engine does" because there’s never a right or wrong way to do this stuff, just more a case of "what works better in the current situation".
Re: Higher poly count without lower FPS?
Performance for parallax occlusion - like everything else - depends on a number of factors, all of which need to be balanced and considered for the task at hand. The displacement map resolution, intensity values and self-shadowing can each affect performance drastically, so modern engine users have to tune these factors appropriately for their scene.
While relatively expensive, parallax occlusion is typically (and I say this very loosely) cheaper than drawing the equivalent detail in polygons. Painting or generating a displacement map is also a lot less work for a developer than modelling every little bump. Plus, you can toggle it off on lower-end systems, which isn't as easily done with models.
On the subject of bandwidth, most engines will try to do what Misantropia said: load as much onto the GPU as possible and keep it there. Anything that has to be moved in and out is batched in chunks to save as much bandwidth as possible. Other engines are starting to stream certain data to the GPU, and I think we'll see interesting things come from this (megatextures, for instance). idTech 6 is experimenting with streaming geometry stored in a voxel octree.
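A minimal sketch of what "batched in chunks" means in practice (illustrative names, assuming OpenGL buffer objects): only the region of a buffer that actually changed gets pushed across the bus, and streaming systems extend the same idea to feed data in continuously, prioritised by what the camera can actually see.

[code]
/* Hedged sketch; names and usage are illustrative, not from any particular engine. */
#include <GL/glew.h>

void UpdateDirtyRegion(GLuint vbo, GLintptr offsetBytes, GLsizeiptr sizeBytes, const void *src)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* Re-specify only the dirty chunk; the rest of the buffer stays resident
       on the GPU untouched. */
    glBufferSubData(GL_ARRAY_BUFFER, offsetBytes, sizeBytes, src);
}
[/code]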
[size=85][url=http://gtkradiant.com]GtkRadiant[/url] | [url=http://q3map2.robotrenegade.com]Q3Map2[/url] | [url=http://q3map2.robotrenegade.com/docs/shader_manual/]Shader Manual[/url][/size]
-
- Posts: 140
- Joined: Thu Sep 09, 2010 1:44 pm
Re: Higher poly count without lower FPS?
Standard parallax mapping is definitely less expensive than polygons. Parallax Occlusion Mapping (a more advanced version) has been shown to be more expensive than polygons in tessellation demos. Where UE3's method falls between those two has never been fully clarified, but most folks suggest the former rather than the latter...
-
- Posts: 449
- Joined: Sat Nov 06, 2010 2:33 am
Re: Higher poly count without lower FPS?
OK, this might really derail things... but in light of all this, it seems like we have a choice between techniques, all of which can be very complex, and at different times some can be better or worse than the others depending on a number of variables. Parallax mapping in particular, unless I'm mistaken, is growing rapidly in popularity as well as complexity, especially with self-shadowing. I assume there's a specific type of math that parallax mapping does a lot of to achieve its effect. My first thought was: like we had texture units and geometry pipelines, why not have specialized parallax accelerators? But they wouldn't even have to go that far, I think; couldn't they just put more emphasis within the architecture of the GPU on handling parallax? I'm just throwing out shots in the dark, but if parallax does a lot of 128-bit calculations in tandem with other commands, they could add another 128-bit pipe. I guess what I'm suggesting is: when designing GPUs, why not build them around the needs of what developers are trying to accomplish rather than having developers work around the hardware? I guess it's kinda chicken-and-egg... but it seems like there's a lot to be gained from trying to accelerate popular, very complicated functions.
Re: Higher poly count without lower FPS?
That's like saying you want to build a very special toolbox just for your hammer when what you really need is just a bigger toolbox for all your tools or to replace the old hammer with a brand new nailgun.
Parallax occlusion is just one tool of many in a game developer's toolbox; it's not even that important a one, and it's something that simply runs better on a faster general-purpose GPU. It won't benefit from its own pipeline, and it's silly to build one when there are much more important processes that could benefit from discrete hardware.
[size=85][url=http://gtkradiant.com]GtkRadiant[/url] | [url=http://q3map2.robotrenegade.com]Q3Map2[/url] | [url=http://q3map2.robotrenegade.com/docs/shader_manual/]Shader Manual[/url][/size]
-
- Posts: 140
- Joined: Thu Sep 09, 2010 1:44 pm
Re: Higher poly count without lower FPS?
POM (Parallax Occlusion Mapping) is old and busted...
Quadtree displacement mapping is the new hotness:
http://www.drobot.org/pub/M_Drobot_Prog ... _short.pdf
-
- Posts: 449
- Joined: Sat Nov 06, 2010 2:33 am
Re: Higher poly count without lower FPS?
obsidian wrote:That's like saying you want to build a very special toolbox just for your hammer when what you really need is just a bigger toolbox for all your tools, or to replace the old hammer with a brand new nailgun.
Parallax occlusion is just one tool of many in a game developer's toolbox; it's not even that important a one, and it's something that simply runs better on a faster general-purpose GPU. It won't benefit from its own pipeline, and it's silly to build one when there are much more important processes that could benefit from discrete hardware.
I know it'd be silly to do for just parallax mapping; I meant to comment on the way GPUs seem to be designed in general now, which seems to be adding more cores and increasing frequencies. It seems like there ought to be some way, in hardware, to enhance the handling of various functions: parallax, quadtree, or otherwise. If the best way to do that IS just adding more cores... that's cool, I guess, it just sounds like an inefficient way to move things forward.