Q3Map2 performance question

Post by **VolumetricSteve** » Mon Mar 07, 2011 2:42 pm

I've been trying to think up some different ways to accelerate q3map2 compiles, or at least different parts of the process.
In terms of performance per price, the Apple Power Mac G5 Quads have come down a lot, to about 400 bucks on eBay, and it seems that in double-precision floating-point math they're STILL faster than my overclocked 6-core AMD, by about 10 Gflops. Has anyone done a lot of q3map2 compiling on G5s, or on any PPC chips for that matter? I'm curious to see how they compare to x86 chips at the same MHz/GHz. My current thinking is that Silicone_Milk's project sounds awesome, but he's really busy working on other things (starting a company?), so it'll be a while before his OpenCL version comes out. In the meantime, dropping 400 bucks into a quad G5 (or the best other suggestion) is still cheaper and more immediately effective than dropping the same 400 bucks into a GPU and waiting for the OpenCL version.

Ideas?

Post by **^misantropia^** » Mon Mar 07, 2011 4:11 pm

Compile it on an EC2 Quadruple Extra Large instance. And post the benchmarks. =)

http://aws.amazon.com/ec2/#pricing

Post by **obsidian** » Mon Mar 07, 2011 4:21 pm

I suspect that even a small amount of map optimization will shave more time off a compile than any amount of money and hardware could. Just looking at your maps, there's nothing there that should take as long as it does. Use some caulk, convert ~95% of your brushes to detail, and I'm sure you can compile in minutes.

Post by **VolumetricSteve** » Mon Mar 07, 2011 5:04 pm

I optimize a decent amount, and I've gotten particularly handy at dealing with VIS. (VIS also hardly seems to matter anymore with video cards that can push a gajillion triangles a second anyway.) My biggest issue, as with most people, is that the Light phase takes a while. A while ago I made some posts about increasing the lightmap size with external lightmaps, and I'm gonna be experimenting with that more soon. But I imagine that when I work out how to orchestrate that, my Light compile times will go through the roof. I know about all the tricks for per-surface lightmap sizes and those sorts of precise optimizations; I'm just doing things my crazy way and trying to get a consistently high-res lightmap across the whole map, at least for test compiles, so I can later compare them to more tweaked builds of the same map and see how much of an impact the extra detail has in certain areas. So at some point I will basically be setting the global lightmap settings to things that are completely reckless, and later picking and choosing which surfaces can be dumbed down for a pk3 file that won't be (too) mammoth.

Again, I realize just waiting for the OpenCL version of q3map2 will solve the Light phase time for much less money; I'm just always on the hunt for alternatives that are immediately effective and, if possible, cost-effective. I thought a PowerPC G5 system might bring something new (or old) to the table. (I was reading that the quad G5s can push 73 Gflops, while my 6-core pushes 60 to 65 Gflops according to some benchmarks, but for all I know the G5 may be horrible at something q3map2 does well on x86.)

Misantropia - I'm actually considering that cloud computing thing. If I do, I'll post the crap out of some benchmarks.

Post by **^Ghost** » Mon Mar 07, 2011 6:20 pm

i don't know what a gflop is. is it similar to a g-string? but i7 is the way to go.

Post by **VolumetricSteve** » Mon Mar 07, 2011 6:55 pm

GFlop (short for GigaFLOP) is a measure of mathematical performance.

http://en.wikipedia.org/wiki/FLOPS

i7s are very fast, but they're also very expensive. To get an i7 that'd be ALMOST twice the speed of my current system, it'd run me about 1,500 dollars. That's fast, but... it's also 1,500 dollars. I'm almost tempted to get a barebones HPC server and just scale it up with more CPUs as I get the money to throw into it, but that's even more ludicrously expensive (but a lot cooler) than some of the other ideas I have bouncing around. If there were some easy way to recompile q3map2, or put it in some kind of clustering environment, I could just get another system exactly like the one I have now for like 500 bucks and get an amazing performance return (I'm guessing it wouldn't bottleneck at the network). I've been looking into that, though, and it's still not any easier than it was the last time I looked into it. OpenMPI is freakin' complicated... (holy crap, I'm waiting for some evolution in that field)

so...my choices are:
make maps that are WAYYY scaled back from what I'm trying to do (No)
wait for more cores to show up per processor socket vs price (Crazy)
wait for more sockets to show up in each motherboard vs price (Crazy)
magically "cluster" q3map2 --- ideal (Dreamy)

also, I was looking at the q3map2 source code and I stumbled upon this..

https://zerowing.idsoftware.com/svn/rad ... /listen.pl

I see mention of a "[Q3Map2 listener $0 is now active on port $port]\n"; "

is there some kind of built-in clustering code here? what else could that be? I hope it's not just the loop-back that produces console output....

Post by **^misantropia^** » Mon Mar 07, 2011 10:19 pm

VolumetricSteve wrote: also, I was looking at the q3map2 source code and I stumbled upon this..

https://zerowing.idsoftware.com/svn/rad ... /listen.pl

I see mention of a "[Q3Map2 listener $0 is now active on port $port]\n"; "

is there some kind of built-in clustering code here? what else could that be? I hope it's not just the loop-back that produces console output....

Close. It's an aid for debugging q3map2 over the network. =)

Post by **VolumetricSteve** » Tue Mar 08, 2011 3:29 am

POOP.

ok, it's on. Looking into this...

http://www.kerrighed.org/wiki/index.php/Main_Page

Glancing over a tutorial, I saw "turn a bunch of old computers into one big SMP machine" which...may mean my great big, energy inefficient ship has finally come in. I have yet to see any mention of having to recompile apps to make use of the clustery goodness. Research has begun to see if this is too good to be true.

It looks like I can build 6-core nodes that do 65 Gflops each, for about 510 bucks and just under 170 watts per node.
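For comparison, the figures quoted so far in this thread can be turned into Gflops per dollar and per watt. This is only a back-of-the-envelope sketch: the numbers are the ones posted above (not measured), the dictionary layout and function names are made up for illustration, and peak Gflops says nothing about how q3map2 itself would actually run on either machine.

```python
# Figures quoted in this thread; none of them were measured here.
systems = {
    "quad G5 (used)": {"gflops": 73.0, "price_usd": 400.0},
    "6-core node (new)": {"gflops": 65.0, "price_usd": 510.0, "watts": 170.0},
}


def gflops_per_dollar(spec):
    """Peak Gflops bought per dollar spent."""
    return spec["gflops"] / spec["price_usd"]


def gflops_per_watt(spec):
    """Peak Gflops per watt, or None when no power figure was quoted."""
    watts = spec.get("watts")
    return None if watts is None else spec["gflops"] / watts


if __name__ == "__main__":
    for name, spec in systems.items():
        line = f"{name}: {gflops_per_dollar(spec):.3f} Gflops/$"
        per_watt = gflops_per_watt(spec)
        if per_watt is not None:
            line += f", {per_watt:.3f} Gflops/W"
        print(line)
```

No power figure was posted for the G5, so the sketch leaves that column out rather than guessing one.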

*edit: I'm not sure what can do what anymore. I spent all day reading clustering documentation and I think I actually know less now than I did when I started. Not a single thing I've read has been clear enough to say "one app will use the resources of many nodes", but it seems like that's what they're all skirting around. Plus, only a few of them mention recompiling code for the cluster (none of them make it clear how); the other ones just don't mention it and kinda let you guess? After a day of "OH CRAP THIS MIGHT WOR----oh wait...what does that mean?" I think my soul is starting to leak out. How the hell is this so complicated/poorly documented?

Post by **VolumetricSteve** » Fri Mar 11, 2011 3:27 pm

I have an idea that could actually be so lazy, it's brilliant. I'm not sure it'd even constitute a real cluster because there'd be no active shared memory between systems, no interconnect, and no clustering software/environment.

I was thinking I could just write a script that would take a .map file and divide the surfaces up by how hard they'd be to process, so each system gets a semi-even workload. Then the script would produce a different .map file for each node, and each map would then have the "nolightmap" parameter flipped on for every surface except the ones that node was supposed to compile. When they're all done, you'd have a few maps, all of which had incomplete lightmaps, and you'd just add them together. I imagine you'd have to set the "zones" of lighting for each map to overlap a bit so you didn't get weird seams between zones, but this should be VERY easily doable and would totally avoid the need for high-speed interconnects, shared memory, overly specialized code...all of that....
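The "divide the surfaces up by how hard they'd be to process" step could be sketched roughly like this. It is a plain greedy split, not anything q3map2 provides: the per-surface cost estimates, the relative node speeds, and the function names are all hypothetical, and actually parsing a .map file or flipping "nolightmap" on surfaces is left out entirely.

```python
def split_surfaces(costs, node_speeds):
    """Greedily assign surfaces to nodes so projected finish times stay even.

    costs:       dict of surface id -> estimated light cost (hypothetical unit)
    node_speeds: dict of node name -> relative speed (higher is faster)
    Returns a dict of node name -> list of surface ids.
    """
    assignment = {name: [] for name in node_speeds}
    finish_time = {name: 0.0 for name in node_speeds}

    # Place the most expensive surfaces first; small ones fill the gaps later.
    for surface, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the node that would finish earliest if it took this surface.
        best = min(node_speeds, key=lambda n: finish_time[n] + cost / node_speeds[n])
        assignment[best].append(surface)
        finish_time[best] += cost / node_speeds[best]
    return assignment


def surfaces_to_skip(assignment, node):
    """Surfaces that would get "nolightmap" in the .map written for this node."""
    return {s for name, owned in assignment.items() if name != node for s in owned}


if __name__ == "__main__":
    # Made-up cost estimates and speeds, purely for illustration.
    costs = {"a": 8, "b": 4, "c": 2, "d": 2}
    speeds = {"fast": 4.0, "slow": 1.0}
    plan = split_surfaces(costs, speeds)
    for node in plan:
        print(node, "lights", plan[node], "skips", sorted(surfaces_to_skip(plan, node)))
```

How to estimate the cost of a surface is the open question here; lightmap area might be a reasonable first guess, but that is an assumption, and the overlap between zones mentioned above is not handled.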

Now that I have a clear plan that doesn't involve thousands of dollars of hardware, software, or research, any thoughts? Or reasons this wouldn't work?

Post by **^misantropia^** » Fri Mar 11, 2011 3:47 pm

I think you'll run into edge cases fast (and chopping up the map itself probably isn't going to be all that easy either).

Still, sounds like a fun side project. Go for it!

Post by **VolumetricSteve** » Fri Mar 11, 2011 4:28 pm

Yeah, I'm not sure how I'm going to do that. For my first test I was just gonna cut a map 80/20, letting my 6-core do 80% and my Pentium 4 do the leftover 20%, just to see how hard it is to splice the lightmaps back together. I feel like some simple image editing software should do the trick, IrfanView or something, just layering the lightmaps directly on top of each other.

My only worry is that q3map2 will produce differently shaped lightmaps for the different maps, making it almost impossible to drag and drop the lightmaps onto each other....but I'd think there should be some switch to disable whatever lightmap sorting or condensing might screw that up.
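Layering two partial lightmaps could look something like the sketch below. It rests on two big assumptions that nothing in this thread confirms: that both compiles lay out their lightmap pages identically, and that texels a node did not light come out black. The pixel format (rows of RGB tuples) and the function name are made up for illustration; loading and saving the actual image files is not shown.

```python
def merge_lightmaps(first, second):
    """Overlay two partial lightmaps by keeping the brighter value per channel.

    Each lightmap is a list of rows, and each row is a list of (r, g, b) tuples.
    Raises ValueError when the two images do not have the same dimensions,
    which is the "differently shaped lightmaps" case worried about above.
    """
    same_height = len(first) == len(second)
    if not same_height or any(len(ra) != len(rb) for ra, rb in zip(first, second)):
        raise ValueError("lightmaps differ in size; cannot overlay them directly")

    merged = []
    for row_a, row_b in zip(first, second):
        merged.append([
            tuple(max(ca, cb) for ca, cb in zip(pixel_a, pixel_b))
            for pixel_a, pixel_b in zip(row_a, row_b)
        ])
    return merged


if __name__ == "__main__":
    # Two tiny 1x2 "lightmaps": each node lit a different texel.
    from_six_core = [[(10, 0, 0), (0, 0, 0)]]
    from_pentium4 = [[(0, 0, 0), (5, 6, 7)]]
    print(merge_lightmaps(from_six_core, from_pentium4))
```

Taking the per-channel maximum is just one way to "add them together"; where zones overlap it keeps the brighter of the two results instead of summing them, which avoids double-counting light but may still leave visible seams.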
