Affinity Mask

Postby Steve Waite » Thu Nov 06, 2014 6:25 pm

(When coming back to this page press the refresh button on your browser (F5) in case of updates.)

Remember that P3D works differently to FSX and FSX-SE. FSX with no AM and HT enabled will try to use alternate logical processors (LPs) so as to use each core fully and unshared. P3D will use all LPs as if they are real cores and does not take into account that each pair of LPs is one core.

To increase texture and scenery loading speed, we can trade several percent of fps stability and frame rate for faster loading. We can extend the number of scenery loading threads, but this extends the overlap of system resources, addon exe's, and simulator rendering. Note that giving the sim more than four threads creates an overhead in handling them, leading to some minor stutter, but the scenery will load faster. If these threads overextend across the CPU they coincide with too many other processes, causing stutter.


Overview of four core HT enabled:
For example, on a four core HT enabled CPU (8 LPs) the most we can allocate to loading scenery is with no affinity mask: no AM = AM=0 = 255 = 11,11,11,11 (the rightmost pair of ones being core zero).

Problem is the addon exe's will certainly overlap, and so will all the system resources.

Instead we can use an affinity mask of 254=11,11,11,10, which brings a little performance boost to the main thread and increases the load times, but it can still sawtooth badly (stutter) because that core is used by a lot of other system activity. This may be worse than no AM, since with no AM the use of cores zero and one is intensified and the jobscheduler will likely not target other processes onto them.

To gain stability and fps performance, we can reduce the loading speed even more (and possibly induce blurries) and move off of core zero entirely. This gives more room for addon exe's and system resources, and reduces the likelihood of stutter due to addon exe activity since their processes will be targeted onto core zero by the jobscheduler. In this case we can use AM=252=11,11,11,00. Again we can improve the performance of the main thread and lose out on loading speed with AM=248=11,11,10,00.

But now we've got hardly any increase in loading speed over AM=184=10,11,10,00 (equivalent to 116=01,11,01,00), maybe none at all, since both threads are competing for the throughput of the same fourth core. And the fifth sim thread produced with 248 adds an overhead: handling more than four jobs is a burden to the simulator code rather than an improvement.

So it's all about balance. The first three thread jobs of the sim don't always come out the same way depending on what bank of four Logical Processors the sim first thread started on, and then there are a myriad of other complications that make it very hard to describe hyperthreading and core affinity with P3D and FSX. But that's what we've got and we have to make the best use of that.

For the four core HT enabled, there are really only two configurations for the Affinity Mask: 85(170) using 4 cores, or 116(184) using 3 cores. These give the best distribution of work that can be made on the four core HT CPUs. Three cores leaves one full core free for the jobscheduler to hit with other tasks. Four cores means that there's more power in the background threads of the sim, but there will always be coincidence with the main work of other processes somewhere on the CPU.
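The pair notation used above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the function name `mask_to_pairs` is made up for illustration and is not part of the sim or IF10.

```python
def mask_to_pairs(mask, cores):
    """Format an affinity mask as LP pairs, one pair per core, core zero rightmost."""
    width = 2 * cores  # two LPs per core with HT enabled
    if mask < 0 or mask >= (1 << width):
        raise ValueError("mask does not fit in {} LPs".format(width))
    bits = format(mask, "0{}b".format(width))
    # Split the bit string into groups of two, keeping the original order.
    return ",".join(bits[i:i + 2] for i in range(0, width, 2))


# The masks discussed for the four core HT enabled CPU.
for am in (255, 254, 252, 248, 184, 116, 85, 170):
    print(am, "=", mask_to_pairs(am, cores=4))
```

Each printed line can be compared by eye against the pairs quoted in the text.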


Prelude: We hear that P3D understands the processor and asserts its own threads onto different cores as it needs them. This is true, and so do FSX and FSX-SE. But P3D and FSX only know how many "Logical Processors" (LPs) are available. They spread their threads across those available processors in a special way that can help the smoothness of the sim.

However, they don't understand "Hyperthreading" (HT), and don't make proper use of the availability of more than four cores. So we can help them by use of the Windows Job Scheduler and the Affinity Mask. HT makes the PC more crisp and responsive, so we don't particularly want to turn it off. Up to half the process switching time is saved across the CPU.

With everything equal, two equal threads on one core perform better with Hyperthreading enabled:

Image


What P3D and FSX do in terms of threading, and how they do it, requires an understanding of Windows programming and threading at an expert level considered a commercial asset, so understandably most of what's read about Affinity Masks (AMs) and FSX/P3D threading is guesswork.

Without going into any real technicalities, to use "HT" or "many cores" we must work with these following parameters:

1. We want FSX or P3D to utilise four *Processors*; three is a little too few and five is a little too many.
We want to ensure the first Processor utilised is started on a *Core* with no activity.

2. We can only gain fps by making the Processor go faster; we can run another process on it along with the sim and the Turbo mode can cut in. This can show a higher fps on the in-sim readout and causes lots of confusion, but in reality the sim is behaving poorly (look at the graphs at the bottom of the page).

For demonstration, let's say we have a four core HT enabled, 8 LP CPU. We know our addon runs on core 0 and we want to avoid it, which leaves only three cores, but these are actually six LPs. How do we best utilise these 6 LPs, since each pair shares the throughput of one of the other three cores? We want to help the sim place its threads across four processors in such a way that the major procedures that demand maximum throughput are not sharing cores. The Affinity Mask 116=(01,11,01,00), or an equivalent like 184=(10,11,10,00), ensures that, even though HT is enabled, terrain loading is kept to individual cores so that threads demanding maximum throughput are not sharing a core:

Image

The above image shows Terrain loading and sim running across four LPs on three cores. If we only have three cores available, with HT enabled we can use 4 LPs from the 6 LPs available across three cores, and we can group the middle two threads together on the middle core so that maximum throughput during terrain loading is not restricted.


With FSX, and currently P3D, we should take care to help the simulator threads establish themselves in a logical manner on the available Logical Processors of the CPU. We use an Affinity Mask to partition the logical processors on a Hyper-Threading enabled CPU, so that the main thread of the sim has a core to itself. And in whichever HT mode, we don't want too many secondary threads, which would end up stopping the main thread too often to communicate, and might not allow the sim to continue smoothly.

The affinity mask can only be used to block the use of cores, or in HT mode they are called Logical Processors (LP). In HT mode each core is split into two LPs (HT supported processors only). These look like two cores to the system. Since sooner or later a core will be required to run other threads it will have to stop and save what it's doing and load up the other thread. HT enabled improves this performance, HT disabled causes inefficient thread swapping.

When we have HT enabled the FSX threads will line up on contiguous LPs, when we would prefer the main FSX thread to occupy its own core. In this case we use an Affinity Mask to block off one of the LPs so the next thread moves onto the next core, leaving the main FSX thread to run on a core all on its own.

Affinity masks also allow the engineer to control heat to a certain extent, especially when there are multiple CPUs.

The application is best started with an AM setting since using the Task Manager to set/unset cores or LPs won't necessarily get the application to align its threads the way we intended.

In FSX we put a new section in fsx.cfg so that the application starts up its threads properly:

[JOBSCHEDULER]
AffinityMask=0

If we do not include the JOBSCHEDULER section we get AM=0 by default, which enables all cores or LPs.
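As a small sketch of that default, assuming only what is stated in this post (AM=0 behaves like a mask with every LP enabled, so 0 is the same as 255 on 8 LPs), the effective mask can be worked out like this. The function name `effective_mask` is hypothetical, not something the sim exposes.

```python
def effective_mask(affinity_mask, logical_processors):
    """Return the mask that is actually in force for a given AffinityMask value.

    Assumption taken from the post: AffinityMask=0 (or no JOBSCHEDULER
    section at all) enables every logical processor.
    """
    if affinity_mask == 0:
        return (1 << logical_processors) - 1  # all LPs enabled
    return affinity_mask


print(effective_mask(0, 8))    # four core HT enabled, no AM
print(effective_mask(116, 8))  # an explicit mask is used as written
```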

With a 4 core HT enabled processor we have 8 LPs (on supported CPUs). We would want the first FSX thread to get a core of its own so there is less thread switching, although this won't stop other apps using the LPs of that core.

We might use an AM=170, which in binary is 10101010; the rightmost 0 represents the first LP of the first core. We can use the Windows calculator in Programmer mode and switch between Dec and Bin. The rightmost pair "10" means that the first LP is disallowed for use by the app, so the second LP will load the first thread, and the other threads will move on to find the next available LP. Windows sees the first core is utilised and finds another available core for the next LP.
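For anyone without the calculator to hand, the same conversion can be sketched in Python. The helper name `allowed_lps` is made up for this example; it simply lists which LP numbers a mask leaves open, counting LP0 as the rightmost bit.

```python
def allowed_lps(mask):
    """List the LP numbers enabled by a mask, LP0 being the rightmost bit."""
    return [lp for lp in range(mask.bit_length()) if (mask >> lp) & 1]


am = 170
print(format(am, "08b"))    # the Dec to Bin conversion
print(allowed_lps(am))      # LPs the app may use
print(allowed_lps(am)[0])   # lowest allowed LP, where the first thread loads
```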

In FSX we might only enable three cores (or 3 HT LPs) and leave one core (2 HT LPs) free for Windows to use for other apps' threads.

With HT on and 4 core = 8 LPs: AM=116 in binary represents 01,11,01,00 (cores or LPs are represented in binary). AM=116 effectively disables the first core (leaving it available for addons) and makes one LP available on the second core, where the main thread starts, then 2 LPs on the third core, then one more LP on the fourth core. We might use 116 instead of 85 when we want to devote one core to another app that may demand full throughput on occasions, and so we leave only three cores to the sim. But three cores used effectively can be very close in performance to four full cores.
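The core by core breakdown of 116 described above can be checked with a short sketch. The function name `lps_per_core` is hypothetical, and it assumes HT is enabled so that each core owns a pair of adjacent bits.

```python
def lps_per_core(mask, cores):
    """Count how many LPs the mask enables on each core, starting at core 0."""
    counts = []
    for core in range(cores):
        pair = (mask >> (2 * core)) & 0b11  # the two LP bits of this core
        counts.append(bin(pair).count("1"))
    return counts


print(lps_per_core(116, cores=4))  # first core free, then 1, 2 and 1 LPs
print(lps_per_core(85, cores=4))   # one LP on every core
```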

If we allocate one core for addons, ensure any addons get at least two separate LPs in an HT enabled system. Allocation of LP0 and LP1 alone should be OK even with intense weather addons.


Suggested masks to consider

2 core = 2 LP: leave blank or 3
2 core HT = 4 LP: 13 or 14
4 core = 4 LP: 13 or 14
4 core HT = 8 LP: 116 (last 3 cores) or 170 (all four cores)
6 core = 6 LP: 30 or 60
6 core HT = 12 LP: 340 or 1360
8 core = 8 LP: 1360 or 240
8 core HT = 16 LP: 1360 or 5440


For demonstration purposes, here we have a 6 core with HT enabled, showing AM=3392 vs AM=0 (no JOBSCHEDULER section). With 3392=(11,01,01,00,00,00) we can see the first P3D thread on LP6 has an unused partner LP7. With no AM or AM=0, the first P3D thread shares a core with another thread, losing around 10% in overall performance. Tests were made with small sim windows (not on fullscreen windows).
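Going the other way, from the pair notation back to a decimal mask, and checking that the first allowed LP has its partner LP left unused, could be sketched as follows. Both function names are made up for illustration, and the partner of an LP is assumed to be the other LP of the same pair.

```python
def pairs_to_mask(pairs):
    """Convert pair notation such as '11,01,01,00,00,00' to a decimal mask."""
    return int(pairs.replace(",", ""), 2)


def first_lp_has_free_partner(mask):
    """True if the partner of the lowest enabled LP is blocked by the mask."""
    first_lp = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    partner = first_lp ^ 1                      # other LP on the same core
    return (mask >> partner) & 1 == 0


mask = pairs_to_mask("11,01,01,00,00,00")
print(mask)
print(first_lp_has_free_partner(mask))
print(first_lp_has_free_partner(0b111111111111))  # all 12 LPs enabled, as with AM=0
```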

Image


Now looking at 4 cores of a 6 core, versus 3, but utilising 6 LPs as we have available with a four core, and only 3 LPs as we would have in a dual core+HT (although using 3 actual cores). With only 3 LPs, the sim places the second and third threads onto the second available LP, then makes a fourth data gathering thread on the third LP. With more cores we can gang up data gathering threads, and the second thread, the thread pool, has an LP of its own. Remember here we have other cores available since it's on a 6 core. So with a four and two core CPU other programs will occupy all those cores more than they do with the 6 core in these examples:

Image

In the sim we are panning, rotating in the outside view. But with the better utilised cores with the Affinity Mask 1000, we get a smoother sim and the fps is less interrupted.

With the latest versions IF10 >augmented and IF10 Professional, adding the frame rate to the sim values output .csv file (comma separated values), we can use a spreadsheet program to read the .csv and plot the columns. Here I've plotted fps - Delta (Av) which shows the true performance on a 6 core CPU with various Affinity Masks.

Look at the light blue line for real performance, "fps - Delta (Av)", how close it is to the red line, "Average fps".

Image

And again, but with the last two captures showing HT disabled. With the CPU in Turbo mode the load improves the turbo setting: the CPU increases clock speed to meet demand. This means that although some appear to have higher fps, due to extra demand, they also have much greater Delta (change in fps), which detracts from the overall performance.

HT enabled with AM=340=(00,01,01,01,01,00) provided consistently smoothest results.

Image
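For readers who want to compute something similar from their own logged frame rates, here is a rough sketch. It assumes "Delta (Av)" means the average absolute change in fps from one sample to the next; that definition is an assumption for this example, not a statement of how IF10 calculates it, and the function name is hypothetical.

```python
def fps_minus_delta(fps_samples):
    """Average fps minus the average absolute sample-to-sample change in fps.

    Assumption: this is only one plausible reading of "fps - Delta (Av)".
    """
    if len(fps_samples) < 2:
        raise ValueError("need at least two fps samples")
    average_fps = sum(fps_samples) / len(fps_samples)
    deltas = [abs(b - a) for a, b in zip(fps_samples, fps_samples[1:])]
    average_delta = sum(deltas) / len(deltas)
    return average_fps - average_delta


# Illustrative numbers only, not measurements from the tests above.
steady = [30.0, 30.0, 30.0, 30.0]
jumpy = [30.0, 32.0, 28.0, 30.0]
print(fps_minus_delta(steady))
print(fps_minus_delta(jumpy))
```

Both lists have the same average fps, but the jumpy one scores lower, which is the point of plotting fps - Delta rather than fps alone.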


P3D v3.1 Splitting up the sim into jobs;
four versus five jobs - four jobs better than 5 or more jobs, or 3 jobs:
Image
(Red line is true performance, fps-Delta)

P3D v3.1 Splitting up the sim jobs over the CPU;
four versus five cores - Four cores beats 5 cores and HT enabled beats HT disabled:
Image
(Red line is true performance, fps-Delta)


A very heavy test for a 3960x+GTX680 - P3D v3.1, KSEA and high autogen, full IF10 overcast weather and continuous thermals, volume fog, high shadows, Ai injection and ATIS, all the visual effects, and water on medium, outside view across aircraft, over very high density urban area from around 500ft on AP, just took off, making a 45 degree turn at 2 minutes, and of course, being monitored by IF10 Pro. Each plot has identical environment and starts on the same frame, same cloud same place same time. AA is absent, bringing AA on appears to cap the frame rate at some point, as application of higher orders of AA is applied, so AA is set off, no vsync and set to unlimited fps, default NCP profile. In all cases there were no problems with scenery, for example no waiting for autogen to appear. 4, 5, 6 and 8 jobs, 4 jobs produced the smoothest view:

Image


Conclusion

In all the testing and analysis, splitting the sim into four jobs works best, and HT enabled works best. Five jobs is always a poor result. Six jobs brings back a good result, though not as good as four. Seven or eight jobs continue to make no difference; they only serve to fill in CPU time where the jobscheduler would otherwise be able to place other tasks, addon exe's for instance.

More than four jobs also takes more time to load, and shows initially a higher frame rate, for 30 seconds or so, until the sim settles down. Do not be fooled by this higher initial frame rate, it is because the sim is waiting longer for the background threads to come online.


Tip about exe.xml, the file that the sim uses to start other applications:

If exe programs are started with the FSX/P3D exe.xml file *they start within the affinity of the sim*. That means they load up the sim cores, unnecessarily robbing the sim of performance. Instead we can run them away from the main sim threads. Maybe start these types of programs from IF10 or a .bat or proc lasso instead of using the sim exe.xml file.

dll.xml processes are fixed within the sim affinity, we cannot improve on that.

For example, I run two Saitek panels.exe programs to drive the panels and displays. I start them from IF10 Start Other Apps on the front page, and set the affinity with the " IF10AM=[n]" command on the end of the parameters line. IF10 will dedicate an affinity space for these apps rather than move them after they have started.
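For those starting addons from a script instead, one possible approach on Windows is the built-in `start` command with its `/affinity` switch, which takes the mask in hex. The sketch below only builds the command; the function name and the exe path are hypothetical, and this is not how IF10 does it internally.

```python
def start_with_affinity(exe_path, mask):
    """Build a Windows command that launches exe_path restricted to mask.

    Uses the cmd.exe 'start' command; its /affinity switch expects hex.
    The empty string is the window title that 'start' takes first.
    """
    hex_mask = format(mask, "X")
    return ["cmd", "/c", "start", "", "/affinity", hex_mask, exe_path]


# Hypothetical addon path; 0xA0 keeps it off the LPs enabled by AM=85.
command = start_with_affinity(r"C:\Addons\example_addon.exe", 0xA0)
print(command)
# To launch it for real on Windows:
#   import subprocess; subprocess.run(command, check=True)
```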


EFSA
Ideal Flight 10 brings you EFSA, advanced affinity space setup for P3D, FSX, and FSX-SE. The following clearly demonstrates Ideal Flight 10 - ExpandedFSAffinity and also IF10 app startup vs starting exe's from exe.xml.

EFSA gaining as much as 2% performance enhancement in a very demanding and strictly repetitive test:

Ideal Flight 10 - ExpandedFSAffinity


Overclockers:

Reducing the number of ones "1" in the mask decreases heat. With the four core HT enabled, if we want to go cooler than 116, try AM=52=(00,11,01,00). This only uses two cores, but splits into three jobs, and performs very well, not much less than 116, but with better fps at higher GHz at the expense of loading textures. For increased texture loading, at slightly lower fps and slightly more heat, try AM=60=(00,11,11,00), which splits the sim into four jobs, moving the second job onto the main thread core.
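Counting the ones, and the cores they land on, is easy to sketch. The function name is made up, it assumes HT is enabled (two LPs per core), and it treats each enabled LP as one job, following the way the masks are described here.

```python
def jobs_and_cores(mask):
    """Return (number of enabled LPs, number of distinct cores they sit on)."""
    lps = [lp for lp in range(mask.bit_length()) if (mask >> lp) & 1]
    cores = {lp // 2 for lp in lps}  # LPs 0-1 are core 0, LPs 2-3 core 1, ...
    return len(lps), len(cores)


for am in (116, 52, 60):
    jobs, cores = jobs_and_cores(am)
    print("AM={}: {} jobs on {} cores".format(am, jobs, cores))
```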


Overview of setups for a 4core+HT using three and four cores:

The four core versions:

01,01,01,01 = 85 dec - sim - best rendering
10,10,00,00 = A0 hex - apps (for batch file setting)

or

11,11,01,01 = 245 dec - sim - best loading
10,10,00,00 = A0 hex - apps


The three core versions:

01,11,01,00 = 116 dec - sim - best rendering
10,00,00,01 = 81 hex - apps or
10,00,00,10 = 82 hex - apps or
00,00,00,11 = 3 hex - apps

or

11,11,01,00 = 244 dec - sim - best loading
00,00,00,11 = 3 hex - apps
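When pairing a sim mask with an apps mask, it can help to check which LPs, if any, the two have in common. A minimal sketch, working from the pair notation in the listing above; the helper names are hypothetical.

```python
def pairs_to_mask(pairs):
    """Convert pair notation such as '01,11,01,00' to an integer mask."""
    return int(pairs.replace(",", ""), 2)


def shared_lps(sim_pairs, apps_pairs):
    """List the LP numbers enabled in both the sim mask and the apps mask."""
    both = pairs_to_mask(sim_pairs) & pairs_to_mask(apps_pairs)
    return [lp for lp in range(both.bit_length()) if (both >> lp) & 1]


setups = [
    ("01,01,01,01", "10,10,00,00"),
    ("11,11,01,01", "10,10,00,00"),
    ("01,11,01,00", "10,00,00,01"),
    ("01,11,01,00", "00,00,00,11"),
    ("11,11,01,00", "00,00,00,11"),
]
for sim, apps in setups:
    print(sim, "|", apps, "| apps hex", format(pairs_to_mask(apps), "X"),
          "| shared LPs", shared_lps(sim, apps))
```

An empty list means the apps have those LPs to themselves; a non-empty list shows which LPs they share with the sim.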

(Also see Affinity .bat files Page)
software architect at codelegend.com
equipment: i9-9980Xe 64GB 2xRTX2080ti NVLink 2TB M.2 NVMe,
i9-9900X 64GB RTX2080ti 2TB M.2 NVMe, i7-3960X 32GB GTX680 4TB RAID10,
NAS @7TB RAID10 (16TB)
Steve Waite
 