Jump to content
The simFlight Network Forums

issue with ipc.get/set


Recommended Posts

I have a few scripts which share variables with each other using ipc.get and ipc.set. I noticed that FSUIPC would often mix up the values of different variables for a few milliseconds and very rarely one of the scripts would crash with an access violation error.

I made the following script which replicates my problem:

local modules_dir = lfs.currentdir():gsub("\\\\","\\") .. "\\Modules\\"
local testdir = "__TEST__\\"
lfs.mkdir(modules_dir .. testdir)

for i = 1, 100 do
   local path = testdir .. "test" .. i .. ".lua"
   local content =
   [[while true do 
      for i = 1,1000 do 
         ipc.get('var' .. i)
         ipc.sleep(100)
      end
      for i = 1,1000 do 
        ipc.set('var' .. i, "you're not supposed to see this") 
        ipc.sleep(100)
      end 
   end]]
   local file = io.open(path, "w")
   file:write(content)
   file:close() 
   ipc.runlua(path)
end

while true do
   local myVar = ipc.get("myVar")
   if myVar then print(myVar) end
   ipc.sleep(100)
end

This is intentionally over the top to replicate the issue quickly. I only have a handful of scripts and variables.

Link to comment
Share on other sites

1 hour ago, adrem said:

I made the following script which replicates my problem:

This is probably some clash between threads. You should have something to slow down those loops -- an ipc.sleep perhaps. There's no way threads can be switched fast enough to meet your demands. The globals are, of necessity, stored and retrieved in a different (invisible to you) thread, so that all threads can access them.

I'm not sure there's anything FSUIPC can do unless we deliberately assume a sleep before each get and after each set request. That might actually be necessary now that I think of it -- for FSUIPC5 and the future in any case (you don't say what you are using) -- but as it has never been a problem before

I think your solution is to put them in yourself. You might need to expriment with different sizes of sleep, from (0) which usually allows a thread change (but maybe not to the desired one) upwards.

Oh, two questions

1. In your test.lua, why are you getting the 1000 variables before they are set. Wouldn't you expect errors? What is setting them beforehand?

2. Why the complication of creating a lua file in a lua file instead of just creating it yourself? Is there something I've not understood?

Pete

 

Link to comment
Share on other sites

2 hours ago, Pete Dowson said:

This is probably some clash between threads. You should have something to slow down those loops -- an ipc.sleep perhaps. There's no way threads can be switched fast enough to meet your demands. The globals are, of necessity, stored and retrieved in a different (invisible to you) thread, so that all threads can access them.

I'm not sure there's anything FSUIPC can do unless we deliberately assume a sleep before each get and after each set request. That might actually be necessary now that I think of it -- for FSUIPC5 and the future in any case (you don't say what you are using) -- but as it has never been a problem before

I think your solution is to put them in yourself. You might need to expriment with different sizes of sleep, from (0) which usually allows a thread change (but maybe not to the desired one) upwards.

Oh, two questions

1. In your test.lua, why are you getting the 1000 variables before they are set. Wouldn't you expect errors? What is setting them beforehand?

2. Why the complication of creating a lua file in a lua file instead of just creating it yourself? Is there something I've not understood?

Pete

 

2) I wanted to replicate the issue outside of my actual scripts and I knew that it stemmed from multiple files interacting with the global variables. This is easier than creating and editing those multiple files by hand.

The getting the variables before setting them isn't really necessary as removing that loop causes the same errors. Even if I insert ipc.sleep(2000) inside the setting loop, I still get the errors, eventually. I actually need a maximum interval of 50 ms between two ipc.get calls in one of my scripts, so I'll just use the user offsets instead if this is not fixable.

Edited by adrem
Link to comment
Share on other sites

12 hours ago, adrem said:

The getting the variables before setting them isn't really necessary as removing that loop causes the same errors. Even if I insert ipc.sleep(2000) inside the setting loop, I still get the errors, eventually. I actually need a maximum interval of 50 ms between two ipc.get calls in one of my scripts, so I'll just use the user offsets instead if this is not fixable.

I wouldn't know how to fix it, that's the problem. The set and get functions are really simple, using basic calls into the Lua code which is not mine, but the free lua.org package, and for which the source is complex, convoluted and too complex for my aged and gradually failing brain. Maybe there is some bug in the version of Lua we used -- there are later versions.

However, on this:

15 hours ago, adrem said:

very rarely one of the scripts would crash with an access violation error.

The details of the crash might be useful to check where this is occurring as that might then be fixable. You can get these details, even historically, from the Windows Event Viewer (Application Logs). But I'd only be interested if it was for the latest FSUIPC5 release, or the latest WideClient.

Pete

 

Link to comment
Share on other sites

  • 5 months later...
On 1/9/2020 at 1:55 PM, Pete Dowson said:

The set and get functions are really simple, using basic calls into the Lua code which is not mine,

Sorry for the stupid question, but could it simply be the case that the code isn't thread safe? If I understand correctly, ipc.get and ipc.set are implemented through a dedicated lua state where you store the shared variables. Lua isn't thread safe, so if you aren't locking the state in your own code, ipc.get and ipc.set aren't thread-safe either.

Link to comment
Share on other sites

1 hour ago, adrem said:

Sorry for the stupid question, but could it simply be the case that the code isn't thread safe? If I understand correctly, ipc.get and ipc.set are implemented through a dedicated lua state where you store the shared variables. Lua isn't thread safe, so if you aren't locking the state in your own code, ipc.get and ipc.set aren't thread-safe either.

Can you explain your thinking?  FSUIPC is just using the facilities in Lua combined with those for threading in Windows.

Yes, a global variable can be "set" in one thread  whilst it is being "read" in another. A timing difference of a few milliseconds can make a difference. But this can occur in any case. If you were getting partially wrong values -- part of a variable being written whilst part was being read, i could understand your concern -- wrong values. But as this isn't the case, what is your thinking?

I don't really see how any such problems could cause access violation crashes.

BTW what happened between the 8th January and now? Been on a long holiday?  🙂

Pete

 

 

Link to comment
Share on other sites

1 hour ago, Pete Dowson said:

I don't really see how any such problems could cause access violation crashes.

But how can you know that? You said yourself that you don't know what the Lua code is doing.

By the way, this is the exact line that causes the access violations:

 Capture.PNG.eef424d3f2e808f3f13b1ecfcaefa9b4.PNG

1 hour ago, Pete Dowson said:

BTW what happened between the 8th January and now? Been on a long holiday?  🙂

I just randomly remembered this for some reason 😄

Link to comment
Share on other sites

4 hours ago, adrem said:

By the way, this is the exact line that causes the access violations:

Can you understand enought of that to explain what is happening then? If that source reconstructed (in a debugger?) from our binary or located some other way?

Do you have the Windows crash information? Version 6.08 of FSUIPC? Or WideClient?

I'm still not sure what I can do about it.

Pete

 

Link to comment
Share on other sites

20 hours ago, Pete Dowson said:

If that source reconstructed (in a debugger?) from our binary or located some other way?

I looked up the binary code at the address that crashed in my own program that uses lua. It's the ltable.c file from the lua source.

20 hours ago, Pete Dowson said:

Do you have the Windows crash information? Version 6.08 of FSUIPC? Or WideClient?

FSUIPC 6.08. P3D itself didn't crash because FSUIPC catches these exceptions in the lua threads and displays the address in the console.

20 hours ago, Pete Dowson said:

I'm still not sure what I can do about it.

The only way to prevent this is to lock the shared lua VM with a mutex on each read and write. Maybe you shouldn't do anything at all 🙂 I personally don't even use ipc.get and ipc.set anymore and it appears that I'm the only one who ever complained.

Link to comment
Share on other sites

3 hours ago, adrem said:

The only way to prevent this is to lock the shared lua VM with a mutex on each read and write.

Or, as I more usually do, just put the push and pop calls into their own critical section, preventing thread swapping in the Lua code doing the push and pop. However, the Lua push and pop functions already have locks on them (lua_lock and lua_unlock calls either end) so I never thought it necessary.

However, looking through the source I see this:

#ifndef lua_lock
#define lua_lock(L)     ((void) 0) 
#define lua_unlock(L)   ((void) 0)
#endif


which is very strange. I'd not actually believed they did nothing! I wonder why they aren't used.

I'm very puzzled.

They are used in a huge number of places. Maybe it's something to do with efficiency v reliability?

Pete

 

Link to comment
Share on other sites

1 hour ago, Pete Dowson said:

They are used in a huge number of places. Maybe it's something to do with efficiency v reliability?

Lua is designed to run in a single thread so they are no-op in a vanilla distribution. lua_lock and lua_unlock are supposed to be implemented by the user if needed.

Link to comment
Share on other sites

10 hours ago, adrem said:

Lua is designed to run in a single thread so they are no-op in a vanilla distribution. lua_lock and lua_unlock are supposed to be implemented by the user if needed.

Ah, I see. Thank you.

I wouln't have implemented them as they are because they are used in every reference to the Lua stack. Since in the FSUIPC/WideClient implementation each Lua thread is separate except for the common use of the 'global' stack, it is only for the "get" and "put2 operations than the interlocak is needed.

I'll mention it to John, see if he would like to implement this safety measure in FSUIPC6. 

Pete

 

Link to comment
Share on other sites

On 1/8/2020 at 9:25 PM, adrem said:

I have a few scripts which share variables with each other using ipc.get and ipc.set. I noticed that FSUIPC would often mix up the values of different variables for a few milliseconds and very rarely one of the scripts would crash with an access violation error.

I made the following script which replicates my problem:

John agreed to put it into FSUIPC6, at least, so i've done the code change to test, and used your original Lua script above to try to force a fail.

It got to having the  100 test Luas running but my PC (a 7700 running at 5GHz) was by then so sluggish with so many threads running that I turfed it off. How many did you get to before it failed, typically? And what do you expect the loop at the end to do (the loop forever reading a global called "myvar" which was never written)?

If this is a good test I'll send the change to John for including in the next release.

Pete

 

Link to comment
Share on other sites

2 hours ago, Pete Dowson said:

It got to having the  100 test Luas running but my PC (a 7700 running at 5GHz) was by then so sluggish with so many threads running that I turfed it off. How many did you get to before it failed, typically? And what do you expect the loop at the end to do (the loop forever reading a global called "myvar" which was never written)?

Don't know how it ran on my PC back then, but the original script makes my PC sluggish now too. I've amended the script in my initial post to include sleep statements in the for loops. With it, I got the following output:

    40234 -------------------- Starting everything now ----------------------
    41437 Advanced Weather Interface Enabled
   158484 LUA.0: you're not supposed to see this
   158812 LUA.0: you're not supposed to see this
   160343 LUA.0: you're not supposed to see this
   162312 LUA.0: you're not supposed to see this
   164171 LUA.0: you're not supposed to see this
   165265 LUA.0: you're not supposed to see this
   171500 LUA.0: you're not supposed to see this
   172703 LUA.0: you're not supposed to see this
   177187 LUA.0: you're not supposed to see this
   178390 LUA.0: you're not supposed to see this
   182109 LUA.0: you're not supposed to see this
   185500 LUA.0: you're not supposed to see this
   192609 LUA.0: you're not supposed to see this
   197312 LUA.0: you're not supposed to see this
   198078 LUA.0: you're not supposed to see this
   203593 LUA.1: Crash C0000005 at 7FFD48FD7BA3: "C:\Prepar3D v4\Modules\__TEST__\test1.lua"
   204000 LUA.2: Crash C0000005 at 7FFD48FD7BA3: "C:\Prepar3D v4\Modules\__TEST__\test2.lua"
   207000 LUA.3: Crash C0000005 at 7FFD48FD7BA3: "C:\Prepar3D v4\Modules\__TEST__\test3.lua"


Then P3D crashed with an access violation in ntdll.dll

You can reduce the number of threads but then you'll have to wait longer. In the setup where I encountered this first, I only had 3 threads with a dozen of these 'global' variables and eventually I had the variables being mixed up after about 30 minutes.

Link to comment
Share on other sites

1 hour ago, adrem said:

You can reduce the number of threads but then you'll have to wait longer. In the setup where I encountered this first, I only had 3 threads with a dozen of these 'global' variables and eventually I had the variables being mixed up after about 30 minutes.

Okay. With the interlock implemented and your modified Lua script I have had it running with the 100 threads for over 60 minutes now and no failures of either type.

I'll provide the mods to John to incorporate in the next update. I'll release a WideClient update withig the next day or two. John may want you to verify the build with an interim update before he releases it.

Pete

 

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use. Guidelines Privacy Policy We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.