adrem Posted January 8, 2020 Report Posted January 8, 2020 I have a few scripts which share variables with each other using ipc.get and ipc.set. I noticed that FSUIPC would often mix up the values of different variables for a few milliseconds and very rarely one of the scripts would crash with an access violation error. I made the following script which replicates my problem: local modules_dir = lfs.currentdir():gsub("\\\\","\\") .. "\\Modules\\" local testdir = "__TEST__\\" lfs.mkdir(modules_dir .. testdir) for i = 1, 100 do local path = testdir .. "test" .. i .. ".lua" local content = [[while true do for i = 1,1000 do ipc.get('var' .. i) ipc.sleep(100) end for i = 1,1000 do ipc.set('var' .. i, "you're not supposed to see this") ipc.sleep(100) end end]] local file = io.open(path, "w") file:write(content) file:close() ipc.runlua(path) end while true do local myVar = ipc.get("myVar") if myVar then print(myVar) end ipc.sleep(100) end This is intentionally over the top to replicate the issue quickly. I only have a handful of scripts and variables.
Pete Dowson Posted January 8, 2020 Report Posted January 8, 2020 1 hour ago, adrem said: I made the following script which replicates my problem: This is probably some clash between threads. You should have something to slow down those loops -- an ipc.sleep perhaps. There's no way threads can be switched fast enough to meet your demands. The globals are, of necessity, stored and retrieved in a different (invisible to you) thread, so that all threads can access them. I'm not sure there's anything FSUIPC can do unless we deliberately assume a sleep before each get and after each set request. That might actually be necessary now that I think of it -- for FSUIPC5 and the future in any case (you don't say what you are using) -- but as it has never been a problem before I think your solution is to put them in yourself. You might need to expriment with different sizes of sleep, from (0) which usually allows a thread change (but maybe not to the desired one) upwards. Oh, two questions 1. In your test.lua, why are you getting the 1000 variables before they are set. Wouldn't you expect errors? What is setting them beforehand? 2. Why the complication of creating a lua file in a lua file instead of just creating it yourself? Is there something I've not understood? Pete
adrem Posted January 9, 2020 Author Report Posted January 9, 2020 (edited) 2 hours ago, Pete Dowson said: This is probably some clash between threads. You should have something to slow down those loops -- an ipc.sleep perhaps. There's no way threads can be switched fast enough to meet your demands. The globals are, of necessity, stored and retrieved in a different (invisible to you) thread, so that all threads can access them. I'm not sure there's anything FSUIPC can do unless we deliberately assume a sleep before each get and after each set request. That might actually be necessary now that I think of it -- for FSUIPC5 and the future in any case (you don't say what you are using) -- but as it has never been a problem before I think your solution is to put them in yourself. You might need to expriment with different sizes of sleep, from (0) which usually allows a thread change (but maybe not to the desired one) upwards. Oh, two questions 1. In your test.lua, why are you getting the 1000 variables before they are set. Wouldn't you expect errors? What is setting them beforehand? 2. Why the complication of creating a lua file in a lua file instead of just creating it yourself? Is there something I've not understood? Pete 2) I wanted to replicate the issue outside of my actual scripts and I knew that it stemmed from multiple files interacting with the global variables. This is easier than creating and editing those multiple files by hand. The getting the variables before setting them isn't really necessary as removing that loop causes the same errors. Even if I insert ipc.sleep(2000) inside the setting loop, I still get the errors, eventually. I actually need a maximum interval of 50 ms between two ipc.get calls in one of my scripts, so I'll just use the user offsets instead if this is not fixable. Edited January 9, 2020 by adrem
Pete Dowson Posted January 9, 2020 Report Posted January 9, 2020 12 hours ago, adrem said: The getting the variables before setting them isn't really necessary as removing that loop causes the same errors. Even if I insert ipc.sleep(2000) inside the setting loop, I still get the errors, eventually. I actually need a maximum interval of 50 ms between two ipc.get calls in one of my scripts, so I'll just use the user offsets instead if this is not fixable. I wouldn't know how to fix it, that's the problem. The set and get functions are really simple, using basic calls into the Lua code which is not mine, but the free lua.org package, and for which the source is complex, convoluted and too complex for my aged and gradually failing brain. Maybe there is some bug in the version of Lua we used -- there are later versions. However, on this: 15 hours ago, adrem said: very rarely one of the scripts would crash with an access violation error. The details of the crash might be useful to check where this is occurring as that might then be fixable. You can get these details, even historically, from the Windows Event Viewer (Application Logs). But I'd only be interested if it was for the latest FSUIPC5 release, or the latest WideClient. Pete
adrem Posted June 23, 2020 Author Report Posted June 23, 2020 On 1/9/2020 at 1:55 PM, Pete Dowson said: The set and get functions are really simple, using basic calls into the Lua code which is not mine, Sorry for the stupid question, but could it simply be the case that the code isn't thread safe? If I understand correctly, ipc.get and ipc.set are implemented through a dedicated lua state where you store the shared variables. Lua isn't thread safe, so if you aren't locking the state in your own code, ipc.get and ipc.set aren't thread-safe either.
Pete Dowson Posted June 23, 2020 Report Posted June 23, 2020 1 hour ago, adrem said: Sorry for the stupid question, but could it simply be the case that the code isn't thread safe? If I understand correctly, ipc.get and ipc.set are implemented through a dedicated lua state where you store the shared variables. Lua isn't thread safe, so if you aren't locking the state in your own code, ipc.get and ipc.set aren't thread-safe either. Can you explain your thinking? FSUIPC is just using the facilities in Lua combined with those for threading in Windows. Yes, a global variable can be "set" in one thread whilst it is being "read" in another. A timing difference of a few milliseconds can make a difference. But this can occur in any case. If you were getting partially wrong values -- part of a variable being written whilst part was being read, i could understand your concern -- wrong values. But as this isn't the case, what is your thinking? I don't really see how any such problems could cause access violation crashes. BTW what happened between the 8th January and now? Been on a long holiday? 🙂 Pete
adrem Posted June 23, 2020 Author Report Posted June 23, 2020 1 hour ago, Pete Dowson said: I don't really see how any such problems could cause access violation crashes. But how can you know that? You said yourself that you don't know what the Lua code is doing. By the way, this is the exact line that causes the access violations: 1 hour ago, Pete Dowson said: BTW what happened between the 8th January and now? Been on a long holiday? 🙂 I just randomly remembered this for some reason 😄
Pete Dowson Posted June 23, 2020 Report Posted June 23, 2020 4 hours ago, adrem said: By the way, this is the exact line that causes the access violations: Can you understand enought of that to explain what is happening then? If that source reconstructed (in a debugger?) from our binary or located some other way? Do you have the Windows crash information? Version 6.08 of FSUIPC? Or WideClient? I'm still not sure what I can do about it. Pete
adrem Posted June 24, 2020 Author Report Posted June 24, 2020 20 hours ago, Pete Dowson said: If that source reconstructed (in a debugger?) from our binary or located some other way? I looked up the binary code at the address that crashed in my own program that uses lua. It's the ltable.c file from the lua source. 20 hours ago, Pete Dowson said: Do you have the Windows crash information? Version 6.08 of FSUIPC? Or WideClient? FSUIPC 6.08. P3D itself didn't crash because FSUIPC catches these exceptions in the lua threads and displays the address in the console. 20 hours ago, Pete Dowson said: I'm still not sure what I can do about it. The only way to prevent this is to lock the shared lua VM with a mutex on each read and write. Maybe you shouldn't do anything at all 🙂 I personally don't even use ipc.get and ipc.set anymore and it appears that I'm the only one who ever complained.
Pete Dowson Posted June 24, 2020 Report Posted June 24, 2020 3 hours ago, adrem said: The only way to prevent this is to lock the shared lua VM with a mutex on each read and write. Or, as I more usually do, just put the push and pop calls into their own critical section, preventing thread swapping in the Lua code doing the push and pop. However, the Lua push and pop functions already have locks on them (lua_lock and lua_unlock calls either end) so I never thought it necessary. However, looking through the source I see this: #ifndef lua_lock #define lua_lock(L) ((void) 0) #define lua_unlock(L) ((void) 0) #endif which is very strange. I'd not actually believed they did nothing! I wonder why they aren't used. I'm very puzzled. They are used in a huge number of places. Maybe it's something to do with efficiency v reliability? Pete
adrem Posted June 25, 2020 Author Report Posted June 25, 2020 1 hour ago, Pete Dowson said: They are used in a huge number of places. Maybe it's something to do with efficiency v reliability? Lua is designed to run in a single thread so they are no-op in a vanilla distribution. lua_lock and lua_unlock are supposed to be implemented by the user if needed.
Pete Dowson Posted June 25, 2020 Report Posted June 25, 2020 10 hours ago, adrem said: Lua is designed to run in a single thread so they are no-op in a vanilla distribution. lua_lock and lua_unlock are supposed to be implemented by the user if needed. Ah, I see. Thank you. I wouln't have implemented them as they are because they are used in every reference to the Lua stack. Since in the FSUIPC/WideClient implementation each Lua thread is separate except for the common use of the 'global' stack, it is only for the "get" and "put2 operations than the interlocak is needed. I'll mention it to John, see if he would like to implement this safety measure in FSUIPC6. Pete
Pete Dowson Posted June 25, 2020 Report Posted June 25, 2020 On 1/8/2020 at 9:25 PM, adrem said: I have a few scripts which share variables with each other using ipc.get and ipc.set. I noticed that FSUIPC would often mix up the values of different variables for a few milliseconds and very rarely one of the scripts would crash with an access violation error. I made the following script which replicates my problem: John agreed to put it into FSUIPC6, at least, so i've done the code change to test, and used your original Lua script above to try to force a fail. It got to having the 100 test Luas running but my PC (a 7700 running at 5GHz) was by then so sluggish with so many threads running that I turfed it off. How many did you get to before it failed, typically? And what do you expect the loop at the end to do (the loop forever reading a global called "myvar" which was never written)? If this is a good test I'll send the change to John for including in the next release. Pete
adrem Posted June 25, 2020 Author Report Posted June 25, 2020 2 hours ago, Pete Dowson said: It got to having the 100 test Luas running but my PC (a 7700 running at 5GHz) was by then so sluggish with so many threads running that I turfed it off. How many did you get to before it failed, typically? And what do you expect the loop at the end to do (the loop forever reading a global called "myvar" which was never written)? Don't know how it ran on my PC back then, but the original script makes my PC sluggish now too. I've amended the script in my initial post to include sleep statements in the for loops. With it, I got the following output: 40234 -------------------- Starting everything now ---------------------- 41437 Advanced Weather Interface Enabled 158484 LUA.0: you're not supposed to see this 158812 LUA.0: you're not supposed to see this 160343 LUA.0: you're not supposed to see this 162312 LUA.0: you're not supposed to see this 164171 LUA.0: you're not supposed to see this 165265 LUA.0: you're not supposed to see this 171500 LUA.0: you're not supposed to see this 172703 LUA.0: you're not supposed to see this 177187 LUA.0: you're not supposed to see this 178390 LUA.0: you're not supposed to see this 182109 LUA.0: you're not supposed to see this 185500 LUA.0: you're not supposed to see this 192609 LUA.0: you're not supposed to see this 197312 LUA.0: you're not supposed to see this 198078 LUA.0: you're not supposed to see this 203593 LUA.1: Crash C0000005 at 7FFD48FD7BA3: "C:\Prepar3D v4\Modules\__TEST__\test1.lua" 204000 LUA.2: Crash C0000005 at 7FFD48FD7BA3: "C:\Prepar3D v4\Modules\__TEST__\test2.lua" 207000 LUA.3: Crash C0000005 at 7FFD48FD7BA3: "C:\Prepar3D v4\Modules\__TEST__\test3.lua" Then P3D crashed with an access violation in ntdll.dll You can reduce the number of threads but then you'll have to wait longer. In the setup where I encountered this first, I only had 3 threads with a dozen of these 'global' variables and eventually I had the variables being mixed up after about 30 minutes.
Pete Dowson Posted June 25, 2020 Report Posted June 25, 2020 1 hour ago, adrem said: You can reduce the number of threads but then you'll have to wait longer. In the setup where I encountered this first, I only had 3 threads with a dozen of these 'global' variables and eventually I had the variables being mixed up after about 30 minutes. Okay. With the interlock implemented and your modified Lua script I have had it running with the 100 threads for over 60 minutes now and no failures of either type. I'll provide the mods to John to incorporate in the next update. I'll release a WideClient update withig the next day or two. John may want you to verify the build with an interim update before he releases it. Pete
John Dowson Posted June 26, 2020 Report Posted June 26, 2020 (edited) Hi Adrem, could you try the following dll please (v6.0.9c): FSUIPC6.dll I will (most probably) release this publicly at the weekend, as version 6.0.9. Cheers, John Edited June 26, 2020 by John Dowson dll added
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now