Constant 1-30 minute crashes

Discussion in 'Bukkit Help' started by xeology, Mar 12, 2011.

Thread Status:
Not open for further replies.
  1. Offline

    xeology

    We are running different OSes, and this only affects newer builds of Bukkit. If Bukkit is NOW incompatible with Ubuntu, Linux, or certain Java versions, that is BUKKIT'S fault and needs to be dealt with by BUKKIT, not denied. Many more people are coming up with this issue. And if it has something to do with particular methods, then it is most certainly Bukkit, because one person is using nothing but ChatBukkit and ScrapBukkit, which are by the same authors as Bukkit itself!
    Here's a better idea: is there a way to SEE EXACTLY what Bukkit is doing when it goes into this terminal infinite loop? Such as a method to see what it is doing, what hooks are being used, or ANYTHING like an error handler/monitor?
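    For reference, the JVM itself can answer part of this question. A thread dump lists every live thread together with the stack frames it is currently executing, so a server stuck in a loop tends to show the same frames for the same thread in dump after dump. The sketch below uses the standard `Thread.getAllStackTraces()` call; the class name `ThreadDumper` is made up for illustration and is not part of Bukkit, and the code assumes a modern JDK.

```java
import java.util.Map;

// Hypothetical helper, not part of Bukkit: prints what every JVM thread is doing right now.
class ThreadDumper {

    /** Builds a text dump of every live thread's name, state and stack trace. */
    public static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            // Each frame is one method call the thread is currently inside.
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

    From outside the process, the JDK's `jstack <pid>` tool or, on Linux, `kill -3 <pid>` produces a similar dump without any code changes.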
    On another note, I have just downgraded BACK to my original version, 440.
     
  2. Offline

    Racha

    Same for me ;/ Please fix this as fast as you can :) :) :) Good luck.
     
  3. Offline

    xeology

    Give it up. I have. No one is going to claim the problem as their software's, or as their software having bad interactions. Honest programming is dead. I am sure the responsible party has seen this thread at SOME point, and I am sure they refuse to take credit for this issue.

    And I find it strangely coincidental that whenever I ask for a method of monitoring what the Bukkit servers are doing to see the true cause, everyone just pretends I didn't say anything (because this would expose sloppy programming and its TRUE origin). Maybe this is the TRUE reason why people are not including PROPER error handling in their code? Don't want to be pegged as the cause of an issue?
    I also want to point out the flaw in trying to debug a non-reproducible error (completely 100% random, with no error message, reason, particular setup, particular plugin, particular ANYTHING or timeframe) without proper tools to do so, which no one mentions as existing.
     
  4. Offline

    Racha

    @xeology
    Okay, I don't understand half of your text because I am bad at English ;/ But what you said was that they won't fix it? :O
     
  5. Offline

    xeology

    Long story short, we don't know whose fault it is, and no one will be honest and say that it is theirs.
     
  6. Offline

    contex

    I'm willing to help you test a bit. I'm running a 2 GB RAM VPS, CentOS, Intel Core i7 860, uplink at 100 Mbps.
    I'm currently running b531 WITHOUT any of these warnings, so I could test each plugin that you have and I don't, and see if I get any of these warnings.

    Plugins:

    Code:
    Achievements.jar
    Authentication.jar
    AutoSave.jar
    BigBrother.jar
    ChatStamp.jar
    Citizens.jar
    CraftIRC.jar
    dynmap.jar
    General.jar
    HealthyNames.jar
    iConomy.jar
    iSee.jar
    LocalShops.jar
    Lockette.jar
    mcMMO.jar
    MinecraftCheck.jar
    MonsterHunt.jar
    MyHome.jar
    Permissions.jar
    Portcullis.jar
    SearchIDs.jar
    Stats.jar
    TelePlus.jar
    uQuest.jar
    Whitelist.jar
    WorldEdit.jar
    WorldGuard.jar
    
     
  7. Offline

    Racha

    @contex
    And you have Windows...
     
  8. Offline

    contex

    Server is running on Linux CentOS while my OS on this computer is Windows.... Why?
     
  9. Offline

    xeology

    Good luck, and thanks for the support. I have just been trying 440, with no issues as of yet.
    5 hours and no crashes on 440.
     
  10. Offline

    Spikey

    This thread has gone along quite a bit but I am running on #531 with no BackupPlugin and no errors have been coming up, except the usual "Server cannot keep up"
     
  11. Offline

    Racha

    It shows that the server can't keep up and then it lags ;/ No crashes, nothing, only that error, and the console is full of it.
     
  12. Offline

    xeology

    My crashes have been fixed as of reverting back to 440. The issue is most DEFINITELY in Bukkit builds 493-531. It could be an incompatibility with certain setups, a usage crash, an incompatibility with a common method of doing something, who knows! All I know is that it does not affect 440, and I will remain on 440 until Bukkit is either fixed or I find the enthusiasm to switch back to hMod (considering the roughly 100+ builds unavailable to some of us due to instability).
     
  13. Offline

    nickguletskii

    This could be an issue with all of the plugins that do raw backups (copy and paste the world). What I think the problem is, is that the plugin locks the file while copying and Bukkit has to wait until either:
    1. The world is unlocked again.
    2. The plugin finishes its slow task (lrn2multithread, devs!)
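    To illustrate the "lrn2multithread" point: a slow job such as a file copy can be handed to a worker thread so that the thread that requested it returns immediately. This is only a sketch using standard `java.util.concurrent` and `java.nio.file` calls on a modern JDK. The names `BackgroundBackup`, `submitSlowTask` and `copyAsync` are hypothetical, no Bukkit API is used, and it does not deal with the world being written to while it is copied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: runs slow work on a dedicated worker thread instead of the caller's thread.
class BackgroundBackup {

    // One daemon worker thread, so slow jobs queue up without blocking the caller.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "backup-worker");
        thread.setDaemon(true);
        return thread;
    });

    /** Queues a slow task on the worker thread and returns immediately. */
    public static <T> CompletableFuture<T> submitSlowTask(Supplier<T> slowTask) {
        return CompletableFuture.supplyAsync(slowTask, WORKER);
    }

    /** Copies a file on the worker thread; an IOException surfaces through the returned future. */
    public static CompletableFuture<Path> copyAsync(Path source, Path target) {
        return submitSlowTask(() -> {
            try {
                return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

    The caller can attach a callback with `thenAccept(...)` on the returned future instead of waiting for the copy to finish.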
     
  14. Offline

    xeology

    It never unlocks and console freezes.
    More like lrn2errorhandle, devs . . . then these problems could be solved. There should be an error handler for every single loop/control structure and function. This way, when something fails, the handler catches the issue and reports it as such. But I am assuming that when people ported over to a language that does everything but wipe your ass for you, real skill just faded away into a sea of lazy dependence on integrated generic methods that are NOT proven in every circumstance. Yeah, garbage collection is great when you programmed it and told it how to function and what to target; that = efficiency. Using a generic method = lazy and unstable at large scale. Multithreading is great when you determine how it is supposed to function, what is to be threaded and how; this = efficiency. Allowing an integrated manager to make these decisions for you = lazy.
     
  15. Offline

    nickguletskii

    No no no, these are not things caused by errors. This is caused by single-threading. Bukkit waits until the task finishes instead of doing its job in parallel.
     
  16. Offline

    xeology

    Oh, I am sure that adds to the issue with lag a lot, but in terms of infinite loops and such, well, I am sure that adds to it as well :p Error handling would really tell, though.
     
  17. Offline

    nickguletskii

    I would say that no plugin is supposed to do something long in one thread... But hey, even WorldEdit is singlethreaded (can anyone confirm that?)...
     
  18. Offline

    Dodecha

    Hey xeology: Is CB440 still working for you? I tried it as well. It seems fine until I warp; then the server and client start to lag horribly, but it doesn't always crash.
    It seems that the 440 build is slightly better at handling whatever problem is occurring. Close, but no cigar for me.
     
  19. Offline

    dak393

    If he's running 522, I would suggest doing the opposite and upgrading to 531. I had error-less crashes before, and when I updated it fixed the problem.
     
  20. Offline

    Lame One

    While I'm not an expert, and, at the moment, don't have access to the list of plugins that we are using, I hope that I am able to help.

    I have the same issue as stated previously (working fine, no problem, then, slowly, a build up of overload messages, before it just dies.) Now, we are running on an 8 (or possibly 6) core serverbox, with 2 gigs of ram dedicated to the server. We have a number of other servers running (3 CoD 4 servers, for instance) and none of them are having any problems whatsoever, meaning that internet is most likely not a problem. This is supported by the fact that once the server goes down, it stays down until it is reset.

    Now, the above has pretty much been determined already, but I hope to bring a bit more information to light. While we had been adding plugins fairly constantly, I distinctly remember that nothing was done for a few days between it working and it not. In other words, there was no single thing that was done to the server (that I know of) that could have possibly caused an error. All plugins, settings, etc. had already been in place, and a few days later, the server suddenly begins to crash. At times, it will go for days, but a few times, it shut down within 15 minutes. Also, the server overload messages do not always signal a crash, and I notice that reloading plugins (typing reload into the console) brings up one or zero overloads. This, however, is perfectly normal (too many plugins to deal with).

    Now, I have a theory, but I'm not sure about its validity. But we all seem to be looking at the second part of the message. So I looked again...

    [WARNING] Can't keep up! Did the system time change, or is the server overloaded?

    In other words, I was looking at the time change. By now you've probably figured out what I'm aiming at. Daylight savings time could be having some influence on bukkit. While I am completely guessing, I figured that it could be right. It seems unreasonable, but certainly not impossible. At worst, I look like an idiot but supply some information in the second and third paragraphs. So... What do you all think?
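    To make the two halves of that warning concrete: a loop that measures elapsed wall-clock time between passes can tell "fell behind" apart from "the clock moved". The following is only a sketch of that idea, not the actual server code; the class `TickMonitor`, the `Status` names and the threshold parameter are all assumptions made for illustration.

```java
// Hypothetical sketch: classifies the time gap between two passes of a fixed-rate loop.
class TickMonitor {

    public enum Status { OK, BEHIND, CLOCK_WENT_BACKWARDS }

    /**
     * Compares two wall-clock readings in milliseconds.
     * A negative gap means the clock was set back; a gap above the threshold
     * means the loop is overloaded or the clock jumped forward.
     */
    public static Status classify(long lastTimeMs, long nowMs, long thresholdMs) {
        long elapsed = nowMs - lastTimeMs;
        if (elapsed < 0) {
            return Status.CLOCK_WENT_BACKWARDS;
        }
        if (elapsed > thresholdMs) {
            return Status.BEHIND;
        }
        return Status.OK;
    }

    public static void main(String[] args) {
        long last = System.currentTimeMillis();
        long now = System.currentTimeMillis();
        System.out.println(classify(last, now, 2000));
    }
}
```

    One relevant detail: `System.currentTimeMillis()` counts milliseconds since the Unix epoch in UTC, so if a server measures time this way, a daylight saving switch by itself does not shift the value. Only an actual adjustment of the system clock does.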

    EDIT: Also, lul at xeology:

     
  21. Offline

    TnT

    @Lame One
    Well, the server time going out of sync would cause that message, so it's not a stupid statement. This problem was occurring before DST hit this year, so I doubt it's DST related, but it could be that the time on his server is getting so out of sync that it's causing his problems. I only doubt it because I assume the system time has been checked already, but it's worth a double check by the OP.
     
  22. Offline

    Lame One

    I checked when the post was appearing. 3-12 would have been the day before daylight savings time. Although I don't know how time is handled, if his server is based in another country that participates in the tradition of moving clocks forward an hour, it could very well have been past 2 a.m. where his server was based.

    UPDATE: Just went in and changed the time back an hour (hoped that it would fix it) and no dice. Any more potential solutions?

    EDIT by Moderator: merged posts, please use the edit button instead of double posting.
     
    Last edited by a moderator: May 11, 2016
  23. Offline

    xeology

    Well to the one guy, yes 440 has resolved my issues but 440 is a lag whore.

    This has happened ever since I went to 522, and it happens with 531 as well. The issue is isolated to newer builds and is IN Bukkit itself.

    I BELIEVE it may be caused by a broken hook or a broken method that plugins use. However, due to the randomness, the inability to replicate the issue, and the fact that MULTIPLE plugins may be causing this, it is impossible to find the issue. Believe me, I have tried.

    An observation is that the system is looping, it will not take console commands yet some plugins work. For instance Dynmap is working but it will keep showing the time looping. So it is less of a crash as much as it is a constant loop in bukkit itself, because if it were a plugin I am sure the plugin would end up failing and shoot an error, which it does not. Make sense?

    And the issue is not related to server time lol, I checked.

    And as a side note to anyone using my PHP plugin MUkkit please patch to v0.1b. BIG security issue I resolved ASAP.
     
  24. Offline

    Sevenos

    1) Well, one more on this list. I have now had this "server freeze" for the 5th time, including rollbacks due to hard-stopping the server (Ctrl+C). It just stops responding to console commands and gives players a timeout. However, it still seems to run somehow, as my income timers etc. are still working and printing to the server console. This happens to me about every 2 days, but totally at random in terms of server run time (though never before 2 hours of run time, I think). Unlike you, I don't get "can't keep up" messages before this happens, but my worlds are running on a RAM disk. So it may be doing a lot with the world, which gets you those messages, while my RAM disk is fast enough not to care until it really freezes.

    2) Another problem I have: when someone builds on a chunk, teleports away and I do a save-all, that chunk sometimes gets reset.

    3) And one more: the world saving failed 3 times for me. There were NO errors. I did save-all and it responded as it should and said it completed (however, it was instant instead of taking 2-3 seconds), but after a restart I saw that NOTHING had been saved the whole time the server was running.

    As far as I remember, the first two happened with 493-531, and the third only before that (440). There seem to be serious bugs with the chunk handling.

    Things to think about:
    - Which Java are you running? I use the latest Java 7, but with minimal arguments (-server -Xms4096M -Xmx4096M -jar craftbukkit.jar nogui). I tried around a lot with different settings, but nothing worked better.
    - Do you use multiworld and, if yes, any plugin for that? I use MultiVerse with 3 worlds (roughly 350, 90 and 30 MB).

    I've created a ticket on the bug tracker, rate it if you have this problem and support with additional information if you have any: http://leaky.bukkit.org/issues/575

    EDIT by Moderator: merged posts, please use the edit button instead of double posting.
     
    Last edited by a moderator: May 11, 2016
  25. Offline

    EvilSeph

    Seriously? What are you, 12? Hm, no, it says 22 on your profile but you clearly aren't acting like it. Do you have any experience dealing with bugs? If it were so easy to catch each and every single error, bugs wouldn't exist for very long within software. But, lo and behold, look around you and you'll see that's not the case.

    Your attitude is pure poison and I, for one, don't even want to be writing this post to address you. In fact, I don't want ANYTHING to do with you thanks to the lack of respect you've shown anyone.

    Has it ever occurred to you that we can't fix something we can't reproduce? Nope? It really seems like you have absolutely no experience with development and its process. For the record - and we've said this many times - we cannot simply keep up with all the discussions on the forums. If there's a bug (and you can tell us how to reproduce it - even better!), post it on leaky.

    Honest programming is dead. Honestly.

    Let's forget the fact I've been spending the past 3 weeks looking into issues, hopping into servers and trying to find ways to reliably reproduce them.

    Let's forget the fact that other software is just as buggy and subject to the same treatment Bukkit is right now.

    Honestly, if you're so much more skilled than us, maybe you should be writing your own mod? The truth of the matter is, error handling and error catching isn't so black and white as you put it.
     
  26. Offline

    xeology

    For starters, you took this as if it was targeted towards the Bukkit team. In fact it was not. It was targeted towards every developer who has a plugin using a method that is causing these crashes yet has their plugin distinctly labeled 531, which IS dishonest and scummy.

    Next, as for error handling: I have seen plenty of plugins that issue some sort of error or another with the plugin's name inside the error. These crashes, if caused by a plugin, leave the plugin entirely unmentioned, and there seem to be 0 checks/handling at all. It just goes and goes and goes until it crashes with no significant sign. I wasn't saying to error-handle every little piece, but to completely ignore ALL handling altogether in the place that is causing this issue is just crap.

    The simple fact is that not one developer popped in and said, hey yeah my plugin users have been experiencing crashes. Not one person has said, hey there's no way to monitor current activity or point the way to the proper method of doing so and not one person has even said and/or verified that this issue even EXISTS besides people with the issue of course.

    Plain and simple if someone knew there was an issue, they should have at least SAID there was an issue so we all could have stopped using it and moved on. Not pretend like it doesn't exist. Which again falls on the shoulders of the people using the method/hook or w/e is crashing the software, because this is just going backwards instead of forward.

    Honestly if you want to take this as towards you and your team then do so, I am not about to sit here and argue something I know wasn't. As for the dishonesty opinion, that is my opinion and I hold it tightly, someone could have at least said WHAT was causing it, not WHY or HOW it was causing it.

    But since I am such an ass apparently for expecting standard moral responsibility then I figure you don't at least appreciate the fact that this group effort alone has determined and compiled quite a bit of information narrowing down the issue quite a bit, and have resolved it quite a bit for many of us so far. Which is fine.
     
  27. Offline

    TnT

    Probably because not very many devs browse these forums consistently.
     
  28. Offline

    xeology

    Even so they should know that their plugin is causing an issue or making it worse during testing it and it should have been made known.

    For instance someone came up to me with a security issue, I dropped everything and fixed it asap as per my responsibility as the dev of that particular piece, I would not have said anything else was the cause or ignored the person . . . that's just wrong.

    But anyway, getting off this drama crap and back to the REAL issue here (as it has been for a while now, without the need to have gone back to the drama crap): it shouldn't be hard to narrow down if someone has the resources to run a few copies of Bukkit with similar plugins on 531, testing 2 at a time for a few days and seeing which ones crash and which ones do not.

    Then take the causes found, dissect the hooks and methods used and see which ones were changed in bukkit from 440 to 493. Should not be all THAT tedious at that point.

    After that, since it does seem to be an infinite loop (considering that it does not take commands, as if it is still trying to execute a certain method yet has not actually crashed), just add a counter in every loop in Bukkit that falls under the category affected by what was found above, and have it echo out to the console which loop it is and what the counter is at.

    When it crashes (the goal would be to make it crash using one of the affected plugins found), you should see a constant echo of whatever loop is not breaking where it should.
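    The loop counter described above could look roughly like this. It is a minimal sketch under stated assumptions: `LoopGuard` is a hypothetical helper, not something that exists in Bukkit, and the reporting interval is arbitrary.

```java
// Hypothetical helper: counts iterations of one named loop and reports to the console periodically.
class LoopGuard {
    private final String loopName;
    private final long reportEvery;
    private long iterations;

    public LoopGuard(String loopName, long reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        this.loopName = loopName;
        this.reportEvery = reportEvery;
    }

    /** Call once per loop iteration; returns true when a report line was printed. */
    public boolean tick() {
        iterations++;
        if (iterations % reportEvery == 0) {
            System.err.println("[LoopGuard] " + loopName + " has run " + iterations + " iterations");
            return true;
        }
        return false;
    }

    public long iterations() {
        return iterations;
    }

    public static void main(String[] args) {
        LoopGuard guard = new LoopGuard("demo-loop", 1000);
        for (int i = 0; i < 2500; i++) {
            guard.tick(); // a loop that never exits would keep printing its name
        }
    }
}
```

    A loop that never breaks would keep printing its own name with an ever-growing count, which is the "constant echo" described above.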

    Here is a breakdown of why I see it as an infinite loop:
    • The server seems to still run, with most plugins still functioning and ticking correctly.
    • People are able to connect; they just get timed out before downloading terrain.
    • I am ASSUMING that logins have to be queued in some sort of action queue, considering that Bukkit (or Minecraft itself), from what I hear, does not multitask many things, if anything.
    • That being said, whatever method is being run over and over again is stopping the execution of the login actions, leading to people being disconnected with timeout errors.
    • This would explain the inability to use console commands as well.
    • This would also explain the lack of an error at this point, because that particular loop would not have had a failsafe in it to detect an abnormally high counter, meaning either A) the dev didn't see a need because it should not have caused an issue, or B) a plugin is interacting with a part it should not be affecting.
    This is my idea and method for debugging this once and for all, but I have neither the Java skill nor the free resources to do this, even though it should not be all that difficult, just time-consuming and tedious.
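    If the theory above is right and one thread is stuck while the rest of the process keeps running, a watchdog is one way to catch it in the act. The sketch below assumes the watched loop can call `beat()` once per pass; a separate daemon thread then prints the watched thread's stack when the heartbeat goes stale. `StallWatchdog` and all of its method names are hypothetical, and nothing here is taken from Bukkit.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: detects when a watched thread stops sending heartbeats.
class StallWatchdog {
    private final Thread watched;
    private final long timeoutNanos;
    private volatile long lastBeat = System.nanoTime();

    public StallWatchdog(Thread watched, long timeoutMillis) {
        this.watched = watched;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    /** Called by the watched loop once per pass to signal that it is still making progress. */
    public void beat() {
        lastBeat = System.nanoTime();
    }

    /** True when no heartbeat has arrived within the timeout. */
    public boolean isStalled() {
        return System.nanoTime() - lastBeat > timeoutNanos;
    }

    /** Describes where the watched thread is currently executing. */
    public String describeStall() {
        StringBuilder out = new StringBuilder();
        out.append("Thread \"").append(watched.getName()).append("\" has not reported in; current stack:\n");
        for (StackTraceElement frame : watched.getStackTrace()) {
            out.append("    at ").append(frame).append('\n');
        }
        return out.toString();
    }

    /** Starts a daemon thread that checks the heartbeat at a fixed interval. */
    public Thread start(long checkEveryMillis) {
        Thread checker = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(checkEveryMillis);
                    if (isStalled()) {
                        System.err.print(describeStall());
                    }
                }
            } catch (InterruptedException e) {
                // Interrupting the checker thread is the way to stop it.
            }
        }, "stall-watchdog");
        checker.setDaemon(true);
        checker.start();
        return checker;
    }
}
```

    Because the report contains the stuck thread's stack frames, it would point at the method that is being run over and over without editing every loop by hand.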

    What do you think?
     
  29. Offline

    EvilSeph

    It's not that easy and, just like you, people don't have the time to do this.

    Did it ever occur to you that just maybe the developers aren't actually aware of the issue? Bug reports by the users aren't always the most helpful things.
     
  30. Offline

    NotYetRated

    I agree with you, though I feel that we can effectively knock out your initial step of having someone run multiple instances of a server until we pinpoint the plugin in question. I think if we have enough people here posting which plugins their affected server uses, we should at least be able to narrow it down significantly.
     