[Tutorial] Preventing Tab-induced YAML resets

Discussion in 'Resources' started by AoH_Ruthless, Aug 2, 2014.

Thread Status:
Not open for further replies.
  1. Offline

    AoH_Ruthless

    As many of you may know, users of your plugins sometimes put tabs in config.yml or other YAML files, despite any warnings you give. It happens. And then the config.yml resets, which nobody likes. This tutorial shows a way to check for tabs while reading a YAML file, before it gets loaded. While this can be applied to any YAML file, I will be working directly with the Bukkit API's config.yml. Note that all code lives in my main class (the one that extends JavaPlugin), so adjust accordingly.

    There are two different viable approaches to this in my opinion: BufferedReaders and Scanners.

    BufferedReaders, well, read the file without any special parsing. They are:
    • Thread-safe (by using synchronization)
    • More efficient than a regular Reader (for reasons I will not go into here)
    • Better for reading/logging files without parsing anything.
    • Adequate for this tutorial.
    Scanners offer different perks:
    • They can read everything a BufferedReader can (efficiently enough for a config file), and additionally can parse the underlying stream for primitives and Strings without any hassle.
    • They are not thread-safe
    • They are the better choice if you actually need to parse the contents of, say, a YML or XML file, or if you are reading user input, which we are not doing in this tutorial.
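    To make the difference between the two concrete, here is a small sketch. It is not from the Gist; the class name ReaderVsScanner and the sample text are made up for illustration. Both methods read the same text: the BufferedReader hands back raw lines, while the Scanner pulls typed tokens (here, ints) straight out of the stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the name and sample data are illustrative only.
class ReaderVsScanner {

    // BufferedReader: returns each line as an unparsed String.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Scanner: walks whitespace-separated tokens and adds up the ints it finds.
    static int sumInts(String text) {
        int sum = 0;
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {
                if (scan.hasNextInt()) {
                    sum += scan.nextInt();
                } else {
                    scan.next(); // discard a token that is not an int
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String sample = "max-players 20\nspawn-radius 16";
        System.out.println(readLines(sample));
        System.out.println(sumInts(sample));
    }
}
```

    For tab detection we only need the raw lines, which is why either class works for this tutorial.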
    The following methods are stand-alone. In v2.0 of the Gist, I show how to override the default config.yml implementation by overriding getConfig(), reloadConfig(), and saveConfig(). You can check out the updated version here and adapt it to whatever file you want. If you change the file, make sure you remove the @Override annotations.
    Using the BufferedReader to detect tabs:
    Code:java
    // Place these imports at the top of your main class file.
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    import org.bukkit.configuration.InvalidConfigurationException;

    private void setupWithBufferedReader() {
        // Define the reader as null because we will need to close it in the
        // 'finally' portion of the try/catch block. Same with Scanner.
        BufferedReader reader = null;
        try {
            // Locate the config file. Change this accordingly.
            File file = new File(getDataFolder(), "config.yml");
            // Create the reader.
            reader = new BufferedReader(new FileReader(file));

            // An internal way to check what line we're on.
            int row = 1;
            // The line we're on.
            String line;

            // While there is a line to read ...
            while ((line = reader.readLine()) != null) {
                // \t refers to tabs. When using indexOf, if the specified
                // string was not found, the index is -1. So, if the index is
                // 0 or higher, a tab was found.
                if (line.indexOf("\t") > -1) {
                    // Abort the load and report the row of the tab.
                    throw new IllegalArgumentException("Tab found on row "
                            + row + "!");
                }
                // Increment the row.
                row++;
            }
            // No tab found, so it is safe to load. 'config' is assumed to be
            // a FileConfiguration field of this class.
            config.load(file);
        } catch (FileNotFoundException e) {
            // Handle error that should never happen.
        } catch (IOException e) {
            // Handle failed buffered reader.
        } catch (InvalidConfigurationException e) {
            // Handle SnakeYAML error.
        } finally {
            // After catching, close the reader to avoid resource leaks.
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    // Handle unsuccessful attempt to close reader.
                }
            }
        }
    }



    Using the Scanner to detect tabs:
    Code:java
    // Place these imports at the top of your main class file.
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.util.Scanner;
    import org.bukkit.configuration.InvalidConfigurationException;

    private void setupWithScanner() {
        Scanner scan = null;
        try {
            // Locate the config file. Change this accordingly.
            File file = new File(getDataFolder(), "config.yml");
            // Define scanner
            scan = new Scanner(file);

            int row = 0;
            String line;

            // While there is stuff to read
            while (scan.hasNextLine()) {
                line = scan.nextLine();
                row++;

                // If a tab is found ... \t = tab
                if (line.indexOf("\t") != -1) {
                    StringBuilder sb = new StringBuilder();
                    sb.append("Tab found in config-file on line # ")
                            .append(row).append("!");
                    // Throw exception -> unsuccessful reload -> file cannot
                    // reset.
                    throw new IllegalArgumentException(sb.toString());
                }
            }
            // No tab found, so it is safe to load. 'config' is assumed to be
            // a FileConfiguration field of this class.
            config.load(file);
        } catch (FileNotFoundException ex) {
            // Handle error that shouldn't even happen.
        } catch (IOException e) {
            // Handle failed loading error.
        } catch (InvalidConfigurationException e) {
            // Handle SnakeYAML error.
        } finally {
            // Close the scanner to avoid resource leaks.
            if (scan != null) {
                scan.close();
            }
        }
    }
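    Both methods above share the same core: walk the file line by line and stop at the first tab. Below is a minimal sketch of that core as a reusable helper, assuming Java 7 or newer for try-with-resources (which closes the reader without a 'finally' block). The class TabCheck and its method names are hypothetical; they are not part of the Gist or the Bukkit API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of Bukkit or the Gist.
class TabCheck {

    // Returns the 1-based number of the first line containing a tab,
    // or -1 if no line contains one.
    static int firstTabLine(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int row = 0;
            while ((line = reader.readLine()) != null) {
                row++;
                if (line.indexOf('\t') != -1) {
                    return row;
                }
            }
        }
        return -1;
    }

    // Convenience overload for text that is already in memory.
    static int firstTabLine(String text) {
        try {
            return firstTabLine(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTabLine("a: 1\nb:\n\tc: 2\n"));
        System.out.println(firstTabLine("a: 1\nb:\n  c: 2\n"));
    }
}
```

    In a plugin you could pass in a FileReader for your config file and only load the configuration when the result is -1.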


    Conclusion:
    BufferedReaders are perfectly adequate for reading files, and Scanners perform the same task. If multiple threads are involved, prefer a BufferedReader, because Scanners are not thread-safe while BufferedReaders are. If only one thread is involved, I recommend the Scanner. Just personal opinion.

    Since this thread will probably be edited several times (and the forum mangles code formatting), you may prefer to read the class on the GitHub gist.

    v2.0 - https://gist.github.com/AoHRuthless/76baee8f558834f75607
    v1.0 - https://gist.github.com/AoHRuthless/76baee8f558834f75607/96eddc2de1e588d68a728197e5b30feab8d83af6

    If you find any issues with the Scanner or BufferedReader method, please let me know.


    -- Edit: Note that you will have to handle reloading of the YAML file yourself (JavaPlugin#reloadConfig() for the default config.yml).

    EDIT by Moderator: merged posts, please use the edit button instead of double posting.
     
    Last edited by a moderator: Jun 9, 2016
  2. Offline

    PandazNWafflez

    You could go further with this and replace all the tabs with spaces :p
     
  3. Offline

    TigerHix

    Yep the Scanner is much more laconic. Very unexpected resource, thx for sharing this.
     
  4. Offline

    AoH_Ruthless

    TigerHix
    Thank you! :)

    PandazNWafflez
    You could; I will think about implementing it. Personally I throw an exception when a tab is found to help let the user know about the error of their ways, so it doesn't happen again with plugins that don't implement a feature of this nature. But I definitely see your perspective.

    EDIT by Moderator: merged posts, please use the edit button instead of double posting.
     
    Last edited by a moderator: Jun 9, 2016
  5. Offline

    xTrollxDudex

    AoH_Ruthless
    I believe that buffering unnecessarily has a large performance impact, and using an unbuffered LineReader might be faster in a file with a large amount of content.

    I ran a small test a while ago; however, the test was not flawless, so I may be wrong about this.
     
  6. Offline

    PandazNWafflez

    AoH_Ruthless Yeah it would probably be a lot of effort for very little gain to be honest.
     
  7. Offline

    AoH_Ruthless

    xTrollxDudex
    To address this question, we first have to discuss what buffering really is (not specifically for you, as you are probably well aware of what it is, but for anyone with a similar question).

    Most streams are unbuffered I/O, like byte streams and character streams. With unbuffered I/O, each read/write request is handled directly by the underlying OS. This can become extremely inefficient, because each request often triggers disk access, network activity, or some other relatively expensive operation, which is why buffering was implemented. A BufferedReader reads a large chunk of data from the underlying stream into memory in one go and then serves reads from that buffer, instead of making many small requests, which greatly improves performance. I should have mentioned this in the main post: wrap your BufferedReader around a reader like a FileReader or InputStreamReader rather than using those directly, otherwise performance becomes terrible (the disk or network may be accessed for every single read).
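    A rough way to see the effect described above is to count how many read requests actually reach the underlying reader. The sketch below is only an illustration: BufferingDemo and CountingReader are hypothetical names, and a StringReader stands in for a real file, so it shows request counts rather than real disk timings.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo; CountingReader stands in for a slow source such as a file.
class BufferingDemo {

    // Counts how many read requests reach the wrapped reader.
    static class CountingReader extends FilterReader {
        int requests = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            requests++;
            return super.read();
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            requests++;
            return super.read(buf, off, len);
        }
    }

    // Reads the text one character at a time and returns the number of
    // requests that reached the underlying reader.
    static int countRequests(String text, boolean buffered) {
        CountingReader source = new CountingReader(new StringReader(text));
        try (Reader reader = buffered ? new BufferedReader(source) : source) {
            while (reader.read() != -1) {
                // discard the character; only the request count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.requests;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append('x');
        }
        String text = sb.toString();
        System.out.println("unbuffered requests: " + countRequests(text, false));
        System.out.println("buffered requests:   " + countRequests(text, true));
    }
}
```

    Without the BufferedReader, every single character costs one request to the source; with it, the source only sees a handful of large chunk requests.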

    I have never heard of a LineReader being more efficient than a BufferedReader, although they do essentially the same job. (LineReaders do have one advantage: they offer the equivalent of BufferedReader#readLine() for all Readable objects, not just Reader instances. But that doesn't really help when reading the contents of a YML file.) I tried googling for an answer to this question but couldn't find anything. I will keep looking and get back to you on this.

    tl;dr: BufferedReaders are very efficient (as far as synchronization goes) if used properly. Although LineReaders are just as efficient and perform the same basic tasks, I have never witnessed LineReaders being superior to BufferedReaders in large files. I will look into this further.
     
  8. Offline

    raGan.

    That might still not work as there's no fixed tab to space conversion.
    If tabs were mixed with spaces, (user only edited the file) it could be hard or even impossible to determine right indentation.
     
  9. Offline

    PandazNWafflez

    raGan. While it would be difficult, it is feasible if you put in the effort to check the indentation of each set of sibling elements.
     
  10. Offline

    raGan.

    PandazNWafflez
    There could be some corner cases where you couldn't possibly know whether the element indented with a tab should be nested into the previous one or not.
    Code:
    **element1:
    ****element2: value
    =element3: value
    **element4: value
    * - space, = - tab
    In a situation like this, you can't know where to put element3 unless you know exactly how many spaces a tab represents. Unfortunately, that representation depends on the editor.
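    The ambiguity in the example above can be shown with a few lines of code. This is only an illustrative sketch with a hypothetical class name (TabWidthDemo), and it treats a tab as a fixed number of spaces, which is a simplification since real editors advance to the next tab stop. For a tab at the very start of a line the two are the same.

```java
// Hypothetical demo of the element1..element4 example; not a fix.
class TabWidthDemo {

    // Width of the leading whitespace when each tab counts as tabWidth spaces.
    static int indentWidth(String line, int tabWidth) {
        int width = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ') {
                width++;
            } else if (c == '\t') {
                width += tabWidth;
            } else {
                break; // first non-whitespace character ends the indent
            }
        }
        return width;
    }

    public static void main(String[] args) {
        String element1 = "  element1:";
        String element2 = "    element2: value";
        String element3 = "\telement3: value";

        // With a tab width of 2, element3 lines up with element1 (a sibling).
        System.out.println(indentWidth(element3, 2) == indentWidth(element1, 2));
        // With a tab width of 4, element3 lines up with element2 (nested).
        System.out.println(indentWidth(element3, 4) == indentWidth(element2, 4));
    }
}
```

    Both comparisons hold, so the same file means two different structures depending on a setting the plugin cannot see.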
     
  11. Offline

    97WaterPolo

    raGan.
    True, but don't most editors default to 4?
     
  12. Offline

    PandazNWafflez

    raGan. In that situation you could just indent it the same as element1 then indent element4 to the same as element2.
     
  13. Offline

    raGan.

    I don't know about that, but even if it was true, it would still not be safe to rely on that.
     
  14. Offline

    AoH_Ruthless

    97WaterPolo PandazNWafflez
    No, I don't believe most editors have a default. YAML accepts any number of spaces (at least one) for each level of nesting, and the amount doesn't even have to be consistent across the whole file, as long as sibling elements line up. I believe raGan. is right in saying that the effort involved is too great for the reward. As I have said, personally I prefer telling plugin users that they screwed up, so they know never to use tabs in any other plugin's config either.
     
  15. Offline

    PandazNWafflez

    AoH_Ruthless
    That's actually a great point - changing it in the plugin would make them think it's OK to use tabs and then they'd get errors in other plugins which don't do the same thing.
     
  16. Offline

    AoH_Ruthless

    v2.0 is out. In version 2.0, I show how to override the default config.yml implementation by overriding getConfig(), saveConfig() and reloadConfig(). The Scanner and BufferedReader checks are used when the config is reloaded.

    If you are using your own YAML file, you can just remove the @Override annotations and adjust some of the basic code.

    v2.0 Gist.
    v1.0 Gist.

    I will update the main post accordingly to reflect this.
     
  17. Offline

    Europia79

    That's something I've always been wondering... Because I see some plugins indent 2 spaces, while some plugins indent 4 spaces... So, the two indentations represent the exact same thing?
     
  18. Offline

    Necrodoom
