Lists vs Good old Arrays.

Discussion in 'Resources' started by oyasunadev, Oct 10, 2011.

Thread Status:
Not open for further replies.
  1. Offline

    oyasunadev

    I just wanted to start off saying that Lists are the better choice, and this is why:

    Although a List is a type of Array, it is still better because:
    1) Easier to use.
    2) You do not need to initialize the length.
    3) Allocates RAM much better.
    4) Probably much more reasons I'm forgetting.

    An example of an Array:
    Code:
    String[] myarray = new String[length];
    //You must initialize, meaning you will need a integer for the length, unless you want to take up empty RAM by setting it higher than it needs to be.
    
    System.out.println(myarray[0]);
    An example of a List:
    Code:
    List<String> mylist = new ArrayList<String>(); //ArrayList helps make it growable.
    
    System.out.println(mylist.get(0));
    After comparing, you can see that there are advantages to using Lists, so stop using Array's!
     
  2. Offline

    garbagemule

    There's a lot of incorrect information in this post.

    List is not a "type of array". List is an interface, and out of the two main implementing classes, only ArrayList is a "type of array", because it is backed by an array. Whether to use a list or an array is not always clear, but I assure you that there is no reason to "ditch arrays" and just use ArrayLists whenever you need an indexed collection.

    You've completely left out the fact that ArrayLists have an initial capacity of 10 elements, which means the array backing the ArrayList is size 10 (same as String[] array = new String[10]). This means that if you're not going to add exactly 10 elements to your ArrayList, you are wasting memory, because you have allocated space for exactly 10 elements. Thus, the memory allocation argument of yours is null and void. It should also be mentioned that if you ever add more than 10 elements to an ArrayList with the default capacity of 10, the entire array gets copied over to a new array of a larger size. Consider a list with 1,000,000 elements and a capacity of 1,000,001. If you add two more elements, that's a lot of copying that needs to be done. For smaller collections, the performance impact is negligible, but it is definitely worth mentioning - especially considering the huge amount of beginners here. The same thing happens every time you add or remove an element.

    The other main implementing List class is the LinkedList, which is slightly more complex. LinkedLists are not backed by arrays, but by pointers (I forget if Java uses double or single pointers). What this means is that the list's first element (index 0) has a pointer forward to the next element (index 1) in the list, and this element has a pointer to the next element (index 2), and so on. Thus, depending on the memory needed to represent an element of a list, LinkedLists are usually minimized in regards to memory allocation, compared to ArrayLists, which always need to be "ahead".

    What's the big difference in performance between the two? Well, ArrayLists, being backed by arrays, have O(1) lookup, meaning the .get() method is fast for ArrayLists. For LinkedLists, this is O(n), because you need to follow the elements' pointers to get to the element you need. For insertions and deletions, however, LinkedList is faster than ArrayLists.

    Which one should you use, then? It depends. If the size of your collection can be predetermined, and you're only going to populate the list once and rarely insert/delete from it, an ArrayList is probably the way to go. If you're going to be adding and removing a lot of elements, you won't be doing a lot of lookups, and you don't know the possible size of the list, then a LinkedList is the better option. In general, if you're looping and adding a bunch of things, and then iterating over the entire collection afterwards, use LinkedLists. If you're adding things and randomly looking up values, use ArrayLists.

    So what about arrays vs. ArrayLists? Well, arrays are slightly faster than ArrayLists, because of the inherent method calls to get() and set(). If you don't care about ordering, and just need something fast, simple and with a size that needs to be fixed (even though dynamically setting the size is also entirely possible), just use arrays. If you need the fast lookups but a flexible capacity, and more importantly sorting, go ahead and use an ArrayList.
     
    Maulss, ks07, Kierrow and 12 others like this.
  3. Offline

    SpikeMeister

    For those unfamiliar with asymptotic notation, O(1) means that the lookup will happen in constant time. In other words, the lookup will always take the same small amount of time, no matter the size of the list. O(n) means that the lookup will happen in linear time. In other words, the lookup will grow at up to the same rate that n grows, and as your list gets longer it will take longer and longer to do lookups.
     
    garbagemule likes this.
  4. Offline

    oyasunadev

    Sorry about this post everyone. My son was on my account, he is new to Java programming, and misunderstood what I had told him. Sorry for the inconvenience.
     
  5. Offline

    Lolmewn

    You let your son on your account :O
    I'd never do that.
    ever.
    again ;)
     
    Adamki11s likes this.
  6. Offline

    oyasunadev

    Well, I don't let him have an E-Mail yet...
     
  7. Offline

    Daniel Heppner

    He probably has one behind your back.
     
  8. Not to mention that a Set is faster than a List ;)
     
  9. Offline

    Brain

    Everybody knows the ConcurrentLinkedTreeHashMapListArray is the ultimate data structure.
     
    zhuowei likes this.
  10. Offline

    garbagemule

    They are completely different data types. A Set is useless if you need the structure of a List (indexed elements) or if you need to allow null values or duplicates. If you're iterating a sorted (or structured) collection of elements, and you need that same order/structure "going out", a Set is also useless, because iteration order is undefined. Sets and Lists are very, very different data structures, and in a lot of cases, it is absolutely necessary to use one over the other (or at least using the other over the one means a loss in performance one way or the other).
     
  11. Yea. I meant that generally, you can use a Set and won't need a List, UNLESS you need the indexed elements...
     
  12. Offline

    bergerkiller

    How I distinct collection types:
    LinkedList: Queue and others
    when you don't perform index operations. Index operations are very slow on linked lists. They are best for iterating over a list or to quickly remove and add new elements. (like in the queue)

    ArrayList
    when you need to perform index operations, but also need an iterator. It is very useful when storing fixed data you hardly ever add stuff to. It can also be used if you have an estimate of the size it will have to contain, and it can be used to quickly add new data to it without having a huge RAM usage.

    HashSet
    When using contains on objects that use hashcodes. Very useful when you only want a single instance in the collection. Never use it for temporary storage.

    HashMap
    When storing fixed data using a key and value. Never use it for temporary storage.

    Weak-
    Useful when attaching data to things like chunks and players, but if it's possible to do without, do without. Only useful if the key can be cleared by the garbage collector, so not useful for things like Strings and Integers.

    Concurrent-
    If you need something like this, then obviously something is wrong in your code. Synchronize your code mister!

    TreeMap
    No idea what it does. Never used it before. Maybe to have multiple values per key?

    Array
    When you always have the same amount of data, or know what size the data is you will calculate, or when you need something to stay of a fixed size because of concurrency. It is useful when used in large numbers, for example, to store Chunk data like MC does.
     
    zhuowei likes this.
  13. Offline

    Sleaker

    @bergerkiller - TreeMaps are for when you need your map ordered by the Key value useful when you need sorted information that's based on a different lookup than the actual data that the key is linked to.
     
    desht likes this.
  14. Offline

    desht

    @bergerkiller
    TreeMaps are useful if you want key ordering - unlike HashMaps, items can be iterated/retrieved in the order in which they are added. Performance isn't quite as good as for a HashMap - O(log n) versus linear - so only use a TreeMap where ordering is important.
     
  15. Offline

    bergerkiller

    @Sleaker like if you, for example, wanted to store keys that implement comparable (for example dates) with a value. So basically a hashmap, but instead of ordering on the hashcode, it orders them based on another of the same instance?
     
  16. Offline

    Sleaker

    @bergerkiller - kind of. TreeMap's are ordered by the key's natural ordering, so using a comparator for the key on a treemap is highly suggested, hashmaps function a bit different than this as they use bucketing and such to handle where the actual data is stored in relation to the other objects, and using .get or .contains on a HashMap is O(1) vs a TreeMap which is log(n) -

    If anyone is unfamiliar with log(n) - it's not as fast as hashing, but better than O(n) - as they generally use binary search, or balanced search methods to limit the number of items checked to find the item being looked for, which is much faster than simply iterating over all the elements.

    For the list of common notations you can find them here: http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions
     
  17. Offline

    garbagemule

    Edit: I took a bit longer on this post than I expected, so I apologize for any redundant information mentioned in posts before this one xD

    I think it is fair that people are trying to chip in about these data structures and how or when they can/should be used, but please, guys, don't just blurt out something you have no clue about. I'm not trying to be arrogant or harsh here, but I think it's very important that data structure discussions and posts are well-founded and true - not just intuition and notions.

    An ArrayList, like I said in my initial post, is backed by an array, and the "RAM usage" of an ArrayList is that of the backing array (and a few extra bits and bytes, however negligible they may be in large collections), meaning it is almost always "ahead", allocating more memory than actually required by the collection. Of the data structures mentioned in your post, the ArrayList is one of the most memory intensive, so the sentence I quoted doesn't really make a lot of sense; care to elaborate?

    I don't understand the "Never use it for temporary storage"-part. I consider two possible interpretations: 1) Don't use existing Sets/Maps to put temporary garbage into: You shouldn't do that to any existing or irrelevant collection anyway; 2) Don't make a new Set/Map to store temporary data that is then later discarded: Why not? Clearing or nulling the Set/Map, or when using them as purely local method-variables, they pose no memory issues.

    The "fixed data" part in your HashMap-section is also unclear to me. There's nothing wrong with using HashMaps for dynamic data, meaning you can easily put and remove as much data as you please, and it doesn't have to be "fixed" in the sense that it can't ever change (if that's what you meant).

    WRONG! I have no idea where you got this insane idea that the concurrent collections shouldn't be used, but they are in no way unnecessary, and in no way does synchronizing your methods amount to the same thing. When you synchronize your methods, you acquire object locks, meaning any other thread that ever tries to access or mutate the same object will block. This is not the case for the concurrent collections. When you use ConcurrentHashMap, for instance, it is entirely possible to read from the Map while updating it from another thread. Note that with many threads accessing or mutating the collection often, this poses a massive performance-gain over using object locks.

    Please don't just make a wild guess. Any class that implements the Map interface must adhere to the specifications and requirements of a Map. No Map allows multiple values per key, so the TreeMap doesn't either.

    First, a HashMap is backed by a hash table, i.e. an array (typically with a prime length), which contains (something similar to) LinkedLists of elements. A "good" hash function guarantees very fast lookups, because essentially every element is the only element in its list. A "bad" hash function makes a HashMap as slow as a LinkedList.

    Now, a TreeMap is backed by a red-black tree, meaning its entire data structure is a tree. Each node in the tree has two children, which are either another node (which can also be a leaf, and thus an actual key), or null. Red-black trees are fast, and provide O(log n) lookup/search/get, insertion and deletion. TreeMaps are extremely useful in that they are sorted by key (the equals() method), have well-defined performance (over hashing, which depends on the hash function, the collection, and the hashCode() method of the elements), and waste no memory compared to the array-based data structures, which must allocate more memory than required to be efficient.

    While concurrency poses certain issues for some of the other collections, arrays are not thread-safe either, so they do suffer some of the same issues. The main difference is that concurrency errors with arrays don't throw exceptions (in general, depends on the kind error, of course), but with the other data structures, they often do.


    Edit 2: I may appear to rip a little on hashing in this post, but just to clarify: if the hashCode() methods are well-written, most of the Hash-collections in the Java library work really well, and provide O(1) lookup. Often, and especially with these Bukkit plugins, the Hash-collections are preferable, because they provide fast lookups on these relatively tiny collections, which leads to better performance and less system bogging. Tree-collections have their uses, though, and should definitely not be ignored :)
     
  18. Offline

    bergerkiller

    @garbagemule thanks for the info. With the concurrent I mean the following.

    Take for example, this:
    Code:
        private static HashMap<String, PlayerChunkBuffer> buffers = new HashMap<String, PlayerChunkBuffer>();
     
        public static PlayerChunkBuffer getBuffer(Player player) {
            synchronized (buffers) {
                PlayerChunkBuffer loader = buffers.get(player.getName());
                if (loader == null) {
                    loader = new PlayerChunkBuffer(player);
                    buffers.put(player.getName(), loader);
                }
                return loader;
            }
        }
        public static void remove(Player player) {
            synchronized (buffers) {
                buffers.remove(player.getName());
            }
        }
    I could make 'buffers' a concurrent hashmap, at least, only if it directly used. As right here, under 'getBuffer', I make sure no one can interfere. Otherwise it is possible that someone puts the object in between the get and put, this way causing some weird issues. I hardly ever have a hashmap publicly visible, since it is used internally. If you perform multiple operations on the same hashmap in a single method, then this concurrent- system doesn't work: it will prevent the error, but not the result. This way, if you end up using concurrent hashmaps, you have to think of yourself: Shouldn't I add synchronization myself? If it only consists of get, put and contains which are done once in each method, then concurrent is safe to use.

    I do have one major question, maybe you can answer it.

    =====================================

    Inside Minecraft, chunks are stored in a hashmap and in a list. It uses a 'long' hashmap to get the chunk at that specific position. This operation is done multiple times per tick.

    Would it be possible to do this without a hashmap? Say, you use neighbouring. You start at the main 'middle' chunk, and have neighbouring chunks. Using a simple pathfinding-like alghorithm, you get the chunk you need.

    Would this be better, equal or worse? It seems like a way better method than having unsafe chunk storage using hashmaps.
     
  19. Offline

    garbagemule

    Fair point! I was thinking entirely of unconditional mapping.

    When you say neighbouring, do you mean that every chunk has a north, east, west and south neighbour, i.e. basically a type of linked list implementation? In that case, it would definitely work; no doubt about it. I mean, depending on the specifications for the data structure, you could probably replace the HashMap with a bunch of different things. I don't know the inner workings of this chunk thing, and I'm not sure what you mean by "long HashMap" (small backing array?), so I can't say for sure. The neighbouring sounds like it would be very, very inefficient, and pose a bunch of locking issues, though.
     
  20. Offline

    bergerkiller

    @garbagemule Well, neighbouring all sides is a bit useless. Let's assume the following:
    Basically the same as having blocks in a chunk and chunks in a world, but instead you now sub-divide chunks as well. If I need chunk [12/54] I could use a simple bit-shift to get the index of this chunk in the main bunch, index in the bunch underneath that, underneath that, etc. This completely neglects the use of a hashmap and basically acts as a hashmap.

    This is what I mean. You basically have a fixed 4-length array in each bunch and can dynamically fill the bunches. This would even make it possible to have infinite world sizes, if done correctly. And, of course, having chunks above chunks: an Y-value!

    To get an idea:
    [​IMG]

    Also, the LongHashmap is basically a regular HashMap, but uses bit-shifts to 'try' to convert two ints to one int. This obviously causes issues at some point, once people start exploring very large areas on the map.
     
  21. Offline

    Sleaker

    @bergerkiller - as I just linked the Orders of common functions.. Trying to use a TreeSort or TreeMap to store chunks is going to be inherently slower than using a Hash. The LongHash is just an implementation that takes two ints and creates a Long key out of them. This will be inherently faster than trying to store chunks relatively based on other chunk data, or grouping them together.
     
  22. Offline

    bergerkiller

    @Sleaker then why is it called a 'LongHashmap', if the maximum 'working' amount is the size of two shorts? (-32000 to 32000 estimated)

    Seems weird to call it a LongHashmap if it doesn't use long-value storage in the end. (or is it using actual long keys? Seems hard to do based on memory efficiency)
     
  23. Offline

    Sleaker

    Bukkit team stuffs mostly. The actual maximum values stored is 5*256 in chunks - because the longhashmap implementation that bukkit uses only allows for 5 colliding hashes (which occur every 256 chunks)

    It's called a longhash because it uses a Long as the key (so it actually does use a long). Which is derived from 2 integers (the x,z of the chunk)
     
  24. Offline

    bergerkiller

    @Sleaker I do wonder what happens when two players are right at the correct spots...?
    Does it simply show the same chunk, cause some sort of weird endless loop or is it a parallel universe? (Jk)

    But still, I've seen quite a lot people that went up to x or z 4096....so it is quite a risk that has been taken...
     
  25. Offline

    Sleaker

    @bergerkiller - Hashing doesn't mean equality (as far as chunks are concerned). the LongHashMap allows for 5 colliding hashes -loaded- at the same time before it will cause issues. which means you'll need a map that's atleast one directional up to 6*4096

    I think you need to look more into what hash collisions mean.
     
  26. Offline

    bergerkiller

    @Sleaker I know that it reverts back to the equals method once it collides, guess I kinda forgot about them. :)
     
  27. Offline

    544nick101

    You should probably edit the post then ;)
     
Thread Status:
Not open for further replies.

Share This Page