Solved How to search a byte[] for a String

Discussion in 'Plugin Development' started by Saposhiente, Aug 21, 2013.

Thread Status:
Not open for further replies.
  1. Offline

    Saposhiente

    So I have an external library function that generates some bytecode,
    byte[] generateBytecode(String name, some other, constant arguments)
    , and I'm trying to get the bytes on either side of the String so that I can replace the relatively expensive function with simply
    startBytes + string.getBytes(Charset.forName("UTF8")) + endBytes (actually using System.arraycopy, because you can't concatenate arrays with +, but you get the point).
    I tried
    Code:java
    String searchString = "YgltwEOOEDPgNQrfssiDjSVJuTKgGZHjrZuDPbkurpigGquMrToHaWxebTukXoAU"; //random string that won't appear naturally
    byte[] bytes = generateBytecode(searchString, ...);
    final byte[] searchBytes = searchString.getBytes(Charset.forName("UTF8")); //I do know that that is the correct charset.
    int foundPos = Collections.indexOfSubList(Arrays.asList(bytes), Arrays.asList(searchBytes));

    but foundPos comes out as -1 (not found). However, the string clearly does reside within the bytes:
    Code:
    return new String(bytes); -> ����3�@YgltwEOOEDPgNQrfssiDjSVJuTKgGZHjrZuDPbkurpigGquMrToHaWxebTukXoAU���� <more binary stuff and strings>
    I'm reluctant to just convert to String and search that because the position might be slightly off and I might miss a byte or something of the sort. How can I find the start and end positions of this String within the byte array?
     
  2. Step 1: Check if what you are doing is actually necessary. Can you look inside the library function and see what it does and just replicate that?

    Assuming you need to do what you are doing, your problem lies within "Arrays.asList". It takes a generic varargs parameter and uses the element type to determine the type of list it returns (so you could do Arrays.asList("a", "b", "c")).
    Usually, when you pass in an xxx[] array, the compiler uses that array directly as the varargs array. However, generics don't work with primitive types, so that doesn't happen for primitive arrays like byte[]. Instead, each call gives you a List<byte[]> with a single entry (the array itself), and since those two arrays aren't equal to each other, the sublist search finds nothing.
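
    To make the point above concrete, here is a small sketch. The class and method names (AsListDemo, sizeFromPrimitive, sizeFromBoxed, brokenSearch) are made up for illustration; only Arrays.asList and Collections.indexOfSubList come from the JDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the original thread.
class AsListDemo {
    // A primitive byte[] becomes ONE list element, because T is inferred as byte[].
    static int sizeFromPrimitive(byte[] bytes) {
        List<byte[]> list = Arrays.asList(bytes);
        return list.size();
    }

    // A boxed Byte[] is spread into one list element per array entry.
    static int sizeFromBoxed(Byte[] bytes) {
        List<Byte> list = Arrays.asList(bytes);
        return list.size();
    }

    // The search from the question: it compares two single-element lists
    // whose elements are different array objects, so it cannot match.
    static int brokenSearch(byte[] source, byte[] target) {
        return Collections.indexOfSubList(Arrays.asList(source), Arrays.asList(target));
    }

    public static void main(String[] args) {
        byte[] source = {1, 2, 3, 4};
        byte[] target = {2, 3};
        Byte[] boxed = {(byte) 1, (byte) 2, (byte) 3, (byte) 4};

        System.out.println(sizeFromPrimitive(source));
        System.out.println(sizeFromBoxed(boxed));
        System.out.println(brokenSearch(source, target));
    }
}
```

    Arrays compare by identity in equals(), which is why the single byte[] entry in one list never equals the single byte[] entry in the other.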

    TL;DR: Don't use Arrays.asList with byte arrays (even if it worked, it would be terribly inefficient anyway because of boxing). I would just reimplement the "indexOfSubArray" algorithm myself; it's pretty simple:

    (untested, no idea if it does what it's supposed to do, test with dummy data first ;) )
    Code:
    // returns the index of the first occurrence of target in source, or -1
    static int indexOf(byte[] source, byte[] target) {
        candidate:
        for (int i = 0; i <= source.length - target.length; i++) { // only start where target still fits into source
            for (int j = 0; j < target.length; j++) { // check if target matches source from element i
                if (target[j] != source[i + j])
                    continue candidate; // if it doesn't match at some point, continue to next element in source array
            }

            // if the match was successful, return index
            return i;
        }

        // nothing found
        return -1;
    }
     
  3. Offline

    Saposhiente

    The external library function is for generating arbitrary classes, and is relatively complex and expensive; I just want to generate the same class repeatedly with a bunch of different names.
    I rewrote your code to not use return and it worked. (Though I did need to adjust it to replace the last byte in the startBytes with the length of the string, because that's apparently where the length is stored).
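
    For anyone landing here later, a rough sketch of the splitting and reassembly described above. NameSplicer, withName and the parameter names are made up for illustration and are not from the library. It assumes, as in this post, that the length of the name is stored in the single byte right before the name, so it only handles names that encode to fewer than 256 bytes; a format that uses a wider length field would need more than this one-byte patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, sketched under the assumptions stated above.
class NameSplicer {
    private final byte[] startBytes; // everything before the name; its last byte is the length slot
    private final byte[] endBytes;   // everything after the name

    // template: output generated once with a placeholder name
    // placeholderPos: index where the placeholder starts (e.g. from the search loop)
    // placeholderLength: length of the placeholder in bytes
    NameSplicer(byte[] template, int placeholderPos, int placeholderLength) {
        if (placeholderPos < 1) {
            throw new IllegalArgumentException("no room for a length byte before the name");
        }
        startBytes = Arrays.copyOfRange(template, 0, placeholderPos);
        endBytes = Arrays.copyOfRange(template, placeholderPos + placeholderLength, template.length);
    }

    byte[] withName(String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        if (nameBytes.length > 255) {
            throw new IllegalArgumentException("name does not fit a one-byte length");
        }
        byte[] result = new byte[startBytes.length + nameBytes.length + endBytes.length];
        System.arraycopy(startBytes, 0, result, 0, startBytes.length);
        // overwrite the length slot (last byte of the prefix) with the new name's length
        result[startBytes.length - 1] = (byte) nameBytes.length;
        System.arraycopy(nameBytes, 0, result, startBytes.length, nameBytes.length);
        System.arraycopy(endBytes, 0, result, startBytes.length + nameBytes.length, endBytes.length);
        return result;
    }
}
```

    The expensive generator then only runs once to produce the template, and each further name costs three array copies.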
     