Category: regex

Parse key value separated by ;

Introduction

Let us use the following string as a sample (the three dots mean that the same pattern repeats indefinitely):

key0=value0;key1=value1;key2=value2;key3=value3;...

During a code review I came across code that parsed this string by calling the split method of the String class with the “;” character. Each resulting “key=value” token was then split again (yes, on the “=” sign), and the code checked that the key was present, with an optional value.

I will not reproduce that code here. What we are interested in is how to parse this pattern with a single regular expression. In Java (and in other programming languages) that regular expression is as follows:

((\w+)=(\w*)(?=;)?)

That would read (ignoring the capturing groups):

  • Match a word character between one and unlimited times
  • Followed by the “=” character
  • Followed by a word character between zero and unlimited times (so the value is optional)
  • This is the interesting part. The regex uses a positive lookahead (see my other post related to this topic). This means we assert that a “;” follows without making the “;” part of the match; it is only there to confirm that a match of type “key=value” is followed by the separator.
  • The last “?” at the end of the regex means that the lookahead is optional, which allows a match when there is no “;” at the end of the string. Note that an optional lookahead can never reject a match, so the expression matches exactly the same text as (\w+)=(\w*). An example of a string without a trailing “;”:
key1=value1;key2=vaalue2;key3=vaalue4;key8=vaalue9;key843=vaalue854;kdkkd=jfdjfjsd;fjdsjk=jfdsj
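
The points above can be checked with a small sketch. It is not part of the reviewed code: the class name KeyValueRegexDemo and the helper pairs are made up for illustration. It runs the regex from this post and a variant without the lookahead over the same input, so the two results can be compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, used only to illustrate the regex above.
final class KeyValueRegexDemo {

    // The regex from this post, written as a Java string literal.
    static final Pattern WITH_LOOKAHEAD = Pattern.compile("((\\w+)=(\\w*)(?=;)?)");

    // Same grouping, but without the optional lookahead.
    static final Pattern WITHOUT_LOOKAHEAD = Pattern.compile("((\\w+)=(\\w*))");

    // Collects every match as "key -> value" using groups 2 and 3.
    static List<String> pairs(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(2) + " -> " + matcher.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "key0=value0;key1=;key2=value2";
        // Prints: [key0 -> value0, key1 -> , key2 -> value2]
        System.out.println(pairs(WITH_LOOKAHEAD, input));
        // Prints the same list, since the optional lookahead never rejects a match.
        System.out.println(pairs(WITHOUT_LOOKAHEAD, input));
    }
}
```

The empty value after key1 shows the effect of \w* on the value side: the key is required, the value is not.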

Code

As you can see, the implementation is straightforward: for every match, the key is captured in group 2 and the value in group 3.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class KeyValueParser {

    private static final ThreadLocal<Pattern> PATTERN_THREAD_LOCAL =
            ThreadLocal.withInitial(() -> Pattern.compile("((\\w+)=(\\w*)(?=;)?)"));

    /**
     * Example on how to use the PATTERN_THREAD_LOCAL declared above.
     *
     * @param paramsAttrValue a String in the form: key0=value0;key1=value1;key2=value2;keyN=valueN......
     *
     * @return a List of Something key value pairs
     */
    public static List<Something> getRequestParams(String paramsAttrValue) {
        List<Something> somethings = new ArrayList<>();
        Matcher regexMatcher = PATTERN_THREAD_LOCAL.get().matcher(paramsAttrValue);
        while (regexMatcher.find()) {
            final String key = regexMatcher.group(2);   // group 2 holds the key
            final String value = regexMatcher.group(3); // group 3 holds the value (may be empty)
            //Do something with key and value
            //Probably Something is a holder for the key and the value.
        }
        return somethings;
    }

    /**
     * Forbidden to create instances of this class.
     */
    private KeyValueParser() {
        //Forbidden to create instances of this class.
    }
}

Note

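The ThreadLocal around the Pattern is not needed: Pattern instances are immutable and safe for use by multiple threads. Only the Matcher is not thread safe, and a new one is created on every call anyway, so a plain static final Pattern is enough.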
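
A minimal sketch of that simplification follows. The class name SharedPatternParser and the choice of a Map as the return type are assumptions made for the example (the original code returns a list of Something holders, which is not shown in this post).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the parser that shares one compiled Pattern.
final class SharedPatternParser {

    // Compiled once; a Pattern can be shared between threads.
    private static final Pattern KEY_VALUE = Pattern.compile("((\\w+)=(\\w*)(?=;)?)");

    private SharedPatternParser() {
        // Utility class, no instances.
    }

    // Returns the pairs in the order they appear; a repeated key keeps its last value.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        // A Matcher holds per-search state, so each call creates its own.
        Matcher matcher = KEY_VALUE.matcher(input);
        while (matcher.find()) {
            pairs.put(matcher.group(2), matcher.group(3));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints: {key0=value0, key1=, key2=value2}
        System.out.println(parse("key0=value0;key1=;key2=value2"));
    }
}
```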

Regular expressions (positive lookahead and negative lookahead)

Positive lookahead

“Assert that the regex below can be matched, starting at this position”.

Let’s go with this little example:

Regex:
\s+\w+(?=\.)

This means: match one or more whitespace characters followed by one or more word characters, but only if a dot comes right after the match:

Example:

andy mypass ok orange..see what is going on

The matched text here would be orange, together with the whitespace in front of it (because there is a dot right after “orange”). The dot itself is not part of the match.
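
To try this out in Java, a small sketch could look like the following. The class name PositiveLookaheadDemo and the helper matches are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the positive lookahead example.
final class PositiveLookaheadDemo {

    // Whitespace, then word characters, only when a dot follows.
    private static final Pattern BEFORE_DOT = Pattern.compile("\\s+\\w+(?=\\.)");

    // Returns every matched substring, leading whitespace included.
    static List<String> matches(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = BEFORE_DOT.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Only " orange" is found; the dot after it is not consumed.
        System.out.println(matches("andy mypass ok orange..see what is going on"));
    }
}
```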

Negative lookahead

“Assert that it is impossible to match the regex below starting at this position”.

The regex changes to:
\s+\w+(?!\.)

Do you see the difference? This should now read:

Match one or more whitespace characters followed by one or more word characters, but only if the next character is not a dot. So what is matched now in our previous example?

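Example:

mypass (Because after "mypass" there is a space)
ok (Same reason, using the word "ok")
orang (Because after "orang" there is an "e"; the regex gives back one character so that the match is not followed by a dot)
what (Because after "what" there is a space; "see" itself is not matched because there is no whitespace before it)

... and so on. Each match also includes the whitespace in front of the word.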
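
The same kind of sketch works for the negative lookahead. Again, the class name NegativeLookaheadDemo and the helper matches are only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the negative lookahead example.
final class NegativeLookaheadDemo {

    // Whitespace, then word characters, only when the next character is not a dot.
    private static final Pattern NOT_BEFORE_DOT = Pattern.compile("\\s+\\w+(?!\\.)");

    // Returns every matched substring, leading whitespace included.
    static List<String> matches(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NOT_BEFORE_DOT.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Finds " mypass", " ok", " orang", " what", " is", " going" and " on".
        System.out.println(matches("andy mypass ok orange..see what is going on"));
    }
}
```

Note that " on" is matched as well: at the end of the input there is no next character, so the negative lookahead succeeds.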