How to parse a PURL
How to parse a PURL string into its components
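Parsing a PURL ASCII string into its components follows a fixed order: the subpath and qualifiers are split off from the right, the scheme and type from the left, and then the version, name, and namespace are taken from what remains.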
Note: some types require extra type-specific normalization of the name and namespace.
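
As an illustration of what a type-specific normalization can look like, here is a minimal sketch. The function name `normalize_name` is hypothetical, and the two rules shown (lowercasing and replacing '_' with '-' for `pypi`, lowercasing for `github` and `bitbucket`) are assumptions drawn from the PURL type definitions rather than from this section. Check the definition of each type before relying on them.

```python
def normalize_name(purl_type: str, name: str) -> str:
    """Apply a few illustrative type-specific rules to a decoded name.

    Only the types listed here are handled; every other type is
    returned unchanged, which is NOT necessarily correct for that type.
    """
    if purl_type == "pypi":
        # Assumed rule: lowercase, and treat '_' as '-'.
        return name.lower().replace("_", "-")
    if purl_type in ("github", "bitbucket"):
        # Assumed rule: repository names are case-insensitive.
        return name.lower()
    return name
```

A real parser would apply such a function at the "Apply type-specific normalization" steps for the name and for each namespace segment.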
To parse a PURL string into its components:
- Split the PURL string once from right on '#'
  - The left side is the remainder
  - Split the right side on '/'
  - Percent-decode each segment
  - UTF-8-decode each segment if needed in your programming language
  - Discard any segment that is empty, or equal to '.' or '..'
  - Report an error if any segment contains a slash '/'
  - This list of path segments is the subpath
  - You may escape these path segments if needed by your environment (operating system, file system, programming language, shell, etc.)
  - You may join these path segments with the path delimiter of your environment (operating system, file system, etc.)
- Split the remainder once from right on '?'
  - The left side is the remainder
  - The right side is the qualifiers string
  - Split the qualifiers on '&'. Each part is a key=value pair
  - For each pair, split the key=value once from left on '=':
    - The key is the lowercase left side
    - The value is the percent-decoded right side
    - UTF-8-decode the value if needed in your programming language
    - Discard any key/value pairs where the value is empty
    - If the key is 'checksum', split the value on ',' to create a list of checksums
  - This list of key/value pairs is the qualifiers object
- Split the remainder once from left on ':'
  - The left side lowercased is the scheme
  - The right side is the remainder
- Strip all leading '/' characters (e.g., '/', '//', '///' and so on) from the remainder
  - Split this once from left on '/'
  - The left side lowercased is the type
  - The right side is the remainder
- Split the remainder once from right on '@'
  - The left side is the remainder
  - Percent-decode the right side
  - UTF-8-decode the result if needed in your programming language
  - This is the version
- Strip all trailing '/' characters (e.g., '/', '//', '///' and so on) from the remainder
  - Split this once from right on '/'
  - The left side is the remainder
  - Percent-decode the right side
  - UTF-8-decode the result if needed in your programming language
  - Apply type-specific normalization to the result if needed
  - This is the name
- Split the remainder on '/'
  - Discard any empty segment from that split
  - Percent-decode each segment
  - UTF-8-decode each segment if needed in your programming language
  - Apply type-specific normalization to each segment if needed
  - Join segments back with a '/'
  - This is the namespace
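
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `parse_purl` and the dictionary it returns are hypothetical, `urllib.parse.unquote` is used here for both percent-decoding and UTF-8 decoding, and type-specific normalization is left out. Validation of malformed input (for example, a missing scheme or name) is also omitted.

```python
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split a PURL string into its components, in the order listed above."""
    # Subpath: split once from the right on '#'.
    subpath = []
    remainder = purl
    if "#" in purl:
        remainder, raw_subpath = purl.rsplit("#", 1)
        for segment in raw_subpath.split("/"):
            segment = unquote(segment)  # percent-decode, then UTF-8-decode
            if segment in ("", ".", ".."):
                continue
            if "/" in segment:  # e.g. an encoded '%2F' inside a segment
                raise ValueError(f"subpath segment contains '/': {segment!r}")
            subpath.append(segment)

    # Qualifiers: split once from the right on '?'.
    qualifiers = {}
    if "?" in remainder:
        remainder, raw_qualifiers = remainder.rsplit("?", 1)
        for pair in raw_qualifiers.split("&"):
            key, _, value = pair.partition("=")  # split once from the left
            key = key.lower()
            value = unquote(value)
            if not value:
                continue  # discard pairs with an empty value
            qualifiers[key] = value.split(",") if key == "checksum" else value

    # Scheme: split once from the left on ':'.
    scheme, _, remainder = remainder.partition(":")
    scheme = scheme.lower()

    # Type: strip leading slashes, then split once from the left on '/'.
    purl_type, _, remainder = remainder.lstrip("/").partition("/")
    purl_type = purl_type.lower()

    # Version: split once from the right on '@'.
    version = None
    if "@" in remainder:
        remainder, raw_version = remainder.rsplit("@", 1)
        version = unquote(raw_version)

    # Name: strip trailing slashes, then split once from the right on '/'.
    remainder = remainder.rstrip("/")
    if "/" in remainder:
        remainder, raw_name = remainder.rsplit("/", 1)
    else:
        remainder, raw_name = "", remainder
    name = unquote(raw_name)

    # Namespace: whatever is left, with empty segments discarded.
    segments = [unquote(s) for s in remainder.split("/") if s]
    namespace = "/".join(segments) or None

    return {
        "scheme": scheme,
        "type": purl_type,
        "namespace": namespace,
        "name": name,
        "version": version,
        "qualifiers": qualifiers,
        "subpath": subpath,
    }
```

For example, `parse_purl("pkg:npm/%40angular/animation@12.3.1")` yields type `npm`, namespace `@angular`, name `animation`, and version `12.3.1`. Splitting off the subpath and qualifiers first matters because qualifier values may themselves contain ':', '/', or '@', which would otherwise confuse the later splits.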