How to parse a PURL

How to parse a PURL string into its components

Parsing a PURL ASCII string into its components works from right to left, from subpath to type.

Note: some extra type-specific normalizations are required.

To parse a PURL string in its components:

Split the PURL string once from right on '#'
- The left side is the remainder
- Split the right side on '/'
- Percent-decode each segment
- UTF-8-decode each segment if needed in your programming language
- Discard any segment that is empty, or equal to '.' or '..'
- Report an error if any segment contains a slash '/'
- This list of path segments is the subpath
- You may escape these path segments if needed by your environment (operating system, file system, programming language, shell, etc)
- You may join these path segments with the path delimiter of your environment (operating system, file system, etc)
Split the remainder once from right on '?'
- The left side is the remainder
- The right side is the qualifiers string
- Split the qualifiers on '&'. Each part is a key=value pair
- For each pair, split the key=value once from left on '=':
  - The key is the lowercase left side
  - The value is the percent-decoded right side
  - UTF-8-decode the value if needed in your programming language
  - Discard any key/value pairs where the value is empty
  - If the key is 'checksum', split the value on ',' to create a list of checksums
- This list of key/value is the qualifiers object
Split the remainder once from left on ':'
- The left side lowercased is the scheme
- The right side is the remainder
Strip all leading '/' characters (e.g., '/', '//', '///' and so on) from the remainder
- Split this once from left on '/'
- The left side lowercased is the type
- The right side is the remainder
Split the remainder once from right on '@'
- The left side is the remainder
- Percent-decode the right side. This is the version.
- UTF-8-decode the version if needed in your programming language
- This is the version
Strip all trailing '/' characters (e.g., '/', '//', '///' and so on) from the remainder
- Split this once from right on '/'
- The left side is the remainder
- Percent-decode the right side. This is the name
- UTF-8-decode this name if needed in your programming language
- Apply type-specific normalization to the name if needed
- This is the name
Split the remainder on '/'
- Discard any empty segment from that split
- Percent-decode each segment
- UTF-8-decode each segment if needed in your programming language
- Apply type-specific normalization to each segment if needed
- Join segments back with a '/'
- This is the namespace

How to parse a PURL string into its components​

How to parse a PURL string into its components