Strings are sequences of characters. The length of a string is the number of characters that it contains, as an exact non-negative integer. The valid indices of a string are the exact non-negative integers less than the length of the string. The first character of a string has index 0, the second has index 1, and so on.
Strings are implemented as a sequence of 16-bit char values,
even though they’re semantically a sequence of 32-bit Unicode code points.
A character whose value is greater than #xffff
is represented using two surrogate characters.
The implementation allows for natural interoperability with Java APIs.
However, it does make certain operations (indexing or counting based on
character counts) difficult to implement efficiently. Luckily, one
rarely needs to index or count based on character counts;
alternatives are discussed below.
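The surrogate-pair representation described above can be sketched in portable Scheme. The helper name `surrogate-pair` is hypothetical (it is not a Kawa procedure); the arithmetic is the standard UTF-16 encoding of a code point greater than #xffff.

```scheme
;; Hypothetical helper (not part of Kawa): split a code point above
;; #xFFFF into its UTF-16 high and low surrogate values.
(define (surrogate-pair code-point)
  (let ((offset (- code-point #x10000)))
    (list (+ #xD800 (quotient offset #x400))     ; high (leading) surrogate
          (+ #xDC00 (remainder offset #x400))))) ; low (trailing) surrogate

(surrogate-pair #x1F600) ; ⇒ (55357 56832), i.e. (#xD83D #xDE00)
```

A string holding this one character therefore occupies two 16-bit char values, which is why counting characters can require a scan.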
There are different kinds of strings:
An istring is immutable:
It is fixed, and cannot be modified.
On the other hand, indexing (e.g. string-ref) is efficient (constant-time),
while indexing of other string implementations takes time proportional
to the index.
String literals are istrings, as are the return values of most of the procedures in this chapter.
An istring is an instance of the gnu.lists.IString class.
An mstring is mutable:
You can replace individual characters (using string-set!).
You can also change the mstring’s length by inserting
or removing characters (using string-append! or string-replace!).
An mstring is an instance of the gnu.lists.FString class.
Any other object that implements the java.lang.CharSequence interface
is also a string.
This includes standard Java java.lang.String
and java.lang.StringBuilder objects.
Some of the procedures that operate on strings ignore the
difference between upper and lower case. The names of
the versions that ignore case end with “-ci” (for “case
insensitive”).
Compatibility:
Many of the following procedures (for example string-append)
return an immutable istring in Kawa,
but return a “freshly allocated” mutable string in
standard Scheme (including R7RS) as well as most Scheme implementations
(including previous versions of Kawa).
To get the “compatibility mode” versions of those procedures
(which return mstrings),
invoke Kawa with one of the --r5rs, --r6rs, or --r7rs
options, or import a standard library like (scheme base).
The type of string objects. The underlying type is the interface
java.lang.CharSequence. Immutable strings are gnu.lists.IString or java.lang.String, while mutable strings are gnu.lists.FString.
Procedure: string? obj
Return #t if obj is a string, #f otherwise.
Procedure: istring? obj
Return #t if obj is an istring (an immutable, constant-time-indexable string); #f otherwise.
Procedure: string obj …
Return a string composed of the arguments. This is analogous to list.
Compatibility: The result is an istring, except in compatibility mode, when it is a newly allocated mstring.
Procedure: string-length string
Return the number of characters in the given string as an exact integer object.
Performance note: If the string is not an istring, then calling string-length may take time proportional to the length of the string, because of the need to scan for surrogate pairs.
Procedure: string-ref string k
k must be a valid index of string. The string-ref procedure returns character k of string using zero-origin indexing.
Performance note: If the string is not an istring, then calling string-ref may take time proportional to k because of the need to check for surrogate pairs. An alternative is to use string-cursor-ref. If iterating through a string, use string-for-each.
Procedure: string-null? string
Is string the empty string? Same result as (= (string-length string) 0) but executes in O(1) time.
Procedure: string-every pred string [start end]
Procedure: string-any pred string [start end]
Checks to see if every/any character in string satisfies pred, proceeding from left (index start) to right (index end). These procedures are short-circuiting: if pred returns false, string-every does not call pred on subsequent characters; if pred returns true, string-any does not call pred on subsequent characters. Both procedures are “witness-generating”:
If string-every is given an empty interval (with start = end), it returns #t.
If string-every returns true for a non-empty interval (with start < end), the returned true value is the one returned by the final call to the predicate on (string-ref string (- end 1)).
If string-any returns true, the returned true value is the one returned by the predicate.
Note: The names of these procedures do not end with a question mark. This indicates a general value is returned instead of a simple boolean (#t or #f).
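The witness-generating behaviour is easiest to see with a predicate that returns something other than #t. The following is a small sketch; `digit-or-false` is a hypothetical helper introduced only for illustration.

```scheme
;; Plain predicates yield ordinary booleans.
(string-every char-alphabetic? "abc")   ; ⇒ #t
(string-any char-numeric? "abc")        ; ⇒ #f

;; An empty interval always satisfies string-every.
(string-every char-numeric? "")         ; ⇒ #t

;; Hypothetical helper: returns the character itself when it is a digit.
(define (digit-or-false c) (and (char-numeric? c) c))

(string-any digit-or-false "ab3d4")     ; ⇒ #\3 (first true result)
(string-every digit-or-false "123")     ; ⇒ #\3 (result of the final call)
```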
Procedure: string-tabulate proc len
Constructs a string of size len by calling proc on each value from 0 (inclusive) to len (exclusive) to produce the corresponding element of the string. The procedure proc accepts an exact integer as its argument and returns a character. The order in which proc is called on those indexes is not specified.
Rationale: Although string-unfold is more general, string-tabulate is likely to run faster for the common special case it implements.
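As a brief illustration of the description above, each index is mapped to one character of the result. Because the call order is unspecified, proc should not rely on side effects.

```scheme
;; Index i becomes the i-th letter of the alphabet.
(string-tabulate
  (lambda (i) (integer->char (+ i (char->integer #\a))))
  5)                                    ; ⇒ "abcde"

;; Alternate two characters depending on the parity of the index.
(string-tabulate
  (lambda (i) (if (even? i) #\- #\=))
  6)                                    ; ⇒ "-=-=-="
```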
Procedure: string-unfold stop? mapper successor seed [base make-final]
Procedure: string-unfold-right stop? mapper successor seed [base make-final]
This is a fundamental and powerful constructor for strings.
successor is used to generate a series of “seed” values from the initial seed: seed, (successor seed), (successor (successor seed)), (successor (successor (successor seed))), ...
stop? tells us when to stop: when it returns true when applied to one of these seed values.
mapper maps each seed value to the corresponding character(s) in the result string, which are assembled into that string in left-to-right order. It is an error for mapper to return anything other than a character or string.
base is the optional initial/leftmost portion of the constructed string, which defaults to the empty string "". It is an error if base is anything other than a character or string.
make-final is applied to the terminal seed value (on which stop? returns true) to produce the final/rightmost portion of the constructed string. It defaults to (lambda (x) ""). It is an error for make-final to return anything other than a character or string.
string-unfold-right is the same as string-unfold except the results of mapper are assembled into the string in right-to-left order, base is the optional rightmost portion of the constructed string, and make-final produces the leftmost portion of the constructed string.
You can use string-unfold to convert a list to a string, read a port into a string, reverse a string, copy a string, and so forth. Examples:
    (define (port->string p)
      (string-unfold eof-object? values
                     (lambda (x) (read-char p))
                     (read-char p)))
    (define (list->string lis)
      (string-unfold null? car cdr lis))
    (define (string-tabulate f size)
      (string-unfold (lambda (i) (= i size)) f (lambda (i) (+ i 1)) 0))
To map f over a list lis, producing a string:
    (string-unfold null? (compose f car) cdr lis)
Interested functional programmers may enjoy noting that string-fold-right and string-unfold are in some sense inverses. That is, given operations knull?, kar, kdr, kons, and knil satisfying
    (kons (kar x) (kdr x)) = x  and  (knull? knil) = #t
then
    (string-fold-right kons knil (string-unfold knull? kar kdr x)) = x
and
    (string-unfold knull? kar kdr (string-fold-right kons knil string)) = string.
This combinator pattern is sometimes called an “anamorphism.”
Procedure: substring string start end
string must be a string, and start and end must be exact integer objects satisfying:
    0 <= start <= end <= (string-length string)
The substring procedure returns a newly allocated string formed from the characters of string beginning with index start (inclusive) and ending with index end (exclusive).
Procedure: string-take string nchars
Procedure: string-drop string nchars
Procedure: string-take-right string nchars
Procedure: string-drop-right string nchars
string-take returns an immutable string containing the first nchars of string; string-drop returns a string containing all but the first nchars of string. string-take-right returns a string containing the last nchars of string; string-drop-right returns a string containing all but the last nchars of string.
    (string-take "Pete Szilagyi" 6) ⇒ "Pete S"
    (string-drop "Pete Szilagyi" 6) ⇒ "zilagyi"
    (string-take-right "Beta rules" 5) ⇒ "rules"
    (string-drop-right "Beta rules" 5) ⇒ "Beta "
It is an error to take or drop more characters than are in the string:
    (string-take "foo" 37) ⇒ error
Procedure: string-pad string len [char start end]
Procedure: string-pad-right string len [char start end]
Returns an istring of length len composed of the characters drawn from the given subrange of string, padded on the left (right) by as many occurrences of the character char as needed. If string has more than len chars, it is truncated on the left (right) to length len. The char defaults to #\space.
    (string-pad "325" 5) ⇒ "  325"
    (string-pad "71325" 5) ⇒ "71325"
    (string-pad "8871325" 5) ⇒ "71325"
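The examples above cover only string-pad. A short sketch of the right-hand variant and of the optional arguments, following the description above:

```scheme
;; string-pad-right pads and truncates on the right.
(string-pad-right "325" 5)          ; ⇒ "325  "
(string-pad-right "8871325" 5)      ; ⇒ "88713"

;; An explicit padding character.
(string-pad "42" 5 #\0)             ; ⇒ "00042"

;; start/end select the subrange "bcd" before padding.
(string-pad "abcdef" 4 #\space 1 4) ; ⇒ " bcd"
```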
Procedure: string-trim string [pred start end]
Procedure: string-trim-right string [pred start end]
Procedure: string-trim-both string [pred start end]
Returns an istring obtained from the given subrange of string by skipping over all characters on the left / on the right / on both sides that satisfy the second argument pred: pred defaults to char-whitespace?.
    (string-trim-both "   The outlook wasn't brilliant,  \n\r")
    ⇒ "The outlook wasn't brilliant,"
Procedure: string=? string1 string2 string3 …
Return #t if the strings are the same length and contain the same characters in the same positions. Otherwise, the string=? procedure returns #f.
    (string=? "Straße" "Strasse") ⇒ #f
Procedure: string<? string1 string2 string3 …
Procedure: string>? string1 string2 string3 …
Procedure: string<=? string1 string2 string3 …
Procedure: string>=? string1 string2 string3 …
These procedures return #t if their arguments are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing. These predicates are required to be transitive.
These procedures are the lexicographic extensions to strings of the corresponding orderings on characters. For example, string<? is the lexicographic ordering on strings induced by the ordering char<? on characters. If two strings differ in length but are the same up to the length of the shorter string, the shorter string is considered to be lexicographically less than the longer string.
    (string<? "z" "ß") ⇒ #t
    (string<? "z" "zz") ⇒ #t
    (string<? "z" "Z") ⇒ #f
Procedure: string-ci=? string1 string2 string3 …
Procedure: string-ci<? string1 string2 string3 …
Procedure: string-ci>? string1 string2 string3 …
Procedure: string-ci<=? string1 string2 string3 …
Procedure: string-ci>=? string1 string2 string3 …
These procedures are similar to string=?, etc., but behave as if they applied string-foldcase to their arguments before invoking the corresponding procedures without -ci.
    (string-ci<? "z" "Z") ⇒ #f
    (string-ci=? "z" "Z") ⇒ #t
    (string-ci=? "Straße" "Strasse") ⇒ #t
    (string-ci=? "Straße" "STRASSE") ⇒ #t
    (string-ci=? "ΧΑΟΣ" "χαοσ") ⇒ #t
Procedure: list->string list
The list->string procedure returns an istring formed from the characters in list, in order. It is an error if any element of list is not a character.
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
Procedure: reverse-list->string list
An efficient implementation of (compose list->string reverse):
    (reverse-list->string '(#\a #\B #\c)) ⇒ "cBa"
This is a common idiom in the epilogue of string-processing loops that accumulate their result using a list in reverse order. (See also string-concatenate-reverse for the “chunked” variant.)
Procedure: string->list string [start [end]]
The string->list procedure returns a newly allocated list of the characters of string between start and end, in order. The string->list and list->string procedures are inverses so far as equal? is concerned.
Procedure: vector->string vector [start [end]]
The vector->string procedure returns a newly allocated string of the objects contained in the elements of vector between start and end. It is an error if any element of vector between start and end is not a character, or is a character forbidden in strings.
    (vector->string #(#\1 #\2 #\3)) ⇒ "123"
    (vector->string #(#\1 #\2 #\3 #\4 #\5) 2 4) ⇒ "34"
Procedure: string->vector string [start [end]]
The string->vector procedure returns a newly created vector initialized to the elements of the string string between start and end.
    (string->vector "ABC") ⇒ #(#\A #\B #\C)
    (string->vector "ABCDE" 1 3) ⇒ #(#\B #\C)
Procedure: string-upcase string
Procedure: string-downcase string
Procedure: string-titlecase string
Procedure: string-foldcase string
These procedures take a string argument and return a string result. They are defined in terms of Unicode’s locale-independent case mappings from Unicode scalar-value sequences to scalar-value sequences. In particular, the length of the result string can be different from the length of the input string. When the specified result is equal in the sense of string=? to the argument, these procedures may return the argument instead of a newly allocated string.
The string-upcase procedure converts a string to upper case; string-downcase converts a string to lower case. The string-foldcase procedure converts the string to its case-folded counterpart, using the full case-folding mapping, but without the special mappings for Turkic languages. The string-titlecase procedure converts the first cased character of each word to title case, and downcases all other cased characters.
    (string-upcase "Hi") ⇒ "HI"
    (string-downcase "Hi") ⇒ "hi"
    (string-foldcase "Hi") ⇒ "hi"
    (string-upcase "Straße") ⇒ "STRASSE"
    (string-downcase "Straße") ⇒ "straße"
    (string-foldcase "Straße") ⇒ "strasse"
    (string-downcase "STRASSE") ⇒ "strasse"
    (string-downcase "Σ") ⇒ "σ"
    ; Chi Alpha Omicron Sigma:
    (string-upcase "ΧΑΟΣ") ⇒ "ΧΑΟΣ"
    (string-downcase "ΧΑΟΣ") ⇒ "χαος"
    (string-downcase "ΧΑΟΣΣ") ⇒ "χαοσς"
    (string-downcase "ΧΑΟΣ Σ") ⇒ "χαος σ"
    (string-foldcase "ΧΑΟΣΣ") ⇒ "χαοσσ"
    (string-upcase "χαος") ⇒ "ΧΑΟΣ"
    (string-upcase "χαοσ") ⇒ "ΧΑΟΣ"
    (string-titlecase "kNock KNoCK") ⇒ "Knock Knock"
    (string-titlecase "who's there?") ⇒ "Who's There?"
    (string-titlecase "r6rs") ⇒ "R6rs"
    (string-titlecase "R6RS") ⇒ "R6rs"
Since these procedures are locale-independent, they may not be appropriate for some locales.
Kawa Note: The implementation of string-titlecase does not correctly handle the case where an initial character needs to be converted to multiple characters, such as “LATIN SMALL LIGATURE FL” which should be converted to the two letters "Fl".
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
Procedure: string-normalize-nfd string
Procedure: string-normalize-nfkd string
Procedure: string-normalize-nfc string
Procedure: string-normalize-nfkc string
These procedures take a string argument and return a string result, which is the input string normalized to Unicode normalization form D, KD, C, or KC, respectively. When the specified result is equal in the sense of string=? to the argument, these procedures may return the argument instead of a newly allocated string.
    (string-normalize-nfd "\xE9;") ⇒ "\x65;\x301;"
    (string-normalize-nfc "\xE9;") ⇒ "\xE9;"
    (string-normalize-nfd "\x65;\x301;") ⇒ "\x65;\x301;"
    (string-normalize-nfc "\x65;\x301;") ⇒ "\xE9;"
Procedure: string-prefix-length string1 string2 [start1 end1 start2 end2]
Procedure: string-suffix-length string1 string2 [start1 end1 start2 end2]
Return the length of the longest common prefix/suffix of string1 and string2. For prefixes, this is equivalent to their “mismatch index” (relative to the start indexes).
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
Procedure: string-prefix? string1 string2 [start1 end1 start2 end2]
Procedure: string-suffix? string1 string2 [start1 end1 start2 end2]
Is string1 a prefix/suffix of string2?
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
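The prefix and suffix procedures can be illustrated with a few calls. This is a sketch based on the descriptions above, not output captured from a Kawa session.

```scheme
;; "pref" is shared at the front; "ing" is shared at the back.
(string-prefix-length "prefix" "preface")   ; ⇒ 4
(string-suffix-length "running" "jumping")  ; ⇒ 3

(string-prefix? "pre" "prefix")             ; ⇒ #t
(string-suffix? "fix" "prefix")             ; ⇒ #t
(string-prefix? "prefix" "pre")             ; ⇒ #f

;; start1/end1 restrict string1 to its substring "pre".
(string-prefix? "xpre" "prefix" 1 4)        ; ⇒ #t
```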
Procedure: string-index string pred [start end]
Procedure: string-index-right string pred [start end]
Procedure: string-skip string pred [start end]
Procedure: string-skip-right string pred [start end]
string-index searches through the given substring from the left, returning the index of the leftmost character satisfying the predicate pred. string-index-right searches from the right, returning the index of the rightmost character satisfying the predicate pred. If no match is found, these procedures return #f.
The start and end arguments specify the beginning and end of the search; the valid indexes relevant to the search include start but exclude end. Beware of “fencepost” errors: when searching right-to-left, the first index considered is (- end 1), whereas when searching left-to-right, the first index considered is start. That is, the start/end indexes describe the same half-open interval [start,end) in these procedures that they do in other string procedures.
The -skip functions are similar, but use the complement of the criterion: they search for the first char that doesn’t satisfy pred. To skip over initial whitespace, for example, say
    (substring string
               (or (string-skip string char-whitespace?)
                   (string-length string))
               (string-length string))
These functions can be trivially composed with string-take and string-drop to produce take-while, drop-while, span, and break procedures without loss of efficiency.
Procedure: string-contains string1 string2 [start1 end1 start2 end2]
Procedure: string-contains-right string1 string2 [start1 end1 start2 end2]
Does the substring of string1 specified by start1 and end1 contain the sequence of characters given by the substring of string2 specified by start2 and end2?
Returns #f if there is no match. If start2 = end2, string-contains returns start1 but string-contains-right returns end1. Otherwise returns the index in string1 for the first character of the first/last match; that index lies within the half-open interval [start1,end1), and the match lies entirely within the [start1,end1) range of string1.
    (string-contains "eek -- what a geek." "ee" 12 18) ; Searches "a geek"
    ⇒ 15
Note: The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.
Procedure: string-append string…
Returns a string whose characters form the concatenation of the given strings.
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
Procedure: string-concatenate string-list
Concatenates the elements of string-list together into a single istring.
Rationale: Some implementations of Scheme limit the number of arguments that may be passed to an n-ary procedure, so the (apply string-append string-list) idiom, which is otherwise equivalent to using this procedure, is not as portable.
Procedure: string-concatenate-reverse string-list [final-string [end]]
With no optional arguments, calling this procedure is equivalent to (string-concatenate (reverse string-list)). If the optional argument final-string is specified, it is effectively consed onto the beginning of string-list before performing the list-reverse and string-concatenate operations.
If the optional argument end is given, only the characters up to but not including end in final-string are added to the result, thus producing
    (string-concatenate
      (reverse (cons (substring final-string 0 end)
                     string-list)))
For example:
    (string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
    ⇒ "Hello, I must be going."
Rationale: This procedure is useful when constructing procedures that accumulate character data into lists of string buffers, and wish to convert the accumulated data into a single string when done. The optional end argument accommodates that use case when final-string is a bob-full mutable string, and is allowed (for uniformity) when final-string is an immutable string.
Procedure: string-join string-list [delimiter [grammar]]
This procedure is a simple unparser; it pastes strings together using the delimiter string, returning an istring.
The string-list is a list of strings. The delimiter is the string used to delimit elements; it defaults to a single space " ".
The grammar argument is a symbol that determines how the delimiter is used, and defaults to 'infix. It is an error for grammar to be any symbol other than these four:
'infix: An infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty string.
'strict-infix: Means the same as 'infix if the string-list is non-empty, but will signal an error if given an empty list. (This avoids an ambiguity shown in the examples below.)
'suffix: Means a suffix or terminator grammar: insert the delimiter after every list element.
'prefix: Means a prefix grammar: insert the delimiter before every list element.
    (string-join '("foo" "bar" "baz")) ⇒ "foo bar baz"
    (string-join '("foo" "bar" "baz") "") ⇒ "foobarbaz"
    (string-join '("foo" "bar" "baz") ":") ⇒ "foo:bar:baz"
    (string-join '("foo" "bar" "baz") ":" 'suffix) ⇒ "foo:bar:baz:"
    ;; Infix grammar is ambiguous wrt empty list vs. empty string:
    (string-join '() ":") ⇒ ""
    (string-join '("") ":") ⇒ ""
    ;; Suffix and prefix grammars are not:
    (string-join '() ":" 'suffix) ⇒ ""
    (string-join '("") ":" 'suffix) ⇒ ":"
Procedure: string-replace string1 string2 start1 end1 [start2 end2]
Returns
    (string-append (substring string1 0 start1)
                   (substring string2 start2 end2)
                   (substring string1 end1 (string-length string1)))
That is, the segment of characters in string1 from start1 to end1 is replaced by the segment of characters in string2 from start2 to end2. If start1 = end1, this simply splices the characters drawn from string2 into string1 at that position.
Examples:
    (string-replace "The TCL programmer endured daily ridicule."
                    "another miserable perl drone" 4 7 8 22)
    ⇒ "The miserable perl programmer endured daily ridicule."
    (string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
    ⇒ "It's lots of fun to code it up in Scheme."
    (define (string-insert s i t) (string-replace s t i i))
    (string-insert "It's easy to code it up in Scheme." 5 "really ")
    ⇒ "It's really easy to code it up in Scheme."
    (define (string-set s i c) (string-replace s (string c) i (+ i 1)))
    (string-set "String-ref runs in O(n) time." 21 #\1)
    ⇒ "String-ref runs in O(1) time."
Also see string-append! and string-replace!
for destructive changes to a mutable string.
Procedure: string-fold kons knil string [start end]
Procedure: string-fold-right kons knil string [start end]
These are the fundamental iterators for strings.
The string-fold procedure maps the kons procedure across the given string from left to right:
    (... (kons string[2] (kons string[1] (kons string[0] knil))))
In other words, string-fold obeys the (tail) recursion
    (string-fold kons knil string start end)
    = (string-fold kons (kons string[start] knil) string start+1 end)
The string-fold-right procedure maps kons across the given string string from right to left:
    (kons string[0]
          (... (kons string[end-3]
                     (kons string[end-2]
                           (kons string[end-1] knil)))))
obeying the (tail) recursion
    (string-fold-right kons knil string start end)
    = (string-fold-right kons (kons string[end-1] knil) string start end-1)
Examples:
    ;;; Convert a string to a list of chars.
    (string-fold-right cons '() string)
    ;;; Count the number of lower-case characters in a string.
    (string-fold (lambda (c count)
                   (if (char-lower-case? c)
                       (+ count 1)
                       count))
                 0
                 string)
The string-fold-right combinator is sometimes called a “catamorphism.”
Procedure: string-for-each proc string1 string2 …
Procedure: string-for-each proc string1 [start [end]]
The strings must all have the same length. proc should accept as many arguments as there are strings.
The start-end variant is provided for compatibility with the SRFI-13 version. (In that case start and end count Unicode scalar values (character values), not Java 16-bit char values.)
The string-for-each procedure applies proc element-wise to the characters of the strings for its side effects, in order from the first characters to the last. proc is always called in the same dynamic environment as string-for-each itself.
Analogous to for-each.
    (let ((v '()))
      (string-for-each
        (lambda (c) (set! v (cons (char->integer c) v)))
        "abcde")
      v)
    ⇒ (101 100 99 98 97)
Performance note: The compiler generates efficient code for string-for-each. If proc is a lambda expression, it is inlined.
Procedure: string-map proc string1 string2 …
The string-map procedure applies proc element-wise to the elements of the strings and returns a string of the results, in order. It is an error if proc does not accept as many arguments as there are strings, or return other than a single character or a string. If more than one string is given and not all strings have the same length, string-map terminates when the shortest string runs out. The dynamic order in which proc is applied to the elements of the strings is unspecified.
    (string-map char-foldcase "AbdEgH") ⇒ "abdegh"
    (string-map
      (lambda (c) (integer->char (+ 1 (char->integer c))))
      "HAL")
    ⇒ "IBM"
    (string-map
      (lambda (c k)
        ((if (eqv? k #\u) char-upcase char-downcase) c))
      "studlycaps xxx"
      "ululululul")
    ⇒ "StUdLyCaPs"
Traditionally the result of proc had to be a character, but Kawa (and SRFI-140) allows the result to be a string.
Performance note: The string-map procedure has not been optimized (mainly because it is not very useful): The characters are boxed, and the proc is not inlined even if it is a lambda expression.
Procedure: string-map-index proc string [start end]
Calls proc on each valid index of the specified substring, converts the results of those calls into strings, and returns the concatenation of those strings. It is an error for proc to return anything other than a character or string. The dynamic order in which proc is called on the indexes is unspecified, as is the dynamic order in which the coercions are performed. If any strings returned by proc are mutated after they have been returned and before the call to string-map-index has returned, then string-map-index returns a string with unspecified contents; the string-map-index procedure itself does not mutate those strings.
Procedure: string-for-each-index proc string [start end]
Calls proc on each valid index of the specified substring, in increasing order, discarding the results of those calls. This is simply a safe and correct way to loop over a substring.
Example:
    (let ((txt "abcde")
          (v '()))
      (string-for-each-index
        (lambda (cur) (set! v (cons (char->integer (string-ref txt cur)) v)))
        txt)
      v)
    ⇒ (101 100 99 98 97)
Procedure: string-count string pred [start end]
Returns a count of the number of characters in the specified substring of string that satisfy the predicate pred.
Procedure: string-filter pred string [start end]
Procedure: string-remove pred string [start end]
Return an immutable string consisting of only selected characters, in order: string-filter selects only the characters that satisfy pred; string-remove selects only the characters that do not satisfy pred.
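A short sketch of counting and selecting characters, based on the descriptions above. Note the argument order: string-count takes the string first, while string-filter and string-remove take the predicate first.

```scheme
(string-count "Hello World" char-upper-case?)   ; ⇒ 2

(string-filter char-alphabetic? "a1b2c3")       ; ⇒ "abc"
(string-remove char-alphabetic? "a1b2c3")       ; ⇒ "123"

;; start/end restrict the work to the substring "b2c".
(string-filter char-alphabetic? "a1b2c3" 2 5)   ; ⇒ "bc"
```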
Procedure: string-repeat string-or-character len
Create an istring by repeating the first argument len times. If the first argument is a character, it is as if it were wrapped with the string constructor. We can define string-repeat in terms of the more general xsubstring procedure:
    (define (string-repeat S N)
      (let ((T (if (char? S) (string S) S)))
        (xsubstring T 0 (* N (string-length T)))))
Procedure: xsubstring string [from to [start end]]
This is an extended substring procedure that implements replicated copying of a substring. The string is a string; start and end are optional arguments that specify a substring of string, defaulting to 0 and the length of string. This substring is conceptually replicated both up and down the index space, in both the positive and negative directions. For example, if string is "abcdefg", start is 3, and end is 6, then we have the conceptual bidirectionally-infinite string
    ...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
        -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9
xsubstring returns the substring of this conceptual string beginning at index from, and ending at to. It is an error if from is greater than to.
If from and to are missing they default to 0 and from+(end-start), respectively. This variant is a generalization of using substring, but unlike substring never shares substructures that would retain characters or sequences of characters that are substructures of its first argument or previously allocated objects.
You can use xsubstring to perform a variety of tasks:
To rotate a string left:
    (xsubstring "abcdef" 2 8) ⇒ "cdefab"
To rotate a string right:
    (xsubstring "abcdef" -2 4) ⇒ "efabcd"
To replicate a string:
    (xsubstring "abc" 0 7) ⇒ "abcabca"
Note that
The from/to arguments give a half-open range containing the characters from index from up to, but not including, index to.
The from/to indexes are not expressed in the index space of string. They refer instead to the replicated index space of the substring defined by string, start, and end.
It is an error if start = end, unless from = to, which is allowed as a special case.
Procedure: string-split string delimiter [grammar limit start end]
Returns a list of strings representing the words contained in the substring of string from start (inclusive) to end (exclusive). The delimiter is a string to be used as the word separator. This will often be a single character, but multiple characters are allowed for use cases such as splitting on "\r\n". The returned list will have one more item than the number of non-overlapping occurrences of the delimiter in the string. If delimiter is an empty string, then the returned list consists of strings that each contain a single character.
The grammar is a symbol with the same meaning as in the string-join procedure. If it is infix, which is the default, processing is done as described above, except an empty string produces the empty list; if grammar is strict-infix, then an empty string signals an error. The values prefix and suffix cause a leading/trailing empty string in the result to be suppressed.
If limit is a non-negative exact integer, at most that many splits occur, and the remainder of string is returned as the final element of the list (so the result will have at most limit+1 elements). If limit is not specified or is #f, then as many splits as possible are made. It is an error if limit is any other value.
To split on a regular expression, you can use SRFI 115’s regexp-split procedure.
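The following sketch works through the rules above on small inputs. The expected results are derived from the description, not captured from a Kawa session.

```scheme
;; Two delimiters yield three items.
(string-split "a,b,c" ",")                 ; ⇒ ("a" "b" "c")

;; A multi-character delimiter.
(string-split "one\r\ntwo" "\r\n")         ; ⇒ ("one" "two")

;; With the default infix grammar an empty string yields the empty list.
(string-split "" ",")                      ; ⇒ ()

;; limit = 1: one split, the remainder is the final element.
(string-split "a,b,c" "," 'infix 1)        ; ⇒ ("a" "b,c")

;; The suffix grammar suppresses the trailing empty string.
(string-split "a:b:" ":" 'suffix)          ; ⇒ ("a" "b")
```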
The following procedures create a mutable string, i.e. one that you can modify.
Procedure: make-string [k [char]]
Return a newly allocated mstring of k characters, where k defaults to 0. If char is given, then all elements of the string are initialized to char, otherwise the contents of the string are unspecified.
The 1-argument version is deprecated as poor style, except when k is 0.
Rationale: In many languages the most common pattern for mutable strings is to allocate an empty string and incrementally append to it. It seems natural to initialize the string with (make-string), rather than (make-string 0).
To return an immutable string that repeats k times a character char use string-repeat.
This is as R7RS, except the result is variable-size and we allow leaving out k when it is zero.
Procedure: string-copy string [start [end]]
Returns a newly allocated mutable (mstring) copy of the part of the given string between start and end.
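Because string literals are istrings, string-copy is a convenient way to obtain a mutable string from one. A minimal sketch (the variable name `greeting` is arbitrary):

```scheme
;; Copy the first five characters of a literal into a fresh mstring.
(define greeting (string-copy "hello world" 0 5))
greeting                          ; ⇒ "hello"

;; The copy is mutable, so it can be extended in place.
(string-append! greeting "!")
greeting                          ; ⇒ "hello!"
```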
The following procedures modify a mutable string.
Procedure: string-set! string k char
This procedure stores char in element k of string.
    (define s1 (make-string 3 #\*))
    (define s2 "***")
    (string-set! s1 0 #\?) ⇒ void
    s1 ⇒ "?**"
    (string-set! s2 0 #\?) ⇒ error
    (string-set! (symbol->string 'immutable) 0 #\?) ⇒ error
Performance note: Calling string-set! may take time proportional to the length of the string: First it must scan for the right position, like string-ref does. Then if the new character requires using a surrogate pair (and the old one doesn’t) then we have to make room in the string, possibly re-allocating a new char array. Alternatively, if the old character requires using a surrogate pair (and the new one doesn’t) then following characters need to be moved.
The function string-set! is deprecated: It is inefficient, and it very seldom does the correct thing. Instead, you can construct a string with string-append!.
Procedure: string-append! string value…
The string must be a mutable string, such as one returned by make-string or string-copy. The string-append! procedure extends string by appending each value (in order) to the end of string. Each value should be a character or a string.
Performance note: The compiler converts a call with multiple values to multiple string-append! calls. If a value is known to be a character, then no boxing (object-allocation) is needed.
The following example shows how to efficiently process a string using string-for-each and incrementally “build” a result string using string-append!.
(define (translate-space-to-newline str::string)::string
  (let ((result (make-string 0)))
    (string-for-each
      (lambda (ch)
        (string-append! result
          (if (char=? ch #\Space) #\Newline ch)))
      str)
    result))
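A shorter sketch of the same idea shows that several values, strings and characters mixed, can be appended in one call. The name buf is only illustrative.

```scheme
;; Start from an empty mutable string.
(define buf (make-string))

;; Append a string, a character, and another string, in order.
(string-append! buf "abc" #\- "def")
buf   ; ⇒ "abc-def"
```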
Procedure: string-copy! to at from [start [end]]
Copies the characters of the string from that are between start and end into the string to, starting at index at. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This is achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)
This is equivalent to (and implemented as):
(string-replace! to at (+ at (- end start)) from start end)
(define a "12345")
(define b (string-copy "abcde"))
(string-copy! b 1 a 0 2)
b ⇒ "a12de"
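The overlap rule can be seen in a sketch where the source and destination are the same string. The result below follows from the “as if copied into a temporary string” wording; the name s is illustrative.

```scheme
(define s (string-copy "abcdef"))

;; Copy "abcd" (indices 0 to 4 of s) back into s, starting at index 2.
;; The source is read as if it were snapshotted first.
(string-copy! s 2 s 0 4)
s   ; ⇒ "ababcd"
```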
Procedure: string-replace! dst dst-start dst-end src [src-start [src-end]]
Replaces the characters of string dst (between dst-start and dst-end) with the characters of src (between src-start and src-end). The number of characters from src may be different than the number replaced in dst, so the string may grow or contract. The special case where dst-start is equal to dst-end corresponds to insertion; the case where src-start is equal to src-end corresponds to deletion. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This is achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)
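The three cases (replacement, insertion, deletion) might look like this in practice. This is a sketch using a scratch string named s; the indices are chosen for this particular text.

```scheme
(define s (string-copy "hello world"))

;; Replacement: the 5 characters "hello" become the 7 characters "goodbye".
(string-replace! s 0 5 "goodbye")
s   ; ⇒ "goodbye world"

;; Insertion: dst-start equals dst-end, so nothing is removed.
(string-replace! s 7 7 ",")
s   ; ⇒ "goodbye, world"

;; Deletion: the source range is empty, so nothing is added.
(string-replace! s 0 9 "")
s   ; ⇒ "world"
```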
Procedure: string-fill! string fill [start [end]]
The string-fill! procedure stores fill in the elements of string between start and end. It is an error if fill is not a character or is forbidden in strings.
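For instance, filling a sub-range and then the whole string (a sketch; s is an illustrative name):

```scheme
(define s (string-copy "abcdef"))

;; Overwrite indices 2 (inclusive) to 4 (exclusive).
(string-fill! s #\- 2 4)
s   ; ⇒ "ab--ef"

;; With no range, every element is overwritten.
(string-fill! s #\*)
s   ; ⇒ "******"
```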
Using function-call syntax with strings is convenient and efficient. However, it has some “gotchas”.
We will use the following example string:
(! str1 "Smile \x1f603;!")
or if you’re brave:
(! str1 "Smile 😃!")
This is "Smile " followed by an emoticon (“smiling face with
open mouth”) followed by "!".
The emoticon has scalar value \x1f603 - it is not
in the 16-bit Basic Multilingual Plane,
and so it must be encoded by a surrogate pair
(#\xd83d followed by #\xde03).
The number of scalar values (characters) is 8,
while the number of 16-bit code units (chars) is 9.
The java.lang.CharSequence:length method
counts chars. Both the length and the
string-length procedures count characters. Thus:
(length str1) ⇒ 8
(string-length str1) ⇒ 8
(str1:length) ⇒ 9
Counting chars is a constant-time operation (since it
is stored in the data structure).
Counting characters depends on the representation used:
In general it may take time proportional to the length of
the string, since it has to subtract one for each surrogate pair;
however the istring type (gnu.lists.IString class)
uses an extra structure so it can count characters in constant time.
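The relationship between the two counts can be checked with the standard Java method java.lang.Character.codePointCount, which performs the kind of linear scan described above. This is only a sketch: it re-binds str1 with define so that it stands alone, and it assumes Kawa's usual colon notation for calling Java methods.

```scheme
(define str1 "Smile \x1f603;!")

;; Number of 16-bit code units, read directly from the object.
(str1:length)                                              ; ⇒ 9

;; Number of code points, counted by scanning the code units.
(java.lang.Character:codePointCount str1 0 (str1:length))  ; ⇒ 8

;; The Scheme-level length agrees with the code point count.
(string-length str1)                                       ; ⇒ 8
```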
Similarly we can index the string in 3 ways:
(str1 1) ⇒ #\m :: character
(string-ref str1 1) ⇒ #\m :: character
(str1:charAt 1) ⇒ #\m :: char
Using function-call syntax when the “function” is a string
and the argument is a single integer is the same as using string-ref.
Things become interesting when we reach the emoticon:
(str1 6) ⇒ #\😃 :: character
(str1:charAt 6) ⇒ #\d83d :: char
Both string-ref and the function-call syntax return the
real character, while the charAt method returns a partial character.
(str1 7) ⇒ #\! :: character
(str1:charAt 7) ⇒ #\de03 :: char
(str1 8) ⇒ throws StringIndexOutOfBoundsException
(str1:charAt 8) ⇒ #\! :: char
You can index a string with a list of integer indexes, most commonly a range:
(str [i ...])
is basically the same as:
(string (str i) ...)
Generally when working with strings it is best to work with substrings rather than individual characters:
(str [start <: end])
This is equivalent to invoking the substring procedure:
(substring str start end)
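Applied to the example string, the two forms might be used as follows. This is a sketch that re-binds str1 with define so that it stands alone; indices count characters, so the emoticon occupies a single position.

```scheme
(define str1 "Smile \x1f603;!")

;; Range indexing and substring select the same characters.
(str1 [0 <: 5])        ; ⇒ "Smile"
(substring str1 0 5)   ; ⇒ "Smile"

;; The emoticon is one character (index 6), followed by "!" (index 7).
(substring str1 6 8)   ; ⇒ "😃!"
```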
Indexing into a string (using for example string-ref)
is inefficient because of the possible presence of surrogate pairs.
Hence, given an index i, access normally requires linearly
scanning the string until we have seen i characters.
The string-cursor API is defined in terms of abstract “cursor values”, which point to a position in the string. This avoids the linear scan.
Typical usage is:
(let* ((str whatever)
       (end (string-cursor-end str)))
  (do ((sc::string-cursor (string-cursor-start str)
                          (string-cursor-next str sc)))
      ((string-cursor>=? sc end))
    (let ((ch (string-cursor-ref str sc)))
      (do-something-with ch))))
Alternatively, the following may be marginally faster:
(let* ((str whatever)
       (end (string-cursor-end str)))
  (do ((sc::string-cursor (string-cursor-start str)
                          (string-cursor-next-quick sc)))
      ((string-cursor>=? sc end))
    (let ((ch (string-cursor-ref str sc)))
      (if (not (char=? ch #\ignorable-char))
          (do-something-with ch)))))
The API is non-standard, but is based on that in Chibi Scheme.
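As a concrete instance of the typical usage pattern, the following sketch counts how often a character occurs in a string. The procedure name count-char is hypothetical, and the sketch assumes the string-cursor procedures are in scope, as in the usage shown above.

```scheme
;; Count occurrences of target in str by walking a cursor over it.
;; string-cursor-next steps over a whole character, so a surrogate
;; pair is visited once and compared as a single character.
(define (count-char str::string target)
  (let ((end (string-cursor-end str)))
    (do ((sc::string-cursor (string-cursor-start str)
                            (string-cursor-next str sc))
         ;; The step expression sees the cursor before it is advanced.
         (n 0 (if (char=? (string-cursor-ref str sc) target)
                  (+ n 1)
                  n)))
        ((string-cursor>=? sc end) n))))

(count-char "banana" #\a)                  ; ⇒ 3
(count-char "Smile \x1f603;!" #\x1f603)    ; ⇒ 1
```

Because no index is converted to a position along the way, each step of the loop is constant-time.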
Type: string-cursor
An abstract position (index) in a string. Implemented as a primitive int which counts the number of preceding code units (16-bit char values).
Procedure: string-cursor-start str
Returns a cursor for the start of the string. The result is always 0, cast to a
string-cursor.
Procedure: string-cursor-end str
Returns a cursor for the end of the string - one past the last valid character. Implemented as (as string-cursor (invoke str 'length)).
Procedure: string-cursor-ref str cursor
Return the character at the cursor. If the cursor points to the second char of a surrogate pair, returns #\ignorable-char.
Procedure: string-cursor-next string cursor [count]
Return the cursor position count (default 1) character positions forwards beyond cursor. For each count this may add either 1 or 2 (if pointing at a surrogate pair) to the cursor.
Procedure: string-cursor-next-quick cursor
Increment cursor by one raw char position, even if cursor points to the start of a surrogate pair. (In that case the next string-cursor-ref will return #\ignorable-char.) Same as (+ cursor 1) but with the string-cursor type.
Procedure: string-cursor-prev string cursor [count]
Return the cursor position count (default 1) character positions backwards before cursor.
Procedure: substring-cursor string [start [end]]
Create a substring of the section of string between the cursors start and end.
Procedure: string-cursor<? cursor1 cursor2
Procedure: string-cursor<=? cursor1 cursor2
Procedure: string-cursor=? cursor1 cursor2
Procedure: string-cursor>=? cursor1 cursor2
Procedure: string-cursor>? cursor1 cursor2
Is the position of cursor1 respectively before, before or same, same, after or same, or after, as cursor2.
Performance note: Implemented as the corresponding int comparison.
Procedure: string-cursor-for-each proc string [start [end]]
Apply the procedure proc to each character position in string between the cursors start and end.