In Judo, all values are objects. Every object has a type, which equates to a set of properties and methods. For convenience, Judo data types are categorized into primitive and non-primitive types. Primitive types are traditionally simple values; in Judo, primitive types include boolean, numbers, string, date, time and a special type,
Listing 5.1 multiline.judo

    a = [[*
      aaaaaaaaaaaaaaa
      aaaaaaaaaaaaaaa
      aaaaaaaaaaaaaaa
    *]];
    println '----', a, '----';

    b = [[[*
      bbbbbbbbbbbbbbb
      bbbbbbbbbbbbbbb
      bbbbbbbbbbbbbbb
    *]];
    println '====', b, '====';
The println command prints out all the textual values plus a newline. The result for this program is:
    ----aaaaaaaaaaaaaaa
    aaaaaaaaaaaaaaa
    aaaaaaaaaaaaaaa----
    ====
      bbbbbbbbbbbbbbb
      bbbbbbbbbbbbbbb
      bbbbbbbbbbbbbbb
    ====
Embedded expressions
In multi-line text literals, expressions can be embedded with the (* *) syntax. The embedded expressions are evaluated to string values and concatenated with the rest of the text. Strictly speaking, the text is no longer a literal but a template. The following example sends emails to a mailing list stored in a database table. (We will explain the usages later in this book; for now, just focus on how the text template is used. Auxiliary parts such as connecting to and disconnecting from servers are omitted.)
    executeQuery qry:
      SELECT last_name, salute, email FROM customers
    ;
    while qry.next() {
      sendMail
        from:    'support@judoscript.com'
        to:      qry.email
        subject: 'Daily digest for ' + Date().fmtDate('yyyy-MM-dd')
        body: [[*
          Dear (* qry.salute *) (* qry.last_name *),
          This is today's daily digest. Please don't reply to this mail.
          Thanks,
          -Judo support
        *]]
      ;
    }
In the body clause of the sendMail statement, we use a text template to generate the message body for each customer; the values come from the database query object.
Embedded variables and environment variables
The syntax for embedded expressions, (* *), applies only to multi-line text literals. Variables, however, including environment variables, can be embedded in all forms of string literals via the ${} syntax, which is familiar to Unix shell programmers. The rule is that if the named variable exists within the current Judo program, its value is used; otherwise, the namesake environment variable is retrieved and used. What's more, ${} can be used on its own, outside any string, as a shortcut for the system function getenv(), which explicitly accesses environment variables. As usual, let's see an example.
Listing 5.2 envvar.judo

    println 'Case I. \${CLASSPATH} --> ', ${CLASSPATH};
    println "Case Ia. '\${CLASSPATH}' --> ${CLASSPATH}";
    println "Case Ib. CLASSPATH -->", CLASSPATH;

    // set it and see that it is:
    println '... Set in-program variable CLASSPATH to ', CLASSPATH = 'hahaha';

    println 'Case II. \${CLASSPATH} --> ', ${CLASSPATH};
    println "Case IIa. getenv('CLASSPATH') --> ", getenv('CLASSPATH');
    println "Case IIb. '\${CLASSPATH}' --> ${CLASSPATH}";
    println 'Case III. CLASSPATH --> ', CLASSPATH;
This program walks through several test cases. Case I explicitly accesses the environment variable CLASSPATH. Case Ia yields the same result, but only because there is no namesake in-program variable yet. Prior to Case II, we set an in-program variable with the same name, CLASSPATH; Case II shows that a standalone ${CLASSPATH} ignores the in-program variable and still returns the environment variable. Case IIa shows how to use getenv('CLASSPATH') to accomplish the same. Case IIb contrasts with Case Ia: this time the variable CLASSPATH has been defined, so the in-program value is displayed. Lastly, a bare reference to CLASSPATH (Cases Ib and III) always refers to the in-program variable, which is why Case Ib prints nothing. The following is the result:
    Case I. ${CLASSPATH} --> c:\jlib\judo.jar;c:\jlib\classes12.zip
    Case Ia. '${CLASSPATH}' --> c:\jlib\judo.jar;c:\jlib\classes12.zip
    Case Ib. CLASSPATH -->
    ... Set in-program variable CLASSPATH to hahaha
    Case II. ${CLASSPATH} --> c:\jlib\judo.jar;c:\jlib\classes12.zip
    Case IIa. getenv('CLASSPATH') --> c:\jlib\judo.jar;c:\jlib\classes12.zip
    Case IIb. '${CLASSPATH}' --> hahaha
    Case III. CLASSPATH --> hahaha
This environment variable access operator will be discussed further elsewhere in this book.
Within a string, both the ${} and (* *) syntaxes can embed references to variables. (* *) can enclose any expression, while ${} falls back to an environment variable if no namesake variable exists. To access a global variable, write ${::xyz}. If the global variable does not exist, Judo still tries to find the namesake environment variable.
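The lookup rule for ${} can be illustrated with a small Java sketch. This is not Judo's implementation: the class EmbeddedVars, the method expand, and the vars and env parameters are hypothetical names chosen for illustration, and treating an undefined name as an empty string is an assumption.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedVars {
    // Matches ${name}; group 1 is the variable name.
    private static final Pattern REF = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces each ${name} with the program variable of that name if one
    // exists, otherwise with whatever the environment lookup returns.
    static String expand(String text, Map<String, String> vars,
                         Function<String, String> env) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = vars.containsKey(name) ? vars.get(name) : env.apply(name);
            // Assumption: a name found in neither place expands to "".
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = Map.of("CLASSPATH", "hahaha");
        // The in-program variable wins over the environment variable.
        System.out.println(expand("in-program: ${CLASSPATH}", vars, System::getenv));
        // No such program variable, so the environment is consulted.
        System.out.println(expand("environment: ${PATH}", vars, System::getenv));
    }
}
```

Passing the environment lookup as a function keeps the sketch easy to exercise without depending on the real environment.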
Regular expressions (regex for short) are a familiar topic to many scripting language programmers. As a mini-language for describing text patterns, regex brings tremendous power to text processing. People made great efforts to bring this power to Java, and JDK1.4 finally embraced it as part of the Java standard edition. Judo regex support is based on that of Java. Since it is available only in JDK1.4 and later, any use of regex under JDK1.3 causes runtime errors.
If you are a Java programmer, you are probably aware of the JDK1.4 regex API. If you are not a Java programmer, you don't have to be concerned with that API; all you have to know is the regex constructs. Judo does not reinvent the regex constructs but simply uses Java's, so it is good to know how Java does it and what Java supports.
The java.util.regex package in JDK1.4 onwards supports Java regex. The key class in this API is Pattern. A regex must be "compiled" into a Pattern instance, which is then applied to strings. With a compiled pattern you can match, replace, and split text.
The match operation, in Java, returns a Matcher object, which has methods to go through the various pieces of the match. Each matched piece is called a group, which has start and end indices in the original string. You can reset the matcher and match again. This object is treated as an intrinsic object in Judo and will be discussed in detail later.
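On the Java side, the compile-then-match cycle described above might look like the following sketch. It uses only the standard java.util.regex API; the class name MatcherTour and the helper findAll are hypothetical, and the sketch shows Java usage rather than Judo syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherTour {
    // Returns "text@start-end" for every match of regex in input.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // compile the regex once
        Matcher matcher = pattern.matcher(input);   // bind it to an input string
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            // group() is the matched text; start() and end() are its indices.
            hits.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("d.g", "zzdogzzdigzz"));

        Matcher m = Pattern.compile("d.g").matcher("zzdogzzdigzz");
        m.find();                       // first match
        m.reset();                      // rewind the matcher
        m.find();                       // the first match is found again
        System.out.println(m.group());
    }
}
```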
Regular expressions in Judo are the same as in Java; from this point on, we will just call them regular expressions, or simply regex's. This section introduces the details of regex's, which are, in effect, the specification defined by the java.util.regex.Pattern class in JDK1.4. For general knowledge about regex, please refer to the relevant literature, such as any Perl book. Here, we assume that you are familiar with some form of regex and just discuss the details of the regex syntax.
The following table shows the regex constructs:
Construct | Matches |
---|---|
Characters | |
x | The character x |
\\ | The backslash character |
\0n | The character with octal value 0n (0 <= n <= 7) |
\0nn | The character with octal value 0nn (0 <= n <= 7) |
\0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) |
\xhh | The character with hexadecimal value 0xhh |
\uhhhh | The character with hexadecimal value 0xhhhh |
\t | The tab character ('\u0009' ) |
\n | The newline (line feed) character ('\u000A' ) |
\r | The carriage-return character ('\u000D' ) |
\f | The form-feed character ('\u000C' ) |
\a | The alert (bell) character ('\u0007' ) |
\e | The escape character ('\u001B' ) |
\cx | The control character corresponding to x |
Character classes | |
[abc] | a , b , or c (simple class) |
[^abc] | Any character except a , b , or c (negation) |
[a-zA-Z] | a through z or A through Z , inclusive (range) |
[a-d[m-p]] | a through d , or m through p : [a-dm-p] (union) |
[a-z&&[def]] | d , e , or f (intersection) |
[a-z&&[^bc]] | a through z , except for b and c : [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z , and not m through p : [a-lq-z] (subtraction) |
Predefined character classes | |
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
POSIX character classes (US-ASCII only) | |
\p{Lower} | A lower-case alphabetic character: [a-z] |
\p{Upper} | An upper-case alphabetic character: [A-Z] |
\p{ASCII} | All ASCII: [\x00-\x7F] |
\p{Alpha} | An alphabetic character: [\p{Lower}\p{Upper}] |
\p{Digit} | A decimal digit: [0-9] |
\p{Alnum} | An alphanumeric character: [\p{Alpha}\p{Digit}] |
\p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}] |
\p{Print} | A printable character: [\p{Graph}] |
\p{Blank} | A space or a tab: [ \t] |
\p{Cntrl} | A control character: [\x00-\x1F\x7F] |
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F] |
\p{Space} | A whitespace character: [ \t\n\x0B\f\r] |
Classes for Unicode blocks and categories | |
\p{InGreek} | A character in the Greek block (simple block) |
\p{Lu} | An uppercase letter (simple category) |
\p{Sc} | A currency symbol |
\P{InGreek} | Any character except one in the Greek block (negation) |
[\p{L}&&[^\p{Lu}]] | Any letter except an uppercase letter (subtraction) |
Boundary matchers | |
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
Greedy quantifiers | |
X? | X , once or not at all |
X* | X , zero or more times |
X+ | X , one or more times |
X{n} | X , exactly n times |
X{n,} | X , at least n times |
X{n,m} | X , at least n but not more than m times |
Reluctant quantifiers | |
X?? | X , once or not at all |
X*? | X , zero or more times |
X+? | X , one or more times |
X{n}? | X , exactly n times |
X{n,}? | X , at least n times |
X{n,m}? | X , at least n but not more than m times |
Possessive quantifiers | |
X?+ | X , once or not at all |
X*+ | X , zero or more times |
X++ | X , one or more times |
X{n}+ | X , exactly n times |
X{n,}+ | X , at least n times |
X{n,m}+ | X , at least n but not more than m times |
Logical operators | |
XY | X followed by Y |
X|Y | Either X or Y |
(X) | X , as a capturing group |
Back references | |
\n | Whatever the n th capturing group matched |
Quotation | |
\ | Nothing, but quotes the following character |
\Q | Nothing, but quotes all characters until \E |
\E | Nothing, but ends quoting started by \Q |
Special constructs (non-capturing) | |
(?:X) | X , as a non-capturing group |
(?idmsux-idmsux) | Nothing, but turns match flags on - off |
(?idmsux-idmsux:X) | X , as a non-capturing group with the given flags on - off |
(?=X) | X , via zero-width positive lookahead |
(?!X) | X , via zero-width negative lookahead |
(?<=X) | X , via zero-width positive lookbehind |
(?<!X) | X , via zero-width negative lookbehind |
(?>X) | X , as an independent, non-capturing group |
Backslashes, escapes, and quoting
The backslash character (\) serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace.
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
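These escaping rules can be checked with a short Java sketch. Note that a Java string literal needs its own backslash escaping, so the regex \\ is written "\\\\" in Java source. The class name EscapeDemo and the helper isValid are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeDemo {
    // The regex \\ matches exactly one backslash character.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");
    // The regex \{ matches a literal left brace.
    static final Pattern LEFT_BRACE = Pattern.compile("\\{");
    // \Q...\E quotes everything in between, so + is not a quantifier here.
    static final Pattern QUOTED = Pattern.compile("\\Q1+1=2\\E");

    // True if the regex compiles, false if it is syntactically invalid.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(BACKSLASH.matcher("\\").matches());
        System.out.println(LEFT_BRACE.matcher("{").matches());
        System.out.println(QUOTED.matcher("1+1=2").matches());
        // A backslash before a letter that is not a defined escape is an error.
        System.out.println(isValid("\\y"));
        // A backslash before a non-alphabetic character is always allowed.
        System.out.println(isValid("\\#"));
    }
}
```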
Line terminators
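A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:
- A newline (line feed) character (\n),
- A carriage-return character followed immediately by a newline character (\r\n),
- A standalone carriage-return character (\r),
- A next-line character (\u0085),
- A line-separator character (\u2028), or
- A paragraph-separator character (\u2029).

If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.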
The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.
Groups and capturing
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1. ((A)(B(C)))
2. (A)
3. (B(C))
4. (C)
Group zero always stands for the entire expression.
Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, then its previously captured value, if any, will be retained if the second evaluation fails. Matching the string aba against the expression (a(b)?)+, for example, leaves group two set to b. All captured input is discarded at the beginning of each match.
Groups beginning with (? are pure groups that do not capture text and do not count towards the group total.
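The group numbering and retention rules can be observed directly in Java. The following is a minimal sketch using the standard Matcher API; the class name GroupDemo and the helper groups are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns groups 0..groupCount() for a full match, or an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);    // group 0 is the entire match
        }
        return out;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parentheses, left to right.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // Group 2 keeps "b" from the first iteration of the repeated group.
        System.out.println(Arrays.toString(groups("(a(b)?)+", "aba")));
        // (?:...) does not capture and is not counted.
        System.out.println(Arrays.toString(groups("(?:a)(b)", "ab")));
    }
}
```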
Regex modes
Regex patterns can be run in different modes. The following table lists all the modes, along with the mode symbols used in Judo regex.
Mode | Symbol | Meaning |
---|---|---|
CANON_EQ | c | Enable canonical equivalence, so that two characters will be considered to match if, and only if, their full canonical decompositions match. The expression a\u030A , for example, will match the string å when this flag is specified. By default, matching does not take canonical equivalence into account. |
CASE_INSENSITIVE | i | Enables case-insensitive matching. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag. Case-insensitive matching can also be enabled via the embedded flag expression (?i) . |
COMMENTS | x | Permits whitespace and comments in the pattern. In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression (?x) . |
DOTALL | s | Enables dotall mode, where the expression . matches any character, including a line terminator. By default this expression does not match line terminators. Dotall mode can also be enabled via the embedded flag expression (?s) . |
MULTILINE | m | Enables multiline mode, where the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence. Multiline mode can also be enabled via the embedded flag expression (?m) . |
UNICODE_CASE | u | Enables Unicode-aware case folding. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case folding can also be enabled via the embedded flag expression (?u) . |
UNIX_LINES | l | Enables Unix lines mode, in which only the '\n' line terminator is recognized in the behavior of . , ^ , and $ . Unix lines mode can also be enabled via the embedded flag expression (?d) . |
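The effect of these modes can be seen in a small Java sketch that passes the corresponding java.util.regex.Pattern flag constants. The class name ModeDemo and the helper find are hypothetical, and the sketch shows the Java flags rather than the Judo mode symbols from the table.

```java
import java.util.regex.Pattern;

class ModeDemo {
    // True if regex, compiled with the given flags, is found anywhere in input.
    static boolean find(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE, as a flag and as the embedded expression (?i)
        System.out.println(find("foo", Pattern.CASE_INSENSITIVE, "FOO"));
        System.out.println(find("(?i)foo", 0, "FOO"));
        // DOTALL lets . match a line terminator
        System.out.println(find("a.b", Pattern.DOTALL, "a\nb"));
        // MULTILINE lets ^ and $ match at line boundaries
        System.out.println(find("^b$", Pattern.MULTILINE, "a\nb\nc"));
        // COMMENTS ignores whitespace and # comments inside the pattern
        System.out.println(find("a b # two letters", Pattern.COMMENTS, "ab"));
        // UNIX_LINES: \r no longer counts as a line terminator, so . matches it
        System.out.println(find("a.b", Pattern.UNIX_LINES, "a\rb"));
    }
}
```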
Regex support in Judo is very simple; there are no extra operators or special syntax. The string data type has these regex methods: matches(), matchesStart(), replaceAll(), replaceFirst(), split() and match(). All these methods take a pattern as their first parameter. The pattern can be a single string, or an array of two strings: the first is the pattern and the second is the modes.
Regex's must be compiled by the regex engine before they can be used. This process can be expensive if repeated many times, so Judo caches all compiled regex's. The same regex in different modes counts as a different regex and is cached separately. Let us see some examples.
Listing 5.3 regex1.judo

    input = 'aAabFOOAABFooABfOOb';
    println input.replaceAll(['a*b','i'], '-'); // result: -FOO-Foo-fOO-

    input = 'zzdogzzdigzz';
    println input.replaceFirst('d.g','cat');    // result: zzcatzzdigzz

    input = 'boo:and:foo';
    println input.split(':',2);  // result: [boo,and:foo]
    println input.split(':',5);  // result: [boo,and,foo]
    println input.split(':',-1); // result: [boo,and,foo]
    println input.split('o',5);  // result: [b,,:and:f,,]
    println input.split('o',-1); // result: [b,,:and:f,,]
    println input.split('o',0);  // result: [b,,:and:f]
    println input.split('o');    // result: [b,,:and:f]
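For readers who know Java, the caching idea and a few of the calls above can be sketched with the standard API. This is only an illustration of caching compiled patterns per regex and mode, not Judo's actual cache; the class RegexCache and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

class RegexCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compiles each (regex, flags) pair at most once. The flags are part of
    // the key, so one regex in two modes yields two separate entries.
    static Pattern get(String regex, int flags) {
        return CACHE.computeIfAbsent(flags + ":" + regex,
                key -> Pattern.compile(regex, flags));
    }

    static String replaceAll(String input, String regex, int flags, String with) {
        return get(regex, flags).matcher(input).replaceAll(with);
    }

    static String[] split(String input, String regex, int limit) {
        return get(regex, 0).split(input, limit);
    }

    public static void main(String[] args) {
        System.out.println(replaceAll("aAabFOOAABFooABfOOb", "a*b",
                Pattern.CASE_INSENSITIVE, "-"));
        System.out.println(Arrays.toString(split("boo:and:foo", ":", 2)));
        // A negative limit keeps trailing empty strings; zero drops them.
        System.out.println(Arrays.toString(split("boo:and:foo", "o", -1)));
        System.out.println(Arrays.toString(split("boo:and:foo", "o", 0)));
    }
}
```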
TODO: To be expanded with more examples, including various modes and case studies such as log analyzers and such.
Both date and time literals are specified by the same Date keyword. All parts of a date/time can be specified in this sequence:
Date(year, month, day, hour, minute, second, milli-second)
where month is 1 through 12 and day is the day of the month; the rest are self-explanatory. The time components (hour, minute, and so on) can be omitted, in which case the missing components are 0. If no parameters are supplied, Date() represents the current time.
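As a rough Java analogy for these conventions (1-based months, omitted time components defaulting to 0), consider the sketch below. It uses java.time.LocalDateTime purely for illustration and says nothing about how Judo represents dates internally; the class DateLiteral and the method date are hypothetical.

```java
import java.time.LocalDateTime;

class DateLiteral {
    // Mirrors Date(year, month, day, hour, minute, second, millisecond):
    // month is 1 through 12, and any omitted time component is 0.
    static LocalDateTime date(int year, int month, int day, int... time) {
        int hour   = time.length > 0 ? time[0] : 0;
        int minute = time.length > 1 ? time[1] : 0;
        int second = time.length > 2 ? time[2] : 0;
        int millis = time.length > 3 ? time[3] : 0;
        // LocalDateTime takes nanoseconds, so convert from milliseconds.
        return LocalDateTime.of(year, month, day, hour, minute, second,
                millis * 1_000_000);
    }

    public static void main(String[] args) {
        System.out.println(date(2004, 3, 15));          // midnight
        System.out.println(date(2004, 3, 15, 13, 30));  // seconds default to 0
        System.out.println(LocalDateTime.now());        // analogous to Date()
    }
}
```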
Judo is a great tool for creating network client programs. Security is one of the major concerns in any distributed environment. Passwords are the most commonly used mechanism, but leaving plain-text passwords in scripts or configuration files is always a huge security hole. Judo addresses this issue by introducing a special data type, Secret. Secret values are created with this constructor:
Secret( encrypted_password [ , decryptor ] )
The decryptor is any object that implements the method decrypt(), which takes a string and returns another. It does not matter whether it is implemented in Judo or Java, though most likely it is in Java. The encrypted value must be a text string; how to obtain it is up to the crypto package that your decryptor is part of. If no decryptor is specified, or the decryptor is not found (i.e., it evaluates to null), the password is returned as-is by default. But would this Secret value really protect the password? Judo is open source; what if an attacker plants a sniffer in the code that captures the password returned by the decryptor?
The idea behind the Secret mechanism is to use different decryptor objects in different environments. Take a look at this example:
    decryptor = null;
    {
      decryptor = new java::com.xxx.util.MyCrypto;
    catch:
      ; // ignore any exceptions.
    }

    // Use a Secret value as password to connect to a database:
    connect to dbUrl, 'dbuser', Secret('abcdef', decryptor);
    ......
This script is run in both a test environment and the production environment. Each environment has its own database schemas, user names and passwords. In the test environment, the Java class com.xxx.util.MyCrypto is not in the classpath, so decryptor ends up being null when it is passed to the Secret constructor; the password for the connect command is therefore literally abcdef, which is fine. In production, the Java class com.xxx.util.MyCrypto is deployed in the classpath (the one that runs Judo), so decryptor holds an instance of that class; its decrypt() method is called and turns abcdef into THIS IS SOMETHING YOU'D NEVER EVER HAVE GUESSED, which is the password for the production database. Because only the production deployer has the decryptor Java class, security is not compromised by the script, which is checked in to the Configuration Management system that every developer has access to. The same script can easily be run as-is in various environments, including production.
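The fallback behavior described above can be modeled in a few lines of Java. This is a sketch of the idea only, not Judo's Secret implementation: the names SecretDemo, Decryptor, Secret, reveal and loadDecryptor are all hypothetical, and the string-reversing decryptor merely stands in for a real crypto class.

```java
class SecretDemo {
    // Anything with a decrypt(String) method can act as a decryptor.
    interface Decryptor {
        String decrypt(String encrypted);
    }

    static final class Secret {
        private final String encrypted;
        private final Decryptor decryptor;   // may be null

        Secret(String encrypted, Decryptor decryptor) {
            this.encrypted = encrypted;
            this.decryptor = decryptor;
        }

        // With no decryptor, the stored text is returned as-is.
        String reveal() {
            return decryptor == null ? encrypted : decryptor.decrypt(encrypted);
        }
    }

    // Returns an instance of the named class, or null if it is not on the
    // classpath or cannot be used as a Decryptor (all failures are ignored).
    static Decryptor loadDecryptor(String className) {
        try {
            return (Decryptor) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Test environment: the class is missing, so the text passes through.
        Decryptor missing = loadDecryptor("com.xxx.util.MyCrypto");
        System.out.println(new Secret("abcdef", missing).reveal());

        // Production stand-in: a toy decryptor that reverses the text.
        Decryptor toy = s -> new StringBuilder(s).reverse().toString();
        System.out.println(new Secret("abcdef", toy).reveal());
    }
}
```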