In this chapter:

Working in the File System
» The Current Directory and Change Directory
» Make and Remove Directory
» Rename and Move Files
List and Process Files
» File Selection
» Obtain Selected Files in a List or a Tree
» Count Files and Directories
» Get Statistics about Files
» Summary of listFiles Return Values
» Remove and Set Attributes on Selected Files
» Run Shell Commands on Selected Files
» Arbitrarily Process Selected Files
» List Files and Get Information in Archives
» Process Files in Zip Archives
Copy and Archive Files
» Copy a Single File to a Different Name
» Copy Files to a Different Location
» Copy Files into Archives
» Save Multiple File Sets into a Single Archive
» Copy Public Internet Resources
Other File Utilities
» Encrypting and Decrypting Files and Data
» Chopping and Assembling Files

Book: The Judo Language 0.9

Chapter 15. File System and Archives

By James Jianbo Huang

Synopsis: Judo has powerful commands for processing files and archives. You can change the current working directory to anywhere in the file system, make or remove directories, and move or rename files and directories. The listFiles command selectively lists files in a directory or an archive file. It can return the selected files in an array or a tree, or compute statistics about them such as count, size, compressedSize, lines and words. Alternatively, it can perform an action on each selected file: remove, setReadOnly, setFileTime, any OS shell command via the exec clause, or custom code. Another powerful command is copy, which uses the same file selection options as listFiles and copies files to another file system location or into zip, jar and tar archives. It has built-in support for zip and jar file creation, including the choice to compress or store entries and the manifest text. Beyond these file system and archive operations, Judo also provides file utilities such as encrypting and decrypting files, and chopping and re-assembling them.

Working in the File System

The Current Directory and Change Directory

Judo maintains its own current directory in the local file system. This current directory is the default location for subsequent file operations as well as for the exec command (see 16. Run Executables), which runs native executables. It starts as the directory of the JVM process and can be changed via the cd command:

ChangeDirectory ::= cd [ Expr ( , Expr )* ]

If no parameters are specified, the cd command changes to the user's home directory. A path expression can start with ~ to represent the home directory. The path can be absolute or relative to the current directory in Judo. You can always use forward slashes as the file separator; on Windows, backslashes work as well. The current directory within Judo can be obtained with the system function curDir().

C:\>type pwd.judo
cd;          println curDir();
cd '~/..';   println curDir();
cd 'jhuang'; println curDir();

C:\>java judo -q pwd.judo
C:/Documents and Settings/jhuang
C:/Documents and Settings
C:/Documents and Settings/jhuang

The cd command can take multiple paths; the end result is the last path. This can be useful to dynamically move to a specific directory depending on different situations.
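
As a minimal sketch of the multi-path form, the following uses only the cd command and the curDir() function shown above. It assumes that the paths are applied one after another, so that a relative path is resolved against the directory reached by the preceding one:

```judo
// Assumption: the paths are applied in order, left to right.
cd '~', '..';     // first the home directory, then its parent directory
println curDir(); // under that assumption, the same directory that cd '~/..' reaches
```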

Make and Remove Directory

In Judo, you make a new directory with the mkdir command and remove one with the rmdir command:

MakeDirectory ::= mkdir Expr

If you make a directory such as 'x/y/z/' and directory 'x' does not exist, Judo creates the intermediate directories as well. If the target directory already exists, mkdir silently does nothing.

RemoveDirectory ::= rmdir Expr [ force ]

Normally, a directory must be empty before it can be removed. To remove a directory that is not empty, use the force option.
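
The following sketch puts both commands together. The directory names are hypothetical placeholders, and the comments simply restate the behavior described above:

```judo
mkdir 'x/y/z';    // creates x, x/y and x/y/z as needed
mkdir 'x/y/z';    // the directory already exists, so nothing happens
rmdir 'x/y/z';    // z is empty, so a plain rmdir is enough
rmdir 'x' force;  // x still contains y, so the force option is required
```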

Rename and Move Files

Renaming and moving files are both achieved by the move command. Its syntax is:

MoveCommand ::= move Expr , Expr

The first parameter is the source file or directory; the second is the target, which can be either a non-existing path name or an existing directory. If the target is a directory, the source file or directory is moved into that directory. If the target exists and is not a directory, an error occurs. Again, let's experiment on the command line:

C:\z>dir
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\z

09/04/2004  07:05a      <DIR>          .
09/04/2004  07:05a      <DIR>          ..
09/04/2004  07:04a                   5 alfa
09/04/2004  07:05a      <DIR>          ddd
               1 File(s)              5 bytes
               3 Dir(s)  38,403,932,160 bytes free

C:\z>java judo -x "move 'alfa', 'beta'"

C:\z>java judo -x "move 'beta', 'ddd'"

C:\z>java judo -x "move 'ddd/beta', 'gamma'"

C:\z>dir
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\z

09/04/2004  07:06a      <DIR>          .
09/04/2004  07:06a      <DIR>          ..
09/04/2004  07:04a                   5 gamma
09/04/2004  07:05a      <DIR>          ddd
               1 File(s)              5 bytes
               3 Dir(s)  38,403,796,992 bytes free

List and Process Files

The listFiles command is used to find and process files or directories. Its syntax is:

ListCommand   ::= ( listFiles [ < Expr > ] | ls ) [ FileSelection ] ( ListOption )* [ StatsOption | Action ]
FileSelection ::= Expr ( except Expr | in Expr )*
ListOption    ::= ordered [ by ( date | size | extension ) ] | limit Expr | as tree | recursive | noHidden | fileOnly | showDir | dirOnly
StatsOption   ::= count | ( size | compressedSize | lines | words | perFile )+
Action        ::= remove | setFileTime [ Expr ] | setReadOnly | addToClasspath | exec Expr | Block

In plain English, listFiles uses a FileSelection along with zero or more list options to find files and directories (or folders) in a file system directory or in an archive file, and then does one of the following three things:

return the found files in an array or a tree,
return some statistics, such as count, size, lines or words, or
perform an operation on each of the found files or directories, which can be:
1. a predefined operation such as remove, setFileTime or setReadOnly,
2. a shell command executed with the exec clause, or
3. custom processing in a Block of code.

The ls command works like listFiles but just prints out the files and returns nothing. It emulates the Unix shell command ls or Windows dir command. For ls, no actions are allowed.

This command is rich and has many options. Only certain options and actions are valid and compatible with each other; the Judo parser enforces these rules and issues error messages if incompatible options are specified. Let's see how this command can be used to achieve various purposes on files, starting with the FileSelection.

By default, fileOnly is on, meaning that only file names are returned or displayed. To show directory names along with file names, use the showDir option. To see directory names only, use dirOnly.
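
As an illustration, the three variants below list the same hypothetical directory with the ls command; only the last option differs:

```judo
ls '*' in 'c:/temp';          // file names only (the default, fileOnly)
ls '*' in 'c:/temp' showDir;  // file names and directory names
ls '*' in 'c:/temp' dirOnly;  // directory names only
```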

File Selection

The FileSelection consists of an inclusive list, an exclusive list and a base, all of which are optional. Here are a couple of examples:

C:\>type ls.judo
ls '*.java, *.judo' except '*/save/*, */alfa*' in 'c:/temp';

C:\>java judo -q ls.judo
C:/temp/mytest.judo
C:/temp/Test.java

The ls command here lists and prints the files in a file system directory. The following example lists and prints files in a jar archive:

C:\devenv\envroot\projects\judoscript-0.9\testcases\2.ess_fs_archive>type ls_jar.judo
ls 'src/*.java' except '*/save/*, */alfa*' in 'c:/src.jar';

C:\>java judo -q ls_jar.judo
src/judo.java
src/juspProc.java
src/rjudo.java
src/rjudo_server.java
src/vjudo.java

The inclusive and exclusive (except) lists are expressions evaluated as strings; each string value is a comma-separated list of path name patterns that may contain the wildcard characters * (for zero or more characters) and ? (for a single character). Because path names are all absolute during the list operation, it is wise to prefix path patterns with *. If the inclusive list is not present, it is assumed to be '*', except for the remove operation. If the base directory or archive file is not specified, the list operation starts at the current directory.

There is one tricky situation with the inclusive list. Since the inclusive list can be any expression, it can also be a variable name. If a variable name happens to be a listFiles option name, confusion arises:

fileOnly = '*.java, *.judo';
listFiles fileOnly; // !#@~*$

The rule is that listFiles option names take precedence over variable names. So a variable named fileOnly can never be used in the listFiles and ls commands; you have to use a different variable name.
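
A minimal sketch of the workaround, where the variable name srcPatterns is an arbitrary choice that does not coincide with any listFiles option name:

```judo
srcPatterns = '*.java, *.judo';  // any name that is not a listFiles option
listFiles srcPatterns;           // the variable is evaluated as the inclusive list
for x in $_ { println x; }
```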

The listFiles and ls commands can also take options like recursive, noHidden, fileOnly and dirOnly; these direct the command and affect the file selection in different ways. Their meanings should be self-evident.

So far we have seen the simple way of listing files, that is, printing them out via the ls command. In contrast, listFiles either returns the result to the script for programmatic processing or processes the files in-line with the specified actions. We will cover all of these usages in the following sections.

Obtain Selected Files in a List or a Tree

In the simplest case, listFiles returns the found file path names in an Array and stores it in the predefined local variable $_. The following example uses listFiles to get a list of file names and then prints them out, behaving just like ls:

Listing 15.1 ls_clone.judo
listFiles '*.java, *.judo' except '*/save/*, */alfa*' in 'c:/temp';

for x in $_ {
  println x;
}

Be careful with $_: a number of statements in Judo use it, so its value can be overwritten without notice. It is safer to assign it to a dedicated variable immediately after the listFiles command.
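
For example, the following variation of Listing 15.1 copies the result into a variable right away; the variable name files is just an illustration:

```judo
listFiles '*.java, *.judo' except '*/save/*, */alfa*' in 'c:/temp';
files = $_;   // save the result before another statement overwrites $_

for f in files {
  println f;
}
```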

When the recursive option is specified, the listFiles and ls commands recurse into sub-directories for all files and/or directories that match the file selection criteria. The returned files can be sorted with the ordered clause: by path name or, using ordered by, by date (file time), size or extension.
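
A short sketch combining these options, assuming a hypothetical source tree under c:/src. The sort direction is not stated in this chapter, so the comment does not claim one:

```judo
// List all Java files under c:/src, including sub-directories, sorted by size.
ls '*.java' in 'c:/src' recursive ordered by size;
```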

Sometimes, you just need to find a few files. In this case, specify the limit clause to improve performance. The following code snippet assumes a junit.jar lies somewhere in the vicinity of the script, finds it and adds it to the classpath:

Listing 15.2 prepare_unittest.judo
cd #script.getFilePath(), '..'; // move up one directory
listFiles '*/junit.jar' recursive limit 1;
if $_.length <= 0 {
  println <err> "junit.jar is not found. Can't proceed with testing.";
  return;
}
#classpath.add( $_[0] );

// now, do some unit testing
// ....

Speaking of classpath, listFiles provides another action, addToClasspath, that makes this even easier:

Listing 15.3 addtoclasspath.judo
cd #script.getFilePath(), '../lib/'; // move to the lib/ directory
listFiles '*.jar, *.zip' addToClasspath;
println #classpath; // verify

Most of the time, listFiles returns path names as an Array in the $_ local variable. You can also merge the result into an existing array like this:

Listing 15.4 add_libjars_2cp.judo
/*
 * To add all the jar/zip files in ${deploy}/lib and ${thirdparty}/lib
 * into the (user) classpath.
 */
arr = [];
listFiles <arr> '*.jar, *.zip' in '${deploy}/lib';
listFiles <arr> '*.jar, *.zip' in '${thirdparty}/lib';
#classpath.add(arr);

By the way, you can do addToClasspath on the second listFiles command to add all the found files to the classpath.

Sometimes it is convenient to process the content of a directory as a tree rather than a list (or array). This can be easily done with the as tree option:

C:\>type get_tree.judo
listFiles '*.java, *.judo' except '*/save/*, */alfa*' in 'c:/temp'
  recursive as tree;

println $_;

for x in $_.getChildren() {
  println x;
}

C:\>java judo -q get_tree.judo
{isDir=true,path=C:/temp}
{isDir=true,path=C:/temp/Adobe}
{isDir=true,path=C:/temp/Cookies}
{isDir=true,path=C:/temp/History}
{isDir=true,path=C:/temp/Temporary Internet Files}
{isDir=true,path=C:/temp/VBE}
{path=C:/temp/Test.java}
{path=C:/temp/mytest.judo}

The returned value is a TreeNode object for the root directory, which may have one or more child nodes holding information about the files and directories. Each node has a path attribute and an isDir boolean attribute. Needless to say, only directory nodes have child nodes. Refer to Tree Node for how to work with trees. For instance, you can use TreeNode's traversal methods like bfsAllNodes() or dfsAllNodes() to display all the nodes:

Listing 15.5 get_dir_tree.judo
listFiles '*.java, *.judo' except '*/save/*, */alfa*' in 'c:/temp'
  dirOnly recursive as tree;

for x in $_.dfsAllNodes() {
  println x;
}

The listFiles command can not only find files but also calculate statistics about them: size, lines, words and count; for files within archives, you can get compressedSize, too. The return values of these commands differ depending on the options. The count option is used on its own, whereas size, compressedSize, lines and words can be used together.

Count Files and Directories

The count option returns counts of the selected files and directories. It returns an array of three elements: the count of files, the count of directories (or folders), and the total count. The last element is redundant: it is always the sum of the other two. The following example demonstrates its use by running some code directly on the command line:

C:\src>java judo -x "listFiles '*.java' count; println $_"
[5,0,5]

C:\src>java judo -x "listFiles '*.java' recursive count; println $_"
[289,29,318]

C:\src>java judo -x "listFiles '*.java' recursive fileOnly count; println $_"
[289,0,289]

Get Statistics about Files

You can get file statistics with the size, compressedSize, lines and words options. These options can also be used together. When one option is used on its own, the return value is a number; if multiple options are used together, an array of numbers is returned.

Listing 15.6 dirstats.judo
listFiles '*' in 'C:/src/com/judoscript' dirOnly;
for x in $_ {
  // get status for each directory
  listFiles '*.java, *.jj' in x recursive size lines words;
  println $_[0]:>8, ' ', $_[1]:>6, ' ', $_[2]:>6, ' ', x;
}

The result is something like this:

   35889     993    4286  C:/src/com/judoscript/xml
  524476   15003   63402  C:/src/com/judoscript/util
   20219     455    2321  C:/src/com/judoscript/user
   40954    1273    4451  C:/src/com/judoscript/studio
  215570    6361   20776  C:/src/com/judoscript/parser
   12537     391    1398  C:/src/com/judoscript/jusp
    7181     195     883  C:/src/com/judoscript/jdk14
   29410     654    2911  C:/src/com/judoscript/gui
   32918     877    3775  C:/src/com/judoscript/ext
   70331    1886    7652  C:/src/com/judoscript/db
  297949    8179   33558  C:/src/com/judoscript/bio
       0       0       0  C:/src/com/judoscript/ant

And we just found an empty directory that can be removed.
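
When only one statistics option is given, the result is a single number rather than an array. The following sketch contrasts the two cases for a hypothetical source tree under C:/src; the values are read from $_ as described above:

```judo
listFiles '*.java' in 'C:/src' recursive lines;
println 'Total lines: ', $_;       // one option: $_ is a number

listFiles '*.java' in 'C:/src' recursive size lines;
println 'Total size:  ', $_[0];    // several options: $_ is an array,
println 'Total lines: ', $_[1];    // in the order the options were given
```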

You can also get the statistics per file with the perFile option. The return value is a SortedMap in which each path name is mapped to a number or to an array of numbers, depending on how many options are specified. The following example shows both cases:

Listing 15.7 filestats.judo
listFiles '*.java' fileOnly lines perFile;
for f in $_ {
  stats = $_.(f);
  println stats:>8, ' ', f;
}

listFiles '*.java' fileOnly size lines words perFile;
for f in $_ {
  stats = $_.(f);
  println stats[0]:>8, ' ', stats[1]:>6, ' ', stats[2]:>6, ' ', f;
}

    ....  .......
      35  C:\src\com\judoscript\ValueBase.java
     369  C:\src\com\judoscript\ValueSpecial.java
      45  C:\src\com\judoscript\Variable.java
    1460  C:\src\com\judoscript\VariableAdapter.java
     273  C:\src\com\judoscript\VersionInfo.java
      83  C:\src\com\judoscript\_Thread.java
   16991  TOTAL
   .....    ....    ....  .......
    1409      35     190  C:\src\com\judoscript\ValueBase.java
   15734     369    1860  C:\src\com\judoscript\ValueSpecial.java
    1759      45     226  C:\src\com\judoscript\Variable.java
   63038    1460    6331  C:\src\com\judoscript\VariableAdapter.java
   14173     273    1762  C:\src\com\judoscript\VersionInfo.java
    2665      83     317  C:\src\com\judoscript\_Thread.java
  628614   16991   73332  TOTAL

For Unix users, features like size, lines and words may be reminiscent of the wc utility.

Summary of listFiles Return Values

Since the listFiles command can be used for so many purposes, let us summarize its return values.

**Table 15.1 Return Values for listFiles**
Sample Command	Return Value
`listFiles`	`$_` is an `Array` of paths.
`listFiles as tree`	`$_` is a `TreeNode` with these attributes: `path` and `isDir`.
`listFiles count`	`$_` is an `Array` of three elements: `$_[0]` is the count of files, `$_[1]` is the count of directories or folders, and `$_[2]` is the sum of the other two.
`listFiles size`	`$_` is a number as the cumulative size of all files. The option can also be `compressedSize`, `lines` or `words`.
`listFiles size perFile`	`$_` is a `SortedMap`, where each path is mapped to a number as the size of that file.
`listFiles size lines words`	`$_` is an `Array` of three numbers for the size, number of lines and number of words.
`listFiles lines size`	`$_` is an `Array` of two numbers: the number of lines and the size, in the order of the options.
`listFiles lines size perFile`	`$_` is a `SortedMap`, where each path is mapped to an `Array` of two numbers: the number of lines and the size.
`listFiles exec '..'`	`$_` is an `Array` of all the path names processed. See Run Shell Commands on Selected Files.
`listFiles { ... }`	`$_` is an `Array` of all the path names processed. See Arbitrarily Process Selected Files.

Remove and Set Attributes on Selected Files

The listFiles command natively supports three operations on selected files via the keywords remove, setFileTime and setReadOnly. The command returns the path names of all the files and directories affected.

The remove command can remove files and empty directories in the file system. The following example removes all the files left over by the vi editor:

C:\src>java judo -x "listFiles '*~' recursive remove; println $_"

To remove a directory that is not empty, you would have to use the rmdir command with the force option, described earlier in this chapter.

The setFileTime command can optionally take a Date value. If the time value is not specified, the current time is used, and this command becomes much like the Unix touch utility.
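
Used without a time value, setFileTime can be sketched as a touch replacement like this; the file pattern is only an example:

```judo
listFiles '*.java' recursive setFileTime;  // no time given: the current time is used
println $_;                                // the path names of the files affected
```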

The setReadOnly command is used to set the read-only flag on a file or directory in the file system. The Java platform does not support setting files to be read-write, so this setReadOnly command does not have a counterpart to set files and directories to be read-write.

Of course, you can also use the operating system's commands or utilities to do these operations via the exec clause explained next.

Run Shell Commands on Selected Files

For the selected files in the listFiles command, you can apply any operating system shell commands or utilities on them via the exec clause. The following is an example that does the same as setReadOnly:

Listing 15.8 list_exec.judo
listFiles exec isWindows() ? 'attrib +r $_' : 'chmod 444 $_';

The exec clause takes an operating system command-line, which uses $_ to represent the current file being processed. In this example, we use the attrib utility on the Windows platform or the chmod command on Unix to set the read-only attribute for files and directories. You can easily modify this to make files writable as well.
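
As a sketch of that modification, the variant below clears the read-only state instead. It assumes the standard attrib -r switch on Windows and a chmod mode of 644 on Unix; adjust the mode to your needs:

```judo
// Make the selected files writable again (the reverse of setReadOnly).
listFiles '*.java' exec isWindows() ? 'attrib -r $_' : 'chmod 644 $_';
```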

Sometimes it is more efficient to run native executables directly if possible. This is because the listFiles command runs the command line once for each file or path, whereas a native executable can take wildcard characters in its parameters to handle a whole set of files, for instance:

% chmod 666 *.java

The processed file names are also returned in an array as $_.

The exec clause is really a shortcut. For example, listFiles exec 'chmod 666 $_'; is a shortcut for:

listFiles;
for x in $_ { exec 'chmod 666 ${x}'; }

This is also true for arbitrary file processing that is discussed next.

Arbitrarily Process Selected Files

In addition to using the exec clause to run operating system commands on the selected files, you can specify a block of code to process each selected file. Again, the name of the file being processed is stored in $_. The following example counts the total blank and non-blank lines in all the files:

Listing 15.9 count_blank.judo
cnt1 = 0;
cnt2 = 0;
listFiles '*.java, *.jj' in 'c:/src' recursive
{
  do $_ as lines {
    // now, $_ has become the line just read!
    if ($_.isEmpty()) ++cnt1;
    else              ++cnt2;
  }
}
println '    Blank lines: ', cnt1:>7;
println 'Non-Blank lines: ', cnt2:>7;
println '    Total Files: ', $_.length:>7;

The result is:

    Blank lines:    5983
Non-Blank lines:   47562
    Total Files:     282

This command also returns all the path names in an array. So in this example, the three occurrences of $_ all have different meanings. The first, in do $_ as lines, is a string holding the path name being processed; the second, in if ($_.isEmpty()), is the line just read from the file; and the last, in the final println statement ($_.length), is the array of all the path names just processed.

Next is a more elaborate example. We will go through the source code tree and update the copyright note for all the source files owned by the project. The modified source files will be generated in another directory and left there.

Listing 15.10 upd_copyright.judo
src = 'C:/src/';
src_len = src.length();
target = 'C:/temp/new_src/';
mkdir target;

listFiles '*.java, *.jj' in src fileOnly recursive
{
  // Construct the path for the new file:
  var path = $_.getFilePath();
  var file = $_.getFileName();
  var newPath = target + path.substring(src_len);
  mkdir newPath; // make sure the dir is there; ok if exists.
  var newfile = openTextFile(newPath + file, 'w');

  // Process the lines in the source file:
  var updated = false;
  do $_ as lines {
    // now, $_ holds the line just read.
    if !updated && $_.startsWith(' * Copyright (C) 2001-') {
      println <newfile> ' * Copyright (C) 2001-', #year, ' James Huang http://www.judoscript.com';
      updated = true;
    } else {
      println <newfile> $_;
    }
  }

  // Done.
  newfile.close();
  println 'Updated ', path.substring(src_len), file;
}

This concludes our discussion of the listFiles command, which does much more than its name suggests. It is really a file processor: it lets you obtain a set of files and directories and process them individually (via the exec clause or a code block) or collectively (such as getting the result in a tree via the as tree option). It can also return a number of statistics. This single command covers the functionality of several popular shell utilities, such as ls, wc and touch on Unix.

Beyond individual file processing, Judo has some other commands to do copying and moving. These are covered in the rest of this chapter.

List Files and Get Information in Archives

As mentioned earlier, the listFiles and ls commands can be applied to the contents of zip, jar and tar (gzipped or not) archives. Files and folders contained in archives are all read-only, so actions like remove, setFileTime and setReadOnly don't apply, and running shell commands with exec generally doesn't make sense. Information gathering is valid, and you can do read-only processing on files in zip or jar archives. Such processing is not possible for files in a tar archive, due to its sequential nature.

Let's take a look at some examples involving zip and tar archives, starting with a zip archive like this:

C:\>jar tvf awebapp.zip
   276 Tue Jun 15 13:33:32 PDT 2004 index.jsp
   347 Mon Aug 30 13:32:30 PDT 2004 login.jsp
     0 Mon Aug 30 13:28:26 PDT 2004 META-INF/
    55 Tue Jun 15 13:33:32 PDT 2004 META-INF/MANIFEST.MF
     0 Mon Aug 30 13:34:16 PDT 2004 WEB-INF/
     0 Mon Aug 30 13:41:06 PDT 2004 WEB-INF/classes/
     0 Mon Aug 30 13:38:44 PDT 2004 WEB-INF/classes/foo/
     0 Mon Aug 30 13:39:06 PDT 2004 WEB-INF/classes/foo/bar/
  1604 Mon Aug 30 13:39:06 PDT 2004 WEB-INF/classes/foo/bar/LoginDAO.class
     0 Mon Aug 30 13:36:22 PDT 2004 WEB-INF/lib/
118726 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/lib/commons-beanutils.jar
 31605 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/lib/commons-logging.jar
498051 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/lib/struts.jar
     0 Mon Aug 30 13:28:24 PDT 2004 WEB-INF/src/
  3672 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/src/build.xml
     0 Mon Aug 30 13:40:54 PDT 2004 WEB-INF/src/java/
     0 Mon Aug 30 13:37:46 PDT 2004 WEB-INF/src/java/foo/
     0 Mon Aug 30 13:38:28 PDT 2004 WEB-INF/src/java/foo/bar/
  1026 Mon Aug 30 13:38:28 PDT 2004 WEB-INF/src/java/foo/bar/LoginDAO.java
  1923 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/src/README.txt
  8868 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/struts-bean.tld
 66192 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/struts-html.tld
 14511 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/struts-logic.tld
  1942 Tue Jun 15 13:33:32 PDT 2004 WEB-INF/web.xml

C:\>java judo -q "ls '*' in 'awebapp.zip' recursive"
WEB-INF/
WEB-INF/web.xml
WEB-INF/struts-logic.tld
WEB-INF/struts-html.tld
WEB-INF/struts-bean.tld
WEB-INF/src/
WEB-INF/src/README.txt
WEB-INF/src/java/
WEB-INF/src/java/foo/
WEB-INF/src/java/foo/bar/
WEB-INF/src/java/foo/bar/LoginDAO.java
WEB-INF/src/build.xml
WEB-INF/lib/
WEB-INF/lib/struts.jar
WEB-INF/lib/commons-logging.jar
WEB-INF/lib/commons-beanutils.jar
WEB-INF/classes/
WEB-INF/classes/foo/
WEB-INF/classes/foo/bar/
WEB-INF/classes/foo/bar/LoginDAO.class
META-INF/
META-INF/MANIFEST.MF
login.jsp
index.jsp

C:\>java judo -q "ls '*' in 'awebapp.zip' fileOnly recursive"
WEB-INF/web.xml
WEB-INF/struts-logic.tld
WEB-INF/struts-html.tld
WEB-INF/struts-bean.tld
WEB-INF/src/README.txt
WEB-INF/src/java/foo/bar/LoginDAO.java
WEB-INF/src/build.xml
WEB-INF/lib/struts.jar
WEB-INF/lib/commons-logging.jar
WEB-INF/lib/commons-beanutils.jar
WEB-INF/classes/foo/bar/LoginDAO.class
META-INF/MANIFEST.MF
login.jsp
index.jsp

The following program gets the overall sizes of the top level folders:

Listing 15.11 dirstats_zip.judo
listFiles '*' in 'awebapp.zip' dirOnly;
for x in $_ {
  // get status for each directory
  listFiles '*.java, *.jj' in x recursive size compressedSize;
  println $_[0]:>8, ' ', $_[1]:>8, ' ', x;
}

C:\>java judo -q dirstats_zip.judo
  748120    590201  WEB-INF/
      55        56  META-INF/

The same operations can be done on tar archives:

C:\>tar tvfz awebapp.tar.gz
drwxr-xr-x jhuang/None       0 2004-08-30 13:28:26 META-INF/
-rw-r--r-- jhuang/None      55 2004-06-15 13:33:33 META-INF/MANIFEST.MF
drwxr-xr-x jhuang/None       0 2004-08-30 13:34:17 WEB-INF/
drwxr-xr-x jhuang/None       0 2004-08-30 13:41:08 WEB-INF/classes/
drwxr-xr-x jhuang/None       0 2004-08-30 13:38:46 WEB-INF/classes/foo/
drwxr-xr-x jhuang/None       0 2004-08-30 13:39:08 WEB-INF/classes/foo/bar/
-rw-r--r-- jhuang/None    1604 2004-08-30 13:39:08 WEB-INF/classes/foo/bar/LoginDAO.class
drwxr-xr-x jhuang/None       0 2004-08-30 13:36:23 WEB-INF/lib/
-rw-r--r-- jhuang/None  118726 2004-06-15 13:33:33 WEB-INF/lib/commons-beanutils.jar
-rw-r--r-- jhuang/None   31605 2004-06-15 13:33:33 WEB-INF/lib/commons-logging.jar
-rw-r--r-- jhuang/None  498051 2004-06-15 13:33:33 WEB-INF/lib/struts.jar
drwxr-xr-x jhuang/None       0 2004-08-30 13:28:26 WEB-INF/src/
-rw-r--r-- jhuang/None    3672 2004-06-15 13:33:33 WEB-INF/src/build.xml
drwxr-xr-x jhuang/None       0 2004-08-30 13:40:54 WEB-INF/src/java/
drwxr-xr-x jhuang/None       0 2004-08-30 13:37:48 WEB-INF/src/java/foo/
drwxr-xr-x jhuang/None       0 2004-08-30 13:38:28 WEB-INF/src/java/foo/bar/
-rw-r--r-- jhuang/None    1026 2004-08-30 13:38:28 WEB-INF/src/java/foo/bar/LoginDAO.java
-rw-r--r-- jhuang/None    1923 2004-06-15 13:33:33 WEB-INF/src/README.txt
-rw-r--r-- jhuang/None    8868 2004-06-15 13:33:33 WEB-INF/struts-bean.tld
-rw-r--r-- jhuang/None   66192 2004-06-15 13:33:33 WEB-INF/struts-html.tld
-rw-r--r-- jhuang/None   14511 2004-06-15 13:33:33 WEB-INF/struts-logic.tld
-rw-r--r-- jhuang/None    1942 2004-06-15 13:33:33 WEB-INF/web.xml
-rw-r--r-- jhuang/None     276 2004-06-15 13:33:33 index.jsp
-rw-r--r-- jhuang/None     347 2004-08-30 13:32:31 login.jsp

C:\>java judo -q "ls '*' in 'awebapp.tar.gz' recursive"
login.jsp
index.jsp
WEB-INF/
WEB-INF/web.xml
WEB-INF/struts-logic.tld
WEB-INF/struts-html.tld
WEB-INF/struts-bean.tld
WEB-INF/src/
WEB-INF/src/README.txt
WEB-INF/src/java/
WEB-INF/src/java/foo/
WEB-INF/src/java/foo/bar/
WEB-INF/src/java/foo/bar/LoginDAO.java
WEB-INF/src/build.xml
WEB-INF/lib/
WEB-INF/lib/struts.jar
WEB-INF/lib/commons-logging.jar
WEB-INF/lib/commons-beanutils.jar
WEB-INF/classes/
WEB-INF/classes/foo/
WEB-INF/classes/foo/bar/
WEB-INF/classes/foo/bar/LoginDAO.class
META-INF/
META-INF/MANIFEST.MF

C:\>java judo -q "ls '*' in 'awebapp.tar.gz' fileOnly recursive"
login.jsp
index.jsp
WEB-INF/web.xml
WEB-INF/struts-logic.tld
WEB-INF/struts-html.tld
WEB-INF/struts-bean.tld
WEB-INF/src/README.txt
WEB-INF/src/java/foo/bar/LoginDAO.java
WEB-INF/src/build.xml
WEB-INF/lib/struts.jar
WEB-INF/lib/commons-logging.jar
WEB-INF/lib/commons-beanutils.jar
WEB-INF/classes/foo/bar/LoginDAO.class
META-INF/MANIFEST.MF

Files and folders within zip and tar archives all start at the root with no name. This fact becomes obvious when you get the file names in a tree:

Listing 15.12 get_tree_zip.judo
listFiles '*' in 'awebapp.zip' fileOnly recursive as tree;

for x in $_.dfsAllNodes() {
  println x;
}

C:\>java judo -q get_tree_zip.judo
{isDir=true,path=}
{path=login.jsp}
{path=index.jsp}
{path=WEB-INF/web.xml}
{path=WEB-INF/struts-logic.tld}
{path=WEB-INF/struts-html.tld}
{path=WEB-INF/struts-bean.tld}
{path=WEB-INF/src/java/foo/bar/LoginDAO.java}
{path=WEB-INF/src/build.xml}
{path=WEB-INF/src/README.txt}
{path=WEB-INF/lib/struts.jar}
{path=WEB-INF/lib/commons-logging.jar}
{path=WEB-INF/lib/commons-beanutils.jar}
{path=WEB-INF/classes/foo/bar/LoginDAO.class}
{path=META-INF/MANIFEST.MF}

The first node, which is the root, has an empty path name.

Process Files in Zip Archives

Let's port the count_blank.judo program to make it work with files residing in a zip archive.

listFiles '*.java, *.jj' in 'C:/src.jar' recursive
{
  do $_ in 'C:/src.jar' as lines {
    ......
  }
}

For brevity we omitted unchanged parts. There are two significant changes: one is in listFiles ... in 'C:/src.jar', and the second is in do $_ in 'C:/src.jar' as lines. This should work and give the right result.

But there is one performance concern. The do $_ in 'C:/src.jar' as lines statement opens and closes the zip archive for every single source file it contains. Since the zip archive is already opened by the listFiles command itself, why not use that open archive for this purpose? For this, Judo provides a built-in variable, $$archive, that holds the open zip archive and is available only in the processing block. Hence the revised version:

Listing 15.13 count_blank_zip.judo
cnt1 = 0;
cnt2 = 0;
listFiles 'src/com/judoscript/*.java, src/com/judoscript/*.jj' in 'C:/src.jar' recursive
{
  do $_ in $$archive as lines {
    // now, $_ has become the line just read!
    if ($_.isEmpty()) ++cnt1;
    else              ++cnt2;
  }
}
println '    Blank lines: ', cnt1:>7;
println 'Non-Blank lines: ', cnt2:>7;
println '    Total Files: ', $_.length:>7;

As discussed in 14. Print, File I/O and In-Script Data, you can also open a file within a zip archive, for example with $$archive.openTextFile($_), to do more sophisticated read-only operations.

Copy and Archive Files

The Judo copy command operates between directories and archives such as zip, jar, tar and gzipped tar files. It has good support for jar files, including the jar file manifest. The command's syntax is:

CopyCommand   ::= copy ( FileSelection | URL ) ( to | into ) Expr ( CopyOption | ArchiveOption )*
CopyOption    ::= force | echo | Echo | keepDirs | dupOk
ArchiveOption ::= compress | store | ( under | strip | manifest ) Expr
URL           ::= Expr

The file selection is exactly the same as that of the listFiles command; that is, files can be selected from a directory or an archive. The to and into clauses specify the target: the to clause is for a destination directory or a file (in which case the source must be a single file); the into clause specifies a new archive, whose type is determined by the file extension, such as .zip, .jar, .war, .tar and .tar.gz. Let's look at copying files in the local file system first.

Copy a Single File to a Different Name

In a copy command, when

the target in the to clause does not exist, or
the target exists and is not a directory,

then it is assumed that a single source file will be copied to this name. If the target path name is not absolute, it is relative to the current directory. If there is more than one source file, or the source is a directory, an exception is raised. Let's again run some Judo code from the command line and see what happens:

C:\x>java judo -x "copy 'alfa' to 'beta'"

C:\x>md y

C:\x>java judo -x "copy 'alfa' to 'y'"

C:\x>java judo -x "copy 'alfa' to 'y/gamma'"

C:\x>dir
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\x

09/02/2004  01:46p      <DIR>          .
09/02/2004  01:46p      <DIR>          ..
08/30/2004  06:15a                   5 alfa
08/30/2004  06:15a                   5 beta
               2 File(s)             10 bytes
               2 Dir(s)  38,433,390,592 bytes free

C:\x>dir y
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\x\y

09/02/2004  01:49p      <DIR>          .
09/02/2004  01:49p      <DIR>          ..
08/30/2004  06:15a                   5 alfa
08/30/2004  06:15a                   5 gamma
               2 File(s)             10 bytes
               2 Dir(s)  38,433,390,592 bytes free

When copying files from a base, the relative paths can be retained via the keepDirs option.

C:\x>md z

C:\x>java judo -x "copy 'y/gamma' to 'z'"

C:\x>dir z
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\x\z

09/02/2004  01:58p      <DIR>          .
09/02/2004  01:58p      <DIR>          ..
08/30/2004  06:15a                  5 gamma
09/02/2004  01:57p      <DIR>          y
               1 File(s)              5 bytes
               3 Dir(s)  38,432,976,896 bytes free

C:\x>java judo -x "copy 'y/gamma' to 'z' keepDirs"

C:\x>dir z\y
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\x\z\y

09/02/2004  01:57p      <DIR>          .
09/02/2004  01:57p      <DIR>          ..
08/30/2004  06:15a                   5 gamma
               1 File(s)              5 bytes
               2 Dir(s)  38,432,976,896 bytes free

Copying a single file is simple, but the real power of the copy command lies in dealing with sets of files.

Copy Files to a Different Location

To copy a tree of files and directories to a different directory, use the recursive option; the keepDirs option is then implicitly turned on:

C:\>md y

C:\>java judo -x "copy '*' in 'C:/x' to 'C:/y' recursive"

When copying file(s) from one location to another in the file system, by default the copy command compares each file's time and size; if both time and size are the same between the source and target file, it passes the file without physically copying it. The force option disables this optimization. The echo option displays the source files that are actually copied, and the Echo option displays both copied and passed files.
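The time-and-size check described above can be sketched in Java as follows. This is an illustration of the idea only, not Judo's code: the names CopyIfChanged, copy and demo are hypothetical, and comparing timestamps at millisecond granularity is an assumption made here to sidestep file systems that store times with differing precision.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CopyIfChanged {

    /** Returns true if the file was physically copied, false if it was passed. */
    static boolean copy(Path src, Path dst, boolean force) {
        try {
            if (!force && Files.isRegularFile(dst)
                    && Files.size(src) == Files.size(dst)
                    && Files.getLastModifiedTime(src).toMillis()
                       == Files.getLastModifiedTime(dst).toMillis()) {
                return false; // same size and time: pass the file
            }
            // COPY_ATTRIBUTES carries the last-modified time over to the target,
            // which is what lets the next run recognize the file as unchanged.
            Files.copy(src, dst,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.COPY_ATTRIBUTES);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies a temp file three times: plain, plain again, then forced. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("copydemo");
            Path src = dir.resolve("alfa");
            Path dst = dir.resolve("beta");
            Files.write(src, "hello".getBytes(StandardCharsets.UTF_8));
            boolean first = copy(src, dst, false);  // target missing: copied
            boolean second = copy(src, dst, false); // same size and time: passed
            boolean forced = copy(src, dst, true);  // force: copied regardless
            Files.delete(dst);
            Files.delete(src);
            Files.delete(dir);
            return new boolean[] {first, second, forced};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

In these terms, echo would report only the calls that return true, while Echo would report every call.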

Copy Files into Archives

The same copy command is also a versatile archiving command. In its simplest form, it is almost identical to copying files in the file system, except that it uses the into clause rather than to.

copy '*' in 'C:/x' into 'C:/test.zip' recursive;

The archive can be a zip, jar, tar or gzipped tar file, whose type is determined by the archive file extension. These extensions are recognized: zip, jar, war, ear, rar, tar, taz and tar.gz; they can be in mixed case as well. What if a file with an unknown extension is intended to be used as, say, a tar file? You can create the archive file via the createZip(), createJar() and createTar() system functions, and use that open archive object as the destination:

zip = createZip('iamdoc.doc');
copy '*' recursive into zip;
zip.close();

tar = createTar('ship.tarball');
copy '*' recursive into tar;
tar.close();

The open archive object is important for archiving multiple sources, as discussed in Save Multiple File Sets into a Single Archive.

The copy command has these archiving options: compress, store, manifest, under and strip. The first three options are zip-/jar-specific.

The under option allows you to copy a tree of files under a specific prefix within the archive. Conversely, when copying files out of an archive, you can use strip to strip that prefix. Suppose we have a directory like this:

C:\src\com\judoscript\
C:\src\com\judoscript\util\
C:\src\com\judoscript\parser\
C:\src_native\

and we want to archive files in C:\src\ into a zip (or tar) file under src_java/ like this:

src_java/com/judoscript/
src_java/com/judoscript/util/
src_java/com/judoscript/parser/

This is the way to do this:

copy '*' in 'C:/src/' recursive into 'src.zip' under 'src_java';

Later, when copying them out of src.zip, we use this:

copy '*' in 'src.zip' recursive to 'C:/x' strip 'src_java/';

For zip or jar files, files are compressed by default. If you want to just store the files without compressing them, such as when creating Java executable jar files, use the store option. The compress option is also available but is almost always redundant. For jar files, you can also specify a manifest text along the way. The following is an example that creates a Java executable jar:

Listing 15.14 make_xjar.judo
copy '*.java, *.properties' in 'C:/temp/classes/' recursive
  into 'judo.jar' store
  manifest [[*
    Manifest-Version: 1.0
    Main-Class: judo
    Created-By: James Jianbo Huang (c) 2001-(* #year *)
  *]]
;
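To see what store and manifest amount to at the archive level, here is a rough Java equivalent built on java.util.jar. It is a sketch under stated assumptions rather than Judo's implementation: StoredJarSketch and its methods are hypothetical names, the jar is built in memory, and the prefix parameter plays the role of the under option (pass an empty string for no prefix).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;

final class StoredJarSketch {

    /** Writes the files uncompressed ("store") under prefix, with a Main-Class manifest. */
    static byte[] build(Map<String, byte[]> files, String prefix, String mainClass) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buf, manifest)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                byte[] data = file.getValue();
                ZipEntry entry = new ZipEntry(prefix + file.getKey());
                // A STORED entry must declare its size and CRC up front.
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                CRC32 crc = new CRC32();
                crc.update(data);
                entry.setCrc(crc.getValue());
                jar.putNextEntry(entry);
                jar.write(data);
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Lists the entry names that follow the manifest. */
    static List<String> entryNames(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            for (ZipEntry e; (e = in.getNextEntry()) != null; ) names.add(e.getName());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Reads the Main-Class attribute back from the manifest. */
    static String mainClass(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            return in.getManifest().getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jar = build(Collections.singletonMap("judo.properties", new byte[] {1, 2, 3}),
                "", "judo");
        System.out.println(entryNames(jar) + " Main-Class=" + mainClass(jar));
    }
}
```

The bookkeeping that a STORED entry requires (size and CRC known before the data is written) is exactly what the single store keyword hides in the Judo command.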

Save Multiple File Sets into a Single Archive

The copy target in the into clause can be an open archive object, returned by the createZip(), createJar() and createTar() system functions. Hence, you can easily copy multiple sets of files into a single archive. The under clause is also handy for organizing files stored within the archives. For instance, I have a source file directory, a documentation directory and an example directory. Every day I make a backup file with this structure:

src/
docs/
examples/

This is easily done with Judo:

Listing 15.15 backup.judo
zf = createZip('~/archives/work-'+Date().fmtDate('yyyyMMdd')+'.zip');

copy '*' in 'c:/src/' except '*/alfa*, */beta*, */save/*'
  recursive noHidden echo into zf under 'src/';

copy '*' in 'c:/docs/' except '*/alfa*, */beta*, */save/*'
  recursive noHidden echo into zf under 'docs/';

copy '*' in 'c:/examples/' except '*/alfa*, */beta*, */save/*'
  recursive noHidden echo into zf under 'examples/';

zf.close();

In the first line, a new zip file is created based on the date. Then, three sets of files are copied under different folder names before finally the zip archive is closed (and saved). Don't forget to close it!

When copying multiple sets of files into a single archive, it is possible to have duplicate files. If this is acceptable, specify the dupOk option; otherwise the copy will fail.

Copy Public Internet Resources

So far, we have seen how the copy command can copy files between file systems and archives. In fact, it can copy public internet resources as well. All you have to do is specify a URL as the source, most likely an HTTP or FTP URL. Only a single URL can be given as the source. You can still save it to a location in the file system or into an archive.

To copy the resource to a file, like copying a single file from the file system, you can specify a directory or a file path name:

C:\>cd z

C:\z>java judo -x "copy 'http://www.yahoo.com/index.html'"

C:\z>java judo -x "copy 'http://www.yahoo.com/index.html' to 'i.html'"

C:\z>java judo -x "copy 'http://www.yahoo.com'"

C:\z>java judo -x "copy 'http://www.yahoo.com/'"

C:\z>dir
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\z

09/03/2004  10:22p      <DIR>          .
09/03/2004  10:22p      <DIR>          ..
09/03/2004  10:21p              36,682 default.htm
09/03/2004  10:22p              36,676 i.html
09/03/2004  10:22p              36,676 index.html
               3 File(s)        110,034 bytes

If no file name is specified in the URL, Judo supplies a default file name, "default.htm". If the URL does carry a file name, such as index.html in the example, that name is used. Sometimes the file name part of the URL is not really a file name, for instance, http://finance.yahoo.com/q/cq?s=%5edji+%5eixic+beas+goog; Judo will simply use cq as the target file name. Therefore, if you are copying a resource whose URL is dynamic in nature, it's better to provide a target file name.
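The naming rule just described can be approximated in a few lines of Java. This is a guess at the rule from the observed behavior, not Judo's actual code; UrlFileName and targetName are hypothetical names, and treating any path that ends in a slash as "no file name" is an assumption.

```java
import java.net.URI;

final class UrlFileName {

    /** Derives a local file name from a URL, falling back to "default.htm". */
    static String targetName(String url) {
        String path = URI.create(url).getPath(); // the path only; the query string is excluded
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return "default.htm"; // no file name part in the URL
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(targetName("http://www.yahoo.com/index.html"));
        System.out.println(targetName("http://www.yahoo.com"));
        System.out.println(targetName("http://finance.yahoo.com/q/cq?s=%5edji+%5eixic+beas+goog"));
    }
}
```

The last call shows why a dynamic URL deserves an explicit target name: everything after the question mark is dropped, so many different queries would map to the same local file.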

For static internet resources, you can retain the path of the remote resource in the local file system or archives via the keepDirs option:

C:\z>java judo -x "copy 'http://dir.yahoo.com/Computers_and_Internet/index.html' keepDirs"

C:\z>dir Computers_and_internet
 Volume in drive C is Local Disk
 Volume Serial Number is 8097-678E

 Directory of C:\z\Computers_and_internet

09/03/2004  10:30p      <DIR>          .
09/03/2004  10:30p      <DIR>          ..
09/03/2004  10:30p              22,589 index.html
               1 File(s)         22,589 bytes

This feature, coupled with 25. SGML and JSP Scraping, can be used to efficiently construct a web crawler.

Network resources can also be copied into archives. The following example copies part of the Yahoo! directory of Computers and Internet into both a zip file and a tar file.

Listing 15.16 download_yahoo_dir.judo
tar = createTar('yahoo_comp.tar.gz');
zip = createZip('yahoo_comp.zip');

urls = [
  'http://dir.yahoo.com/Computers_and_Internet/index.html',
  'http://dir.yahoo.com/Computers_and_Internet/Software/index.html',
  'http://dir.yahoo.com/Computers_and_Internet/Macintosh/index.html',
  'http://dir.yahoo.com/Computers_and_Internet/Internet/index.html',
  'http://dir.yahoo.com/Computers_and_Internet/Internet/WAIS/index.html'
];

for u in urls {
  copy u into tar keepDirs;
  copy u into zip keepDirs;
}

tar.close();
zip.close();

After execution, the zip archive has these files:

     0 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/
 22721 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/index.html
     0 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Software/
 23306 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Software/index.html
     0 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Macintosh/
 25034 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Macintosh/index.html
     0 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Internet/
 20739 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Internet/index.html
     0 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Internet/WAIS/
  8981 Fri Sep 03 22:35:06 PDT 2004 Computers_and_Internet/Internet/WAIS/index.html

Other File Utilities

Encrypting and Decrypting Files and Data

Judo provides built-in encryption for files and data, based on the javax.crypto package included in JDK1.4 and up. In Judo, encryption and decryption are password-based; they are provided via these system functions:

function encryptFile

function decryptFile

function encrypt

function decrypt

function setCryptoClassName

The encrypt and decrypt functions can take byte arrays, strings or java.io.InputStream as input, and produce the encrypted or decrypted result in a byte array.

The default implementation uses MD5 and DES encryption implemented in class com.judoscript.util.PBEWithMD5AndDES. If you have highly confidential information to safeguard, you can provide your own crypto class via the system function setCryptoClassName(); the crypto class must extend com.judoscript.util.PBEBase and implement its encrypt() and decrypt() methods.
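For the curious, password-based encryption with MD5 and DES looks roughly like this when written directly against javax.crypto. The sketch does not reproduce com.judoscript.util.PBEWithMD5AndDES, whose salt and iteration count are not documented here; the class name PbeSketch and the SALT and ITERATIONS values below are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

final class PbeSketch {
    private static final String ALGORITHM = "PBEWithMD5AndDES";
    // Hypothetical parameters; this algorithm expects an 8-byte salt.
    private static final byte[] SALT = {1, 2, 3, 4, 5, 6, 7, 8};
    private static final int ITERATIONS = 1000;

    static byte[] encrypt(char[] password, byte[] data) {
        return run(Cipher.ENCRYPT_MODE, password, data);
    }

    static byte[] decrypt(char[] password, byte[] data) {
        return run(Cipher.DECRYPT_MODE, password, data);
    }

    private static byte[] run(int mode, char[] password, byte[] data) {
        try {
            // Derive the DES key from the password, then run the cipher in one shot.
            SecretKey key = SecretKeyFactory.getInstance(ALGORITHM)
                    .generateSecret(new PBEKeySpec(password));
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(mode, key, new PBEParameterSpec(SALT, ITERATIONS));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char[] password = "secret".toCharArray();
        byte[] cipherText = encrypt(password, "hello".getBytes(StandardCharsets.UTF_8));
        String plain = new String(decrypt(password, cipherText), StandardCharsets.UTF_8);
        System.out.println(cipherText.length + " bytes encrypted, decrypted to: " + plain);
    }
}
```

DES is considered weak by current standards, which is one reason you may want to plug in a stronger crypto class for sensitive data.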

Chopping and Assembling Files

Big files are hard to transfer or to save on different media. Downloading a 550MB file, for instance, may pose great problems for less-than-fast connections. Sometimes you may want to back up a 1GB file onto 650MB CD-ROMs. Judo provides a file chopping and assembling utility just for this purpose.
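The principle behind chopping and assembling is simple enough to sketch in a few lines of Java. This is a conceptual illustration on in-memory byte arrays, not Judo's utility; ChopSketch, chop and assemble are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChopSketch {

    /** Splits data into consecutive pieces of at most chunkSize bytes. */
    static List<byte[]> chop(byte[] data, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<byte[]> parts = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            parts.add(Arrays.copyOfRange(data, off, Math.min(off + chunkSize, data.length)));
        }
        return parts;
    }

    /** Concatenates the pieces, in order, back into the original content. */
    static byte[] assemble(List<byte[]> parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) out.write(part, 0, part.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        List<byte[]> parts = chop(data, 4);
        System.out.println(parts.size() + " parts, restored: "
                + Arrays.equals(assemble(parts), data));
    }
}
```

A real utility would stream from and to files rather than hold everything in memory, but the order-preserving split and join is the same.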