HTTP Fun and HTML Processing

This article is old and is being consolidated into the book.
Please refer to the corresponding chapter(s) therein.
If those chapters or sections are not complete yet, you can use this article.
In either case, refer to the examples, which are tested against the latest code.

Table of Contents

Make HTTP Requests
» Use a Proxy Server
» Save and Load Cookies
HTML Document Processing
Your Own HTTP Server
Hack the Web with Proxy Server
Server-Side Scripting with JUSP
Summary
Code Listings


By James Jianbo Huang, October 2001

Abstract: JudoScript is an effective HTTP client and server language. HTTP requests can be made to servers with any HTTP headers and content; in particular, cookies can be examined, saved and loaded. For HTML pages, JudoScript has the event-driven do..as html statement, which processes a document and treats each tag as an event for which actions can be specified. There are also special events such as :BEFORE, :AFTER and :TEXT. JudoScript is an HTTP server language as well, thanks to the startServer() and acceptHttp() system functions. Combining the client and server capabilities yields HTTP proxy servers, which are useful for debugging web applications and hacking the web. JudoScript Server Page (JUSP) is another server-side scripting technology: like JSP and ASP, JUSP pages embed code directly in the page. The servlet com.judoscript.jusp.JuspServlet runs JUSP pages and, as a convenience, provides a few predefined variables and a couple of Java classes to the code.

1. Make HTTP Requests

JudoScript gives you full control over every aspect of an HTTP request. However, if all you need is to read a file off the web, it suffices to call openFile() or openTextFile() with a URL:

copyStream openFile('http://www.xxxxx.com/collections/cesoir.mp3'),
           openFile('cesoir.mp3','w');

That's about as easy as it gets. More detailed HTTP requests are just as easy with the system functions httpGet() and httpPost(). They prepare and return an HTTP object but do not connect to the server yet. The HTTP object has general, request and response methods. The general methods return information about the connection:

getUrl()
getHost()
getPort()
getDomain()
getPath()
getFile()
getQuery()
getRef()
getMethod()
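For comparison only (a sketch, not JudoScript's own code), the same URL parts can be read in plain Java with java.net.URL, one of the JDK classes JudoScript's HTTP support uses internally. The class name UrlParts and the sample URL are hypothetical. JudoScript's getDomain() has no direct java.net.URL counterpart, so it is left out.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper: lists the parts of an absolute URL as "name=value" lines.
final class UrlParts {
    static String describe(String spec) throws MalformedURLException {
        URL u = URI.create(spec).toURL();
        // getPort() is -1 when the URL names no port; fall back to the scheme default.
        int port = u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
        return String.join("\n",
            "url=" + u,
            "host=" + u.getHost(),
            "port=" + port,
            "path=" + u.getPath(),
            "file=" + u.getFile(),    // path plus "?query", if any
            "query=" + u.getQuery(),  // null when there is no query
            "ref=" + u.getRef());     // the "#fragment" part, or null
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(describe(args.length > 0 ? args[0]
            : "http://www.example.com/docs/page.html?a=1&b=2#top"));
    }
}
```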

The request methods deal with cookies and with the output stream for writing request content; if there is nothing to write, call connect() to initiate the connection explicitly:

getTextOutput()
getOutputStream()
connect()
addCookie(cookie | name,value)
loadCookies(filename)

Why are there no methods for setting headers? Because HTTP request headers are set with the struct member syntax:

h.'Content-Type' = 'text/html';
h.'Content-Length' = 2840;
h.Date = date(2001,10,1,14,30,0);
h.addCookie(cookie);

Notice on the third line that a date value is automatically formatted as an HTTP date. Response headers are read with the same syntax, except for date headers, which are read with getDateHeader(). The following methods are for the response:

getDateHeader(header)
getAllHeaders()
getStatusCode|getStatus()
getResponseMsg|getResponseMessage()
getTextInput()
getInputStream()
getCookies()
saveCookies(filename)
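HTTP date headers (Date, Last-Modified, If-Modified-Since) use a fixed RFC 1123 format in GMT. The automatic conversion described above can be approximated in Java with java.time; this is a sketch, not JudoScript's code. It assumes the date is meant as UTC, whereas JudoScript's date() may well use the local time zone. The class name HttpDates is hypothetical.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: formats and parses HTTP-style dates (RFC 1123, always GMT).
final class HttpDates {
    // Two-digit day and English names, as in "Mon, 01 Oct 2001 14:30:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Any zoned time is converted to UTC before it is formatted.
    static String format(ZonedDateTime time) {
        return HTTP_DATE.format(time);
    }

    // Reads a header value such as the Last-Modified lines in the proxy log below.
    static ZonedDateTime parse(String headerValue) {
        return ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    public static void main(String[] args) {
        // Assumption: the article's date(2001,10,1,14,30,0) taken as UTC.
        System.out.println(format(ZonedDateTime.of(2001, 10, 1, 14, 30, 0, 0, ZoneOffset.UTC)));
    }
}
```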

For instance, to see all the response headers, do this:

Listing 1. http_headers.judo
url = 'http://www.yahoo.com';
h = httpGet(url);
for x in h.getAllHeaders() {
  println x, ': ', h.(x);
}

To post, call the HTTP object's getOutputStream() and write the data up to the server. If a web site lets you store files (such as music) on it, a script can upload any number of files in one run, which is extremely convenient.
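Posting can be sketched in plain Java with java.net.HttpURLConnection. Assumptions: the body is a URL-encoded form, and the class name FormPost, the field names and the target URL are hypothetical. As with JudoScript's HTTP object, openConnection() does not contact the server; the connection is made when getOutputStream() is called.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for posting a URL-encoded form.
final class FormPost {
    // Encodes fields as application/x-www-form-urlencoded ("a=1&b=x+y").
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        fields.forEach((name, value) -> body.add(
            URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return body.toString();
    }

    // Posts the form and returns the HTTP status code.
    static int post(String url, Map<String, String> fields) throws IOException {
        byte[] data = formBody(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);                 // required before getOutputStream()
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {   // the connection is made here
            out.write(data);
        }
        return conn.getResponseCode();
    }
}
```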

Use a Proxy Server

Internally, HTTP requests are made with the Java classes java.net.URL and java.net.URLConnection, which can route the network connection through an HTTP proxy server. In JudoScript, simply call the system function setHttpProxy() with a host name and port number to set the proxy.
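The JDK offers two standard ways to send URLConnection traffic through an HTTP proxy: a per-connection java.net.Proxy object, and the JVM-wide http.proxyHost and http.proxyPort system properties. The sketch below shows both. Which mechanism setHttpProxy() uses is not stated here, so treat this as an illustration; the class name ProxySetup and the host and port are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper showing two JDK ways to use an HTTP proxy.
final class ProxySetup {
    // 1) A per-connection proxy; createUnresolved defers the DNS lookup.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    static URLConnection open(String url, Proxy proxy) throws IOException {
        return URI.create(url).toURL().openConnection(proxy);   // not connected yet
    }

    // 2) A JVM-wide default for http: URLs, via the standard networking properties.
    static void setGlobalHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }
}
```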

Save and Load Cookies

JudoScript lets you intercept cookies set by servers and save them to files. You can load them at any time and send them back to the relevant servers in subsequent HTTP requests. The following program displays all the cookies set by the server:

Listing 2. show_cookies.judo
url = 'http://localhost:8080';
h = httpGet(url);
for x in h.getCookies() { println x; }

The saveCookies() method saves the cookies to a file; if the file name is omitted, it defaults to "cookies.txt" in the current directory. A saved cookie with the same name is overwritten, provided the new cookie's max-age is positive; otherwise the saved cookie is removed (or, if none existed, the new cookie is not saved). The loadCookies() method reads cookies from a file, with the same default file name. Only the cookies that match the domain and path of this request are put into the "Cookie" header.
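The overwrite-or-remove rule for saving cookies can be sketched in Java with java.net.HttpCookie. This is a simplified model, not JudoScript's code: cookies are keyed by name only (real cookie stores also key on domain and path), and reading and writing the file itself is omitted. The class name CookieFile is hypothetical.

```java
import java.net.HttpCookie;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges cookies from a response into the saved set.
final class CookieFile {
    static void merge(Map<String, HttpCookie> saved, List<HttpCookie> fromServer) {
        for (HttpCookie c : fromServer) {
            // getMaxAge() is -1 when the server sent no Max-Age/Expires.
            if (c.getMaxAge() > 0) {
                saved.put(c.getName(), c);   // add, or overwrite the same-name cookie
            } else {
                saved.remove(c.getName());   // remove it, or never add it
            }
        }
    }
}
```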

To create a new cookie, whether to be sent to a server or set on a client (see below), use the system function cookie():

cookie(name,value)
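In plain Java, java.net.HttpCookie plays the role of cookie(name,value). The sketch below also spells out the domain and path test mentioned above for loadCookies(). The matching follows RFC 6265 (sections 5.1.3 and 5.1.4) and is an assumption, since JudoScript's own rules may differ in detail; the class CookieMatch and its methods are hypothetical, and cookies are assumed to carry an explicit domain.

```java
import java.net.HttpCookie;

// Hypothetical helper: creates cookies and decides whether to send them.
final class CookieMatch {
    static HttpCookie cookie(String name, String value, String domain, String path) {
        HttpCookie c = new HttpCookie(name, value);
        c.setDomain(domain);   // HttpCookie stores the domain in lower case
        c.setPath(path);
        return c;
    }

    // "example.com" or ".example.com" matches example.com and its subdomains.
    static boolean domainMatches(String cookieDomain, String host) {
        String d = cookieDomain.startsWith(".") ? cookieDomain.substring(1) : cookieDomain;
        String h = host.toLowerCase();
        return h.equals(d.toLowerCase()) || h.endsWith("." + d.toLowerCase());
    }

    // "/docs" matches "/docs" and "/docs/x", but not "/docsx".
    static boolean pathMatches(String cookiePath, String requestPath) {
        if (requestPath.equals(cookiePath)) return true;
        return requestPath.startsWith(cookiePath)
            && (cookiePath.endsWith("/") || requestPath.charAt(cookiePath.length()) == '/');
    }

    // Assumes the cookie has a domain; a missing path is treated as "/".
    static boolean shouldSend(HttpCookie c, String host, String path) {
        String cookiePath = c.getPath() != null ? c.getPath() : "/";
        return domainMatches(c.getDomain(), host) && pathMatches(cookiePath, path);
    }
}
```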

Review Questions

Can you open a URL like a file? What's the difference?
The system functions httpGet() and httpPost() create an HTTP object; when is the server connection started?
How do you set request headers? How do you get response headers? How do you get all the response headers?
How do you make HTTP requests through a proxy server?
What parameters does the addCookie() method take? How do you create a new cookie?
How do you save a request's cookies to a file? Which cookies are not saved?
When you call the loadCookies() method of an HTTP object, are all the cookies in the file set on the request?


2. HTML Document Processing

Most of the web consists of HTML pages, along with a growing number of multimedia files. HTML pages may contain information needed by other software. Web services built on HTTP, XML and emerging technologies may eventually take over that role, but they are not here yet, and the information they offer may not be as affordable or accessible as HTML pages. At other times, you may want a mini robot that builds a private collection of things you are fond of.

JudoScript has an HTML processing engine that treats each tag (including text) as an event. Actions can be specified for certain tags. For example, the following program prints the href of every <a> link:

do url as html {
<a>: if $_.href { println $_.href; }
}

The internal variable $_ represents the current tag; its attributes are accessed as data members, whose names are case insensitive. (JudoScript's dot command is a shortcut for println.) A tag can be closed as in XML; if so, $_.isClosed() returns true, and no corresponding end-tag event is fired. Text in the page is represented by a special tag, :TEXT. Other special tags include :BEFORE, :AFTER, <?>, <!>, <!--> and <>; the last one matches any regular tag that has no handler of its own. The following reproduces an HTML page:

do url as html {
:TEXT: flush $_;
<>:    flush $_;
<!>:   flush $_;
<?>:   flush $_;
}

The source can be an HTTP URL, a file name, or any input stream. Note that <!> covers comments (<!-->) only if there is no separate <!--> handler.
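The JDK happens to ship a callback-driven HTML parser, javax.swing.text.html.parser.ParserDelegator, which makes a handy point of comparison. The sketch below is roughly the Java counterpart of the first <a> example in this section: handleStartTag() plays the role of the <a> handler, while handleText(), handleEndTag(), handleSimpleTag() and handleComment() correspond loosely to :TEXT, end tags, empty tags such as <br>, and comments. It illustrates the event model only and says nothing about how JudoScript's engine works; the class name LinkLister is hypothetical.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href of every <a> tag, event by event.
final class LinkLister {
    static List<String> hrefs(Reader source) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback handlers = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {                        // like the <a>: handler
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) links.add(href.toString());
                }
            }
        };
        // true: ignore charset declarations in <meta> tags.
        new ParserDelegator().parse(source, handlers, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LinkLister http://www.example.com/
        try (Reader in = new InputStreamReader(
                URI.create(args[0]).toURL().openStream(), StandardCharsets.UTF_8)) {
            hrefs(in).forEach(System.out::println);
        }
    }
}
```

Because parse() takes any Reader, the source can be a file, a URL stream or a string, much like the do..as html statement.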

Next is a practical program that retrieves stock quotes from the Yahoo Finance web site. A study of the page structure shows that the information we need sits in an HTML table, so the program implements a state machine, documented in the comment at the top of the listing, to get to the right pieces. It a) constructs a URL containing all the symbols (lines 43 through 45), b) gets the page back, and c) scrapes it for the pieces of interest (lines 49 through 73). The abort statement (line 65) aborts the whole HTML statement.

Listing 3. get_quotes.judo
 1: /* Usage: java judo get_quotes.judo sym1 sym2 ....
 2:  *
 3:  * This script scrapes quote data from a Yahoo Finance page, whose
 4:  * center piece is an HTML table like that in 'yahoo_quotes.html'.
 5:  * A state machine is used to retrieve pieces of information:
 6:  *
 7:  * ---------- ------------------ -------- ------
 8:  * From State Input              To State Output
 9:  * ---------- ------------------ -------- ------
10:  *     0      <th>                   1
11:  *     1      :TEXT:"Last Trade"     2
12:  *     2      <tr>                   3
13:  *     3      <td>                   4
14:  *     4      <a href="/q?s=..">     5
15:  *     5      (text > 10 chars)    abort
16:  *     5      :TEXT                  6     symbol
17:  *     6      <td>                   7
18:  *     7      :TEXT                  8     time
19:  *     8      <td>                   9
20:  *     9      :TEXT                 10     quote
21:  *    10      <td>                  11
22:  *    11      <td>                  12
23:  *    12      :TEXT                 13     delta
24:  *    13      <td>                  14
25:  *    14      :TEXT                  2     volume
26:  * ---------- ------------------ -------- ------
27:  *
28:  * Note: this state machine is based on such a Yahoo page; if Yahoo
29:  * changes the structure of the page, this state machine may very
30:  * well become invalid and need to be updated.
31:  */
32:
33: if #args.length==0 {
34:   println [[*
35:     Usage: java judo (*#prog*) "^DJI" "^IXIC" ibm msft csco jdsu
36:     Symbol names should comply with "http://quote.yahoo.com",
37:     especially the standard indices.
38:   *]];
39:   exit 0;
40: }
41:
42: // construct the URL --
43: url = 'http://quote.yahoo.com/q?s=';
44: for sym in #args { url @= sym @ '+'; }
45: url @= '&d=v1';
46: println url;
47:
48: // get the quotes --
49: do url as html
50: {
51: :BEFORE: state = 0;
52: <th>:    if state == 0 { ++state; }
53: <tr>:    if state == 2 { ++state; }
54: <td>:
55:   switch state {
56:   case 3: case 6: case 8: case 10: case 11: case 13:
57:     ++state;
58:   }
59: <a>:     if state == 4 {
60:            if $_.href.startsWith("/q?s=") { ++state; }
61:          }
62: :TEXT:   switch state {
63:          case 1:  if $_ == "Last Trade" { ++state; }
64:                   break;
65:          case 5:  if $_.length() > 10 { abort; }
66:                   flush $_:8; ++state; break;     // symbol
67:          case 7:  flush $_:8; ++state; break;     // date/time
68:          case 9:  flush $_:6.2; ++state; break;   // price
69:          case 12: flush $_:>9; ++state; break;    // delta%
70:          case 14: println $_:>13; state=2; break; // volume
71:          }
72:
73: } // end of do as html.
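To see the state table of Listing 3 in isolation, here is a Java sketch that runs the same transitions over a hand-made list of events instead of a live page. The Event type, the class name QuoteMachine and the sample tokens are hypothetical; a real version would take its events from an HTML parser. Instead of printing, it collects one row of values per quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-implementation of the state table in Listing 3.
final class QuoteMachine {
    // A tag event ("th", "tr", "td", or "a" with its href) or a text event.
    record Event(String tag, String href, String text) {
        static Event tag(String name) { return new Event(name, null, null); }
        static Event link(String href) { return new Event("a", href, null); }
        static Event text(String t) { return new Event(null, null, t); }
    }

    static List<List<String>> run(List<Event> events) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        int state = 0;
        for (Event e : events) {
            if (e.text() != null) {                              // :TEXT
                String t = e.text();
                switch (state) {
                    case 1 -> { if (t.equals("Last Trade")) state = 2; }
                    case 5 -> {
                        if (t.length() > 10) return rows;        // like 'abort'
                        row.add(t); state = 6;                   // symbol
                    }
                    case 7, 9, 12 -> { row.add(t); state++; }    // time, quote, delta
                    case 14 -> {                                 // volume ends a row
                        row.add(t); rows.add(row);
                        row = new ArrayList<>(); state = 2;
                    }
                    default -> { }
                }
            } else {
                switch (e.tag()) {
                    case "th" -> { if (state == 0) state = 1; }
                    case "tr" -> { if (state == 2) state = 3; }
                    case "td" -> {
                        if (state == 3 || state == 6 || state == 8
                                || state == 10 || state == 11 || state == 13) state++;
                    }
                    case "a" -> {
                        if (state == 4 && e.href() != null && e.href().startsWith("/q?s=")) state = 5;
                    }
                    default -> { }
                }
            }
        }
        return rows;
    }
}
```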

JudoScript does not really know or care about HTML tags; it treats all tags equally. In processing the documentation for JudoScript itself, I used many custom tags that keep the documents terse and clean; programs then process these documents to produce HTML. The code can even live in the same file as the document, using the local data feature:

Listing 4. body.htmx
 1: do $$local as html
 2: {
 3: :BEFORE:  html = openTextFile(#prog.replace(".htmx",".html"),"w");
 4: <doc>:    println <html> [[*
 5:             <html><head><title>(* $_.title *)</title></head><body>
 6:             <center><table width=650><tr><td> <h1>(* $_.title *)</h1>
 7:           *]];
 8: :AFTER:   println <html> '<hr></td></tr></table> </center></body></html>';
 9:           html.close();
10: :TEXT:    print <html> $_;
11: <>:       print <html> $_;
12: <?>:      print <html> $_;
13: <!>:      print <html> $_;
14:
15: <quote>:  print <html> '<blockquote><font size=-1>';
16: </quote>: print <html> '</font></blockquote>';
17: }
18:
19: EndScript ------------------------------------------------
20:
21: <doc title="Body Systems">
22: <p>
23: The body can be thought of as a number of systems:
24: <ul>
25: <li> skeleton
26: <li> musculature
27: <li> cardiovascular system
28: <li> digestive system
29: <li> excretory system
30: <li> immune system
31: <li> respiratory system
32: <li> reproductive system
33: </ul>
34: <quote>
35: "The digestive system breaks down food and turns
36: it into the right chemicals for the body to use."
37: </quote>

When the document is modified, run

%java judo body.htmx

and the HTML page is generated. On line 1, $$local is an input stream containing the content below the EndScript line (line 19). On line 3, before processing begins, a text file is opened with the same name but the extension ".html"; it is closed once the document has been processed (line 9). The page header is printed when the <doc> tag is processed (lines 5 and 6), and the footer after processing (line 8). Everything else is printed as-is (lines 10 through 13), except that <quote> and </quote> become a small-font blockquote (lines 15 and 16). This works as a kind of server-side style sheet, but you can do much more than change styles. For instance, for this article, all the captions are centrally managed, and the sample code was collected to produce the final code listings.

Review Questions

Can you specify a file name in an HTML statement?
Write a program to collect all image URLs in an HTML file.
Write a mini robot that downloads a page along with all the enclosed images from the same site.
What are the special events :BEFORE and :AFTER in an HTML statement?
What is the difference and relationship between the httpGet() system function and the do..as html statement?
Many HTML pages contain characters that are useless for rendering, such as redundant whitespace, which wastes network bandwidth. Write a program that removes unwanted whitespace in the :TEXT handler. (Hint: take care of <pre> tags and string literals in script source code.)
Write a program to get rid of all HTML comments except those between <script> and </script> tags.

What is $$local? What is the output of the following program?

sum = 0;
while (line = $$local.readLine()) != eof {
  println line : > 20;
  sum += line;
}
println '-----------' : > 20;
println sum : > 20;

EndScript ---------------------------------

1345
98789797
344


3. Your Own HTTP Server

Have you ever written a web server? CGIs or servlets do not count because they are web server extensions. With JudoScript, writing a server is surprisingly simple.

Listing 5. mini_server.judo
 1: ss = startServer(8088);
 2: while {
 3:   start thread miniHandler(acceptHttp(ss));
 4: }
 5:
 6: thread miniHandler a {
 7:   a.'Content-Type' = 'text/html';
 8:   os = a.getTextOutput();
 9:   println <os> '<html><body>This is all you get.</body></html>';
10:   os.close();
11: }

On line 1, the system function startServer() takes a port number and opens a server socket. Any kind of traffic can come in; it is up to the handlers to decide which protocol(s) to support. On line 3, another system function, acceptHttp(), waits for and accepts a client connection on the server socket and returns an HTTP service object, which is passed to a newly started thread that handles this HTTP request. How the request is handled is up to the thread code; in our case (lines 6 through 11), it always returns an HTML page with the message "This is all you get."
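For comparison, a rough plain-Java counterpart of Listing 5 can use the JDK's built-in com.sun.net.httpserver package. This is a sketch under stated assumptions: unlike startServer() and acceptHttp(), HttpServer parses HTTP itself, and a thread pool stands in for "start thread". The worker threads are daemons so that the JVM can exit after stop(); the class name MiniServer is hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

// Hypothetical Java version of mini_server.judo.
final class MiniServer {
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {     // every path gets the same page
            byte[] body = "<html><body>This is all you get.</body></html>"
                .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        // One pooled thread per concurrent request, like 'start thread miniHandler(...)'.
        server.setExecutor(Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "mini-handler");
            t.setDaemon(true);
            return t;
        }));
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8088);   // then browse to http://localhost:8088/
    }
}
```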

The HTTP service object is very similar to the HTTP client object. It uses the same struct member syntax to read request headers, and has these methods for handling client requests:

getServerName()
getServerPort()
serveFile([doc_root])
serveError(status)
getUrl()
getHost()
getPort()
getDomain()
getPath()
getFile()
getQuery()
getRef()
getMethod()
getDateHeader(header)
getAllHeaders()
getTextInput()
getInputStream()
getCookies()

It also uses the struct member syntax to set response headers, and has these methods for sending the response to the client:

getTextOutput()
getOutputStream()
addCookie(cookie | name,value)

The following program reports back what the client sends (the request URI parts, headers and cookies); it can serve as a reference and a template for other server handlers.

Listing 6. snooper.judo
 1: ss = startServer(8088);
 2: while {
 3:   start thread snooper(acceptHttp(ss));
 4: }
 5:
 6: thread snooper a {
 7:   path = a.getPath();
 8:   if path.indexOf('snoop') < 0 {
 9:     a.serveFile();
10:     return;
11:   }
12:   a.'Content-Type' = 'text/html';
13:   os = a.getTextOutput();
14:   flush <os> [[*
15:     <html><body>
16:     <h1>Snooper Server</h1>
17:     <table>
18:     <tr><td> </td><td> </td><td><em>Request URI Parts</em></td></tr>
19:     <tr><td><b>Name</b></td><td></td><td>(* a.getServerName() *)</td></tr>
20:     <tr><td><b>Port</b></td><td></td><td>(* a.getServerPort() *)</td></tr>
21:     <tr><td><b>URI</b></td><td></td><td>(* a.getUrl() *)</td></tr>
22:     <tr><td><b>Host</b></td><td></td><td>(* a.getHost() *)</td></tr>
23:     <tr><td><b>Port</b></td><td></td><td>(* a.getPort() *)</td></tr>
24:     <tr><td><b>Domain</b></td><td></td><td>(* a.getDomain() *)</td></tr>
25:     <tr><td><b>Path</b></td><td></td><td>(* a.getPath() *)</td></tr>
26:     <tr><td><b>File</b></td><td></td><td>(* a.getFile() *)</td></tr>
27:     <tr><td><b>Query String</b></td><td></td><td>(* a.getQuery() *)</td></tr>
28:     <tr><td><b>Ref</b></td><td></td><td>(* a.getRef() *)</td></tr>
29:     <tr><td></td><td></td><td></td></tr>
30:     <tr><td></td><td></td><td><em>Request Headers</em></td></tr>
31:   *]];
32:   for x in a.getAllHeaders() {
33:     flush <os> [[*
34:       <tr><td nowrap><b>(* x *)</b></td>
35:       <td></td><td>(* a.(x) *)</td></tr>
36:     *]];
37:   }
38:   flush <os> [[*
39:     <tr><td></td><td></td>
40:     <td><em>Cookies</em></td></tr>
41:   *]];
42:   for x in a.getCookies() {
43:     flush <os> [[*
44:       <tr><td><b>(* x.getName() *)</b></td>
45:       <td></td><td>(* x.getValue() *)</td></tr>
46:     *]];
47:   }
48:
49:   flush <os> '</table></body></html>';
50:   os.close();
51: }

In the while loop, line 3 starts a snooper thread for each connection. The bulk of the program prints HTML code in here-doc strings, with embedded expressions for the various values. Don't worry about performance: the print and flush statements are smart enough to print each value in a string concatenation expression directly.

Lines 8 through 11 check the URL path for the word "snoop"; if it is found, the thread does the snooping. Otherwise, it treats the request as a file request and serves it with the serveFile() method. This is a fairly common pattern for implementing your own server. The serveFile() method uses a system MIME-type map, a struct that maps file extensions to MIME types; it is accessible via the system function getMimeTypeMap(), and you may add more entries to it. If no doc-root directory is specified, serveFile() uses the current directory. Similar to serveFile() is serveError(), which takes a response status code; the default is 500.
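A serveFile()-style method needs two lookups: map the URL path to a file under the doc root, and pick a Content-Type from the file extension. The Java sketch below shows both. The map entries and the application/octet-stream fallback are illustrative assumptions, not the contents of JudoScript's getMimeTypeMap(), and the check that keeps "/../" paths inside the doc root is an addition the article does not discuss. The class name StaticFiles is hypothetical.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for serving static files.
final class StaticFiles {
    // Mutable, so more extensions can be added, as with getMimeTypeMap().
    static final Map<String, String> MIME = new HashMap<>(Map.of(
        "html", "text/html", "htm", "text/html", "css", "text/css",
        "gif", "image/gif", "jpg", "image/jpeg", "txt", "text/plain"));

    static String mimeType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MIME.getOrDefault(ext, "application/octet-stream");
    }

    // Empty result means the path escapes the doc root; a server would answer
    // with an error status instead of a file.
    static Optional<Path> resolve(Path docRoot, String urlPath) {
        Path root = docRoot.toAbsolutePath().normalize();
        Path file = root.resolve(urlPath.replaceFirst("^/+", "")).normalize();
        return file.startsWith(root) ? Optional.of(file) : Optional.empty();
    }
}
```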

Review Questions

Using startServer(), how would you implement a server that runs the protocol you defined yesterday?
Using the HTTP service object's serveFile() method, implement the simplest possible multi-threaded web server.
Then extend the server to check for URLs that start with "/ext/"; if one is found, call a function chosen by the rest of the path to handle the request. Try to formalize this as much as possible; think about the servlet API or CGI.


4. Hack the Web with Proxy Server

We can create an HTTP server, and we can create HTTP clients; combining the two, we get an HTTP proxy server. Proxy servers are typically used for network security, or to let machines on a local area network share a single Internet connection. You are unlikely to write a proxy server for either purpose. Our interest in proxies is to monitor the content going across the wire between the client (a web browser) and servers.

HTTP proxy servers handle HTTP requests natively; that is, they understand the HTTP protocol and do a lot more than pass things through. Other protocols are typically just passed through. Suppose a browser is set to use an HTTP proxy server for all its Internet requests. For an FTP URL, it actually sends the proxy an HTTP request carrying the FTP URI; for HTTPS, it sends a CONNECT request naming the target host and port. For a protocol it does not handle natively, the proxy falls back to passing the traffic through.

We do not need fancy proxy features such as caching or persistent connections. All we want is to see what goes across the wire. Simple as that sounds, it can be invaluable for debugging web applications and for hacking web browsers and servers.
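Before relaying anything, a proxy has to work out what the browser is asking for, and the request line tells it. The Java sketch below classifies a request line into three cases: an absolute URI (the normal form for requests sent to a proxy), a relative URI that must be completed from the Host header, and a CONNECT request (how browsers ask a proxy for an HTTPS tunnel). It assumes a well-formed request line; the class ProxyRequestLine and its types are hypothetical.

```java
import java.net.URI;

// Hypothetical helper: decides what a proxied request line refers to.
final class ProxyRequestLine {
    enum Kind { ABSOLUTE, RELATIVE, TUNNEL }

    record Target(Kind kind, String method, String url) { }

    // requestLine looks like "GET http://host/path HTTP/1.1".
    static Target classify(String requestLine, String hostHeader) {
        String[] parts = requestLine.trim().split("\\s+");
        String method = parts[0];
        String target = parts[1];
        if (method.equalsIgnoreCase("CONNECT")) {
            return new Target(Kind.TUNNEL, method, target);      // "host:port", tunnel only
        }
        if (URI.create(target).isAbsolute()) {
            return new Target(Kind.ABSOLUTE, method, target);
        }
        // Relative URI: rebuild an absolute URL from the Host header.
        return new Target(Kind.RELATIVE, method, "http://" + hostHeader + target);
    }
}
```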

Listing 7. proxy.judo
 1: ss = startServer(8088);
 2: while {
 3:   relay(acceptHttp(ss)); // single-thread
 4: }
 5:
 6: function relay c {
 7:   // Connect to server; browser should send absolute URL.
 8:   url = c.getUrl();
 9:   doPost = c.getMethod().equalsIgnoreCase('post');
10:   println '>>>>>>>> ', url, ' >>>>>>>>';
11:   if doPost {
12:     s = httpPost(url);
13:   } else {
14:     s = httpGet(url);
15:   }
16:
17:   // pass all client headers and content to server
18:   for x in c.getAllHeaders() {
19:     s.(x) = c.(x);
20:     println x, ': ', c.(x);
21:   }
22:   if doPost {
23:     copyStreams c.getInputStream(), s.getOutputStream();
24:   }
25:
26:   // pass all server headers and content to client
27:   println '<<<<<<<< ', url, ' <<<<<<<<';
28:   for x in s.getAllHeaders() {
29:     c.(x) = s.(x);
30:     println x, ': ', s.(x);
31:   }
32:   copyStreams s.getInputStream(), c.getOutputStream();
33:
34: catch:
35:   println <err> '[', $_.name, '] ', $_.message;
36: finally:
37:   println; // separate requests.
38: }

Lines 1 through 4 accept client requests and call the function relay() for each one. Calling a handler function is better than starting a thread here, because we want serialized output for consecutive requests. Line 8 obtains the request URL, and the proxy makes its own request to the server on behalf of the client (lines 12 and 14). Older browsers may send a relative URI and use the "Host" header to indicate the server; we ignore that case. The function then relays the headers and content from the client to the server (lines 17 through 24) and vice versa (lines 26 through 32). That's all it does. Open your browser's network settings, set it to use an HTTP proxy server at localhost:8088 (the port passed to startServer() on line 1), and visit some web sites. The following is the output for an Internet Explorer 5.5 visit to a server running locally on port 8080; the output is slightly edited to fit the page.

>>>>>>>> http://localhost:8080/ >>>>>>>>
accept-language: en-us
cookie: generic=mtoeX2jZb
accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
        application/vnd.ms-excel, application/msword, */*
host: localhost:8080
if-modified-since: Fri, 26 Oct 2001 04:17:08 GMT; length=2890
proxy-connection: Keep-Alive
user-agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
<<<<<<<< http://localhost:8080/ <<<<<<<<
Date: Sun, 28 Oct 2001 01:02:07 GMT
Status: 200
Servlet-Engine: Tomcat Web Server/3.1 (JSP 1.1; Servlet 2.2; Java 1.3.0;
                Windows 2000 5.0 x86; java.vendor=Sun Microsystems Inc.)
Set-Cookie: generic=mtoeX2jZb;Expires=Tue, 28-Oct-2003 01:02:07 GMT;Path=/
Content-Type: text/html
Last-Modified: Fri, 26 Oct 2001 04:17:08 GMT
Content-Length: 2890
Content-Language: en

>>>>>>>> http://localhost:8080/mystyles.css >>>>>>>>
accept-language: en-us
cookie: generic=mtoeX2jZb
accept: */*
proxy-connection: Keep-Alive
host: localhost:8080
if-modified-since: Fri, 26 Oct 2001 04:17:08 GMT; length=1600
user-agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
referer: http://localhost:8080/
<<<<<<<< http://localhost:8080/mystyles.css <<<<<<<<
Date: Sun, 28 Oct 2001 01:02:09 GMT
Status: 200
Servlet-Engine: Tomcat Web Server/3.1 (JSP 1.1; Servlet 2.2; Java 1.3.0;
                Windows 2000 5.0 x86; java.vendor=Sun Microsystems Inc.)
Set-Cookie: generic=mtoeX2jZb;Expires=Tue, 28-Oct-2003 01:02:09 GMT;Path=/
Content-Type: text/css
Last-Modified: Fri, 26 Oct 2001 04:17:08 GMT
Content-Length: 1600
Content-Language: en

>>>>>>>> http://localhost:8080/judoscript.gif >>>>>>>>
accept-language: en-us
cookie: generic=mtoeX2jZb
accept: */*
proxy-connection: Keep-Alive
host: localhost:8080
if-modified-since: Fri, 26 Oct 2001 04:17:08 GMT; length=3298
user-agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
referer: http://localhost:8080/
<<<<<<<< http://localhost:8080/judoscript.gif <<<<<<<<
Date: Sun, 28 Oct 2001 01:02:09 GMT
Status: 200
Servlet-Engine: Tomcat Web Server/3.1 (JSP 1.1; Servlet 2.2; Java 1.3.0;
                Windows 2000 5.0 x86; java.vendor=Sun Microsystems Inc.)
Content-Type: image/gif
Last-Modified: Fri, 26 Oct 2001 04:17:08 GMT
Content-Length: 3298
Content-Language: en

>>>>>>>> http://localhost:8080/bo2.gif >>>>>>>>
accept-language: en-us
cookie: generic=mtoeX2jZb
accept: */*
proxy-connection: Keep-Alive
host: localhost:8080
if-modified-since: Fri, 26 Oct 2001 04:17:08 GMT; length=4202
user-agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
referer: http://localhost:8080/
<<<<<<<< http://localhost:8080/bo2.gif <<<<<<<<
Date: Sun, 28 Oct 2001 01:02:09 GMT
Status: 200
Servlet-Engine: Tomcat Web Server/3.1 (JSP 1.1; Servlet 2.2; Java 1.3.0;
                Windows 2000 5.0 x86; java.vendor=Sun Microsystems Inc.)
Content-Type: image/gif
Last-Modified: Fri, 26 Oct 2001 04:17:08 GMT
Content-Length: 4202
Content-Language: en
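Listing 7 forwards every header verbatim, including the "proxy-connection: Keep-Alive" line visible in the log above. That header describes the browser-to-proxy connection rather than the request itself. A more careful relay drops such hop-by-hop headers; the Java sketch below uses the list from RFC 2616, section 13.5.1, plus Proxy-Connection (a common convention, not part of that list). It is a sketch, not a change to the article's program; the class name Relay is hypothetical, and InputStream.transferTo() stands in for copyStreams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for the relay step of a proxy.
final class Relay {
    private static final Set<String> HOP_BY_HOP = Set.of(
        "connection", "keep-alive", "proxy-connection", "proxy-authenticate",
        "proxy-authorization", "te", "trailer", "transfer-encoding", "upgrade");

    // Copies end-to-end headers in their original order, skipping hop-by-hop ones.
    static Map<String, String> forwardable(Map<String, String> headers) {
        Map<String, String> out = new LinkedHashMap<>();
        headers.forEach((name, value) -> {
            if (!HOP_BY_HOP.contains(name.toLowerCase(Locale.ROOT))) out.put(name, value);
        });
        return out;
    }

    // Counterpart of 'copyStreams in, out'; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        return in.transferTo(out);
    }
}
```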

Review Questions

If a LAN uses IRONGATE:8024 as its gateway to the Internet, how would you modify Listing 7 to make it work?
Modify Listing 7 so that any HTML page containing the word "anthrax" is saved to the directory "c:\fbi".


5. Server-Side Scripting with JUSP

JudoScript Server Page (JUSP) technology is JudoScript's server-side scripting. Its syntax is identical to that of JSP or ASP, except that it has no XML counterpart. To summarize: inside a JUSP page, <% and %> enclose JudoScript statements, and <%= and %> enclose expressions. Anything outside these is treated as HTML text. In the code, these variables are available:

servlet
request
response
session

and these Java classes are predefined:

const #Cookie   = javaclass javax.servlet.http.Cookie;
const #HttpUtil = javaclass javax.servlet.http.HttpUtils;

Listing 8. first.jusp
<%
  response.setContentType('text/html');
  a = 'The Very First Time!';
  b = 'This is the very first test of JUSP!';
%>
<HTML>
<HEAD><TITLE><%=a%></TITLE></HEAD>
<BODY>
<H1><%=a%></H1>
<%=b%>
</BODY></HTML>

To use JUSP, configure a servlet of class com.judoscript.jusp.JuspServlet in your servlet-enabled web server. The servlet takes an init parameter, juspRoot, that tells it where to look for JUSP pages.
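To make the page model concrete, here is a toy Java sketch of the text-versus-code split in a JUSP-style page: text outside the delimiters is copied as-is, and each <%= name %> is replaced by a value. It is only an illustration and not how JuspServlet works: real JUSP pages run JudoScript code inside <% %>, whereas this sketch simply skips those blocks and takes the values from a prepared map, and <%= %> here accepts only a variable name, not a full expression. The class name TinyPage is hypothetical.

```java
import java.util.Map;

// Hypothetical toy renderer for the <% %> / <%= %> page structure.
final class TinyPage {
    static String render(String page, Map<String, ?> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (true) {
            int open = page.indexOf("<%", i);
            if (open < 0) {                                   // no more code: copy the rest
                out.append(page, i, page.length());
                break;
            }
            out.append(page, i, open);                        // literal HTML text
            int close = page.indexOf("%>", open + 2);
            if (close < 0) throw new IllegalArgumentException("unclosed <% at " + open);
            String body = page.substring(open + 2, close);
            if (body.startsWith("=")) {                       // <%= name %>: insert a value
                out.append(vars.get(body.substring(1).trim()));
            }                                                 // <% ... %>: skipped here
            i = close + 2;
        }
        return out.toString();
    }
}
```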

Review Questions

What are the differences and commonalities between JUSP and JSP?
Which variables can you use in a JUSP page? Which Java classes are predefined as constants?
Create a JUSP page that sets a session cookie.


6. Summary

JudoScript supports many aspects of web technology, from clients to servers and everything in between. These features can be combined to create powerful web tools, or used to build communication agents for things like resource sharing or mini messaging systems.

To get documents from the web, simply call openFile() or openTextFile() with an HTTP URL. For more detailed web interactions, such as with sites that use cookies to maintain sessions, use httpGet() or httpPost(), which return an HTTP object that lets you manipulate HTTP headers and content in both directions. In particular, cookies can be examined, saved to files, loaded, and sent back to the sites.

HTML pages may contain useful information that programs want to extract. Using the do..as html statement, you can easily specify actions for each tag (including text). The $_ variable holds the current tag, whose attributes are accessed as data members. Special tags stand for special events and constructs, such as :BEFORE, :AFTER, :TEXT, <!> and <?>. Unmatched "normal" tags are collectively represented by <>.

To create an HTTP server in JudoScript, call startServer() to open a server socket, then repeatedly call acceptHttp() to accept client connections as HTTP service objects, and pass them to a handler. The handler is normally a thread, but it can also be a function, in which case the server is single-threaded. The HTTP service object is in many ways similar to the HTTP client object, with the roles reversed. The handler can call the helper methods serveFile() and serveError(). Unlike Apache or Tomcat, such a server has no elaborate configuration for servlets, CGI and the like; your handler function or thread controls everything.

By combining an HTTP client and server, a proxy server is born. It can be extremely helpful for, say, monitoring what is going on across the wire. To do so, create a single-threaded server that accepts client connections and relays everything between the server and the client; then point your browser's proxy setting at this little server.

The JudoScript Server Page (JUSP) technology lets you develop web sites in your favorite language, i.e., JudoScript. Like JSP and ASP pages, JUSP pages embed code and expressions directly. The servlet com.judoscript.jusp.JuspServlet handles the pages and establishes these variables for the code in the page: servlet, request, response and session. The following Java classes are predefined for your convenience: #Cookie for javax.servlet.http.Cookie and #HttpUtil for javax.servlet.http.HttpUtils.
