Having followed the discussions in this thread for a while I have decided to throw my 2 cents in as well.
Beware, this will be long, but I hope that it is understandable even to people without a degree in computer science.
What is interpretation ?
To me it is the process of mapping (somehow encoded) instructions to the actual operations associated with them, thus giving them meaning. In this way interpretation and execution are the same thing.
Because of this every implementation of any language is interpreted, they only differ in the level at which the interpretation happens, or is most conveniently to look at by us.
Digression
It might have passed unnoticed, but I made a rather important distinction in the last sentence: Languages are not implementations, they have implementations. And properties like interpreted, compiled, etc. belong to the implementation, not to the implemented language.
I believe that some the arguments I saw and the confusion arising from it stem from the intermixing use of these two notions. I guess that this comes, partially at least, from the fact that most (all?) languages seem to have a preferred type of implementation, for example C interpreters exist, but are much less in number than C compilers.
So, if I say
xxx
language in the future it means that the usual implementation for that language has the propertyxxx
.
I implicitly mentioned different levels of interpretation, but what are they ?
At the lowest level we have the hardware. Yes, a (micro)processor
is an interpreter, as it associates machine language instructions
(encoded in bytes) to operations like clear register
, etc.
This is usually the lowest level of interpretation we see in a computer system, but depending on the implementation of the processor not necessarily the lowest one existing.
All microprogrammed
processors have a small machine inside busily
interpreting microinstructions
(bit vectors) for the sole task of
interpreting/executing the incoming machine instructions. And
depending on the complexity they might even have a nanoprogram
level (for example the M68xxx series).
In the end every computation is reduced to this level, but often it is not very convenient to look that deep.
One step above we have virtual machines. These are basically interpreters implemented in software instead of hardware.
They often employ simple machine languages, but not necessarily so.
An implementation of a domain specific language like the one used
by Brent Welch to describe forms via tcl procedures is a virtual
machine too, executing form
-instructions.
A virtual machine capable of executing the language of a hardware processor is called an emulator. One application is the simulation of embedded systems on bigger machines, for example to test a new robot control application, and another the emulation of the hardware of long-gone game-consoles to allow the usage of their games even today.
Back to the simple machine languages: The usual name for them is
bytecode
, due to the fact that many of them use single-byte
instructions. Another thing to note is that most(all?) seem to
implement stack-based processors instead of the (usually)
register-based hardware processors. The easy compilation (see
below) of higher languages into a language so structured seems to
be the deciding factor.
There is nothing above virtual machines, just different techniques for implementing them.
Before I am coming to that I should explain the notion of compilation
too.
The general notion behind the term is that of a converter, taking a program written in a language X as input and producing a program in another language Y as output. But not all converters are compilers; to be one, some restrictions have to be met.
An application like a cross-referencer is certainly no compiler although its output can be seen as a program in a simple language. The problem is that the output does not contain all of the information which was in its input.
The output of a compiler has to be semantically equivalent to its input. That means, if we execute input and output (remember, both are programs!) on some data D both have to produce the same results.
This definition is a bit broader than a programmer might see it, as it classifies pretty-printers as compilers too, but that is fine with me, they simply belong to the special class of compilers translating from a language X to X itself.
Ok, now we can look at the possible approaches to implement virtual machines.
The interpreting software reads the program in the language to execute, parsing and executing it along the way. During iterations of language elements the input is parsed multiple times.
This is an immediate interpreter and seems to be usually
meant or implied if a language is called interpreted
.
Two-stage interpretation:
In the first stage the input is read completely, compiled into a lower-level language and then executed by the second stage, which is an immediate interpreter for the low-level language.
This often uses a bytecode as the low-level language, thus reducing/eliminating the parsing overhead for the 2nd stage interpreter. And as the complex and costly parsing of the high-level language now happens only once the combined system is usually faster than an immediate interpreter for the high-level language itself.
Please note that this implementation uses a compiler, but it is
implicitly called by the environment and not seen by the
user. Terms used for this are on-the-fly compilation
or
just-in-time compilation
(See comments about Java later on).
Compiler based
The user has to explicitly invoke a compiler to translate his program in the high-level language into a low-level language. The resulting output is then executed, either on hardware or by a bytecode interpreter.
Both (1) and (2) allow the construction of interactive shells wrapping around the core engine, reading and executing user input as it is typed, with immediate feedback of the results.
An explanation of the term scripting language
is still missing. Now
that I laid the foundations in the text above I will work on this one.
Many people seem to believe that scripting
and (immediate) interpretation
are synonyms. I disagree, but have to digress a bit (again) to explain
that. Two general notes on this topic before that:
The definition of the categories above are based on language
implementations, and therefore rather strict. scripting
, and
its counterpart programming
are much more diffuse and more a
question of how to use a language, i.e. the way a programmer
works with it and for what tasks.
Whenever I encountered discussions about scripting
vs. programming
I had the feeling that some of the participants
believe that scripting
languages are inferior to programming
languages and not to be taken as serious languages, that they are
more for hacking on a problem rather than real programming.
I have to disagree with that opinion. To me a scripting language is as serious as any other programming language. It is just that they have different problem domains to apply them to.
On to the digression. I will talk about various computer languages,
their implementations and wether I consider them as a scripting
language or not. After that section I will then try to describe the
major characteristics of scripting
as I see them. This should make
clear why the boundary between the scripting
/ programming
is a
fuzzy one too.
Basic
is one of the earliest languages, and still alive. The earliest implementations were immediate interpreters, but modern implementations at least use tokenization to speed up execution. I don't know enough about the exact implementations to call it a byte-code and 2-stage interpreter, but it definitely is a precursor for such techniques.
The only variant I am sure about that it can be compiled is Visual Basic from M$. Another important feature is its extendability via VBX'es and the successor technology OCX / (Radio)ActiveX.
I am not sure wether the VB-IDE allows interactive development of applications. If yes it would be a scripting language (explanation later, be patient).
Pascal
, Modula
, Algol
, C
, C++
, Ada
, FORTRAN
, Eiffel
, Java
.
The main set of programming languages
. They are usually
compiled.
C
I have heard that an interpreter exists, but cannot confirm it.
Pascal
The UCSD-Variant of this language was compiled into P-code
, a
bytecode-language, and then interpreted. Due to the explicit
compilation this is no 2-stage interpreter.
Java
This language uses the same technology as did UCSD-Pascal,
explicit compilation into a portable bytecode. The so called
just-in-time-compilation
is basically the same technology which
is used by 2-stage interpreters, but one level deeper:
2-stage Interpreter: Highlevel-Lang -(on demand) -> Bytecode.
JIT: Bytecode -(on demand) -> Maschinecode.
AFAIK Sun is developing hardware for the immediate and fast execution of Java-bytecodes.
SmallTalk
One of the first OO-languages. Very interactive, unfortunately a closed system. Not a scripting language.
Standard implementation technique is a 2-stage interpreter, but I read in the Handbook of programming languages about a commercial variant using JIT-technology to improve performance. This predates its use by Java, AFAIK.
The earliest implementations were run on hardware, microprogrammed processors who interpreted the then bytecode language immediately (the then because I believe that the bytecode of SmallTalk varies between the first implementations and the standard SmallTalk-80).
Lisp
Another language from the beginnings of the computer era with a very simple, some say non-existent syntax.
The earliest implementations were immediate interpreters for the list-structures coming from the input reader, but modern ones use 2-stage interpreters and bytecodes.
There were even machines executing such bytecodes immediately on hardware (TI-Explorer, Symbolics) but they did not survive in the market.
Examples of this family are Commmon Lisp
, Scheme
, Emacs Lisp
and Guile
.
AFAIK Guile
from the GNU project is the first variant (itself a
descendant of Scheme) containing extendability features, like
loading of shared libraries.
I see Guile
as scripting language, but not the other variants.
Remark: Thinking back to the introduction the existence of
hard- and software implementations for SmallTalk
, Lisp
and
Java
shows very clearly how movable the boundary between these
two really is and why a comp.scientist sees every execution as an
interpretation of some kind.
sh
, bash
, ksh
, csh
, tcsh
, ...
The bourne shell
and its descendants, all implement basically
the same language with a simple syntax. All implementations are
immediate interpreters and feature interactive shells.
Basic features for extension: Execution of arbitrary commands, pipes and I/O direction for building more complex applications from basic building blocks.
All are scripting languages.
Tcl
Before version 8 this language was implemented with an immediate interpreter, but version 8 and higher use bytecodes and a 2-stage interpreter.
ICEM CFD sells a Tcl2C compiler, thus essentialy providing a compiled implementation. Jacl is a tcl interpreter written in Java, giving us a nice example for the concept of stacking virtual machines on each other: The tcl interpreter is a virtual machine, as it is written in Java it is executed by the Java-VM, which in turn is machine language, interpreted by hardware. SICP contains more nice examples for this type of construction.
Extensible through facilities to call arbitrary external commands, like in shells, and through loading of shared libraries. Offers an interactive shell.
A simple grammar, a streamlined (simpler quoting model) and
extended (more quoting possibilities) version of sh
-syntax.
Definitely a scripting language.
Perl
, Python
Two more scripting languages, both implemented by 2-stage
interpreters, from the very start. Their grammars are more
complex than the ones of Lisp
, Tcl
and sh
, but less than,
say C
(At least for Python
;-).
Extensible like Tcl
.
Python
When I heard first from JPython I thought it to be something like Jacl, a Python interpreter implemented in Java. Newer information suggests that it is more of a compiler from Python to Java-Bytecodes, skipping the Python-bytecode language entirely.
So, looking back at that long list of languages, what caused me to
label some of them as scripting
languages but not others ?
Well, the first important point is interactivity, the ability to play with an application while it is in development (after), to try and compare different possibilities for its implementation in a very short time, to develop it incrementally, starting with a simple implementation which is then refined.
This is a very different style of application development than is used with programming languages whose larger turnaround times in the edit-run/test-edit cycle make such playing around difficult. It is maybe even (part of) the reason why they are sometimes seen as inferior or hackish (see general note 2 before the language section).
Please note that an interactive shell is only one way to provide
this element. Neither Perl
nor Python
feature such a shell, yet
they are fast enough to create a good enough approximation of a shell
to allow the incremental style of development.
A trade-off can be seen too. All of the languages with an interactive
shell (tcl
, sh
, guile
) have a much simpler grammar than the
languages without (perl
, python
). IMO this is because for
interactive shells we users have be able to type statements of the
language at the shell prompt without thinking too much ahead and/or
about syntactical structure, as it might be necessary in full-blown
languages with complex grammars. To write an application in a
scripting languages without shell we need an editor, thus are not
constrained by its basically single-line input.
A universal point is the existence of fast interpreters for the language.
The second thing required for me to consider a language as scripting
is easy extensibility, allowing it to act as glue between
highly diverse components. That is the reason why I said that Guile
is a scripting language, but not the other Lisp
-variants. The same
is true for M$-VisualBasic, it is a scripting language if and only if
the IDE allows interactive development.
I concede that the extensibility features of sh
are at the low end
of the spectrum but the near universal use of shell-scripts to
administrate and control unix systems shows that even they are
powerful enough for many and complex tasks.