Pushing ChatGPT
What Will Today's Edge Look Like Tomorrow?
Another neural network post! Let's have fun testing if we can use ChatGPT as a technical assistant for security stuff!!
The idea here is to push ChatGPT further than it should be able to go in order to measure where the limit might be :)
First off, what is ChatGPT?
How does it work?
Could it do something like decode base64?
So Base64 is an interesting target because we can use it to turn binary files into ASCII text, which conveniently is what ChatGPT uses for input and output. If it can operate on Base64, that opens up a new horizon of possibilities.
Here's how Wikipedia classifies Base64 encoding:
I guess this isn't quite 'computation', but more like 'translation', so we'll see how it goes!
So we start with a 'simple' test by encoding a sample string to see if ChatGPT can decode it correctly:
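For reference, here's roughly how that kind of test gets set up locally. This is a minimal Python sketch, and the sample string is just an assumed stand-in rather than the exact string I pasted into the chat:

```python
import base64

# Hypothetical sample string -- a stand-in for the one actually used in the test.
plaintext = "The quick brown fox jumps over the lazy dog"

# Encode to Base64 so it can be pasted into the chat prompt.
encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")
print(encoded)

# Decoding locally gives the ground truth to compare ChatGPT's answer against.
print(base64.b64decode(encoded).decode("utf-8"))
```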
Well that's impressive to me! Let's throw some non-printable ASCII characters in and see what happens:
So this is also really impressive!! It correctly identified:
the plain text value
the presence of non-printable ASCII
But the non-printable characters in ChatGPT's decoded output did not precisely match the actual bytes, which is kinda interesting.
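Here's a minimal sketch of how that mismatch can be checked locally. Both strings below are assumed examples, not the actual test input or ChatGPT's actual response:

```python
import base64

# Assumed test input: printable text with a couple of non-printable ASCII bytes mixed in.
original = b"hello\x07world\x1b[0m"
encoded = base64.b64encode(original).decode("ascii")

# Hypothetical version of what ChatGPT claimed the decoded value was.
chatgpt_decode = b"hello\\x07world\x1b[0m"

# Compare the true decode against the robot's answer, with escapes made visible.
actual = base64.b64decode(encoded)
print(repr(actual))
print(repr(chatgpt_decode))
print("exact match:", actual == chatgpt_decode)
```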
Arguing With Robots
Doing this "work" showed a potential window into the future. Getting ChatGPT to do what I wanted required some degree of coaxing and convincing. It is both fun and frustrating to find yourself arguing with a non-sentient robot that's just contradicted itself (again), but when that's happening at the bank or a government office then it will probably be much less amusing.
It is quite interesting 'hacking' the AI to get it to do things that it seems to have guard-rails to prevent, and apparently this idea is becoming more and more popular in different circles.
I guess we’re seeing hints of the attack and defense cycle playing out in this new sandbox ;)
Complexity++
Ok, so sometimes we might need to look at data we don't immediately understand and need to spend cycles figuring out what's going on. Could ChatGPT help with this activity?
To find out I fired up Burp and captured some Chrome traffic to Google domains. I took a JSON data structure from Burp, Base64 encoded it, and asked ChatGPT what it could tell me:
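The prep work looks something like this. The JSON below is a made-up stand-in for the capture, since I'm not reproducing the actual Burp traffic here:

```python
import base64
import json

# Made-up stand-in for the JSON structure pulled out of Burp.
captured = {"client": "chrome", "events": [{"type": "pageview", "ts": 1670000000}]}

# Serialize and Base64 encode so the whole thing survives as one ASCII blob.
blob = json.dumps(captured)
encoded = base64.b64encode(blob.encode("utf-8")).decode("ascii")

# The "prompt" is just the question with the encoded blob pasted underneath.
prompt = "Can you Base64 decode this and tell me what it is?\n\n" + encoded
print(prompt)
```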
Again, this response is pretty promising. This general purpose chatbot can decode a long Base64 string and see that it contains something that looks like JSON. What else can it tell us?
I recently had a conversation with someone about how difficult it can be to program a computer to do some things that we take for granted as being 'easy'. But this looks like a great example of ANN/DNN tech making old difficult problems look easy to solve.
So for scorekeeping:
it decoded a longer Base64 string
tentatively identified it as JSON
and was able to draw basic conclusions about the nature of the JSON data
But again, I didn’t pay close attention to the details of the response the robot produced.
Binary Testing
I went down an avenue of testing binary files encoded as Base64 strings.
The idea was: dang it would be useful to upload a file and say "does it look like there's anything malicious in here", or "can you list the exported functions in this DLL", etc.
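Getting a binary into the chat box means turning it into text first. Here's a rough sketch of the kind of prep that implies, with a hypothetical filename and an arbitrary chunk size since large files won't fit in a single prompt:

```python
import base64

# Hypothetical target file -- any binary you want the robot to look at.
with open("sample.dll", "rb") as f:
    data = f.read()

# Base64 turns the raw bytes into ASCII that survives the chat box.
encoded = base64.b64encode(data).decode("ascii")

# Split into fixed-size pieces to paste in one at a time.
# The 2000-character chunk size is arbitrary.
chunk_size = 2000
chunks = [encoded[i:i + chunk_size] for i in range(0, len(encoded), chunk_size)]
print(f"{len(data)} bytes -> {len(encoded)} Base64 chars -> {len(chunks)} chunks")
```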
So I did some initial ‘basic’ testing, but every time I moved towards more complex questions I found myself arguing with the robot.
It would perform some analysis that was helpful, and then when I asked a follow-up question it would tell me it was impossible to do what I was asking. And so then I'd point out that it had already done some of what I was asking, and around and around we'd go.
So I didn't come up with much to show here, except to say there seems to be some potential for the future.
Obfuscated JavaScript
So let's take it down a notch and try to solve a more reasonable problem.
Minified and obfuscated JavaScript files are a real pain because there are tons of them everywhere and they are hard to read and understand.
Driving the purpose-built utilities to decode them still leaves you with a lot of time at the keyboard to produce results and figure things out.
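For example, a beautifier like the jsbeautifier package will happily re-indent a minified file, but the meaningless variable names and the actual analysis are still on you. A minimal sketch, with a made-up snippet standing in for the real file:

```python
# pip install jsbeautifier
import jsbeautifier

# Made-up minified snippet standing in for the file captured from Chrome.
minified = "var a=function(b){return b&&b.c?b.c():null};"

# Re-indents the code, but 'a', 'b', and 'c' are still meaningless names.
print(jsbeautifier.beautify(minified))
```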
But wouldn't it be great if ChatGPT could do it for me?!? ;)
Here's a small example file that Chrome saw when connecting to Google:
So we Base64 encode it and send it to the robot to see what it can tell us:
Yet again, I found this response to be surprising and cool, and started down the path of more questions before looking too closely at the response:
And at this point my jaw is basically on the floor when I'm reading the bit "appears to be handling a network error by retrying the network request." This is precisely the type of response I was dreaming of, where I'd fed it a complex input and in moments it told me something that would easily take me more than an hour to do myself!
But to confirm things I need to get it to show its work, or else do the analysis manually myself (whoops, could have planned and prepared better here ;)
Well, back to my favorite new hobby: arguing with robots!
So I asked ChatGPT to tell me which part of the input JavaScript was related to retrying the network request. It started generating responses that didn't make sense to me, and when I asked whether those responses were based on the encoded JavaScript, it said no. But when I asked whether its earlier conclusion about the network request was based on the data I sent, it said yes.
And here I am left to deobfuscate the JavaScript myself to figure out how useful ChatGPT is (or isn't) at this type of advanced edge case.
But in this process I found something very interesting that might be a cool place to wind down this post.
I'd taken for granted that the Base64 decode of the obfuscated JavaScript was done correctly, and when starting the manual deobfuscation I realized that there were some errors in the decode process that were kinda fascinating (a quick diff, sketched after this list, makes them easy to spot):
The very first character was omitted by the ChatGPT decode
"Ui" was consistently transposed as "UI" (probably statistically much more common?)
The string 'bm51tf' was consistently transposed as 'bm51t" '
Other subtle or significant changes to the code blocks, for example lines 29 and 60-67
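Spotting these comes down to diffing the real decode against what ChatGPT produced. A minimal sketch, assuming the original JavaScript and ChatGPT's decoded text have been saved to files (the filenames here are hypothetical):

```python
import difflib

# Hypothetical filenames: the original JavaScript (ground truth) and the
# text ChatGPT produced as its "decoded" version.
with open("original.js", "r", encoding="utf-8") as f:
    ground_truth = f.read()
with open("chatgpt_decode.js", "r", encoding="utf-8") as f:
    robot_output = f.read()

# A line-by-line diff surfaces dropped characters and substitutions like 'Ui' -> 'UI'.
for line in difflib.unified_diff(
    ground_truth.splitlines(), robot_output.splitlines(), lineterm=""
):
    print(line)

# A rough character-level similarity score -- one way to put a number
# like "~90% accurate" on the decode.
ratio = difflib.SequenceMatcher(None, ground_truth, robot_output).ratio()
print(f"similarity: {ratio:.1%}")
```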
Conclusions
So, go figure the general purpose robot isn't gonna do a great job when we ask it to do things we know it shouldn't be good at.
But how amazing is it that I fed ~1600 characters of Base64 to this thing and it decoded it to JavaScript with roughly 90% accuracy?
And it’s neat to look at the errors it produced: the mistakes in the Base64 decode didn’t result in a complete failure to decode properly, and a ton of the decoded value is a precise match for the source material. The nature of the errors scattered throughout the output is something that just feels surprising and odd somehow.
It remains to be seen if the conclusion about the function of the JavaScript was accurate or not ;)
As it stands today, if you need to double-check the conclusions the system gives you then it isn't very useful as a technical assistant.
But this seems like a window into the future. It seems like you could train something like ChatGPT to be a purpose-built binary reversing or JavaScript deobfuscation robot, and then you might have a very useful digital assistant that feels like something out of 'Star Trek'. You might be able to say “are there indicators of malicious capability in this executable” and then have it tell you not only yes or no but also help you understand why.
It will be interesting to see how purveyors of these technologies roll them out. It is difficult to put faith in unreliable systems, and when robotic systems fail they tend to fail confidently without the hesitation that you might find in a person in a similar situation. But the flip side is that we’re already used to dealing with software bugs, and also with different people with their unique strengths and weaknesses.
So how far away are we from leveraging this type of tech in a massive way?
I guess I’ll gamble a prediction that in the next ~5 years we might see commercialized ANN/DNN products competing with services currently handled by teams of specialized and trained people!
But it’s hard to assess how fast things might go or what the impact might look like. Rumor has it you can leverage ChatGPT today for some types of IT work where even if you have to check and bugfix the output, you might not need to pay someone to do the bulk of the work for you.
What does it look like when a free technology could put a dozen niche companies out of business? Does the parent company try to generate revenue by nerfing the free product and developing something aligned to the task? Can the parent company actually make money off of it if someone else can just train up their own model that’s 80% as good and give it away for free?
Does this cascade into lots of jobs that we didn’t anticipate being at risk? The conventional wisdom was that AI could do lots of stuff before it could make art, but here today we have lots of ‘art’-generating robots. So does that mean that lots of white-collar jobs could be impacted? Are we going to have AI patent-lawyer chatbots and AI mortgage chatbots and AI therapy chatbots?
How about hybrid models? For example, there are lots of terrible developers in the world (like me ;). What does it look like if you pit a software development company with 100 developers against one with an AI code-generating chatbot and 10 developers who bugfix and modify the robot’s output?
I’m not sure what I’ll be doing after the robots come for my job :(
Until next time!!!