Tuesday, August 15, 2023

Korelogic's CMIYC 2023 @ DEF CON 31 Write-up

 Members that participated (10 crackers / 1 support)

  • s3in!c

  • golem445

  • hops

  • blazer

  • gearjunkie

  • winxp5421

  • AMD

  • cvsi

  • pdo

  • Waffle

  • Usasoft (support)

Peak computing power: 25-30 GPUs (standardized to RTX 4090 equivalents)

Before the contest

The test hashes gave us a glimpse of what the real hash data would look like. After successfully cracking all the test hashes, we noticed very heavy use of UTF-8 encoded characters. We made sure we had adequate tooling to handle UTF-8 strings and detect character sets, and we expanded our toolkit to leverage translation APIs for batch language translations. We also created tooling to parse the YAML data into more usable and manipulable formats.
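
For illustration, a minimal sketch of the kind of YAML-flattening tool we mean; the file name and field names here are assumptions, not the contest's actual layout:

    import csv
    import sys
    import yaml  # PyYAML

    # Flatten each user record into one CSV row so the metadata can be
    # grepped, sorted, and joined against cracked plaintexts later.
    with open(sys.argv[1], encoding="utf-8") as f:
        data = yaml.safe_load(f)

    writer = csv.writer(sys.stdout)
    writer.writerow(["user", "hash", "metadata"])
    for user, fields in data.items():
        writer.writerow([user, fields.get("hash", ""), fields.get("metadata", "")])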

During the contest

The very first issue we encountered was with the {ssha512} hashes: hashcat outputs these as {SSHA512}, so we had to quickly update our system to translate between the two cases.
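
The shim amounts to a one-liner; a sketch, reading hashes on stdin:

    import re
    import sys

    # hashcat reports these hashes with an upper-case {SSHA512} tag, while
    # the contest list used {ssha512}; normalize before matching/submitting.
    for line in sys.stdin:
        sys.stdout.write(re.sub(r"^\{SSHA512\}", "{ssha512}", line))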

We identified the timestamp pattern early on, and it was used to quickly gain cracks on bcrypt. The metadata threw us off a little at first, as we were not totally sure how it was incorporated into the plaintext: was it merely a hint, or did the plaintext contain a portion or a manipulated form of the metadata? Due to the insanely slow hash rates of bcrypt, sha512crypt, and sha256crypt, it took our team quite some time to gather enough samples to deduce that the plaintext patterns were distributed evenly across all the algorithms.
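
For flavor, a sketch of timestamp-style candidate generation; the exact format the contest used is not repeated here, so the date range and renderings below are assumptions:

    from datetime import date, timedelta

    # Emit each day of an assumed range in a few common renderings; the
    # output feeds hashcat/JTR as a small, bcrypt-friendly wordlist.
    day, end = date(2022, 1, 1), date(2023, 12, 31)
    while day <= end:
        for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"):
            print(day.strftime(fmt))
        day += timedelta(days=1)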

Twelve hours into the contest we were quite perplexed by how other teams were able to consistently yield bcrypt cracks while we appeared to be going nowhere. Once the plains were analyzed, we identified some movie lines. We started with Star Wars movie lines, then progressed to Star Trek movie lines. The following critical patterns were identified.

Lines containing 2, 3, or 4 words were extracted on word boundaries from movie scripts/subtitle files, then divided into various lengths for the attacks below (a sketch of the extraction follows the list):

  • Len13?s (where s = !@$%)

  • Len12?d?s (where s=!@$%)

  • Len14+ suffix 1
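
A condensed sketch of that extraction step, assuming plain .srt files on stdin; cue numbers and timestamp lines are dropped, and 2-4 word windows are bucketed by phrase length:

    import sys
    from collections import defaultdict

    buckets = defaultdict(set)
    for line in sys.stdin:
        line = line.strip()
        # Skip srt cue numbers and "00:01:02,000 --> 00:01:04,000" lines.
        if not line or line.isdigit() or "-->" in line:
            continue
        words = line.split()
        # Pull every 2-4 word window that respects word boundaries.
        for n in (2, 3, 4):
            for i in range(len(words) - n + 1):
                phrase = " ".join(words[i:i + n])
                buckets[len(phrase)].add(phrase)

    for length, phrases in sorted(buckets.items()):
        with open(f"len{length}.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(sorted(phrases)) + "\n")

The len13 bucket then feeds hybrid runs of the form hashcat -a 6 -1 '!@$%' <hashes> len13.txt '?1'.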

We initially were not totally sure whether all the symbols were used or what the specific attributes were, so some resources were wasted checking these. However, as the runs progressed, we were able to reduce the keyspace by tightening our parameters, such as pairing only certain suffix patterns with certain lengths.

Once we knew the correlation between the hash sets, it became merely a game of attacking the fastest algorithm, MD5, then filtering the cracks through all the other algorithms to maximize points. Since we used a much larger repertoire of movie lines and corpora on the faster hash types, we used the obligatory ChatGPT to identify the origin of the phrases: we converted our cracks to base words and asked where the phrases came from. Once the sources were identified, we manually gathered the movie lines/srt files and processed them as described above. Very large tasks were spawned to cover exactly those patterns, which gave us consistent cracks throughout the contest. The list below shows the films we identified.


  • 2001 A Space Odyssey

  • Alien series

  • Army of Darkness

  • Battlestar Galactica

  • Blade Runner

  • Close Encounters of the Third Kind

  • Contact

  • Dune

  • Event Horizon

  • Ex Machina

  • Firefly

  • Ground Control

  • Guardians of the Galaxy

  • I, Robot

  • Inception

  • Interview with the Vampire

  • Mad Max

  • Minority Report

  • RoboCop

  • Star Trek

  • Star Wars

  • The Day the Earth Stood Still

  • The Expanse

  • The Fifth Element

  • Galaxy Quest

  • The Hitchhiker's Guide to the Galaxy

  • The Matrix Trilogy

  • The Terminator

  • The Thing

  • The War of the Worlds

  • Tron

Since many plaintexts were recovered using this method, we will add some additional information. We suspected there was a parsing defect (it was disclosed in the discussion afterward that delimiting on punctuation characters was used, which explains the obscure behavior we noticed).

When parsing our datasets, we delimited only on spaces, as opposed to using a character-level sliding window, to ensure phrases were created on full word boundaries, e.g. “something was here” instead of “omething was here”. However, we noticed cracks containing phrases that were not word-bounded; they appeared like this:

“t go there today”

“ve got to be here”

For some reason, it did not occur to us that the phrases could have been delimited on punctuation. Instead, we took a different approach and emulated the behavior: we simply took our existing phrase lists and prefixed d/t/s/m or ve/re while maintaining the length constraints, which also gave us very good results. Note that since we were dealing with long lengths, it was important to turn off “-O” with sha512crypt/sha256crypt due to its length-15 limitation.
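
A sketch of that emulation, reading our existing phrase lists from stdin (the length cap is illustrative):

    import sys

    # Splitting contractions on the apostrophe leaves fragments such as
    # "t go there today" (from "don't go there today"); emulate this by
    # prefixing the usual contraction tails onto existing phrases.
    PREFIXES = ("d ", "t ", "s ", "m ", "ve ", "re ")
    MAX_LEN = 18  # illustrative cap matching the length bands attacked

    for phrase in (line.rstrip("\n") for line in sys.stdin):
        for p in PREFIXES:
            candidate = p + phrase
            if len(candidate) <= MAX_LEN:
                print(candidate)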

We crafted some additional tooling during the contest to visualize and query the datasets with standard SQL-style queries (yay for SQLite). All our new and existing plaintext cracks, along with all the metadata, were associated and synced constantly. This tooling played a pivotal role not only in creating -a 9 association attacks, but also in helping us identify patterns and create hash subsets. Some of these patterns included the following (a query sketch follows the list):

  • #3&4%# only applied to Telecom users

  • Russian users with Russian words plus the suffix ‘1’

  • Icelandic phrases for users with Icelandic names

  • Ghosting users with the hinted passwords

  • Company names with word suffix

  • Saleswords + prefix/suffix rules for sales team
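
To give a flavor of those queries, a minimal sketch against a hypothetical schema; the table and column names are illustrative, not our actual layout:

    import sqlite3

    con = sqlite3.connect("contest.db")
    # Hypothetical schema: users(username, dept, country, algo, hash, plain).
    # Pull the still-uncracked bcrypt hashes for the sales team, so that a
    # saleswords + rules run only pays the bcrypt cost for relevant salts.
    rows = con.execute(
        "SELECT hash FROM users "
        "WHERE dept = 'Sales' AND algo = 'bcrypt' AND plain IS NULL"
    ).fetchall()
    with open("sales_bcrypt.txt", "w") as out:
        out.writelines(h + "\n" for (h,) in rows)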

Being able to cut down the salt list for the slow algorithms meant we did not actually need lots of computing resources; we would strategically target subsets of hashes when we suspected a pattern. It was clear that Korelogic put a lot of effort into generating the dataset: Japanese users had Japanese passwords, users with UTF-8 encoded names usually had UTF-8 encoded passwords, and even a user's region determined the password, such as Indian users having Hindi passwords.

We were able to decode all the hints. However, at times we probably read too deeply into them, and they almost sent us down rabbit holes and threw us off.

The workflow below roughly describes the process we used. When large attacks were crafted and dispatched, members who felt like joining would participate, while others continued running other attacks and discovering new patterns for analysis. We also translated the base words into various languages and tested them repeatedly to see if we could spot new languages.



We did not have any designated roles; the team just played to everyone's strengths. For example, we initially found many of the word phrases via a large generic Reddit comments wordlist, which was quickly distributed among us; this then progressed to movies, and from there we identified the subset of movies. Due to the short phrases, it was not easy to determine their origin early on, as they could have come from anywhere, though we were eventually able to piece things together. We worked collectively to gather the various resources, and once ready, these resources and tasks were distributed via our job management system for those who opted in. Throughout, various scripts/tools/ideas/automation/platform updates were made on the fly to ensure we were working efficiently with optimal resources.

Things we missed

We were only able to partially solve the prod/dev [separator] hint. We found the following separators: -_| and %20. Sadly, we only tested the other URL encodings in upper-case hex format (%2F instead of %2f), so we missed numerous cracks here.
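
In hindsight the fix was a few lines; a sketch, with illustrative word pairs and both hex cases included this time:

    # Illustrative pairs; the real basewords came from cracked plains.
    pairs = [("prod", "dev"), ("production", "development")]
    # Include both upper- and lower-case hex URL encodings.
    separators = ["-", "_", "|", " ", "%20", "%2f", "%2F", "%7c", "%7C"]

    for a, b in pairs:
        for sep in separators:
            print(f"{a}{sep}{b}")
            print(f"{b}{sep}{a}")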

We did spend some time on the CRC hint: we parsed the CRC handbook, scoured the web for chemical compounds, and wrote scripts to generate carbon chains. While we did get a few cracks, it did not appear significant, or perhaps we simply failed to find the right ones.

We did not use a large repository of books; our movie list gave us enough work to compute through and most likely overlapped with books anyway.

We did see some streets/roads early on but forgot to pursue this further.

We noticed some mathematical formulas as passwords, though we did not look too deeply into this.

Very early on, a member suggested using a dictionary labeled ‘polkski_dict’ (which appeared to contain a random assortment of things). It was far too big to test across the tougher algorithms, so we put it aside and forgot about it until near the end, when we were able to cut down its contents and found it decent at producing founds. We were not able to fully exhaust this dictionary due to time limits.

Take-away

While a large amount of compute certainly makes a difference, if you simply throw random lists/attacks at the hashes and hope for the best, the outcome more likely than not won't be desirable. Identifying patterns and sources, then optimizing the attack parameters, such as cutting down salts by attacking a subset of hashes or using a specific list/rule set, helps dramatically. Ensuring the workload can saturate the compute cores is also critical.

All in all, our team as usual had very little sleep and a wonderful time solving the challenges and competing against the best teams. It was great to be able to use correlated data in hash cracking. We can only imagine the thought and effort involved in creating the challenges along with hints and finally wrapping it all up in a nice, well-run contest. Kudos to Korelogic.


Saturday, August 20, 2022

Korelogic's CMIYC 2022 @ DEF CON 30 Write-up

What a breath of fresh air to have DEF CON 30 not be canceled this year. We are thankful that Korelogic's CMIYC is running strong 13 years in. Going into this contest we assumed the competition would be quite fierce, but we are always up for a fair challenge. As most members of our team are hobbyist password crackers not working in the cybersec industry, this gave us a wonderful opportunity to dust off and re-paste our GPUs.

Our roster this year comprised 9 active crackers, 1 part-time cracker, and 2 others providing ancillary support. The members were AMD, blazer, cvsi, gearjunkie, golem445, hops, MXall0c, Waffle, s3in!c, and pdo, with winxp5421 and usasoft playing support roles in our comms and hash management systems.

Our hardware list consisted of:

 Confirmed

      1x 3080

      3x 2080 TI

      2x 2080

      4x 1080

      7x 1080 TI

      2x 1070

Unconfirmed

      We are missing the GPU count from two of our crackers. You know who you are; if you read this, please report back.

Brief challenge overview

The 7zip and half-md5 archives were cracked easily using John the Ripper (JTR). JTR, built against the correct libxcrypt library, was subsequently used to crack the yescrypt hashes, as it was the only cracker supporting yescrypt.

The web.conf challenge involved setting up Jasypt (a Java simplified-encryption library). The runme.sh indicated that each line in output.txt was encrypted with PBEWITHMD5ANDDES, with the input and the password being the same; therefore, as verification, if the ciphertext decrypted correctly, the resulting output would be identical to the password. Initially, the decrypt.sh provided by Jasypt was used to manually feed in candidates; a Python script was later written to automate the process, though it wasn't relied on, as the answers were easy enough to guess by hand once the pattern was observed. As a team, we quickly formed the URL for this challenge and deciphered it to reveal the ‘tennis shoes’ hash list.
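
That automation is simple enough to sketch here, assuming Jasypt's PBEWithMD5AndDES defaults (PKCS#5 v1.5 key derivation with 1,000 MD5 iterations and an 8-byte salt prepended to the base64 payload) and pycryptodome for the DES piece:

    import base64
    import hashlib
    from Crypto.Cipher import DES  # pycryptodome

    def decrypt(b64_ct: str, password: str) -> bytes:
        raw = base64.b64decode(b64_ct)
        salt, ct = raw[:8], raw[8:]
        # PKCS#5 v1.5 PBE: iterate MD5 over password||salt 1000 times;
        # the first 8 bytes become the DES key, the next 8 the IV.
        d = password.encode() + salt
        for _ in range(1000):
            d = hashlib.md5(d).digest()
        pt = DES.new(d[:8], DES.MODE_CBC, d[8:16]).decrypt(ct)
        return pt[:-pt[-1]]  # strip PKCS#5 padding

    def check(b64_ct: str, candidate: str) -> bool:
        # The plaintext equals the password, so a candidate verifies itself.
        try:
            return decrypt(b64_ct, candidate) == candidate.encode()
        except Exception:
            return False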

The LoopAES challenge: while this appeared to be a mountable filesystem, the length did not coincide with that, so we presumed it was aespipe instead. However, since there is no built-in way to check whether the file decrypted correctly, a quick perl script was produced to run sanity checks on the output and detect a successful decrypt.
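
The perl script is not reproduced here, but the idea, sketched in Python, was to score each decrypt attempt by how printable its output looks (the threshold is illustrative):

    import sys

    def looks_decrypted(buf: bytes, threshold: float = 0.95) -> bool:
        # A correct decrypt of a text-bearing payload should be mostly
        # printable ASCII; a wrong key yields uniformly random bytes.
        printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in buf)
        return printable / max(len(buf), 1) >= threshold

    if __name__ == "__main__":
        sys.exit(0 if looks_decrypted(sys.stdin.buffer.read(4096)) else 1)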

list23-Authoritiesappeart…gpg was cracked using JTR with a small dictionary containing the potential candidates, while the heavy lifting was carried out by JTR's rule engine.

DEFCON-with-key.kdbx: initially attempted with JTR; once the keyfile was released during the contest, we had issues obtaining a valid hash from keepass2john. Closer inspection showed that Korelogic had messed with the keyfile: instead of base64 encoded data, a hex string was present. Fixing the keyfile involved decoding the hex and re-encoding the resulting data as base64 to extract a valid hash. Admittedly, more time was spent than we would like to admit before someone tried opening the KeePass database using only the keyfile and no password.
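
The repair itself is tiny; a sketch, assuming the hex string has already been pulled out of the keyfile (filenames illustrative):

    import base64

    # The keyfile carried a hex string where base64 was expected: decode
    # the hex and re-encode as base64 so keepass2john accepts it.
    hex_blob = open("keyfile_data.hex").read().strip()
    open("keyfile_data.b64", "w").write(
        base64.b64encode(bytes.fromhex(hex_blob)).decode()
    )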

DEFCON.kdbx (released later in the contest): once the file was released, it was converted to a hash and cracked with hashcat.

Gocryptfs: a VM was spun up for this challenge and the hash was quickly cracked by hand on the fourth try; no cracking software was used.

Salt_and_pepper took us slightly longer than expected, as we had to identify the salt and pepper values. We did test other hash types as a precaution.

The odt file was cracked using Passware's Passcovery, though JTR should have been able to handle it as well. We also found out that MS Word does not open password-protected OpenOffice documents.

Riddled_wrapped_in_an.zip was cracked with both JTR and Passcovery independently.

Problems

We did have a slight issue parsing the nsldaps SSHA-1 hashes. This was quickly corrected, though it accidentally triggered a double upload to Korelogic; other than this incident we did not run into any submission difficulties.

We wasted a considerable amount of time on the initial uncrackable KeePass file. Although the name did suggest a key was needed, it wasn't evident whether this was a hint about the hashes inside, so it was worth at least trying. GPUs were put on the hash extracted from this file but, unsurprisingly, yielded no results.

We initially used hashcat mode 25400 to attack the converted pdf hash. This produced many false positives, several of which contained untypable characters, and none of which opened the file. Mode 10500 was then used to obtain the correct password. The pdf then had to have its security removed, as restrictions prevented the hashes from being copied out; once copied out, slight reformatting was needed to make them usable.

The fooo file from the enigma zip: resources were spent analysing and tearing apart this bizarre file. Analysis indicated it was a 3.8M block of data repeated 27 times, which suggested the data was not encrypted, given the low entropy of the output. Tools used included binwalk, veracrypt, random and suspicious enigma file encryption/decryption applications from Softpedia and SourceForge, and finally the FreeBSD enigma crypt utility. As with the LoopAES decrypts, an automated decrypt-and-validate script was used to scan the decoded output for meaningful data; our attempts to decode the file were futile. Another suggestion was that, since the data was nicely divisible by 16/20/32 bytes, it might consist of MD5/SHA1/SHA256 hashes dumped straight out of memory. The bytes were reassembled in both endian forms, then run through crackers to try to “de-hash*” them.
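
A sketch of the two checks described above; the 3.8M period below is an illustrative constant, as the real offset was measured from the file:

    import math
    import sys
    from collections import Counter

    data = open(sys.argv[1], "rb").read()

    # Shannon entropy in bits/byte: ~8.0 suggests ciphertext, while
    # structured or repeating data scores noticeably lower.
    counts = Counter(data)
    entropy = -sum(c / len(data) * math.log2(c / len(data))
                   for c in counts.values())
    print(f"entropy: {entropy:.2f} bits/byte")

    # Periodicity test: the file repeats with period p iff shifting it
    # by p bytes leaves the overlapping region unchanged.
    for p in (16, 20, 32, 3_800_000):
        if len(data) > p and data[:-p] == data[p:]:
            print(f"repeats with period {p} bytes")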

Analysis

The actual “de-hash*” procedure was relatively straightforward: once a challenge was solved, the hashes were parsed and uploaded to our hash management system. From there, members worked collaboratively but autonomously. Once a pattern was spotted, it was reported and shared; members parsed their own lists, which ensured varied coverage for that wordlist set. This let us cover for each other's parsing errors and tooling issues, and hopefully for differences in how Korelogic generated the plaintexts.

After a hash set was cracked close to completion, the difficulty would increase drastically. This could be caused by missing basewords/phrases, or by missing a ruleset or pattern matching the plaintexts. To push through those last few percent, members would shift their attack strategy or redeploy to a different algorithm, as a fresh set of eyes/wordlists does wonders.

Although we had a distributed Hashtopolis instance set up, we did not have to tap into it. We did not require a large amount of compute for this contest, and even if it had been available, the challenges would have been the bottleneck rather than compute. Some of the challenges were even solved on a laptop or legacy hardware. Where possible, we tried to reduce the keyspace tested to a minimal number of candidates. It was also beneficial to communicate what everyone was doing to prevent overlapping work. The most important part was to recognize the patterns and adapt quickly, or to leverage the correct toolset.

The SHA224 hashes proved quite intriguing. We identified two basewords, ‘hacker’ and ‘homecoming’, and quickly noticed the plaintexts were heavily mangled versions of these words: case toggles throughout the whole word, coupled with insertions/deletions, overstrikes, and character swaps. It was evident that, despite there being just two simple basewords, the keyspace would grow insanely large as the plaintext length increased. Once an adequate number of plains were cracked, some users switched over to Markov models, including OMEN, PCFG_Cracker, and PrinceProcessor, and these tools gave us a decent number of hits. We also cycled the rules extracted from the plains through other basewords to identify potential new ones; we thought we had found another, ‘ihatecooking/ihatecoding’, though this was possibly an artifact of the mutations on the original two basewords.
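
To make the blow-up concrete, a sketch of a single-pass mangler over a baseword; the contest mangling stacked several such edits, which is exactly why the keyspace exploded and the statistical tools were the better fit:

    import itertools
    import string

    def edits1(word, charset=string.ascii_lowercase + string.digits):
        # One edit away: deletions, overstrikes (substitutions),
        # insertions, and adjacent-character swaps.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = {a + b[1:] for a, b in splits if b}
        overstrikes = {a + c + b[1:] for a, b in splits if b for c in charset}
        inserts = {a + c + b for a, b in splits for c in charset}
        swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
        return deletes | overstrikes | inserts | swaps

    def case_toggles(word):
        # Every case-toggle variant (2^len, fine for short basewords).
        return {"".join(t) for t in
                itertools.product(*((c.lower(), c.upper()) for c in word))}

    for w in case_toggles("hacker"):
        for candidate in edits1(w):
            print(candidate)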

Thoughts

It would appear we were the first team to submit cracks for every algorithm, though we are unsure whether we were the first to actually solve all the challenges; other teams may have been working on the hashes, busy parsing them, or simply not yet submitting cracks.

If we didn’t push the envelope of password cracking, we sure pushed the envelope of interpreting a chunk of repeating bytes and demonstrating our creativity in trying to make something useful out of it. More than enough resources were spent on this task, if you are reading this Korelogic, please check your inbox for a laugh.

This year's contest was structured quite differently from last year's. Teams unable to solve the challenges could not unlock the hashes, which in a way pushed teams to explore; however, it would have punished teams less experienced with CTF-style challenges, leaving huge amounts of processing power idle. We tried our best to keep Team Hashcat on their toes and traded places with them five times throughout the contest. Team Hashcat demonstrated their expertise and skill by cracking close to the most hashes across the algorithms, while Hashmob also kept pace by solving the hashes quite rapidly once a challenge was unlocked. Towards the end of the contest, when all the teams had solved all the challenges, it came down to constantly supplying new founds, as this allowed us to at least maintain our position.

As a group we had excellent team synergy; as a contest, red herrings aside, it was well thought out and planned; and as a competition we thoroughly enjoyed facing our competitors Team Hashcat, Hashmob, john-users, achondritic, and trontastic.

Footnotes:

We are well aware that you cannot “de-hash”, as hashes are strictly one-way digests. That has not stopped others from using the term, and it has not stopped us from having a friendly poke at it. See here for more info.