PenTest Edition: Using “theHarvester” to Gather E-mail accounts, Subdomains, Hosts, LinkedIn Users, Banner Information, and More!

theHarvester is a neat information-gathering tool used by both ethical and non-ethical hackers to scrape up emails, subdomains, hosts, employee names, open ports, and banners from public sources such as popular search engines, PGP key servers, and the Shodan database. It is particularly useful during the reconnaissance phase, when gathering Open Source Intelligence (OSINT).


The information provided on the cybersecurityman is for educational purposes only. I am in no way responsible for any misuse of the information provided. All the information here is meant to provide the reader with the knowledge to defend against hackers and prevent the attacks discussed here. At no time should any reader attempt to use this information for illegal purposes.


This program comes pre-installed in Kali Linux and was created by Christian Martorella. The current version is 3.0. (Edit: I realized after I completed this post that I was using version 2.7.2 the whole time, so if you need to update theHarvester, you can find it here: https://github.com/laramies/theHarvester.) Here is a short list of some of the options theHarvester has to offer.

theharvester options.png

This isn’t an extensive list, and that makes the tool easy to use. But notice all the data sources we can select with the -b argument, such as Baidu, Bing, Google, GoogleCSE, LinkedIn, PGP, Twitter, vhost, VirusTotal, netcraft, Yahoo, and so forth. We can also perform active attacks, including DNS brute force attacks, DNS reverse lookups, and DNS Top-Level Domain (TLD) expansions. Additionally, theHarvester comes with some examples to assist users in crafting useful commands.

theharvester examples.png

  1. In the first example, theharvester -d microsoft.com -l 500 -b google -h myresults.html tells theHarvester to target microsoft.com, search for any information it can find using the Google search engine, and query the Shodan database about the hosts it discovers (the -h argument). The -l argument limits the number of Google results to 500. The myresults.html at the end is meant to be an output file name. Note that in version 2.7.2 saving the results to an HTML file is the job of the -f argument, so -f myresults.html is likely what is needed to actually write the file.
  2. The second command, theharvester -d microsoft.com -b pgp, searches for e-mail accounts for the domain microsoft.com in a PGP key server.
  3. The third command on the list, theharvester -d microsoft -l 200 -b linkedin, tells theHarvester to search through the first 200 results of a Microsoft search on LinkedIn. This would identify a list of employees who either currently or previously worked for Microsoft.
  4. And the final command, theharvester -d apple.com -b googleCSE -l 500 -s 300, searches for apple.com using Google’s custom search engine, limits the results to 500, and starts at result number 300 because of the -s argument.
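The four examples share the same shape: a target (-d), a data source (-b), and optionally a result limit (-l) and a start offset (-s). As a rough sketch, the hypothetical helper below (harvester_cmd is my own name, not part of theHarvester) only assembles and prints a command line so the flags can be reviewed before anything is run. It assumes the binary is called theharvester, as in the examples above.

```shell
#!/bin/sh
# harvester_cmd: hypothetical helper, NOT part of theHarvester.
# It prints a theharvester command line; it does not execute it.
# Usage: harvester_cmd DOMAIN SOURCE [LIMIT] [START]
harvester_cmd() {
    domain=$1
    src=$2
    limit=${3:-}
    start=${4:-}

    cmd="theharvester -d $domain -b $src"
    # -l limits the number of results to work with
    if [ -n "$limit" ]; then
        cmd="$cmd -l $limit"
    fi
    # -s sets the result number to start from
    if [ -n "$start" ]; then
        cmd="$cmd -s $start"
    fi
    printf '%s\n' "$cmd"
}

# Rebuild two of the example commands from the list above
harvester_cmd microsoft.com pgp
harvester_cmd apple.com googleCSE 500 300
```

Printing instead of executing is a deliberate choice here: it keeps the sketch safe to paste into a terminal, and the printed line can be copied and run once it looks right.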

Hopefully, this has made theHarvester syntax a little easier to understand. So, let’s work through a couple of examples on our own. I won’t be able to cover everything theHarvester can do, but I will try to cover most of it.

Gathering LinkedIn Users

Assume I am a penetration tester authorized to work for Apple. To a penetration tester who is gathering OSINT, a job site or social media site is a sanctum. On job sites such as LinkedIn, users voluntarily and publicly submit all types of information about themselves, such as their personal data, professional work history, education, contact information, interests, hobbies, and so forth.

Open up a terminal and use the command theharvester -d apple -l 100 -b linkedin.

theharvester linkedin command.png

This command searches for LinkedIn users who are affiliated with Apple, Inc.

theharvester linkedin results.png

These are all LinkedIn users affiliated with Apple in some way. Keep in mind that this is all publicly available information. This list of LinkedIn users could include Apple data scientists, Apple data engineers, Apple managers, or even people who are just interested in Apple.

Gathering E-mail Addresses

Or, let’s say I’m working for The Guardian and want to gather email addresses on journalists. I can do this using the command theharvester -d theguardian.com -b pgp.

theharvester guardian command

This command tells theHarvester to search a PGP key server for email accounts with the domain name “theguardian.com”. A PGP key server is where users publish the public keys used for encrypting emails, and each key is tied to an email address.

theharvester emails 2.png

The output is cut, but this command provides a very long list of email addresses.
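With a list this long, it helps to save the output and pull out only the unique addresses. The sketch below assumes the results were redirected to a text file (guardian.txt is a made-up file name) and uses a hypothetical helper, extract_emails, built from grep, tr, and sort. The pattern is a simple approximation of an email address, not a full validator.

```shell
#!/bin/sh
# extract_emails: read text on stdin and print the unique addresses that
# belong to the given domain, lower-cased and sorted.
# Hypothetical helper, not part of theHarvester.
# Usage: extract_emails DOMAIN < saved-output.txt
extract_emails() {
    domain=$1
    # Escape the dots so they match literally in the regular expression
    pattern=$(printf '%s' "$domain" | sed 's/\./\\./g')
    # -E extended regex, -i ignore case, -o print only the matching part
    grep -Eio "[a-z0-9._%+-]+@${pattern}" | tr 'A-Z' 'a-z' | sort -u
}

# Example usage (file name is an assumption):
#   theharvester -d theguardian.com -b pgp > guardian.txt
#   extract_emails theguardian.com < guardian.txt
```

Lower-casing before sort -u means the same address written with different capitalization is only counted once.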

Using Microsoft’s Bing for E-mails and Hostnames

Many users like to use Google, but you can also use other popular search engines, such as Bing. Let’s get a list of email addresses and hostnames for UMD with a Bing search.

theharvester bing 1.png

We can use the command theharvester -d umd.edu -l 200 -b bing. I used the -l argument to limit the search to 200 results.

theharvester bing results.png

The results give us quite a few emails to work with, but also several domain names and their corresponding IP addresses.
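To work with the hostnames and addresses separately, the host section of the saved output can be split into two columns. This sketch assumes each host line has the form IP:hostname, which is how my output looked; if your version prints hosts differently, the awk condition will need adjusting. The helper name split_hosts is hypothetical.

```shell
#!/bin/sh
# split_hosts: read text on stdin and print "hostname<TAB>IP" for every
# line of the ASSUMED form "IP:hostname". Other lines are ignored.
# Hypothetical helper, not part of theHarvester.
split_hosts() {
    awk -F: '
        # Drop stray spaces, tabs, and carriage returns before matching
        { gsub(/[ \t\r]/, "") }
        # Keep only lines with exactly two fields where the first is IPv4-like
        NF == 2 && $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            printf "%s\t%s\n", $2, $1
        }
    ' | sort -u
}

# Example usage (file name is an assumption):
#   theharvester -d umd.edu -l 200 -b bing > umd.txt
#   split_hosts < umd.txt
```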

Finding Twitter Usernames

Social media sites, like Twitter, allow penetration testers to gather additional clues. With social media profiling, penetration testers can gain insightful information about a target employee, such as phone numbers, email addresses, photos, locations, etc. Using theHarvester, users can gather specific Twitter usernames.

I had a lot of success with this feature until it came time to take a few snapshots. Unfortunately, theHarvester stopped working. Maybe I’ll update this section if I find it starts working again.

DNS Brute Force Attacks

Users can also conduct DNS brute force attacks, which query the target domain for subdomain names taken from a wordlist file. For example, we can use the command theharvester -d google.com -c -b google.

theharvester google brute force command.png

The -c argument is used to conduct a DNS brute force attack against google.com. With this command, we get the following subdomains.

theharvester google bruteforce.png

By default, this command uses the “dns-names.txt” wordlist, which is found in /usr/share/theHarvester/. If an error occurs during this attack, it’s likely because the script refers to the wordlist by file name only instead of by its full path. Open the “dnssearch.py” file in /usr/share/theHarvester/discover/ and locate the section for the dns_force class. Use your editor’s “find” tool to make this easier. After you locate this section, change the line that sets self.file from “dns-names.txt” to the full path, “/usr/share/theHarvester/dns-names.txt”. Paths are case-sensitive, so check the exact capitalization of the directory name on your system. This should fix the error.
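To make the idea concrete, here is a rough sketch of what a DNS brute force does: take each word from a wordlist, prepend it to the domain, and keep the names that resolve. This is not theHarvester's own code from dnssearch.py. The function names brute_dns and resolve_host are hypothetical, and the lookup uses getent, which I am assuming is available (it is on most Linux systems).

```shell
#!/bin/sh
# resolve_host: print the first address for a name, or nothing at all
# if the name does not resolve.
resolve_host() {
    getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# brute_dns: try every word in WORDLIST as a subdomain of DOMAIN and
# print "name address" for each one that resolves.
# RESOLVER is optional and defaults to resolve_host.
# Usage: brute_dns DOMAIN WORDLIST [RESOLVER]
brute_dns() {
    domain=$1
    wordlist=$2
    resolver=${3:-resolve_host}

    while IFS= read -r word; do
        # Skip blank lines in the wordlist
        [ -n "$word" ] || continue
        candidate="$word.$domain"
        ip=$("$resolver" "$candidate")
        # Report only the names that returned an address
        if [ -n "$ip" ]; then
            printf '%s %s\n' "$candidate" "$ip"
        fi
    done < "$wordlist"
}

# Example usage (path taken from the paragraph above):
#   brute_dns google.com /usr/share/theHarvester/dns-names.txt
```

The resolver is passed in by name so it can be swapped out, for example for a function that queries a specific DNS server.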

Helpful Tip

If you really want to get the most out of theHarvester, you can use the -b all argument to use all the data sources available when gathering information on your target.

 
