[转] Java HashMap 的工作原理

本文转载自 ImportNew - miracle1919, 翻译自 javacodegeeks。

面试的时候经常会遇见诸如：“Java 中的 HashMap 是怎么工作的”，“HashMap 的 get 和 put 内部的工作原理”这样的问题。本文将用一个简单的例子来解释下 HashMap 内部的工作原理。首先我们从一个例子开始，而不仅仅是从理论上，这样，有助于更好地理解，然后，我们来看下 get 和 put 到底是怎样工作的。

我们来看个非常简单的例子。有一个“国家”（Country）类，我们将要用 Country 对象作为 key，它的首都的名字（String 类型）作为 value。下面的例子有助于我们理解 key-value 对在 HashMap 中是如何存储的。

Country.java

public class Country {
    String name;
    long population;
    public Country(String name, long population) {
        super();
        this.name = name;
        this.population = population;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public long getPopulation() {
        return population;
    }
    public void setPopulation(long population) {
        this.population = population;
    }
    // If length of name in country object is even then return 31(any random number) and if odd then return 95(any random number).
    // This is not a good practice to generate hashcode as below method but I am doing so to give better and easy understanding of hashmap.
    @Override
    public int hashCode() {
        if(this.name.length() % 2 == 0)
            return 31;
        else 
            return 95;
    }
    @Override
    public boolean equals(Object obj) {
        if(this == obj) {
            return true;
        }
        if(null == obj || !(obj instanceof Country)) {
            return false;
        }
        Country other = (Country) obj;
        if (name.equalsIgnoreCase((other.name)))
            return true;
        return false;
    }
}

HashMapStructure.java (main class)

import java.util.HashMap;
import java.util.Iterator;
public class HashMapStructure {
    public static void main ( String[] args ) {
        Country india = new Country ("India", 1000);
        Country japan = new Country ("Japan", 10000);
        Country france = new Country ("France", 2000);
        Country russia = new Country ("Russia", 20000);
        HashMap<Country, String> countryCapitalMap = new HashMap<Country, String>();
        countryCapitalMap.put (india, "Delhi");
        countryCapitalMap.put (japan, "Tokyo");
        countryCapitalMap.put (france, "Paris");
        countryCapitalMap.put (russia, "Moscow");
        Iterator<Country> countryCapitalIter = countryCapitalMap.keySet().iterator();//put debug point at this line
        while (countryCapitalIter.hasNext()) {
            Country countryObj = countryCapitalIter.next ();
            String capital = countryCapitalMap.get (countryObj);
            System.out.println (countryObj.getName () + "----" + capital);
        }
    }
}

现在，在 Iterator… 这行设置一个断点，在项目上右击 -> 调试运行 -> java应用。程序会停在这行，然后在 countryCapitalMap 上右击，选择“查看”（watch）。将会看到如下的结构：

从上图可以观察到以下几点：

1、有一个叫做 table 大小是 16 的 Entry 数组。

2、这个 table 数组存储了 Entry 类的对象。HashMap 类有一个叫做 HashMapEntry 的内部类实现了 Entry 接口。这个 HashMapEntry 类包含了 key-value 作为实例变量。

我们看下 HashMapEntry 类的结构：

static class HashMapEntry<K, V> implements Entry<K, V> {
    final K key;
    V value;
    final int hash;
    HashMapEntry<K, V> next;
    ...//More code goes here
}

1、每当往 Hashmap 里面存放 key-value 对的时候，都会为它们实例化一个 Entry 对象，这个 Entry 对象就会存储在前面提到的 Entry 数组 table 中。现在你一定很想知道，上面创建的 Entry 对象将会存放在具体哪个位置（在 table 中的精确位置）。答案就是，根据 key 的 hashcode() 方法计算出来的 hash 值（来决定）。hash 值用来计算 key 在 Entry 数组的索引。

2、现在，如果你看上图数组的索引 10，它有个 HashMap$Entry 的 Entry 对象。

3、我们往 Hashmap 放了 4 个 key-value 对，但是看上去好像只有 2 个元素！！！这是因为，如果两个元素有相同的 hashcode，他们会被放在同一个索引上。它是以链表（LinkedList）的形式来存储的（逻辑上）。

所以上面的 Country 对象的 key-value 的 hashcode 值是如何计算出来的：

hashCode for Japan = 95，因为它的长度为奇数

hashCode for India = 95，因为它的长度为奇数

hashCode for Russia = 31，因为它的长度为偶数

hashCode for France = 31，因为它的长度为偶数

下面这图会清晰地从解释下链表：

所以，现在假如你已经很好地了解了 Hashmap 的结构，让我们看下 put 和 get 方法。

Put

让我们看下 put 方法的实现：

/**
 * Maps the specified key to the specified value.
 *
 * @param key
 *            the key.
 * @param value
 *            the value.
 * @return the value of any previous mapping with the specified key or
 *         {@code null} if there was no such mapping.
 */
@Override public V put(K key, V value) {
    if (key == null) {
        return putValueForNullKey(value);
    }
    int hash = Collections.secondaryHash(key);
    HashMapEntry<K, V>[] tab = table;
    int index = hash & (tab.length - 1);
    for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
        if (e.hash == hash && key.equals(e.key)) {
            preModify(e);
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }
    // No entry for (non-null) key is present; create one
    modCount++;
    if (size++ > threshold) {
        tab = doubleCapacity();
        index = hash & (tab.length - 1);
    }
    addNewEntry(key, value, hash, index);
    return null;
}

/**
 * Creates a new entry for the given key, value, hash, and index and
 * inserts it into the hash table. This method is called by put
 * (and indirectly, putAll), and overridden by LinkedHashMap. The hash
 * must incorporate the secondary hash function.
 */
void addNewEntry(K key, V value, int hash, int index) {
    table[index] = new HashMapEntry<K, V>(key, value, hash, table[index]);
}

现在我们一步一步来看上面的代码。

对 key 做 null 检查。如果 key 是 null，会被存储到 table[0]，因为 null 的 hash 值总是 0。
key 的 hashCode() 方法会被调用，然后二次计算 hash 值。hash 值用来找到存储 Entry 对象的数组的索引。有时候 hash 函数可能写的很不好，所以 JDK 的设计者添加了另外一个叫做 hash() 的方法，它接收刚才计算的 hash 值作为参数。如果你想了解更多关于 hash() 函数的东西，可以参考：Hashmap 中的 hash 和 indexFor 方法
然后 for 循环用来寻找有没 hash 值相同且 key 也 equals 的对象，如果有就会用当前 Entry 的 value 来替换之前 value。
如果这时 Hashmap 的 size 已经慢，就会将自身容量变为 2 倍
从 addNewEntry 方法中观察，它是先把当前的 key-value 对创建一个 HashMapEntry，其中 next 指向原来的 table[index]，在重新赋值给 table[index]，这样两个 hash 值相同的 key 就以链表的形式存储起来了。

Get

现在我们看下 get 方法的实现：

/**
 * Returns the value of the mapping with the specified key.
 *
 * @param key
 *            the key.
 * @return the value of the mapping with the specified key, or {@code null}
 *         if no mapping for the specified key is found.
 */
public V get(Object key) {
    if (key == null) {
        HashMapEntry<K, V> e = entryForNullKey;
        return e == null ? null : e.value;
    }
    int hash = Collections.secondaryHash(key);
    HashMapEntry<K, V>[] tab = table;
    for (HashMapEntry<K, V> e = tab[hash & (tab.length - 1)];
            e != null; e = e.next) {
        K eKey = e.key;
        if (eKey == key || (e.hash == hash && key.equals(eKey))) {
            return e.value;
        }
    }
    return null;
}

当你了解 Hashmap 的 put 的工作原理了，理解 get 的工作原理就非常简单了。

对 key 进行 null 检查。如果 key 是 null，那么将返回 table[0] 位置的元素。
根据 key 的 hashCode() 方法计算 hash 值。
然后根据 hash 值就确定在 table 数组的精确位置。
在确定索引之后，会迭代链表，判断 key 的引用是否相同，或者调用 equals 方法检查，如果为 true，则返回 Entry 对象的 value，否则，返回 null。

要牢记以下关键点：

1、HashMap 有一个叫做 HashMapEntry 的内部类，它用来存储 key-vallue 对。

2、上面的 HashMapEntry 对象是存储在一个叫做 table 的 Entry 数组中

3、table 的索引在逻辑上叫做”桶“（bucket），它存储了链表的第一个元素。

4、key 的 hashCode() 方法经过二次计算用来确定 HashMapEntry 对象所在的桶，即在 table 数组上的索引。

5、如果两个不同的 key 的 hash 值相同，它们会被放在 table 数组的同一个桶里。

6、key 的 equals 方法用来确保 key 的唯一性。

7、value 对象的 equals() 和 hashCode() 在 Hashmap 中没有用到。