1. How HashMap stores and retrieves values

Hash value: a hashing algorithm turns an input of arbitrary length into a fixed-length output; that output is called the hash. In HashMap the hash is an int.

Hash table: the array that backs the map, i.e. the container whose slots hold the hashed entries. How is a hash value actually stored and looked up? Through an array index.

So we need a mapping that sends different hash values to their corresponding index positions:

hash value --- computation ---> index

Hash function: the rule that turns a hash value into its corresponding index.
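Before diving into the JDK source, here is a minimal standalone sketch of that mapping (the class name, key, and table length are made up for illustration; this is not the JDK implementation): the key's hashCode() is reduced to an index into a fixed-length array.

```java
public class HashToIndexDemo {
    public static void main(String[] args) {
        int length = 16;                        // table length (a power of two, as discussed below)
        String key = "hello";
        int hash = key.hashCode();              // arbitrary-length input -> fixed-length int
        int index = hash & (length - 1);        // "hash function": map the hash to an array index
        System.out.println("hash  = " + hash);
        System.out.println("index = " + index); // always within [0, length - 1]
    }
}
```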
1.存值分析:
```java
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
```
```java
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}
```
JDK 1.8: put() simply delegates to putVal():

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;            // lazily create the table on first put
    if ((p = tab[i = (n - 1) & hash]) == null)  // empty bucket: place the node directly
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;                               // same key as the bucket head
        else if (p instanceof TreeNode)          // bucket is already a red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);   // tail insertion
                    if (binCount >= TREEIFY_THRESHOLD - 1)      // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
2. Retrieving a value (get)

JDK 1.7: get() handles the null key separately and otherwise walks the bucket's list in getEntry():
```java
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}
```
```java
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
```
JDK 1.8: get() delegates to getNode():

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
```
```java
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check the first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)   // search the red-black tree
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {                             // otherwise walk the linked list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```
3. How to make storing and retrieving as efficient as possible

The ideal case is that elements are spread evenly across the array's indices without any collisions.
(1) Produce as few hash collisions as possible: the indices computed by the hash function should be distributed as uniformly as possible. In JDK 1.7 the index is computed by indexFor():
```java
static int indexFor(int h, int length) {
    return h & (length - 1);
}
```
What rule does the table length follow? It must be a power of two. Why?
```
Suppose the length is 8: 8 - 1 = 7
00000000000000000000000000000111   -- length 8,  the mask used in the index computation is 7
00000000000000000000000000001111   -- length 16, the mask is 15
00000000000000000000000000011111   -- length 32, the mask is 31

  00110011000111111001110111101111
& 00000000000000000000000000001111
--------------------------------------------
  results range from 00000000000000000000000000000000
                  to 00000000000000000000000000001111
The result always falls in 0-15, which is exactly the index range of an array of length 16.

Suppose the length is NOT a power of two:

Odd length, e.g. 5:
  00101001001001011111111111100111
& 00000000000000000000000000000100   -- length 5, the mask is 4
------------------------------------------
The result is always even, so odd indices never receive an entry:
half of the array is wasted.

Even length, e.g. 6:
  00101001001001011111111111100111
& 00000000000000000000000000000101   -- length 6, the mask is 5
-------------------------------------------------
The second-lowest bit of the result is always 0, so indices 2 and 3 can
never receive an entry: part of the array is wasted.
```
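A quick way to see this in action: mask a batch of random hashes with (length - 1) and look at which indices ever show up. This is only a demonstration of the bit math above (the class name and the sample count are made up), not HashMap code.

```java
import java.util.Random;
import java.util.TreeSet;

public class MaskDistributionDemo {
    static TreeSet<Integer> indicesFor(int length) {
        Random rnd = new Random(42);
        TreeSet<Integer> seen = new TreeSet<>();
        for (int i = 0; i < 10_000; i++) {
            int hash = rnd.nextInt();
            seen.add(hash & (length - 1));     // the same index computation HashMap uses
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("length 16 -> indices used: " + indicesFor(16)); // 0..15, all reachable
        System.out.println("length 6  -> indices used: " + indicesFor(6));  // only 0,1,4,5 -- 2 and 3 never appear
    }
}
```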
Powers of two also pay off at resize time: the new array is twice the old length, which is still a power of two. What does that buy us?
```
Suppose the old length is 8:        00000000000000000000000000000111  (mask 7)
After resizing, the length is 16:   00000000000000000000000000001111  (mask 15)

During a resize, entries have to be migrated from the old array to the new one.

  00000000000000000000000000001101   (hash)
& 00000000000000000000000000000111   (old length - 1)
-------------------------------------
  101 = 5                            index in the old array

  00000000000000000000000000001101   (hash)
& 00000000000000000000000000001111   (new length - 1)
--------------------------------------
  1101 = 13                          index in the new array (old index + old length)

After a resize, each migrated entry therefore ends up either at its old index or at
(old index + old length); the hash never has to be recomputed, which is faster.
```
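The JDK 1.8 resize() shown in section 5 uses exactly this fact: it tests (hash & oldCap) and sends each node to either the "low" or the "high" bucket. Here is a small sketch of that check (class name and values are made up for illustration):

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 8;
        int newCap = oldCap << 1;              // resizing doubles the capacity: 16
        int hash = 0b1101;                     // 13, the hash from the example above

        int oldIndex = hash & (oldCap - 1);    // 5
        int newIndex = hash & (newCap - 1);    // 13

        // The bit that decides where the entry lands is exactly the oldCap bit:
        boolean staysPut = (hash & oldCap) == 0;
        System.out.println("old index = " + oldIndex);
        System.out.println("new index = " + newIndex);
        System.out.println(staysPut ? "stays at old index"
                                    : "moves to old index + oldCap = " + (oldIndex + oldCap));
    }
}
```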
(2) When collisions do happen anyway: resizing plus the bucket data structure (chaining).
How resizing migrates data:

JDK 1.7
- new elements are inserted at the head of a bucket's list (head insertion)
- the singly linked list at each index is migrated to the new array

JDK 1.8
- new elements are appended at the tail of a bucket's list (tail insertion)
- if a bucket holds a singly linked list, that list is migrated
- if a bucket holds a red-black tree, the migration walks its doubly linked list
- if, after migration, a resulting list has 6 or fewer nodes, the red-black tree is converted back into a singly linked list

Problems after resizing in a multithreaded environment:

JDK 1.7
- concurrent resizing can produce a circular linked list

JDK 1.8
- concurrent puts can silently lose entries
2. How HashMap mitigates collisions

There are many ways to handle hash collisions, such as open addressing, rehashing, separate chaining, and a common overflow area. HashMap relies mainly on two of them: perturbing (spreading) the hash value, plus separate chaining — when entries still collide after the perturbation, they are stored in a linked list or a red-black tree.
JDK 1.7 perturbs the hash with several shifts and XORs:

```java
final int hash(Object k) {
    int h = 0;
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```

Two example hash codes whose low bits are identical — without perturbation they would always land in the same bucket:

```
01100011 00001110 11000110 00011001
00000000 01101010 11001100 00011001
```
JDK 1.8 simplifies the perturbation: it XORs the high 16 bits of the hashCode into the low 16 bits.

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
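A small illustration of why the spreading matters (class name and the two sample hash codes are made up): with a table of length 16, only the low 4 bits select the bucket, so two hashCodes that differ only in their high bits would always collide; XOR-ing in the high 16 bits lets those differences reach the low bits.

```java
public class HashSpreadDemo {
    // Same perturbation as JDK 1.8's HashMap.hash(), re-implemented here for the demo.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int length = 16;
        int h1 = 0x0001_0004;   // two hash codes that differ only in their high bits
        int h2 = 0x0ABC_0004;

        System.out.println("without spreading: "
                + (h1 & (length - 1)) + " vs " + (h2 & (length - 1)));                  // 4 vs 4 -> collision
        System.out.println("with spreading:    "
                + (spread(h1) & (length - 1)) + " vs " + (spread(h2) & (length - 1))); // 5 vs 8 -> no collision
    }
}
```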
3. How HashMap chooses its capacity
JDK 1.7 constructors:

```java
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
```
```java
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
```
```java
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    // Round the requested capacity up to a power of two
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    useAltHashing = sun.misc.VM.isBooted() &&
            (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    init();
}
```
```
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

Example: suppose the requested initial capacity is 10.

1 < 10, shift 1 left:
0000 0000 0000 0000 0000 0000 0000 0001   -- capacity = 1
0000 0000 0000 0000 0000 0000 0000 0010   -- capacity = 2

2 < 10, shift 2 left:
0000 0000 0000 0000 0000 0000 0000 0010
0000 0000 0000 0000 0000 0000 0000 0100   -- capacity = 4

4 < 10, shift 4 left:
0000 0000 0000 0000 0000 0000 0000 0100
0000 0000 0000 0000 0000 0000 0000 1000   -- capacity = 8

8 < 10, shift 8 left:
0000 0000 0000 0000 0000 0000 0000 1000
0000 0000 0000 0000 0000 0000 0001 0000   -- capacity = 16

16 is no longer less than 10, so the loop stops: the actual capacity is 16,
the smallest power of two greater than or equal to the requested 10.
```
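The loop can be run on its own to confirm the result; this is a standalone sketch of the JDK 1.7 rounding logic, with the helper name chosen just for this demo.

```java
public class CapacityLoopDemo {
    // The same rounding loop the JDK 1.7 constructor uses, extracted for illustration.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}
```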
Why must the capacity be a power of two?
① A power of two has a single 1 bit followed by zeros; subtracting one turns that 1 into 0 and all lower bits into 1. AND-ing any hash with this mask therefore always yields a value between 0 and (capacity - 1), which is exactly the valid index range of the array.

② The mask (capacity - 1) is odd, so its lowest bit is 1; the AND result can be either odd or even, which means both odd and even index positions can actually be used.

③ Because every low bit of (capacity - 1) is 1, every index is reachable. If some low bit were 0, the corresponding indices could never receive an entry and part of the array would be permanently wasted.

④ It also makes relocating entries during a resize cheap. Since resizing doubles the array, the new capacity is still a power of two, and every entry ends up either at its original index or at (original index + old capacity) in the new array, so nothing has to be re-hashed.
Note that in JDK 1.8 the capacity calculation happens in the resize() method; when an initial capacity is passed to a constructor, it is first rounded up to a power of two by tableSizeFor(), shown below.
```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double the threshold along with the capacity
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // ... migration of the existing entries follows; the full method is shown in section 5
```
```java
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
```
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;

Suppose the requested initial capacity is 10: n = 10 - 1 = 9
0000 0000 0000 0000 0000 0000 0000 1001

n >>> 1:
  0000 0000 0000 0000 0000 0000 0000 0100
| 0000 0000 0000 0000 0000 0000 0000 1001
------------------------------------------
  0000 0000 0000 0000 0000 0000 0000 1101

n >>> 2:
  0000 0000 0000 0000 0000 0000 0000 0011
| 0000 0000 0000 0000 0000 0000 0000 1101
------------------------------------------
  0000 0000 0000 0000 0000 0000 0000 1111

n >>> 4:
  0000 0000 0000 0000 0000 0000 0000 0000
| 0000 0000 0000 0000 0000 0000 0000 1111
------------------------------------------
  0000 0000 0000 0000 0000 0000 0000 1111
```
After this series of right shifts and OR operations, every bit below the highest set bit of (cap - 1) has been turned into 1; adding 1 then yields exactly the smallest power of two that is greater than or equal to the requested capacity, and that value is returned.
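To confirm the behaviour for a few inputs, here is a standalone sketch that re-implements the bit-smearing outside of HashMap (class name chosen for the demo):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Re-implementation of JDK 1.8's HashMap.tableSizeFor() for illustration.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two; the "cap - 1" avoids doubling it)
        System.out.println(tableSizeFor(1));  // 1
    }
}
```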
4. HashMap data structures
JDK 1.7 uses an array plus singly linked lists; createEntry() inserts each new entry at the head of the bucket's list:

```java
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e); // head insertion: the new entry points at the old head
    size++;
}
```
JDK 1.8 uses four structures: array + singly linked list + red-black tree + doubly linked list.

The array and the singly linked list were already covered in the put/get walkthrough above, so they are not repeated here; let's look at the red-black tree and the doubly linked list.
```java
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            // Replace each Node with a TreeNode, chaining them into a doubly linked list (prev/next)
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab); // build the red-black tree from the linked TreeNodes
    }
}
```
```java
final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                               int h, K k, V v) {
    Class<?> kc = null;
    boolean searched = false;
    TreeNode<K,V> root = (parent != null) ? root() : this;
    for (TreeNode<K,V> p = root;;) {
        int dir, ph; K pk;
        if ((ph = p.hash) > h)
            dir = -1;
        else if (ph < h)
            dir = 1;
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        else if ((kc == null &&
                  (kc = comparableClassFor(k)) == null) ||
                 (dir = compareComparables(kc, k, pk)) == 0) {
            if (!searched) {
                TreeNode<K,V> q, ch;
                searched = true;
                if (((ch = p.left) != null &&
                     (q = ch.find(h, k, kc)) != null) ||
                    ((ch = p.right) != null &&
                     (q = ch.find(h, k, kc)) != null))
                    return q;
            }
            dir = tieBreakOrder(k, pk);
        }

        TreeNode<K,V> xp = p;
        if ((p = (dir <= 0) ? p.left : p.right) == null) {
            Node<K,V> xpn = xp.next;
            TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
            if (dir <= 0)
                xp.left = x;
            else
                xp.right = x;
            // Maintain the doubly linked list (prev/next) alongside the tree links
            xp.next = x;
            x.parent = x.prev = xp;
            if (xpn != null)
                ((TreeNode<K,V>)xpn).prev = x;
            moveRootToFront(tab, balanceInsertion(root, x));
            return null;
        }
    }
}
```
You may wonder why a doubly linked list is maintained alongside the red-black tree. The main reason is to make resizing easier: when a tree bucket is split during a resize, the nodes are simply walked through their next pointers (see TreeNode.split() in the next section) instead of traversing the tree structure itself.
5. When and how HashMap resizes
JDK 1.7 triggers a resize from addEntry(): the table is doubled when size reaches the threshold and the target bucket is already occupied.

```java
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);                     // double the capacity
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);   // recompute the index for the new length
    }
    createEntry(hash, key, value, bucketIndex);
}
```
```java
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    boolean oldAltHashing = useAltHashing;
    useAltHashing |= sun.misc.VM.isBooted() &&
            (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean rehash = oldAltHashing ^ useAltHashing;
    transfer(newTable, rehash);
    table = newTable;
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
```
```java
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];   // head insertion into the new table (this reverses the list order)
            newTable[i] = e;
            e = next;
        }
    }
}
```
JDK 1.8 performs both the capacity calculation and the data migration in resize():

```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double the threshold along with the capacity
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {           // migrate every bucket of the old table
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)                  // single node: recompute its index directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)      // tree bucket: split it into two buckets
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // list bucket: split it into a "low" and a "high" list, preserving order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {   // stays at index j
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {                          // moves to index j + oldCap
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
Migration of a red-black tree bucket:
```java
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink the nodes into "low" and "high" lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for (TreeNode<K,V> e = b, next; e != null; e = next) {   // walk the doubly linked list, not the tree
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    if (loHead != null) {
        if (lc <= UNTREEIFY_THRESHOLD)           // 6 or fewer nodes: turn back into a plain list
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null)                  // (else the bucket is already a valid tree)
                loHead.treeify(tab);
        }
    }
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}
```
```java
final Node<K,V> untreeify(HashMap<K,V> map) {
    Node<K,V> hd = null, tl = null;
    for (Node<K,V> q = this; q != null; q = q.next) {
        Node<K,V> p = map.replacementNode(q, null);
        if (tl == null)
            hd = p;
        else
            tl.next = p;
        tl = p;
    }
    return hd;
}
```
6. HashMap resizing problems under concurrency

In JDK 1.7, concurrent resizing can produce a circular linked list (and, with it, an infinite loop on later lookups).

In JDK 1.8, concurrent puts can silently lose entries.
The lost update can be seen in putVal(): the bucket check and the subsequent write are not atomic.

```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Two threads can both see this bucket as empty and both write a new node;
    // the second write overwrites the first, so one entry is silently lost.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
```
The corresponding re-linking step in resize():

```java
if (loTail != null) {
    loTail.next = null;
    newTab[j] = loHead;
}
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}
```
Of course, the JDK 1.8 HashMap is not thread-safe by design, so in a multithreaded environment there are likely other problems as well; they are left for the reader to explore.
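As a closing illustration, here is a small, deliberately unsafe experiment (my own sketch, not from the original text; class name and counts are made up) that often shows entries being lost when several threads put into a plain HashMap concurrently. The outcome is nondeterministic — it may even throw or hang on some runs — and the practical fix is java.util.concurrent.ConcurrentHashMap or external synchronization.

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> unsafe = new HashMap<>();
        int threads = 8, perThread = 10_000;

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    unsafe.put(base + i, base + i); // all keys are distinct across threads
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();

        System.out.println("expected size: " + threads * perThread);
        System.out.println("actual size:   " + unsafe.size()); // frequently smaller -- entries were lost
        // Thread-safe alternative: new java.util.concurrent.ConcurrentHashMap<>() instead of new HashMap<>()
    }
}
```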