Hades: A Tour of Java Collections

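In the beginning there were no collections, only arrays (see C). But people wanted more, so collections were created.
Some wanted an array that grows automatically, so there is List.
Some wanted an array with no duplicates, so there is Set.
Some wanted an array that keeps itself sorted, so there are TreeSet, TreeMap, and the other Tree* classes.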

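Many collections (ArrayList and HashMap, for example) are built on top of arrays; linked and tree-based ones such as LinkedList and TreeMap are not.
Because an array-backed collection is a wrapper around an array, direct array access is generally at least as fast as the equivalent collection operation.

In exchange, a collection offers more functionality than a plain array.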

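1: An array's element type is part of the array's type and is checked at runtime. A generic collection declares its element type only at compile time; because of type erasure, it stores its elements as Object references at runtime.

2: An array instance has a fixed size and cannot grow or shrink. A collection can change its size dynamically as needed.

3: An array is always readable and writable; there is no way to create a read-only array. A collection, however, can be used read-only through wrappers such as Collections.unmodifiableList, which return a read-only view of the collection.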
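
The three differences can be sketched in Java roughly as follows. The class and method names (ArrayVsList, grow, rejectsWrites) are made up for illustration; Collections.unmodifiableList is the standard JDK call for getting a read-only view of a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
class ArrayVsList {
    /** Copies the array into a list, then grows the list past the array's length. */
    static List<String> grow(String[] seed) {
        List<String> list = new ArrayList<>();
        Collections.addAll(list, seed);
        list.add("extra"); // an ArrayList enlarges its backing array as needed
        return list;
    }

    /** Returns true if a read-only view of the list rejects modification. */
    static boolean rejectsWrites(List<String> list) {
        List<String> readOnly = Collections.unmodifiableList(list);
        try {
            readOnly.add("nope");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] fixed = {"a", "b"}; // the length stays 2 for the array's lifetime
        List<String> list = grow(fixed);
        System.out.println(fixed.length + " vs " + list.size()); // 2 vs 3
        System.out.println(rejectsWrites(list));                 // true
    }
}
```

Note that the read-only wrapper is only a view: the underlying list can still be changed through its original reference.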

==============================================================

MAP : INFO

/**
* An object that maps keys to values. A map cannot contain duplicate keys;
* each key can map to at most one value.
*
* This interface takes the place of the <tt>Dictionary</tt> class, which
* was a totally abstract class rather than an interface.
*
* The <tt>Map</tt> interface provides three collection views, which
* allow a map's contents to be viewed as a set of keys, collection of values,
* or set of key-value mappings. The order of a map is defined as
* the order in which the iterators on the map's collection views return their
* elements. Some map implementations, like the <tt>TreeMap</tt> class, make
* specific guarantees as to their order; others, like the <tt>HashMap</tt>
* class, do not.
*
* Note: great care must be exercised if mutable objects are used as map
* keys. The behavior of a map is not specified if the value of an object is
* changed in a manner that affects <tt>equals</tt> comparisons while the
* object is a key in the map. A special case of this prohibition is that it
* is not permissible for a map to contain itself as a key. While it is
* permissible for a map to contain itself as a value, extreme caution is
* advised: the <tt>equals</tt> and <tt>hashCode</tt> methods are no longer
* well defined on such a map.
*
* All general-purpose map implementation classes should provide two
* "standard" constructors: a void (no arguments) constructor which creates an
* empty map, and a constructor with a single argument of type <tt>Map</tt>,
* which creates a new map with the same key-value mappings as its argument.
* In effect, the latter constructor allows the user to copy any map,
* producing an equivalent map of the desired class. There is no way to
* enforce this recommendation (as interfaces cannot contain constructors) but
* all of the general-purpose map implementations in the JDK comply.
*
* The "destructive" methods contained in this interface, that is, the
* methods that modify the map on which they operate, are specified to throw
* <tt>UnsupportedOperationException</tt> if this map does not support the
* operation. If this is the case, these methods may, but are not required
* to, throw an <tt>UnsupportedOperationException</tt> if the invocation would
* have no effect on the map. For example, invoking the {@link #putAll(Map)}
* method on an unmodifiable map may, but is not required to, throw the
* exception if the map whose mappings are to be "superimposed" is empty.
*
* Some map implementations have restrictions on the keys and values they
* may contain. For example, some implementations prohibit null keys and
* values, and some have restrictions on the types of their keys. Attempting
* to insert an ineligible key or value throws an unchecked exception,
* typically <tt>NullPointerException</tt> or <tt>ClassCastException</tt>.
* Attempting to query the presence of an ineligible key or value may throw an
* exception, or it may simply return false; some implementations will exhibit
* the former behavior and some will exhibit the latter. More generally,
* attempting an operation on an ineligible key or value whose completion
* would not result in the insertion of an ineligible element into the map may
* throw an exception or it may succeed, at the option of the implementation.
* Such exceptions are marked as "optional" in the specification for this
* interface.
*
* Many methods in Collections Framework interfaces are defined
* in terms of the {@link Object#equals(Object) equals} method. For
* example, the specification for the {@link #containsKey(Object)
* containsKey(Object key)} method says: "returns <tt>true</tt> if and
* only if this map contains a mapping for a key <tt>k</tt> such that
* <tt>(key==null ? k==null : key.equals(k))</tt>." This specification should
* not be construed to imply that invoking <tt>Map.containsKey</tt>
* with a non-null argument <tt>key</tt> will cause <tt>key.equals(k)</tt> to
* be invoked for any key <tt>k</tt>. Implementations are free to
* implement optimizations whereby the <tt>equals</tt> invocation is avoided,
* for example, by first comparing the hash codes of the two keys. (The
* {@link Object#hashCode()} specification guarantees that two objects with
* unequal hash codes cannot be equal.) More generally, implementations of
* the various Collections Framework interfaces are free to take advantage of
* the specified behavior of underlying {@link Object} methods wherever the
* implementor deems it appropriate.
*
* Some map operations which perform recursive traversal of the map may fail
* with an exception for self-referential instances where the map directly or
* indirectly contains itself. This includes the {@code clone()},
* {@code equals()}, {@code hashCode()} and {@code toString()} methods.
* Implementations may optionally handle the self-referential scenario, however
* most current implementations do not do so.
*
* This interface is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*
* @author Josh Bloch
* @see HashMap
* @see TreeMap
* @see Hashtable
* @see SortedMap
* @see Collection
* @see Set
* @since 1.2
*/
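
To make the Javadoc above concrete, here is a small sketch of the three collection views and of the "standard" copy constructor. The class name MapViewsDemo and the sample data are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the JDK.
class MapViewsDemo {
    /** Copies any map into a TreeMap through the single-argument constructor. */
    static Map<String, Integer> sortedCopy(Map<String, Integer> source) {
        return new TreeMap<>(source);
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("carol", 35);
        ages.put("alice", 30);
        ages.put("bob", 25);
        ages.put("bob", 26); // duplicate key: the previous value is replaced

        Map<String, Integer> sorted = sortedCopy(ages);
        System.out.println(sorted.keySet()); // [alice, bob, carol]
        System.out.println(sorted.values()); // [30, 26, 35]
        for (Map.Entry<String, Integer> e : sorted.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The HashMap itself makes no promise about iteration order, which is why the example prints from the TreeMap copy.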

==============================================================

The link below covers how HashMap is implemented:

How HashMap is implemented (the explanation is not very clear)

==============================================================

HashMap's data structure: http://blog.csdn.net/vking_wang/article/details/14166593

See the pictures:

pic 1

pic 2

HashMap put source:

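1) put

Question: if two keys produce the same index from hash % Entry[].length, is there a risk that one overwrites the other?

Here HashMap uses a chained (linked-list) structure. The Entry class has a next field that points to the next Entry. For example, the first key-value pair A arrives and the hash of its key gives index = 0, so Entry[0] = A. A little later pair B arrives and its index is also 0. What now? HashMap does this: B.next = A, Entry[0] = B. If C arrives and also maps to index 0, then C.next = B, Entry[0] = C. The slot at index 0 therefore holds all three pairs A, B and C, linked together through next, so there is no risk of one overwriting another. In other words, the array slot itself holds the most recently inserted entry. That covers the overall design of HashMap.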

/**
* Maps the specified key to the specified value.
*
* @param key
* the key.
* @param value
* the value.
* @return the value of any previous mapping with the specified key or
* {@code null} if there was no such mapping.
*/
@Override public V put(K key, V value) {
    if (key == null) {
        return putValueForNullKey(value);
    }

    int hash = Collections.secondaryHash(key);
    HashMapEntry<K, V>[] tab = table;
    int index = hash & (tab.length - 1);
    // 1. If the key already exists in the chain, replace its value with the new one.
    // This took me a while to understand: the loop handles the case where the new
    // entry has not only the same hash but also an equal key. The old value is
    // overwritten so that keys stay unique.
    for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
        if (e.hash == hash && key.equals(e.key)) {
            preModify(e);
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }

    // No entry for (non-null) key is present; create one
    modCount++;
    if (size++ > threshold) {
        // 3. The table has become too small; allocate one twice as large.
        tab = doubleCapacity();
        index = hash & (tab.length - 1);
    }
    // 2. Add the entry. This is where data really goes into table[]. The new entry
    // takes over the head of the bucket at index, and the entry that used to be
    // there (for example one whose key hashed to the same index) becomes its next.
    addNewEntry(key, value, hash, index);
    return null;
}

/**
* Computes a hash code and applies a supplemental hash function to defend
* against poor quality hash functions. This is critical because HashMap
* uses power-of-two length hash tables, that otherwise encounter collisions
* for hash codes that do not differ in lower or upper bits.
* Routine taken from java.util.concurrent.ConcurrentHashMap.hash(int).
* @hide
*/
public static int secondaryHash(Object key) {
    return secondaryHash(key.hashCode());
}
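
Why a supplemental hash matters for power-of-two tables can be shown with a short sketch. The spread function below is the bit-mixing step used by the JDK 8 HashMap, swapped in here as a stand-in; it is not the secondaryHash(int) routine that the source above calls, whose body is not shown in this post. The names IndexDemo, spread and indexFor are made up for the example.

```java
// Hypothetical demo class, not part of the JDK.
class IndexDemo {
    /** JDK 8 style bit spreading; an assumed stand-in, NOT the secondaryHash above. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int h = "hello".hashCode();
        // For a power-of-two length, masking equals the non-negative remainder.
        System.out.println(indexFor(h, length) == Math.floorMod(h, length)); // true

        // Hashes that differ only in their upper bits collide without mixing.
        int a = 0x10000;
        int b = 0x20000;
        System.out.println(indexFor(a, length) + " " + indexFor(b, length));                 // 0 0
        System.out.println(indexFor(spread(a), length) + " " + indexFor(spread(b), length)); // 1 2
    }
}
```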

2) addNewEntry:

/**
* Creates a new entry for the given key, value, hash, and index and
* inserts it into the hash table. This method is called by put
* (and indirectly, putAll), and overridden by LinkedHashMap. The hash
* must incorporate the secondary hash function.
*/
void addNewEntry(K key, V value, int hash, int index) {
    table[index] = new HashMapEntry<K, V>(key, value, hash, table[index]);
}
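
The head insertion that addNewEntry performs can be tried out in isolation. This is a toy sketch with a single bucket so that every key collides; ChainDemo, Entry and chain are hypothetical names, not the real HashMap classes.

```java
// Hypothetical demo class, not part of the JDK.
class ChainDemo {
    /** A singly linked entry, loosely modelled on HashMapEntry. */
    static final class Entry {
        final String key;
        final Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Inserts the keys into one bucket at the head and returns the chain as text. */
    static String chain(String... keys) {
        Entry[] table = new Entry[1]; // one bucket only, so every key collides
        for (String key : keys) {
            table[0] = new Entry(key, table[0]); // the new entry points at the old head
        }
        StringBuilder sb = new StringBuilder();
        for (Entry e = table[0]; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chain("A", "B", "C")); // C -> B -> A
    }
}
```

The last key inserted ends up in the array slot, with the earlier ones reachable through next.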

3) doubleCapacity: called when table[] has become too small, to allocate a larger table.
/**
* Doubles the capacity of the hash table. Existing entries are placed in
* the correct bucket on the enlarged table. If the current capacity is,
* MAXIMUM_CAPACITY, this method is a no-op. Returns the table, which
* will be new unless we were already at MAXIMUM_CAPACITY.
*/
private HashMapEntry<K, V>[] doubleCapacity() {
    HashMapEntry<K, V>[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        return oldTable;
    }
    int newCapacity = oldCapacity * 2;
    HashMapEntry<K, V>[] newTable = makeTable(newCapacity);
    if (size == 0) {
        return newTable;
    }

    for (int j = 0; j < oldCapacity; j++) {
        /*
         * Rehash the bucket using the minimum number of field writes.
         * This is the most subtle and delicate code in the class.
         */
        HashMapEntry<K, V> e = oldTable[j];
        if (e == null) {
            continue;
        }
        int highBit = e.hash & oldCapacity;
        HashMapEntry<K, V> broken = null;
        newTable[j | highBit] = e;
        for (HashMapEntry<K, V> n = e.next; n != null; e = n, n = n.next) {
            int nextHighBit = n.hash & oldCapacity;
            if (nextHighBit != highBit) {
                if (broken == null)
                    newTable[j | nextHighBit] = n;
                else
                    broken.next = n;
                broken = e;
                highBit = nextHighBit;
            }
        }
        if (broken != null)
            broken.next = null;
    }
    return newTable;
}
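
The key idea in doubleCapacity is that an entry in old bucket j can only move to bucket j or bucket j + oldCapacity, and a single bit of its hash decides which. The sketch below isolates just that index arithmetic (it does not relink any chains); SplitDemo and newIndex are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
class SplitDemo {
    /** Index after doubling: j or j + oldCapacity, chosen by one bit of the hash. */
    static int newIndex(int hash, int oldCapacity) {
        int j = hash & (oldCapacity - 1); // index in the old table
        int highBit = hash & oldCapacity; // either 0 or oldCapacity
        return j | highBit;
    }

    public static void main(String[] args) {
        int oldCapacity = 4;
        int[] hashes = {1, 5, 9, 13}; // all four sit in old bucket 1
        List<Integer> moved = new ArrayList<>();
        for (int h : hashes) {
            int idx = newIndex(h, oldCapacity);
            // Same result as recomputing the index against the doubled table.
            if (idx != (h & (oldCapacity * 2 - 1))) {
                throw new AssertionError("split index mismatch for hash " + h);
            }
            moved.add(idx);
        }
        System.out.println(moved); // [1, 5, 1, 5]
    }
}
```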

A hash table can be implemented in many different ways.

==============================================================

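2. Get
Lookup is simpler; there is no need to check for duplicate entries.
How it works:
The hash code determines the entry's position (index) in the hash table. If the key stored there matches, its value is returned; otherwise the lookup walks the linked list.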

/**
* Returns the value of the mapping with the specified key.
*
* @param key
* the key.
* @return the value of the mapping with the specified key, or {@code null}
* if no mapping for the specified key is found.
*/
public V get(Object key) {
    if (key == null) {
        HashMapEntry<K, V> e = entryForNullKey;
        return e == null ? null : e.value;
    }

    int hash = Collections.secondaryHash(key);
    HashMapEntry<K, V>[] tab = table;
    for (HashMapEntry<K, V> e = tab[hash & (tab.length - 1)];
            e != null; e = e.next) {
        K eKey = e.key;
        // The key comparison happens on this line
        if (eKey == key || (e.hash == hash && key.equals(eKey))) {
            return e.value;
        }
    }
    return null;
}
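
Since get compares the hash first and then calls equals, a key class has to implement hashCode and equals consistently for lookups to work. A minimal sketch, assuming a made-up Point key type (KeyDemo, Point and lookup are illustrative names):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class, not part of the JDK.
class KeyDemo {
    /** Illustrative key type; equals and hashCode use the same fields. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    /** Looks up with a key that is equal to, but not the same object as, the stored key. */
    static String lookup(int x, int y) {
        Map<Point, String> names = new HashMap<>();
        names.put(new Point(1, 2), "A");
        return names.get(new Point(x, y));
    }

    public static void main(String[] args) {
        System.out.println(lookup(1, 2)); // A
        System.out.println(lookup(2, 1)); // null
    }
}
```

If Point did not override hashCode, the second Point(1, 2) would almost certainly land in a different bucket and the lookup would return null.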

==============================================================

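3. Ways to resolve hash collisions

Open addressing (linear probing, quadratic probing, pseudo-random probing)
Rehashing
Separate chaining
A shared overflow area

Java's HashMap uses separate chaining.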
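
For contrast with the chaining that HashMap uses, here is a rough sketch of open addressing with linear probing. It is a toy with a fixed capacity, no resizing and no removal, and the class LinearProbingSet is invented for the example; it is not how any JDK class is implemented.

```java
// Hypothetical demo class, not part of the JDK.
class LinearProbingSet {
    private final String[] slots = new String[8]; // fixed capacity in this sketch

    /** Adds a key and returns the slot index where it is stored. */
    int add(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) {
                slots[i] = key;
                return i;
            }
            if (slots[i].equals(key)) {
                return i; // already present
            }
            i = (i + 1) % slots.length; // collision: try the next slot
        }
        throw new IllegalStateException("table full");
    }

    /** Probes from the home slot until the key or an empty slot is found. */
    boolean contains(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i].equals(key)) {
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet();
        // "Aa" and "BB" have the same hashCode (2112), so they collide.
        System.out.println(set.add("Aa") + " " + set.add("BB"));           // 0 1
        System.out.println(set.contains("BB") + " " + set.contains("CC")); // true false
    }
}
```

Instead of linking colliding entries into a list, the second key simply moves on to the next free slot in the same array.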

==============================================================

4 MAP http://blog.csdn.net/qq924862077/article/details/48039643

See the UML

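Summary:

HashMap is a hash table based on separate chaining. It is not synchronized, so it is generally used from a single thread. It allows a null key and null values, and supports traversal with Iterator.

Hashtable is a hash table based on separate chaining. It is thread-safe and can be used in multithreaded programs. It does not allow null keys or values. It supports traversal with both Iterator and Enumeration.

WeakHashMap is also a hash table based on separate chaining, and its keys are weakly referenced.

TreeMap is a sorted map (not a hash table) implemented as a red-black tree. With natural ordering its keys cannot be null; its values can be.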
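
The null-handling and ordering differences in the summary can be checked with a few lines. This is a sketch that assumes Java 8 or later; NullKeyDemo and acceptsNullKey are made-up names.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the JDK.
class NullKeyDemo {
    /** Returns true if the given map accepts a null key. */
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<String, String>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<String, String>())); // false
        System.out.println(acceptsNullKey(new TreeMap<String, String>()));   // false

        Map<String, String> sorted = new TreeMap<String, String>();
        sorted.put("b", null); // a null value is accepted by TreeMap
        sorted.put("a", "x");
        System.out.println(sorted.keySet()); // [a, b]
    }
}
```

The TreeMap result applies to natural ordering; a TreeMap built with a custom Comparator that tolerates null could accept a null key.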

==============================================================

5 TreeMap source code analysis http://liujiacai.net/blog/2015/09/04/java-treemap/

Very well written, and yes, the author is a senior schoolmate of mine, haha.

References:

1 http://blog.csdn.net/speedme/article/details/22398395

2 http://blog.csdn.net/shimiso/article/details/10181801

3 http://blog.csdn.net/vking_wang/article/details/14166593

4 http://blog.csdn.net/qq924862077/article/details/48039643

5 http://liujiacai.net/blog/2015/09/04/java-treemap/

Hades

Sunday, November 1, 2015
