Odd's Blog: complex

Showing posts with label complex. Show all posts

Monday, February 16, 2009

Social Computing: A Snapshot

隨著Web2.0的演進以及計算媒介愈來愈容易取得，使得一些事物在近幾年正快速變化，如社交方式、經濟模式、大眾文化等，背後本質上的改變，正從以個體為中心的思維，逐漸轉移到群體間合作、互動所產生的一連串社交反應(Social Dynamics)，而社會計算(Social Computing)正是一種透過資訊技術來輔助這些社交資訊(Social Information, User-Generated Content, Consumer-Generated Media)或社交行為(Social Behaviors, Social Interaction)之收集、分析、整合、呈現的新興領域，以現有的Web服務來舉例，Facebook, Flickr, Del.icio.us, Blogger, MSN均為社交服務下的平台(可參考下圖)，或被稱作社交軟體(Social Software)，以現有的技術來舉例，協同過濾(Collaborative Filtering), 病毒式行銷(Viral Marketing), 標籤分類(Tagging), 計算社交決策(Computational Social Choice)等均為社會計算在技術上的例子。社會計算的不止於學術研究，目前軟體產業、線上遊戲、商業策略、甚至政治分析等相關領域也相繼投入。

從產業界進入社會計算應用的角度來看Social Computing，Dion Hinchcliffe在The shift to Social Computing一文提到了三個根本的原則：
1. Innovation is moving from a top-down to bottom-up model
(創新正逐漸朝下而上的方式演進)
2. Value is shifting from ownership to experiences
(價值不在只是擁有，更重要的是體驗)
3. Power is moving from institutions to communities
(力量將來自於社群，遠過組織本身)

從學術研究的角度來看，Social Computing跨足計算科學與社會科學，Social Computing的概觀可從下面這張圖來看。其基礎主要建構於：社會心理學(Social Psychology)、人機互動(Human-Computer Interaction)、社群網路分析(Social Network Analysis)、人類學(Anthropology)、組織理論(Organization Theroy)、社會學(Sociology)與計算理論(Computing Theory)。資訊技術與社會理論對於社會計算可說是相輔相成，一方面得思考如何將上述人文學科整合成socio-technical system，另一方面得思考對於現有的技術如Database, Multimedia, Software Engineering, Wireless該如何設計以達到社交資訊處理(Social Information Processing)的目的：1) 探勘群體行為中的知識, 2) 群體智慧作為決策, 3) Social Web的形成，這是「分析」的觀點。另一種觀點為「合成」，偏向Agent-based Modeling、或是人工生命（可參考紅虫寫的科普「混沌邊緣的藝術－人工生命」）。

IEEE International Conference on Social Computing

Thursday, September 13, 2007

Eight Properties of Social Network (5~8)

5. Mixing Patterns
When we explored the networks to the details, one question was arisen: which nodes pair up with others? Recall in most networks, there exists a some different types of nodes, the probability of connection between nodes just have tight relation with the nodes' type. Take Internet as an example, the structure of the network reflects the existence of three broad categories of nodes: high-level connectivity providers who run the Internet backbone and trunk lines, consumers who are end users of Internet service, and ISPs who join the two. Again there are many links between end users and ISPs, while few between ISPs and other ISPs, or between backbone operators and end users.

In terms of social network, this kind of selective linking is called assortative mixing, or homophily. A typical example of assortative mixing in social networks is mixing by race. The following table can illustrate it, which is the results from study of 1958 couples in the city of San Francisco, California. We can observe that people prefer to choose their spouses from their own race. More examples can be raised, such as mixing by age, mixing by incoming, and mixing by culture. Hence, it could say that the association would be created preferentially from those with homogeneous characters.

6. Degree correlations
In addition to that the links could be connected depend on the types of nodes, the degree is also highly correlated with the mixing. Does high-degree nodes in a network often associate with other high-degree nodes? or to low-degree ones? Needless to say, assortative mixing by degree plays a important roles networks, especially for the consideration of graph topology. Degree correlation can give rise to some interesting network structure effects.

Newman investigated assortative mixing by degree by calculating the correlation coefficient of the degrees of nodes at each end of links, as shown the following tables. It could be observed that for so-called social network, it is positive for assortatively mixed network by degree, while for information networks, technological networks, and biological networks, it appears to be disassortative. However, it is not clear to explain the result.

7. Community Structure
It is widely observed that most social network show the property of community structure, which means that groups of nodes that have a high density of links within them, with a lower density of links between groups. It could be used to explain the phenomenon that people usually cluster by their interest, education, age, and so on. What follows illustrates the community structure by visualization of two examples: the first is the friendship network of children in a US school taken from a study by Moody, and the second is the science citation network taken from Eigenfactor.org. It is very obvious to see the effect of community.

8. Centrality
a) Degree centrality: nodes with high degree.
b) Closeness centrality: nodes that have low distance to all other nodes.
c) Betweenness centrality: nodes that occur on more shortest-paths.
It can be illustrated via the following picture.

Eight Properties of Social Network (1~4)

這幾天 survey 關於 social network 相關研究後，從 M. E. J. Newman 以及 Christos Faloutsos 的論文中歸納了以下八項 social network 或者說是 complex system 的特性（我用英文寫，感覺比較有學問 XD）：

Recently, during my survey for social network and complex system from papers of M. E. J. Newman, I think it could be briefly summed up to eight properties, as follows:

1. Network diameter
The diameter of a network is the length (edges) of the longest path between any two nodes. Generally, the diameter is small, like 6 of small-world phenomenon, and it is independent of network size. The small-world effect imply the dynamics of processes taking place on networks. e.g. fast spread of information for the network, like rumor. This effect also underlies some well-known parlor games, particularly the calculation of Erdős number and Bacon Number.

2. Network Transitivity or Clustering
The concept was derived from the behavior of random graph. The most comprehensible instance is that the friend of your friend is likely to be your friend. In terms of graph topology, transitivity means the presence of a heightened number of triangles in the network - sets of three nodes each of which is connected to each of the others. It can be measured through the definition of clustering coefficient C (with an example):

3. Degree distribution
The degree of a node in a network is the number of links connected to that node. We can define a probability pk to be the fraction of nodes in the network that have degree k. Meanwhile, the pk could be considered as an uniform probability for choosing nodes with degree k at random. Plot pk (vertical axis) with degree k (horizontal axis) for some networks can form a histogram of degree distribution of nodes, as follows.

It is observed that the degrees of nodes are highly right-skewed. In other words, the distribution has a long right tail, or heavy-tailed distribution, it might be the well-known long tail effect. Alternatively, this distribution can be plotted in a logarithmic method, as follows. Marvellously, it is nearly conformed with Power law, which is the primary property of scale-free network. In contrast to the Poisson distribution of random graph (Erdős-Rényi model).

4. Network resilience
The resilience of network means the removal of its nodes, which is related to the concept of degree distributions. The function and structure of a network usually rely on its connectivity. Once some nodes are removed, the length of paths could be increase, even the network becomes disconnected. However, there are a different ways to remove the nodes. One way to remove the nodes in a network is to random removal. This approach wouldn't affect the distances between nodes almost since most nodes in a network have low degree and therefore lie on few paths between others. The other way to remove nodes from networks is targeted at high-degree nodes. Needless to say, it will have tremendous effects on the structure of a network. And the distance would increase acutely with the fraction of nodes removed. The following picture can explan it! (Note blue for random removal and red for high-degree removal.)

M. E. J. Newman

我是在 NCCU Summer School on Computational Social Sciences 2007 上準教授 Martin Rosvall 的 Lecture 中得知這號人物的。這幾天拜讀了他的代表著作之一: The structure and function of complex networks，之後到 google 上搜尋他，才知道這個人不簡單！在複雜網路的研究領域中頗具貢獻。M. E. J. Newman 的網頁上是寫他叫做 Mark Newman，我也不知道為什麼許多論文上他都被稱為 "M. E. J. Newman" ...

Newman 是 University of Michigan 物理系的教授，同時是 University of Michigan 複雜系統研究中心（Center for Study of Complex System）的主要成員之一，這個研究中心是由學校中許多領域的教授所組成，目的在於從 nonlinear, dynamical, adaptive 的角度切入研究 complex system。更值得一提的是，Newman 竟然也是 Santa Fe Institute 的成員之一，可以他對於 complex system 具有相當程度的研究！

Newman 對複雜網路的研究主要著重於分析 network 的結構與功能，尤其針對 social network 與 information network，研究方式藉由結合經驗分析法（empirical analysis）與電腦模擬（computer simulation）。以下為他的代表著作 (都是跟該領域中其他大師級人物的coauthorship @@)：

Random graphs with arbitrary degree distributions and their applications, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. E 64, 026118 (2001).
The structure and function of complex networks, M. E. J. Newman, SIAM Review 45, 167-256 (2003).
The Structure and Dynamics of Networks, M. E. J. Newman, A.-L. Barabási, and D. J. Watts, Princeton University Press (2006).
Identity and search in social networks, D. J. Watts, P. S. Dodds, and M. E. J. Newman, Science 296, 1302-1305 (2002).

Tuesday, September 11, 2007

Networks in The Real World

Complex network 研究之所以被發掘而且迅速地廣為被研究，最重要的時間點在 1998 年，Watts D. J. and Strogatz S. H. 在 Nature 雜誌上發表一篇名為 "Collective dynamics of 'small-world' networks" (Nature 393, 440-442)，該篇論文著重於分析複雜網路的特性，並嘗試將它以數學模型來解釋。

2003 年，Mark Newman 進一步將真實世界的 Complex network 歸納為四大類：
(M. E. J. Newman, The structure and function of complex networks, SIAM Review 2003)

Social Network
Information Network
Technological Network
Biological Network

[Social Network]
以人，或者由人所組成團體，為基本單位，藉由特定基本單位間各種互動關係建立彼此的連結，所形成的網路，稱之。典型的幾個例子：film actors, co-authorship, sexual contacts, email messages, blogosphere。Social Network 最為著名的體現是 Stanley Milgram 的 small-world experiments，得出世界上任何兩個人之間，至多存在五個人，便可建立關連，即"六度分離" (Six degree of separation)。在 Stanley Milgram 的實驗後，科學家發現傳統 social network analysis 的 sample size 都很小，加上資料來源可能受到人為主觀意識的影響，鑒於此，又時逢 WWW 迅速發展，因此資料來源轉為 affiliation network，也就是透過眾人的力量所組織成的 knowledge base。舉兩個social network的例子來說明：

Science coauthorship (Node: author; Link: write paper together)

Blogosphere (Node: blog (color means host); Link: (a-)reciprocal link (thickness)

[Information Network]
Also called "Knowledge Network"，典型的幾個例子為：citation network, WWW, word co-occurrence, preference network (bipartite graph)

World Wide Web (Node: webpage; Link: hyperlink)

[Technological Network]
Man-made networks designed for distribution of some commodities or resources. 典型的幾個例子為：Internet, power grid, train routes, P2P network, electronic circuits

Power grid (Node: generator, substations; Link: high-voltage power transmission lines [Note: Line thickness and color indicate the voltage level])

Internet (Node: computer, router; Link: physical lines)

[Biological Network]
Biological System，典型的幾個例子為：metabolic network, protein interactions, food web, neural network

Food web (Node: trophic species; Link: trophic interactions)

metabolic network, protein interactions

My List