zeuux-universe - Discussion

Subject: [zeuux-universe] "MapReduce: A Major Step Backwards"

Wednesday, July 9, 2008, 12:14

Xia Qingran qingran at zeuux.org
Wed Jul 9 12:14:38 CST 2008

An HTML attachment was scrubbed...
URL: <http://www.zeuux.org/pipermail/zeuux-universe/attachments/20080709/1115c828/attachment-0001.html>

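[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-universe]

Wednesday, July 9, 2008, 12:19

Zoom.Quiet zoom.quiet at gmail.com
Wed Jul 9 12:19:57 CST 2008

Good article, but why send it as an HTML attachment?
That makes it so much harder to read online!

2008/7/9 Xia Qingran <qingran at zeuux.org>:
> Forwarding an article, "MapReduce: A Major Step Backwards":
> Reposted from: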
> http://www.pgsqldb.org/mwiki/index.php/MapReduce:_%E4%B8%80%E4%B8%AA%E5%B7%A8%E5%A4%A7%E7%9A%84%E5%80%92%E9%80%80
> http://www.pgsqldb.org/mwiki/index.php/MapReduce_II
>
> MapReduce: A Major Step Backwards
>
> From the PostgreSQL Chinese Wiki (PostgreSQL Chinese community)
>
> Contents
>
> 1 Preface
> 2 MapReduce: A major step backwards
> 3 What is MapReduce?
>
> 3.1 MapReduce is a step backwards in database access
> 3.2 MapReduce is a poor implementation
> 3.3 MapReduce is not novel
> 3.4 MapReduce is missing features
> 3.5 MapReduce is incompatible with the DBMS tools
>
> 4 In Summary
> 5 References
>
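> Preface
>
> The database heavyweights at The Database Column (among them Michael
> Stonebraker, the original Berkeley lead of PostgreSQL) recently published an
> article critiquing MapReduce, which is currently at the height of its
> popularity, and it set off a heated discussion. I am translating parts of it
> here in my spare time so that we can study it together.
>
> Translator's note: a Tanenbaum vs. Linus style discussion like this one
> naturally leads to very heated argument. Frankly, though, judging by what
> happened after the Tanenbaum vs. Linus debate, Linux has increasingly learned
> from the experience of Tanenbaum and other OS researchers and applied it in
> its own ways (rather than turning its back on it). I hope the MapReduce vs.
> DBMS discussion will likewise give those who come later more insight, not
> more antagonism.
>
> Original article: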
>
> http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html
>
> MapReduce: A major step backwards
>
> Note: the authors are David J. DeWitt and Michael Stonebraker
>
> On January 8, a Database Column reader asked for our views on new
> distributed database research efforts, and we'll begin here with our views
> on MapReduce. This is a good time to discuss it, since the recent trade
> press has been filled with news of the revolution of so-called "cloud
> computing." This paradigm entails harnessing large numbers of (low-end)
> processors working in parallel to solve a computing problem. In effect, this
> suggests constructing a data center by lining up a large number of "jelly
> beans" rather than utilizing a much smaller number of high-end servers.
>
> For example, IBM and Google have announced plans to make a 1,000 processor
> cluster available to a few select universities to teach students how to
> program such clusters using a software tool called MapReduce [1]. Berkeley
> has gone so far as to plan on teaching their freshmen how to program using
> the MapReduce framework.
>
> As both educators and researchers, we are amazed at the hype that the
> MapReduce proponents have spread about how it represents a paradigm shift in
> the development of scalable, data-intensive applications. MapReduce may be a
> good idea for writing certain types of general-purpose computations, but to
> the database community, it is:
>
> A giant step backward in the programming paradigm for large-scale data
> intensive applications
> A sub-optimal implementation, in that it uses brute force instead of
> indexing
> Not novel at all -- it represents a specific implementation of well known
> techniques developed nearly 25 years ago
> Missing most of the features that are routinely included in current DBMS
> Incompatible with all of the tools DBMS users have come to depend on
>
> First, we will briefly discuss what MapReduce is; then we will go into more
> detail about our five reactions listed above.
>
> What is MapReduce?
>
> The basic idea of MapReduce is straightforward. It consists of two programs
> that the user writes called map and reduce plus a framework for executing a
> possibly large number of instances of each program on a compute cluster.
>
> The map program reads a set of "records" from an input file, does any
> desired filtering and/or transformations, and then outputs a set of records
> of the form (key, data). As the map program produces output records, a
> "split" function partitions the records into M disjoint buckets by applying
> a function to the key of each output record. This split function is
> typically a hash function, though any deterministic function will suffice.
> When a bucket fills, it is written to disk. The map program terminates with
> M output files, one for each bucket.
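
The map and split steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout ("dept,salary"), the names map_records, split and run_map, and the bucket count M = 4 are all hypothetical, and in-memory lists stand in for the M output files. CRC32 is used as the split function because it is deterministic across processes, whereas Python's built-in hash() of a string is randomized per process.

```python
from collections import defaultdict
import zlib

M = 4  # number of buckets, i.e. reduce instances (illustrative value)

def map_records(lines):
    """Hypothetical map program: filter and transform, then emit (key, data)."""
    for line in lines:
        dept, salary = line.split(",")
        if int(salary) > 0:            # any desired filtering
            yield dept, int(salary)    # output record of the form (key, data)

def split(key, m=M):
    """Deterministic split function: CRC32 of the key modulo the bucket count."""
    return zlib.crc32(key.encode("utf-8")) % m

def run_map(lines, m=M):
    """Partition the map output into m disjoint buckets (one file each in a real system)."""
    buckets = defaultdict(list)
    for key, data in map_records(lines):
        buckets[split(key, m)].append((key, data))
    return buckets

buckets = run_map(["shoe,10000", "toy,8000", "shoe,12000", "toy,0"])
```

Because every map instance applies the same split function, records with the same key always land in the bucket with the same number, whichever instance produced them.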
>
> In general, there are multiple instances of the map program running on
> different nodes of a compute cluster. Each map instance is given a distinct
> portion of the input file by the MapReduce scheduler to process. If N nodes
> participate in the map phase, then there are M files on disk storage at each
> of N nodes, for a total of N * M files; Fi,j, 1 ≤ i ≤ N, 1 ≤ j ≤ M.
>
> The key thing to observe is that all map instances use the same hash
> function. Hence, all output records with the same hash value will be in
> corresponding output files.
>
> The second phase of a MapReduce job executes M instances of the reduce
> program, Rj, 1 ≤ j ≤ M. The input for each reduce instance Rj consists of
> the files Fi,j, 1 ≤ i ≤ N. Again notice that all output records from the map
> phase with the same hash value will be consumed by the same reduce instance
> -- no matter which map instance produced them. After being collected by the
> map-reduce framework, the input records to a reduce instance are grouped on
> their keys (by sorting or hashing) and fed to the reduce program. Like the
> map program, the reduce program is an arbitrary computation in a
> general-purpose language. Hence, it can do anything it wants with its
> records. For example, it might compute some additional function over other
> data fields in the record. Each reduce instance can write records to an
> output file, which forms part of the "answer" to a MapReduce computation.
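
A minimal sketch of the reduce phase, again in Python and with hypothetical names (reduce_instance, average, map_outputs). It assumes the map output is already available as one dictionary of buckets per map instance, so map_outputs[i][j] plays the role of file Fi,j; a real framework would fetch these files over the network.

```python
from collections import defaultdict

def reduce_instance(j, map_outputs, reduce_fn):
    """Reduce instance R_j: read bucket j from every map instance (files F[i][j]),
    group the records on their keys, and feed each group to the reduce program."""
    groups = defaultdict(list)
    for buckets in map_outputs:              # one dict of buckets per map instance i
        for key, data in buckets.get(j, []):
            groups[key].append(data)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}

def average(key, values):
    """Example reduce program: an arbitrary computation over one key's records."""
    return sum(values) / len(values)

# Hypothetical output of N = 2 map instances, already split into M = 2 buckets.
map_outputs = [
    {0: [("shoe", 10000)], 1: [("toy", 8000)]},
    {0: [("shoe", 12000)], 1: [("toy", 6000)]},
]
answer = {}
for j in range(2):                           # M = 2 reduce instances
    answer.update(reduce_instance(j, map_outputs, average))
```

Each reduce instance contributes one part of the final answer; here the two parts are simply merged into one dictionary.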
>
> To draw an analogy to SQL, map is like the group-by clause of an aggregate
> query. Reduce is analogous to the aggregate function (e.g., average) that is
> computed over all the rows with the same group-by attribute.
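
The analogy can be made concrete with a small example. The sketch below uses Python's standard sqlite3 module as a stand-in SQL engine; the table emp and its columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("shoe", 10000), ("toy", 8000), ("shoe", 12000)])

# GROUP BY plays the role of map's key; AVG plays the role of reduce.
rows = conn.execute(
    "SELECT dept, AVG(salary) FROM emp GROUP BY dept ORDER BY dept").fetchall()
```

The single query does what a map program emitting (dept, salary) and a reduce program averaging each group would do together.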
>
> We now turn to the five concerns we have with this computing paradigm.
>
> MapReduce is a step backwards in database access
>
> As a data processing paradigm, MapReduce represents a giant step backwards.
> The database community has learned the following three lessons from the 40
> years that have unfolded since IBM first released IMS in 1968.
>
> Schemas are good.
> Separation of the schema from the application is good.
> High-level access languages are good.
>
> MapReduce has learned none of these lessons and represents a throwback to
> the 1960s, before modern DBMSs were invented.
>
> The DBMS community learned the importance of schemas, whereby the fields and
> their data types are recorded in storage. More importantly, the run-time
> system of the DBMS can ensure that input records obey this schema. This is
> the best way to keep an application from adding "garbage" to a data set.
> MapReduce has no such functionality, and there are no controls to keep
> garbage out of its data sets. A corrupted MapReduce dataset can actually
> silently break all the MapReduce applications that use that dataset.
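
As a rough illustration of a run-time system refusing records that violate the schema, here is a sketch using Python's sqlite3 module. The table and the sample rows are hypothetical. SQLite is loosely typed by default, so the type rule is spelled out as an explicit CHECK constraint; a strictly typed DBMS would enforce the declared column type directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp (
        name   TEXT    NOT NULL,
        salary INTEGER NOT NULL CHECK (typeof(salary) = 'integer' AND salary >= 0)
    )""")
conn.execute("INSERT INTO emp VALUES ('alice', 10000)")      # obeys the schema

rejected = []
for bad in [("bob", "ten thousand"), ("carol", None), ("dave", -5)]:
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?)", bad)
    except sqlite3.IntegrityError:
        rejected.append(bad[0])          # the run-time system refused the record

count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```

All three malformed rows are rejected at insert time, so no later application ever sees them.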
>
> It is also crucial to separate the schema from the application program. If a
> programmer wants to write a new application against a data set, he or she
> must discover the record structure. In modern DBMSs, the schema is stored in
> a collection of system catalogs and can be queried (in SQL) by any user to
> uncover such structure. In contrast, when the schema does not exist or is
> buried in an application program, the programmer must discover the structure
> by an examination of the code. Not only is this a very tedious exercise, but
> also the programmer must find the source code for the application. This
> latter tedium is forced onto every MapReduce programmer, since there are no
> system catalogs recording the structure of records -- if any such structure
> exists.
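
To show what querying the catalog looks like in practice, the following sketch uses SQLite through Python's sqlite3 module. SQLite keeps its catalog in the sqlite_master table and exposes column details through PRAGMA table_info; other systems use different catalog names (for example information_schema), so treat the specifics as one engine's convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserVisits (sourceIPAddr TEXT, destinationURL TEXT, "
             "date TEXT, adRevenue REAL)")

# Any user can ask the catalog which tables exist ...
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# ... and what the record structure of a table is: (column name, declared type).
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(UserVisits)")]
```

No application source code has to be located or read to recover the record structure.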
>
> During the 1970s the DBMS community engaged in a "great debate" between the
> relational advocates and the Codasyl advocates. One of the key issues was
> whether a DBMS access program should be written:
>
> By stating what you want - rather than presenting an algorithm for how to
> get it (relational view)
> By presenting an algorithm for data access (Codasyl view)
>
> The result is now ancient history, but the entire world saw the value of
> high-level languages and relational systems prevailed. Programs in
> high-level languages are easier to write, easier to modify, and easier for a
> new person to understand. Codasyl was rightly criticized for being "the
> assembly language of DBMS access." A MapReduce programmer is analogous to a
> Codasyl programmer -- he or she is writing in a low-level language
> performing low-level record manipulation. Nobody advocates returning to
> assembly language; similarly nobody should be forced to program in
> MapReduce.
>
> MapReduce advocates might counter this argument by claiming that the
> datasets they are targeting have no schema. We dismiss this assertion. In
> extracting a key from the input data set, the map function is relying on the
> existence of at least one data field in each input record. The same holds
> for a reduce function that computes some value from the records it receives
> to process.
>
> Writing MapReduce applications on top of Google's BigTable (or Hadoop's
> HBase) does not really change the situation significantly. By using a
> self-describing tuple format (row key, column name, {values}) different
> tuples within the same table can actually have different schemas. In
> addition, BigTable and HBase do not provide logical independence, for
> example with a view mechanism. Views significantly simplify keeping
> applications running when the logical schema changes.
>
> MapReduce is a poor implementation
>
> All modern DBMSs use hash or B-tree indexes to accelerate access to data. If
> one is looking for a subset of the records (e.g., those employees with a
> salary of 10,000 or those in the shoe department), then one can often use an
> index to advantage to cut down the scope of the search by one to two orders
> of magnitude. In addition, there is a query optimizer to decide whether to
> use an index or perform a brute-force sequential search.
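
The optimizer's choice between an index and a sequential scan can be observed directly. The sketch below uses SQLite via Python's sqlite3 module and its EXPLAIN QUERY PLAN statement; the emp table, its contents and the index name idx_salary are invented for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(f"e{i}", "shoe" if i % 10 == 0 else "toy", 1000 * (i % 50))
                  for i in range(1000)])

def plan(sql):
    """Return the optimizer's plan descriptions for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT name FROM emp WHERE salary = 10000"
before = plan(query)                       # no index yet: sequential scan
conn.execute("CREATE INDEX idx_salary ON emp (salary)")
after = plan(query)                        # the optimizer can now pick the index
```

The query text is identical before and after; only the access path chosen by the optimizer changes.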
>
> MapReduce has no indexes and therefore has only brute force as a processing
> option. It will be creamed whenever an index is the better access mechanism.
>
> One could argue that the value of MapReduce is automatically providing parallel
> execution on a grid of computers. This feature was explored by the DBMS
> research community in the 1980s, and multiple prototypes were built
> including Gamma [2,3], Bubba [4], and Grace [5]. Commercialization of these
> ideas occurred in the late 1980s with systems such as Teradata.
>
> In summary to this first point, there have been high-performance,
> commercial, grid-oriented SQL engines (with schemas and indexing) for the
> past 20 years. MapReduce does not fare well when compared with such systems.
>
> There are also some lower-level implementation issues with MapReduce,
> specifically skew and data interchange.
>
> One factor that MapReduce advocates seem to have overlooked is the issue of
> skew. As described in "Parallel Database Systems: The Future of High
> Performance Database Systems," [6] skew is a huge impediment to achieving
> successful scale-up in parallel query systems. The problem occurs in the map
> phase when there is wide variance in the distribution of records with the
> same key. This variance, in turn, causes some reduce instances to take much
> longer to run than others, resulting in the execution time for the
> computation being the running time of the slowest reduce instance. The
> parallel database community has studied this problem extensively and has
> developed solutions that the MapReduce community might want to adopt.
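
The effect of skew is easy to reproduce. In the hypothetical workload below one "hot" key accounts for 90% of the records; the key names, the record counts and the choice of 10 reduce instances are assumptions made purely for illustration.

```python
from collections import Counter
import zlib

def reducer_loads(keys, m):
    """Count how many records each of the m reduce instances receives."""
    loads = Counter({j: 0 for j in range(m)})
    for key in keys:
        loads[zlib.crc32(key.encode("utf-8")) % m] += 1
    return loads

# Hypothetical skewed input: one "hot" key accounts for 90% of the records.
keys = ["hot.example.com"] * 9000 + [f"site{i}.example.com" for i in range(1000)]
loads = reducer_loads(keys, m=10)

slowest = max(loads.values())              # the job finishes when this reducer does
ideal = len(keys) / 10                     # perfectly balanced load per reducer
slowdown = slowest / ideal
```

Since all records with the same key must go to one reduce instance, that instance receives at least 9,000 of the 10,000 records, roughly nine times the balanced share, and the whole computation waits for it.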
>
> There is a second serious performance problem that gets glossed over by the
> MapReduce proponents. Recall that each of the N map instances produces M
> output files -- each destined for a different reduce instance. These files
> are written to a disk local to the computer used to run the map instance. If
> N is 1,000 and M is 500, the map phase produces 500,000 local files. When
> the reduce phase starts, each of the 500 reduce instances needs to read its
> 1,000 input files and must use a protocol like FTP to "pull" each of its
> input files from the nodes on which the map instances were run. With hundreds of
> reduce instances running simultaneously, it is inevitable that two or more
> reduce instances will attempt to read their input files from the same map
> node simultaneously -- inducing large numbers of disk seeks and slowing the
> effective disk transfer rate by more than a factor of 20. This is why
> parallel database systems do not materialize their split files and use push
> (to sockets) instead of pull. Since much of the excellent fault-tolerance
> that MapReduce obtains depends on materializing its split files, it is not
> clear whether the MapReduce framework could be successfully modified to use
> the push paradigm instead.
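
The arithmetic behind this argument can be checked quickly. The file counts follow directly from N and M as given in the text. The collision estimate is an added simplification, not something the article computes: it assumes that at a given instant each reduce instance reads from one map node chosen uniformly and independently at random, which turns the question into the classic birthday problem.

```python
N, M = 1000, 500   # map instances and reduce instances, as in the text

local_files = N * M            # each map instance writes one file per reduce instance
files_per_reducer = N          # each reduce instance pulls one file from every map node

def collision_probability(readers, nodes):
    """P(at least two readers hit the same node), assuming each reader
    independently picks a node uniformly at random (birthday problem)."""
    p_all_distinct = 1.0
    for k in range(readers):
        p_all_distinct *= (nodes - k) / nodes
    return 1.0 - p_all_distinct

p = collision_probability(M, N)
```

Under that assumption the probability of at least one collision is indistinguishable from 1, which matches the claim that simultaneous reads from the same map node are inevitable.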
>
> Given the experimental evaluations to date, we have serious doubts about how
> well MapReduce applications can scale. Moreover, the MapReduce implementers
> would do well to study the last 25 years of parallel DBMS research
> literature.
>
> MapReduce is not novel
>
> The MapReduce community seems to feel that they have discovered an entirely
> new paradigm for processing large data sets. In actuality, the techniques
> employed by MapReduce are more than 20 years old. The idea of partitioning a
> large data set into smaller partitions was first proposed in "Application of
> Hash to Data Base Machine and Its Architecture" [11] as the basis for a new
> type of join algorithm. In "Multiprocessor Hash-Based Join Algorithms," [7],
> Gerber demonstrated how Kitsuregawa's techniques could be extended to
> execute joins in parallel on a shared-nothing [8] cluster using a
> combination of partitioned tables, partitioned execution, and hash based
> splitting. DeWitt [2] showed how these techniques could be adapted to
> execute aggregates with and without group by clauses in parallel. DeWitt and
> Gray [6] described parallel database systems and how they process queries.
> Shatdal and Naughton [9] explored alternative strategies for executing
> aggregates in parallel.
>
> Teradata has been selling a commercial DBMS utilizing all of these
> techniques for more than 20 years; exactly the techniques that the MapReduce
> crowd claims to have invented.
>
> While MapReduce advocates will undoubtedly assert that being able to write
> MapReduce functions is what differentiates their software from a parallel
> SQL implementation, we would remind them that POSTGRES supported
> user-defined functions and user-defined aggregates in the mid 1980s.
> Essentially, all modern database systems have provided such functionality
> for quite a while, starting with the Illustra engine around 1995.
>
> MapReduce is missing features
>
> All of the following features are routinely provided by modern DBMSs, and
> all are missing from MapReduce:
>
> Bulk loader -- to transform input data in files into a desired format and
> load it into a DBMS
> Indexing -- as noted above
> Updates -- to change the data in the data base
> Transactions -- to support parallel update and recovery from failures during
> update
> Integrity constraints -- to help keep garbage out of the data base
> Referential integrity -- again, to help keep garbage out of the data base
> Views -- so the schema can change without having to rewrite the application
> program
>
> In summary, MapReduce provides only a sliver of the functionality found in
> modern DBMSs.
>
> MapReduce is incompatible with the DBMS tools
>
> A modern SQL DBMS has available all of the following classes of tools:
>
> Report writers (e.g., Crystal reports) to prepare reports for human
> visualization
> Business intelligence tools (e.g., Business Objects or Cognos) to enable
> ad-hoc querying of large data warehouses
> Data mining tools (e.g., Oracle Data Mining or IBM DB2 Intelligent Miner) to
> allow a user to discover structure in large data sets
> Replication tools (e.g., Golden Gate) to allow a user to replicate data from
> one DBMS to another
> Database design tools (e.g., Embarcadero) to assist the user in constructing
> a data base.
>
> MapReduce cannot use these tools and has none of its own. Until it becomes
> SQL-compatible or until someone writes all of these tools, MapReduce will
> remain very difficult to use in an end-to-end task.
>
> In Summary
>
> It is exciting to see a much larger community engaged in the design and
> implementation of scalable query processing techniques. We, however, assert
> that they should not overlook the lessons of more than 40 years of database
> technology -- in particular the many advantages that a data model, physical
> and logical data independence, and a declarative query language, such as
> SQL, bring to the design, implementation, and maintenance of application
> programs. Moreover, computer science communities tend to be insular and do
> not read the literature of other communities. We would encourage the wider
> community to examine the parallel DBMS literature of the last 25 years.
> Last, before MapReduce can measure up to modern DBMSs, there is a large
> collection of unmet features and required tools that must be added.
>
> We fully understand that database systems are not without their problems.
> The database community recognizes that database systems are too "hard" to
> use and is working to solve this problem. The database community can also
> learn something valuable from the excellent fault-tolerance that MapReduce
> provides its applications. Finally we note that some database researchers
> are beginning to explore using the MapReduce framework as the basis for
> building scalable database systems. The Pig[10] project at Yahoo! Research
> is one such effort.
>
> References
>
> [1] "MapReduce: Simplified Data Processing on Large Clusters," Jeff Dean and
> Sanjay Ghemawat, Proceedings of the 2004 OSDI Conference, 2004.
>
> [2] "The Gamma Database Machine Project," DeWitt, et al., IEEE Transactions
> on Knowledge and Data Engineering, Vol. 2, No. 1, March 1990.
>
> [3] "Gamma - A High Performance Dataflow Database Machine," DeWitt, D, R.
> Gerber, G. Graefe, M. Heytens, K. Kumar, and M. Muralikrishna, Proceedings
> of the 1986 VLDB Conference, 1986.
>
> [4] "Prototyping Bubba, A Highly Parallel Database System," Boral, et al.,
> IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, March
> 1990.
>
> [6] "Parallel Database Systems: The Future of High Performance Database
> Systems," David J. DeWitt and Jim Gray, CACM, Vol. 35, No. 6, June 1992.
>
> [7] "Multiprocessor Hash-Based Join Algorithms," David J. DeWitt and Robert
> H. Gerber, Proceedings of the 1985 VLDB Conference, 1985.
>
> [8] "The Case for Shared-Nothing," Michael Stonebraker, Data Engineering
> Bulletin, Vol. 9, No. 1, 1986.
>
> [9] "Adaptive Parallel Aggregation Algorithms," Ambuj Shatdal and Jeffrey F.
> Naughton, Proceedings of the 1995 SIGMOD Conference, 1995.
>
> [10] "Pig", Chris Olston, http://research.yahoo.com/project/90
>
> [11] "Application of Hash to Data Base Machine and Its Architecture," Masaru
> Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka, New Generation Comput. 1(1):
> 63-74 (1983)
>
> MapReduce II
>
> [Note: Although the system attributes this post to a single author, it was
> written by David J. DeWitt and Michael Stonebraker]
>
> Last week's MapReduce post attracted tens of thousands of readers and
> generated many comments, almost all of them attacking our critique. Just to
> let you know, we don't hold a personal grudge against MapReduce. MapReduce
> didn't kill our dog, steal our car, or try and date our daughters.
>
> Our motivations for writing about MapReduce stem from MapReduce being
> increasingly seen as the most advanced and/or only way to analyze massive
> datasets. Advocates promote the tool without seemingly paying attention to
> years of academic and commercial database research and real world use.
>
> The point of our initial post was to say that there are striking
> similarities between MapReduce and a fairly primitive parallel database
> system. As such, MapReduce can be significantly improved by learning from
> the parallel database community.
>
> So, hold off on your comments for just a few minutes, as we will spend the
> rest of this post addressing four specific topics brought up repeatedly by
> those who commented on our previous blog:
>
> . MapReduce is not a database system, so don't judge it as one
> . MapReduce has excellent scalability; the proof is Google's use
> . MapReduce is cheap and databases are expensive
> . We are the old guard trying to defend our turf/legacy from the young turks
>
> Feedback No. 1: MapReduce is not a database system, so don't judge it as one
>
> It's not that we don't understand this viewpoint. We are not claiming that
> MapReduce is a database system. What we are saying is that like a DBMS + SQL
> + analysis tools, MapReduce can be and is being used to analyze and perform
> computations on massive datasets. So we aren't judging apples and oranges.
> We are judging two approaches to analyzing massive amounts of information,
> even for less structured information.
>
> To illustrate our point, assume that you have two very large files of facts.
> The first file contains structured records of the form:
>
>    Rankings (pageURL, pageRank)
>
> Records in the second file have the form:
>
>    UserVisits (sourceIPAddr, destinationURL, date, adRevenue)
>
> Someone might ask, "What IP address generated the most ad revenue during the
> week of January 15th to the 22nd, and what was the average page rank of the
> pages visited?"
>
> This question is a little tricky to answer in MapReduce because it consumes
> two data sets rather than one, and it requires a "join" of the two datasets
> to find pairs of Ranking and UserVisit records that have matching values for
> pageURL and destinationURL. In fact, it appears to require three MapReduce
> phases, as noted below.
>
> Phase 1
>
> This phase filters UserVisits records that are outside the desired date
> range and then "joins" the qualifying records with records from the Rankings
> file.
>
> Map program: The map program scans through UserVisits and Rankings records.
> Each UserVisit record is filtered on the date range specification.
> Qualifying records are emitted with composite keys of the form
> <destinationURL, T1> where T1 indicates that it is a UserVisits record.
> Rankings records are emitted with composite keys of the form <pageURL, T2>
> (T2 is a tag indicating it is a Rankings record). Output records are
> repartitioned using a user-supplied partitioning function that only hashes
> on the URL portion of the composite key.
>
> Reduce Program: The input to the reduce program is a single sorted run of
> records in URL order. For each unique URL, the program splits the incoming
> records into two sets (one for Rankings records and one for UserVisits
> records) using the tag component of the composite key. To complete the join,
> reduce finds all matching pairs of records of the two sets. Output records
> are in the form of Temp1 (sourceIPAddr, pageURL, pageRank, adRevenue).
>
> The reduce program must be capable of handling the case in which one or both
> of these sets with the same URL are too large to fit into memory and must be
> materialized on disk. Since access to these sets is through an iterator, a
> straightforward implementation will result in what is termed a nested-loops
> join. This join algorithm is known to have very bad I/O performance
> characteristics, as the "inner" set is scanned once for each record of the
> "outer" set.
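
Phase 1 as described above might look roughly like the following Python sketch. The authors state that they did not write this program, so everything here is an assumption: the function names phase1_map and phase1_reduce, the tuple layouts, and the sample data. Partitioning across workers and spilling to disk are left out; a single reducer holds both sets in memory.

```python
from collections import defaultdict
from datetime import date

T1, T2 = "T1", "T2"   # tags: T1 marks a UserVisits record, T2 a Rankings record

def phase1_map(user_visits, rankings, start, end):
    """Emit ((URL, tag), data); UserVisits are filtered on the date range."""
    for ip, url, day, revenue in user_visits:
        if start <= day <= end:
            yield (url, T1), (ip, revenue)
    for url, rank in rankings:
        yield (url, T2), rank

def phase1_reduce(records):
    """Join the two tagged sets per URL, producing Temp1 rows
    (sourceIPAddr, pageURL, pageRank, adRevenue)."""
    visits, ranks = defaultdict(list), defaultdict(list)
    for (url, tag), data in records:
        (visits if tag == T1 else ranks)[url].append(data)
    temp1 = []
    for url in sorted(visits):
        for ip, revenue in visits[url]:          # "outer" set
            for rank in ranks.get(url, []):      # "inner" set, rescanned each time
                temp1.append((ip, url, rank, revenue))
    return temp1

user_visits = [
    ("1.1.1.1", "a.com", date(2008, 1, 16), 5.0),
    ("2.2.2.2", "b.com", date(2008, 1, 17), 1.0),
    ("1.1.1.1", "b.com", date(2008, 1, 30), 9.0),   # outside the date range
]
rankings = [("a.com", 10), ("b.com", 20)]
temp1 = phase1_reduce(phase1_map(user_visits, rankings,
                                 date(2008, 1, 15), date(2008, 1, 22)))
```

The two nested loops in phase1_reduce are the nested-loops join the text warns about: the inner list is walked once per outer record.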
>
> Phase 2
>
> This phase computes the total ad revenue and average page rank for each
> Source IP Address.
>
> Map program: Scan Temp1 using the identity function on sourceIPAddr.
> Reduce program: The reduce program makes a linear pass over the data. For
> each sourceIPAddr, it will sum the ad-revenue and compute the average page
> rank, retaining the one with the maximum total ad revenue. Each reduce
> worker then outputs a single record of the form Temp2 (sourceIPAddr,
> total_adRevenue, average_pageRank).
>
> Phase 3
>
> Map program: The program uses a single map worker that scans Temp2 and
> outputs the record with the maximum value for total_adRevenue.
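
Phases 2 and 3 can be sketched the same way. As before, the function names and the sample Temp1 rows are hypothetical, and the sketch assumes Temp1 has already been repartitioned so that all rows for one sourceIPAddr reach the same reduce worker.

```python
from collections import defaultdict

def phase2_reduce(temp1):
    """One reduce worker: per sourceIPAddr, sum adRevenue and average pageRank,
    keeping only the row with the largest total (a Temp2 record)."""
    totals = defaultdict(lambda: [0.0, 0, 0])     # revenue sum, rank sum, count
    for ip, _url, rank, revenue in temp1:
        acc = totals[ip]
        acc[0] += revenue
        acc[1] += rank
        acc[2] += 1
    best = None
    for ip, (rev, rank_sum, n) in totals.items():
        row = (ip, rev, rank_sum / n)   # (sourceIPAddr, total_adRevenue, average_pageRank)
        if best is None or row[1] > best[1]:
            best = row
    return best

def phase3_map(temp2_rows):
    """Single map worker: pick the overall maximum across the reduce workers."""
    return max(temp2_rows, key=lambda row: row[1])

# Hypothetical Temp1 rows as seen by two reduce workers.
worker_a = [("1.1.1.1", "a.com", 10, 5.0), ("1.1.1.1", "b.com", 20, 2.0)]
worker_b = [("2.2.2.2", "b.com", 20, 1.0)]
winner = phase3_map([phase2_reduce(worker_a), phase2_reduce(worker_b)])
```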
>
> We realize that portions of the processing steps described above are handled
> automatically by the MapReduce infrastructure (e.g., sorting and
> partitioning the records). Although we have not written this program, we
> estimate that the custom parts of the code (i.e., the map() and reduce()
> functions) would require substantially more code than the two fairly simple
> SQL statements to do the same:
>
> Q1
>
>    SELECT sourceIPAddr, AVG(pageRank) AS avgPR, SUM(adRevenue) AS adTotal
>    INTO Temp FROM Rankings, UserVisits
>    WHERE Rankings.pageURL = UserVisits.destinationURL
>      AND date > 'Jan 14' AND date < 'Jan 23'
>    GROUP BY sourceIPAddr
>
> Q2
>
>    SELECT sourceIPAddr, adTotal, avgPR
>    FROM Temp
>    WHERE adTotal = (SELECT MAX(adTotal) FROM Temp)
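
For readers who want to try the two queries, here is a runnable approximation using SQLite through Python's sqlite3 module. Several details are assumptions made for the sake of a working example: the sample rows, the ISO date strings in place of "Jan 14" and "Jan 23", the intermediate table name TempQ1 (chosen because TEMP is an SQLite keyword), and CREATE TABLE ... AS for materializing Q1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rankings (pageURL TEXT, pageRank INTEGER);
    CREATE TABLE UserVisits (sourceIPAddr TEXT, destinationURL TEXT,
                             date TEXT, adRevenue REAL);
    INSERT INTO Rankings VALUES ('a.com', 10), ('b.com', 20);
    INSERT INTO UserVisits VALUES
        ('1.1.1.1', 'a.com', '2008-01-16', 5.0),
        ('1.1.1.1', 'b.com', '2008-01-18', 2.0),
        ('2.2.2.2', 'b.com', '2008-01-17', 1.0),
        ('2.2.2.2', 'a.com', '2008-01-30', 99.0);  -- outside the week

    -- Q1: join, filter on the date range, aggregate per source IP.
    CREATE TABLE TempQ1 AS
    SELECT sourceIPAddr, AVG(pageRank) AS avgPR, SUM(adRevenue) AS adTotal
    FROM Rankings, UserVisits
    WHERE Rankings.pageURL = UserVisits.destinationURL
      AND date > '2008-01-14' AND date < '2008-01-23'
    GROUP BY sourceIPAddr;
""")

# Q2: the row(s) with the largest total ad revenue.
result = conn.execute("""
    SELECT sourceIPAddr, adTotal, avgPR
    FROM TempQ1
    WHERE adTotal = (SELECT MAX(adTotal) FROM TempQ1)
""").fetchall()
```

The visit on January 30 is excluded by the date filter, so the large adRevenue value attached to it does not affect the result.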
>
> No matter what you think of SQL, eight lines of code is almost certainly
> easier to write and debug than the programming required for MapReduce. We
> believe that MapReduce advocates should consider the advantages that
> layering a high-level language like SQL could provide to users of MapReduce.
> Apparently we're not alone in this assessment, as efforts such as PigLatin
> and Sawzall appear to be promising steps in this direction.
>
> We also firmly believe that augmenting the input files with a schema would
> provide the basis for improving the overall performance of MapReduce
> applications by allowing B-trees to be created on the input data sets and
> techniques like hash partitioning to be applied. These are technologies in
> widespread practice in today's parallel DBMSs, of which there are quite a
> number on the market, including ones from IBM, Teradata, Netezza, Greenplum,
> Oracle, and Vertica. All of these should be able to execute this program
> with the same or better scalability and performance of MapReduce.
>
> Here's how these capabilities could benefit MapReduce:
>
> 1. Indexing. The filter (date > "Jan 14" and date < "Jan 23") condition can
> be executed by using a B-tree index on the date attribute of the UserVisits
> table, avoiding a sequential scan of the entire table.
>
> 2. Data movement. When you load files into a distributed file system prior
> to running MapReduce, data items are typically assigned to blocks/partitions
> in sequential order. As records are loaded into a table in a parallel
> database system, it is standard practice to apply a hash function to an
> attribute value to determine which node the record should be stored on (the
> same basic idea as is used to determine which reduce worker should get an
> output record from a map instance). For example, records being loaded into
> the Rankings and UserVisits tables might be mapped to a node by hashing on
> the pageURL and destinationURL attributes, respectively. If loaded this way,
> the join of Rankings and UserVisits in Q1 above would be performed
> completely locally with absolutely no data movement between nodes.
> Furthermore, as result records from the join are materialized, they will be
> pipelined directly into a local aggregate computation without being written
> first to disk. This local aggregate operator will partially compute the two
> aggregates (sum and average) concurrently (what is called a combiner in
> MapReduce terminology). These partial aggregates are then repartitioned by
> hashing on this sourceIPAddr to produce the final results for Q1.
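
The partial aggregation step deserves a small illustration, because an average cannot be combined by averaging averages: each node has to carry a sum and a count. The sketch below is hypothetical throughout (function names, row layout, sample data) and models two nodes with plain lists.

```python
def local_partial(rows):
    """Per-node partial aggregate: (sum of adRevenue, sum of pageRank, count) per IP."""
    partial = {}
    for ip, rank, revenue in rows:
        rev, rank_sum, n = partial.get(ip, (0.0, 0, 0))
        partial[ip] = (rev + revenue, rank_sum + rank, n + 1)
    return partial

def merge_partials(partials):
    """Final step after repartitioning on sourceIPAddr: combine the partial states."""
    merged = {}
    for partial in partials:
        for ip, (rev, rank_sum, n) in partial.items():
            r0, s0, n0 = merged.get(ip, (0.0, 0, 0))
            merged[ip] = (r0 + rev, s0 + rank_sum, n0 + n)
    return {ip: (rev, rank_sum / n) for ip, (rev, rank_sum, n) in merged.items()}

# Hypothetical joined rows (sourceIPAddr, pageRank, adRevenue) on two nodes.
node1 = [("1.1.1.1", 10, 5.0), ("2.2.2.2", 20, 1.0)]
node2 = [("1.1.1.1", 20, 2.0)]
final = merge_partials([local_partial(node1), local_partial(node2)])
```

Only the small partial states cross the network, not the joined rows themselves.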
>
> It is certainly the case that you could do the same thing in MapReduce by
> using hashing to map records to chunks of the file and then modifying the
> MapReduce program to exploit the knowledge of how the data was loaded. But
> in a database, physical data independence happens automatically. When Q1 is
> "compiled," the query optimizer will extract partitioning information about
> the two tables from the schema. It will then generate the correct query plan
> based on this partitioning information (e.g., maybe Rankings is hash
> partitioned on pageURL but UserVisits is hash partitioned on sourceIPAddr).
> This happens transparently to any user (modulo changes in response time) who
> submits a query involving a join of the two tables.
>
> 3. Column representation. Many queries access only a subset of the fields
> of the input files. A column store does not need to read the others.
>
> 4. Push, not pull. MapReduce relies on the materialization of the output
> files from the map phase on disk for fault tolerance. Parallel database
> systems push the intermediate files directly to the receiving (i.e., reduce)
> nodes, avoiding writing the intermediate results and then reading them back
> as they are pulled by the reduce computation. This provides MapReduce far
> superior fault tolerance at the expense of additional I/Os.
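The co-located join and partial aggregation described in item 2 can be sketched roughly as follows. This is only an illustration in plain Python, not how any particular parallel DBMS implements it: the cluster size, the function names, and the use of CRC32 as the hash function are assumptions, the date filter is left out, and the final repartitioning on sourceIPAddr is collapsed into a single merge step.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(url: str) -> int:
    """Deterministic hash partitioning on the join attribute (assumed CRC32)."""
    return zlib.crc32(url.encode("utf-8")) % NUM_NODES


def load(rankings, user_visits):
    """Store each record on the node chosen by hashing its URL attribute."""
    nodes = [{"rankings": {}, "visits": []} for _ in range(NUM_NODES)]
    for page_url, page_rank in rankings:
        nodes[node_for(page_url)]["rankings"][page_url] = page_rank
    for source_ip, dest_url, ad_revenue in user_visits:
        nodes[node_for(dest_url)]["visits"].append((source_ip, dest_url, ad_revenue))
    return nodes


def local_join_and_partial_aggregate(node):
    """Join on one node only; keep [revenue sum, rank sum, count] per source IP."""
    partial = defaultdict(lambda: [0.0, 0, 0])
    for source_ip, dest_url, ad_revenue in node["visits"]:
        page_rank = node["rankings"].get(dest_url)
        if page_rank is None:
            continue  # no matching Rankings record on this node
        state = partial[source_ip]
        state[0] += ad_revenue
        state[1] += page_rank
        state[2] += 1
    return partial


def final_aggregate(partials):
    """Merge partial states per source IP into (total revenue, average rank)."""
    merged = defaultdict(lambda: [0.0, 0, 0])
    for partial in partials:
        for ip, (revenue, rank_sum, count) in partial.items():
            state = merged[ip]
            state[0] += revenue
            state[1] += rank_sum
            state[2] += count
    return {ip: (rev, rank_sum / n) for ip, (rev, rank_sum, n) in merged.items()}
```

Because both tables are placed by the same hash of the URL, every matching pair of records lands on the same node, so the join step never looks at another node's data.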
>
> In general, we expect these mechanisms to provide about a factor of 10 to
> 100 performance advantage, depending on the selectivity of the query, the
> width of the input records to the map computation, and the size of the
> output files from the map phase. As such, we believe that 10 to 100 parallel
> database nodes can do the work of 1,000 MapReduce nodes.
>
> To further illustrate our point, suppose you have a more general filter, F,
> a more general group_by function, G, and a more general Reduce function, R.
> PostgreSQL (an open source, free DBMS) allows the following SQL query over a
> table T:
>
>    SELECT R(T)
>    FROM T
>    WHERE F(T)
>    GROUP BY G(T)
>
> F, R, and G can be written in a general-purpose language like C or C++. A
> SQL engine, extended with user-defined functions and aggregates, has nearly
> all -- if not all -- of the generality of MapReduce.
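As a small runnable illustration of the F/G/R query shape, here is a sketch that uses SQLite through Python's standard sqlite3 module as a stand-in for PostgreSQL. The bodies of F, G and R, the table contents and the column name x are made up for the example; the article only says such functions can be written in a general-purpose language.

```python
import sqlite3


def f(value):
    """F: user-defined filter; keeps non-negative values (illustrative)."""
    return 1 if value >= 0 else 0


def g(value):
    """G: user-defined grouping function (illustrative)."""
    return "even" if value % 2 == 0 else "odd"


class R:
    """R: user-defined aggregate playing the role of reduce; sums squares."""

    def __init__(self):
        self.total = 0

    def step(self, value):
        self.total += value * value

    def finalize(self):
        return self.total


conn = sqlite3.connect(":memory:")
conn.create_function("F", 1, f)
conn.create_function("G", 1, g)
conn.create_aggregate("R", 1, R)
conn.execute("CREATE TABLE T (x INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(v,) for v in (-3, 1, 2, 3, 4)])

rows = conn.execute(
    "SELECT G(x), R(x) FROM T WHERE F(x) GROUP BY G(x) ORDER BY G(x)"
).fetchall()
print(rows)
```

The engine decides how to scan, filter and group; the user code only supplies the three functions.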
>
> As such, we claim that most things that are possible in MapReduce are also
> possible in a SQL engine. Hence, it is exactly appropriate to compare the
> two approaches. We are working on a more complete paper that demonstrates
> the relative performance and relative programming effort between the two
> approaches, so, stay tuned.
>
> Feedback No. 2: MapReduce has excellent scalability; the proof is Google's
> use
>
> Many readers took offense at our comment about scaling and asserted that
> since Google runs MapReduce programs on 1,000s (perhaps 10s of 1,000s) of
> nodes it must scale well. Having started benchmarking database systems 25
> years ago (yes, in 1983), we believe in a more scientific approach toward
> evaluating the scalability of any system for data intensive applications.
>
> Consider the following scenario. Assume that you have a 1 TB data set that
> has been partitioned across 100 nodes of a cluster (each node will have
> about 10 GB of data). Further assume that some MapReduce computation runs in
> 5 minutes if 100 nodes are used for both the map and reduce phases. Now
> scale the dataset to 10 TB, partition it over 1,000 nodes, and run the same
> MapReduce computation using those 1,000 nodes. If the performance of
> MapReduce scales linearly, it will execute the same computation on 10x the
> amount of data using 10x more hardware in the same 5 minutes. Linear scaleup
> is the gold standard for measuring the scalability of data intensive
> applications. As far as we are aware there are no published papers that
> study the scalability of MapReduce in a controlled scientific fashion.
> MapReduce may indeed scale linearly, but we have not seen published evidence
> of this.
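The scaleup scenario above reduces to a little arithmetic, sketched here. The helper names are hypothetical, decimal units (1 TB = 1000 GB) are assumed, and scaleup is taken as the elapsed time of the small configuration divided by the elapsed time of the scaled one, so 1.0 means linear.

```python
def data_per_node_gb(total_tb: float, nodes: int) -> float:
    """Data held by each node, assuming an even partitioning."""
    return total_tb * 1000 / nodes


def scaleup(small_elapsed_s: float, large_elapsed_s: float) -> float:
    """1.0 is linear scaleup; values below 1.0 mean the larger run was slower."""
    if small_elapsed_s <= 0 or large_elapsed_s <= 0:
        raise ValueError("elapsed times must be positive")
    return small_elapsed_s / large_elapsed_s


# 1 TB over 100 nodes and 10 TB over 1,000 nodes both give 10 GB per node.
print(data_per_node_gb(1, 100), data_per_node_gb(10, 1000))
# A 5 minute run that still takes 5 minutes at 10x scale is linear.
print(scaleup(300, 300))
# If the scaled run took 10 minutes instead, scaleup would be 0.5.
print(scaleup(300, 600))
```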
>
> Feedback No. 3: MapReduce is cheap and databases are expensive
>
> Every organization has a "build" versus "buy" decision, and we don't
> question the decision by Google to roll its own data analysis solution. We
> also don't intend to defend DBMS pricing by the commercial vendors. What we
> wanted to point out is that we believe it is possible to build a version of
> MapReduce with more functionality and better performance. Pig is an
> excellent step in this direction.
>
> Also, we want to mention that there are several open source (i.e., free)
> DBMSs, including PostgreSQL, MySQL, Ingres, and BerkeleyDB. Several of the
> aforementioned parallel DBMS companies have increased the scale of these
> open source systems by adding parallel computing extensions.
>
> A number of individuals also commented that SQL and the relational data
> model are too restrictive. Indeed, the relational data model might very well
> be the wrong data model for the types of datasets that MapReduce
> applications are targeting. However, there is considerable ground between
> the relational data model and no data model at all. The point we were trying
> to make is that developers writing business applications have benefited
> significantly from the notion of organizing data in the database according
> to a data model and accessing that data through a declarative query
> language. We don't care what that language or model is. Pig, for example,
> employs a nested relational model, which gives developers more flexibility
> than a traditional 1NF model allows.
>
> Feedback No. 4: We are the old guard trying to defend our turf/legacy from
> the young turks
>
> 反馈 4:我们是老古董,只是想保护自己的领土/财产不受年轻一辈的侵蚀
>
> Since both of us are among the "gray beards" and have been on this earth
> about 2 Giga-seconds, we have seen a lot of ideas come and go. We are
> constantly struck by the following two observations:
>
> 因为我俩都是"白胡子"老头了,已经在这个地球上呆了超过2G秒了,我们看到过很多主意的产生和消失。而且我们经常被下面两个现象所烦恼:
>
> How insular computer science is. The propagation of ideas from
> sub-discipline to sub-discipline is very slow and sketchy. Most of us are
> content to do our own thing, rather than learn what other sub-disciplines
> have to offer.
>
> 计算机科学是多么地孤立。观念从一个子学科传播到另外一个子学科是非常缓慢且残缺的。我们中大多数人都只想做自己的事情,而不是从其它子学科学习已经具备
> 的东西。
>
> How little knowledge is passed from generation to generation. In a recent
> paper entitled "What goes around comes around," (M. Stonebraker/J.
> Hellerstein, Readings in Database Systems 4th edition, MIT Press, 2004) one
> of us noted that many current database ideas were tried a quarter of a
> century ago and discarded. However, such experience does not seem to be passed
> down from the "gray beards" to the "young turks." The turks and gray beards
> aren't usually and shouldn't be adversaries.
>
> 代与代之间相传的知识是如此之少。在最近的一篇题为"似水流年"("What goes around comes around," (M.
> Stonebraker/J. Hellerstein, Readings in Database Systems 4th edition, MIT
> Press, 2004)
> )的文章中,我们中的一个提到了很多现代数据库的观点都曾经在四分之一世纪之前尝试过并被抛弃掉。但是,这些试验仿佛并没有从"白胡子"老头传授给"年轻
> 一辈"。年轻或者年长通常不是也不应该成为对立面。
>
> Thanks for stopping by the "pasture" and reading this post. We look forward
> to reading your feedback, comments and alternative viewpoints.
>
> --
> 夏清然
> Xia Qingran
> qingran在zeuux.org
>
>




-- 

http://zoomquiet.org'''
过程改进乃是催生可促生靠谱的人的组织!
PE keeps evolving organizations which promoting people be good!'''

Wednesday, July 9, 2008, 14:32

清风 paradise.qingfeng在gmail.com
Wed Jul 9 14:32:08 CST 2008

From the Hadoop point of view, I think Map/Reduce is mostly about parallel computation; the storage-related parts would be HDFS and HBase.

2008/7/9 Xia Qingran <qingran在zeuux.org>:
> Forwarding an article, "MapReduce: A major step backwards":
> Source:
> http://www.pgsqldb.org/mwiki/index.php/MapReduce:_%E4%B8%80%E4%B8%AA%E5%B7%A8%E5%A4%A7%E7%9A%84%E5%80%92%E9%80%80
> http://www.pgsqldb.org/mwiki/index.php/MapReduce_II
>
> MapReduce: A major step backwards
>
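> Preface
>
> The database heavyweights at databasecolumn (including Michael Stonebraker,
> who led the original PostgreSQL effort at Berkeley) recently wrote an article
> critiquing MapReduce, a technology currently at the height of its popularity,
> and it set off a heated discussion. I am translating parts of it here as time
> allows, so that we can study it together.
>
> Translator's note: a Tanenbaum vs. Linus style discussion naturally leads to
> very heated argument. But honestly, judging by what followed the Tanenbaum
> vs. Linus debate, Linux has increasingly learned from, and applied in its own
> ways, the experience of Tanenbaum and other OS researchers (rather than
> turning its back on it). So I hope the MapReduce vs. DBMS discussion will
> likewise give those who come later more insight, rather than more antagonism.
>
> Original article: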
>
> http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html
>
> MapReduce: A major step backwards
>
> Note: the authors are David J. DeWitt and Michael Stonebraker
>
> On January 8, a Database Column reader asked for our views on new
> distributed database research efforts, and we'll begin here with our views
> on MapReduce. This is a good time to discuss it, since the recent trade
> press has been filled with news of the revolution of so-called "cloud
> computing." This paradigm entails harnessing large numbers of (low-end)
> processors working in parallel to solve a computing problem. In effect, this
> suggests constructing a data center by lining up a large number of "jelly
> beans" rather than utilizing a much smaller number of high-end servers.
>
> 1月8日,一位Database
> Column的读者询问我们对各种新的分布式数据库研究工作有何看法,我们就从MapReduce谈起吧。现在讨论MapReduce恰逢其时,因为最近
> 商业媒体充斥着所谓"云计算(cloud
> computing)"革命的新闻。这种计算方式通过大量(低端的)并行工作的处理器来解决计算问题。实际上,就是用大量便宜货(原文是jelly
> beans)代替数量小得多的高端服务器来构造数据中心。
>
> For example, IBM and Google have announced plans to make a 1,000 processor
> cluster available to a few select universities to teach students how to
> program such clusters using a software tool called MapReduce [1]. Berkeley
> has gone so far as to plan on teaching their freshmen how to program using
> the MapReduce framework.
>
> 例如,IBM和Google已经宣布,计划构建一个1000处理器的集群,开放给几个大学,教授学生使用一种名为MapReduce
> [1]的软件工具对这种集群编程。加州大学伯克利分校甚至计划教一年级新生如何使用MapReduce框架编程。
>
> As both educators and researchers, we are amazed at the hype that the
> MapReduce proponents have spread about how it represents a paradigm shift in
> the development of scalable, data-intensive applications. MapReduce may be a
> good idea for writing certain types of general-purpose computations, but to
> the database community, it is:
>
> 我们都既是教育者也是研究人员,MapReduce支持者们大肆宣传它代表了可伸缩、数据密集计算发展中的一次范型转移,对此我们非常惊讶。
> MapReduce就编写某些类型的通用计算程序而言,可能是个不错的想法,但是从数据库界看来,并非如此:
>
> A giant step backward in the programming paradigm for large-scale data
> intensive applications
> A sub-optimal implementation, in that it uses brute force instead of
> indexing
> Not novel at all -- it represents a specific implementation of well known
> techniques developed nearly 25 years ago
> Missing most of the features that are routinely included in current DBMS
> Incompatible with all of the tools DBMS users have come to depend on
>
> 在大规模的数据密集应用的编程领域,它是一个巨大的倒退
> 它是一个非最优的实现,使用了蛮力而非索引
> 它一点也不新颖――代表了一种25年前已经开发得非常完善的技术
> 它缺乏当前DBMS基本都拥有的大多数特性
> 它和DBMS用户已经依赖的所有工具都不兼容
>
> First, we will briefly discuss what MapReduce is; then we will go into more
> detail about our five reactions listed above.
>
> 首先,我们简要地讨论一下MapReduce是什么,然后更详细地阐述上面列出的5点看法。
>
> What is MapReduce?
>
> The basic idea of MapReduce is straightforward. It consists of two programs
> that the user writes called map and reduce plus a framework for executing a
> possibly large number of instances of each program on a compute cluster.
>
> MapReduce的基本思想很直接。它包括用户写的两个程序:map和reduce,以及一个framework,在一个计算机簇中执行大量的每
> 个程序的实例。
>
> The map program reads a set of "records" from an input file, does any
> desired filtering and/or transformations, and then outputs a set of records
> of the form (key, data). As the map program produces output records, a
> "split" function partitions the records into M disjoint buckets by applying
> a function to the key of each output record. This split function is
> typically a hash function, though any deterministic function will suffice.
> When a bucket fills, it is written to disk. The map program terminates with
> M output files, one for each bucket.
>
> map程序从输入文件中读取"records"的集合,执行任何需要的过滤或者转换,并且以(key,data)的形式输出
> records的集合。当map程序产生输出记录,"split"函数对每一个输出的记录的key应用一个函数,将records分割为M个不连续的块
> (buckets)。这个split函数有可能是一个hash函数,而其他确定的函数也是可用的。当一个块被写满后,将被写道磁盘上。然后map程序终
> 止,输出M个文件,每一个代表一个块(bucket)。
>
> In general, there are multiple instances of the map program running on
> different nodes of a compute cluster. Each map instance is given a distinct
> portion of the input file by the MapReduce scheduler to process. If N nodes
> participate in the map phase, then there are M files on disk storage at each
> of N nodes, for a total of N * M files; Fi,j, 1 ≤ i ≤ N, 1 ≤ j ≤ M.
>
> 通常情况下,map程序的多个实例持续运行在compute cluster的不同节点上。每一个map实例都被MapReduce
> scheduler分配了input
> file的不同部分,然后执行。如果有N个节点参与到map阶段,那么在这N个节点的磁盘储存都有M个文件,总共有N*M个文件。
>
> The key thing to observe is that all map instances use the same hash
> function. Hence, all output records with the same hash value will be in
> corresponding output files.
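The map phase and split function described above can be sketched as follows. This is a toy in-memory model, not the actual MapReduce implementation: lists stand in for the on-disk files F(i,j), the value of M, the word-count map function and the CRC32-based split are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

M = 3  # number of buckets per map instance (hypothetical value)


def split(key: str) -> int:
    """Split function: a deterministic hash of the key, the same on every node."""
    return zlib.crc32(key.encode("utf-8")) % M


def run_map_instance(records, map_fn):
    """Run one map instance; return its M output 'files' as lists of (key, data)."""
    buckets = defaultdict(list)
    for record in records:
        for key, data in map_fn(record):
            buckets[split(key)].append((key, data))
    return [buckets[j] for j in range(M)]


def word_map(line):
    """Example map function: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word, 1


# N = 2 map instances, each given a distinct portion of the input.
inputs = [["a b a"], ["b c"]]
files = [run_map_instance(portion, word_map) for portion in inputs]
# files[i][j] plays the role of F(i,j): N * M = 6 files in total.
for i, instance_files in enumerate(files):
    print(i, instance_files)
```

Since every instance calls the same split function, a given key always ends up in the same bucket index j, whichever map instance produced it.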
>
> 值得注意的地方是,所有的map实例都使用同样的hash函数。因此,有相同hash值的所有output record会出被放到相应的输出文件中。
>
> The second phase of a MapReduce job executes M instances of the reduce
> program, Rj, 1 ≤ j ≤ M. The input for each reduce instance Rj consists of
> the files Fi,j, 1 ≤ i ≤ N. Again notice that all output records from the map
> phase with the same hash value will be consumed by the same reduce instance
> -- no matter which map instance produced them. After being collected by the
> map-reduce framework, the input records to a reduce instance are grouped on
> their keys (by sorting or hashing) and fed to the reduce program. Like the
> map program, the reduce program is an arbitrary computation in a
> general-purpose language. Hence, it can do anything it wants with its
> records. For example, it might compute some additional function over other
> data fields in the record. Each reduce instance can write records to an
> output file, which forms part of the "answer" to a MapReduce computation.
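A matching sketch of the reduce phase, again as a toy in-memory model: the nested list map_outputs[i][j] stands in for file F(i,j), grouping is done by sorting, and the summing reduce function is just an example. None of these names come from the article.

```python
from itertools import groupby
from operator import itemgetter


def run_reduce_instance(j, map_outputs, reduce_fn):
    """Reduce instance R_j: collect F(i,j) from every map instance i, group, reduce."""
    records = [rec for instance_files in map_outputs for rec in instance_files[j]]
    records.sort(key=itemgetter(0))  # group on key by sorting
    return [
        reduce_fn(key, [data for _, data in group])
        for key, group in groupby(records, key=itemgetter(0))
    ]


def sum_reduce(key, values):
    """Example reduce function: add up all data values for one key."""
    return key, sum(values)


# N = 2 map instances, M = 2 buckets; keys were already split consistently.
map_outputs = [
    [[("a", 1), ("a", 1)], [("b", 1)]],
    [[("a", 1)], [("b", 1), ("b", 1)]],
]
answer = [run_reduce_instance(j, map_outputs, sum_reduce) for j in range(2)]
print(answer)
```

Each inner list of answer corresponds to the output file of one reduce instance; together they form the result of the computation.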
>
> MapReduce的第二个阶段执行M个reduce程序的实例, Rj, 1 <= j <= M. 每一个reduce实例的输入是Rj,包含文件Fi,j,
> 1<= i <= N. 注意,每一个来自map阶段的output record,含有相同的hash值的record将会被相同的reduce实例处理 --
> 不论是哪一个map实例产生的数据。在map-reduce架构处理过后,input
> records将会被以他们的keys来分组(以排序或者哈希的方式),到一个reduce实例然后给reduce程序处理。和map程序一
> 样,reduce程序是任意计算言表示的。因此,它可以对它的records做任何想做事情。例如,可以添加一些额外的函数,来计算record的其他 data
> field。每一个reduce实例可以将records写到输出文件中,组成MapReduce计算的"answer"的一部分。
>
> To draw an analogy to SQL, map is like the group-by clause of an aggregate
> query. Reduce is analogous to the aggregate function (e.g., average) that is
> computed over all the rows with the same group-by attribute.
>
> 和SQL可以做对比的是,map程序和聚集查询中的 group-by
> 语句相似。Reduce函数和聚集函数(例如,average,求平均)相似,在所有的有相同group-by的属性的列上计算。
>
> We now turn to the five concerns we have with this computing paradigm.
>
> 现在来谈一谈我们对这种计算方式的5点看法。
>
> MapReduce is a step backwards in database access
>
> As a data processing paradigm, MapReduce represents a giant step backwards.
> The database community has learned the following three lessons from the 40
> years that have unfolded since IBM first released IMS in 1968.
>
> Schemas are good.
> Separation of the schema from the application is good.
> High-level access languages are good.
>
> Schemas是有益的。
> 将schema和程序分开处理是有益的。
> High-level存取语言是有益的。
>
> MapReduce has learned none of these lessons and represents a throwback to
> the 1960s, before modern DBMSs were invented.
>
> MapReduce没有学到任何一条,并且倒退回了60年代,倒退回了现代数据库管理系统发明以前的时代。
>
> The DBMS community learned the importance of schemas, whereby the fields and
> their data types are recorded in storage. More importantly, the run-time
> system of the DBMS can ensure that input records obey this schema. This is
> the best way to keep an application from adding "garbage" to a data set.
> MapReduce has no such functionality, and there are no controls to keep
> garbage out of its data sets. A corrupted MapReduce dataset can actually
> silently break all the MapReduce applications that use that dataset.
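To make the schema point concrete, here is a minimal sketch using SQLite via Python's sqlite3 module as a convenient stand-in for a DBMS. The table borrows the Rankings name used elsewhere in this thread; the constraints and sample rows are assumptions for illustration. SQLite is loosely typed by default, so the type rule is spelled out with a CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Rankings (
        pageURL  TEXT    NOT NULL,
        pageRank INTEGER NOT NULL CHECK (typeof(pageRank) = 'integer')
    )
    """
)

# A well-formed record is accepted.
conn.execute("INSERT INTO Rankings VALUES (?, ?)", ("http://example.com/", 7))

# Garbage records are rejected by the run-time system, not by application code.
rejected = []
for bad in [("http://example.com/x", "not-a-number"), (None, 3)]:
    try:
        conn.execute("INSERT INTO Rankings VALUES (?, ?)", bad)
    except sqlite3.IntegrityError:
        rejected.append(bad)

count = conn.execute("SELECT COUNT(*) FROM Rankings").fetchone()[0]
print(count, len(rejected))
```

A plain file read by a map function has no comparable gatekeeper; each program has to validate records itself.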
>
> DBMS社区懂得schemas的重要性,凭借fields和他们的数据类型记录在储存中。更重要的,运行状态的DBMS系统可以确定输
> 入的记录都遵循这个schema。这是最佳的保护程序不会添加任何垃圾信息到数据集中。MapReduce没有任何这样的功能,没有任何控制数据集的预防
> 垃圾数据机制。一个损坏的MapReduce数据集事实上可以无声无息的破坏所有使用这个数据集的MapReduce程序。
>
> It is also crucial to separate the schema from the application program. If a
> programmer wants to write a new application against a data set, he or she
> must discover the record structure. In modern DBMSs, the schema is stored in
> a collection of system catalogs and can be queried (in SQL) by any user to
> uncover such structure. In contrast, when the schema does not exist or is
> buried in an application program, the programmer must discover the structure
> by an examination of the code. Not only is this a very tedious exercise, but
> also the programmer must find the source code for the application. This
> latter tedium is forced onto every MapReduce programmer, since there are no
> system catalogs recording the structure of records -- if any such structure
> exists.
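As an illustration of discovering record structure from the system catalog rather than from application source, here is a sketch with SQLite through Python's sqlite3 module. SQLite's catalog (the sqlite_master table and PRAGMA table_info) is used as an example; other DBMSs expose their catalogs differently, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserVisits ("
    "sourceIPAddr TEXT, destinationURL TEXT, date TEXT, adRevenue REAL)"
)

# Which tables exist? Ask the catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# What does a record look like? Each row is (cid, name, type, notnull, default, pk).
columns = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(UserVisits)")
]
print(tables)
print(columns)
```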
>
> 将schema和程序分开也非常重要。如果一个程序员想要对一个数据集写一个新程序,他必须知道数据集的结构(record
> structure)。现代DBMS系统中,shcema储存在系统目录中,并且可以被任意用户查询(使用SQL)它的结构。相反的,如果schema不
> 存在或者存在于程序中,程序员必须检查程序的代码来获得数据的结构。这不仅是一个单调枯燥的尝试,而且程序员必须能够找到先前程序的source
> code。每一个MapReduce程序员都必须承受后者的乏味,因为没有系统目录用来储存records的结构 -- 就算这些结构存在。
>
> During the 1970s the DBMS community engaged in a "great debate" between the
> relational advocates and the Codasyl advocates. One of the key issues was
> whether a DBMS access program should be written:
>
> By stating what you want - rather than presenting an algorithm for how to
> get it (relational view)
> By presenting an algorithm for data access (Codasyl view)
>
> 70年代DBMS社区,在关系型数据库支持者和Codasys型数据库支持者之间发有一次"大讨论"。一个重点议题就是是否DBMS存取程序应该写 入:
>
> 直接开始你想要的 -- 而不是展示一个算法,解释如何工作的。 (关系型数据库的观点)
> 展示数据存取的算法。(Codasyl 的观点)
>
> The result is now ancient history, but the entire world saw the value of
> high-level languages and relational systems prevailed. Programs in
> high-level languages are easier to write, easier to modify, and easier for a
> new person to understand. Codasyl was rightly criticized for being "the
> assembly language of DBMS access." A MapReduce programmer is analogous to a
> Codasyl programmer -- he or she is writing in a low-level language
> performing low-level record manipulation. Nobody advocates returning to
> assembly language; similarly nobody should be forced to program in
> MapReduce.
>
> MapReduce advocates might counter this argument by claiming that the
> datasets they are targeting have no schema. We dismiss this assertion. In
> extracting a key from the input data set, the map function is relying on the
> existence of at least one data field in each input record. The same holds
> for a reduce function that computes some value from the records it receives
> to process.
>
> Writing MapReduce applications on top of Google's BigTable (or Hadoop's
> HBase) does not really change the situation significantly. By using a
> self-describing tuple format (row key, column name, {values}) different
> tuples within the same table can actually have different schemas. In
> addition, BigTable and HBase do not provide logical independence, for
> example with a view mechanism. Views significantly simplify keeping
> applications running when the logical schema changes.
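A small sketch of how a view keeps an application running across a logical schema change, using SQLite via Python's sqlite3 module. All table, view and column names here are hypothetical, and the "application" is just one query string that never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES ('ann', 10000)")
conn.execute("CREATE VIEW employee_v AS SELECT name, salary FROM employee")

# The application only ever issues this query against the view.
app_query = "SELECT name, salary FROM employee_v ORDER BY name"
before = conn.execute(app_query).fetchall()

# Logical schema change: salary moves into its own table; the view is redefined.
conn.executescript(
    """
    DROP VIEW employee_v;
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pay (person_id INTEGER, salary INTEGER);
    INSERT INTO person (name) SELECT name FROM employee;
    INSERT INTO pay
        SELECT p.id, e.salary FROM person p JOIN employee e ON e.name = p.name;
    DROP TABLE employee;
    CREATE VIEW employee_v AS
        SELECT p.name AS name, y.salary AS salary
        FROM person p JOIN pay y ON y.person_id = p.id;
    """
)

after = conn.execute(app_query).fetchall()
print(before, after)
```

The query text is untouched; only the view definition had to follow the schema change.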
>
> MapReduce is a poor implementation
>
> 2. MapReduce是一个糟糕的实现
>
> All modern DBMSs use hash or B-tree indexes to accelerate access to data. If
> one is looking for a subset of the records (e.g., those employees with a
> salary of 10,000 or those in the shoe department), then one can often use an
> index to advantage to cut down the scope of the search by one to two orders
> of magnitude. In addition, there is a query optimizer to decide whether to
> use an index or perform a brute-force sequential search.
>
> 所有现代DBMS都使用散列或者B树索引加速数据存取。如果要寻找记录的某个子集(比如薪水为10000的雇员或者鞋部的雇员),经常可以使用索引
> 有效地将搜索范围缩小一到两个数量级。而且,还有查询优化器来确定是使用索引还是执行蛮力顺序搜索。
>
> MapReduce has no indexes and therefore has only brute force as a processing
> option. It will be creamed whenever an index is the better access mechanism.
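The optimizer's choice between an index and a sequential scan can be observed directly. The sketch below uses SQLite through Python's sqlite3 module and its EXPLAIN QUERY PLAN statement; the employee table and its contents are invented, and the exact wording of the plan text differs between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(f"e{i}", "shoe" if i % 50 == 0 else "other", 10000 + i) for i in range(1000)],
)

query = "SELECT name FROM employee WHERE salary = 10000"


def plan(sql):
    """Return the optimizer's plan description (last column of each plan row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(query)   # expected to be a full table scan
conn.execute("CREATE INDEX employee_salary ON employee (salary)")
plan_with_index = plan(query)      # expected to mention the new index

print(plan_without_index)
print(plan_with_index)
```

The query itself is identical in both runs; only the available access paths changed.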
>
> MapReduce没有索引,因此处理时只有蛮力一种选择。在索引是更好的存取机制时,MapReduce将劣势尽显。
>
> One could argue that the value of MapReduce is automatically providing parallel
> execution on a grid of computers. This feature was explored by the DBMS
> research community in the 1980s, and multiple prototypes were built
> including Gamma [2,3], Bubba [4], and Grace [5]. Commercialization of these
> ideas occurred in the late 1980s with systems such as Teradata.
>
> 有人可能会说,MapReduce的价值在于在计算机网格上自动地提供并行执行。这种特性数据库研究界在上世纪80年代就已经探讨过
> 了,而且构建了许多原型,包括 Gamma [2,3], Bubba [4], 和 Grace
> [5]。而Teradata这样的系统早在80年代晚期,就将这些想法商业化了。
>
> In summary to this first point, there have been high-performance,
> commercial, grid-oriented SQL engines (with schemas and indexing) for the
> past 20 years. MapReduce does not fare well when compared with such systems.
>
> There are also some lower-level implementation issues with MapReduce,
> specifically skew and data interchange.
>
> One factor that MapReduce advocates seem to have overlooked is the issue of
> skew. As described in "Parallel Database Systems: The Future of High
> Performance Database Systems," [6] skew is a huge impediment to achieving
> successful scale-up in parallel query systems. The problem occurs in the map
> phase when there is wide variance in the distribution of records with the
> same key. This variance, in turn, causes some reduce instances to take much
> longer to run than others, resulting in the execution time for the
> computation being the running time of the slowest reduce instance. The
> parallel database community has studied this problem extensively and has
> developed solutions that the MapReduce community might want to adopt.
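The effect of skew can be illustrated with a short simulation. This is a sketch under simple assumptions: reducer work is proportional to the number of records received, keys are assigned by a CRC32 hash, and the key sets are synthetic.

```python
import zlib
from collections import Counter


def reducer_loads(keys, num_reducers):
    """Count how many records each reduce instance receives."""
    loads = Counter()
    for key in keys:
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return [loads[j] for j in range(num_reducers)]


def skew_ratio(loads):
    """Load of the busiest reducer relative to a perfectly even split."""
    return max(loads) / (sum(loads) / len(loads))


uniform_keys = [f"key{i}" for i in range(10_000)]
skewed_keys = ["hot"] * 9_000 + [f"key{i}" for i in range(1_000)]

uniform_ratio = skew_ratio(reducer_loads(uniform_keys, 10))
skewed_ratio = skew_ratio(reducer_loads(skewed_keys, 10))
print(uniform_ratio, skewed_ratio)
```

With one hot key holding 90% of the records, a single reduce instance does at least nine times the average work, and the job finishes only when that instance does.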
>
> There is a second serious performance problem that gets glossed over by the
> MapReduce proponents. Recall that each of the N map instances produces M
> output files -- each destined for a different reduce instance. These files
> are written to a disk local to the computer used to run the map instance. If
> N is 1,000 and M is 500, the map phase produces 500,000 local files. When
> the reduce phase starts, each of the 500 reduce instances needs to read its
> 1,000 input files and must use a protocol like FTP to "pull" each of its
> input files from the nodes on which the map instances were run. With 100s of
> reduce instances running simultaneously, it is inevitable that two or more
> reduce instances will attempt to read their input files from the same map
> node simultaneously -- inducing large numbers of disk seeks and slowing the
> effective disk transfer rate by more than a factor of 20. This is why
> parallel database systems do not materialize their split files and use push
> (to sockets) instead of pull. Since much of the excellent fault-tolerance
> that MapReduce obtains depends on materializing its split files, it is not
> clear whether the MapReduce framework could be successfully modified to use
> the push paradigm instead.
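The file counts in this paragraph follow from simple arithmetic, written out here as a small helper. The function and field names are made up for the illustration.

```python
def intermediate_files(n_map: int, m_reduce: int) -> dict:
    """Counts of split files for N map instances and M reduce instances."""
    return {
        "total_files": n_map * m_reduce,          # one file per (map, reduce) pair
        "files_pulled_per_reducer": n_map,        # each reducer reads one per map
        "files_served_per_map_node": m_reduce,    # each map node serves one per reducer
    }


stats = intermediate_files(1000, 500)
print(stats)
```

Every map node therefore serves 500 different files, which is why concurrent pulls from several reduce instances turn into many disk seeks on that node.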
>
> Given the experimental evaluations to date, we have serious doubts about how
> well MapReduce applications can scale. Moreover, the MapReduce implementers
> would do well to study the last 25 years of parallel DBMS research
> literature.
>
> MapReduce is not novel
>
> The MapReduce community seems to feel that they have discovered an entirely
> new paradigm for processing large data sets. In actuality, the techniques
> employed by MapReduce are more than 20 years old. The idea of partitioning a
> large data set into smaller partitions was first proposed in "Application of
> Hash to Data Base Machine and Its Architecture" [11] as the basis for a new
> type of join algorithm. In "Multiprocessor Hash-Based Join Algorithms," [7],
> Gerber demonstrated how Kitsuregawa's techniques could be extended to
> execute joins in parallel on a shared-nothing [8] cluster using a
> combination of partitioned tables, partitioned execution, and hash based
> splitting. DeWitt [2] showed how these techniques could be adopted to
> execute aggregates with and without group by clauses in parallel. DeWitt and
> Gray [6] described parallel database systems and how they process queries.
> Shatdal and Naughton [9] explored alternative strategies for executing
> aggregates in parallel.
>
> Teradata has been selling a commercial DBMS utilizing all of these
> techniques for more than 20 years; exactly the techniques that the MapReduce
> crowd claims to have invented.
>
> While MapReduce advocates will undoubtedly assert that being able to write
> MapReduce functions is what differentiates their software from a parallel
> SQL implementation, we would remind them that POSTGRES supported
> user-defined functions and user-defined aggregates in the mid 1980s.
> Essentially, all modern database systems have provided such functionality
> for quite a while, starting with the Illustra engine around 1995.
>
> MapReduce is missing features
>
> All of the following features are routinely provided by modern DBMSs, and
> all are missing from MapReduce:
>
> Bulk loader -- to transform input data in files into a desired format and
> load it into a DBMS
> Indexing -- as noted above
> Updates -- to change the data in the data base
> Transactions -- to support parallel update and recovery from failures during
> update
> Integrity constraints -- to help keep garbage out of the data base
> Referential integrity -- again, to help keep garbage out of the data base
> Views -- so the schema can change without having to rewrite the application
> program
>
> In summary, MapReduce provides only a sliver of the functionality found in
> modern DBMSs.
>
> MapReduce is incompatible with the DBMS tools
>
> A modern SQL DBMS has available all of the following classes of tools:
>
> Report writers (e.g., Crystal reports) to prepare reports for human
> visualization
> Business intelligence tools (e.g., Business Objects or Cognos) to enable
> ad-hoc querying of large data warehouses
> Data mining tools (e.g., Oracle Data Mining or IBM DB2 Intelligent Miner) to
> allow a user to discover structure in large data sets
> Replication tools (e.g., Golden Gate) to allow a user to replicate data from
> one DBMS to another
> Database design tools (e.g., Embarcadero) to assist the user in constructing
> a data base.
>
> MapReduce cannot use these tools and has none of its own. Until it becomes
> SQL-compatible or until someone writes all of these tools, MapReduce will
> remain very difficult to use in an end-to-end task.
>
> In Summary
>
> It is exciting to see a much larger community engaged in the design and
> implementation of scalable query processing techniques. We, however, assert
> that they should not overlook the lessons of more than 40 years of database
> technology -- in particular the many advantages that a data model, physical
> and logical data independence, and a declarative query language, such as
> SQL, bring to the design, implementation, and maintenance of application
> programs. Moreover, computer science communities tend to be insular and do
> not read the literature of other communities. We would encourage the wider
> community to examine the parallel DBMS literature of the last 25 years.
> Last, before MapReduce can measure up to modern DBMSs, there is a large
> collection of unmet features and required tools that must be added.
>
> 看到规模大得多的社区加入可伸缩的查询处理技术的设计与实现,非常令人兴奋。但是,我们要强调,他们不应该忽视数据库技术40多年来的教
> 训,尤其是数据库技术中数据模型、物理和逻辑数据独立性、像SQL这样的声明性查询语言等等,可以为应用程序的设计、实现和维护带来的诸多好处。而且,计
> 算机科学界往往喜欢自行其是,不理会其他学科的文献。我们希望更多人来一起研究过去25年的并行DBMS文献。MapReduce要达到能够与现代
> DBMS相提并论的水平,还需要开发大量特性和工具。
>
> We fully understand that database systems are not without their problems.
> The database community recognizes that database systems are too "hard" to
> use and is working to solve this problem. The database community can also
> learn something valuable from the excellent fault-tolerance that MapReduce
> provides its applications. Finally we note that some database researchers
> are beginning to explore using the MapReduce framework as the basis for
> building scalable database systems. The Pig[10] project at Yahoo! Research
> is one such effort.
>
> 我们完全理解数据库系统也有自己的问题。数据库界清楚地认识到,现在数据库系统还太"难"使用,而且正在解决这一问题。数据库界也从
> MapReduce为其应用程序提供的出色的容错上学到了有价值的东西。最后,我们注意到,一些数据库研究人员也开始研究使用MapReduce框架作为
> 构建可伸缩数据库系统的基础。雅虎研究院的Pig[10]项目就是其中之一。
>
> References
>
> [1] "MapReduce: Simplified Data Processing on Large Clusters," Jeff Dean and
> Sanjay Ghemawat, Proceedings of the 2004 OSDI Conference, 2004.
>
> [2] "The Gamma Database Machine Project," DeWitt, et al., IEEE Transactions
> on Knowledge and Data Engineering, Vol. 2, No. 1, March 1990.
>
> [4] "Gamma - A High Performance Dataflow Database Machine," DeWitt, D, R.
> Gerber, G. Graefe, M. Heytens, K. Kumar, and M. Muralikrishna, Proceedings
> of the 1986 VLDB Conference, 1986.
>
> [5] "Prototyping Bubba, A Highly Parallel Database System," Boral, et al.,
> IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, March
> 1990.
>
> [6] "Parallel Database Systems: The Future of High Performance Database
> Systems," David J. DeWitt and Jim Gray, CACM, Vol. 35, No. 6, June 1992.
>
> [7] "Multiprocessor Hash-Based Join Algorithms," David J. DeWitt and Robert
> H. Gerber, Proceedings of the 1985 VLDB Conference, 1985.
>
> [8] "The Case for Shared-Nothing," Michael Stonebraker, Data Engineering
> Bulletin, Vol. 9, No. 1, 1986.
>
> [9] "Adaptive Parallel Aggregation Algorithms," Ambuj Shatdal and Jeffrey F.
> Naughton, Proceedings of the 1995 SIGMOD Conference, 1995.
>
> [10] "Pig", Chris Olston, http://research.yahoo.com/project/90
>
> [11] "Application of Hash to Data Base Machine and Its Architecture," Masaru
> Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka, New Generation Comput. 1(1):
> 63-74 (1983)
>
> MapReduce II
>
> [Note: Although the system attributes this post to a single author, it was
> written by David J. DeWitt and Michael Stonebraker]
>
> 作者:David J.DeWitt 和 Michael Stonebraker
>
> Last week's MapReduce post attracted tens of thousands of readers and
> generated many comments, almost all of them attacking our critique. Just to
> let you know, we don't hold a personal grudge against MapReduce. MapReduce
> didn't kill our dog, steal our car, or try and date our daughters.
>
> Our motivations for writing about MapReduce stem from MapReduce being
> increasingly seen as the most advanced and/or only way to analyze massive
> datasets. Advocates promote the tool without seemingly paying attention to
> years of academic and commercial database research and real world use.
>
> The point of our initial post was to say that there are striking
> similarities between MapReduce and a fairly primitive parallel database
> system. As such, MapReduce can be significantly improved by learning from
> the parallel database community.
>
> So, hold off on your comments for just a few minutes, as we will spend the
> rest of this post addressing four specific topics brought up repeatedly by
> those who commented on our previous blog:
>
> 先暂停抱怨几分钟,我们将就上一篇文章的回复中反复出现的特定专题进行一些回答:
>
> . MapReduce is not a database system, so don't judge it as one
> . MapReduce has excellent scalability; the proof is Google's use
> . MapReduce is cheap and databases are expensive
> . We are the old guard trying to defend our turf/legacy from the young turks
>
> Feedback No. 1: MapReduce is not a database system, so don't judge it as one
>
> It's not that we don't understand this viewpoint. We are not claiming that
> MapReduce is a database system. What we are saying is that like a DBMS + SQL
> + analysis tools, MapReduce can be and is being used to analyze and perform
> computations on massive datasets. So we aren't judging apples and oranges.
> We are judging two approaches to analyzing massive amounts of information,
> even for less structured information.
>
> To illustrate our point, assume that you have two very large files of facts.
> The first file contains structured records of the form:
>
>    Rankings (pageURL, pageRank)
>
> Records in the second file have the form:
>
>    UserVisits (sourceIPAddr, destinationURL, date, adRevenue)
>
> Someone might ask, "What IP address generated the most ad revenue during the
> week of January 15th to the 22nd, and what was the average page rank of the
> pages visited?"
>
> This question is a little tricky to answer in MapReduce because it consumes
> two data sets rather than one, and it requires a "join" of the two datasets
> to find pairs of Rankings and UserVisits records that have matching values for
> pageURL and destinationURL. In fact, it appears to require three MapReduce
> phases, as noted below.
>
> Phase 1
>
> This phase filters out UserVisits records that are outside the desired date
> range and then "joins" the qualifying records with records from the Rankings
> file.
>
> Map program: The map program scans through UserVisits and Rankings records.
> Each UserVisits record is filtered on the date range specification.
> Qualifying records are emitted with composite keys of the form
> <destinationURL, T1>, where T1 is a tag indicating that it is a UserVisits
> record. Rankings records are emitted with composite keys of the form
> <pageURL, T2> (T2 is a tag indicating that it is a Rankings record). Output
> records are repartitioned using a user-supplied partitioning function that
> hashes only on the URL portion of the composite key.
>
> Reduce Program: The input to the reduce program is a single sorted run of
> records in URL order. For each unique URL, the program splits the incoming
> records into two sets (one for Rankings records and one for UserVisits
> records) using the tag component of the composite key. To complete the join,
> reduce finds all matching pairs of records of the two sets. Output records
> are in the form of Temp1 (sourceIPAddr, pageURL, pageRank, adRevenue).
>
> The reduce program must be capable of handling the case in which one or both
> of these sets with the same URL are too large to fit into memory and must be
> materialized on disk. Since access to these sets is through an iterator, a
> straightforward implementation will result in what is termed a nested-loops
> join. This join algorithm is known to have very bad I/O performance
> characteristics, as the "inner" set is scanned once for each record of the
> "outer" set.
>
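The Phase 1 tagged join described above can be sketched roughly as follows. This is only a single-process illustration, not the authors' program (they state they did not write one): the sample rows, the function names, and the in-memory grouping are all assumptions, and a real reduce worker would have to spill large sets to disk.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data; field order follows the schemas in the post.
RANKINGS = [("a.com", 10), ("b.com", 4)]          # (pageURL, pageRank)
USER_VISITS = [                                    # (sourceIPAddr, destinationURL, date, adRevenue)
    ("1.1.1.1", "a.com", date(2008, 1, 16), 5.0),
    ("1.1.1.1", "b.com", date(2008, 1, 18), 1.0),
    ("2.2.2.2", "a.com", date(2008, 1, 20), 2.0),
    ("2.2.2.2", "b.com", date(2008, 2, 1), 9.0),   # outside the date range
]

T1, T2 = "T1", "T2"  # tags: T1 marks a UserVisits record, T2 a Rankings record


def map_phase1(rankings, visits, start, end):
    """Emit ((url, tag), payload) pairs; UserVisits are filtered on the date range."""
    for ip, url, day, revenue in visits:
        if start <= day <= end:
            yield (url, T1), (ip, revenue)
    for url, rank in rankings:
        yield (url, T2), rank


def partition(key, num_reducers):
    """User-supplied partitioner: hash only the URL part of the composite key."""
    url, _tag = key
    return hash(url) % num_reducers


def reduce_phase1(records):
    """Split records per URL by tag, then pair the two sets; yields Temp1 rows."""
    by_url = defaultdict(lambda: {T1: [], T2: []})
    for (url, tag), payload in records:
        by_url[url][tag].append(payload)
    for url in sorted(by_url):
        for rank in by_url[url][T2]:
            for ip, revenue in by_url[url][T1]:
                yield (ip, url, rank, revenue)


def run_phase1(num_reducers=2):
    """Simulate map, repartition, and reduce in one process."""
    buckets = defaultdict(list)
    mapped = map_phase1(RANKINGS, USER_VISITS, date(2008, 1, 15), date(2008, 1, 22))
    for key, payload in mapped:
        buckets[partition(key, num_reducers)].append((key, payload))
    temp1 = []
    for bucket in buckets.values():
        temp1.extend(reduce_phase1(bucket))
    return sorted(temp1)
```

Because the partitioner ignores the tag, both record types for a given URL land on the same reduce worker, which is what makes the per-URL pairing possible.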
> Phase 2
>
> This phase computes the total ad revenue and average page rank for each
> Source IP Address.
>
> Map program: Scan Temp1 using the identity function, keyed on sourceIPAddr.
>
> Reduce program: The reduce program makes a linear pass over the data. For
> each sourceIPAddr, it sums the adRevenue and computes the average pageRank,
> retaining the sourceIPAddr with the maximum total ad revenue. Each reduce
> worker then outputs a single record of the form Temp2 (sourceIPAddr,
> total_adRevenue, average_pageRank).
>
> Phase 3
>
> Map program: The program uses a single map worker that scans Temp2 and
> outputs the record with the maximum value for total_adRevenue.
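Phases 2 and 3 might look something like the sketch below. It assumes Temp1 rows are tuples of (sourceIPAddr, pageURL, pageRank, adRevenue) and that all rows for one sourceIPAddr have already been routed to the same reduce worker; the function names are made up for illustration.

```python
from collections import defaultdict


def reduce_phase2(temp1_rows):
    """One reduce worker: aggregate per sourceIPAddr and keep the local maximum."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # ip -> [revenue sum, rank sum, row count]
    for ip, _url, rank, revenue in temp1_rows:
        acc = totals[ip]
        acc[0] += revenue
        acc[1] += rank
        acc[2] += 1
    if not totals:
        return None  # this worker received no rows
    best_ip = max(totals, key=lambda ip: totals[ip][0])
    revenue_sum, rank_sum, count = totals[best_ip]
    # A Temp2 record: (sourceIPAddr, total_adRevenue, average_pageRank)
    return (best_ip, revenue_sum, rank_sum / count)


def phase3(temp2_rows):
    """Single worker: pick the Temp2 record with the largest total_adRevenue."""
    return max(temp2_rows, key=lambda row: row[1])
```

Each worker emits at most one Temp2 record, so the final pass only has to compare one record per worker.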
>
> We realize that portions of the processing steps described above are handled
> automatically by the MapReduce infrastructure (e.g., sorting and
> partitioning the records). Although we have not written this program, we
> estimate that the custom parts of the code (i.e., the map() and reduce()
> functions) would require substantially more code than the two fairly simple
> SQL statements to do the same:
>
> Q1
>
>    Select as Temp  sourceIPAddr, avg(pageRank) as avgPR, sum(adRevenue) as adTotal
>    From Rankings, UserVisits
>    where Rankings.pageURL = UserVisits.destinationURL and
>    date > "Jan 14" and date < "Jan 23"
>    Group by sourceIPAddr
>
> Q2
>
>    Select sourceIPAddr, adTotal, avgPR
>    From Temp
>    Where adTotal = max (adTotal)
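For anyone who wants to try the two queries, here is one way to run them with Python's built-in sqlite3 module. The quoted SQL is shorthand, so this sketch makes some assumptions: the intermediate table is called IpTotals instead of Temp, dates are stored as ISO strings with 2008 assumed as the year, and Q2 uses a subquery for the maximum. The helper name top_ip is made up.

```python
import sqlite3


def top_ip(rankings, visits):
    """Run Q1 and Q2 against an in-memory SQLite database and return Q2's row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Rankings (pageURL TEXT, pageRank INTEGER)")
    con.execute(
        "CREATE TABLE UserVisits "
        "(sourceIPAddr TEXT, destinationURL TEXT, date TEXT, adRevenue REAL)"
    )
    con.executemany("INSERT INTO Rankings VALUES (?, ?)", rankings)
    con.executemany("INSERT INTO UserVisits VALUES (?, ?, ?, ?)", visits)

    # Q1: join, filter on the date range, aggregate per source IP.
    con.execute(
        """
        CREATE TABLE IpTotals AS
        SELECT sourceIPAddr, avg(pageRank) AS avgPR, sum(adRevenue) AS adTotal
        FROM Rankings, UserVisits
        WHERE Rankings.pageURL = UserVisits.destinationURL
          AND UserVisits.date > '2008-01-14' AND UserVisits.date < '2008-01-23'
        GROUP BY sourceIPAddr
        """
    )

    # Q2: the row whose total ad revenue is the maximum.
    row = con.execute(
        """
        SELECT sourceIPAddr, adTotal, avgPR
        FROM IpTotals
        WHERE adTotal = (SELECT max(adTotal) FROM IpTotals)
        """
    ).fetchone()
    con.close()
    return row
```

ISO date strings compare correctly as text, which is why the range filter works without a date type.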
>
> No matter what you think of SQL, eight lines of code is almost certainly
> easier to write and debug than the programming required for MapReduce. We
> believe that MapReduce advocates should consider the advantages that
> layering a high-level language like SQL could provide to users of MapReduce.
> Apparently we're not alone in this assessment, as efforts such as PigLatin
> and Sawzall appear to be promising steps in this direction.
>
> We also firmly believe that augmenting the input files with a schema would
> provide the basis for improving the overall performance of MapReduce
> applications by allowing B-trees to be created on the input data sets and
> techniques like hash partitioning to be applied. These are technologies in
> widespread practice in today's parallel DBMSs, of which there are quite a
> number on the market, including ones from IBM, Teradata, Netezza, Greenplum,
> Oracle, and Vertica. All of these should be able to execute this program
> with scalability and performance equal to or better than MapReduce's.
>
> Here's how these capabilities could benefit MapReduce:
>
> 1. Indexing. The filter (date > "Jan 14" and date < "Jan 23") condition can
> be executed by using a B-tree index on the date attribute of the UserVisits
> table, avoiding a sequential scan of the entire table.
>
> 2. Data movement. When you load files into a distributed file system prior
> to running MapReduce, data items are typically assigned to blocks/partitions
> in sequential order. As records are loaded into a table in a parallel
> database system, it is standard practice to apply a hash function to an
> attribute value to determine which node the record should be stored on (the
> same basic idea as is used to determine which reduce worker should get an
> output record from a map instance). For example, records being loaded into
> the Rankings and UserVisits tables might be mapped to a node by hashing on
> the pageURL and destinationURL attributes, respectively. If loaded this way,
> the join of Rankings and UserVisits in Q1 above would be performed
> completely locally with absolutely no data movement between nodes.
> Furthermore, as result records from the join are materialized, they will be
> pipelined directly into a local aggregate computation without being written
> first to disk. This local aggregate operator will partially compute the two
> aggregates (sum and average) concurrently (what is called a combiner in
> MapReduce terminology). These partial aggregates are then repartitioned by
> hashing on this sourceIPAddr to produce the final results for Q1.
>
> It is certainly the case that you could do the same thing in MapReduce by
> using hashing to map records to chunks of the file and then modifying the
> MapReduce program to exploit the knowledge of how the data was loaded. But
> in a database, physical data independence happens automatically. When Q1 is
> "compiled," the query optimizer will extract partitioning information about
> the two tables from the schema. It will then generate the correct query plan
> based on this partitioning information (e.g., maybe Rankings is hash
> partitioned on pageURL but UserVisits is hash partitioned on sourceIPAddr).
> This happens transparently to any user (modulo changes in response time) who
> submits a query involving a join of the two tables.
>
> 3. Column representation. Many questions access only a subset of the fields
> of the input files. The others do not need to be read by a column store.
>
> 4. Push, not pull. MapReduce relies on the materialization of the output
> files from the map phase on disk for fault tolerance. Parallel database
> systems push the intermediate files directly to the receiving (i.e., reduce)
> nodes, avoiding writing the intermediate results and then reading them back
> as they are pulled by the reduce computation. Materializing the map output
> gives MapReduce far superior fault tolerance, at the expense of additional
> I/Os.
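The data-movement point in item 2 above can be illustrated with a toy simulation. It assumes both tables are hash-partitioned on their URL attribute at load time, that pageURL is unique in Rankings, and it leaves out the date filter for brevity. The "nodes" are plain Python dicts and every function name is hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(url, num_nodes):
    """Deterministic hash partitioning on a URL attribute."""
    return zlib.crc32(url.encode("utf-8")) % num_nodes


def load(rankings, visits, num_nodes):
    """Place each record on the node chosen by hashing its URL."""
    nodes = [{"Rankings": [], "UserVisits": []} for _ in range(num_nodes)]
    for url, rank in rankings:
        nodes[node_for(url, num_nodes)]["Rankings"].append((url, rank))
    for ip, url, day, revenue in visits:
        nodes[node_for(url, num_nodes)]["UserVisits"].append((ip, url, day, revenue))
    return nodes


def local_join_and_partial_aggregate(node):
    """Join on one node and feed the results straight into partial aggregates."""
    rank_of = dict(node["Rankings"])  # assumes pageURL is unique
    partial = defaultdict(lambda: [0.0, 0, 0])  # ip -> [revenue sum, rank sum, count]
    for ip, url, _day, revenue in node["UserVisits"]:
        if url in rank_of:
            acc = partial[ip]
            acc[0] += revenue
            acc[1] += rank_of[url]
            acc[2] += 1
    return dict(partial)


def final_aggregate(partials):
    """Merge partial aggregates (stands in for the repartition on sourceIPAddr)."""
    merged = defaultdict(lambda: [0.0, 0, 0])
    for partial in partials:
        for ip, (revenue, rank_sum, count) in partial.items():
            acc = merged[ip]
            acc[0] += revenue
            acc[1] += rank_sum
            acc[2] += count
    return {ip: (rev, rank_sum / count) for ip, (rev, rank_sum, count) in merged.items()}


def q1(rankings, visits, num_nodes):
    nodes = load(rankings, visits, num_nodes)
    return final_aggregate(local_join_and_partial_aggregate(n) for n in nodes)
```

Since matching records always hash to the same node, the join never looks outside a node; only the small partial aggregates are moved afterwards.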
>
> In general, we expect these mechanisms to provide about a factor of 10 to
> 100 performance advantage, depending on the selectivity of the query, the
> width of the input records to the map computation, and the size of the
> output files from the map phase. As such, we believe that 10 to 100 parallel
> database nodes can do the work of 1,000 MapReduce nodes.
>
> To further illustrate our point, suppose you have a more general filter, F,
> a more general group_by function, G, and a more general Reduce function, R.
> PostgreSQL (an open source, free DBMS) allows the following SQL query over a
> table T:
>
>    Select R (T)
>    From T
>    Group_by G (T)
>    Where F (T)
>
> F, R, and G can be written in a general-purpose language like C or C++. A
> SQL engine, extended with user-defined functions and aggregates, has nearly
> all, if not all, of the generality of MapReduce.
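The same shape of query can be tried out with SQLite through Python's sqlite3 module, which also supports user-defined functions and aggregates. Note the substitutions: this is SQLite rather than PostgreSQL, the functions are written in Python rather than C or C++, and the particular F, G, and R below (a length filter, first-letter grouping, and a "longest word" aggregate) are invented purely for illustration.

```python
import sqlite3


def f_keep(word):
    """F: a filter; keep words of at least three characters."""
    return int(len(word) >= 3)


def g_first_letter(word):
    """G: a grouping function; group words by their first letter."""
    return word[0]


class Longest:
    """R: an aggregate; the longest word seen in each group."""

    def __init__(self):
        self.best = ""

    def step(self, word):
        if len(word) > len(self.best):
            self.best = word

    def finalize(self):
        return self.best


def run_query(words):
    con = sqlite3.connect(":memory:")
    con.create_function("F", 1, f_keep)
    con.create_function("G", 1, g_first_letter)
    con.create_aggregate("R", 1, Longest)
    con.execute("CREATE TABLE T (word TEXT)")
    con.executemany("INSERT INTO T VALUES (?)", [(w,) for w in words])
    rows = con.execute(
        "SELECT G(word), R(word) FROM T WHERE F(word) GROUP BY G(word) ORDER BY G(word)"
    ).fetchall()
    con.close()
    return rows
```

Here F plays the role of the map-side filter, G decides which group (reduce key) a row belongs to, and R is the per-group reduction.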
>
> As such, we claim that most things that are possible in MapReduce are also
> possible in a SQL engine. Hence, it is exactly appropriate to compare the
> two approaches. We are working on a more complete paper that demonstrates
> the relative performance and relative programming effort between the two
> approaches, so, stay tuned.
>
> Feedback No. 2: MapReduce has excellent scalability; the proof is Google's
> use
>
> Many readers took offense at our comment about scaling and asserted that
> since Google runs MapReduce programs on 1,000s (perhaps 10s of 1,000s) of
> nodes it must scale well. Having started benchmarking database systems 25
> years ago (yes, in 1983), we believe in a more scientific approach toward
> evaluating the scalability of any system for data intensive applications.
>
> Consider the following scenario. Assume that you have a 1 TB data set that
> has been partitioned across 100 nodes of a cluster (each node will have
> about 10 GB of data). Further assume that some MapReduce computation runs in
> 5 minutes if 100 nodes are used for both the map and reduce phases. Now
> scale the dataset to 10 TB, partition it over 1,000 nodes, and run the same
> MapReduce computation using those 1,000 nodes. If the performance of
> MapReduce scales linearly, it will execute the same computation on 10x the
> amount of data using 10x more hardware in the same 5 minutes. Linear scaleup
> is the gold standard for measuring the scalability of data intensive
> applications. As far as we are aware there are no published papers that
> study the scalability of MapReduce in a controlled scientific fashion.
> MapReduce may indeed scale linearly, but we have not seen published evidence
> of this.
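The arithmetic of the scenario above is simple enough to write down. In this sketch, scaleup is taken to be the elapsed time of the small job on the small cluster divided by the elapsed time of the 10x job on the 10x cluster, so 1.0 means linear scaleup; the function names are made up and decimal units (1 TB = 1,000 GB) are assumed.

```python
def data_per_node_gb(dataset_gb, nodes):
    """Amount of data each node holds when the dataset is spread evenly."""
    return dataset_gb / nodes


def scaleup(small_elapsed_s, large_elapsed_s):
    """Ratio of elapsed times; 1.0 is linear scaleup, below 1.0 is sublinear."""
    return small_elapsed_s / large_elapsed_s
```

For the numbers in the post, data_per_node_gb(1000, 100) and data_per_node_gb(10000, 1000) both come to 10 GB per node, and a run that still takes 5 minutes (300 seconds) on the larger configuration gives a scaleup of 1.0.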
>
> Feedback No. 3: MapReduce is cheap and databases are expensive
>
> Every organization has a "build" versus "buy" decision, and we don't
> question the decision by Google to roll its own data analysis solution. We
> also don't intend to defend DBMS pricing by the commercial vendors. What we
> wanted to point out is that we believe it is possible to build a version of
> MapReduce with more functionality and better performance. Pig is an
> excellent step in this direction.
>
> Also, we want to mention that there are several open source (i.e., free)
> DBMSs, including PostgreSQL, MySQL, Ingres, and BerkeleyDB. Several of the
> aforementioned parallel DBMS companies have increased the scale of these
> open source systems by adding parallel computing extensions.
>
> A number of individuals also commented that SQL and the relational data
> model are too restrictive. Indeed, the relational data model might very well
> be the wrong data model for the types of datasets that MapReduce
> applications are targeting. However, there is considerable ground between
> the relational data model and no data model at all. The point we were trying
> to make is that developers writing business applications have benefited
> significantly from the notion of organizing data in the database according
> to a data model and accessing that data through a declarative query
> language. We don't care what that language or model is. Pig, for example,
> employs a nested relational model, which gives developers flexibility that
> a traditional 1NF model doesn't allow.
>
> Feedback No. 4: We are the old guard trying to defend our turf/legacy from
> the young turks
>
> Since both of us are among the "gray beards" and have been on this earth
> about 2 Giga-seconds, we have seen a lot of ideas come and go. We are
> constantly struck by the following two observations:
>
> How insular computer science is. The propagation of ideas from
> sub-discipline to sub-discipline is very slow and sketchy. Most of us are
> content to do our own thing, rather than learn what other sub-disciplines
> have to offer.
>
> How little knowledge is passed from generation to generation. In a recent
> paper entitled "What goes around comes around," (M. Stonebraker/J.
> Hellerstein, Readings in Database Systems 4th edition, MIT Press, 2004) one
> of us noted that many current database ideas were tried a quarter of a
> century ago and discarded. However, such hard-won experience does not seem
> to be passed down from the "gray beards" to the "young turks." The turks and
> gray beards usually aren't, and shouldn't be, adversaries.
>
> Thanks for stopping by the "pasture" and reading this post. We look forward
> to reading your feedback, comments and alternative viewpoints.
>
> --
> 夏清然
> Xia Qingran
> qingran at zeuux.org



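[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-universe]

Wednesday, July 9, 2008, 15:05

Jiahua Huang jhuangjiahua at gmail.com
Wed Jul 9 15:05:36 CST 2008

Please stop pasting walls of unreadable text.
This academic is just envious of MapReduce.
Plenty of people have already pointed out that MapReduce and an RDBMS are simply not the same kind of thing;
this article pits two things against each other that cannot be compared.

If you really want a comparison, it should be an RDBMS versus Amazon's SimpleDB.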

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-universe]

Wednesday, July 9, 2008, 15:18

Xia Qingran qingran at zeuux.org
Wed Jul 9 15:18:30 CST 2008

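Jiahua Huang wrote:
> Please stop pasting walls of unreadable text.
> This academic is just envious of MapReduce.
> Plenty of people have already pointed out that MapReduce and an RDBMS are simply not the same kind of thing;
> this article pits two things against each other that cannot be compared.
>
> If you really want a comparison, it should be an RDBMS versus Amazon's SimpleDB.
>
Today's RDBMS clusters already provide storage plus a certain amount of parallel
computing capability, which is why this old-timer is not quite convinced by MapReduce.

This clash of ideas between MapReduce and DBMSs should give us more insight rather than set the two against each other.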


-- 
夏清然
Xia Qingran
qingran at zeuux.org


[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-universe]

Wednesday, July 9, 2008, 15:37

chifeng chifeng at gmail.com
Wed Jul 9 15:37:26 CST 2008

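Solving real problems is what matters most; who cares which technology it is...



2008/7/9 Xia Qingran <qingran at zeuux.org>:

> Jiahua Huang wrote:
>
>> Please stop pasting walls of unreadable text.
>> This academic is just envious of MapReduce.
>> Plenty of people have already pointed out that MapReduce and an RDBMS are simply not the same kind of thing;
>> this article pits two things against each other that cannot be compared.
>>
>> If you really want a comparison, it should be an RDBMS versus Amazon's SimpleDB.
>>
>>
> Today's RDBMS clusters already provide storage plus a certain amount of parallel computing capability, which is why this old-timer is not quite convinced by MapReduce.
>
> This clash of ideas between MapReduce and DBMSs should give us more insight rather than set the two against each other.
>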



-- 
regards.
chifeng

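[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-universe]

Wednesday, July 9, 2008, 15:45

Jiahua Huang jhuangjiahua at gmail.com
Wed Jul 9 15:45:14 CST 2008

On 7/9/08, chifeng <chifeng at gmail.com> wrote:
> Solving real problems is what matters most; who cares which technology it is...
>

The Berkeley academics don't see it that way...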

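[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-universe]

Thursday, July 10, 2008, 18:09

Xia Qingran qingran at zeuux.org
Thu Jul 10 18:09:27 CST 2008

Jiahua Huang wrote:
>
> The Berkeley academics don't see it that way...
>
Take a look at people's comments on this article; they are quite interesting: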

http://www.reddit.com/r/programming/info/65tyi/comments/
http://news.ycombinator.com/item?id=100175


-- 
夏清然
Xia Qingran
qingran at zeuux.org


[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-universe]
