Mathematics of Knowledge and Search Engines
September 10 - December 14, 2007
Organizing Committee |
(New York University)
(New York University)
(Los Alamos National Laboratory)
Back to Top
There will be an active program of research activities, seminars and
workshops throughout the period and core participants will be in residence
at IPAM continuously for these fourteen weeks. The program will open with
tutorials, and will be punctuated by four major workshops and a culminating
workshop at UCLA's Lake Arrowhead Conference Center. Several distinguished
senior researchers will be in residence for the entire period. Between the
workshops there will be a program of activities involving the long-term and
short-term participants, as well as visitors.
Back to Top
The rise of the search engine as a major tool for searches on the internet has spawned a large and growing industry that has changed modern commerce, education, and the study of scientific, financial, and social data bases. The underpinnings of these search engines are mathematical algorithms which are well adapted to large and rapid computations, mainly from linear algebra. While the impact of this industry has been enormous, there is a parallel development in the applications of these methods to other related problems concerning the extraction of knowledge from large databases. This long program at IPAM will be devoted to new mathematics and methodologies of knowledge engines: the mathematical procedures used to extract knowledge from large databases. While this includes topics related to search engines it is mainly devoted to the more general problem of finding features in a database or using defined features to search within a database. It is expected that this program will be of interest to a large number of scientific fields, including pure and applied mathematics, statistics, bioinformatics, and engineering. We also anticipate the study of applications to finance, the social sciences, and the humanities. Some of the topics to be investigated are listed below:
One set of problems consists of searching a data set that is presented as either a graph (perhaps with weighted edges) or a set of points in Euclidean space, perhaps of very high dimension. Many of the presently used algorithms for search engines use some variant of spectral graph theory, where one uses eigenfunctions of a Laplace operator (or some related object) to start searching. An exciting development is the related idea of Diffusion Geometry, where one has a given set in Euclidean space. While some of these ideas appear in a variety of contexts of data analysis, such as spectral graph theory, manifold learning, nonlinear principal components and kernel methods, one can augment these approaches by showing that the diffusion inference distances are key intrinsic geometric quantities linking spectral theory of the Markov process (Laplace operator or Kernels) to the corresponding “geometry of information”. It has recently been shown that one explanation for the robustness of such algorithms is that on domains in Euclidean space of dimension d with volume bounded by one (or manifolds with the correct assumption on curvature bounds), there are at any point in the domain d eigenfunctions for Laplacian (Dirichlet or Neumann) that not only provide local coordinates on B(x, e distance(x, boundary)) but map that ball to (approximately) a ball of radius one. In other words the eigenfunctions play a role similar to the classical Riemann Mapping Function of complex analysis. Another area is to use various PDE’s to define an initial diffusion, where the PDE could vary depending on location. Another area of interest here is the preprocessing stage, which requires recent advances (e.g. the work of Assaf Naor) on embeddings of metric spaces into Euclidean space.
The problems of numerical linear algebra that arise include computations of eigenfunctions and eigenvalues (or some analogue when one is not dealing with a Laplace operator), and squaring of matrices. An overriding theme is how to build sparse approximating matrices while respecting the appropriate features. This means that all problems are much more severe than simply approximating in the operator norm. One methodology, presently under intensive development, is to adapt multiscale techniques related to the work of Beylkin, Coifman, Rokhlin. It turns out that for many of these numerical issues a goal is to find the appropriate scales defined by a matrix, often in a coordinate system given by a nonlinear change of coordinates.
A third important area is to develop methodologies for steerable searches (“Dynamic Navigation”). In a certain sense this can be seen as the unifying goal of the entire area. This is of outsized importance in the world of commerce, where one has already seen a profusion of specialized search engines, but the next step is to allow searches which are more flexible in the sense that the goal can be changed on the fly.
An evolving area of search is developing effective similarity measures for different types of data, particularly data that is unstructured or which has a non-traditional type of structure. Some examples are images, music, video, patient databases.
Back to Top
This long-term program will involve a community of senior and junior researchers. The intent is for long-term participants to have an opportunity to learn about the mathematics of knowledge and search engines from the perspective of multiple fields--notably mathematics, and computer science--and to meet a diverse group of people and have an opportunity to form new collaborations. In addition to these activities, there will be opening tutorials, four workshops, and a culminating workshop at Lake Arrowhead.
Full and partial support for long-term participants is available, and those interested are encouraged to fill out an online application at the bottom of this page. Support for individual workshops will also be available, and may be applied for through the online application for each workshop. We are especially interested in applicants who are interested in becoming core participants and participating in the entire program (September 10 - December 14, 2007), but give consideration to applications for shorter periods. Funding for participants is available at all academic levels, though recent PhD's, graduate students, and researchers in the early stages of their career are especially encouraged to apply. .
Encouraging the careers of women and minority mathematicians and
scientists is an important component of IPAM's mission and we welcome their
Back to Top
Institute for Pure and Applied Mathematics (IPAM)
460 Portola Plaza
Los Angeles CA 90095-7121
Phone: 310 825-4755
Fax: 310 825-4756
Back to Top