March 28, 2008 Producing Web Pages of Value

Hayato Mizukami
Technical Director

A shift in the methods used to gather information on the Internet

Since its advent, the Internet has considerably changed the modes of communication and business. There has also been a significant change in the way Internet users choose which Web sites and pages to view.

Gathering information: a “gigantic chaos”

In recent years, owing to various factors (high-capacity, low-priced broadband services for mass communication, advancements in cell phone and PC technologies, and the introduction of search engines like Google), the number of steps required for Internet users to navigate to their target Web page has decreased compared to the early days of the Internet. At the same time, terminals and services have become available through which Internet users can perform more actions, such as submitting comments on, issuing trackbacks to, and bookmarking content on the Internet, as well as writing blog entries and uploading video files. As a result, the volume of information on the Internet has increased tremendously, and while the path to the targeted page and information has become shorter, the complexity of the process has increased. Most methods of gathering information rely on search engines, search functions within each site, social bookmarking services (SBS), and social networking services (SNS) to select, out of the huge number of Web pages, the pages and content that users consider beneficial. Users also try various combinations of search keywords and queries, and search repeatedly, to navigate to the pages in the bottom layers of a large Web site. This gigantic chaos has characterized the situation in recent years.

The “next thing” after search engines and SNS for gathering information

In the near future, as microformats and the Semantic Web proliferate, “meaning” will be attached to information and Web pages. As a result, Internet users will be able to obtain information that they seek without having to perform complex actions; instead, users' software agents will search, retrieve, organize, and provide information to the user.

Movement to escape the gigantic chaos

Will this “next thing” arrive in the immediate future? Presently, structured and organized data stored in databases are stripped of their structure when they are presented to Internet users viewing the information in Web browsers; alternatively, the structured data are hidden from users and are not accessible to them. Although small forward steps have been taken, the fruits of these endeavors cannot be obtained immediately; further, there are no “killer applications” available today.

While this column was being written, Yahoo! took a big step forward by announcing its support for the Semantic Web, with the aim of enhancing its “Open Search Platform.” An application programming interface (API) for the Open Search Platform is offered to Web developers, who can use it to access structured data and customize search results. “It will provide Internet users with more convenience,” Yahoo! said in a statement. However, because Web content in Japan is written in Japanese, it is important that we follow up on whether Yahoo!'s action will bring about a big change in Japan as well.

What makes a Web page valuable to Internet users?

mediajam is a company that has designed and developed, and now operates, a news aggregation Web service also called “mediajam.” The service aims to improve the process of matching Internet users seeking information with useful Web pages on various Web sites and the information they contain.

mediajam automatically crawls more than 80 Web sites, mainly news sites, in order to automatically build a news navigation site. Although the layout and markup of the crawled sites and the structure of their text vary, the following actions are performed automatically:

  • Absorb the differences in the various Web sites and collect (gather) information.
  • Display (generate) all of the content in the same style and layout.
  • Connect pages with similar content through related links, and connect each page to others through automatically attached keywords, thereby tying together various kinds of information.

mediajam invests considerable effort into loosely connecting data and pages that are valuable to Internet users.

The pages that carry value for users are the bottom-layer pages of a large Web site, or individual articles within a blog, that provide the desired information to Internet users in response to actions and demands such as “search” and “gather information,” and that satisfy the user's “I want to know.” Furthermore, “loosely connecting” means discarding the tree structure that has hitherto been the most widely used method for organizing information on the Web (a representative example would be a Web site's sitemap), categorizing information on the basis of keywords that users can utilize more naturally and easily, and creating hyperlinks between pages with similar content.
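As a rough illustration of “loosely connecting” pages by keyword rather than by a tree structure, the sketch below links any two pages whose keyword sets overlap sufficiently. It is not mediajam's actual algorithm: the Jaccard measure, the threshold value, the function names, and the sample pages are all assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of keywords two pages have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_links(pages, threshold=0.3):
    """Map each URL to the URLs of pages whose keywords overlap enough.

    `pages` maps a URL to its list of keywords. The threshold is an
    illustrative value, not one taken from mediajam.
    """
    links = {url: [] for url in pages}
    for (u1, k1), (u2, k2) in combinations(pages.items(), 2):
        if jaccard(k1, k2) >= threshold:
            links[u1].append(u2)
            links[u2].append(u1)
    return links


# Hypothetical pages and keywords, for illustration only.
pages = {
    "/news/1": ["broadband", "mobile", "search"],
    "/news/2": ["search", "semantic web", "mobile"],
    "/blog/7": ["cms", "ez publish"],
}
links = related_links(pages)
```

Here the two news pages share two of four distinct keywords and are linked to each other, while the blog page shares none and stays unlinked. No page needs a fixed position in a sitemap for this to work.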

What is valued on the Web?

The most important element of a Web page, for both Internet users and search engines, is the hyperlink. What makes a Web page different from a hard-copy publication (such as a book) is that a user of a Web page can navigate freely from one hyperlink to another. In the absence of hyperlinks, the means of connecting to individual Web pages is lost, or a dead end is reached even if the page can somehow be accessed. Hyperlinks disseminate information by connecting important pages with each other, which makes those pages more valuable. Furthermore, links built from keywords or information similar to what is found in the page being viewed help Internet users gather information more efficiently.

What does “mediajam mini” bring?

“mediajam mini” adopts the SaaS (ASP) model and inherits 80% of the functionality of mediajam. With mediajam mini, links can be created between a wide variety of content, not just news content, in order to construct a site. At our company, we hope that mediajam mini will be used by many people, in particular by people who plan and propose the layout of Web sites, professionals involved in site administration, site owners, and Internet users. I would like to explain in detail the merits of introducing mediajam mini.

(1) Cooperation between site content and pages becomes possible

Links are automatically created between the Web pages collected by mediajam mini on the basis of interrelated information. mediajam mini analyzes the pages and connects interrelated or similar pages in succession. Connections between Web pages that the information publisher had not noticed can thus be established through the normal process of updating a Web site. In other words, links are created between pages in the bottom layer of a Web site to assist users in gathering information. For example, “Videocast,” which is one of our company's services, is related to the following pages in a Web site:

Moreover, related intra-site links to CMS Solution (eZ Publish) include:

  • eZ Publish service development.

When Web pages that contain information related to Videocast and CMS content are added to our site, the related links are updated automatically.

(2) Operation beyond site and platform boundaries through the conversion of data format

Providing information as an RSS feed helps deliver it to subscribed users and increases the chances that it will be reused within and outside the Web site. These benefits can be obtained through RSS because it represents “Web site content summarized into a format that can be used by browsers and programs,” which makes it easier to automatically publish the latest information on related or collaborating sites.

Let me introduce, as an example, the mobile site of mediajam. The RSS feed of “main news” is read by an ordinary RSS reader that runs for the mobile site on a separate mediajam server. The reader attempts to read the RSS feed every 15 minutes, so content is synchronized automatically between the Web sites for PCs and cell phones.
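The synchronization described above can be sketched roughly as follows. This is a minimal illustration rather than mediajam's actual reader: the function names, the sample feed, and the example.com URLs are hypothetical, and the network fetch is left out so that the sketch runs offline. Only the 15-minute interval comes from the text.

```python
import xml.etree.ElementTree as ET

POLL_INTERVAL_SECONDS = 15 * 60  # the 15-minute interval mentioned above


def parse_rss_items(xml_text):
    """Return (link, title) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        title = item.findtext("title", default="").strip()
        if link:
            items.append((link, title))
    return items


def sync_new_items(xml_text, seen_links):
    """Return items not seen before and record their links in `seen_links`."""
    fresh = []
    for link, title in parse_rss_items(xml_text):
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((link, title))
    return fresh


# Hypothetical feed content standing in for the "main news" feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>main news</title>
  <item><title>First story</title><link>http://example.com/1</link></item>
  <item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

seen = set()
first_pass = sync_new_items(SAMPLE_FEED, seen)   # both items are new
second_pass = sync_new_items(SAMPLE_FEED, seen)  # nothing new on a repeat read
```

In a real deployment, a scheduler would download the feed and call `sync_new_items` once every `POLL_INTERVAL_SECONDS`. Tracking the links already seen means that a repeat read of an unchanged feed publishes nothing twice.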

(3) Specialized Web sites contribute to the efficient gathering of information

Unlike Google's search results, which comprise information gathered from an arbitrary number of public Web pages, a site constructed with mediajam mini consists of information obtained from specific pages, which users can search and browse on the site.

(4) Users can obtain required information at once

Keywords (or tags) are added to the pages aggregated by mediajam mini, and related links are added automatically; an individual Web page is generated for each keyword, making it possible to gather related information at once. Moreover, related or similar content is listed on each page, which makes the search process more efficient for users than one in which they have to close a page and perform another search. Further, listing related content on one page enables multifaceted information gathering, which is not possible through keyword-based searches alone.
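The idea of generating one page per keyword can be illustrated with a small inverted index that maps each keyword to the pages carrying it. This is a sketch under assumptions, not mediajam mini's implementation: the function name, the lowercasing of keywords, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_keyword_index(pages):
    """Map each keyword to the sorted list of URLs tagged with it.

    `pages` maps a URL to its list of keywords. Keywords are lowercased
    so that "Search" and "search" end up on the same keyword page
    (an assumption made for this sketch).
    """
    index = defaultdict(list)
    for url, keywords in pages.items():
        for keyword in {k.lower() for k in keywords}:
            index[keyword].append(url)
    return {keyword: sorted(urls) for keyword, urls in index.items()}


# Hypothetical aggregated pages and their automatically attached keywords.
aggregated_pages = {
    "/news/1": ["Search", "mobile"],
    "/news/2": ["search", "semantic web"],
    "/news/3": ["mobile"],
}
keyword_index = build_keyword_index(aggregated_pages)
```

Each entry in `keyword_index` corresponds to one generated keyword page. In this sample, the entry for `search` lists `/news/1` and `/news/2`, so a user arriving at either article can reach the other in one step.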

(5) Continuous gathering of information is possible

mediajam mini can generate an RSS feed for any keyword or search result and update the feed automatically. Update information matching a user's queries is sent continually, which makes continuous gathering of information possible.
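A per-keyword feed of this kind could be produced along the following lines. This is a minimal sketch assuming a simple in-memory list of articles. The function name, the article fields, and the example.com address are hypothetical, and the output contains only the basic RSS 2.0 channel and item elements.

```python
import xml.etree.ElementTree as ET


def keyword_feed(articles, keyword, site_url="http://example.com"):
    """Build an RSS 2.0 document listing the articles tagged with `keyword`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Articles tagged '{keyword}'"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = (
        f"Latest articles for the keyword {keyword}"
    )
    for article in articles:
        if keyword in article["keywords"]:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = article["title"]
            ET.SubElement(item, "link").text = site_url + article["path"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical articles, for illustration only.
ARTICLES = [
    {"title": "Search trends", "path": "/news/1", "keywords": ["search"]},
    {"title": "CMS update", "path": "/news/2", "keywords": ["cms"]},
    {"title": "Mobile search", "path": "/news/3", "keywords": ["search", "mobile"]},
]
feed_xml = keyword_feed(ARTICLES, "search")
```

Regenerating the feed whenever new articles are aggregated is what would let a subscriber's RSS reader pick up updates for that keyword without a new search.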

mediajam wants to lessen the inconvenience for Internet users. We will therefore keep working to improve the service by giving it a better structure and better functionality.

For more information on our services, timeframes, and estimates, as well as examples of our work, please feel free to get in touch.