add directory study

gohigh
2024-02-19 00:25:23 -05:00
parent b1306b38b1
commit f3774e2f8c
4001 changed files with 2285787 additions and 0 deletions


@@ -0,0 +1,108 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Linux Cross-Reference</TITLE>
</HEAD>
<BODY BGCOLOR=WHITE>
<H1 ALIGN=CENTER>
Cross-Referencing Linux<BR>
<A HREF="http:source/">
<I>Browse the code</I></A></H1>
<HR><H2>Motivation</H2>
The Linux Cross-Reference project is the testbed application of a
general hypertext cross-referencing tool. (Or the other way around.)
<P>
The main goal of the project is to create a versatile
cross-referencing tool for relatively large code repositories. The
project is based on stock web technology, so the codeview client may
be chosen from the full range of available web browsers. On the
server side, any Unix-based web server with cgi-script capability
should do nicely.
<P>
The main feature of the indexer is of course the ability to jump
easily to the declaration of any global identifier. Indeed, even all
<I>references</I> to global identifiers are indexed. Quick access to
function declarations, data (type) definitions and preprocessor macros
makes code browsing just that tad more convenient. An at-a-glance
overview of, e.g., which code areas will be affected by changing a
function or type definition should also come in useful during
development and debugging.
<P>
Other bits of hypertextual sugar, such as e-mail and include file
links, are provided as well, but are, on the whole, well, sugar. Some
minimal visual markup is also done. (Style sheets are being considered
as a way to do this in the future.)
<HR><H2>Technicalities</H2>
The index generator is written in <A HREF="http://www.perl.org">Perl</A>
and relies heavily on Perl's regular expression facilities. The
algorithm used is very brute force and extremely sloppy. The
rationale behind the sloppiness is that too little information renders
the database useless, while too much information simply means the
users have to think and navigate at the same time.
<P>
The Linux source code, with which the project has initially been
linked, presents the indexer with some very tough obstacles.
Specifically, the heavy use of preprocessor macros makes the parsing a
virtual nightmare. We want to index the information in the
preprocessor directives as well as the actual C code, so we have to
parse both at once, which leads to no end of trouble. (Strict parsing
is right out.) Still, we're pretty satisfied with what the indexer
manages to get out of it.
<P>
There's also the question of actually broken code. We want to
reasonably index all code portions, even if some of them are not
entirely syntactically valid. This is another reason for the sloppiness.
<P>
There are obviously disadvantages to this approach. No scope checking
is done, and the most annoying effect of this is mistaking local
identifiers for references to global ones with the same name. This
particular problem (and others) can only be solved by doing (almost)
full parsing. The feasibility of combining this with the fuzzy way
indexing is currently done is being looked into.
<P>
An identifier is a macro, typedef, struct, enum, union, function,
function prototype or variable. For the Linux source code, between
50000 and 60000 identifiers are collected. The individual files of the
source code are formatted on the fly and presented with clickable
identifiers.
<P>
It is possible to search among the identifiers and the entire
kernel source text. The freetext search is implemented using <A
HREF="http://glimpse.cs.arizona.edu">Glimpse</A>, so all the
capabilities of Glimpse are available. The regular expression
search capabilities are especially useful.
<HR><H2>Availability</H2>
The code for the indexer is released under the
<A HREF="http://www.gnu.org">GNU</A>
<A HREF="http://www.gnu.org/copyleft/copyleft.html">Copyleft</A>
license. Go to the <A HREF="http://lxr.linux.no">LXR main site</A> to
get the latest version.
<HR><H2>Contacting the authors</H2>
We would very much like to receive feedback on this project. If you
find it useful or have suggestions on how to make improvements, feel
free to send us e-mail. We hope that this will be a useful tool, both
for experienced developers and beginners wanting to explore the Linux
source code.
<HR>
<ADDRESS>
<A HREF="mailto:lxr@linux.no">
Arne Georg Gleditsch and Per Kristian Gjermshus</A>
</ADDRESS>
</BODY>
</HTML>
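
The "brute force and extremely sloppy" indexing described in the blurb above can be illustrated with a short sketch. This is not the LXR indexer (which is written in Perl); the patterns, the function name `index_source` and the sample file `demo.c` are all hypothetical and chosen only to show the idea of regex-based indexing without scope checking.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately loose patterns. They are NOT the patterns used by
# the real LXR indexer; they only illustrate regex-based ("sloppy") indexing.
DECLARATION_PATTERNS = [
    ("macro", re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)")),
    ("struct", re.compile(r"\bstruct\s+([A-Za-z_]\w*)\s*\{")),
    # An unindented line with a name followed by "(" and no ";" is taken to be
    # the start of a function definition.
    ("function", re.compile(r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;]*$")),
]
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def index_source(filename, text):
    """Return (declarations, references) for one source file."""
    declarations = {}
    references = defaultdict(list)
    lines = text.splitlines()

    # Pass 1: collect anything that looks like a global declaration.
    for lineno, line in enumerate(lines, 1):
        for kind, pattern in DECLARATION_PATTERNS:
            match = pattern.search(line)
            if match:
                declarations.setdefault(match.group(1), (kind, filename, lineno))

    # Pass 2: every occurrence of a known name counts as a reference.
    # No scope checking is done, so local names can collide with global ones.
    for lineno, line in enumerate(lines, 1):
        for match in IDENTIFIER.finditer(line):
            if match.group(0) in declarations:
                references[match.group(0)].append((filename, lineno))
    return declarations, references


SAMPLE = """\
#define MAX_ITEMS 16
struct item {
    int id;
};
static int count_items(struct item *list)
{
    int item = MAX_ITEMS;   /* local variable */
    return item;
}
"""

declarations, references = index_source("demo.c", SAMPLE)
print(declarations)
print(dict(references))
```

In this sample the local variable `item` on lines 7 and 8 is recorded as a reference to `struct item`, which is exactly the scoping mistake the Technicalities section describes.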


@@ -0,0 +1,37 @@
# Configuration file.
# Define typed variable "v", read valueset from file.
variable: v, Version, [/local/lxr/source/versions], [/local/lxr/source/defversion]
# Define typed variable "a". First value is default.
variable: a, Architecture, (i386, alpha, m68k, mips, ppc, sparc, sparc64)
# Define the base url for the LXR files.
baseurl: http://lxr/
# These are the templates for the HTML heading, footer and directory
# listing, respectively.
htmlhead: /local/lxr/http/template-head
htmltail: /local/lxr/http/template-tail
htmldir: /local/lxr/http/template-dir
# The source is here.
sourceroot: /local/lxr/source/$v/linux/
srcrootname: Linux
# "#include <foo.h>" is mapped to this directory (in the LXR source
# tree)
incprefix: /include
# The database files go here.
dbdir: /local/lxr/source/$v/
# Glimpse can be found here.
glimpsebin: /local/bin/glimpse
# The power of regexps. This is pretty Linux-specific, but quite
# useful. Tinker with it and see what it does. (How's that for
# documentation?)
map: /include/asm[^\/]*/ /include/asm-$a/
map: /arch/[^\/]+/ /arch/$a/
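
To see what the two `map:` rules above might do, here is a minimal sketch. It assumes that `$v` and `$a` are replaced by the currently selected variable values and that each rule is applied as a regular expression substitution on the path; how LXR itself applies the rules is not shown in this commit, so the helper names `expand_variables` and `map_path` and the version string `2.0.36` are hypothetical.

```python
import re

# The two map rules from the configuration above, as (pattern, replacement).
MAP_RULES = [
    (r"/include/asm[^\/]*/", "/include/asm-$a/"),
    (r"/arch/[^\/]+/", "/arch/$a/"),
]


def expand_variables(template, variables):
    """Replace $name with its value; unknown variables are left untouched."""
    return re.sub(
        r"\$(\w+)",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )


def map_path(path, variables, rules=MAP_RULES):
    """Apply every map rule in order (an assumption about rule semantics)."""
    for pattern, replacement in rules:
        target = expand_variables(replacement, variables)
        # A function replacement avoids backslash processing in the target.
        path = re.sub(pattern, lambda _m, t=target: t, path)
    return path


print(map_path("/include/asm/io.h", {"a": "alpha"}))
print(map_path("/arch/i386/kernel/entry.S", {"a": "mips"}))
print(expand_variables("/local/lxr/source/$v/linux/", {"v": "2.0.36"}))
```

Under these assumptions, a link to `/include/asm/io.h` followed while the Architecture variable is `alpha` would lead to `/include/asm-alpha/io.h`, and any `/arch/<name>/` path is redirected to the selected architecture.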


@@ -0,0 +1,139 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Linux Cross-Reference</TITLE>
</HEAD>
<BODY BGCOLOR=WHITE>
<H1 ALIGN=CENTER>
Help doing searches<BR>
<A HREF="http:/source/">
<I>Browse the code</I></A></H1>
<I> This text is directly stolen from the Glimpse manual page. I have
tried to remove things that do not apply to the lxr searcher, but
beware, some things might have slipped through. I'll try to put
together something better when I get the time. For more information on
glimpse go to the <A HREF="http://glimpse.cs.arizona.edu">Glimpse
homepage</A>.</I>
<A NAME="Patterns"></A><H2>Patterns</H2>
<P>
glimpse supports a large variety of patterns, including simple
strings, strings with classes of characters, sets of strings,
wild cards, and regular expressions (see <A HREF="#Limitations">Limitations</A>).
<P> <H3>Strings</H3>
Strings are any sequence of characters, including the special symbols
`^' for beginning of line and `$' for end of line. The following
special characters (`$', `^', `*', `[', `|', `(', `)', `!', and
`\') as well as the following meta characters special to glimpse (and
agrep): `;', `,', `#', `&gt;', `&lt;', `-', and `.', should be preceded by
`\' if they are to be matched as regular characters. For example,
\^abc\\ corresponds to the string ^abc\, whereas ^abc corresponds
to the string abc at the beginning of a line.
<P> <H3>Classes of characters</H3>
A list of characters inside [] (in order) corresponds to any character
from the list. For example, [a-ho-z] is any character between a and h
or between o and z. The symbol `^' inside [] complements the list.
For example, [^i-n] denotes any character in the character set except
the characters 'i' to 'n'.
The symbol `^' thus has two meanings, but this is consistent with
egrep.
The symbol `.' (don't care) stands for any symbol (except for the
newline symbol).
<P> <H3>Boolean operations</H3>
Glimpse
supports an `AND' operation denoted by the symbol `;',
an `OR' operation denoted by the symbol `,',
a limited version of a `NOT' operation (starting at version 4.0B1)
denoted by the symbol `~',
or any combination.
For example, 'pizza;cheeseburger' will output all lines containing
both patterns.
'{political,computer};science' will match 'political science'
or 'science of computers'.
<P><H3>Wild cards</H3>
The symbol '#' is used to denote a sequence
of any number (including 0)
of arbitrary characters (see <A HREF="#Limitations">Limitations</A>).
The symbol # is equivalent to .* in egrep.
In fact, .* will work too, because it is a valid regular expression
(see below), but unless this is part of an actual regular expression,
# will work faster.
(Currently glimpse is experiencing some problems with #.)
<P><H3>Combination of exact and approximate matching</H3>
Any pattern inside angle brackets &lt;&gt; must match the text exactly even
if the match is with errors. For example, &lt;mathemat&gt;ics matches
mathematical with one error (replacing the last s with an a), but
mathe&lt;matics&gt; does not match mathematical no matter how many errors are
allowed. (This option is buggy at the moment.)
<H3>Regular expressions</H3>
Since the index is word based, a regular expression must match words
that appear in the index for glimpse to find it. Glimpse first strips
all non-alphabetic characters from the regular expression, and
searches the index for all remaining words. It then applies the
regular expression matching algorithm to the files found in the index.
For example, glimpse 'abc.*xyz' will search the index for all files
that contain both 'abc' and 'xyz', and then search directly for
'abc.*xyz' in those files. (If you use glimpse -w 'abc.*xyz', then
'abcxyz' will not be found, because glimpse will think that abc and
xyz need to be matched as whole words.) The syntax of regular
expressions in glimpse is in general the same as that for agrep. The
union operation `|', Kleene closure `*', and parentheses () are all
supported. Currently '+' is not supported. Regular expressions are
currently limited to approximately 30 characters (generally excluding
meta characters). The maximal number of errors
for regular expressions that use '*' or '|' is 4.
<P>
<A NAME="Limitations"></A><H2>Limitations</H2>
The index of glimpse is word based. A pattern that contains more than
one word cannot be found in the index. The way glimpse overcomes this
weakness is by splitting any multi-word pattern into its set of words
and looking for all of them in the index.
For example, <I>'linear programming'</I> will first consult the index
to find all files containing both <I>linear</I> and <I>programming</I>,
and then apply agrep to find the combined pattern.
This is usually an effective solution, but it can be slow for
cases where both words are very common, but their combination is not.
<P>
As was mentioned in the section on <A HREF="#Patterns">Patterns</A> above, some characters
serve as meta characters for glimpse and need to be
preceded by '\' to search for them. The most common
examples are the characters '.' (which stands for a wild card),
and '*' (the Kleene closure).
So, "glimpse ab.de" will match abcde, but "glimpse ab\.de"
will not, and "glimpse ab*de" will not match ab*de, but
"glimpse ab\*de" will.
The meta character - is translated automatically to a hyphen
unless it appears between [] (in which case it denotes a range of
characters).
<P>
There is no size limit for simple patterns and simple patterns
within Boolean expressions.
More complicated patterns, such as regular expressions,
are currently limited to approximately 30 characters.
Lines are limited to 1024 characters.
<P>
<HR>
<ADDRESS>
<A HREF="mailto:lxr@linux.no">
Arne Georg Gleditsch and Per Kristian Gjermshus</A>
</ADDRESS>
</BODY>
</HTML>
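
The two-phase regular expression search described in the help text above (look up the words of the pattern in a word-based index, then run the full pattern only on the candidate files) can be sketched as follows. This is not Glimpse's implementation: it uses Python's `re` syntax rather than agrep's, keeps the index in memory, and the names `build_word_index`, `candidate_files` and `search` are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z]+")


def build_word_index(files):
    """Map each alphabetic word to the set of file names that contain it."""
    index = {}
    for name, text in files.items():
        for word in set(WORD.findall(text)):
            index.setdefault(word, set()).add(name)
    return index


def candidate_files(pattern, files, index):
    """Phase 1: strip non-alphabetic characters, look the words up in the index."""
    candidates = set(files)
    for word in WORD.findall(pattern):
        # Substring match against indexed words, so 'abc' also finds 'abcxyz'.
        # Requiring word == indexed would mimic the whole-word (-w) behaviour.
        matching = set()
        for indexed, names in index.items():
            if word in indexed:
                matching |= names
        candidates &= matching
    return candidates


def search(pattern, files, index):
    """Phase 2: apply the full regular expression to the candidate files only."""
    regex = re.compile(pattern)
    hits = []
    for name in sorted(candidate_files(pattern, files, index)):
        for lineno, line in enumerate(files[name].splitlines(), 1):
            if regex.search(line):
                hits.append((name, lineno, line))
    return hits


FILES = {
    "a.c": "int abc_then_xyz;\n",
    "b.c": "int abc;\n",
    "c.c": "int xyz;\nint abc;\n",
}
INDEX = build_word_index(FILES)
print(sorted(candidate_files("abc.*xyz", FILES, INDEX)))
print(search("abc.*xyz", FILES, INDEX))
```

Here `c.c` survives the index lookup because it contains both words, but it is rejected in the second phase because no single line matches the full pattern. This also shows why a pattern made of two very common words can be slow: the index narrows the search very little.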


@@ -0,0 +1,15 @@
<table border=0 cellspacing=4>
<tr valign=middle>
<td>
<td nowrap><b>Name</b>
<td nowrap><b>Size</b>
<td nowrap><b>Last modified (GMT)</b>
<td nowrap><b>Description</b>
$files{
<tr valign=middle>
<td nowrap>$iconlink
<td nowrap>$namelink
<td nowrap align=right>$filesize{$bytes bytes}
<td nowrap>$modtime
<td>$description{<i>$desctext</i>}}
</table>
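
The directory template above uses two constructs: `$name` for a plain value and `$name{...}` for a block that is expanded once per item. A minimal sketch of such an expander is given below. It assumes that block values are lists of dictionaries and that braces in the template are balanced; the real LXR template engine is not part of this excerpt and may work differently, and the function name `expand` and the sample data are hypothetical.

```python
import re

TOKEN = re.compile(r"\$(\w+)")


def expand(template, scope):
    """Expand $name and $name{body} using the values in scope."""
    out = []
    i = 0
    while i < len(template):
        match = TOKEN.match(template, i)
        if not match:
            out.append(template[i])
            i += 1
            continue
        name, i = match.group(1), match.end()
        value = scope.get(name, "")
        if i < len(template) and template[i] == "{":
            # Find the matching closing brace (braces are assumed balanced).
            depth, j = 1, i + 1
            while j < len(template) and depth:
                depth += {"{": 1, "}": -1}.get(template[j], 0)
                j += 1
            body = template[i + 1 : j - 1]
            i = j
            # Repeat the body once per item; an empty list yields nothing.
            for item in value or []:
                out.append(expand(body, {**scope, **item}))
        else:
            out.append(str(value))
    return "".join(out)


ROW_TEMPLATE = (
    "$files{<tr><td>$namelink<td>$filesize{$bytes bytes}"
    "<td>$description{<i>$desctext</i>}\n}"
)
LISTING = {
    "files": [
        {"namelink": "Makefile", "filesize": [{"bytes": 1024}], "description": []},
        {
            "namelink": "kernel/",
            "filesize": [],
            "description": [{"desctext": "core kernel code"}],
        },
    ]
}
print(expand(ROW_TEMPLATE, LISTING))
```

Modelling `$filesize{...}` and `$description{...}` as lists with zero or one item lets the same mechanism cover both repetition and optional output, for example a directory row with no size.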


@@ -0,0 +1,27 @@
<!doctype html public "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>$title</title>
<base href="$baseurl">
</head>
<body bgcolor=white>
<div align=center>
$modes{ ~ <b>[</b>&nbsp;$modelink&nbsp;<b>]</b>} ~
</div>
<h1 align=center>
<a href="http://www.linux.org/">
Linux</a>
<a href="http:blurb.html">
Cross Reference</a><br>
$banner
</h1>
<div align=center>
$variables{
<b>$varname:</b>
$varlinks{ ~ <b>[</b>&nbsp;$varvalue&nbsp;<b>]</b>} ~
<br>}
</div>
<hr>


@@ -0,0 +1,11 @@
<hr>
<div align=center>
$modes{ ~ <b>[</b>&nbsp;$modelink&nbsp;<b>]</b>} ~
</div>
<hr>
This page was automatically generated by the
<a href="http:blurb.html">LXR engine</a>.
<br>
Visit the <a href="http://lxr.linux.no/">LXR main site</a> for more
information.

Binary file not shown.
