Release notes for htcheck-1.1 - 18 Feb 2002
- HTTP code now handles language negotiation through the
'accept-language' attribute of the configuration file
- More robust cookie support: the domain attribute is now managed
- Cookies are now stored in the database (Cookies table)
- builds under GCC3
- fixed a bug regarding the BASE tag handling
- fixed some other minor bugs
- PHP Interface:
  - German language file added (thanks to Michael Stenitzer <stenitzer@eva.ac.at>)
  - some Web structure mining indexes have been added
  - display of the content language of a URL as given by the server
  - simple cookies report on the database home page
  - some cosmetic changes
  - code now has only the 'php' extension and works without the ASP tags setting
Release notes for htcheck-1.1.0b9-klunk - 25 Jun 2001
- Database structure improved and compressed: less storage
space and faster queries.
- Indexes of the Link table are now created at the end of the crawl,
improving performance, and are controlled by the 'url_index_length' parameter
- 'url_index_length' configuration attribute has been added: this
attribute allows the user to control the length of the index
for the Url field in the Schedule and Url tables. This attribute
may affect the performance of the crawls, since the length
of an index can either slow down or speed up the spidering process.
- Cookies summary (with -s option)
- GNU-style long options: --version and --help are supported (with getopt_long)
- libtool 1.4 support
- fixed many bugs regarding the parser of the spider, which is now more robust
- cleaned code inside the 'core' source files
- PHP Interface:
  - Automatic and manual selection of ht://Check databases
  - Query support for JavaScript URLs
  - Description of the connection problem when a URL is not retrieved
  - Fixed minor bugs and made cosmetic changes
Release notes for htcheck-1.1.0b8-muttley - 27 Apr 2001
- Finally runs on Solaris
- MySQL 3.23.xx users: now datetime fields are stored properly
- Links to e-mail addresses are now stored and can be viewed
- Links with a 'file:/' call are now considered errors
- User Agent now shows the version and the platform
- Fixed a bug regarding the HTML parser with (very) malformed tags
- Fixed many minor bugs
- PHP Interface:
  - Enhancements: retrieve e-mail links
  - Fixed some bugs
Release notes for htcheck-1.1.0b7-anaconda - 28 Mar 2001
- Fixed library versioning
- Man page now provided (thanks to Marco Nenciarini <mnencia@prato.linux.it>)
- Static linking now works fine
- New library architecture that avoids conflicts with ht://Dig: the libraries
are now all 'package' libs instead of global libs.
- 'optimize_db' has now been set to false by default
- PHP Interface:
  - PHP3 compatibility provided
  - removed the .inc extension for PHP source files
Release notes for htcheck-1.1.0b6-zizou - 12 Mar 2001
- HTTP Cookies support now enabled
- New type of link result: 'Not authorized'
- Fixed a configuration error for the load_mysql_defaults function,
reported by FreeBSD users.
- disable_cookies attribute added in the configuration
- Updated the HtDateTime class to match ht://Dig's version
- PHP interface:
  - better output
  - added images for link results
  - bug in qryurls.php and listlinks.php has been fixed
  - css file added for content visualization
  - dynamic language detection (English or Italian for now)
  - small bug fixes
Release notes for htcheck-1.1.0b5-flukekelso - 24 Jan 2001
- Fixed a bug in the database initialization
- Default MySQL authentication (through /etc/my.cnf or ~/.my.cnf file)
- 'OBJECT' HTML tag now correctly parsed
- Basic HTTP Authentication enabled
- PHP interface improvements:
  - English and Italian languages available
  - Get info regarding URLs by choosing many parameters through a form
    (e.g. URL, status code values, content-type, size and title if present)
  - Other small enhancements
- Documentation started
- Fixed other minor bugs
Release notes for htcheck-1.1.0b4-utero - 07 Sep 2000
- ht://Check now uses MySQL's option file in order to get connection
information such as host, user, password, port and socket.
- HTTP Proxy support (to be tested more deeply)
- PHP interface improvements:
  - It is now possible to look for broken links and anchors not found by
    using the form in listlinks.php. Results can now be filtered by
    LinkResult as well as LinkType (and by the referencing and referenced
    URLs, as before).
- Fixed a bug regarding SGML entities with anchors; the "#top" anchor
is now considered valid.
- Most compilation warnings have now been cleaned from the sources.
Release notes for htcheck-1.1.0b3-utero - 22 Aug 2000
- Better summary of the broken links (more complete and reliable).
- HTML anchor checking is now performed and a field (LinkResult) has
been added. It records whether the link is ok, broken or redirected,
and warns when an anchor is present but not found.
- Summary of anchors not found, enabled or disabled through the
configuration attribute 'summary_anchor_not_found'.
- The table 'htCheck' has been added to the database: its purpose
is to store the general info of the crawl (user, start time, end
time, etc.).
- Added 'optimize_db' configuration parameter for optimizing the tables
of the database. Default is true.
- Added 'sql_big_table_option' configuration parameter for performing
huge queries. Default is true.
- Fixed the bug regarding HTTP persistent connections with a preemptive
HEAD call before the GET.
- HTTP redirections are now treated as special links and stored into
the link table with a 'Redirection' LinkResult flag.
- Referer management is now handled correctly.
- Hop count management and storage added.
- Added 'max_hop_count' configuration parameter for limiting the crawl
to a certain distance from the starting URL.
- PHP Interface:
  - The configure and make system has been modified in order to manage the
    php scripts. A new configuration option has been added (--with-php-dir=DIR)
    and the make install procedure now looks after the scripts too.
  - Page for querying the retrieved links, with a form for setting
    filters on both the source and the destination URLs (with
    like and not like SQL statements).
  - Page for dropping a database.
  - Italian language added (include/italian.inc - See the INSTALL file)
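
The effect of a hop limit such as 'max_hop_count' can be illustrated with a
minimal sketch. This is not ht://Check's code: the function name, the
in-memory link map and the choice to record (but not expand) URLs that sit
exactly at the limit are all assumptions made for illustration.

```python
from collections import deque


def crawl_with_hop_limit(start_url, links, max_hop_count):
    """Breadth-first walk that stores the hop count of every URL reached.

    `links` is a hypothetical map {url: [linked urls]} standing in for real
    HTTP retrieval. URLs whose hop count equals max_hop_count are recorded,
    but their outgoing links are not followed (an assumed semantics).
    """
    hop_count = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if hop_count[url] >= max_hop_count:
            continue  # too far from the starting URL: do not expand
        for target in links.get(url, []):
            if target not in hop_count:  # first visit gives the shortest distance
                hop_count[target] = hop_count[url] + 1
                queue.append(target)
    return hop_count
```

Because the walk is breadth-first, the first time a URL is seen is also its
shortest distance from the starting URL, so one stored value per URL is enough.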
Release notes for htcheck-1.1.0b2-utero - 08 Aug 2000
- A simple PHP interface has been added. You need PHP (either as a
standalone CGI interpreter or - if you have Apache - as an Apache
module) compiled with the mysql add-on module. For installation
instructions, see the INSTALL file.
- The 'Link' table contains another field, the 'Anchor': its purpose
is to store the 'token' after the '#' char in a link (for example in
<A href="URL#anchorname">, it contains 'anchorname').
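
As a rough illustration of what the 'Anchor' field holds, the sketch below
splits a link into its URL and the token after the '#'. It uses Python's
standard library rather than ht://Check's own parser, and the function name
split_anchor is hypothetical.

```python
from urllib.parse import urldefrag


def split_anchor(href):
    """Return (url_without_anchor, anchor) for a link.

    The anchor is the token after the '#' character, or an empty
    string when the link has no anchor part.
    """
    url, anchor = urldefrag(href)
    return url, anchor
```

For a link such as http://example.com/page.html#anchorname (a made-up URL),
the second element of the result is the value that would go in 'Anchor'.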
Release notes for htcheck-1.1.0b1-utero - 12 May 2000
A more stable version, but tested only on a RedHat 6.x system (see README file).
These new features have been added:
- Now it's possible to determine if a link is normal (like A href ones), that is
to say the user has to click in order to get it, or is direct (like IMG src)
that is to say it's automatically loaded (potentially) by the user's browser.
- Added a field to the Url table which contains the size to be added at load
time in order to obtain the total weight of the document: it contains the sum
Release notes for htcheck-1.1.0b-utero - 5 May 2000
This is the very first release. It can be used for checking broken links.
Here are the main features:
- Access to a MySQL database (in this form: user@localhost, where user
is the PID owner).
- HTTP/1.1 connections working, with the choice of persistent connections
- At the end, a summary of broken links, servers seen and content-types encountered is shown.
- Creation of these tables in the database: Url, Server, Link, Schedule,
HtmlStatement, HtmlAttribute.