
Performance Benchmark Test of
the Netscape FastTrack Beta 3 Web Server

Chuck Karish
Mindcraft, Inc.

Michael Blakeley
Netscape Communications

96/05/16 11:19:28

Abstract

This document describes the results of performance tests conducted on Netscape FastTrack Beta 3 running under Windows NT 3.51. Tests were run using the WebStone 1.1 test suite. We provide documentation of our methods to facilitate the replication of these results by any interested party.

Netscape FastTrack Beta 3 running under Windows NT 3.51 on a single-processor Hewlett-Packard NetServer 5/133 LS2 produced up to 263 connections per second for a 100% HTML load, up to 25 connections per second for a 100% CGI load, and up to 78 connections per second for a 100% NSAPI load.

Modifications to the NSAPI server code that are consistent with the published description of the intended load allowed FastTrack Beta 3 to produce up to 134 connections per second. We have included the source code used for these tests, so that interested parties can reproduce our results.

We also present results of the same tests run with both processors enabled. Performance increases with the second processor enabled were 30% for the all-HTML load, 60% for the all-CGI load, and 23% to 52% for the different versions of the all-NSAPI load.

Our best results for the HTML and CGI loads on a single-processor system exceed the performance of the Microsoft Internet Information Server (IIS) by 3% and 14%, respectively, by comparison with the data published in the Microsoft benchmarking report. Our results for the NSAPI load range from 82% to 141% of the best reported IIS performance. Since the Microsoft report does not reveal the code used to exercise the Microsoft proprietary API, it is not clear which of these numbers is directly comparable to their result.


Introduction

Netscape FastTrack is Netscape's latest HTTP server for Windows NT, the successor to Netscape Communications Server 1.13. This report is intended to demonstrate the performance capabilities of FastTrack in a way that is comparable to previous tests of Microsoft's IIS, reported in http://www.haynes.com/haynes1/infoserv/haynes1.htm ("the Microsoft benchmarking report").

Care must be taken in comparing results from different runs of the WebStone 1.1 test suite. The test suite is highly configurable, so the reader must examine the configuration to assess whether different tests are measuring the same thing. Another problem is that the tests for API performance are defined only through inclusion of sparsely-documented sample code for a single API.

We have adopted the approach of staying close to the default configuration settings for WebStone 1.1.

Since the Microsoft benchmarking report does not give the code used to exercise the Microsoft proprietary API, it is difficult to assess whether their published results are directly comparable to those obtained by running the NSAPI server code that is provided in the WebStone 1.1 test suite. We give four sets of results for NSAPI testing. One set was obtained using the unmodified nsapi-send.c module included with WebStone 1.1. For the other NSAPI runs we used modified versions of nsapi-send.c (Appendix 2). These versions had several system calls removed, so that a larger proportion of the workload was in API calls and transfers.

The different versions of the code represent different approaches that might reasonably be used to support a new API if the comment header in nsapi-send.c were taken to be a specification for the test. It is impossible to say which of these approaches most closely replicates the one used for the Microsoft benchmarking report.

Test Configuration

The test configuration was chosen to match as closely as possible the configuration used for the tests sponsored by Microsoft and documented in the Microsoft benchmarking report.

Test Software

The load generator for this test was SGI's WebStone 1.1 benchmark. WebStone source and information about WebStone are available from http://www.sgi.com/Products/WebFORCE/WebStone. Some bug fixes were made to the WebStone source code (Appendix 1). No significant changes were made to the load generation modules, except that the NSAPI load module was run in both unmodified and modified form (Appendix 2). We are confident that our results may be compared reliably to other benchmark runs that are done using the unmodified WebStone 1.1 source.

We ran each test from 16 client processes to 128, in steps of 16, with a test time of 15 minutes.
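
In WebStone terms, this schedule is set in the testbed configuration file. A sketch of the relevant entries follows; the parameter names are recalled from the WebStone 1.1 distribution and should be verified against the conf/testbed file shipped with the benchmark:

```
MINCLIENTS=16       # starting number of client processes
MAXCLIENTS=128      # final number of client processes
CLIENTINCR=16       # clients added per run
TIMEPERRUN=15       # minutes per data point
```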

Test Data

The data files used for the static HTML testing were the default "Silicon Surf" fileset, distributed as filelist.ss with the WebStone 1.1 benchmark.

This static HTML fileset is intended to be representative of real-world server loads. It was designed based on analysis of the access logs of SGI's external Web site, http://www.sgi.com. Netscape's own analysis of logs from other commercial sites indicates that Silicon Surf access patterns are fairly typical for the Web.

The Silicon Surf model targets the following characteristics:

  • 93% of accessed files are smaller than 30 KB.
  • Average accessed file is roughly 7 KB.

We also tested the performance of the FastTrack server running against the CGI and NSAPI workloads distributed with WebStone 1.1. The CGI results presented here were obtained using the unmodified WebStone 1.1 CGI code. Four sets of NSAPI performance data are presented: from the unmodified WebStone 1.1 NSAPI server module, and from each of three modified versions of the source from which system calls unrelated to API performance have been removed. The modified code is presented in Appendix 2.

Web Server Software

The HP NetServer hardware ran the Netscape FastTrack server, Beta 3. The server was started automatically at boot time.

Before each test series, corresponding to the data in one table in the Test Results section, the server log files were deleted and the system rebooted.

All test runs were done with the server console idle (no user logged in).

Web Server System

Hardware

For our tests, we chose to replicate the server used by Microsoft for their IIS testing. We modified the system's run-time configuration file (boot.ini) so that only one Pentium CPU and 32 MB of RAM would be used for our testing. We also present the results of running with both processors enabled.

Hewlett-Packard NetServer 5/133 LS2
Dual Intel Pentium CPUs at 133MHz
1MB L2 cache per CPU
64MB RAM
Digital EtherWorks 10/100 PCI adapter
2 - Adaptec AIC-7870 SCSI controllers
2 - Seagate ST32550WC 2GB drives

Operating System

We tested the HP NetServer with MS Windows NT 3.51 Server. We also installed Service Pack 3, as was done for the Microsoft tests.

Tuning

The system ran error logging and access logging. DNS reverse name lookups were disabled to keep DNS server performance from affecting the tests of Web server performance.

Other system configuration parameters used are listed in Appendix 3.

Web Client Systems

In order to cause the Netscape FastTrack server to use all available CPU cycles on the server system, we used eight SGI Indy workstations as WebStone client hosts. One of the client systems also served as the Webmaster, controlling the WebStone driver.

Each client was a 133 MHz R4600 Indy with 32 MB of RAM. The client machines ran IRIX 5.3 with SGI's recommended kernel and network patches, 670 and 676.

Network

The tests described in this paper were performed on a LAN that was quiescent except for the test traffic. All hosts were connected to a single Grand Junction FastSwitch 2800. The server was connected to a 100BaseTX interface, which was internally bridged by the FastSwitch 2800 to eight clients connected via switched 10BaseT interfaces.

The maximum average throughput observed in our testing was 18.46 Mbit/sec, well within the capacity of the switch, of the server's 100 Mbit/sec link, and of the 80 Mbit/sec aggregate of the eight client links. Network bandwidth was thus not a limiting factor in our testing.


Test Results

Each table below corresponds to a series of runs with the test parameters set as described in the table title. The different versions of the NSAPI code are described in Appendix 2.

Terms used in the data tables:

Clients
Number of processes simultaneously requesting Web services from the server.
Connections per second
Average number of client/server connections created and destroyed per second.
Errors per second
Error rate for this run.
Latency
Average client wait for data to be returned.
Throughput
Average gross data transfer rate, in megabits per second.
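
For concreteness, the tabulated rates reduce to simple ratios of raw totals over elapsed time. The sketch below is illustrative only; the totals are hypothetical and WebStone's internal accounting may differ in detail:

```c
/* Illustrative computation of the table metrics from raw run totals.
 * Totals used in the comments below are hypothetical, chosen to match
 * the first row of the single-processor HTML table. */
static double conns_per_sec(long connections, double seconds)
{
    return connections / seconds;           /* e.g. 236700 / 900 = 263 */
}

static double throughput_mbits(long long bytes, double seconds)
{
    return bytes * 8.0 / seconds / 1e6;     /* bytes -> bits -> megabits */
}
```

With a 15-minute (900-second) run, roughly 236,700 connections and 1.586 GB transferred would reproduce the 263 connections per second and 14.1 Mbit/sec shown in the first data row.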

Data

FastTrack Beta 3, One Processor, 100% HTML

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16        262       0.000         0.06                 14.1
     32        263       0.000         0.12                 14.1
     48        263       0.000         0.18                 14.0
     64        263       0.000         0.24                 14.1
     80        261       0.000         0.31                 13.9
     96        259       0.000         0.37                 13.9
    112        257       0.003         0.43                 14.1
    128        256       0.003         0.50                 13.7

FastTrack Beta 3, One Processor, 100% CGI

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16         25       0.000         0.63                  1.4
     32         25       0.000         1.30                  1.3
     48         25       0.000         1.95                  1.3
     64         24       0.000         2.61                  1.3
     80         24       0.000         3.30                  1.3
     96         24       0.000         3.92                  1.3
    112         24       0.000         4.64                  1.3
    128         24       0.000         5.24                  1.3

FastTrack Beta 3, One Processor, 100% NSAPI (Version C)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16        134       0.000         0.12                  7.0
     32        131       0.000         0.24                  6.9
     48        132       0.000         0.36                  7.0
     64        132       0.000         0.48                  6.9
     80        132       0.000         0.61                  6.9
     96        132       0.000         0.73                  7.0
    112        132       0.000         0.85                  6.9
    128        132       0.000         0.97                  6.8

FastTrack Beta 3, One Processor, 100% NSAPI (Version B)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16        106       0.002         0.15                  5.5
     32        104       0.000         0.31                  5.4
     48        104       0.000         0.46                  5.4
     64        105       0.000         0.61                  5.4
     80        105       0.000         0.76                  5.4
     96        105       0.000         0.92                  5.4
    112        104       0.000         1.08                  5.5
    128        104       0.000         1.23                  5.4

FastTrack Beta 3, One Processor, 100% NSAPI (Version A)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16         82       0.001         0.19                  4.3
     32         82       0.000         0.39                  4.3
     48         81       0.000         0.59                  4.3
     64         82       0.000         0.78                  4.3
     80         82       0.000         0.98                  4.3
     96         82       0.000         1.18                  4.3
    112         82       0.000         1.36                  4.2
    128         82       0.000         1.56                  4.3

FastTrack Beta 3, One Processor, 100% NSAPI (WebStone 1.1)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16         79       0.002         0.20                  4.1
     32         77       0.000         0.41                  4.1
     48         78       0.000         0.61                  4.1
     64         78       0.000         0.82                  4.1
     80         79       0.000         1.02                  4.1
     96         77       0.000         1.24                  4.1
    112         78       0.000         1.44                  4.1
    128         76       0.000         1.68                  4.0

FastTrack Beta 3, Two Processors, 100% HTML

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16        330       0.000         0.05                 17.7
     32        338       0.000         0.09                 18.0
     48        343       0.001         0.14                 18.2
     64        342       0.000         0.19                 18.5
     80        342       0.004         0.23                 18.2
     96        318       0.009         0.30                 17.3
    112        301       0.006         0.37                 16.3
    128        301       0.007         0.42                 16.1

FastTrack Beta 3, Two Processors, 100% CGI

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16         40       0.000         0.40                  2.1
     32         39       0.000         0.83                  2.0
     48         38       0.000         1.26                  2.0
     64         38       0.000         1.69                  2.0
     80         38       0.000         2.12                  2.0
     96         38       0.000         2.54                  2.0
    112         38       0.000         2.98                  2.0
    128         38       0.000         3.39                  1.9

FastTrack Beta 3, Two Processors, 100% NSAPI (Version C)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16        204       0.001         0.08                 10.6
     32        202       0.000         0.16                 10.4
     48        200       0.000         0.24                 10.5
     64        200       0.000         0.32                 10.4
     80        200       0.000         0.40                 10.3
     96        200       0.000         0.48                 10.5
    112        200       0.000         0.56                 10.5
    128        200       0.002         0.64                 10.5

FastTrack Beta 3, Two Processors, 100% NSAPI (Version B)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16        140       0.001         0.11                  7.4
     32        139       0.000         0.23                  7.3
     48        138       0.000         0.35                  7.3
     64        139       0.000         0.46                  7.3
     80        139       0.000         0.58                  7.3
     96        139       0.000         0.69                  7.3
    112        139       0.000         0.80                  7.2
    128        138       0.000         0.93                  7.3

FastTrack Beta 3, Two Processors, 100% NSAPI (Version A)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16        102       0.000         0.16                  5.3
     32        101       0.000         0.32                  5.2
     48        101       0.000         0.47                  5.2
     64        101       0.000         0.63                  5.2
     80        101       0.000         0.79                  5.2
     96        100       0.000         0.96                  5.3
    112        100       0.000         1.12                  5.3
    128        101       0.007         1.26                  5.2

FastTrack Beta 3, Two Processors, 100% NSAPI (WebStone 1.1)

Clients  Conns/sec  Errors/sec  Latency (s)  Throughput (Mbit/s)
     16         95       0.000         0.17                  5.1
     32         94       0.000         0.34                  5.0
     48         95       0.000         0.50                  5.0
     64         95       0.000         0.67                  5.1
     80         95       0.000         0.84                  5.0
     96         95       0.000         1.01                  5.1
    112         97       0.000         1.16                  5.0
    128         96       0.000         1.34                  5.0


Test Certification

This report was written by Mindcraft, Inc. to describe the results of tests conducted between April 12, 1996 and April 16, 1996 at the premises of Netscape Communications Corporation in Mountain View, California. Mindcraft certifies that the tests were run under the configuration described in this report and that the test results were as given here.

Permission is granted to reproduce this report on the condition that it is reproduced in full.


Appendix 1: Changes to WebStone 1.1 Source

The following output from diff illustrates our changes:

WebStone-1.1/src/cgi-send.c:

Use the re-entrant version of rand() so locking in the IRIX runtime library doesn't serialize execution.


59a60,62
>#ifdef IRIX
>      buffer[index] = rand_r() %26 + 63;
>#else
61c64
<
---
>#endif
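
POSIX rand_r() carries its state in a caller-supplied seed rather than in a lock-protected library global. The following is a minimal sketch of the pattern the change aims at, not the literal WebStone code; the function name and signature here are ours:

```c
#define _POSIX_C_SOURCE 200809L  /* for rand_r() */
#include <stdlib.h>

/* Fill buf with quasi-random characters using the same transform as
 * the diff (value % 26 + 63).  Because rand_r() keeps its state in the
 * caller's seed, concurrent client threads never serialize on a
 * runtime-library lock the way they would with rand(). */
static void fill_random(char *buf, size_t len, unsigned int *seed)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)(rand_r(seed) % 26 + 63);
}
```

Each client thread keeps its own seed variable, so the generators run fully in parallel.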

WebStone-1.1/src/get.c:

Give more comprehensible error messages.


134,135c134,135
<      return(returnerr("Did not receive full header: %s\n",
<      strerror(errno)));
---
>      return(returnerr("Did not receive full header: %s; url %s; totalbytesread %d; numbytesread %d;\n",
>      strerror(errno), url, totalbytesread, numbytesread));
251,252c251,257
<      return(returnerr("Read returns %d, error %d: %s\n", numbytesread,
<               errno, strerror(errno)));
---
>      /* mikeb */
>      if (timeexpired) {
>      return(-1);
>      } else {
>          return(returnerr("Read returns %d, error %d: %s totalbytes is %d, url is %s\n", numbytesread,
>               errno, strerror(errno), totalbytesread+1, url));
>     } /* end mikeb */

WebStone-1.1/src/webclient.c:

Wait for 10 milliseconds rather than for one second between requests. This reduces the wall clock time to run the benchmark without changing the results.


20,29d19
</**************************************************************************
< *                                                                        *
< *         Copyright (C) 1995 Silicon Graphics, Inc.                      *
< *                                                                        *
< *  These coded instructions, statements, and computer programs  where    *
< *  deveolped by SGI for public use.  If anychanges are made to this code *
< *  please try to get the changes back to the author.  Feel free to make  *
< *  modfications and changes to the code and release it.                  *
< *                                                                        *
< **************************************************************************/
645c635,641
<  sleep(1);
---
>  /* sleep(1); */
>  {
>      struct timeval tv;
>      tv.tv_sec = 0;
>      tv.tv_usec = 10;
>      select(0, 0, 0, 0, &tv);  /* MB 2/28/96 */
>  }

WebStone-1.1/src/webmaster.c:

Use shorter sleeps for synchronization, as for webclient.c.


531,533d530
<        cnt++;
<    }
<    fprintf(stdout,"\n");
549a547,550
>        cnt++;
>    }
>    fprintf(stdout,"\n");
>
587c588,597
<    sleep(1);
---
>    /* sleep(1);  */
>       {
>                struct timeval tv;
>                tv.tv_sec = 0;
>                tv.tv_usec = totalnumclients;
>                fprintf(stdout, "Waiting %d\n", totalnumclients);
>                fflush(stdout);
>                select(0, 0, 0, 0, &tv);  /* MB 2/28/96 */
>       }
>
635c645,652
<    sleep(1);
---
>    /* sleep(1); */
>        {
>                struct timeval tv;
>                tv.tv_sec = 0;
>                tv.tv_usec = 10;
>        select(0, 0, 0, 0, &tv);  /* MB 2/28/96 */
>        }
>
691c708,715
<      sleep(1);
---
>      /* sleep(1); */
>        {
>                struct timeval tv;
>                tv.tv_sec = 0;
>                tv.tv_usec = 10;
>        select(0, 0, 0, 0, &tv);  /* MB 2/28/96 */
>        }
>
748d771
<            /* sleep(2); */
773c796,803
<          sleep(1);
---
>          /* sleep(1);  */
>        {
>                struct timeval tv;
>                tv.tv_sec = 0;
>                tv.tv_usec = 10;
>        select(0, 0, 0, 0, &tv);  /* MB 2/28/96 */
>        }
>


Appendix 2: Optimizing nsapi-send.c

Here is the description of the WebStone 1.1 NSAPI workload, from the comment header of nsapi-send.c:


 * Send random bits file
 *
 * Once this service function is installed, any file with the extension
 * "dyn-send" will be serviced with this function.  An optional query
 * string may be passed to alter the amount of data in the response.

The code in nsapi-send.c does a number of things that are not spelled out in this comment. It would be reasonable for the implementor of a similar test for a different proprietary API to omit these extra calls. Here are three approaches that could be taken:

Version A

This version uses an automatically allocated array to hold the data, omitting calls to malloc() and free(). It also uses static text to create the HTTP header instead of using API calls to build the string.

Here are the differences between the WebStone 1.1 nsapi-send.c and the version used for the Version A runs:


32a33,35
>#ifndef WIN32
>#include <stdio.h>
>#include <stdlib.h>
35d37
<#include "frame/req.h"
38a41,42
>#endif
>#include "frame/req.h"
41c45
<#define MALLOC_FAILURE  "Out of memory"
---
>#define HEADERS       "HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n"
42a47,49
>#ifdef WIN32
>__declspec(dllexport) 
>#endif
46c53
<        char *buffer;
---
>        char buffer[sizeof(HEADERS) + 204800 + 1];
48c55,56
<        int index;
---
>  unsigned int maxindex;
>        unsigned int index;
56d63
<                if ( !strncmp(query_string, "size=", 5) )
58,59d64
<                else
<                        filesize = FILE_SIZE;
62,64c67
<        /* Set the context type */
<        param_free(pblock_remove("content-type", rq->srvhdrs));
<        pblock_nvinsert("content-type", "text/plain", rq->srvhdrs);
---
>  memcpy(&buffer, HEADERS, sizeof(HEADERS)-1);
66,76d68
<        /* Start the protocol response */
<        protocol_status(sn, rq, PROTOCOL_OK, NULL);
<        protocol_start_response(sn, rq);
<
<        /* Allocate the output buffer */
<        if ( !(buffer = (char *)malloc(filesize)) ) {
<                net_write(sn->csd, MALLOC_FAILURE,
<strlen(MALLOC_FAILURE));
<                return REQ_ABORTED;
<        }
<
78c70,71
<        for (index=0; index <filesize; index++)
---
>  maxindex = sizeof(HEADERS) + filesize;
>        for (index=sizeof(HEADERS); index <(maxindex); index++)
79a73,75
>#ifdef IRIX
>      buffer[index] = rand_r() % 26 + 63;
>#else
80a77
>#endif
83c80
<        if (net_write(sn->csd, buffer, filesize) == IO_ERROR)
---
>        if (net_write(sn->csd, buffer, sizeof(HEADERS)-1+filesize) == IO_ERROR)
85,86d81
<
<        free(buffer);
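
The combined effect of the Version A and Version B changes can be sketched outside the NSAPI framework. Only the buffer handling is shown; net_write() and the pblock calls are omitted, and build_response is our illustrative name, not an NSAPI function:

```c
#include <stddef.h>
#include <string.h>

#define HEADERS "HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n"

/* Precompute the HTTP header as static text, build header + payload in
 * one caller-supplied buffer, and return the byte count for a single
 * write call.  No malloc()/free() and no per-field header-construction
 * API calls remain on the fast path. */
static size_t build_response(char *buf, size_t payload_len)
{
    size_t hlen = sizeof(HEADERS) - 1;        /* exclude trailing NUL */
    memcpy(buf, HEADERS, hlen);
    for (size_t i = 0; i < payload_len; i++)
        buf[hlen + i] = (char)(i % 26 + 63);  /* Version B-style fill */
    return hlen + payload_len;
}
```

The caller declares a buffer of sizeof(HEADERS) plus the maximum payload and issues one write of the returned length, which is the shape of the single net_write() call in the modified module.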

Version B

The optimization for the Version B runs goes a step further, initializing the data with the modulus operator rather than with calls to rand(). Here are the diffs between the Version A and the Version B code:


74c74
<              buffer[index] = rand_r() % 26 + 63;
---
>              buffer[index] = index % 26 + 63;
76c76
<                buffer[index] = rand() %26 + 63;
---
>                buffer[index] = index % 26 + 63;

Version C

For the Version C runs the initialization loop was removed. Here are the diffs between the Version A code and the Version C code:


55,56d54
<      unsigned int maxindex;
<        unsigned int index;
68,77d65
<
<        /* Generate the output */
<      maxindex = sizeof(HEADERS) + filesize;
<        for (index=sizeof(HEADERS); index <(maxindex); index++)
<                /* generate random characters from A-Z */
<#ifdef IRIX
<              buffer[index] = rand_r() % 26 + 63;
<#else
<                buffer[index] = rand() %26 + 63;
<#endif

Appendix 3: Operating System Configuration

System Identification

From "About Program Manager":

	NT Program Manager Version 3.51 Build 1057 Service Pack 3
	Product ID: 227-075-100
	Total Physical Memory: 32,176 KB
	Processor: x86 Family 5 Model 2 Stepping 11
	Identifier: AT/AT Compatible

Service Pack 3 contains special I/O calls for Web servers.

Run Time Parameters

	Tasking:	"Best Interactive Response Time"
	Paging:		76 MB start, 256 MB max.
	Registry Size:	2 MB start, 12 MB max.

        From Registry Editor:

        HKEY_LOCAL_MACHINE
            SYSTEM
                Current Control Set
                    Services
                        inet info
                            parameters
                                Listen Backlog: 1024

Services

	Alerter				Auto
	Directory Replicator		Manual
	Event Log			Auto
	FTP Publishing			Auto
	Gopher Publishing		Manual
	License Logging			Auto
	Messenger			Auto
	Netscape FastTrack httpd-192.168.1.14	Auto
	Microsoft DHCP Server		Manual
	NT LM Security Support		Manual, Started
	OLE				Manual
	RPC Locator			Manual
	RPC Service			Manual, Started
	Schedule			Manual
	UPS				Manual
	Workstation			Auto
	WWW Publishing			Manual

All other services were disabled.


Copyright © 1997-98. Mindcraft, Inc. All rights reserved.
Mindcraft is a registered trademark of Mindcraft, Inc.
For more information, contact us at: info@mindcraft.com
Phone: +1 (408) 395-2404
Fax: +1 (408) 395-6324