Sortix 1.1dev ports manual
This manual documents Sortix 1.1dev ports.
WGET(1)    GNU Wget    WGET(1)
NAME
Wget - The non-interactive network downloader.
SYNOPSIS
wget [option]... [URL]...
DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
Wget is non-interactive, meaning that it can work in the background while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most Web browsers require the user's constant presence, which can be a great hindrance when transferring a lot of data.
Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
OPTIONS
Option Syntax
Since Wget uses GNU getopt to process command-line arguments, every option has a long form along with the short one. Long options are more convenient to remember, but take time to type. You may freely mix different option styles, or specify options after the command-line arguments. Thus you may write:
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
The space between the option accepting an argument and the argument may be omitted. Instead of -o log you can write -olog. You may put several options that do not require arguments together, like:
wget -drc <URL>
This is completely equivalent to:
wget -d -r -c <URL>
Since the options can be specified after the arguments, you may terminate them with --. So the following will try to download URL -x, reporting failure to log:
wget -o log -- -x
The options that accept comma-separated lists all respect the convention that specifying an empty list clears its value. This can be useful to clear the .wgetrc settings. For instance, if your .wgetrc sets "exclude_directories" to /cgi-bin, the following example will first reset it, and then set it to exclude /~nobody and /~somebody. The same goes for clearing the lists in .wgetrc:
wget -X "" -X /~nobody,/~somebody
Most options that do not accept arguments are boolean options, so named because their state can be captured with a yes-or-no ("boolean") variable. For example, --follow-ftp tells Wget to follow FTP links from HTML files and, on the other hand, --no-glob tells it not to perform file globbing on FTP URLs. A boolean option is either affirmative or negative (beginning with --no). All such options share several properties. Unless stated otherwise, it is assumed that the default behavior is the opposite of what the option accomplishes. For example, the documented existence of --follow-ftp assumes that the default is to not follow FTP links from HTML pages.
Affirmative options can be negated by prepending --no- to the option name; negative options can be negated by omitting the --no- prefix. This might seem superfluous---if the default for an affirmative option is to not do something, then why provide a way to explicitly turn it off? But the startup file may in fact change the default. For instance, using "follow_ftp = on" in .wgetrc makes Wget follow FTP links by default, and using --no-follow-ftp is the only way to restore the factory default from the command line.
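For instance, if your .wgetrc contains "follow_ftp = on", the factory default can be restored for a single run (example.com is a placeholder host):
wget --no-follow-ftp -r http://example.com/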
Basic Startup Options
- -V
- --version
- Display the version of Wget.
- -h
- --help
- Print a help message describing all of Wget's command-line options.
- -b
- --background
- Go to background immediately after startup. If no output file is specified via the -o option, output is redirected to wget-log.
- -e command
- --execute command
- Execute command as if it were a part of .wgetrc. A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one wgetrc command, use multiple instances of -e.
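For example, to disable robots.txt processing for one run via the wgetrc command "robots" (example.com is a placeholder host):
wget -e robots=off -r http://example.com/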
Logging and Input File Options
- -o logfile
- --output-file=logfile
- Log all messages to logfile. The messages are normally reported to standard error.
- -a logfile
- --append-output=logfile
- Append to logfile. This is the same as -o, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created.
- -d
- --debug
- Turn on debug output, meaning various information important to the developers of Wget if it does not work properly. Your system administrator may have chosen to compile Wget without debug support, in which case -d will not work. Please note that compiling with debug support is always safe---Wget compiled with the debug support will not print any debug info unless requested with -d.
- -q
- --quiet
- Turn off Wget's output.
- -v
- --verbose
- Turn on verbose output, with all the available data. The default output is verbose.
- -nv
- --no-verbose
- Turn off verbose without being completely quiet (use -q for that), which means that error messages and basic information still get printed.
- --report-speed=type
- Output bandwidth as type. The only accepted value is bits.
- -i file
- --input-file=file
- Read URLs from a local or external file. If -
is specified as file, URLs are read from the standard input. (Use
./- to read from a file literally named -.)
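For example, assuming a hypothetical file urls.txt containing one URL per line:
wget -i urls.txt
cat urls.txt | wget -i -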
- --input-metalink=file
- Downloads files covered in a local Metalink file. Metalink versions 3 and 4 are supported.
- --metalink-over-http
- Issues HTTP HEAD request instead of GET and extracts Metalink metadata from response headers. Then it switches to Metalink download. If no valid Metalink metadata is found, it falls back to ordinary HTTP download.
- --preferred-location
- Set preferred location for Metalink resources. This has an effect if multiple resources with the same priority are available.
- -F
- --force-html
- When input is read from a file, force it to be treated as an HTML file. This enables you to retrieve relative links from existing HTML files on your local disk, by adding "<base href="url">" to HTML, or using the --base command-line option.
- -B URL
- --base=URL
- Resolves relative links using URL as the point of
reference, when reading links from an HTML file specified via the
-i/ --input-file option (together with --force-html,
or when the input file was fetched remotely from a server describing it as
HTML). This is equivalent to the presence of a "BASE" tag in the
HTML input file, with URL as the value for the "href"
attribute.
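A sketch, assuming a hypothetical local file links.html containing relative links:
wget -i links.html --force-html -B http://example.com/docs/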
- --config=FILE
- Specify the location of a startup file you wish to use.
- --rejected-log=logfile
- Logs all URL rejections to logfile as comma-separated values. The values include the reason for rejection, the URL, and the parent URL in which it was found.
Download Options
- --bind-address=ADDRESS
- When making client TCP/IP connections, bind to ADDRESS on the local machine. ADDRESS may be specified as a hostname or IP address. This option can be useful if your machine is bound to multiple IPs.
- --bind-dns-address=ADDRESS
- [libcares only] This address overrides the route for DNS requests. If you ever need to circumvent the standard settings from /etc/resolv.conf, this option together with --dns-servers is your friend. ADDRESS must be specified either as IPv4 or IPv6 address. Wget needs to be built with libcares for this option to be available.
- --dns-servers=ADDRESSES
- [libcares only] The given address(es) override the standard nameserver addresses, e.g. as configured in /etc/resolv.conf. ADDRESSES may be specified either as IPv4 or IPv6 addresses, comma-separated. Wget needs to be built with libcares for this option to be available.
- -t number
- --tries=number
- Set number of tries to number. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.
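For example, to give up after three attempts (example.com is a placeholder host):
wget --tries=3 http://example.com/file.iso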
- -O file
- --output-document=file
- The documents will not be written to the appropriate files,
but all will be concatenated together and written to file. If
- is used as file, documents will be printed to standard
output, disabling link conversion. (Use ./- to print to a file
literally named -.)
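For example (example.com is a placeholder host):
wget -O latest.tar.gz http://example.com/download/latest
wget -O - http://example.com/page.html | wc -l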
- -nc
- --no-clobber
- If a file is downloaded more than once in the same
directory, Wget's behavior depends on a few options, including -nc.
In certain cases, the local file will be clobbered, or overwritten,
upon repeated download. In other cases it will be preserved.
- --backups=backups
- Before (over)writing a file, back up an existing file by adding a .1 suffix (_1 on VMS) to the file name. Such backup files are rotated to .2, .3, and so on, up to backups (and lost beyond that).
- -c
- --continue
- Continue getting a partially-downloaded file. This is
useful when you want to finish up a download started by a previous
instance of Wget, or by another program. For instance:
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
- --start-pos=OFFSET
- Start downloading at zero-based position OFFSET.
Offset may be expressed in bytes, kilobytes with the `k' suffix, or
megabytes with the `m' suffix, etc.
- --progress=type
- Select the type of the progress indicator you wish to use.
Legal indicators are "dot" and "bar".
- --show-progress
- Force wget to display the progress bar in any verbosity.
- -N
- --timestamping
- Turn on time-stamping.
- --no-if-modified-since
- Do not send the If-Modified-Since header in -N mode; send a preliminary HEAD request instead. This has an effect only in -N mode.
- --no-use-server-timestamps
- Don't set the local file's timestamp by the one on the
server.
- -S
- --server-response
- Print the headers sent by HTTP servers and responses sent by FTP servers.
- --spider
- When invoked with this option, Wget will behave as a Web
spider, which means that it will not download the pages, just check
that they are there. For example, you can use Wget to check your
bookmarks:
wget --spider --force-html -i bookmarks.html
- -T seconds
- --timeout=seconds
- Set the network timeout to seconds seconds. This is
equivalent to specifying --dns-timeout, --connect-timeout,
and --read-timeout, all at the same time.
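For example, to abort any phase of a connection that takes longer than ten seconds (example.com is a placeholder host):
wget -T 10 http://example.com/file.iso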
- --dns-timeout=seconds
- Set the DNS lookup timeout to seconds seconds. DNS lookups that don't complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.
- --connect-timeout=seconds
- Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.
- --read-timeout=seconds
- Set the read (and write) timeout to seconds seconds.
The "time" of this timeout refers to idle time: if, at
any point in the download, no data is received for more than the specified
number of seconds, reading fails and the download is restarted. This
option does not directly affect the duration of the entire download.
- --limit-rate=amount
- Limit the download speed to amount bytes per second.
Amount may be expressed in bytes, kilobytes with the k suffix, or
megabytes with the m suffix. For example, --limit-rate=20k
will limit the retrieval rate to 20KB/s. This is useful when, for whatever
reason, you don't want Wget to consume the entire available bandwidth.
- -w seconds
- --wait=seconds
- Wait the specified number of seconds between the
retrievals. Use of this option is recommended, as it lightens the server
load by making the requests less frequent. Instead of in seconds, the time
can be specified in minutes using the "m" suffix, in hours using
"h" suffix, or in days using "d" suffix.
- --waitretry=seconds
- If you don't want Wget to wait between every
retrieval, but only between retries of failed downloads, you can use this
option. Wget will use linear backoff, waiting 1 second after the
first failure on a given file, then waiting 2 seconds after the second
failure on that file, up to the maximum number of seconds you
specify.
- --random-wait
- Some web sites may perform log analysis to identify
retrieval programs such as Wget by looking for statistically significant
similarities in the time between requests. This option causes the time
between requests to vary between 0.5 and 1.5 * wait seconds, where
wait was specified using the --wait option, in order to mask
Wget's presence from such analysis.
- --no-proxy
- Don't use proxies, even if the appropriate *_proxy environment variable is defined.
- -Q quota
- --quota=quota
- Specify download quota for automatic retrievals. The value
can be specified in bytes (default), kilobytes (with k suffix), or
megabytes (with m suffix).
- --no-dns-cache
- Turn off caching of DNS lookups. Normally, Wget remembers
the IP addresses it looked up from DNS so it doesn't have to repeatedly
contact the DNS server for the same (typically small) set of hosts it
retrieves from. This cache exists in memory only; a new Wget run will
contact DNS again.
- --restrict-file-names=modes
- Change which characters found in remote URLs must be
escaped during generation of local filenames. Characters that are
restricted by this option are escaped, i.e. replaced with
%HH, where HH is the hexadecimal number that
corresponds to the restricted character. This option may also be used to
force all alphabetical cases to be either lower- or uppercase.
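For example, combining a platform mode with a case mode (example.com is a placeholder host):
wget --restrict-file-names=windows,lowercase http://example.com/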
- -4
- --inet4-only
- -6
- --inet6-only
- Force connecting to IPv4 or IPv6 addresses. With
--inet4-only or -4, Wget will only connect to IPv4 hosts,
ignoring AAAA records in DNS, and refusing to connect to IPv6 addresses
specified in URLs. Conversely, with --inet6-only or -6, Wget
will only connect to IPv6 hosts and ignore A records and IPv4 addresses.
- --prefer-family=none/IPv4/IPv6
- When given a choice of several addresses, connect to the
addresses with specified address family first. The address order returned
by DNS is used without change by default.
- --retry-connrefused
- Consider "connection refused" a transient error and try again. Normally Wget gives up on a URL when it is unable to connect to the site because failure to connect is taken as a sign that the server is not running at all and that retries would not help. This option is for mirroring unreliable sites whose servers tend to disappear for short periods of time.
- --user=user
- --password=password
- Specify the username user and password password for both FTP and HTTP file retrieval. These parameters can be overridden using the --ftp-user and --ftp-password options for FTP connections and the --http-user and --http-password options for HTTP connections.
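For example, with hypothetical credentials:
wget --user=alice --password=secret https://example.com/private/file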
- --ask-password
- Prompt for a password for each connection established. Cannot be specified when --password is being used, because they are mutually exclusive.
- --no-iri
- Turn off internationalized URI (IRI) support. Use
--iri to turn it on. IRI support is activated by default.
- --local-encoding=encoding
- Force Wget to use encoding as the default system
encoding. That affects how Wget converts URLs specified as arguments from
locale to UTF-8 for IRI support.
- --remote-encoding=encoding
- Force Wget to use encoding as the default remote
server encoding. That affects how Wget converts URIs found in files from
remote encoding to UTF-8 during a recursive fetch. This option is only
useful for IRI support, for the interpretation of non-ASCII characters.
- --unlink
- Force Wget to unlink a file instead of clobbering the existing file. This option is useful for downloading to a directory that contains hardlinks.
Directory Options
- -nd
- --no-directories
- Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).
- -x
- --force-directories
- The opposite of -nd---create a hierarchy of directories, even if one would not have been created otherwise. E.g. wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded file to fly.srk.fer.hr/robots.txt.
- -nH
- --no-host-directories
- Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
- --protocol-directories
- Use the protocol name as a directory component of local file names. For example, with this option, wget -r http://host will save to http/host/... rather than just to host/....
- --cut-dirs=number
- Ignore number directory components. This is useful
for getting a fine-grained control over the directory where recursive
retrieval will be saved.
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .
--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
- -P prefix
- --directory-prefix=prefix
- Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory).
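For example, the following saves the file as downloads/file.zip (example.com is a placeholder host):
wget -P downloads http://example.com/file.zip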
HTTP Options
- --default-page=name
- Use name as the default file name when it isn't known (i.e., for URLs that end in a slash), instead of index.html.
- -E
- --adjust-extension
- If a file of type application/xhtml+xml or
text/html is downloaded and the URL does not end with the regexp
\.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html
to be appended to the local filename. This is useful, for instance, when
you're mirroring a remote site that uses .asp pages, but you want
the mirrored pages to be viewable on your stock Apache server. Another
good use for this is when you're downloading CGI-generated materials. A
URL like http://site.com/article.cgi?25 will be saved as
article.cgi?25.html.
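For example, to fetch the CGI-generated page mentioned above, assuming the server reports it as text/html (the URL is quoted so the shell does not interpret the ?):
wget -E 'http://site.com/article.cgi?25'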
- --http-user=user
- --http-password=password
- Specify the username user and password
password on an HTTP server. According to the type of the challenge,
Wget will encode them using either the "basic" (insecure), the
"digest", or the Windows "NTLM" authentication scheme.
- --no-http-keep-alive
- Turn off the "keep-alive" feature for HTTP
downloads. Normally, Wget asks the server to keep the connection open so
that, when you download more than one document from the same server, they
get transferred over the same TCP connection. This saves time and at the
same time reduces the load on the server.
- --no-cache
- Disable server-side cache. In this case, Wget will send the
remote server an appropriate directive (Pragma: no-cache) to get
the file from the remote service, rather than returning the cached
version. This is especially useful for retrieving and flushing out-of-date
documents on proxy servers.
- --no-cookies
- Disable the use of cookies. Cookies are a mechanism for maintaining server-side state. The server sends the client a cookie using the "Set-Cookie" header, and the client responds with the same cookie upon further requests. Since cookies allow the server owners to keep track of visitors and for sites to exchange this information, some consider them a breach of privacy. The default is to use cookies; however, storing cookies is not on by default.
- --load-cookies file
- Load cookies from file before the first HTTP
retrieval. file is a textual file in the format originally used by
Netscape's cookies.txt file.
- "Netscape 4.x."
- The cookies are in ~/.netscape/cookies.txt.
- "Mozilla and Netscape 6.x."
- Mozilla's cookie file is also named cookies.txt, located somewhere under ~/.mozilla, in the directory of your profile. The full path usually ends up looking somewhat like ~/.mozilla/default/some-weird-string/cookies.txt.
- "Internet Explorer."
- You can produce a cookie file Wget can use by using the File menu, Import and Export, Export Cookies. This has been tested with Internet Explorer 5; it is not guaranteed to work with earlier versions.
- "Other browsers."
- If you are using a different browser to create your cookies, --load-cookies will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.
wget --no-cookies --header "Cookie: <name>=<value>"
- --save-cookies file
- Save cookies to file before exiting. This will not save cookies that have expired or that have no expiry time (so-called "session cookies"), but also see --keep-session-cookies.
- --keep-session-cookies
- When specified, causes --save-cookies to also save
session cookies. Session cookies are normally not saved because they are
meant to be kept in memory and forgotten when you exit the browser. Saving
them is useful on sites that require you to log in or to visit the home
page before you can access some pages. With this option, multiple Wget
runs are considered a single browser session as far as the site is
concerned.
- --ignore-length
- Unfortunately, some HTTP servers (CGI programs, to be more
precise) send out bogus "Content-Length" headers, which makes
Wget go wild, as it thinks not all the document was retrieved. You can
spot this syndrome if Wget retries getting the same document again and
again, each time claiming that the (otherwise normal) connection has
closed on the very same byte. With this option, Wget will ignore the
"Content-Length" header---as if it never existed.
- --header=header-line
- Send header-line along with the rest of the headers
in each HTTP request. The supplied header is sent as-is, which means it
must contain name and value separated by colon, and must not contain
newlines.
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr' \
     http://fly.srk.fer.hr/
wget --header="Host: foo.bar" http://localhost/
- --max-redirect=number
- Specifies the maximum number of redirections to follow for a resource. The default is 20, which is usually far more than necessary. However, on those occasions where you want to allow more (or fewer), this is the option to use.
- --proxy-user=user
- --proxy-password=password
- Specify the username user and password
password for authentication on a proxy server. Wget will encode
them using the "basic" authentication scheme.
- --referer=url
- Include `Referer: url' header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.
- --save-headers
- Save the headers sent by the HTTP server to the file, preceding the actual contents, with an empty line as the separator.
- -U agent-string
- --user-agent=agent-string
- Identify as agent-string to the HTTP server.
- --post-data=string
- --post-file=file
- Use POST as the method for all HTTP requests and send the
specified data in the request body. --post-data sends string
as data, whereas --post-file sends the contents of file.
Other than that, they work in exactly the same way. In particular, they
both expect content of the form
"key1=value1&key2=value2", with percent-encoding for special
characters; the only difference is that one expects its content as a
command-line parameter and the other accepts its content from a file. In
particular, --post-file is not for transmitting files as
form attachments: those must appear as "key=value" data (with
appropriate percent-coding) just like everything else. Wget does not
currently support "multipart/form-data" for transmitting POST
data; only "application/x-www-form-urlencoded". Only one of
--post-data and --post-file should be specified.
# Log in to the server.  This can be done only once.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://example.com/auth.php
# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://example.com/interesting/article.php
- --method=HTTP-Method
- For the purpose of RESTful scripting, Wget allows sending of other HTTP Methods without the need to explicitly set them using --header=Header-Line. Wget will use whatever string is passed to it after --method as the HTTP Method to the server.
- --body-data=Data-String
- --body-file=Data-File
- Must be set when additional data needs to be sent to the
server along with the Method specified using --method.
--body-data sends string as data, whereas --body-file
sends the contents of file. Other than that, they work in exactly
the same way.
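A sketch of a RESTful request; the endpoint and data.json file are hypothetical:
wget --method=PUT --body-file=data.json http://example.com/api/resource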
- --content-disposition
- If this is set to on, experimental (not fully-functional)
support for "Content-Disposition" headers is enabled. This can
currently result in extra round-trips to the server for a "HEAD"
request, and is known to suffer from a few bugs, which is why it is not
currently enabled by default.
- --content-on-error
- If this is set to on, wget will not skip the content when the server responds with an HTTP status code that indicates an error.
- --trust-server-names
- If this is set to on, on a redirect the last component of the redirection URL will be used as the local file name. By default, the last component of the original URL is used.
- --auth-no-challenge
- If this option is given, Wget will send Basic HTTP
authentication information (plaintext username and password) for all
requests, just like Wget 1.10.2 and prior did by default.
HTTPS (SSL/TLS) Options
To support encrypted HTTP (HTTPS) downloads, Wget must be compiled with an external SSL library. The current default is GnuTLS. In addition, Wget also supports HSTS (HTTP Strict Transport Security). If Wget is compiled without SSL support, none of these options are available.
- --secure-protocol=protocol
- Choose the secure protocol to be used. Legal values are
auto, SSLv2, SSLv3, TLSv1, TLSv1_1,
TLSv1_2 and PFS. If auto is used, the SSL library is
given the liberty of choosing the appropriate protocol automatically,
which is achieved by sending a TLSv1 greeting. This is the default.
- --https-only
- When in recursive mode, only HTTPS links are followed.
- --no-check-certificate
- Don't check the server certificate against the available
certificate authorities. Also don't require the URL host name to match the
common name presented by the certificate.
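For example, against a host with a self-signed certificate (placeholder hostname):
wget --no-check-certificate https://self-signed.example.com/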
- --certificate=file
- Use the client certificate stored in file. This is needed for servers that are configured to require certificates from the clients that connect to them. Normally a certificate is not required and this switch is optional.
- --certificate-type=type
- Specify the type of the client certificate. Legal values are PEM (assumed by default) and DER, also known as ASN1.
- --private-key=file
- Read the private key from file. This allows you to provide the private key in a file separate from the certificate.
- --private-key-type=type
- Specify the type of the private key. Accepted values are PEM (the default) and DER.
- --ca-certificate=file
- Use file as the file with the bundle of certificate
authorities ("CA") to verify the peers. The certificates must be
in PEM format.
- --ca-directory=directory
- Specifies directory containing CA certificates in PEM
format. Each file contains one CA certificate, and the file name is based
on a hash value derived from the certificate. This is achieved by
processing a certificate directory with the "c_rehash" utility
supplied with OpenSSL. Using --ca-directory is more efficient than
--ca-certificate when many certificates are installed because it
allows Wget to fetch certificates on demand.
- --crl-file=file
- Specifies a CRL file in file. This is needed for certificates that have been revoked by the CAs.
- --pinnedpubkey=file/hashes
- Tells wget to use the specified public key file (or hashes)
to verify the peer. This can be a path to a file which contains a single
public key in PEM or DER format, or any number of base64 encoded sha256
hashes preceded by "sha256//" and separated by ";".
- --random-file=file
- [OpenSSL and LibreSSL only] Use file as the source
of random data for seeding the pseudo-random number generator on systems
without /dev/urandom.
- --egd-file=file
- [OpenSSL only] Use file as the EGD socket. EGD
stands for Entropy Gathering Daemon, a user-space program
that collects data from various unpredictable system sources and makes it
available to other programs that might need it. Encryption software, such
as the SSL library, needs sources of non-repeating randomness to seed the
random number generator used to produce cryptographically strong keys.
- --no-hsts
- Wget supports HSTS (HTTP Strict Transport Security, RFC 6797) by default. Use --no-hsts to make Wget act as a non-HSTS-compliant UA. As a consequence, Wget would ignore all the "Strict-Transport-Security" headers, and would not enforce any existing HSTS policy.
- --hsts-file=file
- By default, Wget stores its HSTS database in
~/.wget-hsts. You can use --hsts-file to override this. Wget
will use the supplied file as the HSTS database. Such file must conform to
the correct HSTS database format used by Wget. If Wget cannot parse the
provided file, the behaviour is unspecified.
- --warc-file=file
- Use file as the destination WARC file.
- --warc-header=string
- Insert string into the warcinfo record.
- --warc-max-size=size
- Set the maximum size of the WARC files to size.
- --warc-cdx
- Write CDX index files.
- --warc-dedup=file
- Do not store records listed in this CDX file.
- --no-warc-compression
- Do not compress WARC files with GZIP.
- --no-warc-digests
- Do not calculate SHA1 digests.
- --no-warc-keep-log
- Do not store the log file in a WARC record.
- --warc-tempdir=dir
- Specify the location for temporary files created by the WARC writer.
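For example, a recursive crawl written to crawl.warc.gz with a CDX index (example.com is a placeholder host):
wget -r --warc-file=crawl --warc-cdx http://example.com/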
FTP Options
- --ftp-user=user
- --ftp-password=password
- Specify the username user and password
password on an FTP server. Without this, or the corresponding
startup option, the password defaults to -wget@, normally used for
anonymous FTP.
- --no-remove-listing
- Don't remove the temporary .listing files generated
by FTP retrievals. Normally, these files contain the raw directory
listings received from FTP servers. Not removing them can be useful for
debugging purposes, or when you want to be able to easily check on the
contents of remote server directories (e.g. to verify that a mirror you're
running is complete).
- --no-glob
- Turn off FTP globbing. Globbing refers to the use of
shell-like special characters (wildcards), like *,
?, [ and ] to retrieve more than one file from the
same directory at once, like:
wget ftp://gnjilux.srk.fer.hr/*.msg
- --no-passive-ftp
- Disable the use of the passive FTP transfer mode.
Passive FTP mandates that the client connect to the server to establish
the data connection rather than the other way around.
- --preserve-permissions
- Preserve remote file permissions instead of permissions set by umask.
- --retr-symlinks
- By default, when retrieving FTP directories recursively and
a symbolic link is encountered, the symbolic link is traversed and the
pointed-to files are retrieved. Currently, Wget does not traverse symbolic
links to directories to download them recursively, though this feature may
be added in the future.
FTPS Options
- --ftps-implicit
- This option tells Wget to use FTPS implicitly. Implicit FTPS consists of initializing SSL/TLS from the very beginning of the control connection. This option does not send an "AUTH TLS" command: it assumes the server speaks FTPS and directly starts an SSL/TLS connection. If the attempt is successful, the session continues just like regular FTPS ("PBSZ" and "PROT" are sent, etc.). Implicit FTPS is no longer a requirement for FTPS implementations, and thus many servers may not support it. If --ftps-implicit is passed and no explicit port number specified, the default port for implicit FTPS, 990, will be used, instead of the default port for the "normal" (explicit) FTPS which is the same as that of FTP, 21.
- --no-ftps-resume-ssl
- Do not resume the SSL/TLS session in the data channel. When starting a data connection, Wget tries to resume the SSL/TLS session previously started in the control connection. SSL/TLS session resumption avoids performing an entirely new handshake by reusing the SSL/TLS parameters of a previous session. Typically, the FTPS servers want it that way, so Wget does this by default. Under rare circumstances however, one might want to start an entirely new SSL/TLS session in every data connection. This is what --no-ftps-resume-ssl is for.
- --ftps-clear-data-connection
- All the data connections will be in plain text. Only the control connection will be under SSL/TLS. Wget will send a "PROT C" command to achieve this, which must be approved by the server.
- --ftps-fallback-to-ftp
- Fall back to FTP if FTPS is not supported by the target server. For security reasons, this option is not asserted by default. The default behaviour is to exit with an error. If a server does not successfully reply to the initial "AUTH TLS" command, or in the case of implicit FTPS, if the initial SSL/TLS connection attempt is rejected, such a server is considered not to support FTPS.
Recursive Retrieval Options
- -r
- --recursive
- Turn on recursive retrieving. The default maximum depth is 5.
- -l depth
- --level=depth
- Specify recursion maximum depth level depth.
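For example, to recurse at most three levels deep (example.com is a placeholder host):
wget -r -l 3 http://example.com/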
- --delete-after
- This option tells Wget to delete every single file it
downloads, after having done so. It is useful for pre-fetching
popular pages through a proxy, e.g.:
wget -r -nd --delete-after http://whatever.com/~popular/page/
- -k
- --convert-links
- After the download is complete, convert the links in the
document to make them suitable for local viewing. This affects not only
the visible hyperlinks, but any part of the document that links to
external content, such as embedded images, links to style sheets,
hyperlinks to non-HTML content, etc.
- The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.
- The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.
- --convert-file-only
- This option converts only the filename part of the URLs,
leaving the rest of the URLs untouched. This filename part is sometimes
referred to as the "basename", although we avoid that term here
in order not to cause confusion.
- -K
- --backup-converted
- When converting a file, back up the original version with a .orig suffix. Affects the behavior of -N.
- -m
- --mirror
- Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
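A typical invocation combines -m with link conversion and page requisites (example.com is a placeholder host):
wget -m -k -p http://example.com/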
- -p
- --page-requisites
- This option causes Wget to download all the files that are
necessary to properly display a given HTML page. This includes such things
as inlined images, sounds, and referenced stylesheets.
The following invocations illustrate how -p interacts with -r and -l:
wget -r -l 2 http://<site>/1.html
wget -r -l 2 -p http://<site>/1.html
wget -r -l 1 -p http://<site>/1.html
wget -r -l 0 -p http://<site>/1.html
wget -p http://<site>/1.html
To download a single page with all its requisites (even if they exist on separate websites) and have the result display properly locally, combine -p with a few other options:
wget -E -H -k -K -p http://<site>/<document>
- --strict-comments
- Turn on strict parsing of HTML comments. The default is to
terminate comments at the first occurrence of -->.
Recursive Accept/Reject Options
- -A acclist --accept acclist
- -R rejlist --reject rejlist
- Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters, *, ?, [ or ], appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern into quotes to prevent your shell from expanding it, like in -A "*.mp3" or -A '*.mp3'.
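For example, to keep only PDF files during a recursive download (example.com is a placeholder host):
wget -r -A '*.pdf' http://example.com/docs/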
- --accept-regex urlregex
- --reject-regex urlregex
- Specify a regular expression to accept or reject the complete URL.
- --regex-type regextype
- Specify the regular expression type. Possible types are posix or pcre. Note that to be able to use pcre type, wget has to be compiled with libpcre support.
- -D domain-list
- --domains=domain-list
- Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.
- --exclude-domains domain-list
- Specify the domains that are not to be followed.
- --follow-ftp
- Follow FTP links from HTML documents. Without this option, Wget will ignore all the FTP links.
- --follow-tags=list
- Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, the tags should be specified in a comma-separated list with this option.
- --ignore-tags=list
- This is the opposite of the --follow-tags option. To
skip certain HTML tags when recursively looking for documents to download,
specify them in a comma-separated list.
wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>
- --ignore-case
- Ignore case when matching files and directories. This influences the behavior of -R, -A, -I, and -X options, as well as globbing implemented when downloading from FTP sites. For example, with this option, -A "*.txt" will match file1.txt, but also file2.TXT, file3.TxT, and so on. The quotes in the example are to prevent the shell from expanding the pattern.
- -H
- --span-hosts
- Enable spanning across hosts when doing recursive retrieving.
- -L
- --relative
- Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts.
- -I list
- --include-directories=list
- Specify a comma-separated list of directories you wish to follow when downloading. Elements of list may contain wildcards.
- -X list
- --exclude-directories=list
- Specify a comma-separated list of directories you wish to exclude from download. Elements of list may contain wildcards.
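For example, to skip two directory trees during a recursive download (example.com is a placeholder host):
wget -r -X /cgi-bin,/private http://example.com/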
- -np
- --no-parent
- Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
ENVIRONMENT
Wget supports proxies for both HTTP and FTP retrievals. The standard way to specify proxy location, which Wget recognizes, is using the following environment variables:
- http_proxy
- https_proxy
- If set, the http_proxy and https_proxy variables should contain the URLs of the proxies for HTTP and HTTPS connections respectively.
- ftp_proxy
- This variable should contain the URL of the proxy for FTP connections. It is quite common that http_proxy and ftp_proxy are set to the same URL.
- no_proxy
- This variable should contain a comma-separated list of domain extensions proxy should not be used for. For instance, if the value of no_proxy is .mit.edu, proxy will not be used to retrieve documents from MIT.
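For example, setting the proxy for a single invocation from the shell (proxy.example.com is a placeholder):
http_proxy=http://proxy.example.com:3128/ wget http://example.com/file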
EXIT STATUS
Wget may return one of several error codes if it encounters problems.
- 0
- No problems occurred.
- 1
- Generic error code.
- 2
- Parse error---for instance, when parsing command-line options, the .wgetrc or .netrc...
- 3
- File I/O error.
- 4
- Network failure.
- 5
- SSL verification failure.
- 6
- Username/password authentication failure.
- 7
- Protocol errors.
- 8
- Server issued an error response.
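The exit status can be checked from the shell, for example:
wget -q http://example.com/file || echo "wget failed with status $?"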
FILES
- /usr/local/etc/wgetrc
- Default location of the global startup file.
- .wgetrc
- User startup file.
BUGS
You are welcome to submit bug reports via the GNU Wget bug tracker (see <https://savannah.gnu.org/bugs/?func=additem&group=wget>). Before actually submitting a bug report, please try to follow a few simple guidelines.
- 1.
- Please try to ascertain that the behavior you see really is a bug. If Wget crashes, it's a bug. If Wget does not behave as documented, it's a bug. If things behave strangely, but you are not sure about the way they are supposed to work, it might well be a bug, but you might want to double-check the documentation and the mailing lists.
- 2.
- Try to repeat the bug in as simple circumstances as
possible. E.g. if Wget crashes while downloading wget -rl0 -kKE -t5
--no-proxy http://example.com -o /tmp/log, you should try to
see if the crash is repeatable, and if it will occur with a simpler set of
options. You might even try to start the download at the page where the
crash occurred to see if that page somehow triggered the crash.
- 3.
- Please start Wget with -d option and send us the
resulting output (or relevant parts thereof). If Wget was compiled without
debug support, recompile it---it is much easier to trace bugs with
debug support on.
- 4.
- If Wget has crashed, try to run it in a debugger, e.g. "gdb `which wget` core" and type "where" to get the backtrace. This may not work if the system administrator has disabled core files, but it is safe to try.
SEE ALSO
This is not the complete manual for GNU Wget. For more complete information, including more detailed explanations of some of the options, and a number of commands available for use with .wgetrc files and the -e option, see the GNU Info entry for wget.
AUTHOR
Originally written by Hrvoje Nikšić <hniksic@xemacs.org>.
COPYRIGHT
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2015 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
2022-03-05    GNU Wget 1.18