Method 1

Log in

curl 'https://signon.jgi.doe.gov/signon/create' --data-urlencode 'login=*****' --data-urlencode 'password=*****' -c cookies > /dev/null
# Replace ***** with your account login and password
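
The login step can also be wrapped in a small function so that the credentials stay out of the script and the shell history. This is a minimal sketch: the environment variable names `JGI_LOGIN` and `JGI_PASSWORD` and the function name `jgi_login` are assumptions made for this example, not part of JGI's API. The final check only confirms that some cookie was stored, not that the server accepted the credentials.

```shell
#!/usr/bin/env bash
# Sketch: log in to JGI with credentials taken from the environment.
# JGI_LOGIN and JGI_PASSWORD are hypothetical variable names.

jgi_login() {
    local cookie_file="${1:-cookies}"
    if [ -z "${JGI_LOGIN:-}" ] || [ -z "${JGI_PASSWORD:-}" ]; then
        echo "Set JGI_LOGIN and JGI_PASSWORD first" >&2
        return 1
    fi
    # Same request as the curl command above; --fail makes curl
    # return a non-zero status on HTTP error responses.
    curl --silent --fail 'https://signon.jgi.doe.gov/signon/create' \
        --data-urlencode "login=${JGI_LOGIN}" \
        --data-urlencode "password=${JGI_PASSWORD}" \
        -c "$cookie_file" > /dev/null || return 1
    # curl writes header comments even when no cookie was set, so look
    # for at least one real cookie line (HttpOnly cookies start with
    # "#HttpOnly_", all other cookie lines do not start with "#").
    grep -Eq '^(#HttpOnly_|[^#[:space:]])' "$cookie_file"
}
```

A possible call would be `export JGI_LOGIN=...; export JGI_PASSWORD=...; jgi_login cookies`, after which the later commands can keep using `-b cookies`.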

Download the list of all files

curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory?organism=PhytozomeV12' -b cookies > files.xml

Site address: https://genome.jgi.doe.gov

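Download files

files.xml records each file's size, storage path, md5 checksum, type, and so on.
For example, the entry below describes the CDS sequence file for Arabidopsis thaliana. Take the value of its url="..." attribute, replace "&amp;" with "&", prefix it with the site address https://genome.jgi.doe.gov, and download it with curl (remember to specify the cookie file).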

<file label="PhytozomeV12" filename="Athaliana_167_TAIR10.cds_primaryTranscriptOnly.fa.gz" size="10 MB" sizeInBytes="11041833" timestamp="Wed Jan 08 16:38:08 PST 2014" url="/portal/ext-api/downloads/get_tape_file?blocking=true&amp;url=/PhytozomeV12/download/_JAMO/585474407ded5e78cff8c47a/Athaliana_167_TAIR10.cds_primaryTranscriptOnly.fa.gz" project="" library="" md5="6085fd39ad3327c727838f9da4f4b222" fileType="Assembly" />
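
The conversion from an entry to a download URL can be sketched as a small shell function. This is only an illustration: the function name `entry_to_url` and the variable `BASE_URL` are made up for this example, and the sed pattern assumes that attribute values are wrapped in plain double quotes.

```shell
#!/usr/bin/env bash
# Sketch: turn one <file .../> entry from files.xml into a download URL.
BASE_URL='https://genome.jgi.doe.gov'

entry_to_url() {
    # $1 is the text of a single <file .../> element.
    local path
    # Extract the value of the url="..." attribute, then undo the XML
    # escaping of "&". In a sed replacement a bare & means "the matched
    # text", so it is written as \& to get a literal ampersand.
    path="$(printf '%s\n' "$1" \
        | sed -n 's/.* url="\([^"]*\)".*/\1/p' \
        | sed 's/&amp;/\&/g')"
    [ -n "$path" ] || return 1
    printf '%s%s\n' "$BASE_URL" "$path"
}
```

Passing the entry above to `entry_to_url` should print a URL that can be given directly to `curl ... -b cookies`.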

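Below is a test download of the Arabidopsis data files. Doing this by hand is still tedious for batch downloads;
you can look through files.xml and put the corresponding curl commands into a bash script to download in batch.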

curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/PhytozomeV12/download/_JAMO/585474407ded5e78cff8c47a/Athaliana_167_TAIR10.cds_primaryTranscriptOnly.fa.gz' -b cookies > Athaliana_167_TAIR10.cds_primaryTranscriptOnly.fa.gz

curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/PhytozomeV12/download/_JAMO/587b0adf7ded5e4229d885ab/Athaliana_447_TAIR10.fa.gz' -b cookies > Athaliana_447_TAIR10.fa.gz

curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/PhytozomeV12/download/_JAMO/587b0ade7ded5e4229d885aa/Athaliana_447_Araport11.protein_primaryTranscriptOnly.fa.gz' -b cookies > Athaliana_447_Araport11.protein_primaryTranscriptOnly.fa.gz

curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/PhytozomeV12/download/_JAMO/587b0ade7ded5e4229d885a8/Athaliana_447_Araport11.gene.gff3.gz' -b cookies > Athaliana_447_Araport11.gene.gff3.gz

curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/PhytozomeV12/download/_JAMO/587b0adb7ded5e4229d885a1/Athaliana_447_Araport11.cds_primaryTranscriptOnly.fa.gz' -b cookies > Athaliana_447_Araport11.cds_primaryTranscriptOnly.fa.gz
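
Instead of writing each curl command by hand, the commands can be generated from files.xml. The following is a rough sketch, not a tested downloader: the function names `list_downloads` and `download_all` are invented for this example, and the parsing assumes that every `<file ...>` element has `filename` and `url` attributes in double quotes and contains no literal `>` inside an attribute value.

```shell
#!/usr/bin/env bash
# Sketch: batch download of entries in files.xml whose filename matches
# an extended regular expression.
BASE_URL='https://genome.jgi.doe.gov'

# Print "filename<TAB>url" for every matching <file> entry.
list_downloads() {
    local xml="$1" pattern="$2" entry name path
    # grep -o prints each <file ...> element on its own line, even when
    # several elements share one line in the XML.
    grep -o '<file [^>]*>' "$xml" | while IFS= read -r entry; do
        name="$(printf '%s\n' "$entry" | sed -n 's/.* filename="\([^"]*\)".*/\1/p')"
        path="$(printf '%s\n' "$entry" | sed -n 's/.* url="\([^"]*\)".*/\1/p' | sed 's/&amp;/\&/g')"
        [ -n "$name" ] && [ -n "$path" ] || continue
        if printf '%s\n' "$name" | grep -Eq -- "$pattern"; then
            printf '%s\t%s%s\n' "$name" "$BASE_URL" "$path"
        fi
    done
}

# Download every matching file with the saved login cookie.
download_all() {
    local xml="$1" pattern="$2" cookie_file="${3:-cookies}" name url
    list_downloads "$xml" "$pattern" | while IFS=$'\t' read -r name url; do
        echo "Downloading $name" >&2
        curl --fail "$url" -b "$cookie_file" -o "$name" || echo "Failed: $name" >&2
    done
}
```

For example, `list_downloads files.xml '^Athaliana_.*\.gff3\.gz$'` previews what would be fetched, and `download_all files.xml '^Athaliana_' cookies` performs the downloads. Files are saved under the name given in files.xml, so two entries with the same filename would overwrite each other.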

Method 2 | Get JGI Genomes

This method is well suited to batch downloads.

Download

git clone https://hub.fastgit.org/guyleonard/get_jgi_genomes.git

Usage

Usage:
get_jgi_genomes [-u <username> -p <password>] | [-c <cookies>] [-f | -a | -P 12 | -m 3] (-i) (-l) (-A) (-C) (-g) (-t) (-q)

Required:
-u <username>
-p <password>
or
-c <cookie file>
Portal Choice:
-f Mycocosm aka fungi
-a Phycocosm aka algae
-P <version> PhytozomeV aka plants
-m <version> MetazomeV aka metazoans
Portal File Options:
-A get assembly
-C get CDS
-g get GFF
-t get transcripts
JGI Taxa ID:
-i <id> JGI ID of Genome Project
Other:
-l list only, no downloads

Download examples

# Log in:
./bin/get_jgi_genomes -u your.email@address.com -p password


# After logging in, get the list of all protein files from Mycocosm:
./bin/get_jgi_genomes -c signon.cookie -f -l

# After logging in, download all CDS files from Phycocosm:
./bin/get_jgi_genomes -c signon.cookie -a -C

# After logging in, download all assembly files from Phytozome V12:
./bin/get_jgi_genomes -c signon.cookie -P 12 -A
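
To fetch several file types from one portal, the examples above can be repeated once per file option. The wrapper below is a sketch under assumptions: `fetch_phytozome`, `GET_JGI`, and `DRY_RUN` are names invented here, and only the flags listed in the usage text (`-c`, `-P`, `-A`, `-C`, `-g`, `-t`) are passed through. Whether the tool accepts several file options in a single call is not stated in the usage text, so each option gets its own run.

```shell
#!/usr/bin/env bash
# Sketch: run get_jgi_genomes once per requested file option.
# GET_JGI and DRY_RUN are hypothetical names used only in this example.
GET_JGI="${GET_JGI:-./bin/get_jgi_genomes}"

fetch_phytozome() {
    local version="$1" cookie_file="$2" flag
    shift 2
    for flag in "$@"; do
        # Accept only the file options shown in the usage text.
        case "$flag" in
            -A|-C|-g|-t) ;;
            *) echo "Unsupported file option: $flag" >&2; return 1 ;;
        esac
        if [ -n "${DRY_RUN:-}" ]; then
            # Print the command instead of running it.
            echo "$GET_JGI -c $cookie_file -P $version $flag"
        else
            "$GET_JGI" -c "$cookie_file" -P "$version" "$flag" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 fetch_phytozome 12 signon.cookie -A -g` prints the two commands without contacting JGI; dropping `DRY_RUN=1` executes them.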


Method 3 | jgi-query

This is a script written in Python. If you are interested, its usage information is shown below.

Download

git clone https://github.com/glarue/jgi-query.git

Usage

usage: jgi-query.py [-h] [-x [XML]] [-c] [-s] [-f] [-u] [-n RETRY_N]
[-l logfile] [-r REGEX] [-a]
[organism_abbreviation]

This script will list and retrieve files from JGI using the curl API. It will
return a list of all files available for download for a given query organism.

positional arguments:
organism_abbreviation
organism name formatted per JGI's abbreviation. For
example, 'Nematostella vectensis' is abbreviated by
JGI as 'Nemve1'. The appropriate abbreviation may be
found by searching for the organism on JGI; the name
used in the URL of the 'Info' page for that organism
is the correct abbreviation. The full URL may also be
used for this argument (default: None)

optional arguments:
-h, --help show this help message and exit
-x [XML], --xml [XML]
specify a local xml file for the query instead of
retrieving a new copy from JGI (default: None)
-c, --configure initiate configuration dialog to overwrite existing
user/password configuration (default: False)
-s, --syntax_help
-f, --filter_files filter organism results by config categories instead
of reporting all files listed by JGI for the query
(work in progress) (default: False)
-u, --usage print verbose usage information and exit (default:
False)
-n RETRY_N, --retry_n RETRY_N
number of times to retry downloading files with errors
(0 to skip such files) (default: 4)
-l logfile, --load_failed logfile
retry downloading from URLs listed in log file
(default: None)
-r REGEX, --regex REGEX
Regex pattern to use to auto-select and download files
(no interactive prompt) (default: None)
-a, --all Auto-select and download all files for query (no
interactive prompt) (default: False)
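
Based on the options in the help text above, a non-interactive call might look like the sketch below. Everything outside the help text is an assumption: the function name `run_jgi_query`, the `DRY_RUN` switch, the use of `python` as the interpreter, and running from inside the cloned jgi-query directory with the user/password configuration already done. How the `-r` pattern is matched against file names is not described in the help text, so it is worth testing a pattern on a small organism first.

```shell
#!/usr/bin/env bash
# Sketch: call jgi-query.py without the interactive prompt.
# DRY_RUN is a hypothetical switch used only in this example.

run_jgi_query() {
    local organism="$1" regex="$2" retries="${3:-4}"
    # -r selects files by regex, -n sets the number of download retries;
    # the organism abbreviation is the positional argument.
    local cmd=(python jgi-query.py -r "$regex" -n "$retries" "$organism")
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For instance, `DRY_RUN=1 run_jgi_query Nemve1 gff3` only prints the command, which makes it easy to check the arguments before starting a long download.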

Method 4

Click the link to go to the page.