On the crunchr homepage, click on "Compute," and then click "Create Container." You can name your container whatever you would like. the Domain does not need to change. For "Compute Image," select "PMAP Jupyter Python 2.7" from the drop-down. then click "Create."
On the left-hand side you should see folders, like in the image below.
You have two options:
OR
The following steps will show you what code to run in order to access your data. You can copy and paste the code from here to your own notebook, or run it directly in this notebook if you opened it in crunchr. Just make sure to make any necessary changes (database name, your jhed, etc).
The following code establishes a connection with the CAMP_PMCoe_Projection database, for demonstration purposes. Make sure to confirm the name of your database and update that field accordingly.
#Libraries needed
import sqlalchemy
import urllib.parse
import pandas as pd
import getpass
from SciServer import Authentication
myUserName = Authentication.getKeystoneUserWithToken(Authentication.getToken()).userName
passwd = getpass.getpass('Password for ' + myUserName + ': ')
user = "win\\" + myUserName
#SQL Driver
driver="FreeTDS"
tds_ver="8.0"
# Database
host_ip="ESMPMDBPR4.WIN.AD.JHU.EDU" # Update this accordingly
db_port="1433"
db="CAMP_PMCoe_Projection" # Update this accordingly
# Create Connection String
conn_str=("DRIVER={};Server={};PORT={};DATABASE={};UID={};PWD={};TDS_VERSION={}"
.format(driver, host_ip, db_port, db, user, passwd, tds_ver)
)
# Create Engine
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect=' +
urllib.parse.quote(conn_str)
)
Run the following code to retrieve a list of the table names in your database.
query = "SELECT name FROM SYS.tables"
df = pd.read_sql_query(query, engine)
df
name | |
---|---|
0 | patients |
1 | encounters |
2 | labs |
3 | meds |
4 | problemlist |
5 | procedures |
6 | symptoms |
7 | vitals_BP |
8 | vitals_height |
9 | vitals_pulse |
10 | vitals_respiration |
11 | vitals_temperature |
12 | vitals_weight |
13 | sysdiagrams |
The following code will put a table from your database into a dataframe. In this example, we will put the encounters table into a dataframe we are calling df2. You can change the object name (df2 --> df3, etc) and the table name to get all your tables into dataframes.
query="SELECT * FROM encounters"
df2 = pd.read_sql_query(query, engine)
df2
osler_id | encounter_id | encounter_type | encounter_date | |
---|---|---|---|---|
0 | 5303550b-8ed2-42fd-885a-d32b308b05f3 | 17938 | Office Visit | 2015-10-10 |
1 | 5303550b-8ed2-42fd-885a-d32b308b05f3 | 142706 | Office Visit | 2016-01-24 |
2 | 5303550b-8ed2-42fd-885a-d32b308b05f3 | 571465 | Office Visit | 2017-03-19 |
3 | 5303550b-8ed2-42fd-885a-d32b308b05f3 | 432470 | Office Visit | 2016-10-22 |
4 | 5303550b-8ed2-42fd-885a-d32b308b05f3 | 410795 | Office Visit | 2016-10-01 |
... | ... | ... | ... | ... |
753479 | ebd052dd-14a7-48dd-9130-f3e7f43b9038 | 270512 | Office Visit | 2016-07-04 |
753480 | ebd052dd-14a7-48dd-9130-f3e7f43b9038 | 150692 | Office Visit | 2016-03-09 |
753481 | ebd052dd-14a7-48dd-9130-f3e7f43b9038 | 567824 | Clinical Support | 2017-04-16 |
753482 | ebd052dd-14a7-48dd-9130-f3e7f43b9038 | 548100 | Office Visit | 2017-04-03 |
753483 | ebd052dd-14a7-48dd-9130-f3e7f43b9038 | 534937 | Office Visit | 2017-03-26 |
753484 rows × 4 columns
Please note that if you are proficient with SQL, you can put any SQL query in place of the queries we have above. This is helpful if you do not want all rows or all columns of a table.
There are many more resources within this cookbook which show you additional ways to work with SQL, and different things you can do with your data once you have it in dataframes. The link below is a particularly good follow-up to this guide.