!pip install spacy

Successfully installed blis-0.2.4 cymem-2.0.2 murmurhash-1.0.2 plac-0.9.6 preshed-2.0.1 spacy-2.1.3 srsly-0.0.5 thinc-7.0.4 wasabi-0.2.1


#Imports that could be useful in this homework:
import re
import nltk
import spacy
from collections import defaultdict

text = '''
197067530 | TWMH | 20930664 | | 908831 | 1/20/1998 12:00:00 AM | INCARCERATED UMBILICAL HERNIA | Signed | DIS | Admission Date: 6/7/1998 Report Status: Signed

Discharge Date: 5/25/1998
PRINCIPAL DIAGNOSIS: INCARCERATED UMBILICAL HERNIA.
HISTORY: Jewell Gauthreaux is a 78 year old woman with a complex past
medical history including coronary artery disease with a
history of MIs times two in the past, a history of DVT back in
1970 , hypertension, rheumatoid arthritis, gout and history of
atrial fibrillation and atrial flutter as well as onset adult
diabetes mellitus. She presented to the Rial Community Hospital on
the day of admission complaining of an umbilical bulge over the
past several weeks. This umbilical bulge had been increasing
somewhat in size, but had not bothered her and was always
reducible. However, over the preceding weekend it became
incarcerated and then became somewhat painful. It was not
associated with any nausea or vomiting and she reported that she
was having normal bowel movements even in the face of this problem.
She presented initially to the Sey Al Skaez County Health Center and was
admitted with the diagnosis of incarcerated umbilical hernia.
PAST MEDICAL HISTORY:
1. Coronary artery disease with a history of MI times two in the
past with a recent echocardiogram on 11/8 showing an EF of
55-60%.
2. History of DVT in 1970.
3. Hypertension.
4. Rheumatoid arthritis.
5. Gout.
6. Atrial fibrillation and atrial flutter on Coumadin.
7. Adult onset diabetes mellitus.
PAST SURGICAL HISTORY:
1. Status post appendectomy.
2. Status post mitral valve replacement with St. Jude valve.
3. Left hip fracture repair.
4. Status post mitral valve commissurotomy in 1955.
MEDICATIONS ON ADMISSION: Lasix 80 mg a day, sublingual
nitroglycerin p.r.n., Propafenone 225 mg
t.i.d., Lopressor 150 mg b.i.d. , Lisinopril 10 mg a day and
Micronase 10 mg b.i.d., Isordil 40 mg t.i.d., Coumadin 5 mg a day
with 2-1/2 mg every Sunday.
ALLERGIES:: She is allergic to aspirin and penicillin.
PHYSICAL EXAMINATION: She is an extremely pleasant elderly woman
in no acute distress. HEENT - showed
extraocular movements intact. Pupils equally round and reactive to
light. NECK - supple. HEART - regular rhythm. LUNGS - clear.
ABDOMEN - soft, nontender, nondistended with approximately 1.5 cm
in diameter umbilical hernia to the left of her umbilicus. This
hernia was somewhat tender to palpation , but showed no overlying
erythema or evidence of necrosis. She had normal bowel sounds.
EXTREMITIES - no clubbing , cyanosis or edema. NEUROLOGIC - intact.
Preoperative laboratory showed BUN of 35, creatinine 1.3,
hematocrit 42.0, white count 6.8, coagulation studies within normal
limits.
HOSPITAL COURSE: Ms. Peick was admitted to the Lum Hospital on the day of admission with the
diagnosis of incarcerated umbilical hernia. Because of her history
of coumadinization for both her mitral valve as well as her atrial
fibrillation it was felt that it would be necessary to hospitalize
her , hold her Coumadin and heparinize her until it was possible to
do her surgery. However, upon arrival here her admission INR was
noted to be subtherapeutic at 1.3 and she was , therefore ,
immediately started on heparin. The Cardiology Service was
consulted regarding her significant past cardiac history and
recommended an echocardiogram to be performed preoperatively. This
echo was done on 9/6/98 demonstrating an EF of 55% with an
abnormal subdural wall motion , trace areas of aortic insufficiency
and mildly increased right ventricular size and the artificial
mitral valve was noted to be functioning well. The cardiology felt
that in the face of this largely unchanged echocardiogram that
showed stable and should go to the operating room for repair of
umbilical hernia. On 7/1/98 the patient was taken to the
operating room and underwent umbilical hernia repair with primary
reapproximation of the fascia. The procedure was done without any
complications. She was extubated and transferred in stable
condition to the postoperative recovery area and observed on the
floor. She was immediately restarted on her Coumadin, as well as
her heparin, which she continued for the next three days
postoperatively. The patient did quite well with gradual up in her
INR to a greater than 2 level and she was discharged to home on
4/23/98 on a regular dose.
DISCHARGE MEDICATIONS: Tylenol 650 mg p.o. q four hours p.r.n.
headache , Estrogen cream topical which is
applied to her vagina because of her atrophic vaginitis. Colace
100 mg p.o. b.i.d. , Lasix 80 mg p.o. q.d. , Micronase 10 mg p.o.
b.i.d. , Isordil 40 mg p.o. t.i.d. , lisinopril 10 mg p.o. q.d. ,
Lopressor 150 mg p.o. b.i.d. , Percocet 1-2 tabs q 3-4 hours p.r.n.
pain , Propafenone 225 mg p.o. t.i.d. and Coumadin 5 mg p.o. q.d.
with 7.5 mg take every Sunday.
DISPOSITION: To home. She will follow-up at the Oxtri- Hospital one week after discharge with follow-up with
primary medical doctor in one week after discharge.
Dictated By: MARK BEALER , M.D. GN70
Attending: DON N. FRITZLER , M.D. AI17  TV067/7242
Batch: 73284 Index No. YSQMAY6DNL D: 1/23/98
T: 6/22/98
'''


drugs = set(['Coumadin', 'Lasix'])
drug_counts = defaultdict(int)
tokens = text.split()

for t in tokens:
    if t in drugs:  #if the current token is a drug
        drug_counts[t]+=1

drug_counts

defaultdict(int, {'Lasix': 2, 'Coumadin': 3})
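Because `text.split()` only breaks on whitespace, punctuation stays attached to tokens. As a result, the count of 3 for Coumadin misses "Coumadin." and "Coumadin," in the note. One alternative tokenization separates punctuation before the dictionary lookup. The sketch below assumes a token is either a run of word characters or a single punctuation mark. `TOKEN_RE` and `count_drugs_regex_tokens` are illustrative names, not part of the assignment.

```python
import re
from collections import Counter

# Assumed tokenization: a run of word characters, or any single character
# that is neither a word character nor whitespace (i.e. punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def count_drugs_regex_tokens(note, drugs):
    """Count exact-case dictionary hits over regex tokens (hypothetical helper)."""
    tokens = TOKEN_RE.findall(note)
    return Counter(tok for tok in tokens if tok in drugs)

sample = "Lasix 80 mg a day, Coumadin 5 mg. Restarted on her Coumadin, as well as heparin."
counts = count_drugs_regex_tokens(sample, {'Coumadin', 'Lasix'})
print(counts)  # Counter({'Coumadin': 2, 'Lasix': 1})
```

Run on the full `text` with the same two-drug set, this tokenization should report 5 for Coumadin and 2 for Lasix. The lowercase "coumadinization" still does not match, because it stays a single token.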


#MODIFY THIS CODE TO ADD MULTIPLE DRUGS, AND TAKE INTO ACCOUNT PUNCTUATION AT THE END OF A TOKEN: "Lasix,"
drugs = set(['Coumadin', 'Lasix'])
drug_counts = defaultdict(int)
tokens = text.split()

for t in tokens:
    if t in drugs:  #if the current token is a drug
        drug_counts[t]+=1

drug_counts
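For the modification requested above, one option is to keep `text.split()` and normalize each token instead. Strip leading and trailing punctuation with `str.strip(string.punctuation)`, lowercase the result, and look it up in a larger lowercase dictionary. The drug set below simply lists medications named in this note. `count_drugs` is a hypothetical helper name.

```python
import string
from collections import defaultdict

# Lowercase dictionary built from medications named in the discharge summary.
drugs = {'coumadin', 'heparin', 'lasix', 'nitroglycerin', 'propafenone',
         'lopressor', 'lisinopril', 'micronase', 'isordil', 'tylenol',
         'colace', 'percocet', 'estrogen'}

def count_drugs(note, drugs):
    """Whitespace-split, then strip surrounding punctuation and lowercase each token."""
    counts = defaultdict(int)
    for tok in note.split():
        norm = tok.strip(string.punctuation).lower()
        if norm in drugs:
            counts[norm] += 1
    return dict(counts)

sample = "Lasix 80 mg a day, Lisinopril 10 mg. On Coumadin, then held her Coumadin."
print(count_drugs(sample, drugs))  # {'lasix': 1, 'lisinopril': 1, 'coumadin': 2}
```

Lowercasing merges "Coumadin" and "coumadin" into one key. Because the whole stripped token must match, "coumadinization" is still left out.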


print(text.lower())


import re  # the regular expression library
# \d is the character class for digits (0-9); for ASCII text, [0-9] is equivalent
# {m,n} sets how many times to match: {1,2} means at least once and at most twice
re.findall(r'\d{1,2}/\d{1,2}/\d{2}', text)


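Running the bounded quantifier `{1,2}` on a toy string (made up for illustration) shows how it behaves. It is greedy: it takes two digits whenever it can, so it splits longer digit runs into pieces. Adding word boundaries `\b` on both sides rejects the longer run instead.

```python
import re

toy = '7 42 123'
# Greedy {1,2}: the three-digit run is split into '12' and '3'.
split_runs = re.findall(r'\d{1,2}', toy)
print(split_runs)  # ['7', '42', '12', '3']
# With \b on both sides, '123' cannot match at all.
whole_numbers = re.findall(r'\b\d{1,2}\b', toy)
print(whole_numbers)  # ['7', '42']
```

The same greediness explains the date pattern above. Its final `\d{2}` matches only the first two digits of a four-digit year.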


text2 = '''On 02-12-1998 the the patient had an appendectomy.  The patient did quite well with gradual up in her
inr to a greater than 2 level and she was discharged to home on 4-23-98 on a regular dose.'''


re.findall(r'\d{1,2}/\d{1,2}/\d{2}', text2)  # returns []: text2 writes its dates with hyphens, not slashes


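The slash-only pattern finds nothing in `text2`, whose dates use hyphens. On `text` it also stops partway through four-digit years, so "1/20/1998" comes back as "1/20/19". The sketch below assumes month/day/year dates with one consistent separator ("/" or "-") and a two- or four-digit year. `DATE_RE` and `find_dates` are illustrative names.

```python
import re

# ([/-]) captures the first separator; \1 forces the second one to match it.
# (?:\d{4}|\d{2}) tries a four-digit year first, and \b rules out partial years.
DATE_RE = re.compile(r'\b\d{1,2}([/-])\d{1,2}\1(?:\d{4}|\d{2})\b')

def find_dates(s):
    """Return full date strings (findall would return only the captured separator)."""
    return [m.group(0) for m in DATE_RE.finditer(s)]

sample = ("On 02-12-1998 the patient had an appendectomy. "
          "Echo done on 9/6/98; discharged on 4-23-98. Seen 1/2-98.")
print(find_dates(sample))  # ['02-12-1998', '9/6/98', '4-23-98']
```

The backreference makes the mixed-separator string "1/2-98" fail. `finditer` is used because `findall` returns group contents whenever a pattern contains a capturing group.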


re.findall('(lasix|coumadin|percocet|tylenol)', text, re.IGNORECASE)

['Coumadin',
 'Lasix',
 'Coumadin',
 'coumadin',
 'Coumadin',
 'Coumadin',
 'Tylenol',
 'Lasix',
 'Percocet',
 'Coumadin']
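The fourth hit, "coumadin", comes from inside "coumadinization", because the alternation matches anywhere in the string. The sketch below uses the same four drug names and a made-up sample sentence. It wraps the alternation in word boundaries and a non-capturing group, which keeps only whole-word mentions.

```python
import re

# \b on both sides requires a whole-word match; (?:...) groups without capturing.
DRUG_RE = re.compile(r'\b(?:lasix|coumadin|percocet|tylenol)\b', re.IGNORECASE)

sample = "History of coumadinization; restarted on Coumadin and Lasix."
hits = DRUG_RE.findall(sample)
print(hits)  # ['Coumadin', 'Lasix']
```

Applied to `text`, this should return the list above minus the "coumadin" taken from "coumadinization".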


re.findall(r"([0-9]{1,5})(\s*)(mg|cc|ml)", text)

[('80', ' ', 'mg'),
 ('225', ' ', 'mg'),
 ('150', ' ', 'mg'),
 ('10', ' ', 'mg'),
 ('10', ' ', 'mg'),
 ('40', ' ', 'mg'),
 ('5', ' ', 'mg'),
 ('2', ' ', 'mg'),
 ('650', ' ', 'mg'),
 ('100', ' ', 'mg'),
 ('80', ' ', 'mg'),
 ('10', ' ', 'mg'),
 ('40', ' ', 'mg'),
 ('10', ' ', 'mg'),
 ('150', ' ', 'mg'),
 ('225', ' ', 'mg'),
 ('5', ' ', 'mg'),
 ('5', ' ', 'mg')]
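Two details in this output are worth a second look. First, "2-1/2 mg" and "7.5 mg" come back as "2" and "5", because `[0-9]{1,5}` cannot cross a "/" or ".". Second, the `(\s*)` group captures whitespace that is never used. The sketch below allows an optional decimal part and drops the whitespace group. It is a sketch only: fractions such as "2-1/2" are still not handled, and `DOSE_RE` is an illustrative name.

```python
import re

# A number with an optional decimal part, optional whitespace, then a unit.
DOSE_RE = re.compile(r'(\d+(?:\.\d+)?)\s*(mg|cc|ml)\b', re.IGNORECASE)

sample = "Coumadin 5 mg p.o. q.d. with 7.5 mg every Sunday; Tylenol 650mg p.r.n."
doses = DOSE_RE.findall(sample)
print(doses)  # [('5', 'mg'), ('7.5', 'mg'), ('650', 'mg')]
```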


#REGULAR EXPRESSION HERE


#REGULAR EXPRESSION HERE

Homework 2: Matching and Regular Expressions

Task 1: Dictionary Matching

Drug dictionary:

Drug dictionary different tokenization:

Drug dictionary string modifications:

Task 2: Regular Expressions:

Task 3: Section Breaking:
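One section-breaking heuristic that fits this note treats an ALL-CAPS phrase at the start of a line, followed by one or more colons, as a header. Examples are "PAST MEDICAL HISTORY:" and "ALLERGIES::". The sketch below assumes that convention. `HEADER_RE` and `split_sections` are hypothetical names.

```python
import re

# Assumed header format: two or more capitals/spaces at line start, then one or more colons.
HEADER_RE = re.compile(r'^([A-Z][A-Z ]*[A-Z]):+', re.MULTILINE)

def split_sections(note):
    """Map each header to the text between it and the next header."""
    sections = {}
    matches = list(HEADER_RE.finditer(note))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[m.group(1)] = note[m.end():end].strip()
    return sections

sample = ("PRINCIPAL DIAGNOSIS: INCARCERATED UMBILICAL HERNIA.\n"
          "HISTORY: Jewell Gauthreaux is a 78 year old woman.\n"
          "ALLERGIES:: She is allergic to aspirin and penicillin.\n"
          "DISPOSITION: To home.\n")
for header, body in split_sections(sample).items():
    print(header, '->', body)
```

Requiring at least two capitals keeps the transcription line "T: 6/22/98" from becoming a header. Labels without a colon, such as "ABDOMEN - soft", stay inside their section. A header that appears twice would overwrite its earlier entry in the dictionary.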