Investigation of the optimal timing of treatment change to maximize the delay of onset mucoid pseudomonas aeruginosa pulmonary infection in Pediatric Cystic Fibrosis Patients

Title	Investigation of the optimal timing of treatment change to maximize the delay of onset mucoid pseudomonas aeruginosa pulmonary infection in Pediatric Cystic Fibrosis Patients
Publication Type	dissertation
School or College	College of Pharmacy
Department	Pharmacotherapy
Author	Jiao, Tianze
Date	2017
Description	Cystic Fibrosis (CF) is the most common life-shortening autosomal recessive disorder. Those patients who have CF suffer from multiple comorbidities. Nearly 85% of the deaths related to CF are caused by lung disease. CF lung disease begins early in life with inflammation, impaired mucociliary clearance, and initial airway colonization by pathogens; it then progresses to chronic infection of the airways. To treat the continuous deterioration of lung function, CF patients need to use lung maintenance therapies continuously. These treatments are applied to patients for more than 30 years on average. However, the majority of evidence was identified using short-term follow-ups (less than 1 year). Moreover, no guidelines suggest when a treatment change is needed, nor do they suggest the order of prescribing those treatments. Therefore, a retrospective observational study was conducted using a national patient registry, the Cystic Fibrosis Foundation Patient Registry (CFFPR). By emulating randomized clinical trials (RCTs), this study investigated the treatment change pattern and the causality between suboptimal treatment status and the time to delay in acquisition of mucoid Pseudomonas aeruginosa pulmonary infection (mucoid PaPI). A cohort of pediatric CF patients (n=4,970) who were diagnosed with nonmucoid PaPI before mucoid PaPI during 2006-2011 was identified. Those patients were young, healthy, and received multiple chronic treatments only at the baseline. An instrument that indicated when the suboptimal treatment status has been achieved and a rational treatment change is needed was successfully generated by including demographic characteristics, comorbidities, clinical signals, and treatment histories. According to various thresholds of the instrument, which steered the decision of treatment change, 25 regimes were built. Each patient was hypothetically randomized to follow each one of 25 regimes independently. 
A fixed parameterization of the dynamic logistic marginal structural model with the constant-time hazard was applied to investigate the effectiveness of following each one of the 25 regimes. Using the effect of following one regime as the reference, if a physician changed treatment and was not following any regime, it would cause 17% more hazard of developing mucoid PaPI in his/her patient, during the 6-year follow-up. The hazard ratio ranged from 0.98 to 1.07 for other regimes. To summarize, for a physician, changing treatment without following any regime caused the worst outcome. The differences of treatment effect were trivial for the same patient who followed varied regimes to receive treatment. To achieve a better outcome, a physician should follow a regime, which is, perhaps, the optimal one, to change lung maintenance therapies, prudently prescribing an additional treatment from one of the three treatment classes: inhaled antibiotic, mucolytic, or anti-inflammatory.
Type	Text
Publisher	University of Utah
Subject	Pharmaceutical sciences; Epidemiology; Biostatistics
Dissertation Name	Doctor of Philosophy
Language	eng
Rights Management	© Tianze Jiao
Format	application/pdf
Format Medium	application/pdf
ARK	ark:/87278/s69k8wtq
Setname	ir_etd
ID	1440272
OCR Text	INVESTIGATION OF THE OPTIMAL TIMING OF TREATMENT CHANGE TO MAXIMIZE THE DELAY OF ONSET MUCOID PSEUDOMONAS AERUGINOSA PULMONARY INFECTION IN PEDIATRIC CYSTIC FIBROSIS PATIENTS

by Tianze Jiao

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Pharmacotherapy Outcomes Research and Health Policy

Department of Pharmacotherapy
The University of Utah
December 2017

Copyright © Tianze Jiao 2017
All Rights Reserved

The University of Utah Graduate School

STATEMENT OF DISSERTATION APPROVAL

The dissertation of Tianze Jiao has been approved by the following supervisory committee members:

Diana I. Brixner, Chair (date approved: June 2nd, 2017)
Theodore G. Liou, Member (date approved: June 2nd, 2017)
Vanessa Stevens, Member (date approved: June 2nd, 2017)
David C. Young, Member (date approved: June 2nd, 2017)
Yue Zhang, Member (date approved: June 2nd, 2017)

and by Karen M. Gunning, Chair/Dean of the Department/College/School of Pharmacotherapy, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

Cystic Fibrosis (CF) is the most common life-shortening autosomal recessive disorder. Those patients who have CF suffer from multiple comorbidities. Nearly 85% of the deaths related to CF are caused by lung disease. CF lung disease begins early in life with inflammation, impaired mucociliary clearance, and initial airway colonization by pathogens; it then progresses to chronic infection of the airways. To treat the continuous deterioration of lung function, CF patients need to use lung maintenance therapies continuously. These treatments are applied to patients for more than 30 years on average. However, the majority of evidence was identified using short-term follow-ups (less than 1 year). Moreover, no guidelines suggest when a treatment change is needed, nor do they suggest the order of prescribing those treatments.
Therefore, a retrospective observational study was conducted using a national patient registry, the Cystic Fibrosis Foundation Patient Registry (CFFPR). By emulating randomized clinical trials (RCTs), this study investigated the treatment change pattern and the causality between suboptimal treatment status and the time to delay in acquisition of mucoid Pseudomonas aeruginosa pulmonary infection (mucoid PaPI). A cohort of pediatric CF patients (n=4,970) who were diagnosed with nonmucoid PaPI before mucoid PaPI during 2006-2011 was identified. Those patients were young, healthy, and received multiple chronic treatments only at the baseline. An instrument that indicated when the suboptimal treatment status has been achieved and a rational treatment change is needed was successfully generated by including demographic characteristics, comorbidities, clinical signals, and treatment histories. According to various thresholds of the instrument, which steered the decision of treatment change, 25 regimes were built. Each patient was hypothetically randomized to follow each one of 25 regimes independently. A fixed parameterization of the dynamic logistic marginal structural model with the constant-time hazard was applied to investigate the effectiveness of following each one of the 25 regimes. Using the effect of following one regime as the reference, if a physician changed treatment and was not following any regime, it would cause 17% more hazard of developing mucoid PaPI in his/her patient, during the 6-year follow-up. The hazard ratio ranged from 0.98 to 1.07 for other regimes. To summarize, for a physician, changing treatment without following any regime caused the worst outcome. The differences of treatment effect were trivial for the same patient who followed varied regimes to receive treatment. 
To achieve a better outcome, a physician should follow a regime, which is, perhaps, the optimal one, to change lung maintenance therapies, prudently prescribing an additional treatment from one of the three treatment classes: inhaled antibiotic, mucolytic, or anti-inflammatory.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS

Chapters

1. EXECUTIVE SUMMARY

2. BACKGROUND AND SIGNIFICANCE
  2.1 Cystic Fibrosis and Clinical Issues in its Management
    2.1.1 Pathophysiology and Incidence Rate
    2.1.2 Diagnosis and Symptoms
    2.1.3 Current Treatments
    2.1.4 Health Resource Utilization and Cost
  2.2 Pulmonary Infection
    2.2.1 Pseudomonas aeruginosa
    2.2.2 Other Infections
  2.3 Chronic Medications for Maintaining Lung Health
    2.3.1 Inhaled Antibiotics
    2.3.2 Other Lung Health Maintenance Medication
    2.3.3 Other Medications
    2.3.4 The Dilemma of Maintaining Lung Health
    2.3.5 Treatment Classifications
  2.4 Signals for Clinical Decisions
    2.4.1 Predicted Normal FEV1%
    2.4.2 Pulmonary Exacerbation
  2.5 Prescribing Decisions
    2.5.1 Internal Factors that Influence Prescribing Decision-making
    2.5.2 External Factors that Influence Prescribing Decision-making
    2.5.3 Uniqueness on Antibiotics and Lung Treatment
    2.5.4 Treatment Change
  2.6 Dynamic Treatment Regimes and Common Applications
  2.7 Causal Inference
  2.8 Significance

3. OBJECTIVES AND SPECIFIC AIMS
  3.1 Objectives
  3.2 Specific Aims

4. METHODS
  4.1 Data Sources
  4.2 Study Design and Population
    4.2.1 Study Design
    4.2.2 Inclusion and Exclusion Criteria
  4.3 Exposure, Covariate, and Outcome Assessment
    4.3.1 Rational Treatment Change
    4.3.2 Exposure Assessment
    4.3.3 Outcome Assessment
    4.3.4 Covariate Assessment
  4.4 Methods
  4.5 Variable Selection
  4.6 Statistical Analyses
  4.7 Data Reformatting
  4.8 Missing Data
  4.9 Assumptions
    4.9.1 Assumption 1
    4.9.2 Assumption 2
    4.9.3 Assumption 3
    4.9.4 Assumption 4

5. TREATMENT CHANGE PATTERN
  5.1 Data Management
    5.1.1 Assumption on Unclear Pseudomonas aeruginosa Culture Test Results
    5.1.2 Assumptions on Race to Predict Normal Lung Function Given Other Demographic Characteristics
    5.1.3 Other Exclusion Criteria
    5.1.4 Assumptions on Recovered Lung Function after Hospitalization
    5.1.5 Assumptions about Imputing Height and Weight
  5.2 Results
    5.2.1 Baseline Characteristics of Patients in the Cohort
    5.2.2 Baseline Characteristics of the Subgroup Patients in the Cohort
    5.2.3 Competing Risks of Death by Calendar Year
    5.2.4 Treatment Combinations and Treatment Change Patterns
  5.3 Discussions
    5.3.1 Summary Regarding Assumptions
    5.3.2 Baseline Characteristics of Patients in the Cohort
    5.3.3 Baseline Characteristics of the Subgroup Patients in the Cohort
    5.3.4 Competing Risks of Death by Calendar Year
    5.3.5 Treatment Combinations and Treatment Change Patterns
  5.4 Conclusions

6. PREDICTIVE MODEL
  6.1 Results
    6.1.1 Independent Variable Identification
    6.1.2 Variable Selection by Elastic Net
    6.1.3 Calculating the Predicted Probability of Having Rational Treatment Change and Identifying Strategies for Treatment Change According to Different Thresholds
  6.2 Discussions
    6.2.1 Data Management of Missing Values
    6.2.2 Strengths of the Predictive Model
    6.2.3 Limitations of the Predictive Model
  6.3 Conclusions

7. OPTIMAL TREATMENT REGIME
  7.1 Results
    7.1.1 Creating the Augmented Datasets
    7.1.2 Variable Selection for the Weights
    7.1.3 Calculating the Weights
    7.1.4 Influence of Applying Different Methods to Calculate Weights
    7.1.5 Results of Applying Different Models
  7.2 Discussions
    7.2.1 Strengths
    7.2.2 Limitations
  7.3 Applications
    7.3.1 Steering the Design of RCTs
    7.3.2 Directing the Clinical Practice
    7.3.3 Supporting the Design of Value-based Drug Formulary
  7.4 Conclusions

8. OVERALL CONCLUSIONS AND IMPACTS

Appendices

A. EXPLORATORY ANALYSIS OF INVESTIGATING THE QUALITY OF DATA IN CFFPR
B. EXPLORATORY ANALYSIS OF INVESTIGATING THE RELATIONSHIP BETWEEN DRUG APPROVAL AND IRRATIONAL TREATMENT CHANGE
C. EXPLORATORY ANALYSIS OF INVESTIGATING THE IMPACT OF DIFFERENT MEASUREMENTS ON THE NUMBER OF VARIABLES THAT WOULD BE SELECTED BY ELASTIC NET
D. EXPLORATORY ANALYSIS OF INVESTIGATING THE RELATIONSHIP BETWEEN FREQUENCY OF VISIT AND DETERIORATION OF LUNG FUNCTION
E. EXPLORATORY ANALYSIS OF INVESTIGATING THE INFLUENCES OF USING DIFFERENT METHODS TO DEFINE INDEX DATE ON BASELINE VARIABLES AND CLINICAL OUTCOMES
F. DATA MANAGEMENT OF MISSING VALUES

REFERENCES

ACKNOWLEDGEMENTS

There are many people who have contributed to the completion of this research intellectually, logistically, or through the provision of moral and emotional support, all of whom deserve my deepest gratitude.

First, I would like to thank my advisor and committee chair, Dr. Diana Brixner, who provided guidance and support throughout the journey of my graduate career. Aside from hours spent discussing the complexity of my project and answering varied questions, Diana has been an endless source of knowledge, advice, and inspiration. Without her, this project certainly would not have been possible.

I would like to express my sincere appreciation to Dr. Theodore Liou for acting as a member of my committee and a mentor in the therapeutic area of Cystic Fibrosis (CF). His subject matter knowledge and critical examination of my project substantially elevated the quality of the work and shortened the time needed to translate research results into real-world practice. Special thanks are due to Dr. Yue Zhang for providing me with the opportunity to study advanced methods in causal inference, for directing me to solve the endless statistical issues in this project, and for helping me to be a responsible scientist who makes sure the facts are present. I would also like to thank Dr. Vanessa Stevens and Dr. David Young for their contributions of time, useful suggestions, and expertise in their roles as members of my dissertation committee.

Several biostatisticians in the Causal Inference Group at the University of Utah were instrumental in leading me towards the causal inference area, including Dr. Marlene Egger and Dr. Tom Greene. I also owe a debt of gratitude to Dr.
Brandon Bellows and Xiangyang Ye from the Pharmacotherapy Outcomes Research Center, for their help in the data cleaning and management process, together with suggestions and discussions during my Ph.D. journey.

I would like to thank the Cystic Fibrosis Foundation for the use of CF Foundation Patient Registry data to conduct this study. Additionally, I would like to thank the patients, care providers, and clinic coordinators at CF Centers throughout the United States for their contributions to the CF Foundation Patient Registry. With their endeavors, time, and contributions, I was able to access this fabulous database, which made this complex research possible.

My fellow doctoral students, past and present, are also deserving of many thanks. In particular: Mukul Singhal and Junjie Ma for their continuous friendship, moral support, and fabulous jokes, and Yan Cheng for the numerous methodological discussions and references during the evolution of my dissertation.

Last but not least, I would like to express my deepest appreciation to my family and friends, without whom none of this would have been possible. In particular, to my parents, Zhihua and Lifeng, who impressed upon me the importance of education and the meaning of being skeptical, and who continually supported me through every stage. I am also thankful to my grandparents, Chunju and Wenbin, who taught me to be a better person and directed my career towards the healthcare field. I would like to thank my dearest friends Jingran Wen and Miaomiao Zhang, who have celebrated and commiserated with me during this fabulous journey.
CHAPTER 1

EXECUTIVE SUMMARY

Cystic Fibrosis (CF) is the most common life-shortening autosomal recessive disorder. It is caused by mutations in the CF gene on the long arm of chromosome 7, which encodes the cystic fibrosis transmembrane conductance regulator (CFTR) protein [1-4]. These mutations can disrupt CFTR function within epithelial cells in various ways, ranging from complete loss of the protein to surface expression with poor chloride conductance [5]. In the United States, approximately 30,000 individuals have CF and around 1,000 new cases are diagnosed each year; worldwide, there are approximately 60,000 patients [6,7]. Owing to great strides in health technology and in the understanding of this disease, CF patients born today have a median survival of nearly 40 years [8]. This is a significant improvement over the expected survival of 6 months in 1938, when CF was first identified.

Given the longer survival time for patients with CF, many comorbidities have emerged, such as chronic pulmonary infection, gastrointestinal symptoms [9], and metabolic bone disease. Nearly 85% of the deaths related to CF are caused by lung disease [10]. CF lung disease begins early in life with inflammation, impaired mucociliary clearance, and initial airway colonization by pathogens, then progresses to chronic infection of the airways. For CF patients, those pulmonary infections cause a progressive decline of lung function, with episodes of acute worsening of respiratory symptoms, which are defined as pulmonary exacerbations (PEx). By deteriorating lung function along these two trajectories, pulmonary infections, especially chronic ones, can significantly shorten overall survival [11].

Pseudomonas aeruginosa is the most common [12] and most significant life-threatening pathogen [13] that causes pulmonary infection in pediatric patients. There are two colony phenotypes of P. aeruginosa: nonmucoid P. aeruginosa (nonmucoid Pa) and mucoid P. aeruginosa (mucoid Pa). The median ages at which patients develop nonmucoid Pa and mucoid Pa are 1 and 13 years, respectively [14]. Compared with nonmucoid Pa, mucoid Pa has much stronger virulence traits. These traits are associated with irreversible damage of lung function [14,15] and with quicker and more frequent pulmonary exacerbations. Unlike nonmucoid Pa, which may be eradicated by aggressive antibiotic treatment for P. aeruginosa, mucoid Pa is much more difficult to treat or eradicate with current antibiotics because of its ability to form a biofilm. The biofilm allows for persistent infection and renders the pathogen resistant to various antibiotics [16-19], which results in a poor prognosis for patients [14]. Because of this, pulmonary infection caused by mucoid Pa (mucoid PaPI) is routinely used as an indicator of disease progression.

During a CF patient's life, multiple treatments are needed to maintain health and improve survival. Those treatments aid a CF patient in three areas: lung health, nutrition, and gene expression. Among them, the maintenance treatments for lung health are the most vital and are classified as short-term treatments, chronic treatments, and airway clearance techniques. Short-term treatment includes all medications used to temporarily treat PEx, such as intravenous (i.v.) antibiotics and oral antibiotics. Chronic treatment covers mucolytics, inhaled antibiotics, specific oral antibiotics, anti-inflammatory medication, and bronchodilators. Airway clearance techniques (ACT) involve cough, percussion, or vibration to loosen mucus from airway walls. On average, patients receive one or several of those treatments for 35 years.

Closer monitoring and innovative therapies are a double-edged sword: health resource utilization has increased dramatically, which improves clinical outcomes and raises healthcare expenditure at the same time.
For CF patients over 30 years of age, total medical costs per year more than doubled, from $20,536 in 2001 to $56,116 in 2007, in 2007 dollars. The increase was even more dramatic, from $3,060 to $31,723, for patients under the age of 11 [20]. A large share of CF-related spending is prescription costs, especially for treating chronic pulmonary infection. Ouyang et al. [21], using insurance claims from 2004-2006, found that medical expenditures for patients with CF were nearly $50,000 per year in 2006 dollars, more than 22 times those of a matched sample without CF. More than a third of these expenditures were for prescription drugs. Another study [22] reported that inhaled antibiotics and mucolytics, which are used to maintain lung function, each account for more than a quarter of the overall annual health expenditure for treating CF. More importantly, even though the annual cost varies according to age or disease severity, the percentage of inhaled antibiotics and mucolytics taken by patients remains the same. This indicates that no matter how sick or healthy CF patients are, on average they take inhaled antibiotics and mucus-active drugs with the same frequency.

Making things even worse, prescription costs are expected to increase dramatically once two expensive gene-based therapies, launched from 2012 onward, are included. Even though these treatments are very effective, their long-term use and high prices, $312,000 and $259,000 for ivacaftor (Kalydeco®, Vertex Pharmaceuticals) and lumacaftor/ivacaftor (Orkambi™, Vertex Pharmaceuticals), respectively [23], make the barriers to patient access nearly insurmountable. Given the huge economic burden and enormous spending on lung health maintenance medications, evidence that differentiates suboptimal from optimal treatment status for each patient is urgently needed.
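As a quick check on the per-patient cost figures quoted above, the fold increases between 2001 and 2007 can be computed directly. This is only illustrative arithmetic on the numbers in the text; the function name and dictionary labels are hypothetical.

```python
def fold_change(start_cost, end_cost):
    """Return how many times larger end_cost is than start_cost."""
    return end_cost / start_cost

# Annual medical costs per patient (2007 dollars), 2001 vs. 2007, as quoted in the text.
annual_costs = {
    "over 30 years": (20536, 56116),
    "under 11 years": (3060, 31723),
}

fold_changes = {
    group: fold_change(cost_2001, cost_2007)
    for group, (cost_2001, cost_2007) in annual_costs.items()
}

for group, fold in fold_changes.items():
    print(f"{group}: {fold:.2f}-fold increase")
```

The older group's costs rose roughly 2.7-fold, consistent with "more than doubled", while the youngest group's costs rose roughly 10-fold.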
However, such evidence does not exist in any publication or in the guidelines. Rather than suggesting when a suboptimal treatment status has been reached and a treatment change is needed, the guidelines only categorize treatments by the certainty of net benefit. Additionally, those certainties were summarized from existing RCTs, which had small sample sizes and patient characteristics too narrow to represent the whole patient population. In contrast, in clinical practice, healthcare providers face varied patients case by case; each individual has unique characteristics, ranging from demographic characteristics, disease severity, and treatment pattern to personal preference.

The causality between suboptimal treatment status, which is indicated by a treatment change, and the delay in acquisition of mucoid PaPI must be investigated. Ideally, an RCT identifies a causal effect by gathering data under randomized assignment of treatment, perfect compliance, and no right-censoring. However, the enormous time and monetary cost of an RCT, together with the long-term follow-up and the very large sample size required, make it impossible to conduct an RCT that captures the causal effect of dynamic treatment regimes. A longitudinal, retrospective observational database is the most appropriate source of data for constructing dynamic treatment regimes (DTRs) as complicated as the ones in this study. Combined with the design of DTRs and methods from causal inference, an observational database is able to address the above issues. Moreover, an observational database reduces the chance of violating the no-unmeasured-confounders (conditional exchangeability) assumption compared with an RCT, since it captures all of the information that is in physicians' hands when a decision to change treatment is about to be made.
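The causal inference methods mentioned above typically rely on inverse probability weighting, which is how marginal structural models adjust for measured time-varying confounders in observational data. The following is a generic sketch of stabilized weights for one patient, not the dissertation's weighting scheme. The function name, its inputs, and the example probabilities are all hypothetical, and the treatment probabilities are assumed to come from models fitted elsewhere.

```python
def stabilized_weights(p_numerator, p_denominator, treated):
    """Cumulative stabilized inverse probability weights for one patient.

    p_numerator[t]:   assumed P(treatment change at visit t | treatment history, baseline)
    p_denominator[t]: assumed P(treatment change at visit t | treatment history, baseline,
                      time-varying covariates)
    treated[t]:       observed indicator (1 = treatment changed at visit t, 0 = not)
    """
    weights = []
    running = 1.0
    for p_num, p_den, a in zip(p_numerator, p_denominator, treated):
        # Use the probability of the treatment decision that was actually observed.
        num = p_num if a == 1 else 1.0 - p_num
        den = p_den if a == 1 else 1.0 - p_den
        running *= num / den
        weights.append(running)
    return weights

# Hypothetical two-visit example: a change at visit 1, no change at visit 2.
example = stabilized_weights([0.5, 0.5], [0.25, 0.5], [1, 0])
```

A patient whose observed decision was unlikely given the covariates (small denominator) receives a larger weight, so that the weighted sample resembles one in which treatment decisions did not depend on those covariates.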
Last but not least, the observational database captures many useful variables, which may include innovative variables to aid decision-making for achieving optimal treatment effects. To summarize, there are several unsolved issues in steering the utilization of chronic treatments: 1) guidelines were generated according to the net benefits of each individual treatment, which were investigated in RCTs with short-term follow-ups and small sample sizes; 2) no study has investigated the treatment change pattern; 3) evidence-based direction on when and how to make a treatment change is lacking; 4) the economic burden is huge; 5) preliminary results are needed before conducting an RCT. Given the above issues, a retrospective observational study, which emulated an RCT to investigate the treatment change pattern and the causality between suboptimal treatment status and time to delay in acquisition of mucoid PaPI, was conducted using a national patient registry, the Cystic Fibrosis Foundation Patient Registry (CFFPR). The primary objective of this study was to examine treatment initiation and change in patients diagnosed with new or continuing nonmucoid PaPI. The second objective was to investigate the optimal treatment regime to delay the acquisition of mucoid PaPI for pediatric CF patients. Those two objectives were investigated through three aims: 1) to analyze the treatment change pattern in the current database for CF patients diagnosed with nonmucoid PaPI; 2) to predict the probability of having a rational treatment change given patients' demographic characteristics, comorbidities, clinical signals, and treatment histories; 3) to investigate the strategy for rational lung treatment change that maximized the delay in acquisition of mucoid PaPI, specifically in patients diagnosed with nonmucoid PaPI. A large cohort of CF patients, who were diagnosed with nonmucoid PaPI and had not developed mucoid PaPI from 2006 to 2011 in the United States, was identified.
Those patients were young, healthy, and received only minimal multiple chronic treatments at baseline. Regardless of whether only the first treatment change or all treatment changes in the cohort were considered, physicians were prone to change treatment prudently by prescribing only one additional treatment from one of the three treatment classes: inhaled antibiotic, mucolytic, and anti-inflammatory. An instrument that indicated when the suboptimal treatment status has been reached, and a rational treatment change is needed, was successfully generated by including demographic characteristics, comorbidities, clinical signals, and treatment histories. Given various thresholds of the predicted probability of having a rational treatment change and of the relative change of that predicted probability between the current and previous visit, both of which were predicted using the instrument, 25 DTRs for making a rational treatment change were generated. Patients who did not follow any regime to receive treatment changes experienced worse outcomes than those following any regime. Among the patients who followed different DTRs, as the threshold of relative change of predicted probability increased, the hazard ratio of developing mucoid PaPI first increased and then decreased. The regime in which the threshold of relative change of predicted probability equaled 1.831% always caused the worst outcome among the regimes that shared the same threshold of predicted probability. An optimal strategy was identified (among 25 strategies) that maximized the time to infection with mucoid PaPI. With the results of this study, healthcare providers could switch from experience-based to evidence-based decision-making. The probability of having a rational treatment change and the DTR strategy aid in identifying suboptimal treatment status, and support personalized decision-making on treatment change to maintain optimal treatment effects.
At the same time, the study results could also assist value-based insurance design by optimizing traditional treatment utilization prior to reimbursement for extremely expensive medications, through step therapy, tiered formulary, prior authorization, and other tools of managed care pharmacy. The results of this study provide preliminary evidence of when and how to make a change to chronic lung treatments for pediatric CF patients, using a retrospective observational study to emulate an RCT. Further analyses are needed to confirm the evidence using RCTs.

CHAPTER 2 BACKGROUND AND SIGNIFICANCE

2.1 Cystic Fibrosis and Clinical Issues in its Management

2.1.1 Pathophysiology and Incidence Rate

Cystic Fibrosis (CF) is the most common life-shortening autosomal recessive disorder. It is caused by mutations in the CF gene on the long arm of chromosome 7, which encodes the cystic fibrosis transmembrane conductance regulator (CFTR) protein.1-3 Those mutations on the CF gene can disrupt CFTR function within epithelial cells in different ways, ranging from complete loss of the protein to surface expression with poor chloride conductance.5 In the United States, there are approximately 30,000 individuals suffering from CF and around 1,000 new cases diagnosed each year; worldwide, there are approximately 60,000 sufferers.6,7 The majority of CF patients are Caucasian. The incidence rates range from 1/3,700 to 1/1,900 in the U.S. Caucasian population,2,24 while rates fall to 1/9,000, 1/15,000, and 1/32,000 for Hispanic,25 African American,24,26 and Asian27 populations, respectively. In Europe, the overall incidence rate for the entire population is about 1/3,500.28,29 Currently, thanks to great strides in health technology and in understanding this disease, CF patients born today have a median survival of nearly 40 years.8 This is a significant improvement compared to the 6 months' expected survival in 1938, when CF was first identified.
Given the longer survival time for patients with CF, many comorbidities have emerged, such as chronic pulmonary infection, gastrointestinal symptoms,9 and metabolic bone disease. Among them, chronic pulmonary infection is the main cause of pulmonary exacerbation (PEx), episodes of acute worsening of respiratory symptoms, and can significantly shorten overall survival.11

2.1.2 Diagnosis and Symptoms

Prior to the development of a newborn screening (NBS) test in the 1990s, patients were diagnosed with CF using classic signs and symptoms of the disease alone (Table 2.1).30 The CF NBS is a screening test, broadly utilized in the U.S., which quantifies the value of immunoreactive trypsinogen (IRT), a pancreatic enzyme precursor, in a newborn's blood. The concentration is elevated in the majority of infants with CF, since pancreatic ducts are blocked and damaged by a flow of secretions with a high protein concentration.31 However, for those people who do not have a CFTR mutation, the IRT value varies only slightly. Whenever there is an abnormal IRT value, the infant either undergoes DNA testing to identify known CFTR mutations (IRT/DNA strategy), or a second blood sample to measure IRT is collected when the infant is about 2 weeks old.32 Even with a sensitivity of 90% to 95%,33,34 NBS alone only identifies newborns at risk for CF and does not serve as a definitive diagnostic tool. The sweat chloride test, which measures sweat electrolyte concentrations using the Gibson-Cooke35 method, is still the gold standard on which a diagnosis of CF should be made. Considering how the sweat chloride values for a newborn decline gradually,36 this test should only be performed after the infant is 2 weeks old. Sweat chloride values are universally categorized into three groups: normal (<=39 mmol/L), intermediate (40-59 mmol/L), and abnormal (>=60 mmol/L).
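The three sweat chloride categories above can be expressed as a small lookup. The following is a minimal illustrative sketch, not part of the study's analysis; the function name `classify_sweat_chloride` is hypothetical, and the cut-offs are the ones quoted in the text. It assumes values are reported as whole numbers of mmol/L, so the gap between 39 and 40 is not treated specially.

```python
def classify_sweat_chloride(mmol_per_l: float) -> str:
    """Map a sweat chloride value (mmol/L) to the category quoted in the text.

    Cut-offs: normal <= 39, intermediate 40-59, abnormal >= 60.
    Hypothetical helper for illustration only, not a diagnostic tool.
    """
    if mmol_per_l < 0:
        raise ValueError("sweat chloride concentration cannot be negative")
    if mmol_per_l >= 60:
        return "abnormal"
    if mmol_per_l >= 40:
        return "intermediate"
    return "normal"


if __name__ == "__main__":
    for value in (25, 39, 40, 59, 60, 95):
        print(value, classify_sweat_chloride(value))
```

A category on its own does not establish a diagnosis; it is one input to the combined diagnostic strategy.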
These categories do not take age into consideration, which may cause uncertainty because sweat chloride increases as an individual ages from infant to teenager. Given the uncertainty of the sweat test, together with the fact that genotype analysis can identify mutations on the CFTR gene that do not cause CF, the Cystic Fibrosis Foundation (CFF) has suggested that doctors arrive at a diagnosis of CF through combined strategies.37 If infants have a positive NBS and have sweat chloride values equal to or greater than 60 mmol/L, then a CF diagnosis is confirmed. If an infant has a sweat chloride value equal to or less than 29 mmol/L, a diagnosis of CF is very unlikely, unless it arises from a rare phenotype. Infants with a positive NBS test result and with a sweat chloride value within the intermediate range should be given an extra CFTR mutation assessment. The diagnosis can be confirmed with the presence of two CF-causing mutations. With no, or only one, CF mutation, no final diagnosis should be made until after a follow-up clinical assessment and another sweat chloride test conducted after the infant is 2 months old.37 With these advanced tools, the diagnosis of an infant comes close to the truth, although a measure of uncertainty remains.

2.1.3 Current Treatments

During a CF patient's life, multiple treatments are needed to maintain health and improve survival. Basically, those treatments aid a CF patient in three areas: lung health, nutrition, and gene expression. These three treatment areas are discussed in the following paragraphs. The maintenance treatments for lung health are classified as short-term treatments, chronic treatments, and airway clearance techniques. Short-term treatment includes all treatments that use medication to temporarily treat PEx, such as intravenous (i.v.) antibiotics and oral antibiotics. Chronic treatment covers mucolytics, inhaled antibiotics, specific oral antibiotics, anti-inflammatory medication, and bronchodilators.
The airway clearance techniques (ACT) involve cough, percussion, or vibration to loosen mucus from airway walls. A better treatment effect may be achieved when treating a patient with bronchodilators and inhaled antibiotics before and after ACT. Nutrition is a major component in maintaining health for CF patients. Maintaining optimal nutrition involves taking minerals, vitamins, and pancreatic enzymes. As with the lungs, CF causes the pancreas to produce thick mucus that blocks the release of enzymes needed for proper digestion. Thanks to their enteric coating, pancreatic enzyme supplements are released directly in the small intestine, enhancing the patient's digestive ability. Cystic fibrosis transmembrane conductance regulator (CFTR) modulators are gene-based therapies, which were designed to correct the function of the defective protein directly. This allows chloride and sodium to move properly in and out of lung and organ cells. Gene-based therapies are treatments that address the cause of CF rather than simply modifying symptoms. Ivacaftor and lumacaftor are two compounds that belong to this therapeutic class. Kalydeco® (ivacaftor) and Orkambi™ (lumacaftor and ivacaftor) were approved by the FDA on Jan 31st, 2012, and July 2nd, 2015, respectively, and have already been released into the market.

2.1.4 Health Resource Utilization and Cost

With closer monitoring and innovative therapies, health resource utilization has increased dramatically. This is a double-edged sword: it improves clinical outcomes and raises healthcare expenditure at the same time. The economic burden of CF is substantial. Briesacher et al.20 show that the improved outcomes of CF patients are linked to closer monitoring of patients. For example, annual pulmonary function testing increased 53% from 2001 to 2007.
The use of respiratory cultures more than doubled over the same time period; utilization of lung maintenance therapy, such as dornase alfa and oral antibiotics, also increased. With additional utilization of testing and therapy, both short-term clinical outcomes and survival saw marked improvement, while at the same time, the cost of treating the disease also saw marked increases. For CF patients over 30 years of age, total medical costs per year more than doubled from $20,536 in 2001 to $56,116 in 2007, using 2007 dollars. The increase is even more dramatic, from $3,060 to $31,723, for patients under the age of 11.20 A large share of CF-related spending is prescription cost, especially for treating chronic pulmonary infection. Ouyang et al.,21 using insurance claims from 2004-2006, found that medical expenditures for CF patients were nearly $50,000 per year using 2006 dollars, more than 22 times greater than those of a matched sample without CF. More than a third of these expenditures were for prescription drugs. Another study22 reported that inhaled antibiotics and mucolytics, which are used to maintain lung function, each account for more than a quarter of the overall annual health expenditure of treating CF. More importantly, even though the annual cost varies according to age or disease severity, the percentage of inhaled antibiotics and mucolytics taken by the patients remains the same. This indicates that no matter how sick or healthy the CF patient is, on average they are taking inhaled antibiotics and mucus active drugs with the same frequency. O'Sullivan et al.38 also found that CF patients who experienced pulmonary infections spent $20,000 for medication, more than 40% of the overall annual spending. Since two expensive, gene-based therapies were launched after 2012, the prescription cost is expected to increase dramatically.
Even though these treatments are very effective, considering the long-term utilization and prices of $312,000 and $259,000 for ivacaftor (Kalydeco®, Vertex Pharmaceuticals) and lumacaftor/ivacaftor (Orkambi™, Vertex Pharmaceuticals), respectively,23 the barrier for patient access to treatment may be significant. Given the huge economic burden and enormous spending on lung health maintenance medication, a way to differentiate suboptimal from optimal treatment status for each patient is urgently needed. With such an ability to differentiate, treatment changes could be made to maintain optimal treatment effects for CF patients before considering expensive drugs, and the value-based pharmacy formulary could be optimized, steering society to spend limited health resources more efficiently.

2.2 Pulmonary Infection

Nearly 85% of the deaths related to CF are caused by lung disease.10 CF lung disease begins early in life with inflammation, impaired mucociliary clearance, and initial airway colonization by pathogens, then progresses to chronic infection of the airways. For CF patients, those pulmonary infections cause progressive decline of their lung function, with episodes of acute worsening of respiratory symptoms, PEx.

2.2.1 Pseudomonas aeruginosa

Pseudomonas aeruginosa is the most common12 and significant life-threatening pathogen13 that causes pulmonary infection for pediatric patients. There are two colony phenotypes of P. aeruginosa: nonmucoid P. aeruginosa (nonmucoid Pa) and mucoid P. aeruginosa (mucoid Pa). Generally speaking, the median age of developing nonmucoid Pa and mucoid Pa is 1 and 13 years old, respectively.14 Compared with nonmucoid Pa, mucoid Pa has much stronger virulence traits. These traits are associated with irreversible damage of lung function,14,15 and quicker and more frequent pulmonary exacerbation. Unlike nonmucoid Pa, which may be eradicated by aggressive antibiotics for P.
aeruginosa, mucoid Pa is much more difficult to treat or eradicate with current antibiotics due to the pathogen's ability to form a biofilm. Mucoid Pa's ability to produce a biofilm allows for persistent infection, and renders it resistant to various antibiotics,16-19 which results in a poor prognosis for patients.14 What causes nonmucoid Pa to transition to mucoid Pa has not been comprehensively studied, but current evidence supports the theory that the conversion is driven by the unique CF microenvironment39,40 which provides the pathogen some protection from dehydration.41,42

2.2.1.1 Intermittent Pseudomonas aeruginosa Pulmonary Infection

Pseudomonas aeruginosa can exist in a CF patient from an early age. Children younger than 1 year have tested positive for the Pa antibody. However, patients do not usually test positive for Pa through cultures of the upper or lower airway until they are older.43 Initially, Pa pulmonary infection (PaPI) occurs transiently, so it is named either intermittent PaPI or initial PaPI. Several risk factors are associated with the occurrence of intermittent PaPI, such as female sex, homozygous ΔF508 genotype, and Staphylococcus aureus isolation.44 It is possible to treat intermittent PaPI through aggressive therapy, but as time passes, the pathogen adapts to the airway by developing a mucoid phenotype, which is difficult to eradicate. That is when PaPI progresses to a chronic condition in CF patients' lower and upper airways.45 Therefore, current guidelines recommend early treatment of initial PaPI,46 so as to reduce the prevalence of this pathogen within the body and delay the progression to chronic PaPI in order to improve prognosis.

2.2.1.2 Chronic Pseudomonas aeruginosa Pulmonary Infection

Chronic infection can be defined as an infection that persists despite appropriate treatment and the host's immune and inflammatory responses.
Moreover, in contrast to bacterial colonization, chronic infection is characterized by persistent pathology and immune responses.47 Currently there is no universally accepted definition of chronic P. aeruginosa infection. Most of the currently used definitions are based on the frequency and results of microbiological assessment of secretions from the respiratory tract of CF patients. Several definitions of chronic PaPI in CF that have been published or used either in clinical settings or for research purposes are listed in Table 2.2.

2.2.2 Other Infections

Other than P. aeruginosa, several pathogens can also cause pulmonary infection. Staphylococcus aureus and Haemophilus influenzae are the most frequent causes of early infection in the airways of CF patients. As time passes and the disease progresses, more pathogens may occur, from P. aeruginosa to late emerging pathogens such as Burkholderia cepacia, fungi (including Aspergillus species), and nontuberculous mycobacteria. The most commonly found form of nontuberculous mycobacteria, Mycobacterium avium complex, causes Mycobacterium avium-intracellulare infection (MAI), which is also a chronic infection.12

2.3 Chronic Medications for Maintaining Lung Health

Based on the Cystic Fibrosis Pulmonary Treatment Guidelines,48 there are several treatment classes available for patients 6 years of age and older with moderate to severe disease. These medications include mucolytics, bronchodilators, inhaled antibiotics, and anti-inflammatory medications. Among these drugs, with sufficient evidence, current guidelines highly recommend the utilization of dornase alfa and inhaled antibiotics for patients with P. aeruginosa.

2.3.1 Inhaled Antibiotics

In the U.S., two inhaled antibiotics, inhaled tobramycin and inhaled aztreonam, have been approved by the FDA. Tobramycin is an aminoglycoside antibiotic, used particularly to treat Gram-negative infections and especially effective against Pseudomonas species.
Aztreonam is a monobactam antibiotic in the β-lactam class, also used primarily to treat infections caused by Gram-negative bacteria. Generally speaking, the inhalation route is a fast and effective way of delivering medication locally to the lungs, with attractive characteristics compared with traditional routes, such as painless and flexible administration, rapid onset of action, lower dosing, avoidance of first pass metabolism, and potentially fewer side effects.49,50 Nebulizers and metered-dose inhalers supply the medication as an aerosol created from a solution or suspension formulation.49,50 The dry-powder inhaler, a simple, fast, and convenient delivery system, releases powdered medication directly to the lungs.49,50 Colistin, a polypeptide antibiotic, is effective against most Gram-negative bacteria and has been used as a first-line approach to suppress chronic P. aeruginosa in the UK and Europe. Even though it is not approved in the U.S., inhaled colistin may still be given to CF patients as an off-label treatment. Several randomized clinical trials have shown that, due to the ability to deliver a high concentration of drug into the lungs directly, inhaled antibiotics, especially tobramycin and aztreonam for P. aeruginosa, have a stronger treatment effect51-65 than oral antibiotics, even if pathogens have already developed drug resistance. Other than one study,58 which had 56 weeks of follow-up, most of these studies had less than 6 months of follow-up.
2.3.2 Other Lung Health Maintenance Medication

Dornase alfa, hypertonic saline, azithromycin, and high dose ibuprofen are four other medications that deliver moderate to substantial treatment effects and are recommended by the guidelines.48 Dornase alfa has been developed to cleave high molecular weight DNA which, when released by dead neutrophils, contributes to the tenacity of airway phlegm.66,67 Hypertonic saline directly delivers salt and water to the lungs, restoring airway surface hydration to improve mucociliary clearance in vivo.68 Azithromycin is a macrolide that is most frequently prescribed as an oral antibiotic for patients with CF. A significant part of the treatment effect of this medication is due to its function as an anti-inflammatory medication, decreasing the number of neutrophils at the site of infection69,70 and reducing the pro-inflammatory cytokines that recruit more neutrophils.71,72 Unlike the above three medications, which are suggested for broad utilization, high dose ibuprofen, given the rare but serious adverse events associated with it,73 together with the scant data on use in adults, is only suggested for children.

2.3.3 Other Medications

Aside from those medications recommended by the current guidelines, several other chronic medications are also prescribed to treat CF related lung disease. These include corticosteroids, β2-adrenergic receptor agonists, antifungals, clarithromycin, and inhaled colistin. As anti-inflammatory medications, corticosteroids have conflicting treatment effects on reducing the rates of pulmonary function decline.74-76 Because of this, they are only suggested for CF patients with asthma. Due to insufficient evidence on their efficacy, inhaled β2-adrenergic receptor agonists are also not suggested for chronic use.
Antifungals and a combined therapy that includes clarithromycin, rifampin, and ethambutol, used to treat Aspergillus species and MAI, respectively, are rarely prescribed chronically because of the relatively low incidence of those pathogens.

2.3.4 The Dilemma of Maintaining Lung Health

Therefore, from a short-term perspective, maintaining current inhaled antibiotics for patients infected by chronic P. aeruginosa, regardless of drug resistance, seems to be the best choice. However, unlike initial colonization of nonmucoid P. aeruginosa, which is more easily eradicated, chronic P. aeruginosa is difficult to cure and long-term drug suppression is the only option. With inhaled antibiotic suppression that lasts longer than 1 year, drug resistance may easily occur. Without appropriate treatment, increasing drug resistance may shorten the time to transition from nonmucoid PaPI to chronic mucoid PaPI.77 Given the consistency of the microbial community structure before and after treatment with antibiotics for pulmonary exacerbation, the progression from nonmucoid PaPI to chronic mucoid PaPI could be the main reason for decreasing lung function and the increasing incidence rate of pulmonary exacerbation. The dilemma then becomes how to obtain optimal treatment effects over the long term considering lung function deterioration, the existence of drug resistance, and other clinical variables.

2.3.5 Treatment Classifications

All treatments will be classified into five classes according to current guidelines48 and their functions as per Table 2.3. These classes are: mucolytics, inhaled antibiotics, anti-inflammatories, bronchodilators, and other chronic treatments. Mucolytics aim to alter the properties of lung phlegm to make it easier to clear from the airways. Inhaled antibiotics directly fight against and suppress bacterial pathogens isolated from the respiratory tract.
In order to reduce neutrophils in the lungs, which increase the viscosity of the CF sputum and damage lung structure,78 anti-inflammatories are prescribed. Bronchodilators dilate bronchi and bronchioles, decrease resistance in the respiratory airway, and increase the airflow to the lungs. Other chronic treatments are used against various pathogens or comorbidities that may accompany Pa. Since treatments in the same class have similar treatment effects and improve lung function through the same mechanism, physicians could prescribe them interchangeably in clinical practice. If additional treatment effects are needed from a specific mechanism, then additional treatments in that class should be prescribed.

2.4 Signals for Clinical Decisions

In clinical practice, FEV1%, which measures the proportion of a patient's forced expiratory volume in 1 second (FEV1) against the predicted forced expiratory volume in 1 second for a hypothetical healthy person sharing the same demographic characteristics as the patient, is the gold standard for measuring disease severity.11,48,79 The relative change between FEV1% at the current visit and the optimal FEV1% in the past year is the measure that healthcare providers use to steer treatment change: $\Delta\mathrm{FEV}_1\% = \frac{\mathrm{FEV}_1\% - \mathrm{FEV}_{1,\mathrm{opt}}\%}{\mathrm{FEV}_{1,\mathrm{opt}}\%}$. The number of PEx that occurred in the previous year also informs decisions on treatment change. A decreasing FEV1% or more PEx reflects a decline in lung function, which may be caused by new infections, a failure to respond to current treatment, or other CF related comorbidities. If evidence supports a conclusion that a specific treatment is having a poor response, then healthcare providers should adjust the treatment accordingly by switching or adding on one or more new treatments, or stopping the current treatment. Hypothetically, drug resistance should also work as a signal for treatment change.
However, that is not always the case for patients with CF because, as mentioned previously, unlike intravenous antibiotics, inhaled antibiotics deliver a high concentration of medicine directly to the lungs, providing far more medicine than needed. Together with the reality that the arsenal of applicable inhaled antibiotics is limited, healthcare providers may ignore drug resistance as a signal for treatment change.

2.4.1 Predicted Normal FEV1%

Forced expiratory volume in 1 second (FEV1), together with its derivatives, is the most widely employed clinical measurement for lung disease progression in CF.80 Unlike other spirometric variables that are applied to guide and monitor treatment, such as forced vital capacity (FVC) and forced expiratory flow at mid-vital capacity (FEF25-75), FEV1 also serves as the key short-term endpoint in most clinical trials. In clinical practice, the relative change between predicted FEV1% in the current visit and the maximum value of predicted FEV1% among all visits that occurred in the past 1 year (ΔFEV1%) is used as a key clinical signal because it removes short-term fluctuation of FEV1. In order to calculate the relative change of predicted FEV1%, the provider first measures FEV1% at each encounter, using the equation $\mathrm{FEV}_1\% = \frac{\mathrm{FEV}_{1,\mathrm{obs}}}{\mathrm{FEV}_{1,\mathrm{pred}}}$, where the observed FEV1 at that visit is the numerator, and the predicted FEV1 for a hypothetical healthy person with the same characteristics the observed patient had at that visit is the denominator. Since the 1980s, several algorithms to predict FEV1 for a hypothetical healthy person have been created, but currently, the majority of accredited hospitals follow the latest model, the NHANES prediction algorithm, which can predict a normal FEV1 or reference value given a person's age, height, gender, race, and ethnicity.
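The two calculations described above (FEV1% at a visit, then its relative change against the best value in the preceding year) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, FEV1% is expressed on a 0-100 scale, the look-back window is taken as the 365 days before the current visit, and the current visit is excluded from the window.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def fev1_percent(observed_fev1_l: float, predicted_fev1_l: float) -> float:
    """Observed FEV1 as a percentage of the reference (predicted) FEV1."""
    if predicted_fev1_l <= 0:
        raise ValueError("predicted FEV1 must be positive")
    return 100.0 * observed_fev1_l / predicted_fev1_l


def delta_fev1_percent(
    history: Dict[date, float],
    visit_date: date,
    current_fev1_percent: float,
    window_days: int = 365,  # assumption: "past 1 year" taken as 365 days
) -> Optional[float]:
    """Relative change of FEV1% against the best FEV1% in the look-back window.

    `history` maps earlier visit dates to FEV1% values. Returns None when
    no earlier visit falls inside the window.
    """
    window_start = visit_date - timedelta(days=window_days)
    prior = [v for d, v in history.items() if window_start <= d < visit_date]
    if not prior:
        return None
    best = max(prior)
    return (current_fev1_percent - best) / best


if __name__ == "__main__":
    # Hypothetical visit history for one patient.
    history = {date(2010, 1, 15): 90.0, date(2010, 6, 1): 100.0}
    current = fev1_percent(observed_fev1_l=2.0, predicted_fev1_l=2.5)
    print(delta_fev1_percent(history, date(2010, 12, 1), current))
```

A negative result means the current FEV1% has fallen below the best value of the past year; because the change is relative, the result does not depend on whether FEV1% is stored as a fraction or as a percentage.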
Among all traditional prediction algorithms that assumed a fixed distribution for each parameter in advance with constant variability across the lifespan, the NHANES81 algorithm is the most complex one, with highly accurate predictions. However, since two algorithms are applied independently for adolescents and adults, the prediction of a reference FEV1 is not smooth when a patient transitions from adolescence to adulthood. Besides the NHANES algorithm, several hospitals apply the Crapo82-84 method to predict reference FEV1 values for young adults, but there are no guidelines to suggest which prediction algorithm is optimal for which age. Table 2.4 lists several well-accepted algorithms for predicting reference FEV1. In 2012, an innovative spirometry prediction algorithm was published. Over 160,000 records from 72 centers in 33 countries were shared with the European Respiratory Society Global Lung Function Initiative (GLI). After excluding records that had missing and outlying data, 97,759 records of healthy nonsmokers aged from 2.5 to 95 years were fed into the prediction model. Besides the inclusion of the pediatric population and a huge sample size, the prediction algorithm also applied an innovative parametric method, the Lambda-Mu-Sigma (LMS) method, which allows simultaneous modeling of the median (mu), the coefficient of variation (sigma), and the skewness (lambda) of a distribution family.85 Owing to this feature, the LMS method is able to convert an individual's measurement into a Z score, which is normally distributed with a mean of 0 and a standard deviation of 1. Specifically, the lower limit of normal for all spirometric values is calculated as the 5th percentile of the distribution of Z scores.86 Compared to the traditional algorithms, it simultaneously incorporates the relationship between height and age into the prediction and provides smoothly changing curves for the transition from childhood to adulthood.
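To make the Z-score conversion concrete, the sketch below applies the standard LMS transformation, z = ((y/M)^L - 1) / (L·S), with the logarithmic form used when L = 0, and derives the lower limit of normal as the 5th percentile. It is illustrative only: the function names and the L, M, and S values in the example are hypothetical, whereas in practice they come from the published GLI equations and lookup tables for a given age, height, sex, and ethnic group.

```python
import math
from statistics import NormalDist


def lms_z_score(measured: float, lam: float, mu: float, sigma: float) -> float:
    """Convert a measurement to a Z score with the LMS transformation.

    lam = skewness (L), mu = median (M), sigma = coefficient of variation (S).
    """
    if measured <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("measured, mu and sigma must be positive")
    if lam == 0:
        return math.log(measured / mu) / sigma
    return ((measured / mu) ** lam - 1.0) / (lam * sigma)


def lower_limit_of_normal(lam: float, mu: float, sigma: float,
                          percentile: float = 0.05) -> float:
    """Measurement whose Z score sits at the given percentile (default 5th)."""
    z = NormalDist().inv_cdf(percentile)  # about -1.645 for the 5th percentile
    if lam == 0:
        return mu * math.exp(sigma * z)
    return mu * (1.0 + lam * sigma * z) ** (1.0 / lam)


if __name__ == "__main__":
    # Hypothetical L, M, S values, not taken from the GLI tables.
    lam, mu, sigma = 1.0, 3.0, 0.1
    print(lms_z_score(2.4, lam, mu, sigma))
    print(lower_limit_of_normal(lam, mu, sigma))
```

A measurement equal to the median M maps to a Z score of 0, and a measurement at the lower limit of normal maps back to the 5th-percentile Z score, which is a quick consistency check on any implementation.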
In order to appropriately predict normal lung function as a reference value for predicted FEV1%, in addition to the coefficients for age, height, and race, an age-varying coefficient for the median spline is also needed. A cross-sectional analysis using data from the UK Cystic Fibrosis Trust Registry demonstrated significant differences in the interpretation of spirometry results between the GLI algorithm and other traditional algorithms. Differences at the population level for each age are limited, while individual patient results are quite discrepant, especially in young children, adolescents, or patients older than 50 years.

2.4.2 Pulmonary Exacerbation

CF lung disease is characterized by intermittent episodes of acute worsening of respiratory symptoms such as cough, sputum production, and hemoptysis, often manifesting together with systemic symptoms such as weight loss and fatigue. These changes in respiratory signs and symptoms, which also necessitate additional treatments, are termed pulmonary exacerbation (PEx). PEx also has a significant impact on short-term mortality,11 quality of life,87-89 and healthcare expenditure, and serves as an indicator of the acquisition of new pathogens. While PEx equals FEV1% in clinical importance, there is no standard definition for PEx. In fact, PEx has several nonvalidated definitions. The components of those definitions vary, basically consisting of a constellation of symptoms, physician examination, and lab test results. Table 2.5 contains components of several well-accepted definitions. Three studies have examined components of these definitions in order to create a model for predicting PEx using either clinical data90,91 or patients' and healthcare providers' opinions.92 Each of those studies found that symptoms were more predictive of PEx than physician examinations or laboratory values.
The only drawback of these studies is that the analyses90,92 failed to measure the severity of PEx as an outcome. To create a unified definition for PEx in the future, appropriate identification of the severity of PEx is needed, as well as inclusion of the variables of physician examination, laboratory values, and symptoms. In this dissertation, PEx is captured by a single question in the Cystic Fibrosis Foundation Patient Registry Questionnaire: a self-assessed exacerbation with four levels (absent, mild, moderate, and severe).

2.5 Prescribing Decisions

Prescribing decisions are a series of complex composite decisions, which are affected by both internal and external factors. Prescribing decision-making is dominated by the internal factors that a physician has, but it is also influenced by external factors, such as the impact and pressure from senior physicians, pharmaceutical representatives, and patients. The first two sections use primary care physicians as an example, exploring the internal and external factors that influence prescribing decision-making. The third section focuses on the unique issues in prescribing antibiotics and treating chronic lung diseases. Finally, treatment change, especially rational treatment change, is explained.

2.5.1 Internal Factors that Influence Prescribing Decision-making

Internal factors include two perspectives: the physician, such as personality traits, medical training, and clinical experiences, and the therapy, such as effectiveness, efficacy, and safety. Among them, the clinical effectiveness and safety of a therapy are the most important,93 since the first step of prescribing decision-making is to decide whether a treatment is required by weighing the trade-off between benefit and risk.
The trade-off is determined by clinical effectiveness and safety, together with the physician-perceived medical needs of a patient.94 Once physicians have decided to prescribe, they then decide whether to choose a drug that they know well or to adopt a new therapy. Each doctor has a personal set of drugs with which he or she is familiar and usually chooses from it when confronting a patient.94,95 Other than balancing the benefits and risks through the above process, three different decision strategies could be applied as shortcuts to prescribe from the set directly: pragmatic, intuitive, and emotional strategies.94 Internal factors are involved in the above strategies, but not all of them are scientifically reasonable: personal experiences and emotions are examples. The pragmatic approach recognizes that physicians do not continually consider the same trade-offs, so for repetitive situations, or routine visits without any change of symptoms, they will adopt the previous treatments according to their clinical experiences. The intuitive approach indicates that the decision may sometimes be based on intuition and personal experience. For example, a physician prescribes a specific treatment to a patient because it worked well in previous cases, which ignores the difference in clinical signals between the current and previous patients. The emotional approach highlights that, other than cognitive factors, emotional factors may also drive prescribing behavior, even though they may conflict with physicians' judgment. For example, a patient requests a specific treatment that he saw in advertisements. The physician may feel that the requested treatment is inferior to another one, but the patient is forceful because he believes what he saw. Therefore, the physician may follow the patient's request.

Several internal factors are highly involved in the decision to initiate new therapies.
When physicians personally make the decision to initiate a new therapy, they are influenced by its perceived economic or pharmacological advantages over alternatives.96 Internally, physicians accumulate the economic and pharmacological knowledge of new therapies through a series of pathways, including peer-reviewed medical journals, guidelines, medical textbooks, and proceedings of conferences.93 Personality traits, especially the variety of attitudes to innovation, risk perception, and benefit, significantly differentiate physicians' behavior in prescribing new therapies. Prosser et al.96 ranked physicians as low, medium, and high prescribers of new therapies, according to the local health authority prescribing data. All groups felt they would only prescribe a new therapy when they believed it offered a relative advantage over current therapy. Compared with low prescribers, who treated risks with a 'wait and see' policy, high prescribers either accepted the risks and uncertainty or considered that risks had been minimized by the licensing authority when approving the therapy.

2.5.2 External Factors that Influence Prescribing Decision-making

In the modern healthcare environment, especially in the hospital setting, external factors are also highly involved in prescribing decision-making, in addition to the internal factors that affect physicians' prescribing behavior directly. For example, medication policies are often steered by the pharmacy and therapeutics (P&T) committee, and clinical pharmacists often play key roles in suggesting or even automatically switching medications.97 Even though the internal factors used to analyze physicians' prescribing behavior could be generalized to other behaviors, the influence of the P&T committee and clinical pharmacists is unique to prescribing. Basically, there are three sources of external factors: colleagues in the hospital, pharmaceutical representatives, and patients.
Colleagues and representatives are cited as the most common reasons for prescribing a new therapy after failure of current therapy or adverse events.96 According to their functions, the colleagues in the hospital could be classified into three groups: senior physicians or specialists, P&T committee members, and clinical pharmacists. Each group affects the process of decision-making on prescriptions in its own way. Senior physicians or specialists are important influences on physicians' prescribing decision-making,96,98 because physicians either worry that a change or a refusal to prescribe may jeopardize their professional relationships,99,100 or believe that the senior physicians or specialists are knowledgeable about the new therapies.98 P&T committee members and clinical pharmacists indirectly affect prescribing decision-making by introducing the importance of cost comparison and peer-reviewed prescribing patterns.101

Pharmaceutical representatives are employed by pharmaceutical companies to promote their products. The majority of the time, they deliver new therapy information packed in 'bite-size' pieces, which are marketed well, easy to remember, targeted to physicians, and often accompanied by a free lunch and/or small gifts that relate to the therapy. Their impact on prescribing a new therapy is tremendous, almost the same as that of clinical experience with the new therapy, which is accumulated internally, from self-learning guidelines, and externally, from colleague endorsements.96 Aside from their large influence on promoting therapies, negative influences are often associated with pharmaceutical representatives, including inappropriate prescribing,102 increasing medication cost,103,104 and specifically, prescribing earlier if physicians have accepted samples and gifts.105

Patients' socioeconomic status, preferences, and expectations at the moment of prescribing also impact physicians' prescribing decisions. Among them, patient preference is the most significant.
The whole healthcare field is transitioning from decisions made by physicians alone to patient-involved decision-making, which integrates evidence-based medicine with patient preference.106 For example, patients with cancer may decline chemotherapy and trade potential survival benefit for living with their current quality of life, after collecting treatment and survival information from physicians and assessing personal preferences. Patient preference could also affect trivial decision-making on prescriptions, such as choosing cream rather than lotion for treating dermatitis. If a patient has low socioeconomic status, a physician may change his prescribing decision, shifting to a cheaper therapy within a therapeutic class or to another therapy covered by the insurance plan.107 Patient expectations also contribute much to the decision-making. Patients who expect medication are about three times more likely to receive a therapy, and the odds ratio even goes up to 10 if a physician thinks that the patients expect to get medications.108 This is partially attributable to patient preference, which may affect the patient's adherence and thus the treatment effect. The phenomenon is mainly driven by physicians' wish to maintain the doctor-patient relationship,109 even though some of the expectations could go against physicians' judgment, such as prescribing antibiotics in unnecessary cases.110 Considering the large number of factors to take into account, an appropriate prescribing decision that balances patient-centered and evidence-based care is hard to achieve.111 The foundation is maintaining a good relationship and trust between physician and patient.

In conclusion, a number of internal and external factors tangle together and complicate the already sophisticated decision-making process.
In order to prescribe rationally, weighing the trade-off between benefit and risk, physicians have to resist the attraction of using shortcuts and handle the pressure from colleagues, pharmaceutical representatives, and patients.

2.5.3 Uniqueness on Antibiotics and Lung Treatment

Only one study112 specifically investigated the decision-making of antibiotic prescribing among primary care physicians. Ten Icelandic primary care physicians were involved in qualitative, semistructured in-depth interviews. Three paths led to prescribing antibiotics. In the first path, physicians believed that the infection could interfere with the patient's planned activities and that antibiotics could help. In the second path, the physicians failed to handle patients' pressure, either due to lack of time or because they were too tired to explain that the infection was viral and that antibiotics would not help. In the last path, the physicians had a neutral attitude: they valued the patients' autonomy higher than welfare and let the patients make the decision themselves. The above three paths are consistent with the results of studies that did not focus on any therapeutic area. However, the main difference lies in the balance of internal and external factors: antibiotic prescribing is affected most strongly by internal factors. A physician's attitude, whether restrictive, neutral, or liberal, differentiates the prescribing behavior. Restrictive prescribers were influenced by ecological considerations and concerns about producing resistance, while liberal prescribers were worried about the possible consequences of withholding a necessary antibiotic. A patient's occupation also has a large impact on prescribing decision-making, regardless of the physician's attitude to antibiotics.
Physicians were quite concerned about the effect of illness on the daily work and life of patients such as farmers, people in danger of losing their jobs, students during exam periods, and children. When treating a farmer, the physician knows perfectly well that an antibiotic does not cure the common cold, but he may prescribe it as prophylaxis to protect the patient from becoming more ill through outdoor exposure.

Unlike antibiotics prescribed periodically for general patients, inhaled antibiotics are supposed to be prescribed chronically for patients with CF, along with chronic lung health maintenance medications. Therefore, it is more valuable to investigate treatment changes that could optimize long-term outcomes than to come to a better understanding of initiating a treatment. Considering the unique features of this setting (lung health maintenance medications have to be prescribed chronically, the majority of prescribers are specialists, and alternative treatments are limited), the influence of external factors would be minimized. Internal factors from both the physicians' and the treatments' perspectives dominate prescribing decision-making for chronic lung health maintenance medications.

2.5.4 Treatment Change

Generally speaking, there are two different choices regarding prescriptions: maintaining the previous treatment or making a treatment change. Treatment change can be defined as including one or more of the following events: prescribing a new therapy, making any adjustments to dose and/or frequency, switching to another treatment within the same treatment class or from a different treatment class, or stopping one or more medications. Depending on whether any evidence is associated with it, a treatment change can be categorized as rational or irrational. Before the difference between rational and irrational treatment changes is explained, the concept of rational treatment will be introduced.
According to WHO's definition,113 prescribing rational treatments consists of six steps: defining the patient's problem; specifying the therapeutic objective; verifying whether the personal treatment is suitable for this patient; starting the treatment; giving information, instructions, and warnings; and monitoring the treatment. We have to believe that physicians try their best to make rational prescribing decisions and feel confident about them; this has been verified by several studies.98,112 In Jacoby et al.,98 physicians are classified as low, medium, and high prescribers according to the likelihood of their prescribing new therapies. When making prescribing decisions, all physicians believe that they themselves are "conservative" and "cautious" based on their personality traits, medical training, and clinical experiences, regardless of the likelihood of prescribing new therapy.

Because health technology develops fast, treatment that is rational according to current evidence may become untenable in the future, after the disease is better understood from a comprehensive perspective. For decades, physicians and patients treated as common sense the notion that higher salt intake is associated with higher blood pressure, a risk factor for heart attack. However, several recently published studies114,115 indicate a tenuous association or even a reverse association. A meta-analysis,115 which combined results from seven RCTs involving a total of 6,250 subjects, found no strong evidence that cutting salt intake reduces the risk of mortality or cardiovascular morbidity. Another study,114 which included 28,880 subjects from two prospective cohorts, reported that the less sodium the subjects excreted in urine within 24 hours, the greater their risk of cardiovascular-caused mortality. Therefore, reducing salt intake may increase rather than diminish the risk of cardiovascular morbidity.
It is hard to achieve rational treatment, even when prescribing lung health maintenance medications to patients with CF. The behavior of prescribing rationally is dominated mainly by internal factors rather than external factors. To achieve rational treatment, physicians have to keep updating their treatment knowledge from the right sources. Even so, when they look back in the future, it is possible that treatment decisions made decades earlier will prove untenable.

Rational treatment change functions as a subcategory of rational treatment. Basically, all treatment changes that are supported by evidence in up-to-date studies could be defined as rational treatment changes. All other treatment changes are defined as irrational treatment changes. Irrational treatment changes could be caused by both internal factors and external factors. Compared with the impact of the therapy, the characteristics of a physician are more likely to induce irrational treatment change, especially through personality traits and emotion. For example, if a physician has negative attitudes toward innovation, he may be less likely to prescribe a new therapy even if it has been well investigated and has produced tremendous treatment effects. In contrast, if he is an emotional prescriber, it is easy for him to prescribe a treatment that a patient asked for, even without a good reason. External factors, from colleagues in the hospital to patients to pharmaceutical representatives, could also lead to irrational treatment changes without a scientific rationale. The scientific rationale does not need to be well understood; it can even be a patient preference. However, to quantify the rationale, it has to be a measurable variable. For example, if a patient wants to switch from treatment A to treatment B because of side effects, this change is rational. However, it would be irrational if the patient tried to switch only because of disliking treatment A's brand name.
In conclusion, only treatment changes supported by evidence would be defined as rational treatment changes. Some of the rationales for treatment change are hard to identify given the sparse applicable information in databases; for example, a patient registry can rarely identify physicians' attitudes to an innovative treatment. Therefore, the definition of rational treatment change would be diverse and unique according to the research question and the database that is available for each study. Further information about how to identify the rational treatment change in this study will be explained in the method section.

2.6 Dynamic Treatment Regimes and Common Applications

Dynamic treatment regimes (DTRs) are personalized treatment plans. Formally, a dynamic treatment regime is a sequence of decision rules that specifies how the intensity, frequency, and type of treatments should change to maximize treatment effects depending on a patient's characteristics and needs. It includes two components.116 First, there are rules for how the treatment level and type should vary with disease progression, which are identified prior to any treatment change. Second, all of those rules are based on time-varying measurements of each individual's specific needs for the treatment. Thus, the rules for a dynamic treatment regime have to combine a measure of each individual's need with decisions on treatment type and level that mirror subject-specific need.117 The definitions of these needs vary; they could be severe adverse effects, clinical signals that indicate disease progression, or patient preference.
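The two components above, a rule fixed in advance and a time-varying measure of need, can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the `Visit` record, the `need_score` variable, the threshold of 0.5, and the two actions "change" and "maintain" are hypothetical and are not taken from any guideline or from the instrument developed in this study.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Visit:
    year: int
    need_score: float  # hypothetical time-varying measure of the patient's need


# A rule maps the current measurement to a treatment decision.
Rule = Callable[[Visit], str]


def threshold_rule(threshold: float) -> Rule:
    """Build a rule, fixed before follow-up starts, from a single threshold."""
    def rule(visit: Visit) -> str:
        return "change" if visit.need_score >= threshold else "maintain"
    return rule


def apply_regime(rule: Rule, visits: List[Visit]) -> List[str]:
    """Apply the same pre-specified rule at every decision point."""
    return [rule(v) for v in visits]


# Hypothetical patient history: the score crosses the threshold only in year 1.
visits = [Visit(0, 0.20), Visit(1, 0.55), Visit(2, 0.40)]
decisions = apply_regime(threshold_rule(0.5), visits)
print(decisions)  # ['maintain', 'change', 'maintain']
```

Varying the threshold passed to `threshold_rule` yields a family of candidate regimes, each of which prescribes a different sequence of decisions for the same patient history.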
DTRs are routinely implemented when there is a danger of serious side effects or when the necessary dose varies across subjects.117 Whenever the clinical signals or risk indicators, such as CD4 count for HIV, move beyond a specific rule-defined threshold, the treatment is changed.118 If the rules are already known, it is simple to identify those adjustments, especially if the gold standard, a randomized controlled trial, exists. Not surprisingly, sometimes the doses or classes of available drugs are fixed, and the thresholds of those needs or rules are set based on healthcare providers' experiences or for unknown reasons. In those situations, with access to a reasonable database, DTRs can be used as an experimental method to identify those potential thresholds.

DTRs fit well into the concept of precision medicine and are attractive not only to patients, but also to formulary and public policy decision makers. DTRs apply particularly well to those patients who show needs for treatment adjustments: allowing intensive treatments for better control of the disease, switching treatment entirely to prevent severe adverse effects, or delaying the application of expensive therapies. These advantages are exactly the characteristics that a cost-effective test or treatment must have, as summarized in a McKinsey & Company report.119 Therefore, if identified appropriately, DTRs are very likely to save money and time by avoiding unnecessary treatment.

At the same time, with a scrupulously defined a priori rule, the use of a DTR in a study can estimate the treatment effect more precisely than the use of a nondynamic treatment regime. For example, in an RCT that follows a nondynamic treatment regime, the protocol does not allow any change in treatment, regardless of disease progression or severe side effects. Yet those patients are subjects whose needs occasionally change.
They are very likely to require treatment adjustments, which the protocol defines as noncompliance. In contrast, studies utilizing DTRs can explicitly provide for treatment adjustments (switching, adding, or discontinuing treatment) if and when those needs reach a predetermined level.116,120

DTRs can be applied to both RCTs and observational studies. For example, the Sequential Multiple Assignment Randomized Trial (SMART) is an innovative RCT design that adds a unique feature, the decision point, to the traditional RCT.121 At each decision point, subjects are re-randomized to one of the available treatment options at that stage. The research plan can contain N+1 stages, where N is the number of decision points during the overall follow-up time. In the trial, each subject proceeds through the stages of treatment as he or she reaches the predetermined level at the related decision point. In a two-stage SMART, there is only one decision point: for instance, whether patients develop drug resistance after 3 years. As an example involving CF patients, consider a hypothetical two-stage SMART study. The responder at the decision point is defined as a "patient who develops drug resistance after using first stage treatment for 3 years." During the first stage, all participants are randomized to inhaled antibiotics alone or inhaled antibiotics together with preliminary treatments. As the disease progresses, some participants may meet the requirement of the decision point. In the second stage, only responders to the first stage treatment are re-randomized into two groups: adjust (switch or stop) the previous treatment or stay on the same treatment. The optimal treatment regime would identify treatment strategies considering the baseline characteristics at both the first stage and the second stage.
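The assignment flow of the hypothetical two-stage SMART described above can be sketched as follows. This is a minimal illustration, not a trial protocol: the arm labels, the function name, the patient identifiers, and the set of responders are all hypothetical, and simple 1:1 randomization is assumed at both stages.

```python
import random
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical arm labels following the example in the text.
STAGE1_ARMS = ("inhaled_antibiotics_alone", "inhaled_antibiotics_plus_preliminary")
STAGE2_ARMS = ("adjust", "stay")


def assign_two_stage_smart(
    patient_ids: Iterable[str],
    responders: Set[str],
    rng: random.Random,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Randomize everyone at stage 1; re-randomize only responders at stage 2.

    `responders` holds the patients who meet the decision-point criterion
    (here, developing drug resistance after 3 years of stage 1 treatment).
    Patients who do not meet the criterion get no stage 2 assignment (None).
    """
    assignments: Dict[str, Tuple[str, Optional[str]]] = {}
    for pid in patient_ids:
        stage1 = rng.choice(STAGE1_ARMS)
        stage2 = rng.choice(STAGE2_ARMS) if pid in responders else None
        assignments[pid] = (stage1, stage2)
    return assignments


# Hypothetical cohort: p2 and p4 meet the decision-point criterion.
rng = random.Random(2017)  # fixed seed so the sketch is reproducible
cohort = ["p1", "p2", "p3", "p4"]
result = assign_two_stage_smart(cohort, responders={"p2", "p4"}, rng=rng)
for pid, (stage1, stage2) in result.items():
    print(pid, stage1, stage2)
```

Each patient's pair of assignments corresponds to one path through the trial, and comparing outcomes across these paths is what allows the stage-specific decision rules to be evaluated.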
In this case, hypothetically, for a male younger than 20 years old, inhaled antibiotics alone is the optimal treatment, while inhaled antibiotics together with preliminary treatment is the optimal treatment for the rest of the participants. For female responders or male responders who are older than 20 years old, if they were assigned to the inhaled antibiotics alone group, then adjusting treatment is the optimal strategy in the second stage; if they were assigned to the inhaled antibiotics together with preliminary treatment arm, then keeping the same treatment is the optimal strategy. The combination of all the optimal treatment strategies, given the baseline characteristics and history of treatment, constitutes the dynamic treatment regime.

DTRs can be identified within observational databases. The two main issues that a researcher must take into account are 1) the definition of rules or protocols and 2) randomization at each decision point. The importance of the first issue is obvious. DTRs capture the treatment effects on only those patients who follow the rules exactly. Therefore, the definition of the rules is as vital as the identification of exposure, which significantly affects the result. Any vague or inappropriate definition of the rules will bias the parameter estimation. Failure to randomly assign patients to each treatment arm is another issue in observational studies. Several factors contribute to this issue: baseline covariates such as gender, race, and genetic information; time-varying covariates such as weight, disease severity, and clinical variables; and time-varying exposures, such as previous treatments. Without appropriately adjusting for the associations between these factors and treatment assignment, the estimation of causality would be biased.
This is because the probability of counterfactually receiving different treatments would not be equal at each time point, and patients using different treatments would not have the same treatment effects as a counterfactual population would have if they received the same treatment. Indeed, this is the core reason why (sequentially) randomized treatments are preferred, when applicable, for making inferences concerning DTRs.

In conclusion, being able to solve the above two issues is the foundation of appropriately identifying DTRs using observational databases.116 Unlike the judgment of whether the rules are properly defined, which is obscure, randomization can be assessed transparently. Traditional statistical methods, such as stratification and matching, are feasible for investigating the optimal treatment effects of DTRs, as long as the data are of good quality. Those data can be collected from either an RCT or a cohort with an explicit study design that very likely satisfies the sequential randomization assumption. However, if there was no explicit study design when data were collected, which may jeopardize the assumption of sequential randomization, it is necessary to use advanced statistical methods such as those developed under causal inference.

2.7 Causal Inference

Armed with more advanced study designs and statistical methods, researchers are not satisfied with merely identifying the association between exposure and outcome. They are eager to investigate causality, which has boosted the development of causal inference theory. Unlike common study designs that mainly focus on the observed exposures and outcomes, the focus of causal inference is on the unobserved values. For example, in order to investigate the causal effect of treatment A on death within 5 years, the researcher needs to compare the survival of the same patient with and without treatment A during those 5 years, given that all other circumstances remain exactly the same.
While it is impossible to go back in time to follow the same patient taking a different treatment choice, we can use counterfactual outcomes to estimate what could have happened instead. Let us assume that, after being treated with A, Zeus survived for 10 years. At the same time, let us assume that somehow we know that without treatment A, Zeus would have died within 5 years. Given the reality, the latter is a counterfactual outcome for Zeus. Consider a dichotomous treatment variable A (1: treated, 0: untreated) and a dichotomous outcome variable Y (1: death, 0: survival). Here we shall refer to variables such as A and Y that have different values for different individuals or subjects as random variables. Let $Y^{a=1}$ represent the outcome variable that would have been observed under the treatment value a=1, while $Y^{a=0}$ denotes the outcome variable that would have been observed if a patient did not get treatment. If we measure the outcome at the fifth year after the treatment decision was made, then Zeus has $Y^{a=1}=0$ and $Y^{a=0}=1$, because he survived when treated and would have died if untreated. The variables $Y^{a=1}$ and $Y^{a=0}$ are referred to as counterfactual outcomes or potential outcomes. In order to identify an individual causal effect, three components are needed: an outcome of interest; the actions to be compared, here treatment a=1 or a=0; and the individual counterfactual outcomes, $Y^{a=1}$ and $Y^{a=0}$. Considering the diversity of individual causal effects within the population, and the impossibility of knowing all the counterfactual outcomes for each individual, we mainly focus on investigating the average causal effect in a population.122

The ability to handle time-dependent confounders is another advantage of causal inference methods. A covariate is a time-dependent confounder for the effect of exposure on outcome when the past covariate values predict current exposure and the current covariate value predicts outcome.
In addition, a time-dependent confounder may simultaneously be an intermediate variable if past exposure predicts the current covariate value.122 The investigation of causality between treatment change and delay of time to mucoid PaPI gives a perfect example, shown in Figure 2.1 as a directed acyclic graph (DAG). ΔTx(t-1) marks the treatment change, compared to the previous observed treatment, at t-1. ΔFEV1%(t-1) marks the predicted FEV1% change, compared to optimal predicted FEV1% in previous year, at time t-1. Y(t) represents the outcome at t. In clinical practice, the decision of current treatment change, ΔTx(t), is determined by the current change of predicted FEV1% (ΔFEV1%(t)), the previous change of FEV1% (ΔFEV1%(t-1)), together with the previous treatment adjustment (ΔTx(t-1)), assuming the rest of the clinical variables do not impact the decision. For example, if a patient received a short-term additional treatment X to treat pulmonary exacerbation, and the treatment was effective, then the physician will be likely to prescribe it again if the patient experiences the same symptoms. At the same time, the current change of FEV1% is determined by the previous change of FEV1% (ΔFEV1%(t-1)) and the previous treatment change (ΔTx(t-1)). Under this situation, ΔFEV1%(t) is definitely a time-dependent confounder, since ΔFEV1%(t-1) predicts current exposure, ΔTx(t), and ΔFEV1%(t) also predicts outcome, Y(t). Because previous exposure, ΔTx(t-1) could predict ΔFEV1%(t), ΔFEV1%(t) is also an intermediate variable. 
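The relationships described in this example can be written down as a list of directed edges and checked mechanically. The sketch below is an assumption-laden reconstruction from the text, not a copy of Figure 2.1: node names use `d` in place of Δ, the edge from `dTx(t)` to `Y(t)` is assumed because it is the effect under study, and the two helper checks use a simplified parent and child test rather than a full graphical criterion.

```python
from graphlib import TopologicalSorter

# Directed edges (cause, effect) as described in the text.
EDGES = [
    ("dFEV1(t-1)", "dTx(t)"),    # previous FEV1% change informs current decision
    ("dFEV1(t)", "dTx(t)"),      # current FEV1% change informs current decision
    ("dTx(t-1)", "dTx(t)"),      # previous treatment change informs current decision
    ("dFEV1(t-1)", "dFEV1(t)"),  # previous FEV1% change affects current change
    ("dTx(t-1)", "dFEV1(t)"),    # previous treatment change affects current FEV1%
    ("dFEV1(t)", "Y(t)"),        # current FEV1% change predicts the outcome
    ("dTx(t)", "Y(t)"),          # assumed: the effect of interest
]


def parents(node: str) -> set:
    return {src for src, dst in EDGES if dst == node}


def children(node: str) -> set:
    return {dst for src, dst in EDGES if src == node}


def is_confounder(covariate: str, exposure: str, outcome: str) -> bool:
    """Simplified check: the covariate directly affects exposure and outcome."""
    return covariate in parents(exposure) and covariate in parents(outcome)


def is_intermediate(covariate: str, past_exposure: str, outcome: str) -> bool:
    """Simplified check: the covariate lies on a path past exposure -> outcome."""
    return covariate in children(past_exposure) and covariate in parents(outcome)


def topological_order() -> list:
    """Return a causal ordering; raises graphlib.CycleError if not acyclic."""
    graph = {}
    for src, dst in EDGES:
        graph.setdefault(dst, set()).add(src)
        graph.setdefault(src, set())
    return list(TopologicalSorter(graph).static_order())


print(is_confounder("dFEV1(t)", "dTx(t)", "Y(t)"))      # True
print(is_intermediate("dFEV1(t)", "dTx(t-1)", "Y(t)"))  # True
print(topological_order())
```

Both checks return True for ΔFEV1%(t), which is the dual role that makes standard adjustment methods problematic in this setting.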
The challenge of using a standard method is the following. To estimate the joint effects of ΔTx(t) and ΔTx(t-1), we must adjust for the confounding effect of ΔFEV1%(t) to consistently estimate the effect of ΔTx(t) on Y(t). However, the moment we adjust for the confounding by stratification, regression, or matching on ΔFEV1%(t), we cannot consistently estimate the effect of ΔTx(t-1), because the association between ΔFEV1%(t) and ΔTx(t-1) results in selection bias, even under the null hypothesis of no causal effect (direct, indirect, or net) of ΔTx(t-1) on Y. In addition, adjusting for the intermediate variable ΔFEV1%(t) blocks the potential pathway from ΔTx(t-1) through ΔFEV1%(t) to Y(t), which distorts the estimated effect of ΔTx(t-1) on Y(t). A series of methods under causal inference can adjust for time-dependent confounders appropriately; these are described concisely in the method section.

In order to provide consistent estimates for counterfactual quantities, $E(Y^{\bar{a}})$, at least three assumptions have to be met: consistency, conditional exchangeability, and positivity.

1. Consistency: If $\bar{A} = \bar{a}$ for a given subject, then $Y^{\bar{a}} = Y$ for that subject.

2. Conditional exchangeability:

$$Y^{\bar{a}} \perp\!\!\!\perp A(t) \mid \bar{A}(t-1) = \bar{a}(t-1),\ \bar{L}(t) = \bar{l}(t) \tag{2.1}$$

for all regimes $\bar{a}$.

3. Positivity: If

$$P\big(\bar{A}(t-1) = \bar{a}(t-1),\ \bar{L}(t) = \bar{l}(t)\big) \neq 0 \tag{2.2}$$

then

$$P\big(A(t) = a(t) \mid \bar{A}(t-1) = \bar{a}(t-1),\ \bar{L}(t) = \bar{l}(t)\big) > 0 \tag{2.3}$$

for all $a(k) \in A(K)$, and $t = 0, \ldots, K$.

The consistency assumption simply means that the outcome for every treated patient equals the outcome that would have occurred if he had counterfactually received treatment, and the outcome for every untreated patient equals the outcome that would have occurred if the patient had remained counterfactually untreated. The intervention and its contrast have to be well defined.
Conditional exchangeability reflects the assumption that the counterfactual outcomes are independent of the current observed treatment, conditional on treatment history and time-varying covariates. In other words, no unmeasured confounder is unevenly distributed between treated and untreated groups to bias the estimation. Positivity indicates that the probability of being assigned to each treatment level is greater than zero. This assumption ensures that any patient may experience any level of the treatment at any point in time regardless of his covariate history.116,122,123

With the consistency assumption, the observed outcome connects to the counterfactual outcome; with the conditional exchangeability assumption, the observational database has the RCT feature of (sequential) randomization at each decision point. According to the positivity assumption, patients exist at each treatment level at any point in time. Together, the assumptions of causal inference make the association between exposure and outcome in an observational database an unbiased estimate of the causal effect of that exposure on the outcome in a counterfactual population. Therefore, causal inference bridges the gap in evaluating DTRs using observational databases.

2.8 Significance

Several guidelines for chronic lung health maintenance treatments exist, which to some extent steer prescribing practice. However, rather than suggesting the order of prescription, the guidelines only categorize all treatments by the certainty of net benefits. Additionally, those certainties are summarized from existing RCTs, which have small sample sizes and patient characteristics too narrow to represent the whole patient population. In contrast, during clinical practice, healthcare providers face patients case by case; each individual has unique characteristics ranging from demographic characteristics, disease severity, and treatment pattern to personal preference.
When and how is treatment initiated and changed in patients diagnosed with new or continuing nonmucoid PaPI? How is it determined whether the current treatments are providing optimal treatment effects? How can demographic and clinical variables be combined into a score that identifies suboptimal treatment? What cutoffs of the score indicate that a treatment change is needed? If a treatment change is needed, what should healthcare providers do? Should they stop a specific medication, switch to another medication, or add another medication to the current treatment? Given these complex questions, decisions about treatment adjustment are undoubtedly difficult to make. To make things worse, none of the current guidelines in the CF field provides comprehensive rules or composite clinical signals that a healthcare provider could follow to deliver optimal treatment effects through appropriate treatment change. Finally, the rarity and complicated nature of CF itself reduce the accuracy of decision-making based on routine clinical practice. A comprehensive study with a sophisticated design and a broad scope of longitudinal data, which reflects real-world practice questions to support the utilization of dynamic treatment regimes for CF patients, is lacking.

Ideally, an RCT identifies a causal effect by gathering data through randomized assignment of treatment, with perfect compliance and without right censoring. However, the cost of conducting a new RCT, even one with a small sample size and a 1-year follow-up, is prohibitive in terms of time and money. Furthermore, CF is a chronic disease. In order to capture intermediate outcomes such as the length of time it takes to develop mucoid PaPI, an RCT with a minimum follow-up of 5 years is needed. Last but not least, treatment change could be determined by a combination of demographic variables, clinical signals, and treatment histories, which produces a tremendous number of scenarios.
To ensure that results are not due to chance, there should be a sufficient sample size for each scenario, which enlarges the overall sample size and increases the cost dramatically. For example, let us assume that only ΔFEV1%, PEx, and drug resistance determine the treatment change decision. Assume further that each additional unit change is clinically meaningful, where a unit change is represented by one or more of the following: an additional 1% change in ΔFEV1%, one additional PEx, or an additional specific drug resistance. To measure causality, patients would have to be assigned to every scenario. According to the results of a survival analysis,11 we can assume that the clinically meaningful range for ΔFEV1% is 6% to 15%; that the additional effect of having more than five PExs in a year is trivial; and that only drug resistance related to aminoglycoside, beta-lactam, or macrolide affects the treatment change decision. Overall, there are 240 potential scenarios. Obviously, the sample size would have to be huge to appropriately capture the causality. In conclusion, the enormous time and monetary cost of an RCT, together with the long-term follow-up and tremendous sample size required, makes conducting an RCT to capture causal effects within dynamic treatment regimes unrealistic.

A longitudinal retrospective observational database is the most appropriate source of data for constructing DTRs as complicated as the one in this study. This is because an observational database is not only cheaper than conducting an RCT, but also comes with a huge sample size and hypothetically collects patient information in all scenarios. However, a traditional retrospective observational database contains none of the advantages of an RCT.
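The figure of 240 scenarios quoted above can be reproduced under one plausible reading of the example. The text gives only the total, so the breakdown below is an assumption: ten 1% steps of ΔFEV1% (6% to 15%), six PEx counts per year (0 to 5), and four mutually exclusive resistance categories (none, aminoglycoside, beta-lactam, macrolide).

```python
from itertools import product

# Assumed factor levels; only the total of 240 is stated in the text.
fev1_decline_pct = range(6, 16)   # 6%..15% in 1% steps -> 10 levels
pex_per_year = range(0, 6)        # 0..5 exacerbations  -> 6 levels
resistance = ["none", "aminoglycoside", "beta-lactam", "macrolide"]  # 4 levels

scenarios = list(product(fev1_decline_pct, pex_per_year, resistance))
n_scenarios = len(scenarios)
print(n_scenarios)  # 10 * 6 * 4 = 240
```

Even a modest number of patients per scenario multiplies quickly: with, say, 50 patients in each cell (an arbitrary figure chosen for illustration), such a trial would need 12,000 participants.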
In such a database, for example, the treatment that a patient received would have been based on treatment history and clinical symptoms rather than being randomly assigned; a patient may or may not take the drug; and some patients may be lost to follow-up before the targeted outcome or symptom occurs. However, with appropriate adjustment those challenges could become strengths. This is because, unlike an RCT, an observational database represents specific features of daily clinical practice. As discussed previously, combined with the design of a dynamic treatment regime and methods from causal inference, an observational database will be able to account for those issues. Moreover, using an observational database reduces the chance of violating the unmeasured confounder (conditional exchangeability) assumption compared with using an RCT, since the observational database captures all the information a physician has when a treatment decision is about to be made. Last but not least, the observational database provides the chance to identify a score, which may include innovative variables to aid decision-making for achieving optimal treatment effects.

This research was conducted using the U.S. Cystic Fibrosis Foundation Patient Registry (CFFPR), which is a nationwide patient registry that aims to track treatment effects and disease transitions for CF patients. Since 1986, it has tracked over 300 clinically relevant variables, from demographic characteristics, clinical characteristics, and lab test results to treatments used.124 Considering the longitudinal and national characteristics of the CFFPR, together with the abundant variables that are measured in the database, there is no doubt that, with correct measurement, this study was the most appropriate estimation of DTRs for nationwide CF patients. Armed with the dynamic treatment regime design, causal inference methods made the measurement of causation between rational treatment change and treatment effect a reality.
With the identification of several potential treatment change rules for DTRs, the study provided the optimal treatment pattern for a specific patient given his or her current characteristics.

In this study, the focus was on identifying optimal dynamic treatment effects at the treatment class level rather than for each individual treatment. The current guidelines only suggest which treatments should be considered for patients older than 6 years with mild to severe lung function impairment. However, the guidelines lack any information on when to initiate which class of treatment and in which order, let alone the timeframe for treatment changes and guidance for providing personalized medicine for each patient based on individual characteristics. Ideally, optimal dynamic treatment effects for each individual treatment should be identified, but considering the extremely large number of treatment combinations and the number of patients in the CFFPR, this ambitious aim is difficult to achieve. Moreover, it is complicated to differentiate the rationale for a treatment change within a class; the change may have nothing to do with treatment effects but instead be related to cost, consumer-directed advertising, or patient/parent preference for time spent on treatment. For these reasons, this study did not take into consideration medication changes within a treatment class unless the number of treatments in that class had changed. For example, if a patient switched from using dornase alfa to hypertonic saline with all other individual treatments remaining the same, this switch was not considered a treatment change. However, if a patient previously received only dornase alfa and was given an additional hypertonic saline prescription during a visit, then a rational treatment change occurred at this visit.
The assumption is that the physician believed additional treatment was needed in order to change the properties of lung phlegm given the patient's health status.

The first goal of this study was to better understand the treatment pattern in the current database for CF patients diagnosed with nonmucoid PaPI. The second goal was to identify the lung treatment score, which affected the decisions of treatment change for achieving optimal treatment effects given patients' demographic characteristics, comorbidities, and clinical outcomes. Compared to a composite clinical signal, a score is more flexible in assigning weights to each variable. Finally, the study investigated the comparative effectiveness of different strategies for lung treatment scores in delaying the acquisition of mucoid PaPI, specifically in patients diagnosed with nonmucoid PaPI. In addition to these goals, the study also investigated the DTR rule that optimized treatment effects. The rules for DTRs, here, included measurement of needs, which are summarized by the lung treatment score, together with general decisions of whether to provide treatment changes to address subject-specific needs.

With the results of this study, healthcare providers could switch from experience-based to evidence-based decision-making. The lung treatment score and DTR strategy aided in identifying suboptimal treatment and in making personalized decisions on treatment change to maintain optimal treatment effects. At the same time, the study results could also support value-based insurance design by optimizing traditional treatment utilization prior to reimbursement of extremely expensive medications through step therapy, tiered formulary, prior authorization, and other tools for managed care pharmacy. The balance between healthcare expenditure and effectiveness will be a permanent and tough topic for each individual with CF.
Since CF is a genetic disorder, despite breakthroughs in treating patients, a permanent cure is unlikely. The latest treatment for CF, ivacaftor, cannot fully cure the genetic disorder, but it can significantly increase lung function and weight and decrease the probability of developing pulmonary exacerbations in patients who have the specific genetic mutation that ivacaftor targets. In particular, in terms of relative improvement in lung function, after 2 weeks on treatment the effect is sustained at around 17% regardless of baseline age, lung function, or length of treatment with ivacaftor.125-129 The effects are also not permanent; after treatment with ivacaftor is stopped, patients' lung function decreases to its prior level. In order to maintain the benefits of ivacaftor, the patient must remain on the medication permanently. Considering that only 4%-5% of CF patients can benefit from ivacaftor, the cost of the drug should not be a huge burden for an insurance company. The issue that insurers should be looking at is not whether to reimburse ivacaftor, but how to maintain patients' lung function so as to avoid or delay the need for ivacaftor. This is exactly the type of decision-making that the results of this study supported. With the study results, insurance companies will be able to create a value-based pharmacy formulary in order to help control rapidly increasing medication expenditures while providing optimal health outcomes through cost-effective treatments. In such a value-based strategy, extremely expensive treatments, such as ivacaftor, will be avoided unless the healthcare provider has already prescribed all other treatments step by step (step therapy), and the scenario of suboptimal treatment effect has already occurred (prior authorization).
Last but not least, even though the study design was rigorous and causal inference methodology was applied to adjust for the biases in the observational database, the results of this study should be considered exploratory, considering the heavy reliance on the treatment patterns that existed in the database (the positivity assumption of causal inference) and the assumptions of routine visits and continuing treatment utilization. A gold-standard RCT should be performed to confirm the results so that they can be applied with more confidence in the future.

Table 2.1. Phenotypic features consistent with a diagnosis of CF

Table 2.2. Definitions of CF patients with chronic PaPI
Method	Length of persistence with P. aeruginosa	Frequency of tests (sputum sample or deep throat swabs)
Copenhagen, 1977130	>= 6 consecutive months, or less when combined with the presence of two or more P. aeruginosa precipitating antibodies	In study, patients had an average of 10 sputum cultures per year
Ballmann, 1988131	more than 50% of cultures in 12 months had to be +	1-4 times a year
Lee, 2003132	50% of months, when samples had been taken, were +	every 3 months
Proesmans, 2006133	50% of months, when samples had been taken, were +	4 times a year (in different months)

Table 2.3. Classes of treatments

Table 2.4.
Current existing parametric algorithms for predicted normal FEV1
Applied age: 8-80
Source	Sex	Group	Intercept	Age (yr)	Age2	Height	Height2	Height2 *
NHANES/Hankinson et al.81 (height unit: cm)
	Male	Caucasian & <20 yr	-0.7453	-0.04106	0.004477	NA	0.00014098	0.00011607
	Male	Caucasian & >=20 yr	0.5536	-0.01303	-0.000172	NA	0.00014098	0.00011607
	Male	African-American & <20 yr	-0.7048	-0.05711	0.004316	NA	0.00013194	0.00010561
	Male	African-American & >=20 yr	0.3411	-0.02309	NA	NA	0.00013194	0.00010561
	Male	Mexican-American & <20 yr	-0.8218	-0.04248	0.004291	NA	0.00015104	0.0001267
	Male	Mexican-American & >=20 yr	0.6306	-0.02928	NA	NA	0.00015104	0.0001267
	Female	Caucasian & <18 yr	-0.871	0.06537	NA	NA	0.00011496	0.00009283
	Female	Caucasian & >=18 yr	0.4333	-0.00361	-0.000194	NA	0.00011496	0.00009283
	Female	African-American & <18 yr	-0.963	0.05799	NA	NA	0.00010846	0.00008546
	Female	African-American & >=18 yr	0.3433	-0.01283	-0.000097	NA	0.00010846	0.00008546
	Female	Mexican-American & <18 yr	-0.9641	0.0649	NA	NA	0.00012154	0.0000989
	Female	Mexican-American & >=18 yr	0.4529	-0.01178	-0.000113	NA	0.00012154	0.0000989
Crapo et al.82-84 (height unit: inch)
	Male		-2.19	-0.0244	NA	0.1052	NA	NA
	Female		-1.578	-0.0255	NA	0.0869	NA	NA

Table 2.4. (continued)
Source	Sex	Group	Intercept	Age (yr)	Age2	Height (inch)	Height2	Height2 *
Knudson et al.134
	Male	<=24 yr	-4.808	0.045	NA	0.1168	NA	NA
	Male	>24 yr	-4.203	-0.027	NA	0.1321	NA	NA
	Female	<=19 yr	-2.703	0.085	NA	0.0686	NA	NA
	Female	>19 yr	-0.794	-0.021	NA	0.0686	NA	NA
Cherniak et al.135
	Male		-2.59946	-0.03509	NA	0.1149	NA	NA
	Female		-2.56958	-0.02147	NA	0.1034	NA	NA
Morris et al.136
	Male		-1.26	-0.032	NA	0.0919	NA	NA
	Female		-1.931	-0.025	NA	0.0889	NA	NA

Table 2.5.
Clinical symptoms and signs used to define PEx or improvement from PEx in RCTs
Articles compared: Dakin et al.92; Rosenfeld et al.91; Rabin et al.90; Fuchs et al.137; Ramsey et al.65
Items:
Change in sputum production: volume, appearance or color
New or increased hemoptysis
Increased cough
Decreased activity
Malaise, fatigue, or lethargy
Absent from school/work due to illness
Decreased exercise tolerance
Increased dyspnea
Increased chest discomfort
Increasing respiratory rate
Work of breathing
Fever >38 °C orally
Anorexia or weight loss
Changes in chest sounds
Decrease in FEV1 or FVC
Radiographic changes indicative of an exacerbation
Sinus pain or tenderness
Change in sinus discharge
Oxygen desaturation
ESR, CRP, WCC, NC*
Retractions or use of accessory muscles
* ESR, erythrocyte sedimentation rate; CRP, C-reactive protein; WCC, white cell count; NC, neutrophil count

Figure 2.1: Influence of time-dependent confounder in the causation of this study.

CHAPTER 3 OBJECTIVES AND SPECIFIC AIMS

3.1 Objectives

The primary objective of this study was to examine treatment initiation and change in patients diagnosed with new or continuing nonmucoid PaPI. The second objective was to investigate the optimal treatment regime to delay the acquisition of mucoid PaPI for pediatric CF patients.

3.2 Specific Aims

1. To describe treatment use patterns and changes in pediatric CF patients diagnosed with nonmucoid PaPI in the CFFPR from 2006 to 2011.
2. To create a lung treatment score that indicates suboptimal lung health management by when rational treatment changes occur.
3.
To investigate the comparative effectiveness of different treatment regimes, which are determined by the threshold of the predicted probability of lung treatment changes, in delaying the acquisition of mucoid PaPI in pediatric CF patients.

CHAPTER 4 METHODS

4.1 Data Sources

All of the aims in this dissertation were conducted using data from the Cystic Fibrosis Foundation's Patient Registry (CFFPR). In 1966,8 the CFFPR was created to collect and track information on demographic characteristics, genetic and microbiological information, diagnoses, clinical outcomes, self-reported therapies, and hospitalization variables of patients with CF who receive care at Cystic Fibrosis Foundation (CFF) accredited care centers in the U.S. This information has been used to create CF care guidelines, assist clinical practice, guide quality improvement in health services, and support research on complex questions, all of which eventually optimize survival time and quality of life for the entire CF population. The CFFPR has set international standards for gathering patient data and has served as a model for other nonprofit health organizations around the world.8

To better supervise data collection and confirm the quality of the database, in 2013 the CFF began conducting an external audit of the data entered into the CFFPR each calendar year. In 2013, 28 centers of varying size and geographic location participated, which included 1,606 patients. Data from 8,247 encounters and 1,471 care episodes were audited. All the key information, such as demographic, microbiological, treatment, and hospitalization variables, was compared with the data in the electronic medical record (EMR) for completeness and accuracy. Overall, the CFFPR contained 96.5% of the encounters and 89.7% of the hospitalizations that were recorded in the EMR. Among the key variables examined, the accuracy of the data in the registry was over 95% for date of birth, sex, and CFTR mutations.
Microbiology results were recorded accurately for 93.1% of cultures, and medications were recorded accurately with some variability by type: over 95% for dornase alfa and azithromycin, over 90% for hypertonic saline and aztreonam, and over 85% for inhaled tobramycin.124 According to the above result, the CFFPR has collected information with accuracy similar to that of the EMR since 2012. The CFFPR can be applied to mimic EMR data even as early as 2006, since the quality of data is similar between 2006-2012 and after 2012. In the assumption section, I will concisely describe the results of several preliminary analyses, which have investigated the quality of data in the CFFPR from various perspectives.

All of the data are captured through questionnaires during the inpatient or outpatient visit. Data are entered into a secure Web-based portal by trained staff at each CFF accredited care center.138 Four databases (an annualized database, an encounter database, a care episode database, and a demographic and diagnosis database) are used to store the information. The annualized database is the summary of all the events and disease deteriorations that were recorded on each visit for the same patient during the full calendar year. The care episode database includes information during the same care episode, which could be either hospitalization or home care. In order to address all of the study's aims, data from all databases were used.

4.2 Study Design and Population

A cohort of de-identified patients diagnosed with nonmucoid PaPI existing in the CFFPR from Jan 1, 2006 to Dec 31, 2011 was identified. The main hypotheses and aims of this study were addressed using the retrospective cohort design. Patients were identified based on exposure to different treatment strategies, and were followed forward until diagnosis of mucoid PaPI, death, the end of the study, or deviation from the treatment strategy, whichever occurred first.
Considering the long span of time for disease progression from nonmucoid PaPI to mucoid PaPI, together with the high cost of conducting a prospective cohort study, a retrospective study design is well suited to the hypotheses.

4.2.1 Study Design

Armed with the causal inference method, after identifying the potential treatment strategies from the cohort and expert opinions, the data were able to emulate an RCT with a DTR design. For each patient, 25 replicates were created from the index date. At the index date, each of a patient's replicates was assigned to one of the 25 treatment strategy groups, defined by the threshold of the lung treatment score at which a treatment change was received. Those replicates were followed until the occurrence of the outcome, censoring, or failure to match the treatment strategy. With that design and method, the study can deliver better causality estimates with fewer biases than a study that assesses exposure retrospectively, such as a case-control study.

4.2.2 Inclusion and Exclusion Criteria

Inclusion criteria:
• Diagnosed with nonmucoid PaPI;
  o More than 1 year history OR
  o More than 1 negative culture test before positive
• Patient has demographic information and existed in CFFPR from 2006-2011;
• Patient has moderately or severely impaired lung function.

Exclusion criteria:
• Patients born before 1988 or after 2006;
• Patients without any visit after 2006;
• Diagnosed with mucoid PaPI before nonmucoid PaPI;
• Patients had lung transplant before index date.

In order to qualify as diagnosed with nonmucoid PaPI, patients must have at least 1 year of prior history or have at least one negative culture test before the diagnosis. With this restriction, only patients initially diagnosed with nonmucoid PaPI were identified.
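The inclusion and exclusion criteria listed above can be read as a single eligibility filter. The sketch below is one possible encoding, not the study's actual code: the `Patient` fields and the function name are hypothetical and are not CFFPR variable names, the new-diagnosis rule follows the prose wording (at least 1 year of history or at least one negative culture), and the lung-function cutoff of 70% predicted FEV1 follows the FDA-label criterion the study applies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STUDY_START, STUDY_END = date(2006, 1, 1), date(2011, 12, 31)

@dataclass
class Patient:  # hypothetical record layout for illustration only
    birth_date: date
    first_registry_date: date
    last_visit_date: date
    nonmucoid_dx_date: date
    index_date: date
    negative_cultures_before_dx: int
    fev1_pct_predicted: float
    first_mucoid_date: Optional[date] = None
    transplant_date: Optional[date] = None

def is_eligible(p: Patient) -> bool:
    # Inclusion: initial nonmucoid PaPI diagnosis is identifiable
    history_days = (p.nonmucoid_dx_date - p.first_registry_date).days
    newly_diagnosed = history_days >= 365 or p.negative_cultures_before_dx >= 1
    in_window = STUDY_START <= p.index_date <= STUDY_END
    impaired = p.fev1_pct_predicted <= 70.0
    # Exclusions
    bad_birth_year = p.birth_date.year < 1988 or p.birth_date.year > 2006
    no_visit = p.last_visit_date < STUDY_START
    mucoid_first = p.first_mucoid_date is not None and p.first_mucoid_date < p.nonmucoid_dx_date
    prior_transplant = p.transplant_date is not None and p.transplant_date < p.index_date
    excluded = bad_birth_year or no_visit or mucoid_first or prior_transplant
    return newly_diagnosed and in_window and impaired and not excluded

example = Patient(
    birth_date=date(1996, 5, 1), first_registry_date=date(1997, 1, 1),
    last_visit_date=date(2010, 3, 1), nonmucoid_dx_date=date(2007, 2, 1),
    index_date=date(2007, 2, 1), negative_cultures_before_dx=3,
    fev1_pct_predicted=65.0,
)
print(is_eligible(example))
```

Keeping each criterion as a named boolean makes it straightforward to tabulate how many patients each rule removes when building a cohort flow diagram.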
Because of the way culture test results are collected and the uniqueness of the research question, this study focused on identifying the date of patients' initial diagnosis with nonmucoid PaPI, bypassing the ambiguity of identifying chronic nonmucoid PaPI diagnosis using the CFFPR. To better aid decision-making about chronic medications for maintaining lung health, both treatment-naïve and nonnaïve patients were included, as long as the diagnosis date of nonmucoid PaPI was identifiable.

Following the FDA-approved labels for inhaled antibiotics, all patients treated with inhaled antibiotics should be older than 6 years with moderately to severely impaired lung function, indicated by a predicted FEV1% less than or equal to 70%. Considering that the prospective RCTs that demonstrated the efficacy of inhaled dornase alfa, tobramycin, and oral azithromycin were published in 1994,137 1999,65 and 2003,139 respectively, all patients for inclusion in the study should have been initially diagnosed or initially treated for chronic nonmucoid PaPI after 2004. This is to account for the fact that prior to those approvals and publications, physicians may not have been prescribing those medications due to the lack of published evidence of efficacy.

Moreover, before 2006, the richness of data on patient-reported treatment at encounters in the CFFPR was suboptimal. An exploratory analysis (Appendix A) was conducted to investigate the quality of CFFPR data and whether patient-reported treatments can be used as a proxy for prescription or refill records. Given the chronic utilization that characterizes lung health maintenance treatments, and the results of an external 2012 CFFPR audit showing that patient-reported treatment appropriately reflected prescription information in the EMR, the annual inconsistency rate was calculated.
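The lung-function criterion above (predicted FEV1% of 70% or less) depends on a reference equation for normal FEV1. As a hedged illustration, the sketch below applies one row of Table 2.4, the NHANES/Hankinson coefficients for Caucasian males under 20 years, in the usual polynomial form (intercept + age + age squared + height squared, height in cm). Which reference equation the study actually used is not stated here, and the function names and example patient are assumptions.

```python
# Coefficients copied from Table 2.4 (NHANES/Hankinson, male, Caucasian, <20 yr)
HANKINSON_MALE_CAUCASIAN_UNDER_20 = {
    "intercept": -0.7453,
    "age": -0.04106,
    "age2": 0.004477,
    "height2": 0.00014098,   # height in cm
}

def predicted_fev1(age_yr: float, height_cm: float, coef: dict) -> float:
    """Predicted normal FEV1 in liters from a Table 2.4 style coefficient row."""
    return (coef["intercept"]
            + coef["age"] * age_yr
            + coef["age2"] * age_yr ** 2
            + coef["height2"] * height_cm ** 2)

def fev1_percent_predicted(observed_l: float, age_yr: float, height_cm: float, coef: dict) -> float:
    return 100.0 * observed_l / predicted_fev1(age_yr, height_cm, coef)

# Hypothetical patient: 12 years old, 150 cm tall, measured FEV1 of 1.70 L
pred = predicted_fev1(12, 150, HANKINSON_MALE_CAUCASIAN_UNDER_20)
pct = fev1_percent_predicted(1.70, 12, 150, HANKINSON_MALE_CAUCASIAN_UNDER_20)
meets_criterion = pct <= 70.0
print(round(pred, 3), round(pct, 1), meets_criterion)
```

For this hypothetical patient the predicted FEV1 is about 2.58 L, so a measured 1.70 L is roughly 66% of predicted and falls within the impaired range.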
The annual inconsistency rate was defined as the proportion of visits in which a patient did not report on a specific treatment that had been initiated within the calendar year. At the same time, in order to investigate whether patient-reported treatments can be used as a proxy for refill records, the discordance between self-reported treatments and refills in a commercial claims database was tested.

Generally speaking, self-reported treatment has been captured appropriately and consistently since 2006 and can represent prescriptions, since the inconsistency rate fell to less than half of its earlier level after 2006. Before 2006, the inconsistency rate was about 80%; from 2006 it decreased to 30% and remained around 20% by the end of the period (Appendix A, Table A.3). Even though the data in the CFFPR are appropriate for indicating prescribing patterns, they do not qualify as a proxy for refill records, considering the discordance between self-reported treatments and refills in a commercial claims database. As shown in Appendix A, Table A.4, patients were reported as on treatment in many records from the CFFPR, but only about half of those records also had claims indicating that the same patient had refilled the treatment at the related encounter date in the claims database. The discordance could be caused by the fact that there are fewer claim records in the claims database than encounter records in the CFFPR. Other than dornase alfa, which has high concordance when patients reported being on treatment and had refill claims, the remaining concordance arose because neither the claims database nor the CFFPR collected enough information about prescriptions, as for inhaled aztreonam, TOBI® Podhaler, and ivacaftor. The concordance for these treatments is probably caused by the small number of patients who have access to the treatment, by the short time period since drug approval, or by the extremely high price of the treatment.
Without further information, it is not possible to conclude that self-reported treatment in the CFFPR can be used as a proxy for refills. However, some patients misreported the treatment they received: about 1% to 6% of patients reported that they were not on treatment but had refill claims, as shown in Appendix A, Table A.8. This result supports the previous assumption that patient-reported treatment in the CFFPR indicates the prescribing treatment pattern. Detailed descriptions of the discordance tests are given in the assumption section. With these qualifications in place, only patients who were listed in the CFFPR from January 1, 2006, to December 31, 2011, were included in the study.

Another criterion for patient selection was age. Because the median ages of those who develop nonmucoid Pa and mucoid Pa are 1 and 13 years,14 respectively, and the majority of patients suffering from chronic PaPI are adolescents,140 patients had to be younger than 18 years old at the index date for inclusion in the study. The index date was identified as the date of the patient's first encounter visit after January 1, 2006, if he/she had been diagnosed with nonmucoid PaPI previously, or the date of the encounter visit at which the patient was initially diagnosed with nonmucoid PaPI after January 1, 2006.

As an antibiotic for treating chronic PaPI, inhaled aztreonam was approved by the FDA on February 21, 2010; this recent approval could affect the rationale for treatment change given physicians' belief that newer is better. Another exploratory analysis was conducted both to understand how the drug approval date, or the date when a treatment's efficacy was demonstrated, may influence irrational treatment changes and to estimate the potential presence of channeling bias in the specific aims.
The result (Appendix B) supports part of the hypothesis: the drug approval date, or the date when the efficacy of a treatment was demonstrated, affects treatment change. Before the date, the mean number of treatment changes associated with the targeted treatment was lower than the mean number measured after the date. For instance, 1 year before the approval date, the mean number of treatment changes associated with azithromycin was 0.22, while the number went up to 0.56 during the first year after the approval date. Hypothetically, before the date, the mean number of treatment changes associated with the targeted treatment should be zero, since the drug either had not been approved yet or had not received enough attention in publications regarding its treatment effects. In reality, both off-label use and RCTs provide access to a targeted treatment before it is approved, which explains the nonzero mean number of treatment changes associated with the targeted treatment. In the case of azithromycin, off-label use was probably the dominant reason for the positive value. Even though the results already partially support the hypothesis, it is still difficult to investigate the association between the approval date and irrational treatment change in the preliminary analysis. Further analysis is needed, considering the challenge of differentiating rational changes from irrational changes and the failure to adjust for other issues that may confound the influence of drug approval date on irrational treatment changes in the current exploratory analysis.

Fortunately, inhaled aztreonam is the only CF treatment related to this research that received approval between 2006 and 2011. More importantly, this study focused on investigating treatment changes at the treatment class level. The approval of a new drug within a preexisting class has little actual impact at the treatment class level unless there was no alternative in that treatment class.
Given the results of the exploratory analysis, together with a general understanding of the treatment and comprehensive procedures to capture and adjust for its influence, drug approval date does not significantly impact irrational treatment change.

Patients with a lung transplant before the index date or under 6 years old were excluded, since in those situations participants may already be using other treatments, and antagonism between pathogens may exist, which complicates the identification of treatment effects for the current dynamic treatment regimes.

4.3 Exposure, Covariate, and Outcome Assessment

The core concept of this study is rational treatment change. This indicates whether to switch to, add on, or stop one or multiple classes of chronic treatments compared to the treatment received at the previous visit. The treatments mainly consist of two categories: inhaled antibiotics and lung health maintenance medications. Moreover, the pulmonary guidelines48 for CF separate the suggested treatments by the estimation of net benefit: positive, neutral or uncertain, and negative. In this study, I assigned all the related medications into five classes regardless of the net benefit, as shown in Table 2.3. The utilization of medications that were not included in this table was not considered, since those treatments either were prescribed only for short-term use (such as i.v. antibiotics) or received approval only recently (beyond the time constraints of this study). Although some excluded medications may influence the clinical outcome indirectly, with appropriate adjustment for the demographic, clinical, and treatment-related variables (Figure 4.1), that influence can be mitigated. For example, pancreatic enzyme replacements help with digesting and absorbing food, which increases the lung function of a pediatric patient indirectly by enhancing his or her weight and height.
In this section, I will describe the identification and assessment of exposure, covariates, and outcome, respectively. Since rational treatment change directly acts as the outcome for Aim 2, and it is also included in the lung treatment score, which functions as an exposure for Aim 3, I will define it first.

4.3.1 Rational Treatment Change

As defined in the background section, any treatment change steered by evidence is a rational treatment change. Since the main focus of this study is to investigate the optimal timing of rational treatment change at the class level, the appropriate identification of the exposure, rational treatment change, is vital. In my study, treatment information, including both type and quantity, was specifically captured at the class level. Initial treatment was defined as the first class of treatment that a patient received after the index date from 2006 to 2011. For instance, if a patient received both dornase alfa and hypertonic saline on the index date, the initial treatment for this patient would be two mucolytics.

Rational treatment change was defined as a dichotomous variable and was measured at each visit. It occurred at a visit only when either a different class or a different number of treatments within a class was prescribed according to clinical evidence. The clinical evidence had to include at least one change to a clinical variable, such as FEV1%, PEx, pathogens, adverse effects, or drug resistance. In order to investigate how much clinical evidence is needed to define a rational treatment change, and in order to determine the impact of different definitions on the time to acquire mucoid PaPI, rational treatment changes are defined under three assumptions: loose, neutral, and strict. Under the loose assumption, all treatment changes are rational treatment changes regardless of whether clinical evidence exists. Under the strict assumption, a treatment change is rational only if it is associated with related clinical evidence.
Under the neutral assumption, whenever a physician stops prescribing a treatment, the change can be identified as a rational treatment change only if the change in clinical variables is consistent with suspending the prescription. Consistency means that the change of treatment and the change of clinical status have an identical direction. For instance, if the initiation of a treatment causes an adverse effect (AE) or drug resistance and the physician stops prescribing the medication accordingly, then it is a rational treatment change. All other decisions to terminate a prescription, namely those without a clinical status change in the same direction, are defined as irrational treatment changes. Moreover, under the neutral assumption, adding on and switching, both between and within treatment classes, are rational treatment changes regardless of the existing evidence. Since all treatments were prescribed chronically, I anticipated that under the neutral assumption, the probability of a physician randomly prescribing a treatment without any reason would be low. The amount of clinical evidence needed to define a rational treatment change increases from the loose to the neutral to the strict assumption. Results in Aim 1 used the neutral assumption to present the existing treatment change pattern. For the following aims, the strict assumption was applied. A sensitivity analysis was also conducted to investigate the influence of the different rational treatment change assumptions on the optimal lung treatment maintenance strategy. Table 4.1 lists specific scenarios to illustrate the neutral assumption of treatment change. To simplify the scenarios, only mucolytics and inhaled antibiotics are taken into consideration. All other individual treatments and treatment classes are assumed to remain the same between two consecutive visits.
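The three assumptions can be summarized as a small decision function. The following is only a minimal sketch of the definitions as worded above, not the code used in the study; the `VisitChange` record and its field names are hypothetical simplifications of what would be derived from two consecutive registry visits.

```python
from dataclasses import dataclass


@dataclass
class VisitChange:
    """Hypothetical summary of what changed between two consecutive visits."""
    added_or_switched: bool    # a treatment or class was added or switched
    stopped: bool              # a treatment was discontinued
    evidence_present: bool     # at least one clinical variable changed
    evidence_consistent: bool  # clinical change points the same way as the stop


def is_rational_change(change: VisitChange, assumption: str) -> bool:
    """Classify a visit as a rational treatment change under one assumption."""
    if not (change.added_or_switched or change.stopped):
        return False  # no treatment change at all
    if assumption == "loose":
        return True  # every treatment change counts
    if assumption == "strict":
        return change.evidence_present  # any change needs clinical evidence
    if assumption == "neutral":
        if change.added_or_switched:
            return True  # add-on and switch count regardless of evidence
        # a stop counts only with evidence pointing in the same direction
        return change.evidence_present and change.evidence_consistent
    raise ValueError(f"unknown assumption: {assumption}")


# A treatment stopped without any supporting clinical change
unsupported_stop = VisitChange(False, True, False, False)
for name in ("loose", "neutral", "strict"):
    print(name, is_rational_change(unsupported_stop, name))
```

In this sketch, an unsupported stop is rational only under the loose assumption, while an unsupported add-on is rational under the loose and neutral assumptions but not under the strict one.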
After the first of the two consecutive visits, a rational treatment change would be confirmed as long as the number of treatments prescribed at the second visit increased, regardless of whether the change occurred within or between classes. As shown in scenario 2, as soon as inhaled tobramycin is initiated at the second visit, the change is considered a rational treatment change because the patient has been taking only hypertonic saline and no other treatment since the last visit. Taking treatments every other month is common for patients using lung maintenance treatments such as inhaled antibiotics. According to the preliminary analysis, patients sometimes fail to report a treatment when the visit occurs in a break month. To avoid missing treatments because of their dosing frequency, all treatment frequencies were captured and incorrect reports during break months were adjusted. For terminating a medication, the definition of rational treatment change relies on whether clinical variables change in the same direction as the treatments reported by patients. In scenario 6, a patient stops taking inhaled tobramycin at the current visit. This would be confirmed as a rational treatment change if the clinical status changed accordingly: lung function increased, a culture test turned negative, or resistance to aminoglycosides disappeared. At the beginning, the first four classes were the main focus in defining rational treatment change. However, because the chronic treatment guideline determined that the chronic use of bronchodilators was associated with uncertain or negative benefits, only three treatment classes (inhaled antibiotics, mucolytics, and anti-inflammatories) were considered in defining rational treatment change.
Each individual treatment in the last class, other chronic treatments, was treated as a confounder in the study, since all of them can not only treat specific pathogens or comorbidities but also improve a patient's lung function indirectly, which complicates the treatment effects of the other three treatment classes.

4.3.2 Exposure Assessment

Aim 1 was a descriptive analysis, so there was no exposure. All of its variables are described in the outcome assessment section. Aim 2, a prediction model, also did not have any exposure. In Aim 3, each unique threshold of the lung treatment score was defined as one exposure, which indicated whether or not there should be a lung health maintenance treatment change at the current encounter visit according to the treatment score threshold. At the end of Aim 2, a lung treatment score was created that predicted the probability of a rational treatment change at the current visit. The score thresholds were determined by clinical experience, the score distribution, and variations in disease severity. Whenever a treatment was not consistent with the lung treatment score threshold, the patient was artificially censored at that visit for the related threshold. Given the various treatment score thresholds, each of the resulting observed cohorts was unique, from the number of patients to the number of visits per patient.

4.3.3 Outcome Assessment

Aim 1: Patient demographic characteristics, including but not limited to age, gender, race, ethnicity, smoking status, second-hand smoking status, pregnancy, transplant status, height, and weight, were measured. Clinical variables, such as FEV1, predicted current FEV1%, relative change of predicted FEV1% compared to the optimal value from the previous year, and number of PEx in the previous year, were also measured. The variations of clinical variables by demographic characteristics or other clinical variables were measured as well.
Treatment patterns, treatment change, and time of change were captured, together with other treatment-related variables. More importantly, because they vary over time, covariates such as FEV1, predicted current FEV1%, and PEx were measured at three time points for each visit: the current visit, the last visit, and the visit at which the optimal value was measured among all visits in the previous year. The relationship between clinical variables and changes in lung health maintenance treatment was also described. All comorbidities were measured by indicators.

Aim 2: The main result of Aim 2, the lung treatment score, is the probability of a rational treatment change, which also indicates the probability of having suboptimal lung health management.

Aim 3: Ideally, the time from the initial diagnosis of nonmucoid PaPI to mucoid PaPI should be identified as the primary outcome. However, with developing technology and early detection leading to a better understanding of the disease, the age at initial diagnosis of nonmucoid PaPI is decreasing. At the same time, CF patients may receive treatment after 6 years of age. Therefore, the time from the index date to the initial diagnosis of mucoid PaPI was used as the primary outcome instead. Follow-up ended at the development of mucoid PaPI, death, or the end of the study (Dec 31, 2011), whichever occurred first.

4.3.4 Covariate Assessment

Aim 2: All variables mentioned in Table 4.2, ranging from demographic characteristics, clinical variables, and comorbidities to treatment-related variables, were considered in order to enhance the accuracy of prediction, even though demographic characteristics and CFRD status are not treated as confounders in the DAG (Figure 4.1). All the clinical variables, treatment-related variables, comorbidities, and weight-for-age Z score were treated as time-varying variables.
The rest of the demographic characteristics were handled as baseline variables in the score prediction model.

Aim 3: In Figure 4.1, variables at the current visit are denoted with t, and (t-1) represents the value at the previous encounter visit. To simplify the DAG, time-varying covariates other than ΔFEV1% and ΔTx, such as age, height, weight-for-age Z score, CFRD, and drug resistance, were not denoted with t or (t-1) in the figure. They were nevertheless treated as time-varying covariates in the analysis. All treatment information was self-reported and reflected the treatment patients had been on previously, so the treatment information at the current encounter visit represents only the prescribing behavior at the last encounter visit. Other than the exposure and outcome, which are represented by the green and blue nodes, respectively, the figure consists of four groups of independent variables. All the demographic variables are located in the top-left corner. The top-right corner belongs to comorbidity variables. Clinical variables, such as predicted FEV1%, are located in the center, and the treatment/pathogen-related variables are at the bottom. Treatment/pathogen-related variables include pulmonary infection and drug resistance caused by pathogens other than P. aeruginosa. To simplify the figure, I have included only ΔFEV1%; it represents the combination or matrix of all the clinical time-varying covariates, such as ΔFEV1%, FEV1%, and PEx. All demographic, comorbidity, clinical, and treatment/pathogen-related variables were treated as covariates in Aim 3. Antifungals and clarithromycin were considered as covariates as long as they were used for more than 1 month. Table 4.3 (a-d) shows the source, original type, and description of each covariate, together with its target type or class in the study.
The majority of the covariates from the groups of clinical variables and treatment/pathogen-related variables were adjusted for as confounders, as described in the Methods section.

4.4 Methods

As mentioned previously, an observational database can emulate an RCT only when a series of causal inference methods is applied. Thereafter, the DTR of the optimal treatment change strategy can be identified. In order to use an observational database to provide consistent estimates of counterfactual quantities, $E(Y^a)$, at least three assumptions have to be met: consistency, conditional exchangeability, and positivity. When these assumptions are satisfied, several methods are available to handle time-dependent confounders, such as inverse probability weighted (IPW) estimation of marginal structural models (MSMs),141 g-estimation of structural nested models (SNMs),142 and g-computation.143 Compared with the other two methods, MSMs require less computation, are more straightforward to explain, and, most importantly, are less prone to misspecification. The major drawback is that, compared to SNMs, MSMs cannot explore potential interactions between exposure (treatment) and time-dependent confounders. Specifically for Aim 3, I applied IPW estimation to the dynamic MSMs, since this limitation was not a major issue in this study. The simplest indication of the interaction was mentioned in Robins et al.:144 there exists a value of $l_j$, say $l_j = 0$, such that for all but one $a_j \in A_j$,

$$f(a_j \mid \bar{l}_{j-1}, l_j = 0, a_{j-1}) = 0. \tag{4.1}$$

In that situation, an MSM is not applicable since the probability of having artificial censoring is 0. For example, a study investigating the effect of occupational exposure on mortality falls exactly into this scenario: if a subject is off work at time j, so that $l_j = 0$, then that subject cannot have occupational exposure, and $a_j = 0$.
Such a scenario did not occur in this study. As mentioned in the background, clinical time-dependent confounders are the core clinical signals affecting treatment-change decision-making. However, the clinical time-dependent confounders were also included in the exposure, the different treatment score thresholds, in Aim 3. Therefore, there was an association between time-dependent confounders and exposure, but the chance of an interaction was trivial. Even though treatment change scores were determined by clinical variables and demographic characteristics $(L_j, V)$, the occurrence of exposure depends on having a treatment change when the treatment change score was beyond specific thresholds,

$$f\big(a_j = d_\theta(\bar{l}_j) \mid \bar{l}_{j-1}, l_j, a_{j-1} = d_\theta(\bar{l}_{j-1}), v\big) \neq 0 \tag{4.2}$$

for any $l_j \in L_j$. Moreover, unlike traditional methods, which have a fixed exposure, the exposure in my study is dynamic: a treatment change strategy. Counterfactually, the same patient (replicate) was assigned to each strategy at the index date; whenever the observed treatment change did not match the treatment change strategy, the patient was artificially censored. Since the treatment change strategy was indirectly determined by those time-dependent confounders, the probability of interaction between exposure and time-dependent confounders would be quite low after censoring the patients whose treatment pattern did not match the treatment strategy. MSMs are a class of causal models for estimating the causal effects of time-varying exposure in the presence of time-varying covariates that may be simultaneously time-dependent confounders and intermediate variables in the observational data.141,144 The term marginal comes from the focus of these models: the marginal distribution of the counterfactual outcomes, rather than their joint distribution.
MSMs are structural models, since in the econometric and social science fields, anything that models the probabilities of counterfactual exposure is often referred to as structural.145 The parameters of an MSM can be consistently estimated using IPW estimators. Calculating the inverse probability weights is a core step in fitting MSMs. IPW estimation provides an innovative way to adjust for both confounders, especially time-dependent confounders, and selection bias by creating a pseudopopulation within which the confounding and selection bias no longer exist. Therefore, an unbiased estimate of a parameter in the pseudopopulation is a consistent estimate of the counterfactual parameter.141 The IPW is the product of the inverse probability of treatment weight (IPTW) and the inverse probability of censoring weight (IPCW). IPTW is used to adjust for confounders, especially time-dependent confounders, between exposure (treatment) and outcome. IPCW is applied to adjust for selection bias. Take IPTW as an example, which is the inverse of the probability that a patient has a specific treatment or treatment pattern, and assume that the treatment is a dichotomous variable, that is, the patient is either treated or not treated. If the IPTW equals $w_i$, then the subject contributes $w_i$ copies of him/herself to the pseudopopulation. After the IPTW for each patient in the original cohort is calculated and all patients contribute multiple copies of themselves according to the value of their IPTW, the pseudopopulation is created. Within this pseudopopulation, the probability of being treated or untreated is equal for each individual patient. Therefore, the association between exposure and confounders is blocked. Similarly, IPCW can block the association between selection bias and exposure. In this situation, both confounders and selection bias are associated only with the outcome and not with the exposure.
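The construction of the pseudopopulation can be illustrated with a toy cohort. The sketch below is an illustration under simplifying assumptions (one binary confounder, one binary point treatment, unstabilized weights); the cohort counts and the function name `iptw_pseudopopulation` are made up for the example and do not come from the registry data.

```python
from collections import Counter

# Hypothetical toy cohort of (confounder level, treatment) pairs.
# Treatment is more common when the confounder is "high".
cohort = (
    [("low", 1)] * 10 + [("low", 0)] * 30
    + [("high", 1)] * 45 + [("high", 0)] * 15
)


def iptw_pseudopopulation(rows):
    """Return the pseudopopulation size of each (confounder, treatment) cell."""
    n_by_level = Counter(level for level, _ in rows)
    n_by_cell = Counter(rows)
    pseudo = {}
    for (level, treated), n in n_by_cell.items():
        prob = n / n_by_level[level]       # P(A = a | L = l) in the cohort
        pseudo[(level, treated)] = n / prob  # each subject adds 1/prob copies
    return pseudo


pseudo = iptw_pseudopopulation(cohort)
for level in ("low", "high"):
    treated = pseudo[(level, 1)]
    untreated = pseudo[(level, 0)]
    print(level, round(treated / (treated + untreated), 3))
```

In the original cohort, 25% of the "low" group and 75% of the "high" group are treated; in the pseudopopulation, both groups are split evenly between treated and untreated, so the confounder no longer predicts treatment.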
In short, if the assumptions for causal inference are satisfied, causality can be measured in the pseudopopulation since the confounders and selection bias have been adjusted for. There are two ways to calculate the IPW: stabilized weights and unstabilized weights. Every weight comprises a numerator and a denominator. The denominator is the same for stabilized and unstabilized weights. It estimates the probability of remaining off treatment for IPTW, or the probability of remaining uncensored for IPCW, by including a time-dependent intercept, baseline covariates V, and time-dependent covariates L. Baseline covariates V are in fact part of the time-dependent covariates. The numerator of the unstabilized weight is more straightforward than that of the stabilized weight: it is simply 1, whereas the stabilized numerator estimates the probability of remaining off treatment for IPTW, or the probability of remaining uncensored for IPCW, by including a time-dependent intercept and baseline covariates V. Since time-dependent covariates (L) appear only in the denominator, weighting eliminates their influence in the pseudopopulation by blocking the association between exposure and time-dependent covariates, regardless of whether unstabilized or stabilized weights are applied. The remaining baseline confounders are adjusted for in the outcome model. Below are expressions for the stabilized IPTW and IPCW. A patient may be right-censored because of failing to follow the specific treatment protocol, the administrative end of the study, disenrollment from the patient registry, or death. Therefore, the indicator of right-censoring $C_j$ can be recorded as the joint function of three indicators, $C_j^{dis}$, $C_j^{death}$, and $C_j^{end}$, each of which represents one potential reason for right-censoring.
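The difference between the two kinds of weights described above can be sketched numerically. This is an illustrative sketch only: it assumes the per-visit numerator and denominator probabilities have already been estimated (for example, by two logistic models), and the probability values and function names are hypothetical.

```python
from itertools import accumulate
from operator import mul


def stabilized_weights(numer_probs, denom_probs):
    """Cumulative product over visits of numerator / denominator probabilities."""
    ratios = [n / d for n, d in zip(numer_probs, denom_probs)]
    return list(accumulate(ratios, mul))


def unstabilized_weights(denom_probs):
    """Same denominator, but the numerator is fixed at 1 for every visit."""
    return stabilized_weights([1.0] * len(denom_probs), denom_probs)


# Hypothetical per-visit probabilities of the observed treatment for one patient:
# the numerator conditions on baseline covariates V only,
# the denominator also conditions on time-dependent covariates L.
numer = [0.8, 0.8, 0.8]
denom = [0.9, 0.5, 0.8]

print([round(w, 3) for w in stabilized_weights(numer, denom)])
print([round(w, 3) for w in unstabilized_weights(denom)])
```

Because the numerator and denominator share the baseline covariates, the stabilized ratios stay closer to 1 than the unstabilized ones, which is why the stabilized weights in this example end at a smaller value.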
The probability of remaining uncensored can be broken down into three parts:

$$\prod_{j=0}^{int(t)} P\big(C_j^{dis} = 0 \mid Y_j = 0, \bar{A}_{j-1}, L_j\big) \tag{4.3}$$

$$\prod_{j=0}^{int(t)} P\big(C_j^{death} = 0 \mid Y_j = 0, \bar{A}_{j-1}, L_j, C_j^{dis} = 0\big) \tag{4.4}$$

$$\prod_{j=0}^{int(t)} P\big(C_j^{end} = 0 \mid Y_j = 0, \bar{A}_{j-1}, L_j, C_j^{dis} = 0, C_j^{death} = 0\big) \tag{4.5}$$

The stabilized weights are:

$$\text{Stabilized } IPTW_i(t) = \prod_{j=0}^{int(t)} \frac{P\big(A_j = a_{i,j} \mid \bar{A}_{j-1} = \bar{a}_{i,j-1}, V = v_i\big)}{P\big(A_j = a_{i,j} \mid \bar{A}_{j-1} = \bar{a}_{i,j-1}, \bar{L}_j = \bar{l}_{i,j}\big)} \tag{4.6}$$

$$\text{Stabilized } IPCW_i(t) = \prod_{j=0}^{int(t)} \frac{P\big(C_j = c_{i,j} \mid \bar{C}_{j-1} = \bar{c}_{i,j-1}, \bar{A}_{j-1} = \bar{a}_{i,j-1}, V = v_i\big)}{P\big(C_j = c_{i,j} \mid \bar{C}_{j-1} = \bar{c}_{i,j-1}, \bar{A}_{j-1} = \bar{a}_{i,j-1}, \bar{L}_j = \bar{l}_{i,j}\big)} \tag{4.7}$$

Even though both methods yield consistent causal estimates, stabilized weights are preferred when calculating IPTW and IPCW since they provide a narrower 95% CI, together with actual coverage rates that are closer to 95%, compared to unstabilized weights.146 This statistical superiority of the stabilized weights occurs only when the outcome model is not saturated.122 Since outcome models with time-varying exposures are rarely saturated, the stabilized weight was applied in this study. Last but not least, with baseline confounders (V) in the conditioning event of both the numerator and the denominator, the numerical values of the numerator and denominator become closer, which results in added stabilization of (less variability in) the IPW, further narrowing the 95% confidence intervals.122

MSMs were initially proposed to estimate the effects of static treatment regimes. They are increasingly being applied to estimate optimal DTRs. When estimating DTRs rather than static treatment regimes, the most crucial part of an MSM is the estimation of the value function for a targeted regime d. Assume that a group of n subjects is sampled at random according to a fixed distribution denoted by $P_\pi$.
The distribution is composed of the unknown distribution of each $L_j$ conditional on $(\bar{L}_{j-1}, \bar{A}_{j-1})$, together with a fixed exploration strategy for generating the actions. Denote the foregoing unknown conditional densities by $[f_0, f_1, f_2, \ldots, f_K]$, and denote the exploratory strategy by $\pi = (\pi_0, \pi_1, \pi_2, \ldots, \pi_K)$, where each $\pi_k$ represents an exploratory DTR at time $k$. The probability that treatment $a_j$ is taken given the history $\bar{a}_{j-1}$ and $\bar{l}_j$ is $\pi_j(a_j \mid \bar{a}_{j-1}, \bar{l}_j)$ for $j = 1, 2, 3, \ldots, K$ (and $\pi_0(a_0 \mid l_0)$ for $j = 0$). Therefore, the likelihood under $P_\pi$ of the trajectory $[l_0, a_0, l_1, a_1, \ldots, l_K, a_K, l_{K+1}]$ is

$$f_0(l_0)\,\pi_0(a_0 \mid l_0) \prod_{j=1}^{K} f_j(l_j \mid \bar{l}_{j-1}, \bar{a}_{j-1})\,\pi_j(a_j \mid \bar{l}_j, \bar{a}_{j-1})\; f_{K+1}(l_{K+1} \mid \bar{l}_K, \bar{a}_K).$$

Similarly, let $P_d$ denote the distribution of a trajectory where a targeted regime $d = (d_0, d_1, d_2, \ldots, d_K)$ is used to generate actions. If $d$ is a deterministic strategy, where for $0 \le j \le K$, $d_j : (\mathcal{L}_j, \mathcal{A}_{j-1}) \to \mathcal{A}_j$ is a mapping from the previous history space $(\mathcal{L}_j, \mathcal{A}_{j-1})$ to the action space $\mathcal{A}_j$, then the likelihood under $P_d$ of the trajectory $[l_0, a_0, l_1, a_1, \ldots, l_K, a_K, l_{K+1}]$ is

$$f_0(l_0)\,\mathbb{I}[a_0 = d_0(l_0)] \prod_{j=1}^{K} f_j(l_j \mid \bar{l}_{j-1}, \bar{a}_{j-1})\,\mathbb{I}[a_j = d_j(\bar{l}_j, \bar{a}_{j-1})]\; f_{K+1}(l_{K+1} \mid \bar{l}_K, \bar{a}_K).$$

In other words, the observed dataset represents a sample from $P_\pi$, and $P_d$ is the distribution containing the targeted estimand. The value function $V_f^d$ of a DTR is given by

$$V_f^d = E_d Y = \int Y \, dP_d = \int Y \, \frac{dP_d}{dP_\pi} \, dP_\pi, \tag{4.8}$$

where $\frac{dP_d}{dP_\pi}$ is a version of the Radon-Nikodym derivative and is given by the ratio of the two likelihoods mentioned above. The ratio simplifies to

$$w_{d,\pi} = \prod_{j=1}^{K} \frac{\mathbb{I}[A_j = d_j(\bar{L}_j, \bar{A}_{j-1})]}{\pi_j(A_j \mid \bar{L}_j, \bar{A}_{j-1})}. \tag{4.9}$$

It is a weight function depending on the entire data trajectory, as long as the trajectory matches the regime $d$ from the beginning until time $j$.120 If the data are collected from an RCT and $d$ is one of the investigated regimes, the only procedure needed is identifying the subjects who follow regime $d$ exactly. Therefore, the estimation of optimal DTRs in an RCT is the same as a censoring question under causal inference, as long as the targeted regime is embedded in the trial. Both stabilized and unstabilized IPCW, which estimate the probability that a subject keeps following regime $d$ from the index date until the current visit, are able to handle selection bias. However, if the data are collected from an observational dataset or are not sequentially randomized, $P_d$ is difficult to estimate. Several articles147-150 have derived and proved that the above weight is able to estimate $P_d$ from the entire observed data trajectory. The numerator is a dummy variable, which indicates that the subject was still following regime $d$ at visit $j$. The denominator is the probability that a subject received the observed treatment under $P_\pi$. Whenever the subject's observed treatment $A_j$ does not match the regime at time $j$, the numerator is equal to 0. Thus, the numerator censors subjects who did not follow regime $d$. Censoring a subject for not following the regime is defined as artificial censoring, to differentiate its mechanism from traditional censoring (disenrollment, end of study, or death). In fact, the weight effectively performs a stratified redistribution operation in which noncompliers with regime $d$ are censored the first time they do not comply and their contributions are redistributed among those who have the same covariate and treatment history and who remain compliant.
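The indicator-over-propensity weight of Equation 4.9 can be sketched for a single subject as follows. This is a simplified illustration, not the estimation code of the study: the function name and the per-visit inputs are hypothetical, and the propensities are assumed to have been estimated beforehand.

```python
def regime_weight(observed, recommended, propensities):
    """Weight of one subject's trajectory for a deterministic regime d.

    observed:     treatment actually received at each visit (A_j)
    recommended:  treatment the regime dictates at each visit (d_j)
    propensities: estimated probability of the observed treatment at each
                  visit given the subject's history (pi_j)
    """
    weight = 1.0
    for a, d, p in zip(observed, recommended, propensities):
        if a != d:
            return 0.0  # indicator is 0: subject is artificially censored
        weight /= p     # compliant visit contributes 1 / pi_j
    return weight


# A subject who follows the regime at all three visits
print(regime_weight([1, 0, 1], [1, 0, 1], [0.5, 0.8, 0.5]))
# A subject who deviates at the second visit contributes nothing
print(regime_weight([1, 1, 1], [1, 0, 1], [0.5, 0.2, 0.5]))
```

The compliant subject receives a weight of roughly 5, standing in for otherwise similar subjects who deviated, while the subject who deviates receives a weight of 0 for the full trajectory.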
This redistribution produces the right estimand because, according to the sequential randomization assumption, compliance status at a given time among those with the same past is the result of a random mechanism that is independent of the future health outcomes that the subjects would experience if they were to comply with regime $d$.120 Alternatively, the weight can also be treated as an unstabilized IPCW. The numerator is an indicator by which only subjects who kept following regime $d$ are retained. The denominator is the probability that a subject followed regime $d$ in the observed dataset until time $k'$, when the patient fails to follow the related DTR. To simplify the notation, $C_j$ here includes both traditional and artificial censoring, $C_j^{trad}$ and $C_j^{art}$:

$$w'_{d,\pi} = \prod_{j=1}^{k} \frac{1}{\pi_j\big(A_j, C_j = 0 \mid \bar{L}_j, \bar{A}_{j-1}, d_j\big)} \quad \text{s.t. } k' = \arg\min_k \{C_k = 1\}. \tag{4.10}$$

As mentioned previously, DTRs can be identified within observational databases. The two main issues a researcher must take into account are 1) defining the rules or protocols and 2) randomization at each decision point. In causal inference, the IPW estimation of MSMs is the best solution for these issues. After applying the IPW, a pseudopopulation can be created from the data at each decision point. Within it, a patient following the rules has exactly the same probability of being assigned to each treatment group. In the previous notation, the regime $d_j$ was used, which emphasized that the same regime could vary at different times. To simplify the notation and emphasize the characteristics of each rule rather than the variation of the same regime at different time points, I use $d_\theta$ to represent the rule, which could vary at different time points. The notation matches current treatments to the past observed characteristics or covariates $L_j$; for any treatment at time $j$, if it follows the rule, then $A_j = d_\theta(L_j)$.
Alternatively, this is denoted as $C_j^{art} = 0$, which indicates that the patient was not artificially censored at time $j$ given the DTR $\theta$. The variation of $\theta$ represents different rules. Since $d_\theta$ is a deterministic strategy, given the regime and whether the subject is censored at the current visit, the treatment pattern at the current visit is fixed as long as the traditional censoring has been adjusted for. Using my research question as an example, let $S_j \subseteq L_j$ and let $m_\theta$ be the threshold for the regime $d_\theta$. Define following $d_\theta$ ($\bar{C}_j^{art} = 0$) as: if $S_j > m_\theta$, then $A_j = a_{j-1}$; if $S_j \le m_\theta$, then $A_j \ne a_{j-1}$. Therefore, the weight for dynamic MSMs equals the censoring weight, which includes both traditional and artificial censoring parts.

Proof:

$$P\big(A_j = a_{j-1}, \bar{C}_j = 0 \mid L_j, \bar{A}_{j-1}, \theta\big) = P\big(A_j = a_{j-1} \mid \bar{C}_j = 0, L_j, \bar{A}_{j-1}, \theta\big) \cdot P\big(\bar{C}_j = 0 \mid L_j, \bar{A}_{j-1}, \theta\big);$$

$$P\big(A_j \ne a_{j-1}, \bar{C}_j = 0 \mid L_j, \bar{A}_{j-1}, \theta\big) = P\big(A_j \ne a_{j-1} \mid \bar{C}_j = 0, L_j, \bar{A}_{j-1}, \theta\big) \cdot P\big(\bar{C}_j = 0 \mid L_j, \bar{A}_{j-1}, \theta\big).$$

Assuming that traditional censoring has been adjusted for ($\bar{C}_j^{trad} = 0$), if $\bar{C}_j^{art} = 0$ and $S_j > m_\theta$, then $P(A_j = a_{j-1} \mid \bar{C}_j = 0, L_j, \bar{A}_{j-1}, \theta) = 1$. Similarly, if $\bar{C}_j^{art} = 0$ and $S_j \le m_\theta$, then $P(A_j \ne a_{j-1} \mid \bar{C}_j = 0, L_j, \bar{A}_{j-1}, \theta) = 1$. Therefore,

$$P\big(A_j, \bar{C}_j = 0 \mid L_j, \bar{A}_{j-1}, \theta\big) = P\big(\bar{C}_j = 0 \mid L_j, \bar{A}_{j-1}, \theta\big) = P\big(\bar{C}_j = 0 \mid L_j, \bar{C}_{j-1} = 0, \theta\big).$$

Given the change of treatment dictated by the DTRs, the previous stabilized weights for IPTW and IPCW can be adjusted as below:

$$SW^{A}_i(t) = \prod_{j=0}^{int(t)} \frac{P\big(A_j = d_\theta(L_j) \mid \bar{C}_j^{trad} = 0, \bar{A}_{j-1} = d_\theta(\bar{L}_{j-1}), V = v_i\big)}{P\big(A_j = d_\theta(L_j) \mid \bar{C}_j^{trad} = 0, \bar{A}_{j-1} = d_\theta(\bar{L}_{j-1}), \bar{L}_j = \bar{l}_{i,j}\big)} \tag{4.11}$$

$$SW^{C}_i(t) = \prod_{j=0}^{int(t)} \frac{\Pr\big(C_j^{trad} = 0 \mid \bar{C}_{j-1}^{trad} = 0, \bar{A}_{j-1} = d_\theta(\bar{L}_{j-1}), V = v_i\big)}{\Pr\big(C_j^{trad} = 0 \mid \bar{C}_{j-1}^{trad} = 0, \bar{A}_{j-1} = d_\theta(\bar{L}_{j-1}), \bar{L}_j = \bar{l}_{i,j}\big)} \tag{4.12}$$

More generally, the weights can also be represented in the following format:

$$SW^{A}_i(t) = \prod_{j=0}^{int(t)} \frac{P\big(C_j^{art} = 0 \mid \bar{C}_j^{trad} = 0, \bar{C}_{j-1}^{art} = 0, V = v_i\big)}{P\big(C_j^{art} = 0 \mid \bar{C}_j^{trad} = 0, \bar{C}_{j-1}^{art} = 0, \bar{L}_j = \bar{l}_{i,j}\big)} \tag{4.13}$$

$$SW^{C}_i(t) = \prod_{j=0}^{int(t)} \frac{\Pr\big(C_j^{trad} = 0 \mid \bar{C}_{j-1}^{trad} = 0, \bar{C}_{j-1}^{art} = 0, V = v_i\big)}{\Pr\big(C_j^{trad} = 0 \mid \bar{C}_{j-1}^{trad} = 0, \bar{C}_{j-1}^{art} = 0, \bar{L}_j = \bar{l}_{i,j}\big)} \tag{4.14}$$

In summary, the weight for dynamic MSMs can be expressed as a censoring weight. Whenever time-dependent confounders are contained in the model, it is difficult to use the standard Cox model in any software to compute the IPTW estimator $\beta_1$. The subject-specific stabilized weights vary over time, and most standard Cox model programs, even those that allow for subject-specific weights, have a hard time handling subject-specific, time-varying weights. The best approach to overcome this software problem is to fit a weighted pooled logistic regression, treating each participant as having one routine observation for every fixed time period. The model is

$$\operatorname{logit} P\big(Y_{d,j+1} = 1 \mid Y_{d,j} = 0, V, \theta\big) = \beta_0 + \beta_1 V + \beta_2 \theta, \tag{4.15}$$

where $j$ is an integer that denotes each fixed time period since the start of follow-up.

4.5 Variable Selection

The evaluation criteria for a prediction model's performance can differ according to the situation. However, two aspects are always key: 1) the accuracy of predictions about future data and 2) the difficulty of interpreting the model. Obviously, a model that has limited external validity lacks persuasiveness. At the same time, the chance of a model being inappropriately applied is high if it includes numerous parameters. Therefore, parsimony is an important virtue in the model-selection field.
In order to balance the accuracy and interpretability of a model, penalization techniques, also called shrinkage or regularization methods, have been developed to improve models. Although shrinking some of the regression coefficients toward zero may bias the estimates, these coefficient estimates have smaller variances, enhancing the accuracy of prediction by reducing the mean squared error.151 Regression coefficients are shrunk by imposing a penalty on their size; this is achieved by adding a penalty function to the ordinary least squares (OLS) criterion. Several regularization methods exist, which are classified according to the structure of the penalty function. Some of them enable variable selection, which filters unimportant parameters out of the model. Ridge regression152 estimates regression coefficients through an L2-norm penalized least-squares criterion. As a continuous shrinkage method, ridge regression achieves its better prediction performance through a bias-variance trade-off. It not only shrinks the coefficient of each variable independently but also shrinks the coefficients of correlated variables toward each other.153 However, ridge regression cannot produce a parsimonious model, because it does not shrink coefficients exactly to zero. Therefore, ridge regression is ideal if there are many predictors and all of them have enough influence on the dependent variable, or if all predictors must be kept in the model. The ridge criterion is

$$\min_{\beta} \left[ \frac{1}{N} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \right].$$

To identify a parsimonious model given many predictors, the least absolute shrinkage and selection operator (LASSO) was proposed by Tibshirani.154 Unlike ridge regression, the LASSO imposes an L1-penalty on the regression coefficients, providing both continuous shrinkage and automatic variable selection simultaneously.
The LASSO does not uniformly outperform ridge regression in prediction performance.154,155 However, as variable selection becomes increasingly important in data analysis, the LASSO is much more appealing owing to its sparse representation.156 Although the LASSO has shown its superiority in many situations, it has some limitations: 1) If the number of variables (p) is larger than the number of observations (n), the LASSO selects at most n variables before it saturates. At the same time, the LASSO is not well defined unless the bound on the L1-norm of the coefficients is smaller than a certain value. 2) If n>p and there are high correlations among variables, it has been empirically observed that ridge regression dominates the LASSO in prediction performance. 3) If the pairwise correlations are very high among a group of variables, then the LASSO tends to select only one variable from the group and does not care which one is selected.154,156 The LASSO criterion is

$$\min_{\beta} \left[ \frac{1}{N} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right].$$

Elastic net is an alternative regularization method that avoids the above limitations. It inherits the advantages of ridge regression and the LASSO by imposing both the L1-norm and L2-norm penalties on the regression coefficients through a balance factor $\alpha$. The factor ranges from 0 to 1 and balances the characteristics of ridge regression and the LASSO. The larger the factor is, the more the elastic net behaves like the LASSO. In the most extreme situation, when $\alpha$ equals 1, the L2-norm penalty vanishes and the elastic net performs the same as the LASSO. Conversely, if the factor equals 0, it performs as a ridge regression. Since the probability of having highly correlated variables is high, the elastic net method,

$$\min_{\beta} \left[ \frac{1}{N} \|y - X\beta\|_2^2 + \lambda \left[ \frac{(1 - \alpha)\|\beta\|_2^2}{2} + \alpha \|\beta\|_1 \right] \right],$$

was chosen to conduct variable selection.
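The role of the balance factor can be checked with a short numerical sketch of the penalty term alone. The function name and the coefficient values below are illustrative assumptions; the penalty follows the elastic net expression above, so at $\alpha = 0$ it equals the ridge penalty up to the factor of 1/2 that appears in that expression.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Penalty lam * [(1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1]."""
    l1_norm = sum(abs(b) for b in beta)
    l2_norm_sq = sum(b * b for b in beta)
    return lam * ((1 - alpha) / 2 * l2_norm_sq + alpha * l1_norm)


beta = [1.0, -2.0, 0.0]  # hypothetical coefficient vector
lam = 0.5

print(elastic_net_penalty(beta, lam, alpha=1.0))  # pure L1 (LASSO) penalty
print(elastic_net_penalty(beta, lam, alpha=0.0))  # pure L2 (ridge-type) penalty
print(elastic_net_penalty(beta, lam, alpha=0.5))  # equal mix of both
```

For this vector the L1 norm is 3 and the squared L2 norm is 5, so the three calls return 1.5, 1.25, and 1.375, respectively.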
To apply the elastic net successfully, the following four issues had to be resolved: 1) Should mixed effects be taken into consideration? 2) Does the choice of measurement for cross-validation affect the results and the number of variables that would be selected? If so, which measurement is associated with the optimal result? 3) Could the outcome be narrowed down? 4) What is the optimal combination of $\alpha$ and $\lambda$ for each outcome?

1) Should mixed effects be taken into consideration? In this study, only fixed effects were investigated for variable selection and for the final predictive score model. Mixed effects clearly existed: the fixed effects were represented at the patient level, and random effects were captured by multiple routine visits of the same patient. However, the decision to investigate only fixed effects was supported by two rationales. First, with the elastic net, p-values were not required for selecting variables. Moreover, to create a score that predicts the chance of having a rational treatment change, only the coefficients of the variables selected into the predictive model are needed. The difference between the fixed- and mixed-effects models is trivial in terms of the coefficients if there is no interaction between fixed-effects parameters and random-effects parameters, which is very likely the case in this study. The best example would be a patient with various values of clinical variables and treatment combinations among different visits. Second, the number of categories for random effects is very large, which increases the computational burden. Each patient in the cohort of 4,760 represents one category of random effects. Currently, only one program is able to cross-validate mixed-effects models. For each imputed dataset with a given $\alpha$, it took about 6 hours to fit the regression. However, 10 imputed datasets, 10 predetermined values of $\alpha$, and 6 outcomes would have to be investigated.
The overall computational time could be more than 3,600 hours (10 × 10 × 6 runs at about 6 hours each, or 150 days). Therefore, only fixed effects were investigated. Specifically, logistic regression with the elastic net was applied for variable selection:

$$\min_{\beta_0,\beta}\; -\left[\frac{1}{N}\sum_{i=1}^{N}\left(y_i(\beta_0 + x_i^T\beta) - \log\left(1 + e^{\beta_0 + x_i^T\beta}\right)\right)\right] + \lambda\left[\frac{(1-\alpha)}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\right].$$

The predictive score was calculated according to the coefficients of the selected variables in the logistic regression. Even though random effects were not investigated, their influence could still bias the cross-validation results. Traditional cross-validation programs partition individual observations without clustering them at the patient level. With such a program, it is very likely that visits of the same patient would be partitioned into different subsamples, which biases the cross-validation results. A new cross-validation program was coded to place all visits belonging to the same patient in the same subsample to avoid this issue.

2) Does the choice of measurement for cross-validation affect the results and the number of variables that would be selected? If so, which measurement is associated with the optimal result? The choice of measurement for cross-validation does affect the number of variables selected. According to both expert opinion and the results of an exploratory analysis, the deviance was used as the measurement for cross-validation. Cross-validation is a practical way of using computation in place of mathematical analysis to investigate how a predictive model performs on a validation set. K-fold validation is one way of conducting cross-validation. It partitions the original dataset into k subsamples, using k-1 subsamples as training data and the remaining subsample as validation data. To identify the optimal penalty factor, lambda, in the elastic net, k-fold cross-validation was conducted for each candidate value of lambda.
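The patient-level partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program written for this study: the function name, the random assignment of patients to folds, and the toy patient identifiers are all hypothetical.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, k=10, seed=0):
    """Return one fold index per visit so that all visits of a patient share a fold.

    patient_ids: list with one patient identifier per visit (row).
    """
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    # Deal the shuffled patients into k folds in round-robin order.
    fold_of_patient = {pid: i % k for i, pid in enumerate(unique_patients)}
    return [fold_of_patient[pid] for pid in patient_ids]


# Toy data: ten visits from six hypothetical patients.
visits = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6"]
folds = patient_level_folds(visits, k=3)

# Check that no patient is split across folds.
folds_per_patient = defaultdict(set)
for pid, fold in zip(visits, folds):
    folds_per_patient[pid].add(fold)
print(dict(folds_per_patient))
```

Assigning folds to patients rather than to visits keeps correlated observations from the same patient out of both the training and validation subsamples of any one split.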
After the cross-validation, the relationship between each lambda and the performance of its related model was generated. Then the lambda associated with the optimal performance was selected. However, several measurements could be applied to capture the performance of cross-validation. An exploratory analysis (Appendix C) was conducted to explore the impact of different measurements on the number of variables that would be selected, using the 10 imputed datasets. Four measurements were chosen: deviance, misclassification error, area under the ROC curve (AUC), and mean squared error (MSE). All results indicated that the number of variables selected would vary according to measurement type. To simplify the presentation, the result of only one outcome is presented in Appendix C, Figure C.1: rational treatment change under the strict definition, not including bronchodilator (BD) use as a treatment class, in imputed dataset 1, with alpha equal to 0.7.

Other than the type of measurement for cross-validation, the choice of targeted lambda could also affect the result. Among several ways of choosing the targeted lambda, two methods are common and well established: lambda.min and lambda.1se. Lambda.min is the value associated with the minimum mean cross-validated error. Lambda.1se gives the most regularized model such that the error is within one standard error of the minimum mean cross-validated error. Compared with lambda.min, lambda.1se is less likely to overfit the data. Therefore, lambda.min was initially applied to investigate the optimal alpha associated with the minimum mean cross-validated error among the 10 imputed datasets. Then lambda.1se was applied to identify the optimal lambda given the optimal alpha. Each of the four panels in Appendix C, Figure C.1, represents a cross-validation curve for a different type of measurement.
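The two selection rules can be sketched directly from their definitions. The function name and the cross-validation curve below are hypothetical values invented for illustration; they are not results from Appendix C.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation error curve.

    lambdas: candidate penalty values
    cv_mean: mean cross-validated error for each candidate
    cv_se:   standard error of that mean for each candidate
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    lambda_min = lambdas[i_min]
    # Largest (most regularized) lambda whose error is within one
    # standard error of the minimum mean cross-validated error.
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambda_min, lambda_1se


# Hypothetical cross-validation curve (illustration only).
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_mean = [0.520, 0.510, 0.512, 0.518, 0.600]
cv_se = [0.006, 0.005, 0.005, 0.006, 0.010]

lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)
```

Because lambda.1se is never smaller than lambda.min, it penalizes at least as heavily and so retains the same number of variables or fewer.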
Each panel has two dotted lines; the left and right ones show how many variables would be chosen if lambda.min and lambda.1se were applied, respectively. The left dotted line always corresponds to lambda.min because lambda.min is never larger than lambda.1se: it penalizes less and therefore retains more variables. If AUC was the measurement, the largest number of variables would be selected: 92 and 55 for lambda.min and lambda.1se, respectively. The numbers decreased to 83 and around 34 if deviance was applied, and 83 and 22 if MSE was used. The misclassification error would select even fewer variables: around 68 for lambda.min and no variable for lambda.1se. However, the trend of the misclassification error was flat regardless of the number of variables chosen, so it was not a suitable measurement for this model. Similarly, AUC had a fragmented trend when the number of selected variables decreased to specific values. The difference between deviance and MSE was trivial; however, considering expert opinion, the deviance was chosen.

3) Could the outcome be narrowed down? There were six ways of defining the outcome, rational treatment change: three assumptions, each applied with or without BD use counted as a treatment class. The three assumptions were defined according to the strictness of identifying rational treatment change. Under the loose assumption, all treatment changes were treated as rational treatment changes regardless of the changes in clinical signals. Under the neutral assumption, the termination of any treatment class had to match changes in clinical signals indicating that the patient's health had improved since the previous visit. Under the strict assumption, all rational treatment changes had to comply with the changes in clinical signals.
More specifically, under the strict assumption, a rational treatment change would occur only in the following two scenarios: 1) a patient received more treatments or more treatment classes when he had worse clinical signals compared with the previous visit; 2) a patient received fewer treatments or fewer treatment classes when he had better clinical signals compared with the previous visit. The worse clinical signals could be any of the following three: lower predicted FEV1%, more PExs, or more drug resistance. Similarly, the better clinical signals were identified in the reverse. If the treatment change did not match the related assumption, no rational treatment change was marked under that assumption.

Previously, an example was given to illustrate the definition of rational treatment change under the neutral assumption. Appendix C, Table C.1, gives an example of how the definition of rational treatment change varies under different assumptions. To simplify the example, predicted FEV1% was assumed to be the only clinical signal that would affect the decision of rational treatment change. Because of the way the data were collected, when a patient reported his treatment in a current visit, it reflected only the treatment he had received up until the visit. To identify the rational treatment change occurring in a current visit, a comparison between the treatment combinations that a patient receives in the current visit and in the subsequent visit is needed. However, the change of clinical signals in a current visit is determined by the difference in values between the previous visit and the current visit. For example, in visit 1, a patient reported that he had previously received only one mucolytic and had 52% of predicted FEV1. According to the treatment information in visit 2 (one mucolytic, one inhaled antibiotic, and two BDs), he had a treatment change in visit 1.
Compared with the clinical signal in visit 0, 75% of predicted FEV1, he had a large decrease in the clinical signal in visit 1. The hypothetical scenario of disease progression matched all assumptions at visit 1; therefore, all rational treatment changes were marked as taking place in visit 1. In visit 2, he stopped using one BD with an improved clinical signal, which still matched all assumptions of rational treatment change when taking BD use into consideration. Therefore, the three assumptions that included BD use as a treatment class had rational treatment changes in visit 2. The clinical signal kept increasing, and the patient received an additional anti-inflammatory at visit 3, which conflicted with the strict assumption. At visit 4, the patient had a slightly decreased clinical signal and terminated BD use, which conflicted with the neutral and strict assumptions. Moreover, the treatment change occurred only for BD use; therefore, other than the loose assumption that included BD use as a treatment class, the rest of the assumptions were marked as 0 (no related rational treatment change). Regardless of assumptions, the rational treatment change was always missing at the first and last visit, because neither the clinical signal before the first visit nor the treatment information after the last visit was measurable. In other words, the stricter an assumption is, the more closely clinical signals are required to match the treatment change.

The main purpose of this section is to investigate the possibility of not considering BD use as a treatment class. Although the chronic use of BDs is associated with uncertain or negative benefits according to guidelines, a comparison of mean cross-validated error using deviance as the measurement was conducted between treatment change that includes BD use and treatment change that does not include BD use under the strict assumption (compare Appendix C, Tables C.2 and C.3).
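The three assumptions defined earlier in this section can be sketched as a small decision rule. This is a simplified illustration, assuming a single signed clinical signal (as in the simplified example, where predicted FEV1% is the only signal) and with BD use counted as a treatment class; the function name and the encoding of the four visits are hypothetical.

```python
ASSUMPTIONS = ("loose", "neutral", "strict")


def rational_change(treatment_delta, signal_delta, assumption):
    """Return 1 if a treatment change counts as rational under the assumption, else 0.

    treatment_delta: > 0 treatments added, < 0 treatments stopped, 0 no change
    signal_delta:    > 0 clinical signal improved, < 0 worsened (previous vs. current visit)
    """
    if treatment_delta == 0:
        return 0  # no treatment change at all
    if assumption == "loose":
        return 1  # every treatment change counts
    if assumption == "neutral":
        # Only terminations must coincide with an improved clinical signal.
        return int(signal_delta > 0) if treatment_delta < 0 else 1
    if assumption == "strict":
        # Additions need a worse signal; terminations need a better one.
        if treatment_delta > 0:
            return int(signal_delta < 0)
        return int(signal_delta > 0)
    raise ValueError(f"unknown assumption: {assumption}")


# Visits 1-4 of the worked example, encoded as (treatment_delta, signal_delta);
# only the signs matter here, and the magnitudes are placeholders.
visits = {1: (+3, -1), 2: (-1, +1), 3: (+1, +1), 4: (-1, -1)}
flags = {
    visit: tuple(rational_change(t, s, a) for a in ASSUMPTIONS)
    for visit, (t, s) in visits.items()
}
print(flags)
```

Under this encoding, visits 1 and 2 are flagged under all three assumptions, visit 3 fails only the strict assumption, and visit 4 is flagged only under the loose assumption, which agrees with the narrative of the example.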
The comparison was conducted in all 10 imputed datasets given deciles of alpha from 0 to 1. In each cell, the number represents the minimum of the mean cross-validated error given the related alpha in the dataset. When BD use was included, the mean deviance ranged from 0.880462 to 0.851288 on average, conditional on the related alpha among the 10 imputed datasets. As alpha increased, the mean deviance decreased. The yellow cell indicates the minimum deviance in each imputed dataset. Compared with other alphas, an alpha of 1 was always associated with the minimum deviance. When BD use was excluded, the mean deviance ranged from 0.513688 to 0.506297 on average given the related alpha among the 10 imputed datasets. The trend between alphas and deviances was similar to the one including BD use. An alpha of 1 was also associated with the minimum deviance. Therefore, the two models shared several characteristics in terms of the balance between ridge regression and the LASSO. However, compared with the model excluding BD use, the deviances in the model including BD use were almost double, indicating a higher chance of inaccurately identifying the rational treatment change. Therefore, Appendix C, Tables C.2 and C.3, support the conclusion that BD use is associated with more irrational treatment change. For the above reasons, together with the guideline marking BD as a treatment with low certainty of net benefit, only three treatment classes were considered in this study: inhaled antibiotics, mucolytics, and anti-inflammatories.

4) What is the optimal combination of $\alpha$ and $\lambda$ for each outcome? According to the series of decisions in the above sections, BD use was not considered as a treatment class to define rational treatment change, deviance was applied as the measurement of cross-validated error, and only fixed effects were estimated in objective 2.
More specifically, the elastic net was applied to select variables in three steps: identifying the optimal $\alpha$, identifying the optimal $\hat{\lambda}$, and choosing the variables given the optimal $\alpha$ and $\hat{\lambda}$, using the following model:

$$\min_{\beta}\left[\frac{1}{N}\|y - X\beta\|_2^2 + \lambda\left[\frac{(1-\alpha)\|\beta\|_2^2}{2} + \alpha\|\beta\|_1\right]\right].$$

First, the optimal $\alpha$ was identified, which was associated with the minimum mean cross-validated error using deviance as the measurement in each imputed dataset $i$ ($i$ = 1, 2, …, 10). The $\lambda_i^*$ represents the $\lambda$ associated with the minimum mean cross-validated error, $\bar{\varepsilon}_{cv,i}$, in the imputed dataset $i$. For each imputed dataset, a 10-fold cross-validation was conducted. Therefore, $\hat{\beta}_{\lambda_i^*}$ is a vector of $\beta$ given $\lambda_i^*$. The $\alpha$ was determined by the median of $\alpha_i$ among the imputed datasets,

$$\arg\min_{\alpha_i} \bar{\varepsilon}_{cv,i} = \arg\min_{\alpha_i}\, (y - X\hat{\beta}_{\lambda_i^*})_{cv,i}.$$

To prevent overfitting, $\lambda'_i$, which gives the most regularized model such that the error is within one standard error of the minimum mean cross-validated error given $\alpha$, was identified in each imputed dataset $i$. Similarly, $\hat{\lambda}$ was determined by the median of $\lambda'_i$ among the imputed datasets. Therefore, the optimal $\alpha$ and $\hat{\lambda}$ were generated, which balanced minimizing the mean cross-validated error against overfitting the data.

Given the $\alpha$ and $\hat{\lambda}$, the $\hat{\beta}_i$ was calculated for each imputed dataset $i$. The $\hat{\beta}_i$ is a vector that includes the coefficients of all independent variables to predict rational treatment change. Variables were selected in the predictive model as long as the related element in $\hat{\beta}_i$ was not equal to 0 in any imputed dataset $i$. The set of all variables that were selected by the elastic net is denoted by $S$. The $X_{si}$ is a vector of individuals' variables that are included in $S$, which were measured at all visits in the imputed dataset $i$. The rational treatment change among all visits in each imputed dataset $i$ is marked as $Y_{tc,i}$.
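The way the per-dataset results are combined can be sketched as follows. All numbers and variable names are hypothetical, only three imputed datasets are shown for brevity, and the selection rule is read here as "nonzero in at least one imputed dataset", which is an assumption about the wording above rather than a confirmed detail.

```python
from statistics import median

# Hypothetical per-dataset results (illustration only).
alpha_by_dataset = [1.0, 0.9, 1.0]            # alpha_i with minimum CV deviance
lambda_1se_by_dataset = [0.020, 0.025, 0.018]  # lambda'_i given the chosen alpha

# Hypothetical elastic net coefficients refit with the pooled alpha and lambda.
coef_by_dataset = [
    {"fev1_pct": -0.40, "pex_count": 0.30, "age": 0.00, "cfrd": 0.0},
    {"fev1_pct": -0.38, "pex_count": 0.28, "age": 0.05, "cfrd": 0.0},
    {"fev1_pct": -0.41, "pex_count": 0.31, "age": 0.00, "cfrd": 0.0},
]

alpha_hat = median(alpha_by_dataset)        # pooled balance factor
lambda_hat = median(lambda_1se_by_dataset)  # pooled penalty factor

# Assumed rule: keep a variable if its coefficient is nonzero in any dataset.
selected = sorted(
    {name for coefs in coef_by_dataset for name, b in coefs.items() if b != 0.0}
)
print(alpha_hat, lambda_hat, selected)
```

Taking the median across imputed datasets gives a single tuning pair that is not driven by one unusual imputation.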
A generalized linear model with a log link function was applied to predict the probability of having a rational treatment change in each imputed dataset $i$. Following Rubin's rule,157 the $\hat{\beta}_{si}$ were combined as $\hat{\beta}_s$.

$$\log \Pr(Y_{tc,i} = 1 \mid X_{si}) = \beta_{si} X_{si} + \xi_i$$

$$\hat{p}_{tc,i} = \log \Pr(Y_{tc,i} = 1 \mid X_{si}) = \hat{\beta}_s X_{si}$$

To closely mimic the strategy of rational treatment change, the predicted probability of rational treatment change, $\hat{p}_{tc,i}$, and the relative change of the predicted probability of rational treatment change between the current and previous visits, $rc_{tc,i}$, were calculated for all visits in each imputed dataset $i$. The points $p^*$ and $p^{**}$ closest to the upper-left corner of the ROC curve were chosen as the cutoffs for $\hat{p}_{tc,i}$ and $rc_{tc,i}$, respectively, in all imputed datasets. The confidence intervals of $p^*$ and $p^{**}$ were estimated using the nonparametric bootstrapping method. The quintiles of the 95% CI of $p^{**}$ were used to generate cutoffs of $p^{**}$, represented by $p_n^{**}$ (n = 1, 2, …, 5). To have a larger range of $p_m^*$ (m = 1, 2, …, 5), the distance between the lower boundary of the 95% CI of $p^*$ and $p^*$ was applied to calculate $p_m^*$, and $p^*$ was set as $p_3^*$. Therefore, from $p_1^*$ to $p_5^*$, the value increases; $p_2^*$ and $p_4^*$ represent the lower and upper boundaries of the 95% CI of $p^*$.

4.6 Statistical Analyses

To simplify the description, demographic characteristics, clinical variables, comorbidities, and treatment/pathogen-related variables each denote the features of a group of variables (Table 4.2). The features, especially for clinical variables and treatment/pathogen-related variables, include not only the value itself but also the time since the index date or another clinically meaningful point, such as the occurrence of PEx or drug resistance.
For instance, clinical variables denote predicted FEV1%, relative change of predicted FEV1% compared with the optimal value in the last year, and number of PEx in the year before the current visit. Comorbidities include CFRD, pancreatic insufficiency, gastrointestinal symptoms, asthma, liver disease, etc. Treatment/pathogen-related variables indicate the previous treatment combinations/patterns, the number and type of treatment changes in the previous year, the time and result of culture tests for airway infections not caused by P. aeruginosa, and drug resistance.

Aim 1. To describe treatment patterns and changes in the original cohort
a) Described the characteristics of the cohort
   I. Investigated the FEV1% trajectory caused by different reasons during hospitalization for patients in each calendar year using the original database.
   II. Summarized patients' baseline demographic characteristics and clinical variables in the cohort and in subgroups categorized by mutation classes and initial treatments, respectively.
   III. Summarized the prevalence and incidence of death by different reasons in each calendar year.
b) Described initial treatment, probability of transitioning to specific treatment combinations, and length of having specific treatment combinations.
   I. Identified treatment change by comparing each patient's treatment classes in the current outpatient visit with the previous one.
   II. Identified treatment length for each patient and each treatment combination by using the gap between when a patient changes to a specific treatment combination and the time the next treatment change occurs.
   III. Described the treatment combination at the baseline.
   IV. Investigated the relationship between the 1st treatment change and the potential treatment combinations that a patient could switch to by summarizing the length on current treatment and the probability of transitioning to potential treatment combinations.
   V. Investigated the relationship between all treatment changes and the potential treatment combinations that a patient could switch to by summarizing the length on current treatment and the probability of transitioning to potential treatment combinations.

Aim 2. To create a lung treatment score that indicates suboptimal lung health management by when rational treatment changes occur.
a) Predicted the probability of rational lung health maintenance treatment change given demographic characteristics and clinical outcome variables, such as predicted FEV1% of the current visit, change of predicted FEV1%, additional occurrences of PEx in the last year, and additional indicators of drug resistance.
   I. Independent variable identification:
      a. Independent variables were identified from all the variables that existed in the cystic fibrosis related literature.
      b. All the unique variables that existed in the CFFPR were taken into consideration.
      c. If a variable, other than a pathogen/treatment-related variable, was missing for more than 50% of the patients, then this variable was not included.
      d. If a pathogen/treatment-related variable existed more than once in a particular patient's record, and that patient had a related treatment after being diagnosed with that pathogen, as long as the frequency of having this variable was consistent with the frequency at which the majority of the patients had this variable, then the variable was included even if it was missing from that patient's record more than half of the time.
      e. A cubic spline for time was included to fit the model to the data.
   II. Variable selection by elastic net:
      a. Identified the optimal balance factor, $\alpha$, by investigating the probability that a specific $\alpha$ was chosen among the 10 imputed datasets according to the minimum of mean cross-validated error.
      b. Identified the optimal penalty factor, $\lambda$, by investigating the minimum standard deviation of lambda given $\alpha$ among the 10 imputed datasets and the probability that $\alpha$ had been chosen in step a.
      c. Selected variables in the model by investigating the proportion of times a variable had been selected among the 10 imputed datasets, given the $\alpha$ and $\lambda$ chosen in the previous steps.
      d. Calculated the coefficient for each variable by combining the related coefficients that were identified among the 10 imputed datasets.
b) Identified timing strategies for treatment change according to different thresholds of predicted probability of having rational treatment change.

Aim 3. To investigate the comparative effectiveness of different treatment strategies as part of rational treatment changes to delay the acquisition of mucoid PaPI.
a) Created an augmented dataset, in which each patient had 25 replicates.
b) Artificially censored a patient if the patient was not following the related strategy, $d_\theta$ ($\theta$ = 1, 2, …, 25).
c) Constructed the final stabilized weight (SW) for all visits in each replicate, respectively.
   I. Calculated the stabilized treatment weight for all visits in the same replicate.

$$SWT_i(t) = \prod_{j=0}^{\mathrm{int}(t)} \frac{P(A_j = d_\theta(L_j) \mid \bar{C}_j = 0, \bar{A}_{j-1} = d_\theta(L_{j-1}), V = v_i)}{P(A_j = d_\theta(L_j) \mid \bar{C}_j = 0, \bar{A}_{j-1} = d_\theta(L_{j-1}), \bar{L}_j = \bar{l}_{i,j})}$$

   II. Calculated the stabilized censoring weight by each visit.

$$SWC_i(t) = \prod_{j=0}^{\mathrm{int}(t)} \frac{P(C_j = 0 \mid \bar{C}_{j-1} = 0, \bar{A}_{j-1} = d_\theta(L_{j-1}), V = v_i)}{P(C_j = 0 \mid \bar{C}_{j-1} = 0, \bar{A}_{j-1} = d_\theta(L_{j-1}), \bar{L}_j = \bar{l}_{i,j})}$$

   III. Created the final stabilized weight.

$$SW_i(t) = SWT_i(t)\, SWC_i(t)$$

d) Built the regression model and Kaplan-Meier curve.
   I. Nonparametric Kaplan-Meier curves.
   II. Fixed parameterization of the dynamic logistic MSMs with the constant-time hazards,

$$\operatorname{logit} P(Y_{d,j+1} = 1 \mid Y_{d,j} = 0, V, \theta) = \beta_0 + \beta_1 V + \beta_2 \theta.$$

   III. Flexible parameterization of the dynamic logistic MSMs with the discrete-time hazards,

$$\operatorname{logit} P(Y_{d,j+1} = 1 \mid Y_{d,j} = 0, V, \theta, t) = \beta_0 + \beta_1 V + \beta_2 \theta + \beta_3 \theta t.$$

To simplify the figure, I have only included ΔFEV1% in Figure 4.1. It represents the combination or the matrix of all the clinical time-varying covariates, such as ΔFEV1%, FEV1%, and PEx, which are also the core variables used to predict the lung treatment score. Since the identification and classification of all those clinical time-varying covariates are highly associated with lung function, the causal pathway would be the same as in this figure. Therefore, in Aim 3, when I mention ΔFEV1%, it represents the matrix of all time-varying covariates that affect decision-making on lung health maintenance treatment change, and alternatively the lung treatment score.

The majority of clinical variables and treatment-related variables are time-varying covariates, which could act as time-dependent confounders and intermediate variables within different causal pathways. If the relationships within each group are ignored, then both demographic characteristics and comorbidity variables are associated with time-varying clinical covariates. As mentioned previously, even though time-varying covariates are the main issue in this model, they also represent the beauty of this model. Since the majority of demographic and comorbidity variables influence the exposure and outcome indirectly through predicted FEV1% or other clinical time-varying covariates, as shown in Figure 4.1, after adjusting for the time-varying FEV1%, I only need to adjust for infections caused by other pathogens and any treatments related to those infections to generate an unbiased estimate. This reduces the chance of inappropriate adjustment and at the same time increases the probability of an unbiased estimate. Age, gender, race, ethnicity, and height are variables that affect the predicted normal FEV1, which indirectly impacts the ΔFEV1%.
A study indicates that pancreatic insufficiency also affects the FEV1 value.158 The genotype of CFTR not only affects the severity of lung function deterioration, but also impacts the time to mucoid P. aeruginosa colonization159 for CF patients. Similar to the prediction model in Aim 2, variables were categorized into two types, baseline variables and time-varying variables, to calculate the numerator and denominator for IPTW and IPCW. For each visit, time-varying variables were represented by three values at different time points: the current visit, the last visit, and the visit with the optimal value in the previous year.

To focus on the causal effect of varied strategies for chronic lung health maintenance treatment on delay in acquisition of mucoid PaPI, scenarios where lung function temporarily fluctuates steeply were not considered. The majority of those scenarios occurred during PEx-caused hospitalizations. Therefore, a pseudo encounter visit was generated to represent the recovered lung function, and an indicator was created to show that a PEx was cured right before this visit. In clinical practice, a PEx is defined as cured when the predicted FEV1% recovers to 90% of the optimal predicted FEV1% in the previous year. If the patient is not able to reach that goal within 2 weeks, healthcare providers usually stop the treatment for that specific PEx to avoid drug resistance to the related i.v. antibiotic. During PEx-caused hospitalizations, only the records on the last date were considered as candidates for the pseudo encounter, unless there was a record indicating that all variables were measured right after the PEx was cured. In that case, all of the values in that visit, from demographic characteristics to clinical variables, were used regardless of whether any record existed after it and before the last date of hospitalization.
If neither a record of the last date of hospitalization nor a record indicating that all variables were measured right after the PEx was cured was available, then the last record during that hospitalization was used as the pseudo encounter. However, time-varying variables, such as FEV1, height, and weight, were marked as missing regardless of the value measured at that visit. For all the chronic treatments, which should not vary over the short term, if an individual treatment was prescribed on any 1 day, it was assumed that the treatment was applied throughout the hospitalization. A 6-month follow-up after the PEx was also assessed to indicate whether the PEx was cured. If disease status was stable (no more than a 10% reduction in predicted FEV1%, and no moderate or severe PEx in any outpatient visit), then the mean value of all the predicted FEV1% measurements during that 6-month period was used to represent the predicted FEV1% when the patient was out of hospital.

Whenever a patient had a positive culture test for a new pathogen, or first showed resistance to a specific treatment, that visit was identified as the time when the new infection or drug resistance occurred. Antifungals and clarithromycin could be applied as both short-term and long-term treatments. Those two treatments were considered when they were used chronically to treat Aspergillus species and MAI, respectively. To appropriately estimate the treatment effects of chronic lung health maintenance medication for PaPI, these two medications were adjusted for as confounders.

4.7 Data Reformatting

In clinical practice, CF patients should have a routine visit at least every quarter, where all the clinical variables, such as FEV1 and FVC, are measured. The data in the CFFPR show that this practice is standard, with every patient having, on average, a routine visit every 3 months.
At the same time, I also conducted an exploratory analysis using an independent cohort to investigate the relationship between the frequency of encounter visits and lung function deterioration using a generalized linear model. The relative change of mean FEV1% between the first and the last year was applied as the dependent variable. Independent variables included mean number of visits in the cohort, age, length of follow-up since index date, gender, race, ethnicity, height, CFRD status, and number of treatments in each of the treatment classes at the first and last year, respectively. Even though the number of visits did affect lung function deterioration, the impact was trivial compared to the effect of the treatments that a patient received at the first and last year in the cohort (Appendix D, Tables D.1 and D.2). More specifically, the longer the follow-up a patient had in this cohort, the less impact the number of visits had on lung function deterioration (Appendix D, Table D.2). Considering that all patients had at least 2 years of follow-up in the original cohort, I believe that reformatting the data to have a quarterly visit for each patient was reasonable and should not bias the identification of the optimal treatment regime.

Supported by both data and experience in clinical practice, it is reasonable to restructure the database so that each patient has a fixed number of outpatient visits per calendar year. Following real-world clinical practice, I restructured the data quarterly. The index date was identified as t=0, which is the later of the date of diagnosis with nonmucoid PaPI and the first encounter date after Dec 31, 2005. It also acts as the first core date for each individual. Then I set up the remaining core dates, one for each time interval, each 90 (91.3125) days after the core date of the previous interval. Each time interval started 60 (61.3125) days before the core date and ended 30 days after the core date (Figure 4.2).
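The quarterly windows just described can be sketched as follows, using the exact spacing of 91.3125 days. This is a minimal illustration: the function names are hypothetical, days are counted from the index date, and treating each window as open on the left and closed on the right is an assumption made so that adjacent windows do not overlap.

```python
import math

QUARTER = 91.3125      # days between consecutive core dates
DAYS_BEFORE = 61.3125  # window opens this many days before the core date
DAYS_AFTER = 30.0      # window closes this many days after the core date


def interval_bounds(j):
    """Return (start, core, end) of interval j, in days since the index date."""
    core = j * QUARTER
    return core - DAYS_BEFORE, core, core + DAYS_AFTER


def interval_of(day):
    """Return the interval index j such that start_j < day <= end_j (assumed convention)."""
    return math.ceil((day - DAYS_AFTER) / QUARTER)


print(interval_bounds(1))   # window around the first core date after the index date
print(interval_bounds(2))
print(interval_of(100), interval_of(200))
```

Because the time before and after each core date sums to 91.3125 days, the windows tile the follow-up period without gaps, so every encounter date falls into exactly one interval.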
If a patient had more than one visit in a quarter, the visit that happened in the interval before the core date and closest to the core date was chosen, for example E3 rather than E2 or E4 during the T1 interval. In the figure, Ei denotes the i th encounter visit since index date, and Tj represents the j th quarter interval (j=1, 2, 3, …, 24). If there was no visit in the 61.3125 days before the core date, then the closest encounter visit occurring after the core date, at most 30 days away, was chosen, for example E5 rather than E6 during T2. If no encounter visit existed in the time interval, then a missing observation was used as the encounter visit of this time interval, and further imputation was conducted for this missing observation.

4.8 Missing Data

Missing data are common in all types of databases in the healthcare field, from survey and EMR to claims databases. Generally speaking, there are three steps for analysis with missing data: identifying potential reasons for the data to be missing, investigating the mechanism of missing data, and applying the optimal method to impute missing values. In this study, there were four main reasons for missing data. The first was attrition due to natural processes: these include patient death, loss to follow-up, progression to mucoid PaPI, and not following the specific treatment change regime. Data collection issues during outpatient visits could also result in missing data: for example, failing to measure, or inappropriately measuring, the demographic characteristics, clinical variables, or treatment-related variables during the encounter visit. Reformatting the data also caused missing information for patients, particularly those who visited infrequently or whose visits were unevenly distributed across the time intervals. Finally, given that the information in the CFFPR was all collected through a patient questionnaire, it is possible that a patient could have skipped or refused to answer some questions intentionally.
However, the positive results from the external audit and the exploratory analysis, which investigated the quality of data in the CFFPR, made this final issue unlikely to be a significant problem.

The mechanisms of missing data are categorized into three groups: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). MCAR means that the probability of having missing data on a variable is totally independent of any other variable and of the value of the variable itself. MAR is a considerably weaker assumption: the probability of having missing data on a variable is conditional only on other variables, not on the value of the variable itself. MNAR is much harder to handle, and even though many models might be applicable, nothing in the data indicates which one of those models is correct.160 Considering that this study is an observational study that only hypothetically emulates a randomized clinical trial (RCT), the majority of the missing data should be MAR, varying with other variables, rather than MCAR. Moreover, the probability of having MNAR should be low, since the data in the patient registry were collected directly from patients and are used for improving disease monitoring and future treatment. At the same time, some of the missing variables could have linear trends over time at the patient level, such as height and other demographic variables that do not fluctuate over time. In order to confirm the mechanism of missing data, t-tests, correlation, regression, and ANOVA tests were conducted.161

To yield the least biased estimate, three strategies are applicable: deletion techniques, single imputation techniques, and model-based techniques. Listwise deletion and pairwise deletion are typical deletion techniques: the former calculates correlation matrices after excluding all cases that have at least one missing value, and the latter calculates correlation matrices for each pair of variables using the cases that have valid data for that pair.
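The difference between the two deletion techniques can be illustrated with a small sketch. The records below are made-up toy values, not registry data, and the helper names `listwise_complete` and `pairwise_complete` are hypothetical; missing values are represented as `None`.

```python
def listwise_complete(rows):
    """Listwise deletion: keep only rows with no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_complete(rows, var_a, var_b):
    """Pairwise deletion: keep rows in which both variables of the pair are observed."""
    return [r for r in rows if r[var_a] is not None and r[var_b] is not None]


# Toy records (hypothetical values for illustration only).
rows = [
    {"height": 120.0, "weight": 23.0, "fev1": 95.0},
    {"height": 125.0, "weight": None, "fev1": 90.0},
    {"height": None, "weight": 30.0, "fev1": 85.0},
    {"height": 140.0, "weight": 35.0, "fev1": None},
]

n_listwise = len(listwise_complete(rows))                    # 1 usable row
n_height_fev1 = len(pairwise_complete(rows, "height", "fev1"))  # 2 usable rows
```

In this toy example, listwise deletion leaves a single row for every correlation, whereas pairwise deletion retains two rows for the height and FEV1 pair, which shows why the two approaches can produce correlation matrices based on different samples.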
The single imputation technique, which creates only a single dataset with the imputed missing values, includes methods such as last observation carried forward, arithmetic mean or median imputation, and single regression. Model-based techniques mainly cover multiple imputation and the maximum likelihood method.

At first glance, the single imputation technique is advantageous compared to the other techniques, since it makes use of data that the deletion technique would otherwise discard and it is much more straightforward than the model-based techniques. However, the single imputation technique has potentially serious drawbacks: because it treats the imputed values as real data, it produces biased parameter estimates and attenuated standard errors. The model-based techniques, especially multiple imputation, can handle those issues by appropriately adjusting the standard errors for missing data.162 In specific situations, the maximum likelihood method is superior to the multiple imputation method.160 This is because using the same parameters in both the imputation model and the analysis/outcome model prevents arbitrary decisions when choosing parameters for the imputation model.

In my imputation, I included both single imputation and model-based techniques. There are some drawbacks to using the simple imputation method, including the fact that it overfits the data, which leads to less generalizability than the original data would have. However, for the majority of the demographic characteristics, which are mostly complete with a nearly linear trend, simple imputation is the optimal technique. The arithmetic mean was calculated for time-varying demographic variables, such as height and weight, using the relative change of those variables among all visits that occurred within 1 year before and 1 year after the current visit, which may contain missing information.
For those visits that were completely missing because of data reformatting, comorbidities, treatment-related variables, and fixed demographic characteristics, such as race and ethnicity, were captured using the last observation carried forward method. Inverse probability of censoring weighting (IPCW) was applied to measure the influence of censoring from death, loss to follow-up, progression to the outcome, and treatment that conflicted with the hypothetical treatment change strategy. The model-based technique was applied for lung function imputation, considering its fluctuation, the potential for it to be influenced by other variables, and the importance of this variable to the result in Aim 3. When information about PEx was missing, it was assumed that PEx did not occur. If an acute pulmonary exacerbation had really occurred, a patient participating in the CFFPR would very likely have visited a CF-accredited hospital to get appropriate treatment, and the information from that visit would have been recorded. In conclusion, with this imputation strategy tailored to different variables, the imputed values should be reasonable, with appropriate generalizability.

4.9 Assumptions

In order to conduct this research, several assumptions have to be made in advance. They include the assumptions for the study design and the assumptions for the method, specifically causal inference. The assumptions for the study design fall into two categories: major and minor assumptions. Any assumption that requires internal or external tests is a major assumption. Without making those prudent assumptions, such as using patients' self-reported treatment in the CFFPR as the physicians' prescribing behavior, the result would be biased or even have limited credibility. Other assumptions are minor assumptions. These assumptions have limited influence on the main hypothesis, but ignoring them could also bias the result.
Generally speaking, all the assumptions that were made during the data cleaning procedures are minor assumptions, such as assuming that all Pa culture tests without clear phenotype results belong to nonmucoid Pa, as long as they occurred between positive culture tests of nonmucoid and mucoid Pa; assuming that all patients marked as other races have the same predicted normal lung function as Caucasians; and assuming that all treatments prescribed during a PEx-caused hospitalization would only temporarily affect lung function within that time interval. In this section, I will mainly focus on the major assumptions made in the study design section. Minor assumptions will be discussed in the section on data cleaning. Among the causal inference assumptions (consistency, conditional exchangeability, and positivity), positivity is the only one that is testable, and it was investigated through the results of Aim 1; the assumptions of the causal inference methodology will therefore be mentioned only in the discussion.

In my study, there are four major assumptions. First, I assume that the patients' self-reported treatment information can be applied as a proxy for physicians' prescribing patterns. Second, after a new drug is approved, there should be some related irrational treatment changes. Third, it is appropriate to reformat routine outpatient visits into quarterly intervals. Last, patients who had their FEV1 measured at the index date and patients who had their FEV1 measured within 6 months after the index date share similar baseline characteristics, disease progression, and prescription patterns.

4.9.1 Assumption 1

Since all the treatment information in the CFFPR is self-reported, it may not precisely reflect the prescription and treatment that a patient has received. Moreover, the way a healthcare provider asked for or collected information may vary by calendar year or with changes in the questionnaire used.
Furthermore, all treatments on which this study focuses are chronic treatments, which are supposed to be used continuously after initiation. However, in the CFFPR, after treatment initiation, there are many "not on treatment" or missing responses. Considering that treatment change is a major component of this analysis, two preliminary exploratory analyses were conducted to test the quality of the data and to investigate whether the self-reported treatment could be used as a proxy for the prescribing pattern or even the refill pattern. I tested the trends in treatment consistency by calendar year (Appendix A.1) and, independently, investigated the discordance between self-reported treatment information in the CFFPR and refill information in the MORE2 claims database (Appendix A.2).

I investigated the trends in treatment consistency in the following manner. Inconsistency was defined as a proportion based on the number of visits during which patients did not report using a treatment after the treatment was initiated. In order to compare the result with the other exploratory study, I included only those 9,958 patients who were recorded in both the CFFPR and the claims database. Moreover, I focused only on certain lung health maintenance treatments: dornase alfa, ivacaftor, inhaled tobramycin, tobramycin powder inhaler, and inhaled aztreonam. These treatments were chosen because none of them have OTC alternatives, which simplifies the identification of medications from the claims database using the National Drug Code (NDC).
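One way to read the inconsistency measure is sketched below. The text does not state the denominator, so this sketch assumes it is the number of visits after the first visit at which the treatment was reported, and it assumes that missing responses count as inconsistent alongside "not on treatment" responses. The function name `inconsistency_proportion` and the encoding of responses (`True`, `False`, `None`) are hypothetical.

```python
def inconsistency_proportion(responses):
    """Share of post-initiation visits at which the treatment was not reported.

    responses: chronological list for one patient and one treatment, where
    True = reported on treatment, False = "not on treatment", None = missing.
    Assumptions: initiation is the first True response; the denominator is
    the number of visits after initiation; None counts as inconsistent.
    Returns None when the treatment was never initiated or no visit follows.
    """
    if True not in responses:
        return None
    first_on = responses.index(True)
    later = responses[first_on + 1:]
    if not later:
        return None
    not_reported = sum(1 for r in later if r is not True)
    return not_reported / len(later)
```

For example, the sequence `[False, True, True, None, False, True]` has four visits after initiation, two of which lack an "on treatment" report, giving a proportion of 0.5 under these assumptions.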
For each treatment, several results were reported: self-reported records in each calendar year (Appendix A, Table A.1), the proportion of patient inconsistency in each calendar year (Appendix A, Table A.2), the proportion of patient inconsistency in each calendar year for patients who had at least two visits that year (Appendix A, Table A.3), and the proportion of patient inconsistency in each calendar year for patients who had at least two visits in any calendar year (Appendix A, Table A.4). The rationale for adding the last two tables is to better estimate the inconsistency of self-reported treatment among target patients who have at least two visits per calendar year and are older than 6 years of age.

The consistency test results supported the hypothesis. The more recent a calendar year was, the more consistent the data were, as long as the targeted treatment had been initiated. The quality of self-reported treatment significantly improved from 2004 to 2006 and held steady until 2012. Results in Table A.1.1, especially the results for dornase alfa and inhaled tobramycin, support this conclusion. Both drugs had about a 20% decrease in inconsistency for patients who claimed they were not on the treatment. Compared with dornase alfa and inhaled tobramycin, the other medications had later approval dates as well as higher chances of failing to be collected; therefore, the number of patients who took them was not used to support the conclusion. Tables A.1.2, A.1.3, and A.1.4 also support the conclusion but from different perspectives. The stricter the inclusion criteria, the better the results. Tables A.1.2 through A.1.4 show that the absolute change in the proportion of inconsistency significantly increased between 2006 and 2011. For example, for inhaled tobramycin, I compared inconsistency before and after 2006. As a result, Table A.1.2 shows an approximate 20% absolute change. The number goes up to around 35% and 60% in Tables A.1.3 and A.1.4, respectively.
Although the proportion of inconsistency for inhaled tobramycin was about double that for dornase alfa, the result makes sense because of the less frequent intake of inhaled tobramycin (every other month). The less frequently a patient took a medication, the more missing or "not on treatment" responses the CFFPR had. Moreover, the distribution of the inconsistency proportion also supported my conclusion. Although the lower quartiles were filled with positive numbers before 2006, they contained no values other than zero after 2006. Similar trends also occurred in other quartiles. Therefore, from the consistency perspective, self-reported treatment in the CFFPR can be applied as a proxy for the prescribing pattern in the EMR as long as the self-reporting occurred after 2006. Additionally, the frequency of taking a treatment also affects patient-reported outcomes. For example, a patient taking inhaled tobramycin every other month may not report being on treatment during a break month.

In the discordance test, multiple reasons lead to discordance between the CFFPR and the claims database for each individual treatment during each calendar year. One reason is that the claims database does not have claims records for the entire United States. Additionally, insurance sometimes provides limited reimbursement for specific treatments. Therefore, a patient may acquire the treatment through an alternative pathway, such as a patient assistance plan, which bypasses insurance. In this situation, it is no wonder that the claims database cannot capture all claims information for a patient. Without identifying the reason for discordance, it would be arbitrary to use self-reported treatment as a proxy for refill information to represent adherence patterns.
However, given the limited proportion of discordance when a patient does not report a treatment but has refill information in the claims database, together with the result from the previous analysis, it is warranted to use self-reported treatment as a proxy for prescribing behavior.

As mentioned previously, in the discordance test, I focused only on mucolytics, inhaled antibiotics, and CFTR modulators; these categories include dornase alfa, tobramycin, tobramycin powder inhaler, aztreonam, colistin, and ivacaftor. Since I had only partial data for lumacaftor that were collected before the last date of 2014, I did not take this treatment into consideration. At the same time, because I focused only on investigating the discordance between the self-reporting and the refill information in the claims database, only those 9,958 patients who existed in both the claims database and the CFFPR were considered. Claims statuses were classified into five categories. Several adjustments according to claims status were made, and all claims with unknown statuses were also kept. Overall, 77,264 claims met my inclusion criteria. About 80% of the refills were for dornase alfa; the specific percentage for each drug is presented in Appendix A, Table A.5. Appendix A, Table A.6, presents the overall number of claims in each calendar year. From 2008 to 2013, each year accounted for more than 10% of all refills between 2000 and 2015. The rest of the years had limited refills. Appendix A, Table A.7, describes the trend in the number of refills per patient per calendar year. The trend was straightforward: as time passed, more patients had more annual visits that were covered by insurance. Back in 2000, very few patients had more than 10 claims in a year. The number jumped to about 17 in 2006. The largest overall number of visits in a year came in 2013, with an increase of 29 from 2012.
One of the main issues is eliminating the influence of multiple visits during the same hospitalization. As in previous steps, after excluding the multiple-encounter records, I saved either the last encounter date or the date when clinical variables were measured as the visit date during hospitalization. All demographic characteristics and clinical variables were collected at that date. Comorbidities and treatment variables were captured on that date as long as they were reported once during the hospitalization. After applying the above procedures, 179,078 records were left, which represented 3,736 unique patients who, according to the CFFPR, had refilled one of the targeted medications.

In order to investigate the discordance between self-reported treatment in the CFFPR and refill information in the claims database, those two datasets were linked. For each treatment, if the encounter date fell within the range from a refill date to right before the patient finished the treatment, then a treatment possession variable was generated. Otherwise, a missing value was assigned, which means that according to the claims database, the patient did not receive any treatment on that encounter date. In reality, there could be a gap between the date when a patient finished treatment and the date of the next refill. At the same time, patients seldom have perfect adherence. I therefore added grace periods to estimate the potential dates when patients would run out of treatment. The sum of the refill date, the supply days, and the grace period represents the date by which a patient is assumed to have finished treatment. To better mimic reality, different grace period lengths were assigned: 0, 30, 60, and 90 days, as well as a length equal to the supply days of the current refill. The hypotheses varied according to the lengths of the grace periods. For example, a grace period of 0 means that no gap between the date of finishing treatment and the next refill is tolerated.
It also assumes that patients have perfect adherence. Therefore, any date that is not in the range from the refill date to the refill date plus the supply days means the patient is "not on treatment." If the grace period equals the supply days, this means the patient has about 50% adherence for this treatment. Only a date that is not in the range from the refill date to the refill date plus double the supply days is defined as "not on treatment" according to the claims database.

Other than linking those two datasets using the encounter data in the CFFPR and refills in the claims database, I also generated several variables indicating whether a patient possessed a treatment at a visit given the different lengths of grace periods. Moreover, I excluded an encounter visit for a patient if the encounter date did not fall within the range from the earliest to the latest date of the patient's claims. After this procedure, 71,019 encounters were left. I did not report the proportion of agreement when patients had negative responses in both the CFFPR and the claims database. The proportion of agreement on negative responses would be 100% minus the proportion of disagreement and the proportion of agreement on positive responses.

Appendix A, Table A.8, shows that many visits were reported as on treatment in the CFFPR but that the overall number of positive responses was halved in the claims database. This could be explained by the shorter time intervals and fewer records in the claims database compared to the CFFPR; a patient may have switched insurance a couple of times between 2000 and 2015, which may not be captured by the claims database. Moreover, both the CFFPR and the claims database have limited records for inhaled aztreonam, TOBI® Podhaler, and ivacaftor. This is probably caused by one of the following reasons: the small patient population that qualified for these treatments, the short time period since drug approval, or the tremendous cost of the treatments.
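The grace-period rule described earlier in this passage (refill date plus supply days plus grace period) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the linkage code used in the study: the function names `covered` and `possession_flags` are hypothetical, the coverage range is taken to include the refill date and exclude the end date, and a visit that is not covered is flagged `False` here rather than assigned a missing value.

```python
from datetime import date, timedelta


def covered(encounter, refill_date, supply_days, grace_days):
    """True if the encounter falls in [refill_date, refill_date + supply + grace)."""
    end = refill_date + timedelta(days=supply_days + grace_days)
    return refill_date <= encounter < end


def possession_flags(encounter, refills, fixed_graces=(0, 30, 60, 90)):
    """Return one possession flag per grace-period setting for a single encounter.

    refills: list of (refill_date, supply_days) tuples for one treatment.
    The key "supply" uses a grace period equal to each refill's own supply days.
    """
    flags = {}
    for grace in fixed_graces:
        flags[grace] = any(covered(encounter, d, s, grace) for d, s in refills)
    flags["supply"] = any(covered(encounter, d, s, s) for d, s in refills)
    return flags


# Hypothetical example: one 28-day refill on 1 January 2010, visit 40 days later.
example = possession_flags(date(2010, 2, 10), [(date(2010, 1, 1), 28)])
```

In this hypothetical example, the visit is not covered with a grace period of 0 (coverage ends after 28 days) but is covered with every longer grace period, which mirrors how longer grace periods link more claims to encounter records.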
Generally speaking, about 75% of all claims records were consistent with the encounter data regardless of the length of the grace periods. The longer a grace period, the more likely a claim matched the encounter records, and the proportion of agreement increased. With the increasing number of visits at which a patient claimed a treatment or refilled a prescription, the proportion of discordance increased. For treatments such as TOBI® Podhaler and ivacaftor, which had few patients who either were on the treatment or refilled the prescription, the proportion of agreement was about 99%. However, those results could not support our conclusion, since the majority of agreements were contributed by negative responses in both the CFFPR and the claims database.

In order to keep missing and "not on treatment" responses from influencing the agreement proportion, I measured the discordance. Specifically, I focused on scenarios in which a patient reported that he/she was not on treatment, while the claims database indicated that he/she had refilled the prescription and the supply days of the treatment were sufficient to last until the encounter visit. The proportion of this specific discordance was very low among all encounter visits for each individual treatment (Appendix A, Table A.8, yellow section). Therefore, I draw the conclusion that patients seldom have recall bias and rarely report, contrary to reality, that they are not on a treatment. To better investigate the discordance between self-reported treatment in the CFFPR and the refill records in the claims database, I conducted several analyses for each individual treatment, measuring different outcomes.
Those outcomes include the number of claims that matched the encounter data when the patient reported being on the treatment in the CFFPR (Appendix A, Table A.9), the proportion of discordance by calendar year (Appendix A, Table A.10), and the proportion of discordance by individual patient and calendar year (Appendix A, Table A.11). Since each treatment has four tables and the trend was similar, I used only aztreonam to illustrate the results. Generally speaking, the results of this section support the previous conclusions. With increasing grace periods, the proportion of discordance first decreased and then increased after hitting bottom. After the discordance reached its lowest proportion at a specific grace period, any additional grace period could only link extra claims data with encounter data, although the majority of the time the patient was not really on treatment. The minimum proportion of discordance could be achieved with different grace periods in each individual calendar year. For example, the minimum discordance was achieved with a 30-day grace period in 2010, while the same minimum was achieved with a 90-day grace period from 2011 to 2014.

There is only one issue we need to be aware of. For any medication approved between 2003 and 2013, the discordance was close to 0 at 1 or 2 years before the approval and kept increasing until 1 or 2 years after approval. After that, discordance was fixed if the information was collected appropriately. This phenomenon is presented in Appendix A, Tables A.10 and A.11. Before 2010, when aztreonam received FDA approval, the discordance rate was low. Starting in 2010, the proportion of discordance went up, and it mostly held steady beginning in 2012. This phenomenon discloses the rationale behind the artificially low value of discordance: no patient could get the treatment a couple of years before the drug was approved, since the medication did not exist on the market.
The discordance should therefore be 0, which means perfect agreement. However, during that time period, some patients may have accessed the drug through RCTs. In that situation, utilization was captured by encounter data in the CFFPR but not in the claims database, since the NDC was not available and there was no related reimbursement. So, the proportion of discordance started to increase. Since only a limited number of patients were in the trials, the discordance rate should still have been low; the majority of patients could not get treatment, which is reflected in both the CFFPR and the claims database. After the drug was approved, many patients could access the medication. The discordance rate therefore increased dramatically, since some patients were reimbursed through patient support groups or other insurance companies. Those patients' information was captured not by the claims database but by the CFFPR. After 1 or 2 years, the patient population on this treatment was fixed, and the discordance rate would therefore stay the same from then on.

In order to eliminate the influence of the above issues, I measured the proportion of discordance resulting from patients who reported in the CFFPR that they were "not on treatment" but who had refill information in the claims database during relative time intervals. The relative time interval is defined as the range from the refill date to the sum of the refill date, the days supplied, and the grace period, which also covers the encounter date. The results indicate that the proportion of discordance, specifically for this situation, is relatively low, representing less than 10% of overall discordance (Appendix A, Table A.12).

As can be seen, the discordance between treatment reported in the CFFPR and in the claims database varied by treatment and calendar year. It would be arbitrary to use self-reported treatment as a proxy for refill information in representing an adherence pattern.
Given the limited proportion of discordance when a patient does not report being on treatment but has refill information, together with the results from the first analysis, self-reporting could be applied as a proxy for prescribing behavior.

4.9.2 Assumption 2

After the efficacy of a new drug is demonstrated and published, or the drug is approved, there is an immediate increasing trend in prescribing it. As mentioned previously, prescribing behaviors are complicated and could be determined by both internal and external factors. However, the belief that "newer is better" may still be able to affect physicians' prescribing behaviors both internally and externally, especially for a disease like cystic fibrosis, for which a healthcare provider's arsenal contains limited tools to treat the patient. But if the patient's disease is not severe enough to qualify for the treatment, or the current treatment works better than the new treatment, the patient may switch back to the previous treatment after a short period of using the new one. If the belief affects prescribing behavior in the above manner, many treatment changes should occur after a new treatment has its efficacy demonstrated and published or receives approval. A majority of these changes are potentially irrational treatment changes given the patients' disease severity. I refer to those important dates as "the composite date" or "the approval date", which represents either when a new treatment's efficacy is demonstrated and published or when a new treatment is approved, and I use the two terms interchangeably to represent those two scenarios. Therefore, in my dissertation, the drug approval date indicates the date when evidence was generated, either through published articles or through drug approval.
In order to investigate the association between a new treatment's approval date and related irrational treatment changes, another descriptive exploratory analysis was conducted to investigate the difference in treatment changes within 1, 2, or 3 years before and after the drug approval date. As mentioned previously in the study design and population section, different methods were applied to adjust for the influence of the approval date of the treatment on irrational treatment changes, according to the degree of the influence. In order to achieve the above goal, the following methods were applied independently or together: narrowing the time interval to avoid the inclusion of the approval date for all treatments; rigorously identifying rational treatment changes by excluding irrational treatment changes, specifically "stop prescribing one/multiple treatments" when it conflicts with clinical variables; and including the drug approval date in the predictive and regression models.

The results showed that the time of drug approval, or of the publication demonstrating the efficacy of a new treatment, did affect physicians' prescribing behaviors, without untangling other confounders. Table B.2 in Appendix B supports this conclusion. Using tobramycin as an example, the number of related treatment changes was almost fixed, at about 0.5 times per year, regardless of the length of time before the approval date. However, it increased considerably after the approval date. The influence lasted around 1 year, varied by treatment, and had more impact if no alternative treatment existed. However, there was no significant result that indicates to what extent the date affects irrational treatment change. Even so, the impact of drug approval on irrational treatment change should only minimally bias the results of the three aims. The exploratory analysis is explained in the following paragraphs.

The whole CFFPR cohort was used to conduct this exploratory analysis.
Among 1,217,848 records in the cohort, 124,447 reflect PEx-caused hospitalization. Since the multiple visits during a hospitalization were irrelevant to this exploratory analysis, they were excluded, and only the last date of the hospitalization or the last measured date during the care episode, whichever occurred later in recorded outpatient visits, was kept as the date of the outpatient visit when the patient's lung function was cured after the hospitalization. After applying the above procedures, only 58,421 of these records were left. Overall, 1,151,822 records remained in the cohort after including only the last date of each hospitalization. If lung function was measured on the last date of a care episode and this occurred earlier than the last date of the hospitalization, then that value was treated as lung function after PEx had been cured. If there was no measurement at the last date of a care episode, then the last observed lung function value at the last date of hospitalization was used. If neither of the above two scenarios was met, the mean lung function value during the next 6 months, when the disease was stabilized, was applied. Otherwise, a missing value was given; imputations were conducted to handle these missing values. A stable situation is defined by composite signals that include 1) not having a PEx-caused hospitalization, 2) a relative decrease of no more than 10% in the predicted FEV1% for patients with moderately or severely impaired lung function, and 3) not having a moderate or severe exacerbation during an outpatient visit. Only 2,160 records had unique patient and encounter date combinations, which represented 1,770 patients who had stabilized situations within 6 months after the PEx and who also had their lung function measured during that time. The lung function records for these patients were considered to reflect cured lung function after PEx-caused hospitalization.
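The fallback order for the post-PEx lung function value, together with the composite definition of a stable situation, can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the caller is assumed to pass the care-episode value only when it was measured earlier than the last date of the hospitalization, and the 10% criterion is assumed to apply only to patients with moderately or severely impaired lung function.

```python
from statistics import mean


def post_pex_fev1(care_episode_end_value=None,
                  hospitalization_end_value=None,
                  stable_followup_values=()):
    """Choose the lung function value treated as 'after PEx', in fallback order.

    1) value measured on the last date of the care episode, if available;
    2) otherwise the last observed value at the last date of hospitalization;
    3) otherwise the mean of values from stable visits in the next 6 months;
    4) otherwise None (a missing value, left for later imputation).
    """
    if care_episode_end_value is not None:
        return care_episode_end_value
    if hospitalization_end_value is not None:
        return hospitalization_end_value
    if stable_followup_values:
        return mean(stable_followup_values)
    return None


def is_stable(pex_hospitalization, relative_fev1_decrease,
              impaired_lung_function, moderate_or_severe_exacerbation):
    """Composite stability signal; relative_fev1_decrease is a fraction (0.10 = 10%)."""
    if pex_hospitalization or moderate_or_severe_exacerbation:
        return False
    if impaired_lung_function and relative_fev1_decrease > 0.10:
        return False
    return True
```

Only the follow-up values from visits that pass `is_stable` would be supplied as `stable_followup_values` under this reading.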
For missing FEV1 and height values, the last observed value was carried forward. Some FEV1 values were still missing after the adjustment, since they occurred before the first records reflecting lung function measurement. I therefore excluded all FEV1 values that were missing at the beginning and were not caused by PEx. With this adjustment, only 889,081 records were left. All races recorded as "other" were treated as Caucasian. Additionally, following ATS guidelines, the predicted FEV1 was adjusted for Asians, since they have 88% of the lung function of Caucasians when other variables are constant. A 1-year assumption (1.25 years) and a 6-month assumption were independently applied to the latest encounter that occurred before 01/01/2006 and the oldest encounter that occurred after 01/01/2006 to handle the misidentified and missing prescription issue.

This study focuses on investigating treatment effects at the treatment class level; therefore, only treatment changes at the class level have been captured. Treatments were categorized into four categories: airway clearance, inhaled antibiotics, anti-inflammatories, and bronchodilators. Both dornase alfa and hypertonic saline were included in the airway clearance group. Three inhaled antibiotics (tobramycin, aztreonam, and colistin) were considered. Two medications, high-concentrate ibuprofen and azithromycin, belong to the anti-inflammatory group. Beta agonists and anticholinergics fit in the bronchodilator group.

For each medication, the approval date or the composite date was determined by three components: 1) the date when the prospective RCT for the medication had demonstrated efficacy and was published, 2) the approval date of the medication in the United States, and 3) the earliest date when the medication was reported in the database. If a treatment had both a published date and an approval date, then the earlier one was used.
If a treatment did not have an approval date or it was difficult to identify the approval date, then the date when the medication was initially reported in the CFFPR, plus a 3-month grace period, was applied (Appendix B, Table B.1). To better investigate the potential influence that a treatment approval could have on related treatment changes, all treatment changes that occurred within 1, 2, or 3 years before and after the date were compared. For all patients who had records both before and after the drug approval date, the following values were measured for each patient before and after that date: the number of encounter visits per year, the number of visits with treatment changes between treatment classes, the number of visits with treatment changes between treatment classes that included the targeted treatment, and the mean length of time from the last visit until a change related to the targeted treatment. I captured the treatment changes at the treatment class level, and all the detailed information is listed in Appendix B, Table B.2, which reflects the following trends. First, the results indicate that the right year was chosen, since the number of treatment changes related to the targeted treatment rarely increased regardless of the length of time before the drug was approved. For example, no matter whether I chose 1, 2, or 3 years before tobramycin was approved, the number of treatment changes related to tobramycin varied only from 0.47 to 0.5 times per patient. Moreover, patients tended to have more visits as time passed. As the oldest approved treatment, dornase alfa had the fewest visits per patient-year (around 4.9), while azithromycin and hypertonic saline, as the latest medications, had many more visits per patient-year (around 6.2) after they were approved. Last, the date (the approval date, the published date, or the date when the first patient was reported on a specific treatment) did affect treatment change.
This was supported by the result that the number of treatment changes related to each targeted treatment reached a peak in the first year, decreased thereafter, and was almost constant from the second year after the drug was approved. Consider tobramycin as an example: there were about 0.37, 0.28, and 0.32 treatment changes in the first, second, and third years, respectively, after it was newly approved. Note that when I reported the number of changes and the number of changes related to the targeted treatment, I reported only by patient, not by calendar year. Therefore, the real numbers for the second and third calendar years would be close to the difference between 2 adjacent years based on the analysis I conducted. However, because the sample size varied each year, this result is just an approximation. Moreover, given the limited differences between years and the general decreasing trend in the number of treatment changes related to the targeted treatment, it would be arbitrary to conclude that the drug approval date (in fact, the composite date) is definitely associated with irrational treatment change. Even though the increase in the number of treatment changes related to the targeted treatment in the first year after medication approval is much larger for the three remaining medications, the results are clouded for the following reasons. First, no drug had been officially approved to treat CF specifically before dornase alfa. When it received approval, many patients understandably switched to this medication. Moreover, the dates identified for azithromycin and hypertonic saline were artificial, which may influence the number of changes before the related date and amplify the influence of drug approval on treatment change.
At the same time, the quality of the data improved greatly after 2006, so the dates for azithromycin and hypertonic saline, which were defined around the beginning of 2006, may influence the result. Finally, because some patients have extremely infrequent routine visits, it is hard to differentiate rational from irrational treatment changes using the mean length of time from the previous visit to the visit in which there was a targeted treatment-related change. To summarize, physicians' prescribing behavior is affected by the date when a drug is approved or when a publication demonstrates the efficacy of a new treatment. This influence on prescribing behavior could last about 1 year; it varies by treatment and could have more impact if no alternative treatment exists. No significant results show to what extent the date could affect irrational treatment change. Fortunately, inhaled aztreonam was the only medication that received approval between 2006 and 2011, and it was always prescribed after initiation of inhaled tobramycin. Therefore, any additional irrational treatment changes caused by the approval of inhaled aztreonam would bias the results only minimally.
4.9.3 Assumption 3
For my dissertation, it is appropriate to reformat patient routine visits quarterly. According to clinical practice, CF patients should have a routine visit at least every quarter, at which all the clinical variables, such as FEV1 and FVC, are measured. The data in the CFFPR support the observation that, on average, patients had a routine visit every 3 months. At the same time, an exploratory analysis (Appendix D), which investigated the relationship between the frequency of encounter visits and lung function deterioration, also supports this assumption.
To simplify the analysis, the focus was on investigating the influence of the number of visits per year on the annual proportion of lung function deterioration, conditional on each patient's demographic characteristics, comorbidities, and treatment-related variables in the first and the last year in which the patient was present in the CFFPR. The transitions of demographic characteristics, comorbidities, and treatment/pathogen-related variables were not considered. The CFFPR was used to investigate both assumptions 2 and 3, and the data-cleaning procedures were similar between these two assumptions. However, unlike assumption 2, which took the full time frame into consideration, assumption 3 included existing records only from 2006 to 2011. After excluding multiple visits that occurred during a PEx-caused hospitalization and keeping only the last date of hospitalization or the last measured date during the care episode, the number of observations dropped from 31,130 to 14,646. If lung function was measured on the last date of a care episode that occurred earlier than the last date of hospitalization, then it was used as the lung function after the PEx was cured. If there was no measurement on the last date of the care episode, then the last observed lung function value on the last date of hospitalization was used. If neither of these two scenarios was met, or the FEV1 value was still missing, then the mean lung function value during the next 6 months, when the disease was stabilized, was applied. Otherwise, the value was set to missing, to be handled by later imputation. Finally, 324,815 observations remained in the database after including only the last date of hospitalization. At the same time, I also eliminated multiple encounter records from the same hospitalization when it was caused by reasons other than pulmonary exacerbation. The procedure was similar to that for excluding the multiple visits during a PEx-caused hospitalization.
However, the mean lung function in the 6 months after the hospitalization was not calculated to represent the recovered lung function, even if the patient's disease was stable. Finally, lung function was adjusted for 170, 532, and 1,562 records, using the related measurement from the last date of the care episode, the last encounter date, and a missing record, respectively. To better evaluate the assumption, records that inappropriately captured treatment information had to be adjusted. For example, any inhaled antibiotic could be taken continuously, every other month, or at another frequency. In the CFFPR, some patients were reported as not being on inhaled antibiotics when in reality they were on treatment, but the visit happened to fall in the gap or break month. In order to better capture treatment changes, those reports of "not on treatment" were revised to "on the related treatment". Previously, there were 60,432; 5,295; and 4,381 records on inhaled tobramycin, inhaled colistin, and inhaled aztreonam, respectively; after the adjustment, the numbers went up to 62,697; 5,589; and 4,733, respectively, an average increase of 5%. All in all, the number of visits was related to the proportion of lung function deterioration (the more visits a patient had on average, the greater the reduction in lung function the patient may suffer), but the contribution was quite limited. Considering only patients who were present in the CFFPR for at least 1 year, for each additional outpatient visit that a patient had in a calendar year, the predicted FEV1% relatively decreased by an additional 0.1% per year (Appendix D, Table D.1). For patients who were present in the CFFPR for more than 2 years, the impact decreased to 0.08% per year (Appendix D, Table D.2).
Because the inclusion criteria for the core aims in this study required that the patient be present in the CFFPR for at least 2 years, the relation between the frequency of visits and lung function decline should be trivial. Treatments had far more impact on the change of lung function than the number of visits. In the first year in which a patient was enrolled in the CFFPR, the less treatment the patient received, the less lung function deterioration the patient may suffer during the following years. This is especially the case for mucolytics and anti-inflammatories. In the model restricted to patients with more than 2 years of follow-up, compared with patients who were on two anti-inflammatories, patients who did not receive any anti-inflammatory in the first year had about 2.6% less lung function deterioration in the future. In the last year, the effects of treatments were reversed from those in the first year. However, the majority of treatment effects were not statistically significant and had much smaller impacts compared with the related treatment effects in the first year (Appendix D, Table D.2). Therefore, even though there is a relationship between the number of visits and lung function deterioration, it is reasonable to reformat the database so that each patient has a fixed number of encounter visits per calendar year, given the trivial contribution to lung function deterioration. Together with the experience of clinical practice and the result that patients have on average about 4.7 visits per calendar year in the CFFPR, the decision to reformat encounter visit records as occurring quarterly, in order to standardize the capture of lung function change between two routine visits, is reasonable.
4.9.4 Assumption 4
The accuracy of prediction decreased dramatically for patients with consecutively missing FEV1s, especially if the consecutive missing values occurred after the index date.
To explore the influence of different methods of defining the index date on baseline variables and outcomes, the following study was conducted. The results in Appendix E, Tables E.1 and E.2 and Tables E.3 and E.4, were summarized using the index date and the date when FEV1 was initially measured after the index date, respectively. The majority of results were consistent between the two sets of tables, but there were some discrepancies. Generally speaking, the later FEV1 was first measured, the worse the patient's clinical status was and the more likely he/she was on treatments. Compared with using the predetermined index date directly, if the date when FEV1 was first measured (Table E.2) was applied as the new index date, there was more treatment utilization and a shorter time to event (mucoid PaPI, disenrollment, or death) in the cohort. Therefore, the following four methods were proposed to handle missing FEV1s at the index date: 1) excluding all patients whose FEV1 was not measured at the current index date; 2) excluding all patients whose gap between the index date and the date when FEV1 was first measured exceeded a specific grace period (for example, 1 year), and using the first measured date as the index date for the remaining patients; 3) using the first measured date after 2006 as the index date for the whole population; and 4) excluding all patients whose gap between the index date and the date when FEV1 was first measured exceeded a specific grace period (for example, 1 year), and using the current index date and imputing missing FEV1 values for the remaining patients. The following results supported the second method with a 6-month grace period as the optimal way of identifying the new index date for Aims 2 and 3. The first encounter date after 01/01/2006 for patients older than six was identified as the index date for Aim 1. The decision seemed appropriate until the missing values of lung function were imputed.
Rather than imputing the missing FEV1 values directly, multiple imputation was applied to the change in FEV1 between the current and the future visit. The imputation strategy worked well for missing values that occurred in isolation. As long as either the previous or the future FEV1 was available, the missing FEV1 at the current visit could be derived from it together with the imputed change in FEV1. However, as the number of consecutive missing values increased, the accuracy decreased dramatically. This is because each calculation had an error that accumulated with the number of calculations needed to impute FEV1. Without appropriate control, those accumulated errors would jeopardize the imputation. Fortunately, if the consecutive missing values occurred between two existing FEV1s, the error could still be adjusted by calculating those missing values from two directions: forward from the earliest measured FEV1 or backward from the latest measured FEV1. However, the accuracy of imputation would decrease tremendously if FEV1 was measured at only one end, especially at the latest visit. In fact, 769 patients did not have FEV1 measured at the index date using the current method. Therefore, the influence of using different methods to define the index date on baseline variables was investigated, to examine whether patients who had FEV1 measured later were significantly different from patients who had it measured earlier (at the index date). The results are listed in two sets of tables. Appendix E, Tables E.1 and E.2, present the baseline information obtained by using the current method to identify the index date for patients older than 6 after 01/01/2006. In Appendix E, Tables E.3 and E.4, however, the baseline information is reported on the date when FEV1 was measured rather than on the index date. In the following section, the "index date" specifically indicates the index date that was defined by the first method.
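As a hedged illustration of why errors accumulate and how a run of missing values between two measured FEV1s can be computed from two directions, the following Python sketch fills gaps from imputed visit-to-visit changes. It is not the procedure used in this dissertation: the function name is hypothetical, the changes are assumed to have been imputed already, and the rule for combining the forward and backward estimates (weighting each by its distance from the other anchor) is an assumption, since the text does not state how the two directions were reconciled.

```python
from typing import List, Optional


def fill_fev1(fev1: List[Optional[float]], delta: List[float]) -> List[Optional[float]]:
    """Fill missing FEV1 values from imputed changes between visits.

    fev1[i] is the measured value at visit i, or None if missing.
    delta[i] is the (already imputed) change fev1[i + 1] - fev1[i].
    """
    n = len(fev1)
    if len(delta) != n - 1:
        raise ValueError("delta must have exactly len(fev1) - 1 entries")

    # Forward pass: walk from the earlier measured value.
    fwd = list(fev1)
    fwd_steps = [0 if v is not None else None for v in fev1]
    for i in range(1, n):
        if fev1[i] is None and fwd[i - 1] is not None:
            fwd[i] = fwd[i - 1] + delta[i - 1]
            fwd_steps[i] = fwd_steps[i - 1] + 1

    # Backward pass: walk from the later measured value.
    bwd = list(fev1)
    bwd_steps = [0 if v is not None else None for v in fev1]
    for i in range(n - 2, -1, -1):
        if fev1[i] is None and bwd[i + 1] is not None:
            bwd[i] = bwd[i + 1] - delta[i]
            bwd_steps[i] = bwd_steps[i + 1] + 1

    filled: List[Optional[float]] = []
    for i in range(n):
        if fev1[i] is not None:
            filled.append(fev1[i])
        elif fwd[i] is not None and bwd[i] is not None:
            # Assumed combination rule: the estimate that needed fewer
            # steps (fewer accumulated errors) receives more weight.
            w_fwd = bwd_steps[i] / (fwd_steps[i] + bwd_steps[i])
            filled.append(w_fwd * fwd[i] + (1 - w_fwd) * bwd[i])
        elif fwd[i] is not None:
            filled.append(fwd[i])  # anchored at one end only
        else:
            filled.append(bwd[i])  # may still be None if nothing is measured
    return filled
```

When FEV1 is measured at only one end, the sketch can only chain the imputed changes in a single direction, which is the situation in which the accumulated error cannot be corrected.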
To better investigate the results of using different methods to identify the index date, the cohort was categorized into four groups according to the gap between the index date and the date when FEV1 was first measured after the index date. Group 1 included patients whose FEV1 was measured at the index date. The gaps in groups 2, 3, and 4 were 0-6 months, 6-12 months, and more than a year in length, respectively. In Table E.2, patients were classified into the same groups as in Table E.1, even though there was no gap in Table E.2. Cells were highlighted in yellow whenever the difference was statistically significant on either the chi-square test or the ANOVA test. If a number differed between Tables E.1 and E.2, that result was marked in red. In Appendix E, Tables E.1 and E.2, generally speaking, the age distributions are different, and younger patients are prone to have later FEV1 measurements. The trend is also reflected in the height and weight section; the later FEV1 was first measured, the shorter and lighter a patient was (Appendix E, Table E.1), as long as the patient was younger than 14 years old. Hispanics tended to have later FEV1 measurements. Compared with patients who had their FEV1 measured at the index date, patients who had later measurements were more likely to have GERD, to have had PEx in the previous year, and to take mucolytics and inhaled antibiotics. Patients who had later measurements were also more likely to have drug resistance, but considering the limited number of events, that result is not stable. The proportion of patients who would develop mucoid PaPI or who would disenroll was not statistically different among groups. But patients who did not have FEV1 measured for more than 1 year were more likely to die. Finally, time to disenrollment was significantly different among those groups. In Appendix E, Tables E.3 and E.4, with the delay in choosing the index date, the values of height and weight are larger than in Tables E.1 and E.2.
Even so, Tables E.3 and E.4 show results and trends similar to those in Tables E.1 and E.2. However, some variables had significantly different results; for example, the later a patient had FEV1 measured, the worse his lung function was, the more likely he was to have had a lung transplant or to be on the waiting list, the more likely he was to be on anti-inflammatories and bronchodilators, and the more quickly he would develop mucoid PaPI. The differences between the results in Tables E.1 and E.2 and those in Tables E.3 and E.4 were probably caused by the delay in choosing the index date. But three major issues should not be ignored. First, lung function was statistically significantly different among the groups represented in Tables E.3 and E.4. Differences in height and weight may contribute to the difference in lung function, but they do not account for all of it. Therefore, patients in different groups may be fundamentally different even if their lung functions had counterfactually been measured on the index date in Tables E.1 and E.2. Moreover, with the delay in choosing the index date, patients were more likely to be on treatments, which directly affected the exposure of Aim 3, treatment change. Finally, the delay in choosing the index date affected the time frame of developing the outcome (mucoid PaPI, death, disenrollment); this change has already been shown in the comparison between Tables E.1 and E.2 and Tables E.3 and E.4. Moreover, the time interval between the last FEV1 measurement and the index date was also investigated, as illustrated in Appendix E, Table E.5. Among the 796 patients who did not have FEV1 measured on the index date, 396 had FEV1 measured before the index date. Furthermore, 320 patients had their FEV1s measured within 6 months before the index date. Therefore, a prudent way of identifying the index date was needed, one that would balance reliability and accuracy in handling those missing FEV1s.
As mentioned previously, four methods were proposed. Each one had a unique rationale. For the first method, the assumption was that the physician should make a decision according to clinical variables, especially lung function. Prescribing decisions were assumed to differ from the targeted prescribing decisions if a patient's FEV1 was not measured at the index date. Rather than excluding all patients whose FEV1 was not measured at the index date, the second method allowed a grace period, for example 1 year, before FEV1 was measured. The second method assumed that the failure to measure FEV1 was caused by physicians' belief that the accuracy of measurement is low for younger patients. It also assumed that the prescribing decision after FEV1 is measured is equivalent regardless of the gap between the index date and the date when FEV1 was first measured. However, if the gap was longer than 1 year, it assumed that prescribing decisions were different for the remaining population. Unlike the first or second method, the third method simply defined the index date as the date when FEV1 was first measured, ignoring the effect of the gap. The third method assumed that prescribing decisions were exactly the same as long as FEV1 was measured. The fourth method applied a statistical approach regardless of the different rationales. The second method was applied to define the index date, for several reasons. First, with this method, the chance of selection bias was much lower than with the first method, which directly excluded about one-sixth of all patients. At the same time, this method took the gap into consideration, preventing information bias: since some patients had late FEV1 measurements, the time to outcome would be shortened if the first measured date was applied as the index date without regard to the gap. Finally, compared with the fourth method, the second method emphasized rationale rather than relying on the power of statistics.
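For illustration, the second method can be sketched in Python as a simple rule on the gap between the original index date and the first FEV1 measurement. The function name is hypothetical, and approximating the 6-month grace period reported for this method as 183 days is an assumption made only for this sketch; the dissertation does not state how months were converted to days.

```python
from datetime import date
from typing import Optional

GRACE_DAYS = 183  # assumption: a 6-month grace period approximated in days


def index_date_method2(
    index_date: date,
    first_fev1_date: Optional[date],
    grace_days: int = GRACE_DAYS,
) -> Optional[date]:
    """Return the new index date under the second method, or None if excluded.

    first_fev1_date is the first date on or after index_date at which
    FEV1 was measured (None if it was never measured).
    """
    if first_fev1_date is None:
        return None  # no measurement at all: patient is excluded
    gap = (first_fev1_date - index_date).days
    if gap < 0:
        raise ValueError("first_fev1_date must not precede index_date")
    if gap > grace_days:
        return None  # gap exceeds the grace period: patient is excluded
    return first_fev1_date  # first measured date becomes the index date
```

Setting grace_days to 0 reproduces the first method (only patients measured at the index date are kept), whereas a very large value approaches the third method, in which the gap is ignored.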
The decision to accept a gap of at most 1 year was based on the large change in time to death and time to mucoid PaPI shown in Appendix E, Tables E.3 and E.4, compared with Tables E.1 and E.2. Other, shorter grace periods may also be applicable. However, even if the grace period were reduced to 6 months, the probability of receiving treatments at baseline was still significantly different between the same groups in Tables E.1 and E.2 and Tables E.3 and E.4. Even so, the 6-month grace period for the second method was more reasonable, since it balanced the consistency of the population's baseline characteristics with generalizability. More importantly, by applying the 6-month grace period for the second method, at least 586 of the 796 patients in the cohort would be kept.
Table 4.1. Definition of treatment change
Table 4.2. Variables that will be analyzed in the study
Demographic: Age; Gender; Race; Ethnicity; Height; Weight; Weight for age Z score; Smoking status; Second hand smoke status; Pregnancy; CFTR genotype
Clinical variable: FEV1% (ΔFEV1%); # of PEx in previous year; Lung transplant status
Comorbidities: CFRD; Pancreatic insufficiency; Gastrointestinal symptoms; Asthma; Liver disease
Treatment/pathogen related variables: Previous tx patterns/combinations; Tx change and time of change in the last 1 year; Time and result of culture test for airway infection (other pathogens); Drug resistance
Table 4.3 Reformatting the demographic characteristics.
Variable Name eDWID Encounterdate DOB Deathdate Numeric Numeric Numeric Numeric Meaning Unique patient ID (encrypted) Encounter date Date of birth Date of death Age Agecat Numeric Numeric Age in category Sex Dichotomous FALSE "" Numeric Numeric Categorical 1 Height Weight Race Type Code 2 3 4 5 6 Male Female Height in cm Weight in kg White race Black or African American American Indian or Alaska Native Asian Native Hawaiian or Other Pacific Islander Others Reformat code Meaning Dataset Demographic data Encounter data Demographic data Demographic data (encounterdateDOB)/365.25 1 2 3 4 5 1 0 Age in years 6~8 yrs 9~11 yrs 12~14 yrs 15~17yrs >=18 yrs male female 1 Caucasian 2 Black 3 3 Asian Asian 3 4 Asian Others Encounter data Demographic data Encounter data Encounter data Demographic data 134 Table 4.3 (continued). Variable Name Hispanic Type Categorical Smoking** Categorical 4 5 U W 0 1 N U W Meaning Hispanic Non hispanic No Occasionally Yes, Regularly, less than 1 ppd Yes, Regularly, 1 ppd or more Declined to answer Not Known Not Applicable No Yes No Not Known Not Applicable 1 2 3 4 5 U N Live birth Still birth Spontaneous abortion Therpeutic abortion Undelivered Unknown Not Applicable 1 2 1 2 3 Pregnant Categorical Pregnancy_outco me Categorical Code 1 0 0 1 Reformat code Yes No No Yes Meaning 1 Yes 1 2 2 2 0 1 0 2 2 Yes Unknown Unknown Unknown No Yes No Unknown Unknown 1 2 3 4 5 6 0 Live birth Still birth Spontaneous abortion Therpeutic abortion Undelivered Unknown Not pregnant Dataset Demographic data Annual data Annual data Annual data 135 Table 4.3 (continued). Variable Name Type Transplant status Categorical 1 2 3 4 5 Mutation 1 Categorical 0 1 2 3 4 5 . Mutation 2 Categorical 0 1 2 3 4 5 . 
Code Meaning Not pertinent Accepted, on waiting list Evaluated, final decision pending Evaluated, rejected Had transplantation mutation doesn't belong to any class Class I Class II Class III Class IV Class V missing mutation doesn't belong to any class Class I Class II Class III Class IV Class V missing 0 2 0 0 1 6 1 2 3 4 5 9 6 1 2 3 4 5 9 Reformat code Meaning No Will have transplantation Dataset Annual data No No Had transplantation mutation doesn't belong to any class Class I Class II Class III Class IV Class V missing mutation doesn't belong to any class Class I Class II Class III Class IV Class V missing 136 Table 4.3 (continued). Variable Name Mutclass Type Code Categorical Meaning Reformat code 0 1 F508 Categorical 1 2 3 . 2 1 2 3 4 Meaning mutation class I-III group (both) mutation class IV/V group (any) genotyped but not identified in mutation class I-III or IV-V group or misisng (any) homozygots heterozygots none missing Dataset 137 homozygots heterozygots none missing Respiratory/cardiorespir Death Categorical 2 atory Liver disease/liver 3 failure 4 Trauma 5 Suicide Transplant related: 6 Bronchiolitis obliterans Transplant related: Other 7 8 Other 9 Unknown * For patients who have multiple races, the one with the lowest reference lung function will be applied. Generally speaking, Asian has worse lung function than Black and Caucasian ** Second smoke has same coding system Table 4.4 Reformatting the treatment information. 
Variable Name Type Tobi Dichotomous Tobifreq Categorical Code 1 2 3 Amino_other Aminofreq Colistin Colistinfreq Categorical 1 2 3 Dichotomous Categorical 1 2 Dichotomous Aztreonamfreq Categorical Azith Clarith Dornasealfa Dichotomous Dichotomous Dichotomous 2 3 4 Meaning Dataset Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data 138 Aztreonam 3 Meaning Reformat code On inhaled tobramycin 300 mg BID alternate month schedule 300 mg BID continuous Other regimen (different dose or freq) Other inhaled aminoglycoside (e.g. gentamicin, amikacin) Alternate Month Continuous Other regimen (different dose or freq) On colistin Alternate Month Continuous Other regimen (different dose or freq) On inhaled aztreonam 75 mg TID Alternate Month Schedule 75 mg TID Continuous Other regimen On azithromycin On clarithromycin On dornase alfa Table 4.4 (continued). Variable Name Type Dornasefreq Categorical High_ibuprofen Hypersaline Hyperconc 1 2 3 Combobroncho Corticosteroids1 Dichotomous Dichotomous Corticosteroids2 Dichotomous Hyperfreq Reformat code Meaning 2.5 mg QD 2.5 mg BID Other regimen (different dose or freq) On high-dose ibuprofen On hypertonic saline The concentration is 3% The concentration is 4% The concentration is 5% The concentration is 6% The concentration is 7% The concentration is 8% The concentration is 9% The concentration is 10% QD BID other Short acting beta agonist Long acting beta agonist Short acting Long acting Combination beta agonist and anticholinergic Oral (e.g. prednisone) Inhaled (e.g. 
fluticasone, Flovent, budesonide) Meaning Dataset Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data 139 Saba1 Laba Anticholinergics Anticholinergicl Dichotomous Dichotomous Categorical 1 2 3 4 5 6 7 8 Categorical 1 2 3 Dichotomous Dichotomous Dichotomous Dichotomous Code Table 4.4 (continued). Variable Name Corticosteroids3 Enzymes Antifungals Type Dichotomous Dichotomous Dichotomous Code Meaning Reformat code Inhaled combination wth bronchodilator (e.g. Advair) On any enzymes On any antifungals Beta agonist 0 1 Anticholinergic 0 1 Categorical 0 1 2 IA Categorical 0 1 2 3 Not on any beta agonist, includes saba, laba, combobroncho, and inhaled combination of corticosteroids with BD Used beta agonist Not on any anticholinergic, includes short acting, long acting anticholinergic and inhaled combination of corticosteroids with BD Used anticholinergic Not on any AC, includes dornase alfa and hypertonic saline Used 1 AC Used 2 AC Not on any IA, includes tobi, aztreonam, and colistin Used 1 IA Used 2 IA Used 3 IA Dataset Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data 140 AC Meaning Table 4.4 (continued). Variable Name Type Code Meaning Reformat code AI Categorical 0 1 2 BD Categorical 0 1 2 Meaning Dataset Not on any AI, includes high dose ibuprofen and azithromycin Encounter data Used 1 AI Used 2 AI Not on any BD, beta agonist and anticholinergic Encounter data Used 1 BD Used 2 BD 141 Table 4.5 Reformatting the clinical variables and comorbidities. 
Type Variable Name fvc Continuous fev1 Continuous fev1p fev1pcat Categorical apesassess Categorical ce_reasons1 Dichotomous ce_reasons2 Dichotomous ce_reasons3 Dichotomous ce_reasons4 ce_reasons5 Dichotomous Dichotomous ce_reasons6 ce_reasons7 ce_reasons8 Dichotomous Dichotomous Dichotomous Code Meaning fev1 percent predicted fev1 percent predicted in category 1 2 3 4 Absent PEx (assessed) Mild PEx (assessed) Moderate PEx (assessed) Severe PEx (assessed) # days/nights with reason pulmonary exacerbation # nights with reason pulmonary complication # nights with reason GI complications # nights with reason transplant related # nights with reason sinus infection # nights with reason non-transplant surgery # nights with reason other # nights with reason unknown Reformat code 1 2 3 4 Meaning Using NHANES equation to calculate the reference fev1p>70% 40<fev1p<=70% 10<fev1p<=40% <=10% Dataset Encounter data Encounter data Encounter data Care episode data Care episode data Care episode data Care episode data Care episode data Care episode data Care episode data Care episode data 142 Table 4.5 (continued). 
Variable Name Type allerdornase Dichotomous allertobi Dichotomous allercolistin Dichotomous allermacro Dichotomous allerhighibu Dichotomous allerhyper Dichotomous alleraztreonam arthro Dichotomous Dichotomous abpa Dichotomous rlresaminoglycosid es Categorical rlresbetalactams Dichotomous Code Meaning Has drug intolerance/allergies for dornase alfa Has drug intolerance/allergies for tobramycin Has drug intolerance/allergies for colistin Has drug intolerance/allergies for macrolide antibiotics Has drug intolerance/allergies for high-dose ibuprofen Has drug intolerance/allergies for hypertonic saline Has drug intolerance/allergies for aztreonam Arthritis/Arthropathy Allergic Bronchial Pulmonary Aspergillosis (ABPA) Reformat code Meaning Dataset Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data Encounter data 1 Resistant 0 No 2 Testing Not done Encounter data Resistant to All Beta Lactams Tested 1 (e.g., ceftazidime, imipenem) 2 No 3 Testing Not done 1 Resistant 0 No 2 Testing Not done Encounter data 143 Resistant to All Aminoglycosides 1 Tested (e.g., tobramycin, gentamicin) 2 No 3 Testing Not done Table 4.5 (continued). Variable Name Type rlresquinolones Dichotomous CFRD_status Categorical Code 1 2 3 0 2 3 DIOS Dichotomous GERD hemopt paninsuf Dichotomous Dichotomous Dichotomous pancreatitis ptx Dichotomous Dichotomous PEx Categorical Pexloose Categorical Meaning Resistant to All Quinolones Tested (e.g., ciprofloxacin, levofloxacin) No Testing Not done No CF related diabetes Impaired Glucose Tolerance (FBG < 126, 2-h PG 140-199) CFRD with or without fasting hyperglycemia Distal intestinal obstruction syndrome (DIOS, Meconium ileus equiv.) 
GERD (Gastro-Esophageal Reflux Disease) Hemoptysis, massive Pancreatic insufficiency Pancreatitis (defined by mutation class<=3) Pneumothorax Number of PEx in the previous one year (ce_reasons1>0) Number of PEx in the previous one year with loose definition (ce_reasons1>0 or apesassess>=3) Reformat code 1 0 2 0 Meaning Resistant No Testing Not done No CFRD Dataset Encounter data Encounter data 0 No CFRD 1 CFRD Encounter data Encounter data Encounter data Encounter data Diagnosis data Encounter data Encounter data Encounter data
Table 4.6 Reformatting the infections.
Figure 4.1: DAG for hypothetical causation between treatment changes and time to mucoid PaPI (after adjusting with the minimal sufficient adjustment sets).
Figure 4.2 Data reformatting to have quarterly routine visits for a patient.
CHAPTER 5 TREATMENT CHANGE PATTERN
5.1 Data Management
The core data were collected from the encounter dataset in the CFFPR, spanning the years from 1988 to the end of 2011. The demographic dataset, care episode dataset, and annual dataset were also used to build the cohort. Overall, there were 44,541 patients in the cohort. Visits that had only a bacteria culture test on Feb 15th of that year, but were lacking any other information, were treated as artificial data, since that is the last date of each annual report. Accordingly, a visit was excluded if all variables related to encounter date, diagnosis, clinical variables, and prescriptions were missing. After excluding duplicates and all culture tests dated before birth, there were 2,371,532 visits left in both the drug resistance dataset and the encounter dataset. Those two datasets were then linked to build the linked dataset.
5.1.1 Assumption on Unclear Pseudomonas aeruginosa Culture Test Results
As shown in Figure 5.1, the results of the culture test are inconsistent.
The inconsistency arose at two levels: inconsistency between phenotype results and culture test results, and inconsistent phenotype results within the same visit. For example, 90,119 visits had a diagnosis of Pseudomonas aeruginosa pulmonary infection (PaPI) but failed to report a phenotype (mucoid, nonmucoid, or unknown). Overall, 121,179 visits reported both mucoid and nonmucoid PaPI at the same time. Therefore, two assumptions were made, one for each issue. First, any positive Pseudomonas aeruginosa culture that failed to report the phenotype was assumed to be nonmucoid as long as it occurred before the diagnosis of mucoid PaPI; otherwise, it was treated as mucoid PaPI. Second, whenever the phenotype results conflicted within the same visit, they were resolved in the following order of priority: mucoid PaPI, nonmucoid PaPI, unknown PaPI. For example, if a visit recorded a contradictory result, with both mucoid PaPI and nonmucoid PaPI, the phenotype was set to mucoid. Overall, there were 263,254, 199,740, and 24,630 visits with mucoid, nonmucoid, and unknown phenotypes of Pseudomonas aeruginosa, respectively. At the same time, 847 patients whose phenotype remained unknown in the culture results across all encounter visits were excluded. Another 2,417 patients who received positive mucoid PaPI test results earlier than positive nonmucoid PaPI results were also excluded, since these results either conflicted with the real-world situation or indicated that the tests were conducted too late to identify the true date of developing mucoid PaPI. After these exclusions, 41,274 unique patients remained in the dataset.

5.1.2 Assumptions on Race to Predict Normal Lung Function Given Other Demographic Characteristics

In order to capture all information in the dataset, the linked dataset was further linked to a demographic dataset and a death dataset.
If only Caucasians and Blacks were considered, the calculation of normal FEV1 would be affected. Moreover, ATS guidelines give an adjustment for Asians, so the original six categories of race were combined into four: Caucasian, Black, Asian, and others. For patients who reported multiple races, and considering that Caucasians and Asians would have the highest and lowest predicted normal FEV1, respectively (given otherwise identical demographic characteristics), the race category with the lowest predicted normal FEV1 was entered. For example, patients who identified as Black Asian or Caucasian Asian were identified as Asian. Individuals identifying as American Indian or Alaska Native, or as Native Hawaiian or other Pacific Islander, were also identified as Asian, since hypothetically their lung function is lower than that of either Caucasians or Blacks. When demographic information conflicted for the same patient, the order list (Table 5.1) was followed, and the information with the lower number was taken first. For example, if a patient was reported as Caucasian at some visits and Black at others, he was treated as Caucasian.

5.1.3 Other Exclusion Criteria

After excluding patients who did not have demographic information in the CFFPR, 41,043 unique patients were left. After excluding patients who did not have at least 1 year of visits before being diagnosed with nonmucoid PaPI, or did not have at least 1 negative culture test result before being diagnosed with any phenotype of PaPI, the number decreased to 17,470, representing 1,217,848 visits. Another 9,429 patients were excluded because their birth date fell before January 1st, 1988 or after December 31st, 2005.
Overall, 2,330 patients were diagnosed with mucoid PaPI before 2004, and an additional 29 patients received a transplant before they were initially diagnosed with nonmucoid PaPI; both conditions conflicted with the inclusion criteria, so these patients were also excluded. Other exclusion criteria were:
• patients who did not have any visits after 01/01/2006,
• patients who had fewer than two visits annually,
• patients who did not have FEV1 measured in any visit,
• and visits when patients were younger than 6 years old.
After applying all the exclusion criteria, 4,970 unique patients were left, encompassing 322,549 observations. This is the final cohort used for the majority of Objective 1.

5.1.4 Assumptions on Recovered Lung Function after Hospitalization

Since this study mainly focuses on investigating dynamic treatment regimes in order to optimize treatment effects, short-term lung function deterioration during hospitalization was not taken into consideration. To support this decision, a preliminary analysis was conducted on the cohort to investigate the predicted FEV1% trajectory during hospitalization for each patient. In the analysis, all hospitalizations that occurred in each calendar year were captured independently. For each patient, the number of events, the mean duration of hospitalization, the mean predicted FEV1% at the first and last dates of the hospitalization, the mean relative change of predicted FEV1% during hospitalization, and the mean relative change of predicted FEV1% per day during hospitalization were measured according to the reason listed for the hospitalization. Because a patient's lung function at initial hospitalization was not reported before 2010, only the 2010 and 2011 results were used in this preliminary analysis. Table 5.2 shows the predicted FEV1% trajectory in 2010.
More than a quarter of the patients were hospitalized annually because of pulmonary exacerbation (PEx) (1338/4970 = 26.92%). On average, PEx caused more hospitalizations per patient than any other reason (1.61 events). The mean duration of each hospitalization varied from 6.98 to 14.80 days. The top 4 reasons for longer hospitalizations were pulmonary complications other than PEx, PEx, others, and sinus infection, with hospitalizations lasting on average 18.43, 14.80, 11.99, and 11.96 days, respectively. The ranges of predicted FEV1% at both the first and last dates were very wide, and PEx-caused hospitalizations had the widest ranges: (13.03, 152.43) at the first date and (15.09, 161.65) at the last date. Generally speaking, patients had worse lung function on the first date than on the last date regardless of the reason for hospitalization. PEx and non-transplant surgery were the only two reasons for which the median lung function at the index date was moderately impaired (66.84% and 65.30%). Compared with other reasons, PEx, sinus infection, and pulmonary complications other than PEx showed more improvement in the relative change of predicted FEV1% during hospitalization, which on average was 0.22, 0.15, and 0.14 times higher, respectively. Expressed per day of hospitalization, the relative change of predicted FEV1% was 0.02 times higher for these three reasons. Table 5.3 indicates similar trends.

In summary, the predicted FEV1% increased during hospitalization. Depending on the reason, the average improvement ranged from 0.01 to 0.22 times. Patients hospitalized for lung-related reasons, such as PEx, pulmonary complications other than PEx, and sinus infection, had more severely impaired lung function on the first date of the hospitalization.
Therefore, the FEV1 value measured on the last date of hospitalization is a more reasonable value to use for the purposes of this study, as it approximates the value that the patient would have when not sick enough to need hospitalization. According to this result, multiple visits that occurred during the same hospitalization were combined into one "cured visit," dated when the reason for the hospitalization was cured. The rationale was that whenever the hospitalization ended, the patient's lung function reached a peak because the short-term deterioration had been cured. This was handled in three steps: 1) the multiple visits during the same hospitalization were excluded and the encounter date was corrected to the last hospitalization date (when the reason for the hospitalization was cured); 2) the cured FEV1 after the related hospitalization was calculated; and 3) the chronic treatments that the patient had received during the hospitalization were adjusted.

Specifically, only the last date of hospitalization, or the last measured date in the care episode, was kept; the other visits that occurred during the same hospitalization were deleted. Most of the time, the patient had only one visit at the end of the hospitalization, on either the last date of hospitalization or the last measured date in the care episode, and the last measured date in the care episode always occurred earlier than the last date of hospitalization. However, if the patient had visits recorded on both of those dates, the last date of hospitalization was kept, unless FEV1 was measured on the last measured date in the care episode.
In order to calculate the cured FEV1, the following procedure was applied:
1) If lung function was measured at the last date of the care episode, which also occurred earlier than the last date of that hospitalization, that value was used;
2) If there was no measurement at the last date of the care episode, the lung function measured at the last date of hospitalization was used;
3) If neither of the above scenarios applied, the mean FEV1 within the next 6 months, when the disease was stabilized, was used;
4) Otherwise, a missing value was assigned.
Stabilized disease status was defined as not currently experiencing a PEx, or not experiencing more than a 10% decrease in predicted FEV1% compared to the maximum value in the past year for patients with moderately or severely impaired lung function, or not currently assessed with a moderate or severe exacerbation. Finally, as long as a patient had received their chronic treatments at any visit during the hospitalization, the related treatments were marked as occurring during the "cured visit." Overall, there were 305,409 visits left.

In order to differentiate the influence of PEx from that of the other reasons for hospitalization, the number of changed FEV1 values was reported separately. For PEx-caused hospitalizations, there were 2,835, 3,804, 1,220, and 6,331 visits that had a new FEV1 generated according to the above four scenarios, respectively. The numbers were lower for hospitalizations caused by other reasons: 162, 399, 497, and 1,071 for the four scenarios, respectively. In short, only about half of the "cured visits" had an FEV1; the remaining FEV1 values in the "cured visit" category were imputed in Aim 2.
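The four-scenario rule for the cured FEV1 can be sketched as a simple priority cascade. This is a minimal illustration, not the code used in the study: the function name and argument names are hypothetical, and the list of post-discharge values is assumed to have already been filtered to visits within 6 months at which the disease was stabilized.

```python
from statistics import mean
from typing import Optional, Sequence


def cured_fev1(
    fev1_care_episode_end: Optional[float],
    fev1_hospital_end: Optional[float],
    stable_fev1_next_6_months: Sequence[float],
) -> Optional[float]:
    """Pick the FEV1 for a "cured visit" following scenarios 1) to 4).

    All argument names are hypothetical; None marks an unmeasured value.
    """
    # Scenario 1: measurement at the last date of the care episode.
    if fev1_care_episode_end is not None:
        return fev1_care_episode_end
    # Scenario 2: measurement at the last date of hospitalization.
    if fev1_hospital_end is not None:
        return fev1_hospital_end
    # Scenario 3: mean of stabilized measurements in the next 6 months
    # (assumed to be pre-filtered by the caller).
    if stable_fev1_next_6_months:
        return mean(stable_fev1_next_6_months)
    # Scenario 4: nothing available, so the value stays missing.
    return None
```

Returning None in the last branch keeps the missing values explicit, so that they remain available for later imputation.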
Lung function values in 349 visits were collected at a consecutive future visit, since the last date on which lung function was measured during the care episode occurred later than the last date of hospitalization and the values equaled those in the consecutive visit. However, without further information, those values were not adjusted.

Finally, visits that inappropriately captured treatment information were adjusted. All the inhaled antibiotics can be used continuously, every alternate month, or at some other frequency. In the CFFPR, patients were reported as not on treatment when the date of the visit fell within an off-treatment month. In order to better capture treatment change patterns, those "not on treatment" records were adjusted to "on treatment" as long as the encounter date was located in a scheduled gap within the treatment protocol. Before the adjustment, there were 58,020, 5,189, and 4,299 visits on inhaled tobramycin, inhaled colistin, and inhaled aztreonam, respectively. After the adjustment, the numbers went up to 60,233, 5,480, and 4,646, respectively, which represents a 5% increase on average. All of the above assumptions affected only the number of visits, which decreased from 322,549 to 305,409, but not the number of unique patients, which remained 4,970 in the cohort.

5.1.5 Assumptions about Imputing Height and Weight

Generally speaking, the majority of time-independent demographic characteristics were imputable without making any assumptions. As long as they were recorded once among all the visits for the same patient, variables such as age, gender, and race are imputable. On the other hand, the majority of time-dependent covariates are not imputable without further assumptions. For example, as described above, the FEV1 in the "cured visit" was imputed under a composite assumption. However, the more assumptions a study makes, the more likely it is to be biased. Therefore, it is important to balance the number of assumptions against the number of missing values.
In order to calculate the predicted FEV1%, a clinical signal for making a treatment change decision, height and weight are required. For these time-dependent demographic variables, missing values were imputed with the arithmetic mean of the values recorded right before and right after the missing visit, conditional on time. If there were consecutive missing visits, a fixed rate of change was assumed across those visits and each missing value was imputed independently. If the missing value occurred at the index date, the value recorded in the original database right before the index date, together with the value recorded right after the index date, was used to calculate the value at the index date. Here, the index date was defined as the date of the first visit in the cohort or the initial diagnosis of nonmucoid PaPI after the beginning of 2006 for each individual patient.

Of the 309 patients who did not have a height measurement at the index date, only 11 lacked a prior measured height and only 4 did not have a posterior height recorded in the entire 3-year interval. For all patients without a posterior height measurement, the reason was a short follow-up time until the development of the outcome or the end of the study, or that they were older than 18 and had hypothetically stopped growing. For patients who did not have any height information during the 3-year interval prior to the index date, the posterior height was used as the height at the index date if they were older than 18 years old and a posterior measurement occurred within 1 year, or if the posterior height was measured within 6 months after the index date regardless of age. If the patient did not have any posterior height measurement in the 3-year interval after the index date, and a height was measured within 1 year prior to the index date, the prior height was used. After the above procedures, 2 patients still had missing values for height.
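The before/after imputation of height and weight "conditional on time" can be read as a linear interpolation between the two neighboring measurements, weighted by elapsed time. The sketch below illustrates that reading only; the function and argument names are hypothetical, and the assumption that the mean is time-weighted (rather than a plain average) is an interpretation of the description, not a confirmed detail of the original analysis.

```python
from datetime import date


def interpolate_by_time(
    t_prev: date, v_prev: float,
    t_next: date, v_next: float,
    t_missing: date,
) -> float:
    """Impute a missing height or weight from its neighboring measurements.

    Assumes a fixed rate of change between the prior and posterior values,
    so several consecutive missing visits can each be imputed independently
    from the same pair of neighbors.
    """
    span_days = (t_next - t_prev).days
    if span_days <= 0:
        raise ValueError("posterior measurement must come after the prior one")
    if not t_prev <= t_missing <= t_next:
        raise ValueError("missing visit must lie between the two measurements")
    # Fraction of the interval elapsed at the missing visit.
    weight = (t_missing - t_prev).days / span_days
    return v_prev + weight * (v_next - v_prev)
```

With a missing visit exactly halfway between the two measurements, the result reduces to the plain arithmetic mean of the prior and posterior values.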
Since both of them had a large gap in time since the previous measurement, the growth trends of patients with similar disease severity (transplant status and F508) and demographic information (age, gender, race) were used to impute their missing heights. The same procedures were also applied to the imputation of weight. After the data management was finished, all visits that occurred prior to the index date were excluded.

5.2 Results

To fulfill Aim 1, this section describes the results in four parts. The first part describes the baseline characteristics of patients in the cohort. Then, the results of subgroup analyses are presented, which investigated the association between the baseline characteristics and the level of CFTR function that patients had, and the association between the baseline characteristics and the number of treatment classes that patients received. In addition, the annual competing risks of death are summarized. Last, the medication trends are described by summarizing the baseline treatment combinations that patients received and the treatment change patterns in the cohort.

5.2.1 Baseline Characteristics of Patients in the Cohort

Overall, there were 4,970 unique patients. After excluding the visits that occurred before the index date, the number of visits decreased from 305,409 to 108,567 in the cohort (Figure 5.2). Table 5.4 presents the baseline demographic characteristics. More than 50% of patients were younger than 8 when enrolled in the cohort. There were 17.95%, 15.63%, and 10.93% of patients in the 9~11, 12~14, and 15~17 age groups, respectively. Only 120 patients were older than 18 years old at the index date. Slightly more than half of the patients in the cohort, 50.62%, were female. The dominant race in the patient cohort was Caucasian. Black and Asian patients represented only about 5% of all patients. The majority of patients were non-Hispanic, 92.66%.
Generally speaking, height and weight tended to increase consistently across the 6~9, 9~11, 12~14, and 15~18 age groups regardless of the measure used (mean, median, or 1st or 3rd quartile). For age groups older than 18 years old, the trend of growing in height and gaining weight slowed. Some patients in the 9~11 age group had extremely high weight, even higher than patients in the 12~14 age group. However, the CF patients tended to be shorter and slimmer than the normal population. For example, the mean height and weight were only 167.95 cm and 60.33 kg for patients who were older than 18 years old. Less than 1% of patients had ever smoked. Only 6 patients were pregnant. Patients had few comorbidities at the baseline, such as CFRD (3.86%), DIOS (1.41%), pancreatitis (0.30%), and ABPA (2.37%). Nobody had hemoptysis at the index date. Other than pancreatic insufficiency, which affected 93.54% of patients, GERD was the only comorbidity that affected more than 10% of the population.

Table 5.5 indicates the distribution of mutation class. From mutation class I to V, both the function of CFTR and the patient's lung function increase, as long as other demographic characteristics and clinical values are fixed. For each patient, at most two mutations were reported. The majority of patients were classified in mutation class II regardless of whether the 1st or the 2nd mutation was used. Compared to the 2nd mutation, the proportion of patients who were classified as class II in the 1st mutation was higher (86.50% vs. 56.02%). There were few class IV and V mutations in the 1st mutation. In the 2nd mutation, the proportion increased to 1.91% and 1.81% for class IV and V, respectively. Unlike the 1st mutation, in the 2nd mutation more patients' mutations could not be classified in any existing mutation class, because the associated CFTR function was uncertain. Following Green et al.,163 the mutation classes were further categorized into three groups.
Specifically, patients with two mutations in class I, II, or III were grouped together, because their mutations typically lead to little or no CFTR function. Patients with one or two mutations in class IV or V were grouped together because these mutations are associated with residual CFTR function. If the class of any mutation was unsure or could not be determined, those patients were grouped as having unsure CFTR function. The majority of patients had little or no CFTR function (77.55%); 4.25% and 18.21% of patients had residual and unsure CFTR function, respectively. In the following sections, the little or no CFTR function group is abbreviated as the no CFTR function group.

Table 5.6 presents the baseline clinical information of this cohort. About ¾ of patients had mildly impaired lung function. Only 55 patients (1.11%) had severely impaired lung function at the baseline. Many patients did not have their lung function measured at the index date, which caused a relatively high missing rate (16.02%). In the mildly impaired lung function group, the median value was 100.73%, with 114.61% as the 3rd quartile. For patients who had severely impaired lung function, the median value was 35.22%, with 27.23% as the 1st quartile. More than ¾ of patients did not have any PEx in the year before the index date. As the number of PEx episodes increased from 1 to 3, the proportion of patients with that number of PEx decreased from 15.63% to 1.33%. Less than 1% of patients had more than 3 PEx in the previous year. Under the loose definition, more PEx were counted. However, as with the strict definition, the proportion of patients decreased as the number of PEx under the loose definition increased. Patients who had more than 5 PEx under the loose definition represented less than 1% of the cohort. The proportions of patients who had drug resistance were also low: 1.57%, 0.66%, and 0.82% for aminoglycosides, beta-lactams, and quinolones, respectively.
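The three-way grouping of CFTR function described earlier in this section can be sketched as a small classification function. This is an illustrative sketch only: the function name, the integer coding of classes I to V as 1 to 5, and the use of None for an unsure or unmeasured class are all assumptions. The precedence (any unsure class places the patient in the unsure group, even if the other mutation is class IV or V) follows the wording above, but is likewise an interpretation.

```python
from typing import Optional

RESIDUAL_CLASSES = {4, 5}       # classes IV and V: residual CFTR function
VALID_CLASSES = {1, 2, 3, 4, 5}  # classes I to V coded as integers


def cftr_function_group(class_1: Optional[int], class_2: Optional[int]) -> str:
    """Group a patient by the classes of the 1st and 2nd reported mutation.

    None stands for a mutation whose class is unsure or was not measured.
    """
    classes = (class_1, class_2)
    # Any unsure or unmeasured class places the patient in the unsure group.
    if any(c is None for c in classes):
        return "unsure"
    if any(c not in VALID_CLASSES for c in classes):
        raise ValueError("mutation class must be an integer from 1 to 5")
    # One or two class IV/V mutations: residual CFTR function.
    if any(c in RESIDUAL_CLASSES for c in classes):
        return "residual"
    # Both mutations in class I, II, or III: little or no CFTR function.
    return "little or no"
```

For example, a patient with two class II mutations falls in the little or no CFTR function group, while a patient with one class II and one class IV mutation falls in the residual group.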
5.2.2 Baseline Characteristics of the Subgroup Patients in the Cohort

Subgroup analyses of the baseline information were conducted conditional on mutation class and on the number of initial treatment classes that a patient received. As shown in Table 5.7, there were associations between mutation classes and some demographic characteristics, such as age, race, ethnicity, and pregnancy, at the index date (chi-square p-value < 0.05). Patients with no CFTR function were more likely to be documented when young (55.27% vs. 46.45% and 45.30% in the 6~8 age group). More patients identified as Black in the unsure group (9.28% vs. 2.78% and 3.32%). With the increase in CFTR function, more patients identified themselves as Hispanic (5.66% vs. 9.95%), and the proportion of Hispanic patients was highest in the unsure group (13.92%). Compared with patients in the no CFTR function group, patients who had residual CFTR function were more likely to be pregnant (0.47% vs. 0.10%). Other than young adults, who had similar mean heights, the patient groups had statistically significant differences in mean height and weight (ANOVA p-value < 0.05). Patients who had residual CFTR function were on average taller and heavier, and patients who had no CFTR function on average had the slowest physical development. As age increased, the difference in weight between patients with no CFTR function and those with residual CFTR function grew on average. It reached its peak in the 15~18 year-old group (64.22 - 55.33 = 8.89 kg). There was no statistically significant association between the rest of the demographic characteristics and the mutation classes at the baseline.

Mutation classes were also associated with some comorbidities, such as CFRD, GERD, and pancreatitis. GERD was one of the most common comorbidities at the index date, affecting more than 10% of the patients.
Other than pancreatitis and ABPA, the more CFTR function a patient had, the less likely they were to suffer from comorbidities at the index date.

Table 5.8 shows similar trends; the more CFTR function a patient had, the healthier the patient was. However, the differences were not statistically significant (chi-square p-value > 0.05). Patients with residual CFTR function had both a higher proportion in the mildly impaired lung function group (74.88% vs. 73.82% and 72.15%) and a higher predicted FEV1% on average (107.14% vs. 102.83% and 101.92%). However, the missing rate was also higher in the residual CFTR function group. Patients with residual CFTR function were less likely to have had PEx in the past year regardless of which definition was applied (84.36% and 63.03% of patients did not have PEx and PEx with loose definition, respectively). The probabilities of having drug resistance were low in all three groups. Other than resistance to aminoglycosides, which affected about 1.5% of patients, each of the other drug resistances affected less than 1% of patients in the related group.

Table 5.9 indicates that there were associations between the number of treatment classes that a patient received and some demographic characteristics, such as age, race, and ethnicity, at the baseline. Only 3 treatment classes were taken into consideration: mucolytics (ML), anti-inflammatories (AI), and inhaled antibiotics (IA). Compared with the no initial treatment group, the proportions of patients aged 6 to 8 were higher in the patient groups who received 1, 2, or 3 treatment classes. Conversely, the no initial treatment group had higher proportions of patients aged 9 to 11, 12 to 14, and 15 to 17 compared with the rest of the groups. Adult patients represented a small proportion of each group, ranging from 1.19% to 4.58%. The majority of patients were Caucasian, representing more than 92% of patients in each individual group. However, the distributions of other races were not consistent.
The proportion of Black patients was the lowest (3.07%) among patients who received 2 classes of treatment, and the highest (4.55%) among patients who received 1 class of treatment. The proportion of Asians doubled in the 3-class group compared with the rest of the groups (2.94% vs. 1.23%, 1.27%, and 1.02%). The more treatment classes a patient received, the more likely the patient was to be Hispanic, with the proportion increasing from 5.41% to 14.05%. There were associations between the number of treatment classes and physical development, such as height and weight, when the patient was aged 6~8 or 15~18. The more treatment classes a patient had received, the more likely they were, on average, to have had slow physical development. The trend could be identified even in those age groups whose means were not statistically significantly different. There were also associations between the number of treatment classes that a patient received and comorbidities such as CFRD, GERD, and ABPA. The more treatment classes a patient received, the more likely they were to have had one of the above comorbidities. However, patients who did not receive any treatment had a higher chance of suffering from CFRD (4.23% vs. 2.59%) and ABPA (2.14% vs. 1.90%), compared to patients who received one class of treatment.

Table 5.10 shows the associations between the number of treatment classes that a patient received and clinical variables such as predicted FEV1%, PEx, and drug resistance. From 3 classes to 1, the fewer treatment classes a patient was receiving, the more likely the patient was to have less impaired lung function. The no initial treatment group had a distribution of lung function impairment similar to that of the 1 treatment class group, but the 1 treatment class group had a higher proportion of patients with moderately impaired lung function and a lower proportion of missing values.
Moreover, the more treatment classes a patient received, the lower the predicted FEV1% on average. However, the mean predicted FEV1% was lower for patients who did not receive any treatment than for patients who received 2 classes of treatment if they had mildly (100.83% vs. 102.76%) or severely impaired lung function (31.91% vs. 34.23%). Similarly, as the number of treatments increased, the chance of having more PEx also increased regardless of which definition was applied. However, patients in the no initial treatment group had a lower chance of having no PEx or only one PEx compared to patients who received 1 class of treatment (80.77% vs. 82.67% and 13.58% vs. 13.92%). For each individual drug resistance, the more treatment classes a patient received, the more likely they were to develop drug resistance.

5.2.3 Competing Risks of Death by Calendar Year

Considering that CF is a progressive, genetic, long-term disorder, several causes other than lung function deterioration could also trigger death. Therefore, the prevalence and incidence of death from different causes were investigated. Table 5.11 shows that respiratory/cardiorespiratory causes were the main cause of death, with an incidence of 7.64 persons per 1,000 patient-years. Among the competing risks, transplant-related death was the only one that consistently caused more than 1 death per 1,000 patient-years. The other causes of death were lower in prevalence. Because of the low incidence rates, there was barely any difference between prevalence and incidence rate among all other groups.

5.2.4 Treatment Combinations and Treatment Change Patterns

Figure 5.3 presents the proportion of patients who were on different treatment combinations at the baseline. In order to simplify the tick labels, a three-digit number was created.
From left to right, the digits represent the number of treatments that a patient received in the inhaled antibiotic, mucolytic, and anti-inflammatory classes. For example, '000' means the patient did not receive any treatment, and '111' means the patient received 1 inhaled antibiotic, 1 mucolytic, and 1 anti-inflammatory in the visit. Overall, there were 3,702 patients who had at least 1 treatment change. More than 1/3 of patients (33.55%) were treatment naïve at the baseline. Other than no treatment, the top 4 treatment combinations were '010' (24.58%), '110' (12.48%), '011' (6.21%), and '111' (5.24%), which overall included 50.01% of patients. There were fewer patients on other treatment combinations. Overall, 1,059 patients (28.61%) received one or more inhaled antibiotics in the baseline visit. With other treatments held fixed, the proportion of patients decreased as the number of inhaled antibiotics received increased. Regardless of the number of inhaled antibiotics that a patient received, the top 4 combinations of mucolytics and anti-inflammatories were '00' (38.01%), '10' (37.36%), '11' (11.83%), and '20' (6.56%).

To better visualize the proportion of switches to other potential treatment combinations and the length of use of the current treatment, four heat maps were created (Figures 5.4, 5.5, 5.6, and 5.7). Unlike Figures 5.4 and 5.5, which capture only the 1st switch, Figures 5.6 and 5.7 capture all changes during the study. Figure 5.4 and Figure 5.6 show the relationship among the current treatment, the potential treatment combination that a patient could switch to, and the proportion of patients who switched from the current treatment to the related targeted treatment combination among all switches. The x-axis represents the targeted treatment combination that a patient could switch to, and the y-axis represents the treatment that a patient was currently on.
The color of the square at each crossing represents the proportion of patients who switched from the current treatment to the related targeted treatment combination among all the switches. The darker a square is, the more likely a patient was to follow this switching path, which is defined by the combination of a current treatment and the related potential treatment combination that a patient could switch to. Similarly, in both Figure 5.5 and Figure 5.7, the color indicates the mean length of use of the current treatment. The darker a square is, the longer a patient used that treatment on average. All the white areas indicate that no patient followed that switching path. Since the treatment that a patient received at the last visit could only be identified at a future visit, which was not measurable, treatment change was not considered at the last visit.

When the first treatment change occurred, there were 24 treatment combinations for the current treatment, and 30 potential treatment combinations that a patient could switch to. As shown in Figure 5.4, the darkest red occurred when patients switched from '000' to '010', which indicated that about 14% (13.56%) of 1st switches were to initiate 1 mucolytic. Other than this switching path, '010' to '110' and '010' to '020' were also darker than the rest, representing 10.02% and 8.05% of 1st switches, respectively. The fewer treatment classes a patient received, the more potential treatment combinations the patient could switch to. For example, treatment naïve patients ('000') had 22 combinations they could switch to, which together represent 33.55% of 1st switches. For patients who received only 1 mucolytic, there were 9 potential treatment combinations, and the overall proportion was 24.58%. The number of potential treatment combinations decreased to 4, and the overall proportion decreased to only 3.70%, for patients who were only on 2 mucolytics.
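The three-digit treatment code and the proportions shown by the squares in Figures 5.4 and 5.6 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each switch is assumed to be available as a (current, target) pair of three-digit codes, and every per-class count is assumed to fit in a single digit.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def encode_combination(
    n_inhaled_antibiotics: int, n_mucolytics: int, n_anti_inflammatories: int
) -> str:
    """Build the three-digit code: inhaled antibiotics, mucolytics,
    anti-inflammatories, read from left to right."""
    counts = (n_inhaled_antibiotics, n_mucolytics, n_anti_inflammatories)
    if any(not 0 <= n <= 9 for n in counts):
        raise ValueError("each class count must be a single digit")
    return "".join(str(n) for n in counts)


def switch_proportions(
    switches: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], float]:
    """Proportion of each (current, target) switching path among all switches.

    Each value corresponds to the shade of one square in the heat map;
    paths that never occur are simply absent (the white areas).
    """
    counts = Counter(switches)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {path: n / total for path, n in counts.items()}
```

As a usage example on made-up data, passing four switches of which two go from '000' to '010' yields a proportion of 0.5 for that path; these numbers are purely illustrative and are not taken from the cohort.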
Each oblique line represents a specific treatment change pattern regardless of the current treatment a patient received. For example, the oblique line that includes the switching path from '000' to '010' indicates that the physician prescribed an additional mucolytic whenever a treatment change decision was made. The squares on 3 oblique lines were more likely to have darker colors, which indicated the treatment change decisions that physicians were prone to make. On each of these lines, the physician prescribed one additional treatment from one of the three treatment classes. All three lines had several darker squares. However, the line for prescribing one additional mucolytic was on average darker than the other two. So, among those 3 treatment change patterns, physicians were most prone to prescribe 1 additional mucolytic. Switching paths that were not on those three oblique lines were less likely to be followed. Nobody stopped using a treatment in any treatment class, as indicated by the absence of any square in the left corner of Figure 5.4. Figure 5.5 represents the length of use of the current treatment. In this study, yellow indicates an extreme length of use of the current treatment: more than 1,461 days (4 years). One switching path had this color: patients who received 2 inhaled antibiotics and 1 mucolytic as the current treatment and went on to receive an additional inhaled antibiotic. Other than this path, patients who switched from only 1 anti-inflammatory to 1 inhaled antibiotic, 1 mucolytic, and 2 anti-inflammatories had the longest length of use: 1,371 days. Unlike the darker squares in Figure 5.4, which were concentrated on a few paths, the squares in Figure 5.5 were evenly distributed. However, from bottom to top, the color became darker. 
In other words, the more treatments that a patient received in the current treatment, the longer the patient was likely to stay on it. In Figures 5.6 and 5.7, the number of current treatment combinations increased to 30, and the number of potential treatment combinations increased to 33. The upper range of the proportion decreased from 14% to 7.5%. So did the length of use of the current treatment: no switching path had a mean length of use longer than 4 years. The switching path from '000' to '010' was still in the top 5 (7.00%), but the path with the highest proportion among all the switches in the cohort was the one in which patients used 1 inhaled antibiotic and 1 mucolytic in the current treatment and then received an additional mucolytic (7.33%). Several other squares also had darker colors, such as '010' to '110' (7.01%), '111' to '121' (6.62%), and '010' to '020' (6.06%). Similar to Figure 5.4, prescribing an additional inhaled antibiotic, mucolytic, or anti-inflammatory were still the top three treatment change decisions that physicians were prone to make in Figure 5.6. Furthermore, the chance of prescribing an additional mucolytic was still higher than the other two. Unlike in Figure 5.5, in Figure 5.7 the two paths previously associated with the longest length of use of the current treatment, '210' to '310' and '001' to '112', were much lighter. Their lengths decreased from 1,659 and 1,371 days to 813 and 994 days, respectively. The following paths had the darkest colors in Figure 5.7: '100' to '210', '110' to '212', and '012' to '122', with 1,274, 1,281, and 1,296 days, respectively. Figure 5.7 also showed the trend that, as the number of treatments in a patient's current treatment increased, the length of use of the current treatment was prolonged.

5.3 Discussions

The discussion section is organized in the following manner. 
The first part focuses on the mechanism, direction, and extent of the bias that each assumption may induce. The second part compares the baseline characteristics of patients in this cohort with the related statistics in the CFF annual report. Then, the discussion summarizes the issues around the subgroup analyses, especially why the clinical information was statistically significantly associated with the number of treatments that a patient received but not with the patient's CFTR function. After the subgroup analyses section, the reasons for the increase in competing risks of mortality are analyzed. In the medication trends section, the reasons for not including bronchodilators as a treatment class, the implications of the treatment combinations that patients received at baseline, and the issues around the results on treatment change patterns are discussed. At the end, the advantages and disadvantages of this cohort, together with the impact of the results in Aim 1, are summarized.

5.3.1 Summary Regarding Assumptions

In order to better manage the data, several assumptions were made. Generally speaking, without appropriate controls, assumptions could induce uncertainty and bias in the final result. However, for the results in this study, the chance of being biased by those assumptions is low, since the majority of them were either based on well-accepted clinical evidence or investigated and supported by preliminary tests. At the same time, those assumptions were made conservatively. Therefore, even if those assumptions biased the results, they could only lead to underestimation; the real estimate could only be larger than the current results. In the following paragraph, three examples of this are explained. First, whenever the culture test results conflicted, the more severe result was always selected. 
The rationale is that the time of developing mucoid PaPI is an important clinical signal of severe lung function deterioration, which was well captured by the healthcare provider. Therefore, the assumption that all positive culture test results before the diagnosis of mucoid PaPI were nonmucoid is reasonable. This assumption would not affect the identification of the outcome (the first date of diagnosis with mucoid PaPI) unless the chance of misdiagnosing mucoid PaPI as an unknown phenotype of PaPI was really high. If by any chance it was misclassified, it only reduced the sample size, shortened the length of follow-up until developing mucoid PaPI (the outcome of Aim 3), and underestimated the time to event. Moreover, for a multiracial patient, the race that was associated with lower lung function was used to calculate the reference lung function. In this situation, the predicted FEV1% would be higher than it should be if a healthy multiracial person actually had better lung function than the reference that was used in the analysis. In other words, patients were marked as healthier than they should be, which decreased the potential sample size that followed the treatment change strategy in Aims 2 and 3. Last, a series of conservative identifications and calculations were conducted to measure the recovered lung function after a hospitalization, which decreased the change in predicted FEV1% between the 'cured visit' and the follow-up visit. If by any chance there was a rational treatment change at the follow-up visit, then the estimate, using the relative change in predicted FEV1% as a clinical signal, would be underestimated. Therefore, the estimates in all 3 objectives were minimum values.

5.3.2 Baseline Characteristics of Patients in the Cohort

As demonstrated in Table 5.4, the cohort consisted of young patients with fewer comorbidities. 
In the CFFPR 2011 annual data report, which summarized the whole CF patient population in the US, around half of patients were older than 18 years; in this cohort, only 2.41% of patients were in that category. Even though the majority of patients in this cohort were extremely young, the distributions of gender, race, and ethnicity were consistent between the report and this cohort. The majority of patients were Caucasian and non-Hispanic, with an equal chance of being male or female. Because of the younger age distribution, patients had fewer comorbidities compared with the report. Using CFRD, GERD, and DIOS as examples, only 3.86%, 13.52%, and 1.41% of patients suffered from these comorbidities, respectively. The related numbers were 19.0%, 28.9%, and 4.7% in the report. However, their physical development, especially height and weight, was markedly slower than that of people without CF. Because of this, Aim 1 and the following 2 aims generalize mainly to patients at the early stage of the disease, rather than to the entire population of CF patients. This narrower generalizability is caused by the inclusion criteria, under which a patient was either diagnosed with nonmucoid PaPI but not mucoid PaPI before 2006 and alive in 2006, or initially diagnosed with nonmucoid PaPI after 2006. Regardless of which inclusion criterion a patient met, the chance of being young was high. At the same time, the results for time-invariant variables were consistent between this cohort and the annual report, which also supported the conclusion that the narrower generalizability is acceptable. The distribution of patients' disease severity, measured by the function of the CFTR protein, is consistent between the report and this cohort. The majority of patients had little or no CFTR function, and only about 5% and 20% of patients had residual and unsure CFTR function, respectively. 
The report did not indicate disease severity by another measurement, the class of mutation on the two chromosomes. Even so, considering the consistency between the report and this cohort when CFTR protein function is used as the measurement, the disease severity in this cohort is generalizable and acceptable, with a small chance of being biased. Table 5.5 demonstrates that, compared with the 2nd mutation, the 1st mutation had a higher proportion of mutation class II (86.50% vs. 56.02%). However, the 1st mutation had lower proportions of mutation class I and the unsure class (4.89% vs. 17.83% and 1.85% vs. 13.52%), which absorbed the difference in mutation class II. In total, 18.21% of patients were classified into the group with unsure CFTR function. The majority of them were determined by the mutation class of the 2nd mutation. In the 1st mutation, only 6.3% of mutations belonged to the unsure class; however, the number went up to 17.97% in the 2nd mutation. This may cause information bias, since some patients who had only one mutation in the unsure class could be assigned to either the no function or the residual function group. According to current knowledge, without considering the unsure class, a patient would be classified as no function if both mutations are in class I, II, or III. As long as one mutation belongs to class IV or V, the patient has residual function. Therefore, a patient who had one mutation in class IV or V and another one in the unsure class should be classified into the group with residual function. A patient who had one mutation in class I, II, or III and another one in the unsure class could, without further information about the unsure mutation, be classified as either no function or residual function. It may seem that the creation of the unsure group would bias the result. However, the chance is trivial. 
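For illustration, the grouping rule described above can be sketched in Python. This is a minimal sketch that implements the rule as stated in the text, not the code used to build the cohort; the function name, the group labels, and the use of the string "unsure" for a mutation of unknown class are assumptions.

```python
NO_FUNCTION_CLASSES = {"I", "II", "III"}
RESIDUAL_CLASSES = {"IV", "V"}

def cftr_function_group(first_mutation, second_mutation):
    """Assign a CFTR function group from the classes of the two mutations.

    Any class outside I-V (for example the hypothetical label "unsure")
    is treated as a mutation of unknown class.
    """
    classes = {first_mutation, second_mutation}
    # One class IV or V mutation is enough for residual function,
    # even when the other mutation is of unknown class.
    if classes & RESIDUAL_CLASSES:
        return "residual"
    # Both mutations in class I, II, or III: little or no function.
    if classes <= NO_FUNCTION_CLASSES:
        return "none"
    # Class I-III paired with an unknown class (or two unknown classes)
    # cannot be resolved without more information.
    return "unsure"
```

Under this rule, `cftr_function_group("II", "V")` and `cftr_function_group("IV", "unsure")` both return "residual", whereas `cftr_function_group("II", "unsure")` returns "unsure".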
First, the whole assumption is based on a determination that a mutation in the unsure class has a function similar to that of 1 of the 5 mutation classes. If it has superior function to class V or inferior function to class I, then the assumption is violated, and more classes are needed. Even if the unsure mutation belongs to 1 of the 5 classes, according to current information, only 12 patients could be reclassified into a group with residual but unsure function, which represents only 1.33% of patients in the unsure group. Therefore, the disease severity in this cohort is generalizable and acceptable, with a small chance of being biased. In general, patients were healthy at baseline. The median predicted FEV1% was higher than 100%. This was probably caused by the following three reasons: 1) the low accuracy of lung function measurement overestimated the FEV1; 2) the reference lung function, which was predicted using the NHANES method, underestimated the FEV1; 3) the majority of patients were young, aged 6 to 8, and their lung function had not yet deteriorated much. Considering that the lung function results were similar to those in the CFFPR annual report, which included all CF patients in the U.S., the chance of biased lung function was reduced. The proportion of missing values for predicted FEV1% was high, which was caused by FEV1 not being measured. This could reflect a lack of trust in the accuracy of lung function measurement in children, together with the reality that those young patients had good lung function. The definition of PEx was vague. In order to capture all the events, the categories PEx and PEx with loose definition were created. Unlike PEx, which was determined by the number of PEx-caused hospitalizations alone, PEx with loose definition also took into consideration the number of visits at which the patient reported having moderate to severe PEx. 
Compared with PEx, PEx with loose definition identified more PEx incidents. However, the proportion of patients who had fewer than 2 PEx events was similar to the proportion who had fewer than 4 PEx events under the loose definition (93.98% vs. 95.98%).

5.3.3 Baseline Characteristics of the Subgroup Patients in the Cohort

The unsure group was a mixture of patients who had little or no CFTR function and patients who had residual function, which was supported by several baseline characteristics. The mean height and weight in Table 5.7 supported this conclusion. Except for patients older than 18 years, the unsure group always had a mean height and weight between those of the residual function group and the no function group. The range of height and weight in the unsure group was even wider than in the residual CFTR function group. The proportion of patients who had comorbidities such as CFRD, DIOS, and GERD also supported this trend. The more CFTR function a group had, the less likely it was to have specific comorbidities. The proportion of patients with specific comorbidities in the unsure group fell between those of the residual function and the no function groups. The associations in Tables 5.7 and 5.9 were consistent for demographic characteristics such as age, race, and ethnicity, and for comorbidities such as CFRD and GERD. They were statistically significantly associated with both the mutation class and the number of treatment classes that a patient received at baseline. Similarly, the less CFTR function a patient had, or the more treatment classes a patient received at baseline, the slower the patient's physical development. However, there were some differences; several associations were only identified with the mutation class. Patients who had residual CFTR function were more likely to be pregnant or to have pancreatitis at baseline. Considering that the probability of each of those two events was low, this did not affect the conclusion. 
However, several variables or specific categories conflicted with the trend that the more treatment classes a patient received at baseline, the more likely the patient was to suffer from a specific comorbidity. For example, the proportion of patients with pancreatitis decreased as the number of treatment classes received at baseline increased. Compared with patients who received one class of treatment, patients who did not initiate any treatment at baseline had a higher probability of being diagnosed with CFRD. However, there was a large difference between the associations in Tables 5.8 and 5.10. Generally speaking, the clinical information was statistically significantly associated with the number of treatments that a patient received, but not with CFTR function. The better a patient's clinical outcomes, the fewer treatment classes the patient was likely to receive at the related visit. This could be treated as a signal that the above clinical information was applied in prescribing decisions, at least for the number of treatment classes that a patient received at baseline. Even though those were associations, not causations, and cannot directly prove the hypothesis, as a descriptive objective the above analyses indirectly supported the hypothesis that clinical information could be applied to determine rational treatment changes at the treatment class level, and showed that the study was aimed in the right direction. Surprisingly, there were fewer younger patients in the no initial treatment group than in the other groups, which probably indicated that the young patients were overtreated. Considering the effect of CFTR function, the mutation class should be considered in treatment decision-making. This is explored further in Aim 2.

5.3.4 Competing Risks of Death by Calendar Year

The majority of deaths were caused by respiratory-related comorbidities; the competing risks of death for other comorbidities were low. 
As shown in Table 5.11, there were increasing trends of death in several groups, which could be explained by the following reasons. First of all, the quality of the data was better after 2006, as supported by a preliminary analysis. Given the improvement in collecting information, it was highly likely that more deaths and reasons for death would be captured. Moreover, improvements in length of survival also contributed to this trend. The longer a patient lives, the more likely he or she is to die from reasons other than lung function deterioration.

5.3.5 Treatment Combinations and Treatment Change Patterns

At first, bronchodilators (BD) were also considered as another treatment class. However, considering the result of another preliminary analysis, the unstable, irrational treatment changes caused by including BD as another class, and the limited treatment effects of BD, this study focused on only the 3 treatment classes. There were 24 treatment combinations at the baseline visit. However, the top 5 treatment combinations encompassed more than 4/5 (82.06%) of the patient population. More than 2/3 (71.39%) of the patients did not receive any inhaled antibiotics at baseline. More than 80% (81.93%) of the patients did not receive any anti-inflammatories. Mucolytics were the treatment class that most patients were on (59.32%). The above results indicated that patients in the cohort either had acceptable lung function or were undertreated. The lung function results in Tables 5.6, 5.8, and 5.10 match the results here, which supports the former conclusion that patients had acceptable lung function at baseline. It was rare for a patient to receive more than 1 anti-inflammatory or more than 1 inhaled antibiotic. Several interesting results were identified from the heat maps. First, physicians were prone to prescribe an additional inhaled antibiotic, mucolytic, or anti-inflammatory to maintain patients' lung function. 
However, the chance of prescribing an additional mucolytic was higher than either of the other two. This was probably caused by the certainty and estimate of net benefit, together with the more affordable price of medications in that class. This result is significant in terms of supporting the search for evidence-based decision-making and precision medicine. The chronic treatment guideline only recommends a treatment according to the certainty and estimate of the net benefit, but fails to suggest a treatment change pattern or the probability of following a specific switching path. With the successful identification of the above results, together with the optimal rational treatment change strategy that was identified in Aim 3, this gap can be bridged. In the future, a physician should be able to determine the optimal treatment strategy and the path of treatment change according to the patient's characteristics, clinical values, and treatment history. Moreover, the trend in the length of time on the current treatment was also consistent in Figures 5.5 and 5.7. The more treatments that a patient received, the longer the patient would keep using the current treatment. This conclusion was supported by the trend in those figures: from the bottom to the top, the color of the squares got darker on average. This pattern was particularly prevalent when the patient was treatment naïve or only receiving 1 anti-inflammatory. Compared to Figure 5.5, Figure 5.7 is more reliable, for the following reasons. First, there were more patients in each square, which decreased the chance of having an extreme treatment length. As an example, the paths with the 2 longest treatment lengths in Figure 5.5 had a much lighter color in Figure 5.7. Previously, there was only one patient in each of the switching paths '210' to '310' and '001' to '112'. However, the number of patients increased to 7 and 2, respectively. 
Therefore, the chance of having an extreme length would be substantially reduced by having more patients in each switching path. This explanation also applies to the squares that had an extremely short length but advanced current treatments. Moreover, both figures failed to measure treatment change at the last visit. However, considering that there were more treatment changes in the second figure than in the first, the chance of having a biased result in Figure 5.7 should be lower than in Figure 5.5. Finally, Figure 5.5 only measured the 1st treatment change in the cohort, at which point patients would be healthier than later, when declines in health prompted the subsequent recorded treatment changes. Figure 5.5 would therefore fail to capture severe situations. For example, none of the patients had received more than 2 inhaled antibiotics as the current treatment in Figure 5.5. In summary, the more advanced the treatments that a patient received, the longer the patient would keep using the current treatment. In addition, the fewer treatments that a patient received, the more treatment combinations were available to switch to. As mentioned previously, as the current treatment combination advanced from '000' to '100' to '200', the number of potential treatment combinations that a patient could switch to decreased from 22 to 8 to 3, which is consistent with current knowledge. There were limited chronic treatments for CF patients: three inhaled antibiotics, two mucolytics, and two anti-inflammatories. At the beginning, when patients were treatment naïve or received only one class of treatment, there were plenty of choices. As the disease progresses, after a patient has received advanced treatments, say 2 inhaled antibiotics, 1 mucolytic, and 1 anti-inflammatory, there are few choices left. 
Together with the economic burden, this means that much of the time a patient may keep using the same advanced treatment combination rather than receiving additional treatments, even when suboptimal treatment status has been reached. However, several treatment combinations rarely occurred. There were only 33 treatment combinations in the cohort, other than no treatment; nobody was reported to be on only 3 inhaled antibiotics, or on 3 inhaled antibiotics and 2 anti-inflammatories, as the potential treatment combination. All of the above information portrays a cross-sectional treatment pattern of patients who were diagnosed with nonmucoid PaPI. Even though several assumptions were made, the chance that the results were biased by those assumptions is low, since the majority of them were either based on well-accepted clinical evidence or investigated and supported by preliminary tests. At the baseline visit, patients in this cohort had acceptable generalizability to all CF patients in the U.S., but were younger and healthier, with fewer comorbidities. The results from both the baseline visit and follow-up visits indicated that clinical values were applied in prescribing decisions. The better a patient's clinical outcomes, the fewer treatment classes the patient was likely to receive at the related visit. Even though there were 24 treatment combinations at the baseline visit, physicians were more likely (82.06%) to prescribe one of five treatment combinations. During follow-up, physicians were prone to prescribe an additional inhaled antibiotic, mucolytic, or anti-inflammatory to maintain patients' lung function. The chance of prescribing an additional mucolytic was higher than either of the other two. At the same time, the more treatments that a patient received, the longer the patient would keep using the current treatment. Furthermore, the fewer treatments that a patient received, the more treatment combinations were available to switch to at a future visit. 
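The count of 33 combinations can be checked with a short enumeration. The sketch below assumes that the number of treatments per class is capped by the number of available treatments stated above (three inhaled antibiotics, two mucolytics, two anti-inflammatories) and reads the two unobserved combinations as the codes '300' and '302'; both are interpretations of the text rather than outputs of the registry data.

```python
from itertools import product

# Assumed caps per class: inhaled antibiotics, mucolytics, anti-inflammatories.
MAX_PER_CLASS = (3, 2, 2)

# Every code from '000' to '322' that respects the caps: 4 * 3 * 3 = 36.
all_codes = [
    "".join(str(n) for n in combo)
    for combo in product(*(range(cap + 1) for cap in MAX_PER_CLASS))
]

# Exclude no treatment ('000'), leaving 35 possible combinations.
non_empty = [code for code in all_codes if code != "000"]

# Assumed codes for the two combinations that nobody reported.
unobserved = {"300", "302"}
observed = [code for code in non_empty if code not in unobserved]
```

Here `len(all_codes)` is 36, `len(non_empty)` is 35, and `len(observed)` is 33, which agrees with the number of treatment combinations reported for the cohort.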
This is the largest cohort of United States CF patients who were diagnosed with nonmucoid PaPI and had not developed mucoid PaPI from 2006 to 2011. Because of the large sample size, the diverse centers across the U.S., the long-term follow-up, and the good quality of the data, there was an excellent opportunity to comprehensively analyze this subgroup of CF patients. These results could not only indicate the variations in disease stage that this subpopulation exhibits, but also support decision-making around rational treatment changes. However, the drawback of this cohort is that, because of the composite inclusion criteria, the cohort is a mixed population including both newly diagnosed patients and patients who had been diagnosed with nonmucoid PaPI for years. Because of this, the index date was determined to be either the first date when the patient was diagnosed with nonmucoid PaPI, or the first visit in 2006 for those patients who had been previously diagnosed with nonmucoid PaPI. In this situation, the result is more generalizable to all patients who had been diagnosed with nonmucoid PaPI. However, the stability of the decision support these results provide is lower than what could be obtained from a subcohort in which patients have identical disease stages. Therefore, this study is the first step toward analyzing how to make rational treatment change decisions that maximize the delay of developing mucoid PaPI. Further analyses for more specific patient populations are needed.

5.4 Conclusions

Even though several assumptions were made prior to the investigation, the chance that the results were biased by those assumptions is low, since the majority of them were either based on well-accepted clinical evidence or investigated and supported by preliminary tests. This is the largest cohort of United States CF patients who were diagnosed with nonmucoid PaPI and had not developed mucoid PaPI from 2006 to 2011. 
Among the 4,970 unique patients, the majority were Caucasian and younger than 12 years old. Since the cohort was young, patients were healthy: at baseline they were barely affected by comorbidities other than pancreatic insufficiency and GERD; the majority of patients had only mildly impaired lung function, did not have any PEx in the previous year, and barely had any drug resistance. However, according to the results of genetic testing, more than ¾ of those patients had dysfunction of the CFTR protein, which indicated more aggressive disease progression. Subgroup analyses indicated that clinical signals were applied in prescribing decisions, at least for the number of treatment classes that a patient received at baseline. Because patients were young and healthy at baseline, they rarely received advanced treatment combinations: more than half of patients received either no treatment or one mucolytic. Whether considering only the first treatment change or all treatment changes in the cohort, physicians were prone to change treatment prudently by prescribing only one additional treatment from any one of the three treatment classes. At the same time, the fewer treatment classes a patient received, the more potential treatment combinations the patient could switch to. Finally, the more treatments that a patient received in the current treatment, the longer the patient would keep using the current treatment.

Table 5.1. Order list for dealing with variables with conflicted results

Variable      Value      Order
Race          Caucasian  1
              Black      2
              Other      3
Gender        Male       1
              Female     2
Hispanic      Yes        1
              No         2
Death date    Earliest   1

Figure 5.1. The strategy of adjusting inconsistent culture test results.

Table 5.2. 
FEV1% trajectory during hospitalization by patient in 2010 2010 Reasons PEx Pulmonary comlications other than PEx GI complications Transplant related Sinus infection Non transplant surgery Other Unknown 2010 Reasons PEx Pulmonary comlications other than PEx GI complications Transplant related Sinus infection Non transplant surgery Other Unknown Number of event Number of 1st 3rd Mean patients Quartile Median Quartile 1338 1.61 1 1 2 46 1.13 1 1 Mean duration of hospitalization Number of 1st 3rd Range Mean patients Quartile Median Quartile (1, 10) 1338 14.80 10 14 17 1 (1, 2) 46 1.07 1 1 1 1.00 1 1 1 1.21 1 1 1 1.00 1 1 1 1.15 1 1 1 1.00 1 1 1 Mean value of FEV1% at first date Number of 1st 3rd Mean patients Quartile Median Quartile 1137 66.62 51.00 66.84 81.76 (1, 2) (1, 1) (1, 2) (1, 1) (1, 5) (1, 1) 73 6 14 26 106 34 73 6 14 26 106 34 25 76.09 52.52 82.26 95.77 25 2 7 5 40 0 76.13 79.78 80.42 78.05 84.97 . 61.90 79.78 62.88 63.43 69.34 . 75.47 79.78 75.90 65.30 86.76 . 102.16 79.78 103.23 81.56 99.32 . 15 (1, 145) 6.98 2 3 7 7.17 1 4.5 10 11.96 9 13 15 8.38 0 1 10 11.99 2 5.5 14 7.91 0 0 7 Mean value of FEV1% at last date Number of 1st 3rd Range Mean patients Quartile Median Quartile (13.03, 152.43) 1213 79.64 64.09 81.14 95.16 (0, 97) (1, 22) (0, 21) (0, 65) (0, 243) (0, 91) (33.48, 105.33) (25.91, 119.85) (79.78, 79.78) (45.26, 121.07) (62.29, 117.69) (31.72, 123.99) . 18.43 5 10.5 Range (0, 336) Range (15.09, 161.65) 37 81.13 68.37 89.43 94.91 (20.34, 124.57) 47 1 10 16 81 18 93.74 83.27 102.88 98.05 88.58 100.22 80.62 83.27 92.05 79.83 76.49 90.57 95.92 83.27 102.15 99.97 89.63 99.54 106.73 83.27 113.16 118.66 106.69 113.79 (33.28, 158.10) (72.03, 141.80) (46.50, 146.92) (14.15, 144.42) (46.83, 130.50) 184 Table 5.2. 
(continued) 2010 Reasons PEx Pulmonary comlications other than PEx GI complications Transplant related Sinus infection Non transplant surgery Other Unknown Mean relative change of FEV1% during hospitalization Mean relative change of FEV1% per day during hospitalization Number of 1st 3rd Number of 1st 3rd patients Quartile Median Quartile Mean Range patients Mean Quartile Median Quartile Range 1059 0.22 0.05 0.17 0.33 (-0.37, 1.85) 1054 0.02 0.00 0.01 0.03 (-0.04, 0.32) 22 0.14 0.02 0.08 0.18 19 1 4 5 33 0 0.05 0.04 0.15 0.01 0.03 . -0.02 0.04 -0.13 -0.02 -0.01 . 0.02 0.04 0.11 0.00 0.05 . 0.09 0.04 0.42 0.10 0.09 . (-0.10, 0.64) (-0.16, 0.44) (-0.23, 0.60) (-0.27, 0.25) (-0.76, 0.43) . 22 0.02 0.00 0.00 0.03 18 1 4 4 32 0 0.01 0.00 0.02 0.02 0.01 . 0.00 0.00 -0.01 -0.02 0.00 . 0.00 0.00 0.01 0.01 0.00 . 0.02 0.00 0.04 0.06 0.01 . (-0.005, 0.100) (-0.08, 0.07) (-0.02, 0.07) (-0.04, 0.10) (-0.03, 0.09) . 185 Table 5.3. FEV1% trajectory during hospitalization by patient in 2011 2011 Reasons PEx Pulmonary comlications other than PEx GI complications Transplant related Sinus infection Non transplant surgery Other Unknown 2011 Reasons PEx Pulmonary comlications other than PEx GI complications Transplant related Sinus infection Non transplant surgery Other Unknown Number of event Number of 1st 3rd patients Mean Quartile Median Quartile 1538 1.74 1 1 2 33 1.06 1 1 1 Mean duration of hospitalization Number of 1st 3rd Range patients Mean Quartile Median Quartile (1, 10) 1538 14.51 10 13 16 (1, 2) 33 10.12 3 11 14 Range (0, 517) (0, 29) 76 5 23 32 113 57 1.22 1 1 1 (1, 4) 76 4.64 2 2.25 6 (0, 22) (0, 21.80) 2.00 1 1 2 (1, 5) 5 8.36 1 6 13 1.09 1 1 1 (1, 2) 23 7.74 1 7 14 (0, 19) 1.06 1 1 1 (1, 2) 32 4.25 0 0 5 (0, 46) 1.28 1 1 1 (1, 18) 113 8.21 2 3 8 (0, 162) 1.05 1 1 1 (1, 3) 57 4.73 0 0 4 (0, 60) Mean value of FEV1% at first date Mean value of FEV1% at last date Number of 3rd Number of 1st 1st 3rd patients Mean Range patients Mean Range Quartile Median Quartile Quartile 
Median Quartile 1360 66.40 49.59 66.97 82.37 (12.01, 150.56) 1350 79.32 63.78 80.14 96.07 (14.33, 157.11) 25 74.71 63.93 77.91 86.49 24 2 13 11 46 0 81.43 77.46 86.31 94.52 73.82 . 65.70 66.80 61.79 83.11 53.38 . 86.59 77.46 82.50 99.20 75.60 . 98.40 88.12 95.29 103.95 90.71 . (27.36, 117.23) (31.01, 121.08) (66.80, 88.12) (54.05, 162.64) (36.17, 122.67) (21.39, 131.75) . 26 79.24 66.09 83.35 52 3 21 19 78 22 90.17 73.57 92.69 92.40 79.22 95.80 83.26 39.54 76.11 85.73 69.83 82.05 93.68 88.12 89.25 93.19 80.73 97.15 91.73 (23.35, 130.44) 103.69 93.06 114.67 103.14 96.86 107.52 (16.81, 126.00) (39.54, 93.06) (43.70, 160.77) (36.17, 117.56) (29.13, 128.79) (49.55, 135.41) 186 Table 5.3. (continued) 2011 Reasons PEx Pulmonary comlications other than PEx GI complications Transplant related Sinus infection Non transplant surgery Other Unknown Mean relative change of FEV1% during hospitalization Mean relative change of FEV1% per day during hospitalization Number of 1st 3rd Number of 1st 3rd patients Mean Range patients Mean Quartile Median Quartile Range Quartile Median Quartile 1218 0.22 0.06 0.17 0.32 (-0.67, 2.07) 1210 0.02 0.00 0.01 0.03 (-0.08, 0.30) 22 0.13 0.00 0.13 0.25 17 1 12 7 39 0 0.05 0.00 0.07 0.00 0.03 . -0.01 0.00 -0.09 -0.03 -0.05 . 0.07 0.00 -0.03 0.00 0.03 . 0.09 0.00 0.07 0.01 0.11 . (-0.13, 0.54) (-0.08, 0.22) (-0.13, 1.01) (-0.10, 0.14) (-0.26, 0.46) . 20 0.03 0.00 0.01 0.05 17 0 12 3 37 0 0.01 . 0.02 0.05 0.01 . 0.00 . -0.01 0.00 -0.01 . 0.01 . -0.01 0.00 0.00 . 0.02 . 0.01 0.14 0.02 . (-0.03, 0.12) (-0.03, 0.09) . (-0.09, 0.34) (0.00, 0.14) (-0.05, 0.12) . 187 188 Table 5.4. 
Baseline demographic characteristics N Age 6~8 yrs 9~11yrs 12~14yrs 15~17yrs >=18yrs Total Gender Male Female Total Race Caucasian Black Asian Other Total Ethnicity Hispanic Non-hispanic Total Height (cm) 6~8 yrs 9~11yrs 12~14yrs 15~18yrs >18yrs Total Weight (kg) 6~8 yrs 9~11yrs 12~14yrs 15~18yrs >18yrs Total Smoking No Yes Not known/declined to a Total Transplant status No Had transplant Accepted, on waiting list Total Pregnancy No Yes Unknown Total % 2638 892 777 543 120 4970 Mean (range) 1st Quartile Median 3rd Quartile 6.72 (6.00, 9.00) 10.45 (9.00, 12.00) 13.42 (12.00, 14.99) 16.35 (15.00, 17.98) 19.42 (18.00, 22.78) 6.09 9.73 12.67 15.66 18.43 6.21 10.39 13.43 16.24 19.13 7.23 11.22 14.13 17.04 20.12 2454 2516 4970 53.08% 17.95% 15.63% 10.93% 2.41% 100.00% 0.00% 49.38% 50.62% 100.00% 4685 198 65 22 4970 94.27% 3.98% 1.31% 0.44% 100.00% 365 4605 4970 7.34% 92.66% 100.00% 2638 892 777 543 120 4970 53.08% 17.95% 15.63% 10.93% 2.41% 100.00% 116.83 (86.02, 146.00) 136.87 (114.00, 163.00) 153.64 (130.00, 186.00) 164.84 (140.00, 190.00) 167.95 (143.93, 188.2) 111.00 131.00 147.00 158.00 161.50 116.00 136.00 154.00 165.00 168.00 121.20 142.46 159.00 171.00 173.00 2638 892 777 543 120 4970 21.98 (12.30, 50.50) 33.18 (19.90, 100.00) 45.23 (22.70, 84.56) 56.22 (30.80, 105.00) 60.33 (39.60, 110.00) 19.10 27.80 37.50 48.80 53.51 21.20 31.40 44.10 55.00 58.60 23.80 36.40 51.10 61.70 65.30 4633 30 307 4970 53.08% 17.95% 15.63% 10.93% 2.41% 100.00% 0.00% 93.22% 0.60% 6.18% 100.00% 4954 9 7 4970 99.68% 0.18% 0.14% 100.00% 4949 6 15 4970 99.58% 0.12% 0.30% 100.00% 189 Table 5.4. (continued) N Comorbidities CFRD Pancreatic insufficiency Gastrointestinal symptoms DIOS GERD Pancreatitis Pulmonary ABPA Hemoptysis % 192 4649 3.86% 93.54% 70 672 15 1.41% 13.52% 0.30% 118 0 2.37% 0.00% Mean (range) 1st Quartile Median 3rd Quartile Table 5.5. Baseline demographic characteristics 190 Table 5.6. Baseline clinical information 191 Table 5.7. 
Baseline demographic characteristics by mutation class Mutation class I/II/III (little CFTR funtion) N Age 6~8 yrs 9~11yrs 12~14yrs 15~17yrs >=18yrs Total Gender Male Female Total Race Caucasian Black Asian Other Total Ethnicity Hispanic Non-hispanic Total Height (cm) 6~8 yrs 9~11yrs 12~14yrs 15~18yrs >18yrs Total % Mean (range) Mutation class IV/V (residual CFTR function) N % Mean (range) Unsure CFTR function N % Mean (range) Chisq/ ANOVA <0.0001 2130 679 581 383 81 3854 55.27% 17.62% 15.08% 9.94% 2.10% 100.00% 1914 1940 3854 6.72 (6.00, 9.00) 10.45 (9.00, 12.00) 13.36 (12.00, 14.99) 16.35 (15.00, 17.98) 19.25 (18.00, 22.25) 98 39 30 33 11 211 46.45% 18.48% 14.22% 15.64% 5.21% 100.00% 49.66% 50.34% 100.00% 105 106 211 3690 107 47 10 3854 95.74% 2.78% 1.22% 0.26% 100.00% 218 3636 3854 5.66% 94.34% 100.00% 2130 679 581 383 81 3854 55.27% 17.62% 15.08% 9.94% 2.10% 100.00% 6.58 (6.00, 9.00) 10.43 (9.01, 11.79) 13.52 (12.27, 14.85) 16.45 (15.03, 17.94) 20.18 (18.46, 22.27) 410 174 166 127 28 905 45.30% 19.23% 18.34% 14.03% 3.09% 100.00% 49.76% 50.24% 100.00% 435 470 905 48.07% 51.93% 100.00% 199 7 3 2 211 94.31% 3.32% 1.42% 0.95% 100.00% 796 84 15 10 905 87.96% 9.28% 1.66% 1.10% 100.00% 21 190 211 9.95% 90.05% 100.00% 126 779 905 13.92% 86.08% 100.00% 98 39 30 33 11 211 46.45% 18.48% 14.22% 15.64% 5.21% 100.00% 410 174 166 127 28 905 45.30% 19.23% 18.34% 14.03% 3.09% 100.00% 6.75 (6.00. 9.00) 10.47 (9.02, 11.91) 13.57 (12.01, 14.97) 16.34 (15.00, 17.96) 19.65 (18.01, 22.78) 0.6837 <0.0001 <0.0001 116.85 (86.02, 146.00) 136.46 (114.00, 162.00) 153.29 (132.99, 181.94) 164.10 (140.00, 190.00) 168.58 (152.00, 188.20) 118.27 (105.00, 137.00) 141.09 (125.00. 158.00) 157.75 (137.00, 186.00) 170.62 (152.00, 187.00) 169.55 (158.00, 185.00) 116.38 (100.00, 141.00) 137.52 (118.00, 163.00) 154.11 (130.00, 179.00) 165.57 (147.00, 188.00) 165.52 (143.93, 185.00) 0.077 0.0019 0.0226 0.0003 0.2312 192 Table 5.7. 
(continued) Mutation class I/II/III (little CFTR funtion) N Weight (kg) 6~8 yrs 9~11yrs 12~14yrs 15~18yrs >18yrs Total Smoking No Yes Not known/ declined to answer/ missing Total Transplant status No Had transplant Accepted, on waiting list Total Pregnancy No Yes Unknown Total % 2130 679 581 383 81 3854 55.27% 17.62% 15.08% 9.94% 2.10% 100.00% 3593 23 Mean (range) 21.90 (12.80, 50.50) 32.43 (19.90, 75.50) 44.45 (22.70, 84.56) 55.33 (30.80, 100.00) 59.77 (39.60, 78.30) Mutation class IV/V (residual CFTR function) N % 98 39 30 33 11 211 46.45% 18.48% 14.22% 15.64% 5.21% 100.00% 93.23% 0.60% 200 2 238 6.18% 3854 Mean (range) 23.54 (16.00, 41.50) 39.28 (25.40, 74.70) 53.57 (34.40, 80. 40) 64.22 (45.35, 105.00) 68.53 (51.90, 110.00) Unsure CFTR function N % 410 174 166 127 28 905 45.30% 19.23% 18.34% 14.03% 3.09% 100.00% 94.79% 0.95% 840 5 92.82% 0.55% 9 4.27% 60 6.63% 100.00% 211 100.00% 905 100.00% 3840 9 99.64% 0.23% 210 0 99.53% 0.00% 904 0 99.89% 0.00% 5 0.13% 1 0.47% 1 0.11% 3854 100.00% 211 100.00% 905 100.00% 3844 4 6 3854 99.74% 0.10% 0.16% 100.00% 208 1 2 211 98.58% 0.47% 0.95% 100.00% 897 1 7 905 99.12% 0.11% 0.77% 100.00% Mean (range) 21.99 (12.30, 45.00) 34.75 (20.50, 100.00) 46.45 (26.20, 83.89) 56.80 (34.00, 102.00) 58.71 (43.50, 91.20) Chisq/ ANOVA 0.0008 <0.0001 <0.0001 <0.0001 0.0222 0.6408 0.3149 0.0033 193 Table 5.7. (continued) Mutation class I/II/III (little CFTR funtion) N Comorbidities CFRD Gastrointestinal symptoms DIOS GERD Pancreatitis Pulmonary ABPA % Mean (range) Mutation class IV/V (residual CFTR function) N % Mean (range) Unsure CFTR function N % Mean (range) Chisq/ ANOVA 156 4.05% 2 0.95% 34 3.76% 0.046 59 550 5 1.53% 14.27% 0.13% 2 21 9 0.95% 9.95% 4.27% 9 101 1 0.99% 11.16% 0.11% 0.4564 0.0145 <0.0001 85 2.21% 7 3.32% 26 2.87% 0.3239 194 Table 5.8. 
Baseline clinical information by mutation class Mutation class I/II/III % Mean (range) N FEV1% >70% 40~70% 10~40% missing Total # of Pex 2845 358 41 610 3854 73.82% 9.29% 1.06% 15.83% 100.00% 0 1 2 3 4 5+ 3014 614 149 52 14 11 3854 0 1 2 3 4 5 6 7 8+ 102.83 (70.01, 258.71) 59.12 (40.02, 69.93) 32.65 (18.50, 39.90) Mutation class IV/V % Mean (range) N 158 15 1 37 211 74.88% 7.11% 0.47% 17.54% 100.00% 78.20% 15.93% 3.87% 1.35% 0.36% 0.29% 100.00% 178 24 7 0 1 1 211 2196 943 368 192 84 45 14 4 8 3854 56.98% 24.47% 9.55% 4.98% 2.18% 1.17% 0.36% 0.10% 0.21% 100.00% 60 26 32 1.56% 0.67% 0.83% 107.14 (70.65, 187.82) 59.71 (45.85 69.46) 36.20 Unsure N % 653 90 13 149 905 72.15% 9.94% 1.44% 16.46% 100.00% 84.36% 11.37% 3.32% 0.00% 0.47% 0.47% 100.00% 702 139 43 14 4 3 905 77.57% 15.36% 4.75% 1.55% 0.44% 0.33% 100.00% 133 35 24 10 3 5 0 1 0 211 63.03% 16.59% 11.37% 4.74% 1.42% 2.37% 0.00% 0.47% 0.00% 100.00% 531 213 90 35 13 14 5 1 3 905 58.67% 23.54% 9.94% 3.87% 1.44% 1.55% 0.55% 0.11% 0.33% 100.00% 3 1 2 1.42% 0.47% 0.95% 15 6 7 1.66% 0.66% 0.77% Chisq/ ANOVA 0.7081 101.92 (70.07, 189.05) 0.0095 58.78 (40.16, 69.89) 0.901 32.34 (22.51, 39.21) 0.8609 Mean (range) 0.5288 Total # of Pex (loose) 0.2449 Total Drug resistance Aminoglycoside Beta-lactum Quinolone 0.8494 0.9358 0.8494 195 Table 5.9. 
Baseline demographic characteristics by initial treatment classes, ML, AI, IA No initial tx N Age 6~8 yrs 9~11yrs 12~14yrs 15~17yrs >=18yrs Total Gender Male Female Total Race Caucasian Black Asian Other Total Ethnicity Hispanic Non-hispanic Total Height (cm) 6~8 yrs 9~11yrs 12~14yrs 15~18yrs >18yrs Total Weight (kg) 6~8 yrs 9~11yrs 12~14yrs 15~18yrs >18yrs Total % Mean (range) One class N % Mean (range) Two classes N % Mean (range) Three classes N % Mean (range) Chisq/ ANOVA <0.0001 934 460 389 298 25 2106 44.35% 21.84% 18.47% 14.15% 1.19% 100.00% 993 235 205 109 39 1581 62.81% 14.86% 12.97% 6.89% 2.47% 100.00% 556 154 137 88 42 977 56.91% 15.76% 14.02% 9.01% 4.30% 100.00% 155 43 46 48 14 306 50.65% 14.05% 15.03% 15.69% 4.58% 100.00% 1003 1103 2106 47.63% 52.37% 100.00% 782 799 1581 49.46% 50.54% 100.00% 514 463 977 52.61% 47.39% 100.00% 155 151 306 50.65% 49.35% 100.00% 1991 84 26 5 2106 94.54% 3.99% 1.23% 0.24% 100.00% 1483 72 20 6 1581 93.80% 4.55% 1.27% 0.38% 100.00% 929 30 10 8 977 95.09% 3.07% 1.02% 0.82% 100.00% 282 12 9 3 306 92.16% 3.92% 2.94% 0.98% 100.00% 114 1992 2106 5.41% 94.59% 100.00% 104 1477 1581 6.58% 93.42% 100.00% 104 873 977 10.64% 89.36% 100.00% 43 263 306 14.05% 85.95% 100.00% 934 460 389 298 25 2106 44.35% 21.84% 18.47% 14.15% 1.19% 100.00% 118.62 (86.02, 142.00) 137.04 (114.00, 163.00) 154.46 (133.00, 186.00) 164.67 (140.00, 190.00) 171.49 (157.00, 186.00) 993 235 205 109 39 1581 62.81% 14.86% 12.97% 6.89% 2.47% 100.00% 116.02 (91.10, 146.00) 137.05 (118.00, 162.00) 153.22 (132.99, 176.00) 166.69 (147.00, 186.00) 167.44 (149.00, 188.20) 556 154 137 88 42 977 56.91% 15.76% 14.02% 9.01% 4.30% 100.00% 115.56 (100.00, 138.00) 136.52 (118.00, 162.00) 152.52 (130.00, 178.00) 164.44 (145.00, 188.00) 167.16 (143.93, 188.00) 155 43 46 48 14 306 50.65% 14.05% 15.03% 15.69% 4.58% 100.00% 115.73 (100.00, 143.00) 135.40 (115.00, 157.00) 151.94 (133.00, 169.00) 162.47 (145.00, 189.00) 165.43 (152.00, 177.00) <0.0001 0.5991 0.0588 0.0473 0.1243 934 460 
389 298 25 2106 44.35% 21.84% 18.47% 14.15% 1.19% 100.00% 22.79 (13.32, 48.70) 33.53 (20.50, 100.00) 46.17 (25.50, 84.56) 56.36 (30.80, 105.00) 62.93 (43.50, 91.20) 993 235 205 109 39 1581 62.81% 14.86% 12.97% 6.89% 2.47% 100.00% 21.64 (12.80, 50.50) 32.98 (21.10, 63.70) 44.40 (26.23, 83.89) 58.33 (38.10, 97.10) 61.48 (42.80, 110.00) 556 154 137 88 42 977 56.91% 15.76% 14.02% 9.01% 4.30% 100.00% 21.29 (13.40, 40.40) 32.87 (21.70, 60.20) 44.40 (27.00, 74.90) 54.70 (34.00, 102.00) 60.16 (44.81, 87.50) 155 43 46 48 14 306 50.65% 14.05% 15.03% 15.69% 4.58% 100.00% 21.65 (12.30, 36.10) 31.62 (19.90, 52.10) 43.50 (22.70, 66.70) 53.29 (36.64, 100.00) 52.98 (39.60, 69.00) <0.0001 0.4294 0.0652 0.0242 0.0305 0.076 0.0372 <0.0001 196 Table 5.9. (continued) No initial tx N Smoking No Yes Not known/ declined to answer/ missing Total Transplant status No Had transplant Accepted, on waiting list Total Pregnancy No Yes Unknown Total Comorbidities CFRD Pancreatic insufficiency Gastrointestinal symptoms DIOS GERD Pancreatitis Pulmonary ABPA % Mean (range) One class N % Mean (range) Two classes N % Mean (range) Three classes N % Mean (range) Chisq/ ANOVA 0.2035 1967 19 93.40% 0.90% 1479 6 93.55% 0.38% 905 3 92.63% 0.31% 282 2 92.16% 0.65% 120 5.70% 96 6.07% 69 7.06% 22 7.19% 2106 100.00% 1581 100.00% 977 100.00% 306 100.00% 2098 5 99.62% 0.24% 1578 3 99.81% 0.19% 974 1 99.69% 0.10% 304 0 99.35% 0.00% 0.1711 3 0.14% 0 0.00% 2 0.20% 2 0.65% 2106 100.00% 1581 100.00% 977 100.00% 306 100.00% 2102 3 1 2106 99.81% 0.14% 0.05% 100.00% 1572 2 7 1581 99.43% 0.13% 0.44% 100.00% 970 1 6 977 99.28% 0.10% 0.61% 100.00% 305 0 1 306 99.67% 0.00% 0.33% 100.00% 0.0613 89 4.23% 41 2.59% 37 3.79% 25 8.17% <0.0001 1958 92.97% 1491 94.31% 914 93.55% 286 93.46% 0.4459 30 189 9 1.42% 8.97% 0.43% 21 258 5 1.33% 16.32% 0.32% 16 168 1 1.64% 17.20% 0.10% 3 57 0 0.98% 18.63% 0.00% 0.8695 <0.0001 0.4681 45 2.14% 30 1.90% 29 2.97% 14 4.58% 0.019 197 Table 5.10. 
Baseline clinical information by initial treatment classes, ML, AI, IA N FEV1% >70% 40~70% 10~40% missing Total # of Pex % No initial tx Mean (range) 1606 195 19 286 2106 76.26% 9.26% 0.90% 13.58% 100.00% 0 1 2 3 4 5+ 1701 286 82 24 9 4 2106 0 1 2 3 4 5 6 7 8+ 100.83 (70.04, 181.51) 59.15 (40.02, 69.89) 31.91 (18.58, 39.21) N % One class Mean (range) 1207 100 6 268 1581 76.34% 6.33% 0.38% 16.95% 100.00% 80.77% 13.58% 3.89% 1.14% 0.43% 0.19% 100.00% 1307 220 38 13 2 1 1581 1452 441 139 49 15 6 3 1 0 2106 68.95% 20.94% 6.60% 2.33% 0.71% 0.28% 0.14% 0.05% 0.00% 100.00% 22 10 9 1.04% 0.47% 0.43% 105.83 (70.21, 195.42) 61.20 (40.32, 69.91) 34.87 (26.67, 39.72) N % Two classes Mean (range) 652 124 18 183 977 66.73% 12.69% 1.84% 18.73% 100.00% 82.67% 13.92% 2.40% 0.82% 0.13% 0.06% 100.00% 702 193 55 18 6 3 977 830 425 175 78 34 28 7 1 3 1581 52.50% 26.88% 11.07% 4.93% 2.15% 1.77% 0.44% 0.06% 0.19% 100.00% 20 8 15 1.27% 0.51% 0.95% 102.76 (70.01, 258.71) 58.61 (40.37, 69.93) 34.23 (21.47, 39.90) Three classes % Mean (range) N 191 44 12 59 306 62.42% 14.38% 3.92% 19.28% 100.00% 71.85% 19.75% 5.63% 1.84% 0.61% 0.31% 100.00% 184 78 24 11 2 7 306 60.13% 25.49% 7.84% 3.59% 0.65% 2.29% 100.00% 456 242 134 75 35 20 8 2 5 977 46.67% 24.77% 13.72% 7.68% 3.58% 2.05% 0.82% 0.20% 0.51% 100.00% 122 83 34 35 16 10 1 2 3 306 39.87% 27.12% 11.11% 11.44% 5.23% 3.27% 0.33% 0.65% 0.98% 100.00% 21 10 9 2.15% 1.02% 0.92% 15 5 8 4.90% 1.63% 2.61% Chisq/ ANOVA <0.0001 101.33 (70.07, 158.98) <0.0001 55.22 (40.11, 69.36) 0.001 30.32 (18.50, 39.89) 0.3472 <0.0001 Total # of Pex (loose) <0.0001 Total Drug resistance Aminoglycoside Beta-lactum Quinolone <0.0001 0.0134 0.0001 198 Table 5.11. 
Prevalence and incidence of each reason for death by calendar year

Each cell shows prevalence / incidence.

Reason for death                                 2006         2007         2008         2009         2010         2011
Respiratory/cardiorespiratory                    7.62 / 7.64  8.37 / 8.40  8.38 / 8.41  9.02 / 9.06  8.14 / 8.17  8.42 / 8.46
Liver disease/liver failure                      0.33 / 0.33  0.27 / 0.27  0.29 / 0.29  0.29 / 0.29  0.31 / 0.31  0.34 / 0.34
Trauma                                           0.15 / 0.15  0.06 / 0.06  0.09 / 0.09  0.14 / 0.14  0.06 / 0.06  0.14 / 0.14
Suicide                                          0.09 / 0.09  0.09 / 0.09  0.00 / 0.00  0.11 / 0.11  0.06 / 0.06  0.06 / 0.06
Transplant related: Bronchiolitis obliterans     0.00 / 0.00  0.00 / 0.00  0.00 / 0.00  0.00 / 0.00  0.40 / 0.40  0.54 / 0.54
Transplant related: Other                        1.40 / 1.40  1.50 / 1.50  1.66 / 1.66  1.64 / 1.64  1.11 / 1.11  1.42 / 1.42
Other                                            0.93 / 0.93  0.97 / 0.97  1.13 / 1.14  1.06 / 1.06  0.71 / 0.71  0.85 / 0.85
Unknown                                          0.51 / 0.51  0.47 / 0.47  0.96 / 0.96  0.78 / 0.78  1.05 / 1.05  0.82 / 0.82

Figure 5.2. Flow chart

Figure 5.3. The proportion of patients who were on different treatment combinations at the baseline.

Table 5.12. The proportion of patients who were on specific treatment combinations at the baseline.

Figure 5.4. Proportion of switching to other potential treatment combinations among all first-time switches.

Figure 5.5. Length of use of the current treatment, conditional on the potential treatment combination that could be switched to, among all first-time switches.

Figure 5.6. Proportion of switching to other potential treatment combinations among all switches.

Figure 5.7. Length of use of the current treatment, conditional on the potential treatment combination that could be switched to, among all switches.

CHAPTER 6

PREDICTIVE MODEL

6.1 Results

To fulfill Aim 2, a complex strategy of missing value imputation was conducted, which is explained in detail in Appendix F. This section describes the results obtained from the 10 imputed datasets in three parts. The first part describes how each independent variable was identified.
Then, the result of variable selection using elastic net is described in four procedures: 1) identifying the optimal balance factor, $\alpha$ (by examining how often each $\alpha$ was chosen across the 10 imputed datasets according to the minimum of the mean cross-validated error); 2) identifying the optimal penalty factor, $\lambda$ (by examining the minimum standard deviation of $\lambda$ given $\alpha$ across the 10 imputed datasets and the probability that $\alpha$ had been chosen in step 1); 3) selecting the variables in the model (by examining the proportion of the 10 imputed datasets in which a variable was selected, given the $\alpha$ and $\lambda$ chosen in the previous steps); 4) calculating the coefficient for each variable (by combining the related coefficients identified across the 10 imputed datasets). After the overall coefficient for each variable was calculated across the 10 imputed datasets, the predicted probability and the relative change of the predicted probability of having a rational treatment change were calculated for each visit. Last, given the predicted probability and the relative change of the predicted probability of having a rational treatment change, together with different thresholds, 25 timing strategies for treatment change were created.

As mentioned previously, bronchodilator (BD) use was not considered as a treatment class. In addition, clinical experience suggests that the neutral definition, which assumed that the termination of any treatment class had to match changes in clinical signals (a sign that the patient's health is improving), has limited generalizability. Therefore, three treatment classes, inhaled antibiotics (IA), mucolytics (ML), and anti-inflammatories (AI), were taken into consideration to define the rational treatment change under either the loose or the strict assumption.
Compared to the loose assumption, in which all treatment changes were treated as rational treatment changes regardless of the changes in clinical signals, the strict definition is more rigorous in that all rational treatment changes had to comply with changes in the clinical signals. Because of this difference in identifying a rational treatment change, all subsequent results, such as the predicted probability of having a rational treatment change and the optimal treatment change strategies, differ between the two definitions. This can be viewed as fitting two models with the same procedures. To simplify the presentation, the following sections focus on the results of applying the strict definition. The results of applying the loose definition are also described, with explanation.

6.1.1 Independent Variable Identification

Independent variables were identified as all the variables that were investigated in the related literature. At the same time, all the unique variables that were recorded in the CFFPR were taken into consideration. Since no variable had more than 50% missing values for the majority of patients, all variables in the CFFPR were considered for variable selection. A cubic spline of the time of visit with three knots was also created for each visit.

6.1.2 Variable Selection by Elastic Net

Following the procedures described in the methods section, an elastic net was fitted to find the model that balanced the accuracy of prediction against the parsimony of the variables. First, the optimal $\alpha$ was identified. Table 6.1 presents the minimum of the mean cross-validated error, using deviance as the measurement, in each imputed dataset for different values of $\alpha$. The minimum deviance, regardless of the value of $\alpha$, was identified in each imputed dataset and highlighted in yellow.
In Table 6.1, other than two imputed datasets where the minimum deviance could be reached when $\alpha$ equaled 0.9 or 1, the minimum deviance is reached only when $\alpha$ is equal to 1. The difference in minimum deviance between imputed datasets was small. According to this result, the optimal $\alpha$ is equal to either 0.9 or 1. Unlike Table 6.1, which uses the strict definition to capture the rational treatment change, Table 6.2 captures the rational treatment change using the loose definition.

To identify a unique optimal $\alpha$ and prevent overfitting, rather than using $\lambda^{*}_{i}$, which was associated with the minimum of the mean cross-validated error, $\lambda'_{i}$, which gave the most regularized model such that the error was within one standard error of the minimum of the mean cross-validated error given $\alpha$, was identified in each imputed dataset. The related $\lambda'_{i}$ is highlighted in yellow in Table 6.3 for each imputed dataset and each candidate $\alpha$. The $\alpha$ associated with the minimum SD of $\lambda'_{i}$ should be the optimal one. Compared with $\alpha = 0.9$, $\alpha = 1$ was associated with a smaller SD (0.000100 vs. 0.000104) of $\lambda'_{i}$ across the imputed datasets. This indicated less variance of $\hat{\lambda}$ and less chance of overfitting among imputed datasets. Even though the SD of $\lambda'_{i}$ across the imputed datasets was smaller still (0.000080) when $\alpha = 0.8$, the result for 0.8 was ignored because the optimal $\alpha$ could only equal 0.9 or 1, as identified previously. Therefore, the optimal $\alpha$ was 1, and the related optimal $\hat{\lambda}$ was 0.002009, the median of $\lambda'_{i}$ when $\alpha = 1$.

Unlike under the strict definition, the optimal $\alpha$ was identified in a more straightforward manner under the loose definition. As shown in Table 6.2, using the loose definition, $\alpha = 1$ is always associated with the minimum deviance in all imputed datasets. Even though the minimum SD of $\lambda'_{i}$ was not reached when $\alpha = 1$, the difference was trivial (4.336809E-19 vs. 0) in Table 6.4.
Therefore, the optimal $\alpha$ was 1, and the related optimal $\hat{\lambda}$ was 0.002223, the $\lambda'_{i}$ in the first imputed dataset when $\alpha = 1$.

Table 6.5 presents the choices of optimal $\alpha$ and $\hat{\lambda}$ combinations for different outcomes. The only difference between the left and right columns is whether the number of PEx enters the prediction model as a categorical variable or as a continuous variable. The probability indicates the chance of identifying the minimum deviance by using the related $\alpha$ among imputed datasets. The source indicates where the optimal $\hat{\lambda}$ came from: either imputed dataset 1 or the median of $\lambda'_{i}$. The optimal $\hat{\lambda}$ was taken from imputed dataset 1 only if the related SD of $\lambda'_{i}$ was close to 0; otherwise the median of $\lambda'_{i}$ was applied. Regardless of the outcome, the optimal $\alpha$ was more likely to be 1. Because of this, in the following analyses, the optimal $\alpha$ was always set to 1. The related optimal $\hat{\lambda}$ was identified according to the outcome in each prediction model and is highlighted in yellow in Table 6.5.

After the optimal combination of $\alpha$ and $\hat{\lambda}$ was identified, the proportion of the 10 imputed datasets in which each variable was selected in each model was reported in Table 6.6. For continuous variables, the proportion of being selected was reported directly; for categorical variables, the highest proportion among their categories was reported in Table 6.7. The influence of including PEx defined loosely ('PExloose') was investigated. The left side of Tables 6.6 and 6.7 does not include PExloose in the model, while the right side does. As in Table 6.5, the number of PEx in the past year was entered as either a categorical or a continuous variable in different models to investigate its influence on variable selection. The influence of the different outcome definitions on variable selection was also investigated.
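The two bookkeeping rules described above (taking the penalty from imputed dataset 1 when the SD of the per-dataset penalties is essentially zero and otherwise taking their median, and reporting the proportion of imputed datasets in which a variable was selected) can be illustrated with a minimal Python sketch. The function names, the tolerance `sd_tol`, and the example values are assumptions made for illustration only; they are not taken from the dissertation's own code, and the dissertation does not state a numeric threshold for "close to 0".

```python
from statistics import median, pstdev


def choose_lambda(lambda_1se, sd_tol=1e-12):
    """Pick one penalty from the per-dataset one-standard-error lambdas.

    If the lambdas are (numerically) identical across imputed datasets,
    use the value from the first dataset; otherwise use the median.
    sd_tol is a hypothetical tolerance for "SD close to 0".
    """
    if pstdev(lambda_1se) <= sd_tol:
        return lambda_1se[0]
    return median(lambda_1se)


def selection_proportion(coefs_by_dataset):
    """Share of imputed datasets in which each variable has a non-zero coefficient.

    coefs_by_dataset: list of dicts {variable name: coefficient},
    one dict per imputed dataset. A missing key counts as not selected.
    """
    m = len(coefs_by_dataset)
    names = sorted({name for coefs in coefs_by_dataset for name in coefs})
    return {
        name: sum(1 for coefs in coefs_by_dataset if coefs.get(name, 0.0) != 0.0) / m
        for name in names
    }


# Illustrative (made-up) inputs, not values from the registry analysis.
same_everywhere = choose_lambda([0.002223] * 10)
varying = choose_lambda([0.0019, 0.0020, 0.0021])
props = selection_proportion([
    {"age": -0.03, "fev1_prev": 0.0},
    {"age": -0.02, "fev1_prev": 0.01},
])
```

Under the rule stated in the text, any variable whose proportion is larger than 0 would then be carried into the prediction model.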
The influence of including PExloose, and the question of whether the number of PEx in the past year should be treated as a categorical variable, are analyzed in the following two paragraphs.

Red font indicates that the proportion of imputed datasets in which a variable was selected differed between the models that did and did not include PExloose. However, this difference does not affect the result of the variable selection as long as the proportion is larger than 0, since all variables with a proportion larger than 0 were selected into the prediction model. For example, under the loose definition of having a rational treatment change, if the number of PEx was treated as a continuous variable, the chance of selecting predicted FEV1 at the previous visit as a predictor decreased from 0.8 in the model that did not include PExloose to 0.6 in the model that did. In other words, predicted FEV1 at the previous visit was selected as a predictor in 6 imputed datasets if PExloose was considered and the outcome was defined loosely. Other than predicted FEV1 at the previous visit, whether the patient was infected by Aspergillus was the only variable associated with a lower chance of being selected after PExloose was considered. Conversely, whether the patient had pancreatic insufficiency, pancreatitis, nonmucoid PaPI, or drug resistance to beta-lactams at the previous visit were variables that had a higher chance of being selected after the number of PExloose was taken into consideration. However, none of the above changes in proportion affected the variable selection in the related models. Only one variable affected the pattern of selected variables after PExloose was considered. It is marked in blue in Table 6.6.
If the rational treatment change was defined loosely or neutrally, then after PExloose was considered and the numbers of PEx and PExloose were treated as continuous variables, the chance of selecting the variable that indicated drug resistance to quinolones at the current visit decreased from 0.5 to 0. Given the similarity between the number of PEx and the number of PExloose, together with the limited difference in the pattern of selected variables, including the number of PExloose made little difference. Therefore, the final model did not include the number of PExloose.

Compared to the model that treated the number of PEx as a continuous variable, the pattern of selected variables differed for three variables when the number of PEx was treated as a categorical variable: whether the patient had hemoptysis, whether the patient was infected by Aspergillus, and the number of PEx in the past year at the previous visit. They are marked in orange in Table 6.6. These variables could not be selected as predictors if the number of PEx was treated as a continuous variable.

Table 6.7 presents more details on several categorical variables. Mutation 1 class was never selected in any model. However, mutation 2 class was selected in the majority of categories, even for the category that 'doesn't belong to any class.' Three and four values were selected for the number of PEx in the past year at the previous visit and at the current visit, respectively. Those values were 4, 5, and 9 for the previous visit, and 1, 2, 3, and 5 for the current visit, which supported the conclusion that the number of PEx barely has any influence beyond 5. At the same time, the variable selection, and more specifically the selection of categories within the same variable, was consistent among the models regardless of whether the number of PExloose was included.
Because a ceiling effect was reached when the number of PEx reached 5, and because the number of PEx in the past year at the previous visit, a clinical signal, would not be included if the number of PEx was treated as a continuous variable, the number of PEx in the past year should be treated as a categorical variable with its maximum set to 5.

Given $\alpha$ and $\hat{\lambda}$, together with $S$, the set of all variables selected by the elastic net, a generalized linear model with a log link function was applied to predict the probability of having a rational treatment change in each imputed dataset $i$. Following Rubin's rule, the coefficients for the same variable, $\hat{\beta}_{s,i}$, across all imputed datasets were combined as $\hat{\beta}_{s}$. Therefore, a single combined coefficient was obtained for each variable, rather than 10 coefficients for the same variable, one per imputed dataset. Tables 6.8 and 6.9 present the combined coefficients, $\hat{\beta}_{s}$, under the strict and loose definitions, respectively. Each table includes four parts: coefficient, standard error, 95% CI, and the proportion of the total variance that is attributable to the missingness ('percentage of missing').

According to the results shown in Table 6.8, for each additional year of age, the chance of receiving a rational treatment change decreased by 0.0302. For each additional 1% of predicted FEV1 at the current visit, the probability of having a rational treatment change decreased by 0.0285. However, if the 1% increase occurred at the previous visit, the probability increased by 0.0170. Compared to class I on mutation 2, a patient was more likely to receive a rational treatment change if the class was II or III, but the influence was not statistically significant. Conversely, relative to class I, a patient who had class IV or V on mutation 2 had only 70.35% and 62.43% chance of having a rational treatment change, respectively.
Patients identified as Asian were more likely (1.0901) to receive a rational treatment change than those identified as Caucasian, but the difference was not statistically significant. Compared to patients identified as Caucasian, patients identified as Black were 16.52% less likely to receive a rational treatment change. Patients who were infected by Aspergillus or B. cepacia were more likely to receive a rational treatment change, with 13.05% and 105.41% increases, respectively. However, a patient who was infected by MSSA had an 18.88% lower chance of being appropriately treated. If the patient was diagnosed with nonmucoid PaPI or mucoid PaPI, the probability of receiving a rational treatment change increased by 36.30% and 70.78%, respectively. The more PEx a patient had in the past year at the previous visit, the less likely the patient was to receive a rational treatment change. However, if the number increased at the current visit, the likelihood increased. A patient with drug resistance to aminoglycosides or quinolones at the current visit had an increased chance of receiving a rational treatment change, but the chance decreased if drug resistance to beta-lactams occurred at the previous visit. Generally speaking, the more treatments a patient received at the current visit, the less likely the patient was to receive a rational treatment change, especially for mucolytics and inhaled antibiotics. There was limited difference in the chance of having a rational treatment change between patients who received 1 or 2 treatments in the anti-inflammatories class or the bronchodilators class. Finally, the proportion of the total variance attributable to the missingness was low, ranging from 0 to 6% in this model.

Compared to the strict model, the loose model showed several similarities and differences.
Several variables in Table 6.9 influence the probability of having a rational treatment change in the same way as the corresponding variables in Table 6.8, such as age; predicted FEV1 at the current visit; mutation 2 when the class is IV or V; being Black; whether the patient was infected by Aspergillus, B. cepacia, or MSSA; whether the patient was diagnosed with mucoid PaPI; the number of PEx in the past year as recorded at the previous and current visits; and drug resistance to quinolones at the current visit. However, there were several differences between the two models. First, several variables were selected in only one of the two models, especially in the comorbidities and infections category. For example, whether the patient had pancreatic insufficiency, had pancreatitis, or was diagnosed with nonmucoid PaPI was included only in the model that used the strict definition of the outcome; whether the patient had hemoptysis was included only in the model that defined the outcome loosely. Moreover, several variables had statistically significant effects in the strict model but not in the loose model: predicted FEV1 at the previous visit; the number of PEx in the past year at the previous visit when it was greater than 2; drug resistance to aminoglycosides at the current visit; and the use of mucolytics, inhaled antibiotics, anti-inflammatories, and bronchodilators. Similar to the results in Table 6.8, the proportion of the total variance attributable to the missingness was also low, ranging from 0 to 5%.
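The pooling behind the four columns of Tables 6.8 and 6.9 (coefficient, standard error, 95% CI, and proportion of total variance attributable to missingness) can be sketched with the standard form of Rubin's rules. This is a minimal illustration under stated assumptions, not the dissertation's own code: the function name `pool_rubin` and the input values are hypothetical, and the interval uses a normal approximation (z = 1.96) in place of the t reference distribution with adjusted degrees of freedom that Rubin's rules normally prescribe.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors, z=1.96):
    """Combine one coefficient across m imputed datasets with Rubin's rules.

    estimates:  per-dataset coefficient estimates for the same variable
    std_errors: per-dataset standard errors for that coefficient
    z:          normal quantile for the interval (a simplifying assumption)
    """
    m = len(estimates)
    pooled = mean(estimates)                        # pooled coefficient
    within = mean([se ** 2 for se in std_errors])   # average within-imputation variance
    between = variance(estimates) if m > 1 else 0.0  # between-imputation variance
    total = within + (1 + 1 / m) * between          # total variance
    se = sqrt(total)
    # Share of the total variance that comes from between-imputation
    # variability, i.e. from the missing data.
    frac_missing = (1 + 1 / m) * between / total if total > 0 else 0.0
    return {
        "coef": pooled,
        "se": se,
        "ci": (pooled - z * se, pooled + z * se),
        "frac_missing": frac_missing,
    }


# Illustrative (made-up) inputs for a single variable in three imputed datasets.
result = pool_rubin([0.10, 0.20, 0.30], [0.05, 0.05, 0.05])
stable = pool_rubin([1.0, 1.0, 1.0], [0.1, 0.1, 0.1])
```

When the per-dataset estimates barely differ, as in the `stable` example, the between-imputation term vanishes and the missingness share is 0, which is the pattern behind the low percentages reported for both models.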
6.1.3 Calculating the Predicted Probability of Having Rational Treatment Change and Identifying Strategies for Treatment Change According to Different Thresholds

In order to closely mimic the strategy of having a rational treatment change, the predicted probability of having a rational treatment change, $\hat{p}_{tt,i}$, and the relative change of the predicted probability of having a rational treatment change between the current and previous visit, $rr_{tt,i}$, were calculated for all visits in each imputed dataset $i$. The points nearest the upper-left corner of the ROC curve, $p^{*}$ and $p^{**}$, were chosen as the cut-offs for $\hat{p}_{tt,i}$ and $rr_{tt,i}$, respectively, in all imputed datasets. After 1,000 nonparametric bootstrap replicates, confidence intervals were generated for the cut-offs of $\hat{p}_{tt,i}$ and $rr_{tt,i}$, respectively. Figures 6.1 and 6.2 present those results in one imputed dataset (imputed1) using the strictly defined outcome and the loosely defined outcome, respectively. In the strict model, the point estimates and 95% CIs of the cut-offs for $\hat{p}_{tt,i}$ and $rr_{tt,i}$ were 0.080 (0.076, 0.084) and 1.831% (0.222%, 3.440%). In the loose model, the values changed to 0.090 (0.087, 0.093) and 0.475% (-0.124%, 1.074%), respectively. Compared to that for $\hat{p}_{tt,i}$, the bootstrap distribution for $rr_{tt,i}$ over the 1,000 replicates was more normally distributed, which was supported by the normally distributed histogram and the location of the majority of the dots on the line in the quantile-quantile plot in Figure 6.1. Even though the distribution for $\hat{p}_{tt,i}$ was skewed to the left, according to the quantile-quantile plot this distribution is still acceptable. However, more potential cut-offs need to be investigated. Similar trends are shown in Figure 6.2.

The quintiles of the 95% CI of $p^{**}$ were used to generate cut-offs of $p^{**}$, represented by $p^{**}_{n}$ (n = 1, 2, …, 5).
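The cut-off selection and its bootstrap interval can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the procedure actually run in the study: the cut-off is taken as the threshold whose ROC point lies nearest to (false positive rate 0, true positive rate 1), whole patients are resampled rather than single visits, and a percentile interval is used. The function names and the layout of `visits` are hypothetical.

```python
import random
from math import hypot


def closest_to_corner_cutoff(probs, labels):
    """Return the threshold whose ROC point is nearest to (FPR=0, TPR=1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_dist = None, float("inf")
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 0)
        dist = hypot(fp / neg, 1 - tp / pos)  # distance to the upper-left corner
        if dist < best_dist:
            best_cut, best_dist = cut, dist
    return best_cut


def bootstrap_cutoff_ci(visits, n_boot=1000, seed=0):
    """Percentile 95% CI of the cut-off, resampling patients with replacement.

    visits: dict mapping a patient id to a list of
            (predicted probability, observed change as 0/1) pairs.
    """
    rng = random.Random(seed)
    ids = list(visits)
    cuts = []
    for _ in range(n_boot):
        sample = [rng.choice(ids) for _ in ids]  # resample whole patients
        rows = [row for pid in sample for row in visits[pid]]
        probs = [p for p, _ in rows]
        labels = [y for _, y in rows]
        if 0 < sum(labels) < len(labels):  # both outcome classes are needed
            cuts.append(closest_to_corner_cutoff(probs, labels))
    cuts.sort()
    lower = cuts[int(0.025 * (len(cuts) - 1))]
    upper = cuts[int(0.975 * (len(cuts) - 1))]
    return lower, upper
```

Resampling by patient keeps all visits of one patient together in each replicate, so the interval reflects between-patient variability rather than treating correlated visits as independent.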
In order to have a larger range of $p^{*}_{m}$ (m = 1, 2, …, 5), the distance between the lower boundary of the 95% CI of $p^{*}$ and $p^{*}$ itself was applied to calculate the $p^{*}_{m}$, and $p^{*}$ was set as $p^{*}_{3}$. Therefore, there were five cut-offs for both $p^{*}$ and $p^{**}$, which were 0.072, 0.076, 0.080, 0.084, 0.088 and 0.222%, 1.0265%, 1.831%, 2.6355%, 3.440%, respectively.

In the beginning, 5 dynamic treatment strategies were created relying only on the 5 cut-offs of $p^{*}$. However, as the results in Figure 6.3 show, using $p^{*}$ alone failed to generate an appropriate strategy. Figure 6.3 presents the distribution of the predicted probability of having a rational treatment change according to the treatment change that was actually given. The red and blue histograms represent patients who did and did not receive a rational treatment change, respectively. About 50% of the entire area overlaps. In other words, it was hard to clearly differentiate visits with a rational treatment change from visits without one, unless 0.3 was used as the cut-off. However, if 0.3 were chosen as the cut-off for the dynamic treatment strategy, the misclassification error would be very high, since about 90% of visits would be identified as not having a rational treatment change, and around half of these conflict with reality.

After examining the data and consulting clinical experts, I included the relative change of the predicted probability of having a rational treatment change between the previous and current visits, and added a grace period (0.01) for the predicted probability. From the clinical perspective, the rationale for including the relative change was twofold: 1) there was a delay between being sick and having a visit, which delayed the occurrence of a rational treatment change; 2) a treatment change could prevent potential disease deterioration after the current visit, even when the current disease severity is controlled by the treatments.
For the first scenario, some patients do not want to take any new medications when they are sick, but a couple of weeks later they give in, finally have a visit, and receive treatment. Patients in the second scenario are worried about their future health status, since it has deteriorated dramatically compared with the health status at the previous visit, even though the current health status is controlled by the treatments. Therefore, those patients may discuss the matter with their healthcare providers and have a rational treatment change. At the same time, the grace period for the predicted probability gives the strategy some flexibility when, given current knowledge, the evidence on whether a treatment change is needed is uncertain. Overall, compared with the strategy that did not take $p^{**}$ into consideration, including the threshold $p^{**}$ changed the strategy in two scenarios: 1) a patient does not necessarily have a rational treatment change if the predicted probability was higher than the threshold $p^{*}$ but the relative change was lower than the threshold $p^{**}$; 2) either having or not having a rational treatment change is acceptable if the predicted probability was lower than the threshold $p^{*}$ but the relative change was higher than the threshold $p^{**}$.

From the data perspective, the above assumptions appear to be met. Figures 6.4 and 6.5 support the decision to include both $p^{*}$ and $p^{**}$ in generating the dynamic treatment strategy. Figure 6.4 presents the proportion of patients following each of the different treatment change strategies over time. Three strategies were investigated. At the 1st visit, around 80% of patients still followed the related strategy. The proportion decreased to around 20% at the 5th visit, and to about 0% at the 23rd visit. Compared to the proportion in Figure 6.4, after including $p^{**}$ in the dynamic treatment strategy, the proportion increased substantially at all visits in Figure 6.5. The proportion increased to about 100% and 60% at the 1st and 5th visits, respectively.
Even though the proportion was around 3% by the 23rd visit, it was still high, around 10%, at the 22nd visit. Considering the clinical meaning, together with the large improvement in the proportion of patients who were able to follow the strategy, $p^{**}$ should be included in the determination of a dynamic treatment strategy.

Overall, there were 25 dynamic treatment strategies that consisted of different combinations of $p^{*}$ and $p^{**}$. Take $p^{*}$ = 0.080 and $p^{**}$ = 1.831% as an example of how the dynamic treatment strategy works. If, in the current visit, a patient has a predicted probability of a rational treatment change less than 0.080 and the relative change of the predicted probability between the previous and current visit is smaller than 1.831%, then according to the dynamic treatment strategy, counterfactually, the patient should not receive any rational treatment change. If the patient did receive a treatment change in the real world, the patient record was treated as artificially censored because of the failure to follow the dynamic treatment strategy. Similarly, if the patient's predicted probability of having a rational treatment change is greater than 0.090 (0.080 + 0.010) and the relative change of the predicted probability between the previous and current visit is greater than 1.831%, then the patient should receive a rational treatment change. If the patient did not, then the record for the visit was artificially censored. For all other scenarios, artificial censoring was not considered. The rationale behind this method is that a patient should have a rational treatment change if the health status has worsened, and should not receive a rational treatment change if the situation has improved. For uncertain health status, whether improving or deteriorating, either having or not having a rational treatment change is acceptable.
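The decision rule described above can be written compactly. The sketch below is illustrative only and assumes the strict-model cut-offs and the 0.01 grace period quoted in the text; the function names, the string labels, and the representation of the relative change as a percentage are hypothetical choices, not the author's implementation.

```python
from itertools import product

# Cut-offs quoted in the text for the strict model.
P_CUTS = [0.072, 0.076, 0.080, 0.084, 0.088]     # thresholds for the predicted probability
RC_CUTS = [0.222, 1.0265, 1.831, 2.6355, 3.440]  # thresholds for the relative change, in percent
GRACE = 0.01                                     # grace period on the predicted probability

# Every pairing of the two sets of cut-offs gives one dynamic treatment strategy.
STRATEGIES = list(product(P_CUTS, RC_CUTS))


def required_action(p_hat, rel_change, p_cut, rc_cut, grace=GRACE):
    """Return what the strategy prescribes at one visit."""
    if p_hat < p_cut and rel_change < rc_cut:
        return "no_change"  # both signals below threshold
    if p_hat > p_cut + grace and rel_change > rc_cut:
        return "change"     # both signals above threshold (with grace period)
    return "either"         # uncertain status: both behaviors are acceptable


def artificially_censored(p_hat, rel_change, changed, p_cut, rc_cut):
    """True if the observed behavior at this visit deviates from the strategy."""
    action = required_action(p_hat, rel_change, p_cut, rc_cut)
    if action == "no_change":
        return changed
    if action == "change":
        return not changed
    return False


# Example with the strategy (0.080, 1.831%) used in the text.
print(len(STRATEGIES))
print(artificially_censored(0.070, 1.0, changed=True, p_cut=0.080, rc_cut=1.831))
```

Keeping an explicit "either" outcome makes the grace period visible in the code: visits in that zone never trigger censoring, which is why including the second threshold raises the proportion of patients who remain on a strategy.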
The effects of those 25 dynamic treatment strategies were investigated as part of Aim 3.

6.2 Discussions

The discussion section is organized in the following manner. The first part discusses the advantages and disadvantages of the complex strategy for imputing missing values. Then, the strengths of this aim are discussed from different perspectives: innovative methodologies were applied; all potential scenarios that would shorten the transition from research to real-world practice were considered prudently; and several hidden trends were consistent with our knowledge. After the strengths section, all limitations are also discussed.

6.2.1 Data Management of Missing Values

The model that included preexisting lung function variables and did not include the indicator, under the neutral assumption and using the MCMC method, was chosen to impute the missing delFEV1. Basically, this was determined by answering four questions: which method, MCMC or FCS, should be chosen; whether to include the indicator; whether to include preexisting lung function variables; and which assumption (strict, loose, or neutral) to choose.

First, the MCMC method was chosen for MI. Both MCMC and FCS methods have their advantages and disadvantages. MCMC assumes that all variables in the imputation model have a joint multivariate normal distribution. Instead of assuming a joint distribution, FCS uses a separate conditional distribution for each imputed variable. Because of this assumption, FCS gives more reliable estimates if the imputed variable follows a specific distribution, such as a binary outcome for a logistic model or a count variable for a Poisson model. In simulation studies,164,165 FCS has been shown to produce estimates that are comparable to those of the MCMC method if the distribution was appropriately specified.
Unlike MCMC, which provides reliable estimates even if the assumption of a multivariate normal distribution is violated, as long as the sample size is large enough,164,166 FCS is unlikely to provide reliable estimates if the distribution of any imputed variable is misspecified. Considering the chance of misspecifying a distribution, MCMC is preferred. Moreover, the sample size of this cohort is large, and the fraction of missingness is low. Out of 79,724 visits, there were only 5,001 missing values for FEV1, which increases the chance of acquiring reliable estimates by using the MCMC method. Finally, delFEV1, the change of FEV1 measured between the current and future visit, is a continuous variable, which fits the MCMC method better.

The indicator was not included, for the following reasons. As shown in Figure F.2, Appendix F, after including the indicator, the imputation model did not converge, especially under the loose and strict assumptions. Under those scenarios, autocorrelation between the iterations of imputation was highly likely (Figure F.4, Appendix F), as indicated by the large magnitude of the observed dependency of imputed delFEV1 across iterations. In other words, there was a strong correlation between the imputed values in adjacent imputed datasets. The picture was even worse when the differences in imputed delFEV1 between the model that included the indicator and the model that did not were used as the standard for the decision. The results matched the expectation that the differences only existed in missing values that occurred at an artificial visit (Figure F.7, Appendix F) rather than an existing visit (Figure F.8, Appendix F), since the indicator only marked the missing delFEV1 that was caused by creating an artificial current visit.
However, when missing values occurred at artificial visits, the differences were not only large in some models, but also ran in opposite directions under different assumptions (Figure F.7, Appendix F). Even though the difference in FEV1 between forward and backward calculations in the same visit decreased after including the indicator (Table F.2, Appendix F), the inclusion of the indicator was still problematic.

All results supported the inclusion of preexisting lung function variables. After including those variables, the range of imputed delFEV1 shrank. Figures F.5 and F.6 in Appendix F show that the upper boundary of the percentage increased after including those variables; the imputed delFEV1 was more likely to concentrate around 0. Similar results are also shown in Figures F.12 and F.13 in Appendix F. At the same time, in Table F.2 in Appendix F, after including the preexisting lung function variables, the differences in FEV1 between forward and backward calculations in the same visit decreased in all models.

Last, the neutral assumption was chosen, as it was associated with better performance from several perspectives. Both Figures F.2 and F.4 in Appendix F show that the loose and strict assumptions were not reliable if the indicator was included in the model; the model lost convergence and revealed correlations among imputed values in adjacent imputed datasets. Even though the neutral assumption was not associated with a higher chance of having small values for imputed delFEV1, the distribution of imputed delFEV1 was consistent and normally distributed in all models. At the same time, it had smaller differences in FEV1 between forward and backward calculations in the same visit in all models (Table F.2 in Appendix F). According to the above results and the rationale of this assumption, the neutral assumption was chosen for the MI. Ideally, this MI model should be applied to independent data to investigate its external validity.
However, considering the sample size of the cohort, which included all CF patients in the U.S. who met the inclusion criteria, the external validity of this model should be acceptable. Unlike the traditional method, which has fewer requirements for the missing pattern, causal inference places strong requirements on the missing pattern. Without appropriate adjustment for the missing variables, the analysis could not only jeopardize the assumption of conditional exchangeability, but also bias the result by amplifying inappropriate estimates through IPW. Therefore, a complex strategy was used to impute the different types of missing values. According to the mechanism and rationale of the missing values, different methods under both the single imputation technique and the model-based technique were applied. After applying a reasonable, comprehensive, and complex strategy to impute the missing values, the imputed datasets should be able to provide the same estimates as the ideal data.

6.2.2 Strengths of the Predictive Model

From the methodology perspective, this objective was conducted innovatively. Innovations include the application of the multiple imputation method and the use of the elastic net method, conducting cross-validation and bootstrapping at the patient level rather than by visit, and investigating the dynamic treatment strategy of rational treatment changes. As mentioned previously, the application of multiple imputation is able to capture the fluctuation of lung function data. Rather than using the traditional method of selecting variables, the elastic net was applied in this study to balance the accuracy of prediction and the parsimony of the model. As more than 60 variables had to be screened for inclusion in the prediction model, the traditional stepwise regression would not only consume more time, but also cause biases unless all potential interactions and cubic splines were investigated.
From a clinical perspective, aside from the accuracy of prediction, the parsimony of a model is also crucial, since the physician may not have enough time to measure all of a patient's clinical variables comprehensively. To keep the correlation among visits of the same patient from affecting the results, in both cross-validation and bootstrapping all visits were clustered by patient, and the related analyses were then conducted by patient instead of by visit. Last but not least, two variables were considered to organize the dynamic treatment strategy. There are limited publications on the topic of dynamic treatment regimes. Most of these publications investigated only treatment initiation issues. This study is the first to investigate the dynamic treatment strategy of having a rational treatment change. Compared to the initiation question, this research question has more hurdles around investigating the causality of the rational changes involved in the dynamic treatment strategy, which will be discussed later. At the same time, both the predicted probability of a rational treatment change and the relative change of the predicted probability between the current and previous visit were included to build the dynamic treatment strategy, which did not force patients with uncertain health status to either have or not have a rational treatment change. Two scenarios were included for patients with uncertain health status: 1) patients who had a worse health status (beyond the threshold of predicted probability) in the current visit, but whose health status had barely deteriorated compared with the previous visit; 2) patients who had an acceptable health status (below the threshold of predicted probability) in the current visit, but whose health status had dramatically deteriorated compared with the previous visit (beyond the threshold of relative change of predicted probability between the current and previous visit).
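Clustering visits by patient in cross-validation amounts to assigning folds per patient and letting every visit inherit its patient's fold. The following is a minimal sketch of that idea; the function name, the default of 10 folds, and the seed are assumptions for illustration, since the fold count used in the study is not stated here.

```python
import random


def patient_level_folds(patient_ids, k=10, seed=0):
    """Assign a fold to every visit through its patient id.

    patient_ids: one patient id per visit, in visit order.
    Returns a list of fold indices (0 .. k-1) aligned with patient_ids,
    so that all visits of one patient fall in the same fold.
    """
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)                     # random order of patients
    fold_of = {pid: i % k for i, pid in enumerate(unique)}  # round-robin assignment
    return [fold_of[pid] for pid in patient_ids]


# Hypothetical visit-level data: seven visits from four patients.
visit_patients = ["a", "a", "b", "c", "c", "c", "d"]
print(patient_level_folds(visit_patients, k=2))
```

Because no patient contributes visits to both the training and the held-out part of a split, the cross-validated error is not flattered by repeated measurements on the same person.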
Other than applying innovative methods, all potential scenarios that would shorten the transition from research to real-world practice were considered. Rather than fully trusting the multiple imputation, a complex imputation strategy was applied, which included both single imputation techniques, such as last observation carried forward and arithmetic mean, and model-based techniques, such as multiple imputation using FCS or MCMC. Reformatting the cohort to have a routine visit quarterly is another good example. There were limited methods to appropriately investigate the treatment effects within visits occurring at an irregular frequency. To apply the most stable method, and considering the frequency of routine visits suggested by treatment guidelines and the data in the cohort, the cohort was reformatted. Moreover, the definition of what constituted a rational treatment change was unclear. Rather than make an arbitrary decision, three different ways of defining a rational treatment change, depending on the strictness of the relationship between a treatment change and the clinical signals for a treatment change, were investigated. Finally, using predicted probability alone to define a dynamic treatment strategy would eliminate many patients. After checking the distribution of the estimate through 1,000 bootstrap replicates, consulting with clinical experts, and examining the proportion of patients following each treatment change strategy over time, the relative change of the predicted probability of having a rational treatment change between the previous and current visit was included.

The path for selecting the appropriate model and variables to predict the probability of having a rational treatment change is fascinating. It also indicates several hidden trends. First of all, regardless of which outcome was applied, the elastic net was prone to choose a LASSO regression. Put another way, there were few strong correlations among the predictors.
At the same time, the optimal combination of $\alpha$ and $\hat{\lambda}$ was the same for both the neutral definition and the loose definition (Table 6.5), which supported the similarity between those two outcomes from another perspective. In other words, CF patients would rarely terminate the use of one treatment class without an improvement in health status. The number of PEx in the past year as recorded at the previous visit was not selected when the number of PEx was treated as a continuous variable; this is a strong sign that those variables should be treated as categorical rather than continuous variables.

Several interesting effects were identified in the prediction model regardless of which outcome was applied. The clinical signals, namely predicted FEV1, drug resistance, and the number of treatments in the related treatment class, consistently affected the probability of having a rational treatment change. If the same variable was chosen in both the current visit and the previous visit, then the effects of those two variables were reversed. For example, the greater the number of PEx in the past year at the previous visit, the less likely the patient was to receive a rational treatment change. However, the trend reversed when considering the current visit. All drug resistances were chosen in the model, but the same type of drug resistance was only chosen once, either in the previous visit or the current visit. The majority of the coefficients shared the same directions as common knowledge; however, several of them conflicted with it. For example, patients who identified as Black had lower expected lung function compared to patients who identified as Caucasian, but their chance of receiving a rational treatment change was lower. Similarly, unlike the other two drug resistances, which were associated with higher chances of receiving rational treatment changes, drug resistance to beta lactams was associated with a lower chance of receiving a rational treatment change.
It was hard to judge the performance of the prediction model between using the strictly defined outcome and the loosely defined outcome at the current stage. However, the strict model may be better according to the direction and significance of the coefficients for several variables, such as treatment class and the number of PEx in the past year as recorded at the previous visit.

6.2.3 Limitations of the Predictive Model

This analysis has several limitations. First, the reformatting may have biased the results. Even though several assumptions were investigated to carry out the multiple imputation of the missing lung function values, and a complex imputation method was applied for the rest of the missing values, the patients' real reasons for not having those visits are unknown, and may or may not be consistent with the assumptions made in the analysis. Therefore, the ideal situation would be to conduct the analysis again using the original data. However, until a mature method that is able to handle time-dependent confounders at an irregular visit frequency is available, the current method of analysis is one of the best for the data that are available. Moreover, in terms of choosing the optimal $\alpha$ for the elastic net, the difference in deviance across values of $\alpha$ was trivial, which indicated a limited difference between the values of $\alpha$. However, the minimum SD of $\lambda'_{i}$ may lie at an $\alpha$ that was not chosen. Therefore, there is a small chance that the analysis failed to identify the optimal combination of $\alpha$ and $\hat{\lambda}$. At the same time, other than the time variable, which included a cubic spline, the prediction model did not include interactions, squares, or cubic splines for the rest of the predictors. It may seem that the effect estimation would be biased without considering those adjustments. However, since the sample size was large, including multiple visits, this cohort can be regarded as big data.
Therefore, those adjustments to the variables were not required to identify a model with better performance. Furthermore, from the physicians' perspective, there is limited need to include interactions or squares in this analysis; since those adjusted variables are complicated to explain to patients, the parsimonious model is preferred. Last, there were two issues associated with including two variables for the identification of the dynamic treatment strategy: whether other variables are needed and whether more cut-offs are needed. Although the proportion of patients following each of the different treatment change strategies over time increased after including the relative change of predicted probability, concluding that no other variables are needed would be an arbitrary decision. However, considering the limited time that each physician has when treating a patient and the complexity of a treatment strategy, using two variables to define a dynamic treatment strategy is acceptable. Whether more cut-offs are needed is a difficult question to answer at the current stage of research. However, given the currently available information, the number of cut-offs should be reasonable, since for the relative change of predicted probability those five cut-offs cover the 95% CI, and for predicted probability the related cut-offs cover an even larger range.

6.3 Conclusions

Even though there are several limitations to this analysis, due to the application of innovative methods and comprehensive considerations, the result of this analysis is reasonable, accurate, and stable. In summary, Aim 2 bridged the gap between Aims 1 and 3. The records of all patients with an irregular visit frequency were reformatted into a routine visit every quarter. At the same time, according to the mechanism of missing data, a complex strategy of missing value imputation was successfully applied, which generated 10 imputed datasets.
With the assistance of a machine learning method (elastic net), a prediction model that balanced accuracy and parsimony was generated using the imputed datasets. With the support of Rubin's rule, the coefficients of each independent variable were combined, and the predicted probability of having a rational treatment change and the relative change of the predicted probability between the previous and current visit were calculated accordingly. Given the different thresholds of predicted probability and relative change of predicted probability, 25 varied timing strategies for treatment change were created. The proportion of patients who followed any one of the strategies was high. In other words, no matter which strategy is identified in Aim 3 as the optimal one for treatment change, that is, the one associated with the longest time to mucoid PaPI, it will not be difficult to embed into clinical practice, since the proportion of patients who followed any one of the 25 strategies was high. A patient will receive a rational treatment change at the treatment class level if and only if the patient's predicted probability and relative change of predicted probability between the previous and current visit were higher than the thresholds of the strategy, and vice versa.

However, none of the 25 strategies is perfect, since there is a grace period for the predicted probability of having a rational treatment change, within which the prescribing behavior of either having or not having a rational treatment change is acceptable. Models with different lengths of grace period were also investigated. The current grace period was chosen after balancing the proportion of patients who followed the strategy and the proportion of patients who had uncertainty about the treatment change. Currently, the uncertainty about the treatment change is caused by the low accuracy of differentiating the observed treatment change from no treatment change within a specific range of the values of predicted probability.
However, once the optimal strategy has been successfully identified and well accepted by healthcare providers in clinical practice, the uncertainty range will shrink, which shortens the grace period. In other words, the more evidence we have, or the more physicians prescribe rationally following the strategy, the less uncertainty is left. Ideally, the strategy will be re-estimated using the latest cohort every couple of years. After several iterations, the grace period will eventually disappear and an optimal strategy with a clear-cut threshold will be generated.

With the identification of an optimal strategy, healthcare providers will be able to prescribe rationally without any uncertainty, supported by confirmed evidence, rather than guessing whether a treatment change is needed when the predicted probability lies within the grace period. At the same time, a value-based formulary can be designed at the treatment class level: adding or switching a treatment will only be reimbursed if the timing of prescribing matches the threshold of the dynamic treatment regime. In such a value-based formulary, patients' lung function will be optimized so as to avoid or delay the need for extremely expensive treatments, such as ivacaftor and ivacaftor/lumacaftor, unless the healthcare provider has already prescribed all the other treatments step by step (step therapy), and the scenario of suboptimal treatment effect has already occurred (prior authorization). Therefore, the annual cost to the health plan for CF patients could be well maintained without sacrificing healthcare utilization.

Table 6.1.
The minimum of mean cross-validated error using deviance as the measurement (strict definition)

Imputed   Alpha
dataset   0         0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9       1
1         0.513712  0.506486  0.506439  0.506416  0.506404  0.506395  0.506386  0.506380  0.506375  0.506370  0.506368
2         0.513815  0.506602  0.506565  0.506545  0.506534  0.506526  0.506521  0.506517  0.506514  0.506511  0.506510
3         0.513591  0.506285  0.506239  0.506218  0.506204  0.506195  0.506190  0.506189  0.506186  0.506184  0.506184
4         0.514084  0.506779  0.506739  0.506713  0.506702  0.506694  0.506686  0.506683  0.506681  0.506678  0.506678
5         0.513882  0.506607  0.506555  0.506529  0.506511  0.506501  0.506496  0.506492  0.506489  0.506485  0.506483
6         0.513225  0.505879  0.505823  0.505802  0.505789  0.505780  0.505772  0.505767  0.505763  0.505761  0.505758
7         0.513310  0.506029  0.505977  0.505957  0.505946  0.505941  0.505934  0.505939  0.505936  0.505934  0.505933
8         0.513633  0.506293  0.506249  0.506227  0.506215  0.506208  0.506200  0.506196  0.506193  0.506191  0.506189
9         0.513643  0.506276  0.506230  0.506213  0.506203  0.506195  0.506189  0.506185  0.506182  0.506179  0.506177
10        0.513989  0.506802  0.506757  0.506734  0.506723  0.506715  0.506709  0.506706  0.506694  0.506690  0.506688

Table 6.2.
The minimum of mean cross-validated error using deviance as the measurement (loose definition)

Imputed   Alpha
dataset   0         0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9       1
1         0.576606  0.570516  0.570441  0.570402  0.570379  0.570372  0.570363  0.570359  0.570356  0.570351  0.570348
2         0.576576  0.570483  0.570408  0.570368  0.570349  0.570333  0.570330  0.570326  0.570323  0.570317  0.570316
3         0.576596  0.570498  0.570422  0.570381  0.570362  0.570346  0.570342  0.570338  0.570335  0.570330  0.570328
4         0.576515  0.570403  0.570327  0.570288  0.570264  0.570258  0.570249  0.570245  0.570242  0.570236  0.570234
5         0.576621  0.570550  0.570472  0.570432  0.570409  0.570402  0.570393  0.570389  0.570386  0.570380  0.570379
6         0.576628  0.570521  0.570442  0.570401  0.570378  0.570370  0.570361  0.570358  0.570353  0.570347  0.570345
7         0.576668  0.570582  0.570505  0.570465  0.570441  0.570433  0.570424  0.570417  0.570415  0.570410  0.570409
8         0.576594  0.570485  0.570407  0.570367  0.570347  0.570331  0.570327  0.570323  0.570320  0.570314  0.570312
9         0.576594  0.570470  0.570394  0.570354  0.570330  0.570323  0.570313  0.570307  0.570305  0.570299  0.570298
10        0.576557  0.570454  0.570378  0.570339  0.570315  0.570308  0.570299  0.570295  0.570292  0.570286  0.570283

Table 6.3.
The lambda that is conditional on alpha (strict definition)

Imputed   One se lambda  Best lambda  One se lambda  Best lambda  One se lambda  Best lambda
dataset   (alpha=0.8)    (alpha=0.8)  (alpha=0.9)    (alpha=0.9)  (alpha=1.0)    (alpha=1.0)
1         0.002508       0.000295     0.002447       0.000262     0.002202       0.000259
2         0.002497       0.000244     0.002219       0.000217     0.001997       0.000214
3         0.002516       0.000246     0.002236       0.000240     0.002012       0.000216
4         0.002768       0.000270     0.002461       0.000240     0.002215       0.000216
5         0.002509       0.000269     0.002230       0.000239     0.002203       0.000215
6         0.002503       0.000268     0.002225       0.000239     0.002002       0.000215
7         0.002507       0.000245     0.002228       0.000218     0.002005       0.000196
8         0.002506       0.000269     0.002227       0.000239     0.002005       0.000215
9         0.002508       0.000269     0.002446       0.000239     0.002202       0.000215
10        0.002482       0.000266     0.002206       0.000237     0.001986       0.000234
Mean      0.002530       0.000264     0.002293       0.000237     0.002083       0.000220
SD        0.000080       0.000015     0.000104       0.000012     0.000100       0.000016
Min       0.002482       0.000244     0.002206       0.000217     0.001986       0.000196
Max       0.002768       0.000295     0.002461       0.000262     0.002215       0.000259
Median    0.002507       0.000269     0.002229       0.000239     0.002009       0.000215

Table 6.4.
The lambda that is conditional on alpha (loose definition)

Imputed   One SE lambda  Best lambda   One SE lambda  Best lambda   One SE lambda  Best lambda
dataset   (alpha=0.8)    (alpha=0.8)   (alpha=0.9)    (alpha=0.9)   (alpha=1.0)    (alpha=1.0)
1         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
2         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
3         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
4         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
5         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
6         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
7         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
8         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
9         0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
10        0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
Mean      0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
SD        4.336809E-19   0.000000E+00  0.000000E+00   0.000000E+00  4.336809E-19   5.421011E-20
Min       0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
Max       0.002779       0.000521      0.002470       0.000463      0.002223       0.000417
Median    0.002779       0.000521      0.002470       0.000463      0.002223       0.000417

Table 6.5. The choices of optimal lambda and alpha combination for varied outcomes Definitions Loose definition Neutral definition Strict definition No BD PEx was treated as categorical variable Alfa Probability Lambda Resource 0.8 / 0.9 / 1 1 0.8 / 0.9 / 1 1 0.002223 0.9 1 0.2 1 0.002200 0.002009 0.002223 imputed dataset 1 imputed dataset 1 median median PEx was treated as continuous variable Alfa Probability Lambda Resource imputed 0.8 0.2 0.003050 dataset 1 imputed 0.9 0.4 0.002976 dataset 1 imputed 1 0.4 0.002678 dataset 1 imputed 0.8 0.2 0.003050 dataset 1 imputed 0.9 0.4 0.002976 dataset 1 imputed 1 0.4 0.002678 dataset 1 0.9 / 1 1 0.002201 median

Table 6.6.
The proportion of a variable that has been selected in each model among 10 imputed datasets Variables Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.9 0.8 0.9 0.8 1 1 0.9 0.6 0.9 0.6 1 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0.3 0.6 0 0 0 1 0 0 0.5 1 0 0 0 0 0 0 0.2 0.3 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0.3 0.9 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0.3 0.3 0 0 237 (Intercept) Age Predicted FEV1 in current visit Predicted FEV1 in previous visit Mutation 1 class Mutation 2 class Hispanic Gender Race Smoking Transplant F508 Arthropathy CFRD DIOS GERD Pancreatic insufficiency Pancreatitis TB Pneumothorax Not include BD (not included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 1 1 1 1 1 1 1 1 1 1 1 1 Table 6.6. (continued) Variables Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.2 0 0.2 0 1 0.4 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.8 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 238 Hemoptysis Enzymes ABPA Aspergillus B. cepacia B. 
cenocepacia Burkholderia species Candida Mycobacterium gordonae MAI MRSA MSSA Other gram-negative microorganisms Serratia marcescens Staphylococcus aureus Stenotrophomonas /Maltophilia Non-mucoid Pa PI Unknown type of mucoid Pa PI Mucoid Pa PI Not include BD (not included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.8 0 0.8 0 0.9 0.7 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 Table 6.6. (continued) Variables Number of PEx in the past year in previous visit Number of PEx in the past year in previous visit / (loose definition) Drug resistance of aminoglycosides in previous visit Drug resistance of beta lactams in previous visit Drug resistance of quinolones in previous visit Number of PEx in the past year in current visit Number of PEx in the past year in current visit (loose / definition) Drug resistance of aminoglycosides in current visit Not include BD (not included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 1 0 / 1 / 0 / 1 / Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 0 / 1 0 1 0 1 0 1 0 1 0 1 0.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.6 0.2 0 0 0 0 0.9 0.5 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 / 1 / 1 / 1 / 1 / 1 1 239 Table 6.6. (continued) Variables Drug resistance of beta lactams in current visit Drug resistance of quinolones in current visit Mucolytics Inhaled antibiotics Anti-inflammatories Bronchodilators Not include BD (not included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 1 1 1 1 0 0 1 1 1 1 0 0 1 0.5 1 0.5 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Cat and Con indicated that the number of PEx in the past year was treated as categorical and continuous variable respectively. 240 Table 6.7. 
The proportion of a variable that has been selected in each model among 10 imputed datasets for categorical variables Variables Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con Reference 0 0 0 0 Reference 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Reference 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 1 1 0 1 1 Reference 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 Reference 0 0 0 0 0 0 Reference 0 0 0 0 241 Mutation 1 class: 1 2 3 4 5 Doesn't belong to any class Missing Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Number of PEx in the past year in previous visit: 0 1 2 Not include BD (not included PExloose) Strict Loose Neutral Cat Con Cat Con Cat Con Table 6.7. (continued) Not include BD (not included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con Variables 0 1 1 0 0 0 1 0 0 0 / / / / / / / 0 1 1 0 0 0 1 0 0 0 / / / / / / / / / / / / / / Reference / / / / / / / 0 0.9 0.6 0 0 0 1 0 0 0 / / / / / / / 0 1 1 0 0 0 1 0 0 0 / / / / / / / 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 Reference 0 0 0 0 0 0 0 0 0.9 0.8 0 0 0 1 0 0 0 0 0 0 0 0 0 0 242 Number of PEx in the past year in previous visit: 3 4 5 6 7 8 9 10 11 12 Number of PEx in the past year in previous visit (loose definition): 0 1 2 3 4 5 6 7 Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con Table 6.7. 
(continued) Not include BD (not included PExloose) Strict Loose Neutral Cat Con Cat Con Cat Con Variables / / / / / / / / / / / / / / / / / / 1 1 1 0 0.3 0 0 0 / / / / / / / / / / / / / / / / / / Reference 1 1 1 0 0.3 0 0 0 / / / / / / / / / / / / / / / / / / 1 1 1 0 1 0 0 0.6 0 0 0 0 0 0 1 0 0 1 1 1 0 0.5 0 0 0 0 0 0 0 0 0 1 0 0 Reference 1 1 1 0 0.5 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 0 1 243 Number of PEx in the past year in previous visit (loose definition): 8 9 10 11 12 13 14 15 16 Number of PEx in the past year in current visit: 0 1 2 3 4 5 6 7 8 Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con Table 6.7. (continued) Not include BD (not included PExloose) Loose Strict Neutral Cat Con Cat Con Cat Con Variables 0 0 0 0 / / / / / / / / / / / / / 0 0 0 0 / / / / / / / / / / / / / / / / / / / / / / / / / / Reference / / / / / / / / / / / / / 0 0 0 0 / / / / / / / / / / / / / 0 0 0 0 / / / / / / / / / / / / / 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 Reference 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 1 0 0 0 244 Number of PEx in the past year in current visit: 9 10 11 12 Number of PEx in the past year in current visit (loose definition): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Not include BD (included PExloose) Strict Loose Neutral Cat Con Cat Con Cat Con Table 6.7. (continued) Variables Number of PEx in the past year in current visit (loose definition): 14 / 15 / 16 / Not include BD (not included PExloose) Loose Neutral Strict Con Cat Con Cat Con Cat / / / / / / / / / / / / / / / Not include BD (included PExloose) Loose Neutral Strict Cat Con Cat Con Cat Con 0 0 1 0 0 1 0 0 1 Cat and Con indicated that the number of PEx in the past year was treated as categorical and continuous variable respectively. 245 Table 6.8. 
The coefficients for the model (strict definition) Variables SE Lower boundary -0.281499 -0.038898 -0.031758 0.014067 OR Percentage of missing Upper boundary 0.435629 -0.022415 -0.026112 0.019648 1.080112 0.969809 0.971480 1.017000 2% 1% 4% 2% 0.077065 -0.030657 -0.028935 0.016858 0.182932 0.004205 0.001440 0.001424 0.019029 0.055319 -0.351695 -0.471186 -0.078217 -0.014997 0.039885 0.077819 0.136665 0.142136 0.055941 0.149995 Reference -0.059151 0.097209 -0.097219 0.207857 -0.619562 -0.083827 -0.749778 -0.192595 -0.187863 0.031429 -0.308987 0.278994 1.019211 1.056878 0.703495 0.624261 0.924764 0.985115 3% 3% 2% 2% 1% 1% 0.084421 0.141071 0.262386 Reference -0.345979 -0.015049 -0.190242 0.362758 -0.357947 0.670587 0.834841 1.090088 1.169200 1% 1% 0% 0.151121 0.061444 0.128865 Reference -0.492655 0.099758 -0.324136 -0.083267 -0.152257 0.352924 0.821644 0.815706 1.105540 2% 2% 2% -0.180514 0.086258 0.156320 -0.196449 -0.203701 0.100334 246 (Intercept) Age Predicted FEV1 in current visit Predicted FEV1 in previous visit Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race Caucasian Black Asian Others Smoking No Yes Unknown Pancreatic insufficiency Coefficients 95% CI Table 6.8. (continued) Variables Pancreatitis Aspergillus B. 
cepacia MSSA Staphylococcus aureus Non-mucoid Pa PI Mucoid Pa PI Number of PEx in the past year in previous visit: 0 1 2 3 4 5 Drug resistance of beta lactams in previous visit: No Yes Testing not done Coefficients -0.430184 0.122658 0.719820 -0.209266 -0.067421 0.309718 0.535222 -0.339920 -0.539888 -0.640141 -0.543722 -0.753817 -0.425746 0.472472 SE 95% CI OR Percentage of missing Upper boundary 0.002207 0.198942 1.055364 -0.117667 0.039567 0.446745 0.686862 0.650390 1.130498 2.054064 0.811180 0.934801 1.363041 1.707827 3% 2% 0% 2% 2% 3% 2% 0.059419 0.094210 0.140864 0.208533 0.299377 Reference -0.456380 -0.223459 -0.724543 -0.355233 -0.916245 -0.364037 -0.952453 -0.134990 -1.340586 -0.167048 0.711828 0.582813 0.527218 0.580584 0.470567 1% 2% 2% 2% 0% 0.143198 0.074977 Reference -0.706535 -0.144956 0.325515 0.619428 0.653282 1.603953 6% 1% 0.220581 0.038919 0.171199 0.046733 0.054585 0.069905 0.077365 Lower boundary -0.862575 0.046374 0.384277 -0.300864 -0.174410 0.172691 0.383582 247 Table 6.8. (continued) Variables SE Lower boundary Upper boundary OR Percentage of missing 0.056136 0.089067 0.132873 0.211695 0.279524 Reference 0.685017 0.905069 1.010491 1.359634 1.161408 1.682280 0.659687 1.489525 0.882340 1.978065 2.214536 3.270891 4.144757 2.928839 4.179546 1% 1% 2% 1% 1% 0.074467 0.599280 Reference 0.039897 0.331802 -1.763120 0.586015 1.204241 0.555130 0% 0% 0.287911 0.099959 0.105390 0.595046 Reference 0.081336 0.494485 -1.066309 1.266227 1.333638 1.105126 2% 0% -0.703457 0.038818 -0.779540 0.494872 1% 0.795043 1.185062 1.421844 1.074606 1.430203 0.185849 -0.588553 -0.627374 248 Number of PEx in the past year in current visit: 0 1 2 3 4 5 Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics 0 1 Coefficients 95% CI Table 6.8. 
(continued) Variables Mucolytics 2 Inhaled antibiotics 0 1 2 3 Anti-inflammatories 0 1 2 Bronchodilators 0 1 2 Coefficients -1.240852 -0.496329 -0.850033 -2.042663 -0.406660 -0.281437 -0.322451 -0.185735 SE 95% CI Lower boundary Upper boundary OR Percentage of missing 0.047396 -1.333747 -1.147958 0.289138 1% 0.033702 0.075273 0.287697 Reference -0.562386 -0.430273 -0.997571 -0.702495 -2.606539 -1.478788 0.608761 0.427401 0.129683 2% 2% 0% 0.034664 0.104250 Reference -0.474603 -0.338717 -0.485764 -0.077111 0.665871 0.754698 2% 0% 0.033481 0.075646 Reference -0.388079 -0.256824 -0.334008 -0.037462 0.724371 0.830494 2% 2% 249 Table 6.9. The coefficients for the model (loose definition) Variables 0.635256 -0.028081 -0.009108 -0.001201 0.010826 -0.009286 -0.418090 -0.382783 -0.121752 -0.116738 -0.211211 0.069268 0.136831 -0.177151 -0.231731 0.634461 SE OR Percentage of missing Upper boundary 0.848228 -0.020542 -0.006405 0.001495 1.887506 0.972310 0.990934 0.998800 1% 0% 5% 5% 0.036397 0.072113 0.123116 0.121941 0.049907 0.071729 Reference -0.060511 0.082162 -0.150626 0.132054 -0.659393 -0.176787 -0.621783 -0.143783 -0.219568 -0.023936 -0.257324 0.023848 1.010884 0.990757 0.658303 0.681961 0.885368 0.889818 0% 0% 0% 0% 0% 0% 0.078868 0.130407 0.247878 Reference -0.365789 -0.056633 -0.186325 0.324861 -0.349000 0.622662 0.809603 1.071723 1.146634 0% 0% 0% 0.137589 0.056634 0.275233 Reference -0.446821 0.092519 -0.342732 -0.120730 0.095015 1.173907 0.837653 0.793159 1.886006 0% 0% 0% 0.108659 0.003846 0.001378 0.001375 Lower boundary 0.422284 -0.035619 -0.011810 -0.003896 250 (Intercept) Age Predicted FEV1 in current visit Predicted FEV1 in previous visit Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race Caucasian Black Asian Others Smoking No Yes Unknown Hemoptysis Coefficients 95% CI Table 6.9. (continued) Variables Aspergillus B. 
cepacia MSSA Staphylococcus aureus Mucoid Pa PI Number of PEx in the past year in previous visit: 0 1 2 3 4 5 Number of PEx in the past year in current visit: 0 1 2 3 4 5 Coefficients SE 95% CI Lower boundary 0.038965 0.336263 -0.299191 -0.178391 0.163388 OR Percentage of missing Upper boundary 0.179716 0.984441 -0.129434 0.019499 0.321281 1.115542 1.935473 0.807096 0.923628 1.274220 0% 0% 0% 0% 0% 0.109340 0.660352 -0.214312 -0.079446 0.242335 0.035906 0.165355 0.043306 0.050483 0.040280 -0.140077 -0.250059 -0.251660 -0.033396 -0.136775 0.055017 0.087130 0.130701 0.194640 0.281766 Reference -0.247908 -0.032246 -0.420831 -0.079287 -0.507828 0.004509 -0.414885 0.348092 -0.689027 0.415477 0.869291 0.778755 0.777509 0.967155 0.872166 0% 0% 0% 0% 0% 0.052618 0.083303 0.125114 0.200281 0.266998 Reference 0.524340 0.730599 0.737217 1.063758 0.782211 1.272651 0.236603 1.021689 0.285575 1.332187 1.872865 2.460802 2.793880 1.876008 2.245393 0% 0% 0% 0% 0% 0.627469 0.900488 1.027431 0.629146 0.808881 251 Table 6.9. (continued) Variables Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics 0 1 2 Coefficients SE 95% CI Lower boundary Upper boundary OR Percentage of missing 0.127152 0.094147 0.070929 0.760673 Reference -0.011867 0.266171 -1.396746 1.585039 -0.186798 -0.683217 0.119371 0.755187 Reference -0.420760 0.047164 -2.163356 0.796922 0.829611 0.504990 0% 0% 0.242641 0.252922 0.101375 0.552340 Reference 0.043949 0.441333 -0.829645 1.335488 1.274611 1.287783 0% 0% -0.755664 -1.295743 0.035636 0.043609 -0.825509 -1.381214 0.469699 0.273695 0% 0% -0.685819 -1.210271 1.135589 1.098721 0% 0% 252 Table 6.9. 
(continued)

Variables              Coefficients  SE        95% CI lower  95% CI upper  OR        Percentage of missing
Inhaled antibiotics
  0                    Reference
  1                    -0.546949     0.030906  -0.607524     -0.486374     0.578713  0%
  2                    -0.905168     0.070192  -1.042742     -0.767594     0.404474  0%
  3                    -2.114768     0.276643  -2.656978     -1.572559     0.120661  0%
Anti-inflammatories
  0                    Reference
  1                    -0.405523     0.031814  -0.467878     -0.343169     0.666628  0%
  2                    -0.366443     0.099734  -0.561917     -0.170968     0.693196  0%
Bronchodilators
  0                    Reference
  1                    -0.324555     0.030686  -0.384698     -0.264412     0.722849  0%
  2                    -0.175278     0.069342  -0.311185     -0.039370     0.839224  0%

Figure 6.1. Histogram and normal quantile-quantile plot of bootstrapping the cutoff of $\hat{p}_{tt,i}$ and $rr_{tt,i}$ under strict definition

Figure 6.2. Histogram and normal quantile-quantile plot of bootstrapping the cutoff of $\hat{p}_{tt,i}$ and $rr_{tt,i}$ under loose definition

Figure 6.3. The histogram of predicted probability of having rational treatment change given observed treatment change pattern (strict definition).

Figure 6.4. Plot of the proportion of patients following each of the different treatment change strategies over time (strict definition, only included predicted probability, $p^{*}$)

Figure 6.5. Plot of the proportion of patients following each of the different treatment change strategies over time (strict definition, included both predicted probability, $p^{*}$, and relative change of predicted probability, $p^{**}$)

CHAPTER 7

OPTIMAL TREATMENT REGIME

7.1 Results

To fulfill Aim 3, this section presents the results in four parts. The first part describes how the augmented datasets were created to investigate the treatment effect of following different DTRs. Then, both data-driven and knowledge-driven methods were applied to select the variables in the numerator and denominator of IPTW and IPCW, respectively.
After identifying the variables in the numerator and denominator of IPCW and IPTW, the related predictions were conducted independently in each replicate among the 10 augmented datasets. Given the coefficients that were identified previously, the weight was created for all visits. The influence of applying stabilized inverse probability weighting (SIPW) and unstabilized inverse probability weighting (UIPW) was compared using two outcomes: the distribution of the weights and the nonparametric Kaplan-Meier curve. At the end, the results of four models are presented: 1) fixed parameterization of the dynamic logistic MSMs with UIPW; 2) fixed parameterization of the dynamic logistic MSMs with SIPW; 3) flexible parameterization of the dynamic logistic MSMs with SIPW; and 4) time-dependent Cox regression.

7.1.1 Creating the Augmented Datasets

As mentioned in Aim 2, 25 unique dynamic treatment strategies were taken into consideration for Objective 3. Each of the 25 strategies was labeled with two digits. Reading from left to right, the first digit indicated the value of the predicted probability of having a rational treatment change, and the second digit indicated the value of the relative change of that predicted probability between the current visit and the previous visit. For example, strategies 33 and 43 had the same value for the relative change of the predicted probability but different values for the predicted probability. To simplify the scenario, only the strict definition of rational treatment change was considered in Aim 3. Therefore, 25 replicates were created for each unique visit, and each replicate related to one unique dynamic treatment change strategy. In each replicate, all visits were examined against the related strategy.
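A minimal sketch of how the 25 replicates might be assembled, together with the artificial censoring rule described in this section. The digit range 1 to 5, the field name `observed_change`, and the `should_change` callback are assumptions made for illustration, as is the choice to drop the deviating visit itself; none of them is taken from the study's actual code.

```python
from itertools import product

# Hypothetical labels: first digit = level of the predicted-probability
# threshold, second digit = level of the relative-change threshold.
LEVELS = (1, 2, 3, 4, 5)
STRATEGIES = [f"{p}{r}" for p, r in product(LEVELS, LEVELS)]


def build_replicate(visits, strategy, should_change):
    """Clone one patient's visits for `strategy` and censor artificially.

    visits        : list of dicts ordered by visit date, each carrying an
                    'observed_change' boolean (hypothetical field name).
    should_change : callable(visit, strategy) -> bool giving the decision
                    the strategy prescribes at that visit (assumed given).
    Returns the visits kept in this replicate, i.e. everything before the
    first visit whose observed decision deviates from the strategy.
    """
    kept = []
    for visit in visits:
        if visit["observed_change"] != should_change(visit, strategy):
            break  # deviation: this visit and all later ones are censored
        kept.append({**visit, "strategy": strategy})
    return kept


def augment(visits, should_change):
    """Stack the 25 replicates of one patient into one augmented dataset."""
    rows = []
    for strategy in STRATEGIES:
        rows.extend(build_replicate(visits, strategy, should_change))
    return rows
```

Applied to every patient in an imputed dataset, `augment` would yield one augmented imputed dataset; whether the first deviating visit is retained is a design choice that should follow the study protocol.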
If the observed treatment change pattern in a visit conflicted with the counterfactual treatment pattern that the patient was supposed to follow, as determined by the related strategy, then the follow-up visits were artificially censored. In other words, given the patient's demographic characteristics, clinical variables, and treatment history, if in a visit the physician did not make a treatment change decision that followed the related dynamic treatment change strategy, the follow-up visits were censored. An augmented dataset, which included 25 replicates, was created, and only those visits in which the treatment classes that a patient received followed the related rational treatment change strategy were kept in each replicate. The 10 imputed datasets were thereby converted into 10 augmented imputed datasets.

7.1.2 Variable Selection for the Weights

Selecting the variables to predict the IPW, so that selection bias and confounding bias in a research question are appropriately adjusted, is a complicated procedure. The variables can be selected through either knowledge-driven or data-driven methods. The knowledge-driven method is the most common; it includes all variables that were mentioned in the related articles or suggested by clinical experience. Those variables must have relationships with both exposure and outcome. After those variables are identified, a DAG is drawn based on the current understanding of the relationships among them, and all variables other than intermediate variables are included in the model. The data-driven method selects only the variables that are significantly associated with the predicted outcome. All variable selection methods that were mentioned previously could be applied here. Both data-driven and knowledge-driven methods were applied in this study.
Specifically, based on knowledge and clinical experience, all variables in the CFFPR that were investigated in the previous two aims were included. The results of Aim 2 indicated that LASSO was preferable for this dataset, since α always equaled 1, which makes the elastic net identical to LASSO, among all prediction models in each of the 10 imputed datasets. Because of this, LASSO was applied to select the variables for calculating the numerator and denominator of the inverse probability of censoring weighting (IPCW) and the inverse probability of treatment weighting (IPTW), respectively. After simplification, there was only one outcome for IPTW: rational treatment change under the strict definition. However, the outcome for IPCW was identified jointly by three outcomes: disenrollment, death, or end of study. In other words, IPTW was applied to adjust for time-dependent confounding between the exposure (a patient following a specific strategy to have a rational treatment change under the strict definition) and the outcome (time to mucoid PaPI). Similarly, the selection bias caused by loss to follow-up, death, and end of study was jointly adjusted by the IPCW. To investigate the difference between using one censoring indicator that covered all three reasons and using three separate indicators, one per reason, to predict the IPCW, both the numerator and the denominator of IPCW were investigated under those scenarios in the 10 imputed datasets. Unlike for the IPCW, and considering the influence of different strategies, the variables selected for IPTW were also investigated in each replicate of the 10 augmented imputed datasets. As long as a variable was selected in the same replicate at least once among the 10 augmented datasets, it was selected for the strategy related to that replicate. Therefore, the proportion of being selected was jointly determined by imputed dataset and strategy.
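One possible reading of this selection rule is sketched below; it is an interpretation, not the study's code. The nested-list layout of `selected` and both function names are hypothetical. Under this reading, a variable picked in a single replicate of a single imputed dataset gets a proportion of 1/25 = 0.04 for IPTW, and a variable picked in a single imputed dataset gets 1/10 = 0.1 for IPCW.

```python
def selection_proportion_iptw(selected):
    """Proportion of strategies for which a variable counts as selected.

    selected[d][s] is True when LASSO kept the variable in imputed
    dataset d for the replicate of strategy s (hypothetical layout).
    A strategy counts as selected if any imputed dataset selected it.
    """
    n_strategies = len(selected[0])
    chosen = [any(row[s] for row in selected) for s in range(n_strategies)]
    return sum(chosen) / n_strategies


def selection_proportion_ipcw(selected):
    """Proportion of imputed datasets in which the variable was kept.

    selected[d] is True when LASSO kept the variable in imputed dataset d.
    """
    return sum(selected) / len(selected)


# A variable chosen exactly once in each setting.
iptw_flags = [[False] * 25 for _ in range(10)]
iptw_flags[3][7] = True
ipcw_flags = [False] * 10
ipcw_flags[0] = True
print(selection_proportion_iptw(iptw_flags), selection_proportion_ipcw(ipcw_flags))
```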
If a variable was chosen only once, the proportions of being selected in IPTW and IPCW were 0.04 and 0.1, respectively. Table 7.1 presents the related results of variable selection. If a variable was selected in fewer than 50% of all augmented imputed datasets in the prediction model for either the numerator or the denominator of IPTW, it was excluded. Because no variable was selected when disenrollment or death was used as the outcome to predict either the numerator or the denominator of the IPCW, the joint probability of predicting the three indicators equaled the probability of predicting the end of the study alone. Table 7.1 shows that there is barely any difference in variable selection between using censor and using the end of study as the outcome when predicting either the numerator or the denominator of the IPCW. At the same time, considering the importance of simplicity to a model, only one indicator was chosen. Therefore, a single indicator, censor, was applied as the outcome to predict both the numerator and the denominator of IPCW.

Following Hernán's method of predicting the IPW with time-dependent confounders, both baseline variables (V) and time-dependent variables that were measured in the current visit (L) were included in the denominator prediction. For numerators, only baseline variables (V) were included. At the same time, in each model, the selected baseline variables (V) had to be a subset of the variables that were selected in the current visit (L). Moreover, treatment-related variables were included in the IPTW but not the IPCW, under the assumption that treatment-related variables affected censoring only indirectly. Last, given the assumption that no unmeasured confounding remained after adjusting with the IPCW and IPTW, all variables other than treatment-related variables should be included in both IPCW and IPTW. The only exception was the censoring indicator, which appeared only in the prediction model of IPTW, as a predictor.
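For orientation, the numerator and denominator structure described above can be written schematically as follows. The symbols $A_{ik}$ (rational treatment change for patient $i$ at visit $k$), $C_{ik}$ (censoring indicator), and the product over visits up to $t$ are notation assumed here for illustration; the fitted models in this study may condition on additional terms, such as treatment history or the censoring indicator.

```latex
SW^{A}_{i}(t) = \prod_{k=0}^{t}
  \frac{\Pr\left(A_{ik} = a_{ik} \mid V_i\right)}
       {\Pr\left(A_{ik} = a_{ik} \mid V_i, L_{ik}\right)},
\qquad
SW^{C}_{i}(t) = \prod_{k=0}^{t}
  \frac{\Pr\left(C_{ik} = 0 \mid V_i\right)}
       {\Pr\left(C_{ik} = 0 \mid V_i, L_{ik}\right)}
```

Replacing each numerator with 1 gives the corresponding unstabilized weight.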
Following the above procedures, the variables that were finally included in each model are also presented in Table 7.1. The font color indicates the rationale for each decision. For example, drug resistance to beta lactams at the baseline visit was selected by LASSO only in predicting the denominator of IPTW. Because it was a baseline variable, once it was chosen in the denominator, it had to be chosen in the numerator as well. Therefore, it was selected in the prediction model of the numerator of IPTW and was marked in pink to indicate the rationale. Since it was selected in IPTW, this variable should also be selected in the related model in IPCW, which is marked in orange in the decision column of IPCW. Table 7.1 presents all variables that were selected in those four models.

The following time-dependent variables (L) were selected: height, weight, predicted FEV1 in the current visit, number of visits, number of visits (spline), smoking status, transplant status, CFRD status, whether the patient had GERD, whether the patient had pancreatic insufficiency, whether the patient had pancreatitis, whether the patient had hemoptysis, whether the patient used any enzymes, whether the patient had ABPA, whether the patient was infected by any species of Aspergillus, B. cepacia infection, Candida infection, MAI infection, MRSA infection, MSSA infection, infection by any other Gram-negative microorganisms, S. aureus infection, whether the patient was diagnosed with nonmucoid PaPI in the current visit, whether the patient was diagnosed with an unknown type of mucoid PaPI in the current visit, number of PEx in the past year, drug resistance to aminoglycosides, drug resistance to beta lactams, drug resistance to quinolones, number of mucolytics that the patient received, and number of anti-inflammatories that the patient received.
The following baseline variables (V) were also selected: mutation 2 class, age, predicted FEV1, height, weight, race, transplant status, CFRD status, whether the patient had GERD, B. cepacia infection, MRSA infection, infection by any other Gram-negative microorganisms, whether the patient was diagnosed with nonmucoid PaPI, number of PEx in the past year, drug resistance to aminoglycosides, drug resistance to beta lactams, drug resistance to quinolones, number of mucolytics that the patient received, and number of anti-inflammatories that the patient received.

7.1.3 Calculating the Weights

After identifying the variables in the numerator and denominator of IPCW and IPTW, the related predictions were conducted independently in each replicate among the 10 augmented datasets. Tables 7.2 and 7.3, 7.4 and 7.5, 7.6 and 7.7, and 7.8 and 7.9 present the estimated odds ratios for variables in the numerator of IPTW, the denominator of IPTW, the numerator of IPCW, and the denominator of IPCW, respectively. To better investigate the difference among imputed datasets, Tables 7.2, 7.4, 7.6, and 7.8 present the results from the 2nd imputed dataset, and Tables 7.3, 7.5, 7.7, and 7.9 present the results from the 8th imputed dataset. In each table, the results under strategies 33, 43, and 53 are also presented. There were few differences among either strategies or imputed datasets. However, the differences among the imputed datasets were larger than the differences among the strategies.

The majority of the variables matched the expectation and understanding, but the effects were not statistically significant in predicting the numerator of IPTW (Tables 7.2 and 7.3). Only some of the variables had statistically significant influences: mutation 2 class, drug resistance to aminoglycosides, and the number of anti-inflammatories.
In the numerator model of IPTW, the chance of having a rational treatment change decreased as the severity of a patient's mutation decreased or as the number of anti-inflammatories that a patient received increased.

Compared to predicting the numerator of the IPTW, more variables were statistically significant in predicting the denominator of the IPTW (Tables 7.4 and 7.5). These included the baseline variables (V) age, predicted FEV1, height, mutation 2 class, whether the patient had been diagnosed with nonmucoid PaPI, number of PEx in the past year, and number of mucolytics that the patient received, as well as the time-dependent variables (L) measured at the current visit: predicted FEV1, height, number of visits, number of visits (spline), smoking status, CFRD status, whether the patient had pancreatitis, whether the patient had ABPA, B. cepacia infection, MSSA infection, whether the patient was diagnosed with nonmucoid PaPI, whether the patient was diagnosed with an unknown type of mucoid PaPI, number of PEx in the past year, number of mucolytics that a patient received, number of inhaled antibiotics that a patient received, number of anti-inflammatories that a patient received, and number of bronchodilators that a patient received. If a variable was chosen at both the current visit and the baseline visit, the directions of the two effects were reversed, and the direction of the effect at the current visit was always consistent with our expectation. For example, if, at the baseline visit, the patient had a higher predicted value of FEV1, was taller, had fewer PEx, or received more mucolytics, then it was more likely that the patient would receive a rational treatment change in the current visit. Conversely, if, at the current visit, the patient had a higher predicted value of FEV1, had fewer PEx, received more mucolytics, or was taller, then the chance of having a rational treatment change decreased.
Compared with the related reference, the odds of having a rational treatment change were more than 10 times higher for a patient who was infected by B. cepacia; the odds ratio was 11.274 under strategy 33 in imputed dataset 2. If a patient had pancreatitis or MSSA infection at the current visit, the chance of having a rational treatment change decreased; these were the only two variables whose direction of effect differed from our expectations.

Several variables measured at the baseline visit were statistically significant in predicting the numerator of the IPCW (Tables 7.6 and 7.7), such as predicted FEV1, height, weight, GERD, B. cepacia infection, infection by any other Gram-negative microorganisms, and diagnosis with nonmucoid PaPI. The directions of effects were reasonable for the majority of those variables. For example, if a patient had GERD as a comorbidity, was infected by B. cepacia, or was infected by any other Gram-negative microorganisms at the baseline visit, the chance of being censored increased. However, two of the directions conflicted with our expectation and knowledge: the better the lung function a patient had, or the heavier a patient was, the more likely the patient was to be censored. At the same time, the probability of being censored was dominated by three scenarios at the baseline visit: having received a transplant, being infected by B. cepacia, and being infected by other Gram-negative microorganisms. These had large odds ratios of 8.152, 8.082, and 3.759, respectively, under strategy 33 in imputed dataset 2.

The estimates for variables in the denominator of IPCW were similar to those in the numerator of IPCW. For example, weight, B. cepacia infection, MRSA infection, and infection by other Gram-negative microorganisms at the baseline visit still had statistically significant effects in the same direction as in the numerator.
Similar to the trends in the IPTW, a variable measured at the current visit had the reverse effect compared with the same variable measured at the baseline visit. For example, having had a transplant at the current visit would decrease the probability of being censored. However, being infected by B. cepacia, being infected by MRSA, or being infected by other Gram-negative microorganisms at the current visit would also increase the probability of being censored, especially in the last scenario, which still had a statistically significant effect.
7.1.4 Influence of Applying Different Methods to Calculate Weights
Those estimates among the different models gave a general description of each variable in the model. The performance of those prediction models, especially the difference between using stabilized weights and unstabilized weights, which would affect the result of Aim 3 directly, is compared in this section. Figure 7.1 shows the distribution of the stabilized inverse censoring weight (SIPCW) under different strategies in different imputed datasets. The left and right columns represent the distribution in imputed datasets 2 and 8, respectively. From top to bottom, the figures represent the distribution under strategies 33 and 53, respectively. The SIPCW was normally distributed around 0.9 regardless of the strategies and imputed datasets that were chosen. Unlike the agreement on using the SIPCW to adjust for selection bias, the difference between using the stabilized inverse treatment weighting (SIPTW) and unstabilized inverse treatment weighting (UIPTW) to adjust for time-dependent confounders was unclear. Therefore, Figures 7.2 and 7.3 were compared with Figures 7.4 and 7.5, respectively. While the differences among different strategies and imputed datasets as represented in the figures are trivial, there is a huge difference in distribution between UIPTW and SIPTW.
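As a reading aid, the weights compared in this section can be written out in one common form. The notation is an assumption introduced here and is not taken from the dissertation: A_{ik} denotes the treatment-change indicator of patient i at visit k, V_i the baseline variables, and L_{ik} the time-dependent variables at visit k. Conditioning on prior treatment history is omitted for brevity, so this is a simplified sketch rather than the exact specification used in the study.

```latex
\begin{align*}
\mathrm{UIPTW}_i(t) &= \prod_{k=1}^{t}
  \frac{1}{\Pr\left(A_{ik} = a_{ik} \mid V_i, L_{ik}\right)}, \\
\mathrm{SIPTW}_i(t) &= \prod_{k=1}^{t}
  \frac{\Pr\left(A_{ik} = a_{ik} \mid V_i\right)}
       {\Pr\left(A_{ik} = a_{ik} \mid V_i, L_{ik}\right)}, \\
\mathrm{UIPW}_i(t) &= \mathrm{SIPCW}_i(t)\,\mathrm{UIPTW}_i(t), \qquad
\mathrm{SIPW}_i(t) = \mathrm{SIPCW}_i(t)\,\mathrm{SIPTW}_i(t).
\end{align*}
```

Under this form, every factor of the unstabilized weight is the reciprocal of a probability and so cannot fall below 1, whereas each stabilized factor is a ratio of two probabilities and can lie on either side of 1.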
Unlike SIPTW, which is normally distributed with a mean of 1 in Figure 7.3, the UIPTW has an exponential distribution in Figure 7.2. Moreover, 1 is the minimum value of UIPTW. Figures 7.4 and 7.5 present the distribution of the final weights, UIPW and SIPW, respectively. UIPW was the product of SIPCW and UIPTW, and SIPW was the product of SIPCW and SIPTW. Even though at first glance Figure 7.4 looks like a normal distribution, it is extremely skewed to the right. The distribution in Figure 7.5 is closer to normal, with a mean around 1. More importantly, compared to UIPW, the chance of having larger values was much lower in SIPW, which is supported by the data in Tables 7.10 and 7.11. Without any further adjustment, from the mean, median, and upper quartile to the maximum, the SIPW is always associated with smaller values. The upper quartiles for SIPW and UIPW are around 1 and 3, respectively. Because of the skew that UIPW introduces into the data, SIPW is preferred in this study. Table 7.12 shows the number of extreme values, larger than 10, in SIPW under different strategies in varied imputed datasets. Only 175 out of 28,976 visits were associated with extreme values under strategy 33 in imputed dataset 2. The proportion of extreme values in SIPW was consistent, around 0.6%, regardless of the strategy and imputed dataset. Given this, 10 was set as the maximum value of SIPW. Table 7.11 presents the distribution of the truncated SIPW, which had a mean of around 1.015. Figures 7.6, 7.7, and 7.8 also present the influence of the different methods of weighting. The left and right columns represent the trends in imputed datasets 2 and 8, respectively. From top to bottom, the figures represent the nonparametric Kaplan-Meier curve without any adjustment, with adjustment by UIPW, and with adjustment by SIPW, respectively. Without any adjustment, or with adjustment by SIPW, the difference in results between the imputed datasets is trivial.
However, there are huge differences depending on the method of weighting. Without any adjustment, there are barely any differences in the survival curves among the five strategies (13, 23, 33, 43, 53) in Figures 7.6, 7.7, and 7.8. Applying IPW to adjust the results reveals huge differences among the strategies and imputed datasets. As shown in Figures 7.6, 7.7, and 7.8, in imputed dataset 2, after adjusting by UIPW, there is a large decrease in survival at the 5th visit for strategies 13 and 23, from 1 to 0.91, and another decrease at the 20th visit, from around 0.90 to 0.88. The other strategies had survival rates around 1 until the 23rd visit. However, in imputed dataset 8, for strategies 13 and 23, there was a decrease at the 5th visit, from 1 to 0.975, and a decrease to 0.94 at the 20th visit. The other strategies shared trends similar to the corresponding ones in imputed dataset 2. When the SIPW was applied, the decrease occurred steadily, and the final survival rate was comparable to the one adjusted by the UIPW under the same strategy. At the same time, there remained a difference in survival rates between the different strategies until late visits.
7.1.5 Results of Applying Different Models
As mentioned previously, SIPW is preferred in this study. However, in order to investigate the stability of the result, UIPW was also applied to adjust the final model. Beyond the difference in how the weights are built, the two approaches also differ in the outcome model: using UIPW to adjust for bias does not require any variable adjustment in the regression model, whereas when SIPW was applied, the baseline variables in the numerator had to be adjusted for in the regression model.
The following baseline variables were adjusted for in the regression model: mutation 2 class, predicted FEV1, number of PEx in the previous year, number of mucolytics that the patient received, number of anti-inflammatories that the patient received, age, gender, race, transplant status, drug resistance to aminoglycosides, drug resistance to beta lactams, and drug resistance to quinolones. Tables 7.13 and 7.14 present the results of the fixed parameterization of the dynamic logistic MSMs. Table 7.15 presents the results of a flexible parameterization of the dynamic logistic MSM. Unstabilized and stabilized weighting were applied in Tables 7.13 and 7.14, respectively. The point estimate in each table was calculated according to the results in 10 augmented imputed datasets. The minimum and maximum of the point estimates among those 10 augmented imputed datasets were also reported.
7.1.5.1 The Fixed Parameterization of the Dynamic Logistic MSM with UIPW
Table 7.13 shows that, without adjusting for any baseline variables and with UIPW applied, not following any strategy (strategy 1) was superior to some strategies (strategies 11-15, 21-25) and worse than the other strategies. Figure 7.9 supports this conclusion by depicting the survival curves of six strategies (no strategy, 13, 23, 33, 43, and 53). As shown in Figure 7.9, the survival rates are low for strategies 13 and 23, and are high for strategies 33, 43, and 53. The survival curve for not following any strategy lies in the center, surrounded by the survival curves of the other five strategies under discussion. Compared to following strategy 55, if a physician did not follow any rational treatment change strategy, the odds of developing mucoid PaPI would be 4 times higher. If the cut-off of predicted probability was fixed, then the worst outcome was always associated with a relative change of predicted probability equal to 1.8310% (strategy 'X3') or 2.6355% (strategy 'X4').
Among the strategies that were investigated, without adjusting for variables, strategy 31 was associated with the optimal outcome and strategy 23 was associated with the worst outcome.
7.1.5.2 The Fixed Parameterization of the Dynamic Logistic MSM with SIPW
After adjusting for the baseline variables and applying SIPW, not following any strategy caused the worst outcome (Table 7.14 and Figure 7.10). Given fixed baseline variables, compared to following strategy 55, if a physician's treatment changes did not follow any rational treatment change strategy, the odds of developing mucoid PaPI would be 1.17 times higher (95% CI (1.13, 1.22)). Similar to the previous model, if the cut-off of predicted probability was fixed, then the worst outcome was always associated with a relative change of predicted probability equal to 1.8310% (strategy 'X3'). As shown in Figure 7.10, other than not following any strategy, the differences in odds ratios among the strategies were trivial. Compared to following strategy 55, only following strategies 31, 51, or 52 would delay the progression to mucoid PaPI. The optimal outcome was achieved if the physician followed strategy 51. Considering the point estimates and confidence intervals, at the baseline visit, with an increase in predicted FEV1, a decrease in age, a decrease in severity of mutation class, or a decrease in the number of mucolytics a patient received, the odds of developing mucoid PaPI decreased. At the same time, Caucasian and Black patients had the lowest and highest odds of developing mucoid PaPI, respectively. The number of PEx in the past year had an inconsistent effect on the outcome: the odds increased at first, and decreased when the number of PEx was greater than 4. If a patient had drug resistance to aminoglycosides, the odds of developing mucoid PaPI increased. Surprisingly, if a patient had not been tested for drug resistance to quinolones, the odds of developing mucoid PaPI increased.
7.1.5.3 The Flexible Parameterization of the Dynamic Logistic MSM with SIPW
The results of the flexible model are presented in Table 7.15. After adjusting for the baseline variables and applying the SIPW, the results show that the assumption of constant-time hazards held. Compared to the effects in the 6th year, the effects in the first 2 years were not statistically significantly different. Even though the effects from the 3rd to the 5th year were statistically significant, the absolute impacts were trivial compared to the influences of the other variables. The maximum absolute difference of the coefficient, 0.4178, occurred in the 5th year. This difference was much smaller than the absolute difference of the coefficient on the interaction between strategy 1X and any year. When not considering the effect of strategy in a specific year, not following any strategy still caused the worst outcome. Compared to following strategy 55, which was the strictest strategy for defining rational treatment change, not following any strategy, on average, increased the odds ratio of developing mucoid PaPI by 1.4156 times. When taking only the interaction between strategy and year into consideration, the treatment effect of not following any treatment strategy ranked in the middle among all strategies in the same year. At the baseline visit, with an increase in predicted FEV1, a decrease in age, a decrease in severity of mutation class, and a decrease in the number of mucolytics a patient received, the odds of developing mucoid PaPI decreased. At the same time, Caucasian and Black patients had the lowest and highest odds of developing the outcome, respectively. The number of PEx in the past year had an inconsistent effect on the outcome: the odds increased at first, and decreased when the number of PEx was greater than 4. Similarly, for mutation 2, compared to class I, the chance of developing mucoid PaPI was higher in class III.
If a patient had drug resistance to aminoglycosides, the odds of developing mucoid PaPI increased. If a patient had not been tested for drug resistance to beta lactams or quinolones, the odds of developing mucoid PaPI decreased and increased, respectively.
7.1.5.4 The Time-dependent Cox Regression
A time-dependent Cox regression was also built to investigate the difference between following a strategy when changing treatment (specifically strategy 33, the first strategy identified) and changing treatment without following any strategy. The final model was identified based on the AIC value using stepwise regression. As shown in Table 7.16, the result of variable selection is consistent among the 10 imputed datasets. Table 7.17 presents the final result after combining the results from the 10 imputed datasets. Compared with following strategy 33, the chance of developing mucoid PaPI would be 2.84 times higher if a physician made a treatment change without following any strategy. This result was consistent with the result of the fixed parameterization of the dynamic logistic MSMs using the SIPW. Hemoptysis, MAI infection, and bronchodilator use were three variables that would significantly shorten the time to mucoid PaPI.
7.2 Discussions
The discussion section is organized as follows. The first part focuses on the innovations and successes of this objective. Then, all limitations are discussed. At the end, these results and their potential applications are summarized from three perspectives: 1) steering the design of RCTs; 2) directing the clinical practice; 3) supporting the design of value-based drug formulary.
7.2.1 Strengths
There are two innovations in the investigation of Aim 3. First, this is the first study to investigate the causality of different treatment change strategies. This addresses two complicated research questions at the same time: investigating dynamic treatment regimens and investigating treatment change.
Dynamic MSM is an advanced causal method, which was applied to investigate dynamic treatment changes. However, unlike traditional questions about dynamic treatment regimens, which investigate initiation, this study was able to investigate the effects of treatment switching by using the predicted probability of having a rational treatment change and related strategies. Moreover, this study innovatively embedded the regularization method into selecting variables for the prediction of IPW. The majority of the time, confounders and intermediate variables have to be well identified in order to appropriately investigate the causality between exposure and outcome. With the increase in the number of parameters and the sheer volume of data available, the chance of fully understanding the function of each variable in the dataset has dramatically decreased. This regularization method provides an opportunity to investigate causality with a fair amount of knowledge. Other than the innovations, one of the key successes was the ability to build the SIPW with a narrow range under this complicated scenario. Benefitting from the combination of data-driven and knowledge-driven methods, the prediction models for the numerator and denominator of the IPTW and IPCW balanced parsimony and accuracy of prediction at the same time. After truncating the stabilized weights that were beyond 10, the mean of SW decreased from 20.9848 to 1.0177 under strategy 53 in imputed dataset 2. At the same time, unlike the traditional prediction model of IPCW, which includes the indicator of receiving treatment as an independent variable, in this study the IPTW model included an indicator of being censored. In other words, rather than building IPCWs that were conditional on whether a patient received the treatment, in this study the IPTW was predicted conditional on whether or not the patient was being censored.
This change was made to embrace the uniqueness of dynamic treatment regimes, in which the artificially censored dataset would be identified after the normal censoring had already been adjusted. At the same time, the cause of extreme values of SIPW was investigated. The majority of the time, censoring was caused by the different rationales between predicting the probability of having a rational treatment change and predicting the probability of being artificially censored for the denominator of the IPTW. The first outcome was predicted by the difference in values that were measured between the previous and current visits for the same variables. However, the denominator of the IPTW may be determined by the difference in values that were measured between the baseline and current visits. Together with the issue that no visit was censored at the 1st visit, extreme values of SW could occur. For example, one patient had a high predicted FEV1 (200%) at the baseline visit; the value decreased dramatically to 61.58% at the 1st visit and remained around 80% in the following visits. At the first visit, the predicted probability of having a rational treatment change (0.5141) was much higher than the threshold. However, because of the missing value for the relative change of predicted probability, the visit was not censored. In the following visits, since the lung function barely fluctuated, the predicted probability of having a rational treatment change was lower than the cut-offs for the strategy, so no rational treatment change was needed. However, the inflated value of the predicted FEV1 at the baseline visit still affected the prediction of the denominator of the IPTW. Under that situation, the SIPW kept increasing exponentially from the baseline until the last visit. Even though the stabilized weight in each visit ranged only from 1.11 to 3.42, the final weight in each visit is a product that includes all previous visits, and some patients had as many as 20 visits.
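The compounding described above can be illustrated with a minimal sketch. The per-visit weight of 1.5 and the function names are hypothetical choices made for illustration (1.5 lies inside the 1.11 to 3.42 range quoted in the text); the cap of 10 is the truncation value used in this chapter. This is not the dissertation's code.

```python
from itertools import accumulate
from operator import mul

# Cap taken from the text: stabilized weights above 10 are truncated to 10.
TRUNCATION_CAP = 10.0


def cumulative_weights(per_visit_weights):
    """Return the running product of per-visit stabilized weights.

    The weight at visit t is the product of the per-visit weights
    from the first visit up to and including visit t.
    """
    return list(accumulate(per_visit_weights, mul))


def truncate(weights, cap=TRUNCATION_CAP):
    """Replace every weight above the cap with the cap itself."""
    return [min(w, cap) for w in weights]


# Hypothetical patient: 20 visits, each with a per-visit weight of 1.5.
per_visit = [1.5] * 20
cumulative = cumulative_weights(per_visit)
truncated = truncate(cumulative)

# Count the visits whose untruncated cumulative weight exceeds the cap.
n_extreme = sum(w > TRUNCATION_CAP for w in cumulative)
```

Even a modest per-visit weight compounds quickly: with the hypothetical value of 1.5 the cumulative weight passes 10 at the sixth visit, so most of this patient's visits would be affected by truncation.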
There is no doubt that these patients have extreme values of SW. Since truncating weights with extreme values decreases the chance that a small number of replicates have undue influence on the result of the analysis, those replicates with extreme values should not significantly bias the result. At the same time, the varied results among those four models directly illustrate the issues of building a dynamic MSM, and the fixed parameterization of the dynamic logistic MSMs with SIPW should be the final model in this study. As mentioned previously, the SIPW is more stable than the UIPW. Both the fixed parameterized model and the flexible parameterized model with SIPW indicated similar results: prescribing treatment without following any strategy would cause the worst outcome. However, compared with the fixed parameterized model, which identified an optimal strategy that did not depend on time, the identification of the optimal strategy is complicated for the flexible parameterized model: the optimal strategy varies in each calendar year. Considering the complexity of the optimal strategy, and the marginal benefit of applying the flexible parameterized model, the fixed parameterized model is preferred. Last but not least, compared with the time-dependent Cox regression, the fixed parameterized model was more likely to comprehensively adjust for the time-dependent confounders, which was supported by the results. In the time-dependent Cox regression, compared with physicians who followed strategy 33 to change prescriptions, the chance of developing mucoid PaPI would be 2.84 times higher if a physician made a treatment change without following any strategy. For the same comparison, the number decreased to 1.14 in the fixed parameterized model. Therefore, the results of the fixed parameterization of the dynamic logistic MSMs with SIPW are the key findings in this objective.
Specifically, this study suggests that physicians should make treatment changes following rational treatment change strategies. Otherwise, worse outcomes would occur. Compared to following a specific strategy, 55, the odds of developing mucoid PaPI would be 1.17 times higher for a patient whose treatment change did not follow any strategy. The optimal outcome would be achieved by following strategy 51: the physician should not provide a treatment change on the treatment class level if the predicted probability of having a rational treatment change is lower than 0.088 and the relative change of probability is lower than 0.222% between the current and the previous visit; if the probability is higher than 0.098 and the relative change of probability is higher than 0.222%, then the physician should change the treatment on the treatment class level. Generally speaking, these results are consistent with the concept of evidence-based medicine: treatment has to be changed if and only if it is supported by the clinical signals. However, given that there have been limited longitudinal studies of CF patients, the accuracy of this result is hard to prove directly. Moreover, the treatment effects of following varied DTRs to make treatment changes were not statistically different given their bootstrapped confidence intervals. More studies are needed before identifying the DTR that causes the optimal outcome.
7.2.2 Limitations
The analysis presented in this aim relies on the validity of the assumptions outlined in the method section of this dissertation. Unlike positivity, which was investigated by testing whether there was at least 1 patient in all potential scenarios, the assumptions of consistency and no unmeasured confounder are untestable. However, CFF accredited clinics and hospitals largely prevented the transmission of pathogens, such as Pseudomonas aeruginosa, among patients by following the Infection Prevention and Control Guideline167 and cohort segregation.
There was still a small chance that pathogen transmission existed, 0.018 per year for chronic infection with Pa.168 Under that situation, the assumption of consistency may be violated by interference between patients who received chronic treatments and those who did not. Fortunately, the rate was low and there is limited time for patient-patient interaction among pediatric CF patients. Therefore, the chance of violating the stable unit treatment value assumption, and thereby the assumption of consistency, would be low. In addition, the assumption that the artificial censorship and the censorship models used in the denominator of the weights are correctly specified is crucial for consistent estimates. To increase the possibility of correctly fitting the probability of artificial censoring and censoring, respectively, very rich models with large numbers of variables were applied, and variables that were selected in each model were included jointly. However, the direction and consistency of the effect estimates conflict with current knowledge for several variables in the prediction model of IPW, which may be problematic. For example, the more PEx that a patient had at the baseline visit, the higher the chance that the patient would receive a rational treatment change in predicting the numerator of the IPTW. However, when the number was greater than 4, the estimate decreased, and it even reversed when the number was equal to or greater than 5. Similarly, compared to mutation 2 class V, a patient whose mutation did not belong to any class or was missing had a higher value in predicting the numerator of the IPTW. This scenario could be explained by the tendency for physicians to make a rational treatment change according to other clinical signals when faced with an uncertain mutation type. However, without further information, the explanation is not certain.
If a patient had pancreatitis or MSSA infection at the current visit, the chance of having a rational treatment change decreased, which was opposite to our expectation. Last but not least, 25 strategies were investigated in this study. According to the result of the flexible parameterization of the dynamic logistic MSM, which is not smooth, perhaps other potential strategies should be investigated. At the same time, the identification of those strategies in Objective 2 may also bias the result, if there was any unmeasured confounder that caused an irrational treatment change to be classified as a rational treatment change given the clinical signals.
7.3 Applications
The results of this study are very likely to be generalizable to other samples with the same outcomes. The CFFPR is a nationwide patient registry that, since 1986, has been aimed at tracking treatment effects on and survival time transitions of CF patients. Considering the longitudinal and national characteristics of the CFFPR, the abundant variables measured in the database, and the prudent inclusion criteria, this study has good generalizability. At the same time, developing mucoid PaPI works as the indicator for disease progression, after which the chance of survival decreases dramatically. Using this indicator provides an alternative way of identifying treatment effects that does not require further adjustment for death. Given the above characteristics, the results of this study are stable and generalizable for several potential applications, which are described in the following sections.
7.3.1 Steering the Design of RCTs
In the current study, the observational data were applied to emulate the RCT, which investigated the DTR of treatment change that causes the optimal outcome. Even though the advanced method had been applied, those results have to be confirmed before being adopted into the guidelines and supporting future decision-making.
Unlike traditional RCTs, which compare efficacy among two or several interventions, this innovative RCT requires comparison of the efficacy of several DTRs. Since all DTRs are determined by the threshold of the predicted probability of having a rational treatment change and the threshold of the relative change of predicted probability of having a rational treatment change between the current and previous visit, without the results of this study, millions of potential DTRs would have to be compared in order to identify the optimal one. Considering the extreme expense of conducting an RCT and the sample size needed to generate enough power, the results of this study are invaluable, specifically for the following two conclusions. Patients who did not follow any regime for treatment changes had worse outcomes than patients who followed any other regime. With the increase of the threshold of relative change of predicted probability, the hazard ratio of developing mucoid PaPI increased and then decreased among patients who followed related DTRs. The regime in which the threshold of relative change of predicted probability equaled 1.831% always caused the worst outcome among regimes with the same threshold of predicted probability. Therefore, the main focus of designing an RCT is investigating the optimal threshold of the predicted probability of rational treatment change. If the project were funded with $1 million (probably enough to recruit only 200 patients), with the study's results, hypothetically just five DTRs would need to be compared to investigate the optimal DTR. Specifically, patients older than 6 years old and diagnosed with nonmucoid PaPI but not mucoid PaPI would be randomized into five DTRs. For those DTRs, the lower thresholds of the predicted probability of having a rational treatment change are 0.072, 0.076, 0.080, 0.084, and 0.088. The upper thresholds are 0.01 higher than the related lower thresholds.
At the same time, the threshold for the relative change of predicted probability of having a rational treatment change is consistent among those five DTRs: 0.222% and 3.440% for the lower and upper thresholds, respectively. Whenever both a patient's predicted probability and relative change of predicted probability of having a rational treatment change are higher than the related upper thresholds, he receives a rational treatment change. If both of those values are smaller than the related lower thresholds, he should not receive any rational treatment change. Prescribing an additional treatment from any one of three treatment classes (inhaled antibiotics, mucolytics, or anti-inflammatories) can be defined as a rational treatment change if it follows the previous rules. In the remaining scenarios, patients are considered to follow the rules regardless of whether additional treatment is prescribed. If the patient develops mucoid PaPI, receives a lung transplant, or dies, he will be censored. Generally speaking, this design balances the trade-off between sample size and the number of DTRs being investigated. The design specifically focuses on investigating the causality between using different thresholds of predicted probability to define DTRs and the time until mucoid PaPI develops. Hypothetically, if all 25 regimes were investigated, there would be only eight patients following each one of the regimes. Obviously, not enough power would be generated in this hypothetical trial. Using the same concept of DTR design, rather than applying a specific threshold of relative change for each regime, a relatively broad grace period is given: 0.222% to 3.440%. The determination of whether a patient follows a specific regime depends on whether the observed treatment change pattern is consistent with the threshold of the related regime.
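The decision rule of the proposed trial can be sketched in a few lines of Python. The threshold values come from the text; the class name, function names, and example inputs are hypothetical and introduced only for illustration, so this should be read as one possible encoding of the rule under those assumptions rather than as the study's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regime:
    """One hypothetical trial arm (DTR), defined by its thresholds."""
    prob_lower: float          # lower threshold of predicted probability
    prob_upper: float          # upper threshold of predicted probability
    rel_lower: float = 0.222   # lower threshold of relative change, in percent
    rel_upper: float = 3.440   # upper threshold of relative change, in percent


# Lower probability thresholds listed in the text; each upper threshold
# is 0.01 higher than the related lower threshold.
LOWER_PROB_THRESHOLDS = [0.072, 0.076, 0.080, 0.084, 0.088]
REGIMES = [Regime(lo, round(lo + 0.01, 3)) for lo in LOWER_PROB_THRESHOLDS]


def required_action(regime, prob, rel_change_pct):
    """Classify a visit as 'change', 'no_change', or 'either' (grace period)."""
    if prob > regime.prob_upper and rel_change_pct > regime.rel_upper:
        return "change"
    if prob < regime.prob_lower and rel_change_pct < regime.rel_lower:
        return "no_change"
    return "either"


def is_consistent(regime, prob, rel_change_pct, changed):
    """Check whether an observed prescribing decision follows the regime."""
    action = required_action(regime, prob, rel_change_pct)
    if action == "change":
        return changed
    if action == "no_change":
        return not changed
    return True  # inside the grace period, either decision follows the regime


def patients_per_arm(n_patients, n_arms):
    """Whole number of patients available per arm under equal allocation."""
    return n_patients // n_arms
```

With 200 patients, `patients_per_arm(200, 25)` gives the eight patients per regime mentioned above, while `patients_per_arm(200, 5)` gives 40 per regime for the five-arm design.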
Unlike the observational study, which created 25 replicates of each individual visit, only five replicates have to be created in the RCT, since the threshold of predicted probability is fixed among those five DTRs. Those five replicates are used only to investigate the optimal thresholds of the relative change, assuming the threshold of predicted probability is fixed. With the support of this design, even if the RCT enrolls only 200 patients, who are randomly assigned to one of the five regimes, the results can represent 1,000 patients after five replicates are created. On average, around 40 patients follow each of the 25 DTRs, which may generate enough power. In other words, the results of this study are invaluable, especially in supporting the design of RCTs.
7.3.2 Directing the Clinical Practice
With the identification of the optimal dynamic treatment regime, using the longitudinal data under the causal inference framework, physicians can use these results in the future to make treatment changes at the right time by following the optimal strategy. Using the optimal regime 51 as an example, the physician should not provide a treatment change on the treatment class level if the predicted probability of having a rational treatment change is lower than 0.088 and the relative change of predicted probability is lower than 0.222% between the current and previous visits; if the probability is higher than 0.098 and the relative change of predicted probability is higher than 0.222%, then the physician should change the treatment at the treatment class level. If the predicted probability and relative change of predicted probability fall in the remaining scenarios, the prescribing behavior is acceptable regardless of whether a treatment change is made.
At the same time, given unique demographic values, clinical variables, and treatment histories at the baseline visit and current visit, the physician can make personalized treatment change decisions for each patient confidently, rather than guessing whether the demographic and clinical characteristics of each individual patient match the inclusion criteria of the studies from which the guidelines were generated. With the application of the optimal dynamic rational treatment change strategy, both healthcare providers and patients are supported by more certain evidence when a treatment change decision has to be made. Therefore, the clinical outcome, time to mucoid PaPI, will be considerably delayed at the CF patient population level.
7.3.3 Supporting the Design of Value-based Drug Formulary
At the same time, the study results could also support value-based formulary design prior to reimbursement of extremely expensive medications by optimizing traditional treatment utilization through step therapy, tiered formulary, prior authorization, and other tools for managed care pharmacy. The drug formulary was initially designed in the early twentieth century to manage and control inventory, manage costs, and facilitate the purchasing process.169,170 As time passed, the drug formulary evolved into a negotiating tool with drug manufacturers. In order to design a drug formulary, drug review and formulary placement decisions have to be made based mainly on clinical safety and efficacy. Other than those two components, cost and rebate are other major factors for traditional cost-based formulary designs.169,170 Cheaper treatments, considering the sum of manufacturer price and rebate, are always listed in the lower tier with low or no copayments. Rather than applying cost as the third component, the value-based formulary ranks individual treatments in therapeutic areas according to comparative drug values171-173 and assigns them to related tiers.
Compared with traditional cost-based formulary design, the value-based formulary design reduces the annual cost of the health plan without negatively affecting healthcare utilization.173 With the successful identification of the dynamic treatment regime, the value-based formulary could be designed at the treatment class level: an additional treatment or a treatment switch will be reimbursed only if the prescription timing matches the threshold of the dynamic treatment regime. In such a value-based formulary, patients' lung function would be optimized so as to avoid or delay the need for extremely expensive treatments such as ivacaftor and ivacaftor/lumacaftor, unless the healthcare provider has already prescribed all the other treatments step by step (step therapy) and the scenario of suboptimal treatment effects has already occurred (prior authorization). Therefore, the annual cost of the health plan for CF patients could be well controlled without sacrificing healthcare utilization.

After several years of application, with improvements in patients' health and emerging treatments, a better strategy may be identified. At the same time, as the number of patients who follow the optimal strategy increases, the grace period will narrow. Therefore, every couple of years, a new iterative strategy will be identified with more certain evidence. After enough iterations, grace periods may eventually disappear, and an optimal strategy with a clear-edged threshold could be identified. Before the final optimal strategy is identified, insurance companies will redesign their formularies whenever the optimal strategy is updated. They will reimburse only those treatment changes that match the optimal strategy. In such a situation, this research could not only improve patients' health but also help control healthcare costs indirectly.
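The reimbursement logic outlined in this section can be sketched as a simple gate. This is an illustrative sketch under stated assumptions, not a formulary specification: the names `ChangeRequest` and `is_reimbursable` are hypothetical, the thresholds are those quoted earlier for the optimal regime (relative change assumed to be in percent), the traditional class names are taken from the study's tables, and "matching the regime" is assumed to mean that the request does not fall in the region where the regime advises against a change.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for the optimal regime (relative change in percent).
LOWER_PROB = 0.088
UPPER_PROB = 0.098
REL_CHANGE_PCT = 0.222

# Treatment classes as named in the study's tables; high-cost drugs as named in the text.
TRADITIONAL_CLASSES = frozenset(
    {"mucolytics", "inhaled antibiotics", "anti-inflammatories", "bronchodilators"}
)
HIGH_COST_DRUGS = frozenset({"ivacaftor", "ivacaftor/lumacaftor"})


@dataclass(frozen=True)
class ChangeRequest:
    requested: str                          # treatment class or drug being requested
    pred_prob: float                        # predicted probability at the current visit
    rel_change_pct: float                   # relative change since the previous visit
    classes_tried: frozenset = frozenset()  # traditional classes already prescribed


def is_reimbursable(req: ChangeRequest) -> bool:
    """Return True if the requested change is consistent with the regime."""
    forbids_change = (
        req.pred_prob < LOWER_PROB and req.rel_change_pct < REL_CHANGE_PCT
    )
    suboptimal = req.pred_prob > UPPER_PROB and req.rel_change_pct > REL_CHANGE_PCT
    if req.requested in HIGH_COST_DRUGS:
        # Step therapy (all traditional classes tried) plus prior authorization
        # (suboptimal treatment status reached), as assumed for this sketch.
        return TRADITIONAL_CLASSES <= req.classes_tried and suboptimal
    return not forbids_change
```

Under these assumptions, a request for ivacaftor is approved only when every traditional class has already been prescribed and both signals exceed their thresholds; a request for a traditional class is declined only when the regime advises against any change.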
7.4 Conclusions

The analysis undertaken in Aim 3 represents the first comparison of dynamic rational treatment change strategies for chronic treatment of pediatric CF patients using marginal structural models and inverse probability weighting. In summary, patients who do not follow a treatment-change regime have worse outcomes than those following any regime. Among the patients who followed different DTRs, the hazard ratio of developing mucoid PaPI first increased, then decreased, as the threshold of relative change of predicted probability increased. The regime in which the threshold of relative change of predicted probability equaled 1.831% always caused the worst outcomes compared with other regimes that shared the same threshold of predicted probability. An optimal strategy was identified among 25 strategies; this optimal strategy maximized the time to infection with mucoid PaPI and includes the following guidelines: the physician should not make a treatment change at the treatment class level if the predicted probability of having a rational treatment change is lower than 0.088 and the relative change of predicted probability between the current and previous visits is lower than 0.222%; if the probability is higher than 0.098 and the relative change of predicted probability is higher than 0.222%, then the physician should change the treatment at the treatment class level. If the probability ranges from 0.088 to 0.098, it is acceptable to either implement a treatment change or not. Generally speaking, these results are consistent with the concept of evidence-based medicine: treatment has to be changed if and only if the change is supported by the clinical signals.

Currently, several guidelines for chronic lung health maintenance treatments exist to recommend prescribing practices. However, rather than suggesting the order of prescription, the guidelines only categorize all treatments by the certainty of net benefits.
Additionally, that evidence was generated by existing RCTs with small sample sizes and extremely narrow patient characteristics that do not represent the whole patient population. With the identification of the optimal dynamic treatment regime from longitudinal data under a causal inference framework, physicians can use the results of this study in the future to make treatment changes at the right time by following the optimal strategy. At the same time, physicians can confidently make personalized treatment change decisions for each patient given the unique demographic values, clinical variables, and treatment histories at the baseline visit and current visit, rather than guessing whether the demographic and clinical characteristics of each individual patient match the inclusion criteria of the studies from which the guidelines were generated. With the application of the optimal dynamic rational treatment change strategy, both healthcare providers and patients are presented with clear signs when a treatment change decision has to be made. Therefore, the clinical outcome, time to mucoid PaPI, will be maximally delayed at the CF patient population level.

The only drawback is that the current study has inferred causality by emulating the design of an RCT rather than conducting a real RCT. However, the results of this study will help to design an RCT to investigate the causality between following different DTRs and a delay in developing mucoid PaPI. The results of the new RCT, in return, can confirm the evidence generated by this study.

At the same time, the study results could also support value-based formulary design by optimizing traditional treatment utilization (through step therapy, tiered formulary, prior authorization, and other tools for managed care pharmacy) prior to reimbursement of extremely expensive medications. After several years of application, with improvement in patient health and emerging treatments, a better strategy may be identified.
At the same time, with an increase in the number of patients following the optimal strategy, the grace period will narrow down. Therefore, every couple of years, a new iterative strategy will be identified with more certain evidence. After a number of iterations, grace periods may eventually disappear, and an optimal strategy with clear-edged thresholds could be identified. Insurance companies will then redesign their formularies whenever the optimal strategy is updated. They will reimburse only those treatment changes that match the optimal strategy. In this situation, this research can not only deliver the right therapy to the right patient at the right time but also at the right cost, indirectly controlling healthcare costs by optimizing traditional treatments and delaying the use of innovative yet expensive treatments. Table 7.1. The final variable selection for IPW. X X 1 1 X 1 X 1 X 1 0 0 1 1 1 1 1 1 1 1 X X 0 0 X X 0 0 0 X X 0 0 0 X X X X X X X X X X 0 0 X X 0 0 0 0 0 0 0 0 X X X X X X X X 1 0 0 0 1 0 0 0 0 X X 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 X X 1 1 X X 0 0.28 0 X X 0 0 0 X X X X X X X X 1 0 0 0 1 1 0.48 1 0 X X 0 0.52 1 1 0.16 0.44 0 0 0 1 1 X X 1 1 X X 0 1 0 X X 0 0 1 X X X X X X X X 1 1 1 1 1 1 0 1 0 X X 0 0 1 1 1 0 1 0 1 1 1 X X 1 1 X X 0 1 0 X X 0 0 1 X X X X X X X X 1 1 1 1 1 1 0 1 0 X X 0 0 1 1 1 0 1 0 1 1 1 289 Variable (Intercept) Age (baseline) Predicted FEV1 in current visit Predicted FEV1 in current visit (baseline) Height Weight Height (baseline) Weight (baseline) Number of visit (spline) Number of visit Mutation 1 class Mutation 2 class F508 Disenrollment Death Hispanic Gender Race Smoking Transplant Arthropathy CFRD_status DIOS GERD Pancreatic insufficiency Pancreatitis IPCW (censor) Numerator 1 0 Probability of being selected Final decision Result from LASSO IPCW IPCW (censor) IPTW IPTW IPCW (end) IPCW (end) (censor) Numerator Denominator Denominator Numerator Denominator Numerator Denominator Numerator 
Denominator 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 Table 7.1. (continued). Variable X X 1 1 X 0 X 1 X 1 X X X X 0 0 0 0 X X 0.4 1 X X 0 1 X X 0 1 X X 0 0 X 0 X 0 X 0 X X 0 0 X 0.16 X 1 X 1 X X 0 0 X 0.6 X 1 X 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 290 TB Pneumothorax Hemoptysis Using any enzymes ABPA Aspergillus B. cepacia B. cenocepacia Burkholderia species Candida Mycobacterium gordonae MAI MRSA MSSA Other gram-negative microorganisms Serratia marcescens Staphylococcus aureus Stenotrophomonas/ Maltophilia Non-mucoid Pa PI Unknown type of mucoid Pa PI Smoking (baseline) Transplant (baseline) Arthropathy (baseline) CFRD_status (baseline) IPCW (censor) Numerator X X X X X X X X X X X X X X Probability of being selected Result from LASSO Final decision IPCW IPTW IPCW (censor) IPTW IPCW (end) IPCW (end) (censor) Numerator Denominator Denominator Numerator Denominator Numerator Denominator Numerator Denominator X 0 0 X X X 0 X X X 0 0 X 0.32 X 0 X 0 X 1 0 X 0 X 1 X 1 X 1 1 X 0 X 1 X 1 X 0 0 X 0.84 X 1 X 1 X 0 0 X 1 X 1 X 1 X 1 1 X 1 X 1 X 1 X 0 0 X 0.2 X 0 X 0 X 0 0 X 0 X 0 X 0 X 1 1 X 0 X 1 X 1 X 0 0 X 0 X 0 X 0 X 1 1 X 0 X 1 X 1 X 0 0 X 0 X 1 X 1 X 0 0 X 1 X 1 X 1 Table 7.1. (continued). Variable DIOS (baseline) GERD (baseline) Pancreatitis (baseline) Hemoptysis (baseline) Using any enzymes (baseline) ABPA (baseline) Aspergillus (baseline) B. cepacia (baseline) B. 
cenocepacia (baseline) Burkholderia species (baseline) Candida (baseline) MAI (baseline) MRSA (baseline) MSSA (baseline) Other gram-negative microorganisms (baseline) Serratia marcescens (baseline) Staphylococcus aureus (baseline) Stenotrophomonas/ Maltophilia (baseline) Non-mucoid Pa PI (baseline) IPCW (censor) Numerator 0 0 0 X Probability of being selected Result from LASSO Final decision IPCW IPTW IPCW (censor) IPTW IPCW (end) IPCW (end) (censor) Numerator Denominator Denominator Numerator Denominator Numerator Denominator Numerator Denominator 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 X X X X X X X X X 0 0 0 0 0 0 0 0 0 0 0 0 0 X 0 0 0 X 0 0 1 1 0 0 1 1 0 0 0 X 0 0 0 X 0 0 1 0 0 0 1 0 0 0 1 X 0 0 1 X 0 0 0 0 0 0.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0.28 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0.04 0 0 0 0 0 0 0 0 0 0.04 0 0 0 0 0 1 1 1 0 0 1 1 1 1 291 Table 7.1. (continued). 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.32 0.76 1 1 1 1 X X 0 0 X 1 X 1 X 1 0 0 1 1 0 0.96 1 1 1 1 0 0 0 1 0 0.6 1 1 1 1 0 0 0 0 0 1 1 1 1 1 X X 0 0 X 1 X 1 X 1 X X 0 0 X 1 X 1 X 1 X X 0 0 X 0.76 X 1 X 1 X X X X X X X X X X X X X X X X X X X X X X X X 1 1 1 1 1 1 X X X X X X X X X X X X X X 1 1 1 1 1 1 292 Variable Unknown type of mucoid Pa PI (baseline) Number of PEx in the past year in current visit (baseline) Number of PEx in the past year in current visit Drug resistance of aminoglycosides in current visit (baseline) Drug resistance of beta lactams in current visit (baseline) Drug resistance of quinolones in current visit (baseline) Drug resistance of aminoglycosides in current visit Drug resistance of beta lactams in current visit Drug resistance of quinolones in current visit Mucolytics Inhaled antibiotics Anti-inflammatories Bronchodilators Mucolytics (baseline) IPCW (censor) Numerator Probability of being selected Final decision Result from LASSO IPCW IPTW IPCW (censor) IPTW IPCW (end) IPCW (end) (censor) Numerator 
Denominator Denominator Numerator Denominator Numerator Denominator Numerator Denominator

Table 7.1. (continued).
Variable Inhaled antibiotics (baseline) Anti-inflammatories (baseline) Bronchodilators (baseline) IPCW (censor) Numerator Probability of being selected Result from LASSO Final decision IPCW IPTW IPCW (censor) IPTW IPCW (end) IPCW (end) (censor) Numerator Denominator Denominator Numerator Denominator Numerator Denominator Numerator Denominator
X X X X 0.08 0 X X 0 0 X X X X 1 1 X X 1 1 X X X X 0 0 X X 0 0

Red font indicates that the change was made to match the baseline variable that was selected in this model;
Pink font indicates that the change was made to match the baseline variable in the denominator;
Green font indicates that the change was made to match the variable in the numerator;
Blue font indicates that the change was made to match the variable in the IPCW;
Orange font indicates that the change was made to match the variable in the IPTW.

Table 7.2. The estimate of variables in the numerator of the IPTW under three strategies using imputed dataset 2.
Variable Strategy 53 Point 95% Wald Estimate Confidence Limits <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 0.982 0.952 1.012 0.979 0.949 1.010 0.984 0.953 1.015 0.993 0.991 0.995 0.993 0.991 0.995 0.993 0.991 0.995 0.986 0.979 0.993 0.985 0.978 0.993 0.985 0.977 0.992 1.006 0.998 1.013 1.007 0.999 1.014 1.007 0.999 1.015 1.861 1.341 2.583 2.004 1.429 2.810 2.097 1.478 2.975 1.849 1.344 2.546 1.959 1.408 2.725 2.125 1.509 2.992 1.640 1.138 2.363 1.799 1.235 2.619 1.918 1.302 2.824 1.082 0.698 1.678 1.160 0.741 1.815 1.271 0.804 2.010 Reference 1.775 1.274 2.473 1.865 1.325 2.625 1.943 1.364 2.767 1.546 1.075 2.223 1.661 1.144 2.413 1.822 1.241 2.675 1.361 0.713 2.598 1.330 0.700 2.525 1.229 0.647 2.334 1.421 0.724 2.788 1.264 0.646 2.470 1.185 0.606 2.319 1.230 0.588 2.575 1.339 0.643 2.791 1.233 0.591 2.575 7.924 2.524 0.756 8.422 Reference Reference 2.276 0.682 7.592 2.373 0.711 294 Censor Age Predicted FEV1 in current visit Height Weight Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Transplant status: No Had Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Table 7.2. (continued). Variable Transplant status: Will have CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD B. 
cepacia MRSA Other gram-negative microorganisms Non-mucoid Pa PI Number of PEx in the past year in current visit: 0 1 2 3 4 5 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits 0.896 0.314 2.556 0.965 0.338 Strategy 53 Point 95% Wald Estimate Confidence Limits 2.754 1.011 0.354 2.884 Reference 1.060 0.647 1.738 1.159 0.707 1.901 1.289 0.785 2.116 1.181 0.968 1.440 1.174 0.959 1.436 1.213 0.991 1.485 0.967 0.863 1.083 0.993 0.886 1.114 1.029 0.917 1.156 3.100 0.335 28.664 3.455 0.372 32.055 3.644 0.394 33.711 1.022 0.909 1.150 1.048 0.931 1.180 1.067 0.946 1.202 1.250 0.774 2.018 1.344 0.833 2.170 1.330 0.816 2.168 1.127 0.932 1.362 1.168 0.963 1.417 1.124 0.928 1.362 Reference 1.070 0.962 1.190 1.115 1.001 1.241 1.129 1.013 1.259 1.168 0.954 1.431 1.208 0.985 1.482 1.227 1.000 1.505 1.685 1.217 2.331 1.635 1.181 2.262 1.562 1.124 2.172 1.120 0.613 2.045 1.202 0.660 2.189 0.973 0.528 1.794 0.935 0.366 2.389 0.973 0.382 2.481 1.009 0.396 2.574 295 Table 7.2. (continued). Variable Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics: 0 1 2 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 1.792 1.214 2.647 1.784 1.209 2.633 1.802 1.219 2.663 0.785 0.145 4.266 0.963 0.193 4.808 0.993 0.191 5.155 Reference 0.533 0.282 1.010 0.578 0.306 1.089 0.572 0.302 1.085 0.417 0.047 3.688 0.822 0.099 6.802 0.729 0.082 6.442 1.017 0.550 1.881 1.006 0.538 1.881 1.122 0.605 2.079 2.743 0.569 13.231 1.169 0.209 6.528 1.227 0.212 7.085 Reference Reference 0.995 0.913 1.084 0.956 0.876 1.042 0.926 0.848 1.010 0.549 0.472 0.639 0.528 0.452 0.616 0.512 0.437 0.600 296 Table 7.2. (continued). 
Variable Anti-inflammatories: 0 1 2 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 0.611 0.540 0.692 0.606 0.534 0.688 0.608 0.534 0.691 0.576 0.332 0.999 0.630 0.363 1.093 0.564 0.319 0.996 All variables were measured at the baseline 297 Table 7.3. The estimate of variables in the numerator of the IPTW under three strategies using imputed dataset 8. Variable Point Estimate Strategy 53 95% Wald Confidence Limits <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 0.992 0.962 1.023 0.992 0.961 1.023 0.990 0.959 1.021 0.993 0.991 0.995 0.993 0.991 0.995 0.994 0.992 0.996 0.986 0.979 0.993 0.985 0.978 0.992 0.985 0.978 0.992 1.004 0.996 1.011 1.004 0.996 1.011 1.004 0.997 1.012 2.052 1.463 2.876 2.199 1.550 3.119 2.354 1.637 3.385 2.002 1.440 2.785 2.136 1.517 3.007 2.335 1.636 3.332 1.816 1.249 2.641 2.001 1.360 2.944 2.203 1.480 3.281 1.244 0.798 1.938 1.334 0.847 2.101 1.476 0.926 2.354 Reference 1.944 1.381 2.736 2.037 1.430 2.901 2.096 1.452 3.026 1.761 1.214 2.555 1.913 1.304 2.806 2.027 1.364 3.012 1.351 0.709 2.572 1.288 0.679 2.445 1.186 0.624 2.255 1.344 0.686 2.633 1.177 0.602 2.300 1.090 0.557 2.134 1.238 0.591 2.593 1.375 0.660 2.867 1.290 0.617 2.697 12.478 3.714 1.060 13.012 Reference Reference 3.490 0.997 12.209 3.555 1.013 298 Censor Age Predicted FEV1 in current visit Height Weight Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Transplant status: No Had Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Table 7.3. (continued). Variable Transplant status: Will have CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD B. 
cepacia MRSA Other gram-negative microorganisms Non-mucoid Pa PI Number of PEx in the past year in current visit: 0 1 2 3 4 5 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits 0.907 0.318 2.588 0.970 0.340 Point Estimate Strategy 53 95% Wald Confidence Limits 2.766 1.017 0.356 2.900 Reference 1.071 0.653 1.755 1.093 0.659 1.815 1.207 0.726 2.006 1.146 0.938 1.399 1.141 0.932 1.396 1.191 0.972 1.458 0.983 0.877 1.101 1.006 0.896 1.128 1.041 0.927 1.169 3.205 0.349 29.414 3.520 0.382 32.482 3.739 0.405 34.482 1.009 0.897 1.136 1.055 0.937 1.189 1.076 0.954 1.213 1.273 0.789 2.055 1.352 0.838 2.181 1.350 0.828 2.201 1.192 0.985 1.441 1.205 0.992 1.464 1.156 0.951 1.404 Reference 1.067 0.959 1.187 1.108 0.995 1.234 1.130 1.013 1.259 1.174 0.958 1.439 1.162 0.946 1.428 1.200 0.977 1.475 1.683 1.215 2.332 1.756 1.269 2.430 1.688 1.215 2.346 1.145 0.627 2.089 1.197 0.657 2.178 1.234 0.678 2.246 0.946 0.370 2.419 1.025 0.401 2.623 1.061 0.415 2.714 299 Table 7.3. (continued). Variable Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics: 0 1 2 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Confidence Limits Estimate Confidence Limits Estimate Point Estimate Strategy 53 95% Wald Confidence Limits Reference 1.807 1.223 2.671 1.766 1.195 2.609 1.794 1.212 2.656 0.814 0.150 4.423 0.973 0.194 4.881 1.025 0.197 5.340 0.579 0.309 1.086 0.645 0.344 1.208 0.643 0.341 1.212 0.414 0.047 3.678 0.812 0.097 6.802 0.741 0.083 6.593 Reference Reference 1.011 0.548 1.863 0.980 0.526 1.828 1.090 0.589 2.018 2.804 0.581 13.523 1.192 0.212 6.700 1.204 0.207 6.994 Reference 0.969 0.889 1.056 0.928 0.851 1.012 0.884 0.810 0.965 0.539 0.462 0.628 0.512 0.438 0.598 0.490 0.417 0.575 300 Table 7.3. (continued). 
Variable Anti-inflammatories: 0 1 2 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Point Estimate Strategy 53 95% Wald Confidence Limits Reference 0.615 0.543 0.697 0.612 0.539 0.695 0.636 0.559 0.723 0.593 0.342 1.028 0.646 0.372 1.120 0.576 0.326 1.019 * All variables were measured at the baseline 301 Table 7.4. The estimate of variables in the denominator of the IPTW under three strategies using imputed dataset 2. Variable 1.018 1.014 1.021 1.018 1.015 1.021 1.018 1.015 1.021 1.020 1.004 1.005 0.993 1.035 1.015 1.019 1.004 1.005 0.993 1.034 1.015 1.013 1.004 0.999 0.994 1.028 1.016 1.829 1.794 1.796 0.964 1.284 1.269 1.211 0.606 2.607 2.535 2.663 1.534 1.980 1.935 2.002 1.083 2.853 2.766 3.002 1.738 2.101 2.110 2.146 1.207 1.444 1.461 1.416 0.746 3.058 3.048 3.250 1.955 1.626 1.219 1.141 0.746 2.317 1.992 1.374 1.353 1.336 0.675 Reference 1.731 1.202 1.287 0.779 2.494 2.126 1.820 1.552 1.250 0.928 2.650 2.595 0.737 0.557 0.780 0.355 0.261 0.342 1.529 1.190 1.777 0.869 0.599 0.995 1.795 1.273 2.259 0.763 0.539 0.811 0.372 0.255 0.359 1.565 1.139 1.836 4.506 Reference 1.064 0.191 5.926 1.097 0.196 6.137 0.781 0.135 0.421 0.282 0.439 Reference 302 Censor Age* Predicted FEV1 in current visit* Height* Weight* Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Transplant status: No Had Odds Ratio Estimates Strategy 33 Strategy 43 Strategy 53 Point 95% Wald Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Estimate Confidence Limits <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 0.899 0.869 0.930 0.898 0.868 0.929 0.904 0.874 0.935 Table 7.4. (continued). Variable Transplant status: Will have CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD B. 
cepacia* MRSA* Other gram-negative microorganisms* Non-mucoid Pa PI* Number of PEx in the past year in current visit: 0 1 2 3 4 5 Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate 0.666 0.189 Strategy 53 Point 95% Wald Estimate Confidence Limits 2.341 0.849 0.240 3.002 0.893 0.253 3.156 1.768 1.039 0.574 1.882 0.927 0.516 1.666 Reference 0.982 0.546 0.821 0.596 1.131 0.833 0.604 1.149 0.842 0.610 1.161 0.879 0.107 0.963 0.716 0.006 0.800 1.079 1.868 1.159 0.859 0.098 1.010 0.700 0.006 0.839 1.053 1.709 1.216 0.845 0.138 0.995 0.689 0.008 0.825 1.036 2.363 1.199 1.373 0.754 2.498 1.372 0.756 2.490 1.334 0.727 2.448 1.402 1.065 1.846 1.574 1.195 2.074 1.605 1.221 2.111 0.746 0.696 0.748 1.054 1.317 Reference 0.693 0.607 0.572 0.445 0.478 0.314 0.617 0.294 0.429 0.141 0.790 0.735 0.729 1.295 1.300 0.698 0.582 0.441 0.495 0.441 0.611 0.453 0.288 0.237 0.144 0.796 0.748 0.674 1.036 1.346 0.655 0.542 0.488 0.503 0.433 0.574 0.422 0.318 0.240 0.143 303 Table 7.4. (continued). Variable Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics: 0 1 2 Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate 1.465 7.393 0.907 0.609 Strategy 53 Point 95% Wald Estimate Confidence Limits 2.366 89.816 Reference 1.588 0.985 9.748 0.824 2.562 115.259 1.665 10.491 1.033 0.922 2.682 119.383 Reference 0.543 0.253 3.150 0.110 1.166 90.239 0.522 2.944 0.244 0.103 1.117 84.061 0.525 1.979 0.244 0.072 1.131 54.655 1.003 0.145 0.483 0.008 2.082 2.778 0.932 0.077 Reference 0.447 0.004 1.944 1.664 0.990 0.080 0.477 0.004 2.054 1.713 1.202 1.646 1.079 1.335 1.339 2.029 1.148 1.624 Reference 1.030 1.313 1.280 2.009 1.120 1.518 1.004 1.224 1.250 1.882 304 Table 7.4. 
(continued). Variable Anti-inflammatories: 0 1 2 Predicted FEV1 in current visit Height Weight Number of visit (spline) Number of visit Smoking: No Yes Unknown Transplant status: No Had Will have CFRD status: No Impaired glucose tolerance Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Reference 0.963 0.811 0.965 0.451 1.012 0.706 0.853 0.332 1.200 1.499 0.959 0.956 0.962 0.959 0.978 1.003 0.999 1.282 0.966 0.993 0.998 1.242 0.991 1.013 0.999 1.323 1.311 0.973 1.098 0.575 0.550 1.064 0.771 0.839 Strategy 53 Point 95% Wald Estimate Confidence Limits 1.144 2.067 0.993 0.733 0.835 0.338 1.183 1.589 0.955 0.962 0.959 0.956 0.963 0.980 0.967 0.993 0.985 0.973 0.998 1.002 0.993 1.012 1.001 0.992 1.011 0.999 0.999 0.999 0.999 0.999 0.999 1.246 1.207 1.285 1.217 1.180 1.256 1.565 1.648 1.292 1.080 1.545 1.277 1.068 1.528 1.086 0.647 1.822 1.004 0.608 1.659 0.206 0.221 1.473 5.130 0.795 0.330 1.914 0.730 0.301 1.770 1.119 0.260 4.813 1.007 0.232 4.370 0.597 0.597 0.997 1.178 0.806 0.624 1.041 0.774 0.599 1.000 0.903 0.643 1.269 0.889 0.633 1.249 Reference Reference 305 Table 7.4. (continued). Variable CFRD status: CFRD with or without fasting hyperglycemia GERD Pancreatic insufficiency Pancreatitis Hemoptysis Using any enzymes ABPA Aspergillus B. 
cepacia Candida MAI MRSA MSSA Other gram-negative microorganisms Staphylococcus aureus Non-mucoid Pa PI Unknown type of mucoid Pa PI Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 1.108 0.881 0.434 0.902 1.164 1.261 1.266 11.274 1.001 1.881 0.994 0.750 0.927 0.631 0.238 0.189 0.963 1.014 1.114 5.128 0.855 0.897 0.857 0.622 1.323 1.229 0.790 4.312 1.406 1.567 1.440 24.787 1.171 3.945 1.152 0.904 1.002 0.735 1.021 0.662 0.576 1.172 0.982 1.399 1.249 1.048 1.489 0.859 0.613 1.204 0.930 0.662 1.307 0.418 0.230 0.762 0.443 0.243 0.806 0.779 0.171 3.549 0.848 0.185 3.881 1.186 0.980 1.436 1.162 0.959 1.408 1.357 1.094 1.684 1.214 0.977 1.508 1.320 1.160 1.501 1.347 1.183 1.533 12.154 5.602 26.371 10.009 4.736 21.156 0.960 0.820 1.125 0.951 0.811 1.115 1.550 0.717 3.352 1.076 0.463 2.496 1.002 0.864 1.162 0.991 0.853 1.151 0.787 0.652 0.949 0.753 0.623 0.910 1.367 1.064 0.785 1.443 1.048 0.771 1.424 0.828 0.522 1.259 0.838 0.950 0.770 1.173 0.999 0.808 1.235 0.604 0.478 0.763 0.548 0.435 0.690 0.411 0.807 0.555 0.396 0.778 0.540 0.387 0.753 306 Table 7.4. (continued). 
Variable Number of PEx in the past year in current visit: 0 1 2 3 4 5 Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate 3.272 4.483 8.026 3.926 4.249 1.263 0.017 0.625 0.262 2.903 3.651 5.708 2.119 2.050 0.930 <0.001 0.377 0.013 3.688 5.505 11.284 7.272 8.804 1.717 0.423 1.037 5.430 Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 3.128 2.774 3.526 3.231 2.868 3.641 4.465 3.638 5.481 4.399 3.585 5.399 8.045 5.738 11.280 7.951 5.678 11.136 3.495 1.893 6.450 3.836 2.082 7.069 3.995 1.938 8.235 4.227 2.025 8.823 Reference 1.147 0.845 1.555 1.048 0.772 1.423 0.023 <0.001 0.529 0.021 <0.001 0.504 Reference 0.644 0.385 1.074 0.708 0.427 1.172 0.219 0.010 4.591 0.211 0.010 4.416 307 Table 7.4. (continued). Variable Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics: 0 1 2 Inhaled antibiotics: 0 1 2 3 Anti-inflammatories: 0 1 2 Bronchodilators: 0 1 2 Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Reference 1.477 54.302 0.914 3.862 2.385 763.538 0.626 0.151 0.546 0.124 0.717 0.184 0.622 0.148 0.407 0.221 0.016 0.369 0.168 0.002 0.449 0.290 0.141 0.406 0.419 0.624 0.368 0.384 0.477 1.014 0.616 0.693 0.675 0.882 1.507 0.935 2.428 1.638 1.019 2.633 43.272 3.000 624.226 41.581 2.925 591.078 0.543 0.713 0.577 0.504 0.662 0.121 0.180 0.148 0.121 0.180 0.368 0.448 0.416 0.377 0.460 0.182 0.137 0.243 0.191 0.143 0.254 0.018 0.002 0.155 0.022 0.003 0.181 Reference Reference Reference 0.441 0.388 0.503 0.428 0.376 0.488 0.515 0.313 0.848 0.568 0.349 0.924 Reference 0.621 0.566 0.682 0.629 0.573 0.690 0.718 0.564 0.914 0.719 0.563 0.917 308 Variables were measured at the baseline 0.562 0.545 Strategy 53 Point 95% Wald Estimate 
Confidence Limits Table 7.5. The estimate of variables in the denominator of the IPTW under three strategies using imputed dataset 8. Variable Strategy 53 Point 95% Wald Estimate Confidence Limits <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 0.906 0.876 0.937 0.908 0.877 0.939 0.909 0.879 0.941 1.017 1.014 1.020 1.017 1.014 1.021 1.017 1.014 1.021 1.022 1.007 1.038 1.021 1.006 1.036 1.014 1.000 1.028 1.000 0.990 1.010 0.999 0.989 1.010 1.000 0.989 1.010 2.012 1.398 2.895 2.161 1.483 3.149 2.390 1.621 3.523 1.922 1.346 2.745 2.091 1.446 3.024 2.349 1.605 3.437 1.956 1.307 2.927 2.196 1.449 3.327 2.491 1.627 3.815 1.143 0.715 1.829 1.288 0.798 2.079 1.468 0.900 2.396 Reference 1.767 1.229 2.543 1.877 1.289 2.733 1.981 1.343 2.922 1.447 0.879 2.384 1.503 0.902 2.506 1.745 1.033 2.948 0.783 0.379 1.618 0.846 0.408 1.753 0.721 0.351 1.485 0.580 0.273 1.233 0.578 0.271 1.234 0.496 0.234 1.052 0.902 0.397 2.049 1.119 0.491 2.546 0.937 0.413 2.128 4.761 0.829 0.147 4.678 Reference Reference 0.550 0.096 3.158 0.842 0.149 309 Censor Age Predicted FEV1 in current visit* Height* Weight* Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Transplant status: No Had Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Table 7.5. (continued). Variable Transplant status: Will have CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD B. 
cepacia* MRSA* Other gram-negative microorganisms* Non-mucoid Pa PI* Number of PEx in the past year in current visit: 0 1 2 3 4 5 Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate 0.909 0.264 3.133 0.980 0.281 Strategy 53 Point 95% Wald Estimate Confidence Limits 3.411 1.015 0.289 3.568 Reference 0.966 0.537 1.738 0.952 0.521 1.739 1.020 0.556 1.871 0.845 0.613 1.165 0.846 0.613 1.169 0.896 0.649 1.236 0.871 0.709 1.071 0.857 0.698 1.053 0.837 0.682 1.028 0.104 0.006 1.876 0.106 0.006 1.891 0.144 0.008 2.531 0.975 0.810 1.173 1.037 0.861 1.248 1.017 0.843 1.226 1.315 0.724 2.389 1.267 0.699 2.297 1.286 0.701 2.359 1.473 1.118 1.940 1.592 1.206 2.101 1.624 1.232 2.141 Reference 0.647 0.568 0.738 0.681 0.597 0.777 0.700 0.613 0.799 0.537 0.418 0.690 0.563 0.437 0.725 0.591 0.459 0.760 0.465 0.302 0.717 0.494 0.322 0.759 0.462 0.300 0.713 0.529 0.251 1.116 0.619 0.292 1.313 0.622 0.295 1.312 0.447 0.144 1.382 0.487 0.157 1.516 0.509 0.162 1.597 310 Table 7.5. (continued). Variable Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics: 0 1 2 Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 1.489 0.922 2.404 1.590 0.984 2.569 1.689 1.047 2.726 6.757 0.546 83.696 8.727 0.729 104.464 9.872 0.853 114.307 Reference 0.493 0.231 1.054 0.559 0.261 1.198 0.528 0.247 1.130 1.306 0.052 32.630 2.345 0.089 61.708 2.143 0.083 55.167 1.032 0.501 2.127 0.907 0.438 1.879 0.944 0.458 1.945 0.249 0.017 3.589 0.115 0.007 1.956 0.119 0.007 1.968 1.177 1.056 1.311 1.121 1.005 1.249 1.076 0.965 1.200 1.593 1.291 1.965 1.509 1.219 1.867 1.430 1.153 1.775 Reference Reference 311 Table 7.5. 
(continued). Variable Anti-inflammatories: 0 1 2 Predicted FEV1 in current visit Height Weight Number of visit (spline) Number of visit Smoking: No Yes Unknown Transplant status: No Had Will have CFRD status: No Impaired glucose tolerance Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 1.014 0.855 1.204 0.962 0.809 1.143 1.021 0.858 1.216 0.740 0.351 1.560 1.099 0.510 2.365 0.762 0.351 1.653 0.959 0.956 0.963 0.960 0.956 0.963 0.960 0.957 0.963 0.976 0.963 0.989 0.978 0.965 0.991 0.985 0.973 0.998 1.005 0.996 1.015 1.004 0.995 1.014 1.004 0.995 1.014 0.999 0.998 0.999 0.999 0.999 0.999 0.999 0.999 0.999 1.283 1.243 1.324 1.247 1.209 1.287 1.214 1.177 1.252 1.220 1.024 1.454 1.205 1.009 1.438 1.232 1.031 1.472 0.762 0.453 1.281 0.873 0.525 1.453 0.833 0.507 1.367 Reference 0.882 0.310 2.515 0.798 0.329 1.932 0.723 0.296 1.768 2.717 0.591 12.492 1.553 0.387 6.232 1.421 0.353 5.716 Reference 0.801 0.620 1.035 0.828 0.640 1.071 0.824 0.638 1.066 0.831 0.592 1.166 0.866 0.615 1.219 0.855 0.608 1.203 312 Table 7.5. (continued). Variable CFRD status: CFRD with or without fasting hyperglycemia GERD Pancreatic insufficiency Pancreatitis Hemoptysis Using any enzymes ABPA Aspergillus B. 
cepacia Candida MAI MRSA MSSA Other gram-negative microorganisms Staphylococcus aureus Non-mucoid Pa PI Unknown type of mucoid Pa PI Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 1.123 0.938 1.343 1.182 0.989 1.412 1.268 1.063 1.513 0.934 0.668 1.306 0.889 0.633 1.251 0.952 0.675 1.342 0.393 0.211 0.730 0.376 0.202 0.700 0.384 0.207 0.712 0.915 0.191 4.387 0.789 0.172 3.607 0.891 0.193 4.108 1.189 0.983 1.438 1.204 0.993 1.459 1.141 0.941 1.383 1.226 0.985 1.527 1.319 1.062 1.639 1.228 0.988 1.526 1.272 1.118 1.446 1.329 1.167 1.512 1.344 1.181 1.531 12.637 5.585 28.593 12.278 5.548 27.171 9.954 4.613 21.483 0.987 0.844 1.155 0.947 0.809 1.109 0.935 0.798 1.096 1.789 0.855 3.744 1.467 0.680 3.166 1.036 0.448 2.400 1.003 0.865 1.164 1.000 0.862 1.161 0.996 0.858 1.157 0.772 0.640 0.930 0.798 0.661 0.962 0.750 0.621 0.906 1.038 0.762 1.415 1.098 0.811 1.487 1.080 0.796 1.467 0.973 0.789 1.200 0.925 0.750 1.141 0.975 0.789 1.205 0.679 0.536 0.861 0.629 0.498 0.796 0.558 0.443 0.704 0.570 0.407 0.800 0.569 0.406 0.797 0.540 0.388 0.754 313 Table 7.5. (continued). 
Variable Number of PEx in the past year in current visit: 0 1 2 3 4 5 Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 3.326 2.950 3.750 3.226 2.861 3.637 3.212 2.850 3.619 4.486 3.653 5.510 4.466 3.636 5.485 4.339 3.537 5.322 9.116 6.502 12.782 8.489 6.056 11.899 8.353 5.972 11.684 3.925 2.110 7.301 3.491 1.879 6.484 3.646 1.961 6.779 3.803 1.730 8.361 4.106 1.905 8.853 4.015 1.819 8.861 Reference 1.274 0.938 1.729 1.164 0.859 1.578 1.061 0.782 1.440 0.023 <0.001 0.525 0.029 0.001 0.635 0.027 0.001 0.578 Reference 0.703 0.429 1.150 0.667 0.403 1.103 0.755 0.461 1.236 0.355 0.018 7.037 0.276 0.014 5.497 0.267 0.014 5.152 314 Table 7.5. (continued). Variable Drug resistance of quinolones in current visit: No Yes Testing not done Mucolytics: 0 1 2 Inhaled antibiotics: 0 1 2 3 Anti-inflammatories: 0 1 2 Bronchodilators: 0 1 2 Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 1.473 0.918 2.364 1.611 1.009 2.574 1.773 1.115 2.819 30.683 3.206 293.669 27.957 2.750 284.246 26.952 2.727 266.350 0.639 0.558 0.732 0.637 0.556 0.730 0.602 0.526 0.690 0.155 0.128 0.189 0.157 0.129 0.192 0.156 0.128 0.190 Reference Reference 0.400 0.363 0.441 0.402 0.365 0.444 0.411 0.372 0.454 0.210 0.159 0.276 0.167 0.125 0.223 0.170 0.127 0.228 0.009 <0.001 0.094 0.011 0.001 0.105 0.014 0.001 0.129 Reference 0.418 0.367 0.476 0.446 0.391 0.507 0.436 0.382 0.497 0.619 0.384 0.998 0.475 0.286 0.790 0.571 0.350 0.932 0.617 0.563 0.677 0.616 0.562 0.676 0.622 0.567 0.683 0.683 0.536 0.869 0.692 0.542 0.883 0.716 0.560 0.914 Reference 315 Variables were measured at 
the baseline Table 7.6. The estimate of variables in the numerator of the IPCW under three strategies using imputed dataset 2. Variable Strategy 53 Point 95% Wald Estimate Confidence Limits 1.009 0.965 1.056 1.013 0.969 1.059 1.005 0.962 1.049 1.005 1.002 1.008 1.005 1.002 1.007 1.004 1.001 1.007 0.981 0.971 0.99 0.98 0.971 0.99 0.982 0.972 0.991 1.012 1.002 1.022 1.011 1.001 1.021 1.011 1.001 1.02 0.783 0.549 1.117 0.761 0.539 1.074 0.745 0.534 1.04 0.677 0.482 0.95 0.669 0.482 0.929 0.659 0.48 0.905 0.827 0.546 1.252 0.814 0.544 1.22 0.779 0.526 1.154 1.119 0.714 1.754 1.075 0.693 1.668 1.035 0.674 1.59 0.743 0.517 1.067 0.728 0.512 1.034 0.707 0.503 0.993 0.762 0.497 1.168 0.725 0.479 1.099 0.697 0.465 1.046 0.908 0.415 1.986 1.031 0.473 2.248 1.064 0.489 2.319 0.932 0.405 2.143 1.068 0.468 2.437 1.059 0.464 2.414 0.955 0.388 2.352 1.091 0.444 2.678 1.141 0.467 2.784 Reference Reference Reference 8.152 2.491 26.671 8.051 2.47 26.243 7.481 2.299 24.348 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 316 Age Predicted FEV1 in current visit Height Weight Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Transplant status: No Had Will have Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald 95% Wald Point Estimate Confidence Limits Confidence Limits Estimate Table 7.6. (continued). Variable Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD B. 
cepacia MRSA Other gram-negative microorganisms Non-mucoid Pa PI Number of PEx in the past year in current visit: 0 1 2 3 4 5 Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 2.457 1.494 4.042 2.414 1.468 3.968 2.496 1.517 4.107 0.906 0.645 1.274 0.958 0.692 1.326 0.953 0.691 1.314 1.22 1.041 1.431 1.215 1.039 1.42 1.242 1.066 1.447 8.082 1.609 40.596 7.915 1.577 39.738 7.937 1.582 39.811 1.154 0.972 1.371 1.146 0.967 1.357 1.138 0.964 1.343 3.759 2.427 5.82 3.637 2.351 5.626 3.682 2.397 5.653 1.61 1.225 2.117 1.575 1.204 2.059 1.594 1.23 2.065 Reference 1.066 0.908 1.252 1.066 0.911 1.248 1.07 0.917 1.249 0.989 0.706 1.384 0.943 0.674 1.319 0.919 0.66 1.279 0.815 0.419 1.587 0.754 0.389 1.462 0.852 0.466 1.56 0.845 0.263 2.715 0.793 0.247 2.549 0.931 0.337 2.57 1.852 0.66 5.196 1.748 0.625 4.887 1.679 0.6 4.695 317 Table 7.6. (continued). Variable Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Confidence Limits Estimate Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 0.829 0.43 1.599 0.754 0.391 1.453 0.764 0.4 1.458 0.387 0.052 2.86 0.216 0.032 1.456 0.224 0.033 1.503 Reference 0.559 0.21 1.488 0.517 0.197 1.361 0.548 0.209 1.433 5.244 0.805 34.167 3.051 0.445 20.933 3.03 0.452 20.321 Reference 1.376 0.582 3.253 1.616 0.719 3.634 1.376 0.591 3.207 0.413 0.036 4.733 1.235 0.188 8.118 1.22 0.19 7.822 All variables were measured at the baseline 318 Table 7.7. The estimate of variables in the numerator of the IPCW under three strategies using imputed dataset 8. 
Variable Strategy 53 Point 95% Wald Estimate Confidence Limits 1.002 0.958 1.048 1.001 0.958 1.046 0.998 0.956 1.042 1.004 1.001 1.007 1.004 1.001 1.007 1.004 1.001 1.007 0.981 0.972 0.991 0.982 0.973 0.991 0.982 0.973 0.991 1.012 1.003 1.021 1.011 1.002 1.020 1.011 1.002 1.020 0.733 0.519 1.034 0.716 0.512 1.001 0.697 0.504 0.965 0.644 0.464 0.893 0.638 0.464 0.877 0.629 0.462 0.857 0.777 0.518 1.166 0.770 0.519 1.144 0.733 0.498 1.080 1.065 0.685 1.656 1.026 0.667 1.580 1.008 0.661 1.539 0.699 0.491 0.994 0.687 0.488 0.967 0.672 0.483 0.936 0.700 0.459 1.067 0.681 0.452 1.026 0.681 0.458 1.013 0.980 0.449 2.139 1.060 0.486 2.311 1.082 0.497 2.357 1.009 0.441 2.310 1.100 0.483 2.505 1.086 0.478 2.469 1.123 0.459 2.749 1.175 0.478 2.887 1.237 0.506 3.020 Reference Reference Reference 9.539 2.869 31.710 9.378 2.824 31.142 9.040 2.729 29.947 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 319 Age Predicted FEV1 in current visit Height Weight Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race Caucasian Black Asian Others Transplant status: No Had Will have Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Table 7.7. (continued). Variable Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD B. 
cepacia MRSA Other gram-negative microorganisms Non-mucoid Pa PI Number of PEx in the past year in current visit: 0 1 2 3 4 5 Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 2.426 1.475 3.989 2.518 1.550 4.090 2.614 1.607 4.251 0.934 0.667 1.309 0.971 0.704 1.341 0.946 0.686 1.305 1.219 1.041 1.429 1.216 1.041 1.420 1.252 1.075 1.457 8.141 1.623 40.843 8.019 1.600 40.197 8.053 1.606 40.365 1.171 0.988 1.388 1.160 0.981 1.372 1.157 0.981 1.365 3.702 2.392 5.729 3.561 2.303 5.507 3.669 2.389 5.635 1.542 1.176 2.022 1.525 1.167 1.992 1.576 1.214 2.047 1.079 0.921 1.265 1.102 0.943 1.287 1.100 0.944 1.281 0.971 0.694 1.360 1.005 0.726 1.391 0.959 0.693 1.327 0.847 0.439 1.632 0.812 0.423 1.562 0.910 0.500 1.654 0.826 0.257 2.656 0.788 0.245 2.530 0.748 0.233 2.403 1.817 0.648 5.095 1.782 0.636 4.993 1.717 0.613 4.810 Reference 320 Table 7.7. (continued). Variable Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Confidence Limits Estimate Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 0.789 0.403 1.543 0.713 0.364 1.394 0.726 0.375 1.408 0.376 0.050 2.851 0.209 0.031 1.437 0.221 0.033 1.497 Reference 0.540 0.193 1.511 0.498 0.179 1.385 0.539 0.195 1.489 5.165 0.792 33.683 3.032 0.446 20.590 2.971 0.445 19.841 1.233 0.502 3.025 1.475 0.636 3.421 1.232 0.509 2.983 0.416 0.036 4.834 1.247 0.192 8.094 1.254 0.197 7.994 Reference * All variables were measured at the baseline 321 Table 7.8. The estimate of variables in the denominator of IPCW under three strategies using imputed dataset 2. 
Variable Strategy 53 Point 95% Wald Estimate Confidence Limits 0.995 0.945 1.047 0.995 0.945 1.046 0.984 0.936 1.034 1.003 0.999 1.007 1.004 1.000 1.008 1.003 1.000 1.007 0.989 0.974 1.004 0.990 0.975 1.004 0.991 0.977 1.006 1.019 1.005 1.032 1.017 1.004 1.030 1.017 1.004 1.030 1.093 0.730 1.636 1.076 0.725 1.597 1.022 0.699 1.495 0.927 0.628 1.369 0.925 0.632 1.355 0.908 0.629 1.310 1.182 0.744 1.877 1.220 0.776 1.921 1.175 0.757 1.824 1.295 0.800 2.097 1.284 0.803 2.055 1.267 0.800 2.005 0.969 0.650 1.446 0.943 0.638 1.394 0.899 0.616 1.311 0.821 0.453 1.489 0.729 0.408 1.305 0.701 0.398 1.237 0.699 0.315 1.551 0.784 0.355 1.733 0.795 0.360 1.753 0.720 0.307 1.689 0.799 0.344 1.854 0.791 0.341 1.833 0.770 0.306 1.933 0.891 0.356 2.228 0.917 0.369 2.277 Reference Reference Reference 9.550 1.688 54.046 9.339 1.687 51.701 9.598 1.543 59.709 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 322 Age* Predicted FEV1 in current visit* Height* Weight* Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Transplant status: No Had Will have Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Table 7.8. (continued). Variable CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD* B. 
cepacia* MRSA* Other gram-negative microorganisms* Non-mucoid Pa PI* Number of PEx in the past year in current visit: 0 1 2 3 4 5 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 3.211 1.829 5.638 3.052 1.748 5.328 3.086 1.767 5.389 1.304 0.823 2.067 1.231 0.790 1.919 1.158 0.752 1.784 1.053 0.824 1.346 1.064 0.839 1.350 1.106 0.877 1.395 8.570 1.383 53.115 10.873 1.761 67.116 9.633 1.598 58.088 1.288 1.011 1.640 1.262 0.999 1.595 1.268 1.009 1.593 2.978 1.794 4.942 3.084 1.865 5.100 3.118 1.902 5.113 1.777 1.228 2.571 1.842 1.286 2.639 1.884 1.325 2.678 1.024 0.857 1.223 1.051 0.883 1.250 1.053 0.887 1.248 0.968 0.669 1.401 0.963 0.667 1.390 0.962 0.670 1.383 0.675 0.323 1.409 0.698 0.339 1.436 0.737 0.377 1.439 0.910 0.265 3.125 0.918 0.267 3.160 0.905 0.310 2.643 0.833 0.230 3.018 0.814 0.228 2.910 0.715 0.202 2.535 Reference 323 Table 7.8. (continued). 
Variable Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 0.952 0.446 2.033 0.918 0.432 1.953 0.951 0.453 1.995 0.820 0.077 8.750 0.449 0.037 5.444 0.471 0.039 5.640 Reference 0.510 0.152 1.710 0.443 0.135 1.448 0.520 0.158 1.707 11.029 0.692 175.899 3.962 0.288 54.438 4.069 0.299 55.435 1.177 0.411 3.374 1.485 0.555 3.973 1.146 0.410 3.207 0.067 0.004 1.189 0.355 0.034 3.681 0.336 0.033 3.455 1.003 0.999 1.008 1.003 0.999 1.007 1.003 0.999 1.007 0.981 0.970 0.993 0.982 0.971 0.993 0.981 0.971 0.992 0.990 0.980 1.001 0.990 0.980 1.001 0.991 0.981 1.001 1.001 1.001 1.001 1.001 1.001 1.001 1.001 1.001 1.001 0.941 0.908 0.975 0.936 0.904 0.968 0.933 0.902 0.965 Reference 324 Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Predicted FEV1 in current visit Height Weight Number of visit (spline) Number of visit Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Table 7.8. (continued). 
Variable Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Smoking: 0.872 0.686 1.108 0.856 1.104 0.606 2.010 1.145 0.678 1.080 0.825 0.659 1.032 0.641 2.044 1.125 0.651 1.946 Reference 0.146 0.044 0.480 0.410 0.130 1.298 0.408 0.131 1.273 0.189 0.036 0.999 0.542 0.108 2.730 0.502 0.088 2.857 Reference 1.276 0.920 1.769 1.191 0.872 1.628 1.098 0.816 1.480 1.317 0.896 1.937 1.297 0.899 1.871 1.214 0.852 1.730 Reference 1.204 0.979 1.480 1.189 0.976 1.449 1.203 0.992 1.458 0.803 0.530 1.216 0.711 0.476 1.063 0.719 0.486 1.063 1.185 0.698 2.014 1.097 0.651 1.849 1.021 0.614 1.695 4.263 1.391 13.070 2.848 1.099 7.381 2.740 1.056 7.107 0.706 0.541 0.921 0.708 0.543 0.922 0.710 0.549 0.919 0.887 0.639 1.230 0.895 0.651 1.231 0.928 0.689 1.250 0.883 0.740 1.053 0.914 0.770 1.084 0.914 0.774 1.080 1.550 0.754 3.189 1.208 0.588 2.479 1.363 0.697 2.665 1.058 0.885 1.265 1.032 0.867 1.227 1.067 0.902 1.263 325 No Yes Unknown Transplant status: No Had Will have CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD Pancreatic insufficiency Pancreatitis Hemoptysis Using any enzymes ABPA Aspergillus B. cepacia Candida Table 7.8. (continued). 
Variable Strategy 53 Point 95% Wald Estimate Confidence Limits 1.560 0.874 2.783 1.631 0.943 2.821 1.688 0.997 2.856 1.038 0.867 1.243 1.035 0.870 1.232 1.058 0.895 1.252 1.339 1.035 1.732 1.259 0.982 1.614 1.286 1.007 1.641 1.723 1.365 2.176 1.636 1.303 2.054 1.680 1.346 2.097 0.921 0.680 1.248 0.976 0.727 1.310 0.966 0.723 1.290 1.046 0.750 1.458 1.001 0.727 1.379 1.023 0.750 1.394 0.782 0.479 1.279 0.837 0.521 1.345 0.856 0.543 1.348 0.998 0.833 1.195 0.983 0.824 1.173 0.987 0.831 1.172 1.302 0.965 1.756 1.313 0.985 1.751 1.335 1.012 1.760 1.498 0.885 2.535 1.287 0.771 2.147 1.083 0.647 1.813 1.639 0.704 3.817 1.969 0.912 4.253 1.933 0.924 4.046 0.811 0.247 2.666 0.998 0.314 3.178 1.140 0.399 3.259 1.448 0.926 0.614 1.396 Reference Reference 1.045 0.665 1.643 0.942 0.612 326 MAI MRSA MSSA Other gram-negative microorganisms Staphylococcus aureus Non-mucoid Pa PI Unknown type of mucoid Pa PI Number of PEx in the past year in current visit: 0 1 2 3 4 5 Drug resistance of aminoglycosides in current visit: No Yes Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Table 7.8. (continued). Variable Drug resistance of aminoglycosides in current visit: Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate 0.068 0.008 0.583 0.060 0.670 0.328 1.371 0.705 0.968 0.092 10.182 1.939 0.008 Strategy 53 Point 95% Wald Estimate Confidence Limits 0.470 0.062 0.008 0.478 0.348 1.426 0.667 0.330 1.347 0.228 16.499 1.812 0.213 15.372 Reference Reference 1.259 0.707 2.241 1.277 0.738 2.209 1.338 0.779 2.298 16.468 2.611 103.865 8.995 1.617 50.026 9.331 1.684 51.695 * Variables were measured at the baseline 327 Table 7.9. 
The estimate of variables in the denominator of the IPCW under three strategies using imputed dataset 8. Variable Strategy 53 Point 95% Wald Estimate Confidence Limits 0.990 0.940 1.042 0.983 0.934 1.034 0.973 0.926 1.023 1.003 0.999 1.007 1.004 1.000 1.008 1.004 1.000 1.007 0.992 0.978 1.007 0.993 0.979 1.008 0.994 0.980 1.008 1.018 1.007 1.030 1.017 1.005 1.028 1.017 1.006 1.029 1.031 0.695 1.529 1.017 0.691 1.497 0.979 0.674 1.423 0.873 0.597 1.276 0.873 0.601 1.266 0.865 0.603 1.239 1.106 0.702 1.741 1.147 0.735 1.792 1.106 0.716 1.707 1.233 0.768 1.979 1.232 0.776 1.956 1.229 0.782 1.932 0.909 0.615 1.343 0.884 0.603 1.294 0.849 0.586 1.229 0.686 0.382 1.232 0.614 0.346 1.089 0.627 0.359 1.094 0.755 0.341 1.672 0.821 0.372 1.815 0.828 0.375 1.827 0.749 0.321 1.749 0.812 0.350 1.885 0.796 0.344 1.844 0.891 0.357 2.225 0.942 0.376 2.362 0.997 0.400 2.483 Reference Reference Reference 13.268 2.078 84.732 13.479 2.172 83.635 15.782 2.058 121.046 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 <0.001 <0.001 >999.999 328 Age* Predicted FEV1 in current visit* Height* Weight* Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Transplant status: No Had Will have Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Table 7.9. (continued). Variable CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD* B. 
cepacia* MRSA* Other gram-negative microorganisms* Non-mucoid Pa PI* Number of PEx in the past year in current visit: 0 1 2 3 4 5 Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 3.302 1.885 5.784 3.317 1.922 5.723 3.445 1.995 5.948 1.322 0.840 2.083 1.175 0.758 1.823 1.140 0.739 1.759 1.033 0.810 1.318 1.041 0.822 1.318 1.088 0.863 1.371 8.323 1.346 51.464 10.577 1.718 65.112 9.864 1.633 59.571 1.314 1.035 1.667 1.304 1.035 1.644 1.329 1.059 1.669 3.178 1.923 5.253 3.230 1.960 5.322 3.313 2.025 5.418 1.768 1.225 2.551 1.809 1.260 2.599 1.899 1.328 2.715 1.025 0.860 1.223 1.073 0.903 1.274 1.073 0.906 1.270 0.915 0.632 1.325 0.961 0.671 1.378 0.947 0.662 1.356 0.663 0.319 1.377 0.657 0.320 1.350 0.716 0.367 1.397 0.831 0.243 2.842 0.782 0.227 2.693 0.779 0.227 2.674 0.718 0.202 2.548 0.643 0.185 2.240 0.591 0.170 2.050 Reference 329 Table 7.9. (continued). 
Variable Strategy 53 Point 95% Wald Estimate Confidence Limits Reference 0.896 0.416 1.931 0.881 0.410 1.891 0.898 0.422 1.909 0.549 0.040 7.616 0.290 0.021 4.064 0.298 0.021 4.239 Reference 0.506 0.143 1.783 0.396 0.115 1.360 0.475 0.138 1.644 6.192 0.383 100.039 2.745 0.200 37.710 2.892 0.208 40.136 1.119 0.378 3.312 1.473 0.535 4.058 1.134 0.389 3.303 0.188 0.011 3.351 0.811 0.099 6.647 0.777 0.095 6.345 1.003 0.998 1.007 1.003 0.999 1.007 1.002 0.998 1.007 0.979 0.968 0.991 0.980 0.969 0.991 0.980 0.970 0.991 0.990 0.980 1.000 0.990 0.980 1.000 0.990 0.980 1.000 1.001 1.001 1.001 1.001 1.001 1.001 1.001 1.001 1.001 0.947 0.914 0.980 0.941 0.909 0.974 0.939 0.908 0.971 Reference 330 Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Predicted FEV1 in current visit Height Weight Number of visit (spline) Number of visit Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Table 7.9. (continued). 
Variable Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate Strategy 53 Point 95% Wald Estimate Confidence Limits Smoking: 0.920 0.721 1.173 0.902 1.173 0.655 2.099 1.222 0.712 1.143 0.864 0.689 1.083 0.695 2.147 1.150 0.667 1.984 Reference 0.165 0.051 0.538 0.467 0.147 1.482 0.460 0.147 1.443 0.174 0.030 1.014 0.493 0.088 2.775 0.403 0.057 2.833 Reference 1.267 0.918 1.747 1.162 0.853 1.583 1.125 0.833 1.519 1.247 0.853 1.823 1.217 0.846 1.751 1.185 0.831 1.689 Reference 1.257 1.025 1.542 1.237 1.016 1.505 1.246 1.028 1.509 0.734 0.488 1.103 0.643 0.433 0.954 0.656 0.447 0.964 1.175 0.702 1.969 1.086 0.652 1.809 1.039 0.625 1.727 3.247 0.970 10.865 2.307 0.848 6.274 2.257 0.830 6.137 0.715 0.548 0.931 0.717 0.551 0.934 0.714 0.550 0.926 0.891 0.645 1.232 0.899 0.657 1.230 0.924 0.686 1.245 0.869 0.729 1.037 0.911 0.768 1.081 0.919 0.778 1.085 1.621 0.784 3.353 1.228 0.595 2.533 1.409 0.716 2.774 1.063 0.891 1.268 1.034 0.871 1.227 1.051 0.889 1.242 331 No Yes Unknown Transplant status: No Had Will have CFRD status: No Impaired glucose tolerance CFRD with or without fasting hyperglycemia GERD Pancreatic insufficiency Pancreatitis Hemoptysis Using any enzymes ABPA Aspergillus B. cepacia Candida Table 7.9. (continued). 
Variable Strategy 53 Point 95% Wald Estimate Confidence Limits 1.715 0.972 3.026 1.736 1.012 2.975 1.799 1.072 3.021 1.042 0.872 1.245 1.062 0.894 1.261 1.057 0.895 1.249 1.336 1.036 1.724 1.322 1.033 1.693 1.333 1.044 1.701 1.637 1.300 2.061 1.568 1.253 1.962 1.605 1.289 1.998 0.913 0.676 1.233 0.932 0.695 1.250 0.935 0.699 1.249 1.023 0.735 1.422 0.990 0.719 1.364 1.030 0.755 1.405 0.749 0.459 1.222 0.790 0.490 1.273 0.837 0.529 1.324 1.002 0.837 1.199 1.008 0.846 1.201 1.006 0.848 1.193 1.294 0.960 1.744 1.317 0.988 1.755 1.311 0.994 1.729 1.676 1.007 2.791 1.402 0.853 2.306 1.180 0.714 1.949 1.943 0.876 4.309 2.662 1.344 5.275 2.577 1.324 5.017 1.049 0.356 3.091 1.479 0.557 3.926 1.604 0.640 4.019 1.437 0.955 0.631 1.446 Reference Reference 1.094 0.702 1.704 0.936 0.610 332 MAI MRSA MSSA Other gram-negative microorganisms Staphylococcus aureus Non-mucoid Pa PI Unknown type of mucoid Pa PI Number of PEx in the past year in current visit: 0 1 2 3 4 5 Drug resistance of aminoglycosides in current visit: No Yes Odds Ratio Estimates Strategy 33 Strategy 43 Point 95% Wald Point 95% Wald Estimate Confidence Limits Estimate Confidence Limits Table 7.9. (continued). Variable Drug resistance of aminoglycosides in current visit: Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Odds Ratio Estimates Strategy 43 Strategy 33 Point 95% Wald Point 95% Wald Estimate Confidence Limits Confidence Limits Estimate 0.185 0.017 1.986 0.137 0.612 0.296 1.266 0.742 1.471 0.143 15.118 2.378 0.014 Strategy 53 Point 95% Wald Estimate Confidence Limits 1.301 0.150 0.016 1.439 0.369 1.490 0.704 0.351 1.414 0.270 20.958 2.188 0.245 19.534 Reference Reference 1.181 0.663 2.107 1.164 0.664 2.039 1.217 0.699 2.122 3.682 1.050 12.909 3.017 0.860 10.585 3.104 0.887 10.869 * Variables were measured at the baseline 333 Figure 7.1. 
The distribution of SICW under different strategies in different imputed datasets. The left and right columns represent the distributions in imputed datasets 2 and 8, respectively. From top to bottom, the figures represent the distributions under strategies 33 and 53, respectively.
Figure 7.2. The distribution of unstabilized IPTW under different strategies in different imputed datasets. The left and right columns represent the distributions in imputed datasets 2 and 8, respectively. From top to bottom, the figures represent the distributions under strategies 33 and 53, respectively.
Figure 7.3. The distribution of SIPTW under different strategies in different imputed datasets. The left and right columns represent the distributions in imputed datasets 2 and 8, respectively. From top to bottom, the figures represent the distributions under strategies 33 and 53, respectively.
Figure 7.4. The distribution of UIPW under different strategies in different imputed datasets. The left and right columns represent the distributions in imputed datasets 2 and 8, respectively. From top to bottom, the figures represent the distributions under strategies 33 and 53, respectively.
Figure 7.5. The distribution of SIPW under different strategies in different imputed datasets. The left and right columns represent the distributions in imputed datasets 2 and 8, respectively. From top to bottom, the figures represent the distributions under strategies 33 and 53, respectively.
Table 7.10. The distribution of UIPW and SIPW under different strategies in different imputed datasets.
Imputed dataset  Strategy  Variable  N      Mean       Minimum  Lower Quartile  Median  Upper Quartile  Maximum
2                33        UIPW      28976  1307.4200  0.0503   1.1292          1.4863  2.9119          14588668.3800
2                33        SIPW      28976  190.7774   0.0021   0.6192          0.8806  1.0539          1821157.9900
2                53        UIPW      31087  118.4806   0.0449   1.1126          1.4275  2.5948          1314319.5100
2                53        SIPW      31087  20.9848    0.0035   0.6303          0.8840  1.0481          196570.6600
8                33        UIPW      29083  148.4058   0.1070   1.1258          1.4791  2.8909          1520071.7200
8                33        SIPW      29083  25.9004    0.0024   0.6223          0.8807  1.0540          224189.8000
8                53        UIPW      31064  17.3635    0.1001   1.1088          1.4248  2.5991          140137.3300
8                53        SIPW      31064  3.7652     0.0055   0.6344          0.8842  1.0465          25312.8400
Table 7.11. The distribution of truncated SIPW under different strategies in different imputed datasets.
Imputed dataset  Strategy  Variable  N      Mean    Minimum  Lower Quartile  Median  Upper Quartile  Maximum
2                33        SIPW      28976  1.0173  0.0021   0.6192          0.8806  1.0539          10.0000
2                53        SIPW      31087  1.0177  0.0035   0.6303          0.8840  1.0481          10.0000
8                33        SIPW      29083  1.0148  0.0024   0.6223          0.8807  1.0540          10.0000
8                53        SIPW      31064  1.0155  0.0055   0.6344          0.8842  1.0465          10.0000
Table 7.12. The number of extreme values of SIPW under different strategies in different imputed datasets.
Imputed dataset  Strategy  Number of visits in the dataset  Number of patients  Number of visits
2                33        28976                            51                  175
2                53        31087                            46                  168
8                33        29083                            51                  161
8                53        31064                            49                  166
Figure 7.6. Non-parametric Kaplan-Meier curve under different treatment strategies in different imputed datasets. The left and right columns represent the trends in imputed datasets 2 and 8, respectively. The figures represent the trends without any adjustment on weighting.
Figure 7.7. Non-parametric Kaplan-Meier curve under different treatment strategies in different imputed datasets. The left and right columns represent the trends in imputed datasets 2 and 8, respectively. The figures represent the trends after adjusting by UIPW.
Figure 7.8. Non-parametric Kaplan-Meier curve under different treatment strategies in different imputed datasets.
The left and right columns represent the trends in imputed datasets 2 and 8, respectively. The figures represent the trends after adjusting by SIPW.
Figure 7.9. Survival curve of dynamic logistic MSMs adjusting by UIPW between different treatment strategies in different imputed datasets. The left and right columns represent the trends in imputed datasets 2 and 8, respectively.
Table 7.13. Results of a fixed parameterization of the dynamic logistic MSM with UIPW.
Parameter Estimates
             Coefficient                  OR
Parameter    Estimate  Minimum  Maximum   Estimate  Minimum  Maximum
Intercept    -5.4682   -6.4865  -5.0618   0.0042    0.0015   0.0063
No strategy  1.3840    0.9777   2.4023    3.9909    2.6583   11.0490
Strategy11   2.0193    0.9088   4.1264    7.5329    2.4814   61.9565
Strategy12   2.0595    0.9302   4.1269    7.8418    2.5351   61.9845
Strategy13   2.1146    1.1996   4.1051    8.2859    3.3187   60.6484
Strategy14   2.0609    1.1985   3.9828    7.8526    3.3150   53.6690
Strategy15   2.0384    1.1814   3.9759    7.6783    3.2590   53.2992
Strategy21   2.0542    1.0670   4.1404    7.8006    2.9065   62.8272
Strategy22   2.0838    1.0802   4.1291    8.0347    2.9452   62.1247
Strategy23   2.1311    1.2012   4.1030    8.4244    3.3242   60.5213
Strategy24   2.0710    1.1936   3.9681    7.9331    3.2990   52.8820
Strategy25   2.0367    1.1806   3.9372    7.6656    3.2563   51.2751
Strategy31   -2.5084   -4.5064  -0.7453   0.0814    0.0110   0.4746
Strategy32   -2.4096   -4.3299  -0.6732   0.0898    0.0132   0.5101
Strategy33   -2.3118   -4.1998  -0.9120   0.0991    0.0150   0.4017
Strategy34   -2.2181   -4.0913  -0.8988   0.1088    0.0167   0.4070
Strategy35   -2.3055   -4.1135  -1.1701   0.0997    0.0164   0.3103
Strategy41   -1.2354   -3.1517  0.0041    0.2907    0.0428   1.0041
Strategy42   -1.1545   -2.9983  0.0590    0.3152    0.0499   1.0607
Strategy43   -1.0059   -2.8351  -0.0935   0.3657    0.0587   0.9108
Strategy44   -0.8005   -2.4858  0.0037    0.4491    0.0833   1.0038
Strategy45   -0.8959   -2.5261  -0.1620   0.4082    0.0800   0.8504
Strategy51   -0.2447   -0.5807  -0.0219   0.7829    0.5595   0.9783
Strategy52   -0.1634   -0.4840  0.0450    0.8492    0.6163   1.0460
Strategy53   -0.0809   -0.3071  0.0023    0.9223    0.7356   1.0023
Strategy54   0.0412    -0.0586  0.1278    1.0421    0.9431   1.1364
Strategy55   Reference
Table 7.14. Results of a fixed parameterization of dynamic logistic MSM with SIPW.
Parameter Estimates
                                 Coefficient                  OR                           95% CI (OR)
Parameter                        Estimate  Minimum  Maximum   Estimate  Minimum  Maximum   Lower   Upper
Intercept                        -3.6395   -3.8098  -3.4342   0.0263    0.0222   0.0323    0.0188  0.0366
No strategy                      0.1592    0.1301   0.1862    1.1725    1.1389   1.2047    1.1294  1.2173
Strategy11                       0.0285    0.0088   0.0467    1.0289    1.0088   1.0478    0.9630  1.0994
Strategy12                       0.0428    0.0247   0.0581    1.0438    1.0250   1.0598    0.9770  1.1151
Strategy13                       0.0619    0.0442   0.0762    1.0638    1.0451   1.0792    0.9957  1.1367
Strategy14                       0.0478    0.0304   0.0616    1.0490    1.0308   1.0635    0.9811  1.1216
Strategy15                       0.0462    0.0287   0.0594    1.0473    1.0291   1.0612    0.9790  1.1203
Strategy21                       0.0358    0.0226   0.0469    1.0365    1.0229   1.0480    0.9839  1.0918
Strategy22                       0.0489    0.0406   0.0584    1.0502    1.0415   1.0601    0.9975  1.1056
Strategy23                       0.0678    0.0574   0.0790    1.0702    1.0591   1.0822    1.0138  1.1297
Strategy24                       0.0529    0.0427   0.0640    1.0544    1.0436   1.0661    0.9988  1.1130
Strategy25                       0.0522    0.0415   0.0624    1.0536    1.0424   1.0644    0.9992  1.1111
Strategy31                       -0.0023   -0.0173  0.0288    0.9977    0.9828   1.0292    0.9404  1.0586
Strategy32                       0.0083    -0.0067  0.0384    1.0083    0.9933   1.0391    0.9413  1.0801
Strategy33                       0.0275    0.0144   0.0558    1.0279    1.0145   1.0574    0.9595  1.1011
Strategy34                       0.0162    0.0033   0.0462    1.0163    1.0033   1.0473    0.9583  1.0779
Strategy35                       0.0142    0.0045   0.0454    1.0143    1.0045   1.0464    0.9558  1.0765
Strategy41                       0.0102    -0.0085  0.0340    1.0102    0.9916   1.0346    0.9559  1.0676
Strategy42                       0.0206    0.0020   0.0445    1.0208    1.0020   1.0455    0.9693  1.0751
Strategy43                       0.0422    0.0302   0.0642    1.0432    1.0306   1.0663    0.9905  1.0987
Strategy44                       0.0300    0.0178   0.0536    1.0305    1.0180   1.0551    0.9774  1.0865
Strategy45                       0.0260    0.0146   0.0502    1.0263    1.0148   1.0515    0.9724  1.0832
Strategy51                       -0.0186   -0.0352  -0.0135   0.9816    0.9655   0.9866    0.9304  1.0355
Strategy52                       -0.0058   -0.0220  -0.0009   0.9942    0.9783   0.9991    0.9414  1.0500
Strategy53                       0.0163    0.0138   0.0198    1.0165    1.0139   1.0200    0.9675  1.0679
Strategy54                       0.0038    0.0018   0.0058    1.0038    1.0018   1.0058    0.9485  1.0625
Strategy55                       Reference
Predicted FEV1 in current visit  -0.0047   -0.0057  -0.0033   0.9953    0.9943   0.9967    0.9945  0.9961
Age                              0.0601    0.0541   0.0706    1.0620    1.0556   1.0732    1.0503  1.0738
Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing
0.1518 0.0699 0.3554 -1.5286 -1.5894 0.0539 -0.0468 0.2470 -1.9649 -1.7338 0.2313 0.1544 0.4416 -1.1069 -1.4392 1.1639 1.0724 1.4267 0.2168 0.2040 1.0554 0.9543 1.2802 0.1402 0.1766 1.2603 1.1669 1.5551 0.3306 0.2371 0.9989 0.9286 1.1975 0.1755 0.1645 1.3561 1.2385 1.6999 0.2679 0.2531 -0.1837 -0.2932 -0.0916 0.8322 0.7459 0.9124 0.7257 0.9545 Reference
Table 7.14. (continued).
Parameter Race: Caucasian Black Asian Others Gender (male) Number of PEx in the past year in current visit: 0 1 2 3 4 5 Mucolytics: 0 1 2 Anti-inflammatories: 0 1 2 Drug resistance of aminoglycosides in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Parameter Estimates Coefficient OR 95% CI (OR) Upper Estimate Minimum Maximum Estimate Minimum Maximum Lower -0.6068 0.2099 -0.1230 -0.6449 0.1651 -0.1897 -0.5636 0.3077 -0.0565 0.0362 0.0193 0.0597 0.5451 0.5247 1.2336 1.1795 0.8842 0.8272 Reference 1.0368 1.0195 0.5691 1.3603 0.9451 0.3954 0.8982 0.6292 0.7515 1.6941 1.2426 1.0615 0.9825 1.0942 0.1083 0.1856 0.2456 -1.3019 -0.1541 0.0572 0.1585 -0.4120 -1.3489 -0.2387 0.1663 0.2327 0.7440 -1.2455 0.0094 Reference 1.1144 1.0588 1.2040 1.1717 1.2784 0.6624 0.2720 0.2595 0.8572 0.7877 1.1809 1.2620 2.1044 0.2878 1.0094 1.0472 1.0288 1.0285 0.2092 0.6403 1.1860 1.4090 1.5891 0.3536 1.1475 -0.0608 0.1521 -0.0865 0.0939 -0.0432 0.2215 Reference 0.9411 0.9172 1.1642 1.0985 0.9578 1.2480 0.8973 1.0566 0.9869 1.2829 1.1976 0.8286 1.0644 0.6305 1.2293 1.0090 0.1344 -0.2262 0.0735 -0.2670 0.1803 -0.1880 Reference 1.1439 1.0762 0.7976 0.7657 0.8144 -0.5795 0.7716 -0.6197 0.8733 -0.5073 Reference 2.2579 2.1632 0.5602 0.5381 2.3948 0.6021 1.8622 0.4794 2.7377 0.6546 0.0160 -0.7339 Reference 0.8861 0.7796 0.4244 0.3937
1.0161 0.4800 0.6256 0.2531 1.2550 0.7117 -0.1209 -0.8570 -0.2490 -0.9323 349 Table 7.14. (continued). Parameter Drug resistance of quinolones in current visit: No Yes Testing not done Parameter Estimates Coefficient OR 95% CI (OR) Estimate MinimumMaximum Estimate MinimumMaximum Lower Upper -0.1171 0.9397 -0.1707 0.8099 -0.0559 1.0441 * All variables were measured at the baseline Reference 0.8895 0.8431 2.5593 2.2476 0.9456 2.8408 0.6241 1.2378 1.2677 5.2917 Figure 7.10. The treatment effects of following different strategies to change treatment rationally. 350 Table 7.15. Results of a flexible parameterization of the dynamic logistic MSM with 351 stabilized weighting. Parameter Intercept Effect in the 1st year Effect in the 2nd year Effect in the 3rd year Effect in the 4th year Effect in the 5th year Effect in the 6th year Strategy1 Strategy11 Strategy12 Strategy13 Strategy14 Strategy15 Strategy21 Strategy22 Strategy23 Strategy24 Strategy25 Strategy31 Strategy32 Strategy33 Strategy34 Strategy35 Strategy41 Strategy42 Strategy43 Strategy44 Strategy45 Strategy51 Strategy52 Strategy53 Strategy54 Strategy55 Strategy11st year Strategy111st year Strategy121st year Strategy131st year Strategy141st year Strategy151st year Strategy211st year Parameter Estimates OR 95% CI (OR) Coefficient Upper Estimate MinimumMaximum Estimate MinimumMaximum Lower -3.4801 -3.6181 -3.2847 0.0308 0.0268 0.0375 0.0219 0.0433 -0.1185 -0.2133 0.0053 0.8883 0.8079 1.0053 0.6999 1.1273 0.1100 -0.0371 0.2532 1.1162 0.9636 1.2881 0.8650 1.4404 -0.3929 -0.5221 -0.2747 0.6751 0.5933 0.7598 0.5272 0.8645 0.1990 0.0375 0.4587 1.2202 1.0382 1.5821 0.9242 1.6109 0.4178 0.3285 0.5558 1.5186 1.3888 1.7432 1.1692 1.9722 Reference 0.3476 0.2606 0.4602 1.4156 1.2976 1.5844 1.1221 1.7860 -0.5898 -0.7437 -0.4244 0.5545 0.4753 0.6541 0.3578 0.8591 -0.5814 -0.7194 -0.4245 0.5591 0.4871 0.6541 0.3617 0.8644 -0.6152 -0.7251 -0.4868 0.5405 0.4843 0.6146 0.3496 0.8357 -0.6154 -0.7274 -0.4819 0.5404 0.4832 0.6176 0.3485 
0.8380 -0.6203 -0.7337 -0.4871 0.5378 0.4801 0.6144 0.3459 0.8361 0.0813 -0.1271 0.2778 1.0847 0.8807 1.3202 0.6739 1.7462 0.1084 -0.1055 0.3131 1.1144 0.8999 1.3677 0.6868 1.8084 0.1128 -0.1313 0.3458 1.1194 0.8770 1.4132 0.6900 1.8160 0.0919 -0.1462 0.3195 1.0962 0.8640 1.3765 0.6752 1.7797 0.0723 -0.1669 0.2989 1.0750 0.8463 1.3483 0.6620 1.7456 0.3195 0.2681 0.3778 1.3765 1.3075 1.4591 0.8528 2.2218 0.3236 0.3022 0.3446 1.3820 1.3529 1.4115 0.8538 2.2370 0.3071 0.2536 0.3346 1.3594 1.2886 1.3973 0.8375 2.2066 0.3041 0.2420 0.3365 1.3554 1.2738 1.4000 0.8336 2.2037 0.2818 0.2156 0.3082 1.3255 1.2406 1.3609 0.8140 2.1584 0.0490 0.0058 0.1710 1.0502 1.0058 1.1865 0.6904 1.5975 0.0553 0.0049 0.1385 1.0569 1.0049 1.1485 0.6722 1.6617 0.0961 0.0581 0.1359 1.1009 1.0598 1.1455 0.6975 1.7376 0.0789 0.0441 0.1078 1.0821 1.0451 1.1138 0.6833 1.7138 0.0856 0.0451 0.1124 1.0893 1.0462 1.1189 0.7303 1.6249 -0.0533 -0.0860 0.0448 0.9481 0.9176 1.0458 0.6349 1.4160 -0.0347 -0.0601 0.0286 0.9659 0.9416 1.0290 0.6204 1.5038 0.0070 -0.0071 0.0255 1.0070 0.9929 1.0258 0.6734 1.5059 -0.0103 -0.0240 0.0041 0.9898 0.9763 1.0041 0.6592 1.4862 Reference -0.2642 -0.3873 -0.1728 0.7678 0.6789 0.8413 0.6101 0.9663 0.6107 0.4504 0.7639 1.8418 1.5690 2.1466 1.1831 2.8672 0.6003 0.4484 0.7385 1.8226 1.5659 2.0927 1.1738 2.8300 0.6291 0.4988 0.7391 1.8760 1.6468 2.0940 1.2082 2.9128 0.6202 0.4854 0.7318 1.8593 1.6249 2.0789 1.1981 2.8855 0.6239 0.4882 0.7364 1.8663 1.6293 2.0884 1.1986 2.9059 -0.0689 -0.2559 0.1323 0.9334 0.7743 1.1414 0.5807 1.5004 Table 7.15. (continued). 
Parameter Strategy221st year Strategy231st year Strategy241st year Strategy251st year Strategy311st year Strategy321st year Strategy331st year Strategy341st year Strategy351st year Strategy411st year Strategy421st year Strategy431st year Strategy441st year Strategy451st year Strategy511st year Strategy521st year Strategy531st year Strategy541st year Strategy551st year Strategy12nd year Strategy112nd year Strategy122nd year Strategy132nd year Strategy142nd year Strategy152nd year Strategy212nd year Strategy222nd year Strategy232nd year Strategy242nd year Strategy252nd year Strategy312nd year Strategy322nd year Strategy332nd year Strategy342nd year Strategy352nd year Strategy412nd year Strategy422nd year Strategy432nd year Strategy442nd year Strategy452nd year Strategy512nd year Strategy522nd year 352 Parameter Estimates Coefficient OR 95% CI (OR) Estimate MinimumMaximum Estimate MinimumMaximum Lower Upper -0.0972 -0.3100 0.1088 0.9073 0.7334 1.1150 0.5604 1.4690 -0.1063 -0.3445 0.1408 0.8992 0.7086 1.1513 0.5553 1.4558 -0.0931 -0.3251 0.1470 0.9111 0.7224 1.1583 0.5632 1.4740 -0.0742 -0.3056 0.1672 0.9284 0.7367 1.1820 0.5738 1.5023 -0.3149 -0.3727 -0.2695 0.7299 0.6889 0.7638 0.4490 1.1863 -0.3203 -0.3431 -0.3046 0.7260 0.7096 0.7374 0.4453 1.1835 -0.3100 -0.3407 -0.2523 0.7335 0.7113 0.7770 0.4500 1.1956 -0.3154 -0.3508 -0.2502 0.7295 0.7041 0.7786 0.4454 1.1947 -0.2936 -0.3234 -0.2240 0.7456 0.7237 0.7994 0.4557 1.2198 -0.0364 -0.1556 0.0070 0.9643 0.8559 1.0071 0.6291 1.4782 -0.0443 -0.1238 0.0074 0.9566 0.8835 1.0074 0.6049 1.5128 -0.0892 -0.1247 -0.0511 0.9147 0.8827 0.9502 0.5760 1.4524 -0.0799 -0.1087 -0.0450 0.9232 0.8970 0.9560 0.5797 1.4701 -0.0884 -0.1130 -0.0483 0.9154 0.8931 0.9529 0.6115 1.3703 0.0664 -0.0303 0.0977 1.0687 0.9702 1.1027 0.7138 1.5999 0.0484 -0.0126 0.0733 1.0495 0.9875 1.0760 0.6683 1.6482 0.0027 -0.0159 0.0174 1.0027 0.9843 1.0176 0.6676 1.5060 0.0119 -0.0026 0.0264 1.0120 0.9974 1.0267 0.6713 1.5255 Reference -0.2797 -0.4071 -0.1435 
0.7560 0.6656 0.8663 0.5911 0.9669 0.5058 0.3042 0.6898 1.6583 1.3555 1.9934 1.0569 2.6020 0.5350 0.3414 0.7026 1.7075 1.4069 2.0189 1.1021 2.6453 0.6754 0.5564 0.8043 1.9648 1.7444 2.2350 1.2620 3.0592 0.6431 0.5213 0.7719 1.9024 1.6843 2.1640 1.2213 2.9634 0.6594 0.5332 0.7916 1.9336 1.7044 2.2070 1.2359 3.0252 -0.1324 -0.3998 0.1422 0.8760 0.6705 1.1529 0.5426 1.4145 -0.1201 -0.3796 0.1455 0.8868 0.6841 1.1566 0.5437 1.4465 -0.0254 -0.2606 0.2162 0.9749 0.7706 1.2414 0.5998 1.5846 -0.0385 -0.2664 0.2050 0.9623 0.7661 1.2275 0.5924 1.5630 -0.0079 -0.2404 0.2424 0.9921 0.7863 1.2744 0.6108 1.6114 -0.3868 -0.4773 -0.2957 0.6792 0.6205 0.7440 0.4213 1.0952 -0.3584 -0.4377 -0.3081 0.6988 0.6455 0.7348 0.4314 1.1320 -0.2445 -0.2816 -0.2052 0.7831 0.7546 0.8145 0.4861 1.2616 -0.2682 -0.3107 -0.2210 0.7647 0.7330 0.8017 0.4699 1.2444 -0.2341 -0.2685 -0.1803 0.7912 0.7645 0.8350 0.4858 1.2887 -0.1132 -0.2394 -0.0585 0.8929 0.7871 0.9432 0.5645 1.4125 -0.0900 -0.1843 -0.0355 0.9139 0.8317 0.9651 0.5755 1.4513 -0.0370 -0.0787 0.0088 0.9636 0.9243 1.0089 0.6055 1.5336 -0.0448 -0.0883 -0.0058 0.9562 0.9155 0.9942 0.5979 1.5292 -0.0471 -0.0926 -0.0083 0.9540 0.9116 0.9918 0.6302 1.4442 -0.0484 -0.1485 0.0063 0.9527 0.8620 1.0064 0.6377 1.4234 -0.0322 -0.1157 0.0078 0.9683 0.8907 1.0078 0.6230 1.5051 Table 7.15. (continued). 
353 Parameter Estimates Coefficient OR 95% CI (OR) Parameter Upper Estimate MinimumMaximum Estimate MinimumMaximum Lower Strategy532nd year 0.0186 0.0008 0.0326 1.0188 1.0008 1.0331 0.6675 1.5549 Strategy542nd year 0.0120 -0.0025 0.0254 1.0120 0.9975 1.0257 0.6739 1.5199 Reference Strategy552nd year Strategy13rd year 0.1182 0.0023 0.2507 1.1254 1.0023 1.2849 0.8855 1.4304 0.8044 0.6497 0.9589 2.2353 1.9150 2.6089 1.3717 3.6425 Strategy113rd year Strategy123rd year 0.7765 0.6331 0.9284 2.1739 1.8833 2.5304 1.3439 3.5167 Strategy133rd year 0.8011 0.6887 0.9242 2.2281 1.9911 2.5198 1.3734 3.6147 Strategy143rd year 0.7732 0.6544 0.9047 2.1667 1.9239 2.4712 1.3352 3.5161 Strategy153rd year 0.7675 0.6374 0.8986 2.1544 1.8915 2.4561 1.3267 3.4985 Strategy213rd year 0.0841 -0.0899 0.3278 1.0878 0.9141 1.3879 0.6813 1.7369 0.0308 -0.1691 0.2715 1.0313 0.8445 1.3119 0.6303 1.6873 Strategy223rd year Strategy233rd year 0.0143 -0.2075 0.2515 1.0144 0.8126 1.2859 0.6194 1.6614 0.0004 -0.2179 0.2368 1.0004 0.8042 1.2671 0.6171 1.6218 Strategy243rd year 0.0116 -0.2194 0.2492 1.0117 0.8030 1.2830 0.6290 1.6271 Strategy253rd year Strategy313rd year -0.2396 -0.3129 -0.1723 0.7869 0.7313 0.8417 0.4672 1.3254 Strategy323rd year -0.2691 -0.3126 -0.2157 0.7640 0.7316 0.8060 0.4487 1.3011 -0.2612 -0.3095 -0.1549 0.7701 0.7338 0.8565 0.4573 1.2968 Strategy333rd year -0.2877 -0.3392 -0.1749 0.7500 0.7123 0.8396 0.4633 1.2139 Strategy343rd year -0.2714 -0.3341 -0.1519 0.7623 0.7160 0.8591 0.4657 1.2480 Strategy353rd year 0.0566 -0.0585 0.1095 1.0583 0.9432 1.1157 0.6675 1.6778 Strategy413rd year 0.0310 -0.0611 0.0820 1.0315 0.9407 1.0855 0.6076 1.7512 Strategy423rd year Strategy433rd year -0.0201 -0.1160 0.0209 0.9801 0.8904 1.0211 0.5863 1.6386 Strategy443rd year -0.0295 -0.1302 0.0071 0.9710 0.8779 1.0071 0.6038 1.5615 -0.0400 -0.1472 0.0104 0.9608 0.8631 1.0104 0.6228 1.4822 Strategy453rd year 0.1130 0.0124 0.1502 1.1197 1.0124 1.1620 0.6863 1.8268 Strategy513rd year 0.0775 0.0238 0.1034 
1.0805 1.0241 1.1090 0.6565 1.7785 Strategy523rd year Strategy533rd year 0.0269 0.0035 0.0425 1.0272 1.0035 1.0434 0.6220 1.6965 0.0134 -0.0087 0.0299 1.0135 0.9913 1.0303 0.6606 1.5548 Strategy543rd year Reference Strategy553rd year -0.4131 -0.6739 -0.2497 0.6616 0.5097 0.7790 0.5074 0.8627 Strategy14th year Strategy114th year 0.5618 0.3028 0.7459 1.7538 1.3537 2.1083 1.0786 2.8516 Strategy124th year 0.5302 0.2490 0.7138 1.6993 1.2828 2.0417 1.0453 2.7622 Strategy134th year 0.5667 0.2538 0.7202 1.7625 1.2889 2.0548 1.0860 2.8604 Strategy144th year 0.5707 0.2407 0.7323 1.7696 1.2721 2.0798 1.0898 2.8733 0.5156 0.1749 0.6797 1.6747 1.1911 1.9732 1.0308 2.7209 Strategy154th year Strategy214th year -0.0283 -0.3888 0.2626 0.9721 0.6779 1.3003 0.5911 1.5988 -0.0707 -0.4714 0.2270 0.9318 0.6241 1.2548 0.5605 1.5490 Strategy224th year Strategy234th year -0.0707 -0.4932 0.2145 0.9317 0.6107 1.2392 0.5607 1.5483 Strategy244th year -0.0478 -0.4719 0.2362 0.9533 0.6238 1.2664 0.5732 1.5854 Strategy254th year -0.0743 -0.5083 0.2123 0.9284 0.6015 1.2365 0.5624 1.5325 Strategy314th year -0.3235 -0.5004 -0.0235 0.7236 0.6063 0.9768 0.4230 1.2378 Strategy324th year -0.3540 -0.5663 -0.0459 0.7019 0.5676 0.9552 0.4127 1.1937 354 Table 7.15. (continued). 
Parameter Strategy334th year Strategy344th year Strategy354th year Strategy414th year Strategy424th year Strategy434th year Strategy444th year Strategy454th year Strategy514th year Strategy524th year Strategy534th year Strategy544th year Strategy554th year Strategy15th year Strategy115th year Strategy125th year Strategy135th year Strategy145th year Strategy155th year Strategy215th year Strategy225th year Strategy235th year Strategy245th year Strategy255th year Strategy315th year Strategy325th year Strategy335th year Strategy345th year Strategy355th year Strategy415th year Strategy425th year Strategy435th year Strategy445th year Strategy455th year Strategy515th year Strategy525th year Strategy535th year Strategy545th year Strategy555th year Predicted FEV1 in current visit Age Parameter Estimates Coefficient OR 95% CI (OR) Upper Estimate MinimumMaximum Estimate MinimumMaximum Lower -0.3356 -0.5691 -0.0320 0.7149 0.5660 0.9685 0.4202 1.2165 -0.3360 -0.5715 -0.0335 0.7146 0.5647 0.9670 0.4172 1.2240 -0.3655 -0.6200 -0.0634 0.6938 0.5380 0.9386 0.4018 1.1982 0.1325 -0.0015 0.3500 1.1417 0.9985 1.4190 0.7295 1.7869 0.1135 0.0218 0.3397 1.1201 1.0221 1.4045 0.6933 1.8098 0.0715 -0.0069 0.2992 1.0741 0.9931 1.3488 0.6615 1.7440 0.0828 0.0033 0.3068 1.0864 1.0033 1.3590 0.6662 1.7715 0.0426 -0.0252 0.2642 1.0435 0.9751 1.3024 0.6879 1.5830 0.0937 0.0095 0.1307 1.0982 1.0095 1.1396 0.7081 1.7033 0.0670 0.0206 0.0944 1.0693 1.0208 1.0990 0.6633 1.7237 0.0233 0.0107 0.0390 1.0236 1.0107 1.0397 0.6590 1.5899 0.0365 0.0244 0.0434 1.0372 1.0246 1.0444 0.6654 1.6167 Reference -0.4436 -0.5813 -0.3553 0.6417 0.5592 0.7010 0.4995 0.8244 1.2211 1.0749 1.3970 3.3908 2.9297 4.0430 2.1001 5.4745 1.3098 1.1931 1.4762 3.7053 3.2973 4.3765 2.2885 5.9992 1.3307 1.2301 1.4546 3.7837 3.4217 4.2828 2.3387 6.1216 1.3267 1.2248 1.4638 3.7687 3.4034 4.3223 2.3296 6.0970 1.3375 1.2040 1.4802 3.8095 3.3333 4.3939 2.3487 6.1790 0.4198 0.2454 0.5984 1.5217 1.2782 1.8192 0.9124 2.5378 0.4739 0.3133 
0.6672 1.6063 1.3679 1.9487 0.9573 2.6954 0.4496 0.2702 0.7596 1.5676 1.3102 2.1374 0.9346 2.6293 0.4713 0.2845 0.7927 1.6021 1.3291 2.2093 0.9562 2.6843 0.5051 0.3319 0.8207 1.6572 1.3936 2.2721 0.9982 2.7513 -0.2353 -0.3677 0.0693 0.7903 0.6923 1.0717 0.4850 1.2879 -0.1244 -0.2619 0.1001 0.8830 0.7696 1.1053 0.5408 1.4420 -0.1131 -0.2345 0.0967 0.8930 0.7909 1.1016 0.5461 1.4604 -0.0773 -0.2054 0.1326 0.9256 0.8143 1.1418 0.5666 1.5119 -0.0541 -0.1886 0.1417 0.9473 0.8282 1.1522 0.5813 1.5438 -0.1132 -0.2273 0.0342 0.8930 0.7967 1.0348 0.5497 1.4507 -0.0097 -0.0964 0.0789 0.9904 0.9081 1.0821 0.5910 1.6596 -0.0373 -0.1169 0.0563 0.9634 0.8897 1.0579 0.5716 1.6237 -0.0092 -0.0930 0.0812 0.9909 0.9112 1.0846 0.5973 1.6438 -0.0363 -0.1251 0.0511 0.9644 0.8824 1.0524 0.6025 1.5436 -0.0671 -0.1988 -0.0033 0.9351 0.8197 0.9967 0.5696 1.5352 0.0097 -0.0636 0.0540 1.0098 0.9384 1.0555 0.6172 1.6521 -0.0096 -0.0495 0.0101 0.9905 0.9517 1.0101 0.6050 1.6216 0.0153 0.0009 0.0338 1.0154 1.0009 1.0344 0.6188 1.6662 Reference -0.0052 -0.0063 -0.0038 0.9948 0.9938 0.9962 0.9939 0.9956 0.0562 0.0495 0.0674 1.0578 1.0507 1.0697 1.0476 1.0681 355 Table 7.15. (continued). 
Parameter Mutation 2 class: 1 2 3 4 5 Doesn't belong to any class Missing Race: Caucasian Black Asian Others Gender (male) Number of PEx in the past year in current visit : 0 1 2 3 4 5 Mucolytics : 0 1 2 Anti-inflammatories: 0 1 2 Drug resistance of aminoglycosides in current visit: No Yes Testing not done Parameter Estimates Coefficient OR 95% CI (OR) Upper Estimate MinimumMaximum Estimate MinimumMaximum Lower 0.1839 0.0950 0.3725 -1.5434 -1.6538 0.0771 -0.0258 0.2634 -1.9834 -1.8022 0.2693 0.1842 0.4533 -1.1032 -1.5002 1.2019 1.0997 1.4513 0.2137 0.1913 1.0801 0.9745 1.3013 0.1376 0.1649 1.3091 1.2022 1.5735 0.3318 0.2231 1.0355 0.9476 1.2048 0.1729 0.1530 1.3951 1.2761 1.7483 0.2641 0.2393 -0.1519 -0.2609 -0.0548 0.8590 0.7703 0.9467 0.7458 0.9894 Reference -0.6409 0.1862 -0.1268 -0.6700 0.1218 -0.1829 -0.5887 0.2949 -0.0465 0.5551 1.3430 0.9546 0.3720 0.8704 0.6202 0.7461 1.6672 1.2513 0.0535 0.5268 0.5117 1.2046 1.1295 0.8809 0.8329 Reference 1.0304 1.0139 0.0300 0.0138 1.0549 0.9775 1.0862 0.1252 0.2070 0.3012 -1.2707 -0.2205 0.0677 0.1735 -0.3689 -1.3163 -0.3074 0.1825 0.2564 0.7966 -1.2187 0.0084 Reference 1.1334 1.0701 1.2299 1.1894 1.3515 0.6915 0.2806 0.2681 0.8021 0.7354 1.2002 1.2922 2.2180 0.2956 1.0084 1.0622 1.0528 1.0882 0.2103 0.5936 1.2093 1.4368 1.6783 0.3745 1.0840 -0.0545 0.1645 -0.0792 0.1021 -0.0360 0.2394 Reference 0.9470 0.9239 1.1788 1.1075 0.9647 1.2705 0.9015 1.0712 0.9948 1.2971 0.0960 -0.2777 0.0254 -0.3188 0.1456 -0.2342 Reference 1.1007 1.0257 0.7575 0.7270 1.1567 0.7912 1.0243 0.5997 1.1828 0.9568 0.8439 -0.6937 0.7994 -0.7593 0.9044 -0.6277 Reference 2.3254 2.2242 0.4997 0.4680 2.4705 0.5338 1.8950 0.4281 2.8536 0.5832 356 Table 7.15. (continued). 
Parameter Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Parameter Estimates Coefficient OR 95% CI (OR) Estimate MinimumMaximum Estimate MinimumMaximum Lower Upper -0.1662 -0.7936 -0.2857 -0.8831 -0.0267 -0.6776 Reference 0.8469 0.7515 0.4522 0.4135 0.9737 0.5078 0.5989 0.2723 1.1977 0.7512 -0.1132 0.9460 -0.1669 0.8084 -0.0565 1.0718 Reference 0.8930 0.8463 2.5753 2.2444 0.9451 2.9205 0.6218 1.2455 1.2825 5.3246 * All variables were measured at the baseline Table 7.16. Using stepwise regression (AIC value) to select variable in the time-dependent cox regression. 357 Imputed dataset Excluded variables 1 2 3 4 5 6 7 8 9 10 AIC of included all variables 15136.28 15132.05 15133.41 15122.60 15137.29 15131.30 15136.35 15124.56 15137.39 15124.41 AIC after excluded above 15092.57 15088.12 15089.22 15078.40 15093.02 15086.90 15092.26 15080.32 15093.33 15080.10 variables Probability of excluding the variable Predicted FEV1 in current 1 1 1 1 1 1 1 1 1 1 visit (baseline) Race 1 1 1 1 1 1 1 1 1 1 Drug resistance of quinolones in current visit 1 1 1 1 1 1 1 1 1 1 (baseline) Mutation 2 class 1 1 1 1 1 1 1 1 1 1 CFRD status 1 1 1 1 1 1 1 1 1 1 Transplant 1 1 1 1 1 1 1 1 1 1 Drug resistance of 1 1 1 1 1 1 1 1 1 1 quinolones in current visit Drug resistance of aminoglycosides in current 1 1 1 1 1 1 1 1 1 1 visit (baseline) Predicted FEV1 in current 1 1 1 1 1 1 1 1 1 1 visit Other gram-negative 1 1 1 1 1 1 1 1 1 1 microorganisms Candida 1 1 1 1 1 1 1 1 1 1 MSSA 1 1 1 1 1 1 1 1 1 1 Table 7.16. (continued) Excluded variables GERD B. cepacia Weight Gender Smoking Number of mucolytics (baseline) Drug resistance of beta lactams in current visit 1 1 1 1 1 1 2 1 1 1 1 1 3 1 1 1 1 1 4 1 1 1 1 1 5 1 1 1 1 1 6 1 1 1 1 1 7 1 1 1 1 1 8 1 1 1 1 1 9 1 1 1 1 1 10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 358 359 Table 7.17. Results of the time-dependent cox regression. 
Variables Not following any strategy Following strategy 33 Hispanic height Hemoptysis Using any enzymes MAI MRSA Predicted FEV1 in baseline visit Predicted FEV1 in current visit Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of beta lactams in current visit: No Yes Testing not done Drug resistance of quinolones in current visit: No Yes Testing not done Number of PEx in the past year in current visit : 0 1 2 3 4 5 Inhaled antibiotics: 0 1 2 OR 1.2789 1.0062 2.9534 1.4442 2.1484 0.8517 0.9953 0.9992 95% CI Lower Upper 1.0820 2.4371 3.3190 Reference 1.1240 1.0170 1.6082 1.0020 1.0022 1.0103 1.5721 1.2169 7.1679 1.2211 0.9763 2.1363 1.3026 1.2796 3.6068 1.0718 0.7435 0.9758 1.0022 0.9911 0.9995 1.0022 0.9949 1.0035 0.6144 2.0052 Reference 1.3833 0.3253 1.0766 1.7350 1.1606 2.3173 0% 0% 1.3100 2.1768 Reference 1.1779 0.9504 2.0382 0.5392 1.8056 8.7888 0% 0% 1.8089 0.0080 Reference 1.1084 1.4785 2.1008 0.0019 2.2130 0.0344 0% 0% 0.9603 0.8025 0.7996 0.9353 0.6177 Reference 1.0843 0.8195 1.1411 0.6195 1.2208 0.5408 1.3838 0.4949 1.4435 0.3008 1.1253 1.0395 1.1822 1.7677 1.2682 0% 0% 0% 0% 0% 1.3561 0.9149 Reference 1.0945 1.1360 1.4617 0.4348 1.6188 1.9253 0% 0% Result 2.8441 SE Missing 4% 0% 0% 0% 0% 0% 0% 0% 1% 360 Table 7.17. (continued). 
                              Result
Variables                     OR         SE       95% CI Lower   95% CI Upper   Missing
Anti-inflammatories: 0 1 2
                              0.8106     1.1120   0.6583         0.9980         0%
                              1.1940     1.6040   0.4729         3.0142         0%
                              Reference
Bronchodilators: 0 1 2
                              0.8784     1.0795   0.7561         1.0205         0%
                              1.3117     1.3334   0.7463         2.3055         0%
                              Reference
Mucolytics: 0 1 2
                              1.1449     1.1203   0.9164         1.4305         0%
                              1.3900     1.1304   1.0931         1.7675         0%
                              Reference
Inhaled antibiotics: 0 1 2 3
                              0.9647     1.0827   0.8257         1.1272         0%
                              1.1220     1.1467   0.8580         1.4673         0%
                              1.4709     1.3790   0.7836         2.7612         0%
                              Reference
Anti-inflammatories: 0 1 2
                              1.1745     1.0790   1.0118         1.3633         0%
                              0.9039     1.2578   0.5766         1.4169         0%
                              Reference
Bronchodilators: 0 1 2
                              2.4115     1.1086   1.9703         2.9515         0%
                              2.0878     1.1947   1.4733         2.9586         0%
                              Reference
* Variables were measured at the baseline

CHAPTER 8 OVERALL CONCLUSIONS AND IMPACTS

Even though several assumptions were made prior to this investigation, the chance of the results being biased by those assumptions is low, since the majority of them were either based on well-accepted clinical evidence or investigated and supported by preliminary tests. Due to the application of innovative methods and comprehensive considerations, the results of this analysis are reasonable, accurate, and stable.

This study represents the largest cohort of CF patients in the United States who were diagnosed with nonmucoid PaPI from 2006 to 2011 and had not developed mucoid PaPI at the index date. Among the 4,970 unique patients, the majority were Caucasian and younger than 12 years old. Given the youth of this cohort, patients were healthy: at the baseline, they were barely affected by comorbidities other than pancreatic insufficiency and GERD; the majority of patients had only mildly impaired lung function, did not have PEx in the previous year, and had almost no drug resistance. However, according to the results of genetic testing, more than three-quarters of those patients had dysfunction of the CFTR protein, which indicates more aggressive disease progression.
Subgroup analyses indicated that the clinical signals were applied in prescription decisions, affecting at least the treatment class that a patient received at the baseline. Because patients were young and healthy at baseline, they received few advanced treatment combinations; more than half of patients received either no treatment or one mucolytic. Whether considering only the first treatment change or all treatment changes in the cohort, physicians were prone to change treatment prudently by prescribing only one additional treatment from any of the three treatment classes. At the same time, the fewer treatment classes a patient received, the more potential treatment combinations the patient could switch to. Finally, the more treatments a patient received in the current treatment, the longer the patient was likely to stay on the same treatments.

After reformatting all patients' irregular visits into routine quarterly visits and successfully imputing missing values using a complex strategy based on the mechanism of missing data, 10 imputed datasets were generated. With the assistance of the machine-learning method, together with the support of Rubin's rules, the independent variables in the predictive model were selected, and the related coefficients were combined across the 10 imputed datasets. The independent variables included demographic characteristics, clinical signals, comorbidities, and treatment histories. With the coefficient of each independent variable, the predicted probability of rational treatment change and the relative change of predicted probability between previous and current visits were calculated accordingly. Given the different thresholds of predicted probability and relative change of predicted probability, 25 varied timing strategies for treatment change were created. The proportion of patients who followed any one of the strategies was high. In other words, the assumption of positivity was met.
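The combination of coefficients across imputed datasets mentioned above can be illustrated with a minimal sketch of Rubin's rules for a single coefficient. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study; the sketch assumes each imputed dataset yields one estimate and one standard error for the coefficient.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors from >= 2 imputations")
    pooled = mean(estimates)                        # average of the m estimates
    within = mean([se ** 2 for se in std_errors])   # average within-imputation variance
    between = variance(estimates)                   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between          # total variance under Rubin's rules
    return pooled, math.sqrt(total)


# Hypothetical log-odds coefficients and standard errors from five imputations.
coefs = [0.10, 0.12, 0.11, 0.09, 0.13]
ses = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled_coef, pooled_se = pool_rubin(coefs, ses)
print(f"pooled coefficient = {pooled_coef:.4f}, pooled SE = {pooled_se:.4f}")
```

The pooled standard error is larger than the average per-dataset standard error whenever the estimates disagree across imputations, which is how the uncertainty due to missing data is carried into the final model.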
A patient received a rational treatment change at the treatment class level if and only if the patient's predicted probability and relative change of predicted probability between previous and current visits were higher than the strategy's thresholds, and vice versa. There is a grace period for the predicted probability of having rational treatment change, within which the prescribing behavior of either having or not having a rational treatment change is acceptable. Models with different grace-period lengths were also investigated. The current grace period was chosen after balancing the proportion of patients who followed the strategy against the proportion of patients who had a treatment change for uncertain reasons.

Finally, the treatment effects of 25 dynamic rational treatment change strategies for chronic treatment of pediatric CF patients were investigated using the dynamic marginal structural model and inverse probability weighting. Several models were analyzed; the fixed parameterization of the dynamic logistic marginal structural model with stabilized inverse probability weighting was preferred. In summary, patients who did not follow a treatment change regime had worse outcomes than patients following any regime. Among patients who followed different DTRs, the hazard ratio of developing mucoid PaPI first increased, then decreased, as the threshold of relative change of predicted probability increased. The regime in which the threshold of relative change of predicted probability equaled 1.831% always had the worst outcomes among the regimes that shared the same threshold of predicted probability. An optimal strategy, identified from the 25 strategies, maximized the time to mucoid PaPI.
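The regime logic described above (a threshold on the predicted probability, a threshold on the relative change of predicted probability, and a grace band in which either decision is acceptable) can be sketched as a small decision function. This is an illustrative sketch only: the names `Action` and `recommend` are hypothetical, the default threshold values are the ones this chapter reports for the optimal strategy, and treating the mixed cases (only one of the two thresholds exceeded) as "no change" is an assumption based on the "if and only if" wording.

```python
from enum import Enum


class Action(Enum):
    NO_CHANGE = "no treatment change"
    EITHER = "either decision acceptable (grace period)"
    CHANGE = "treatment change at the treatment class level"


def recommend(prob, rel_change_pct,
              lower=0.088, upper=0.098, rel_threshold_pct=0.222):
    """Classify one visit under a threshold-based treatment change regime.

    prob            -- predicted probability of a rational treatment change
    rel_change_pct  -- relative change of predicted probability between the
                       previous and current visits, in percent
    lower, upper    -- bounds of the grace band on the predicted probability
    """
    if lower <= prob <= upper:
        return Action.EITHER        # inside the grace band
    if prob > upper and rel_change_pct > rel_threshold_pct:
        return Action.CHANGE        # both thresholds exceeded
    return Action.NO_CHANGE         # assumption: mixed cases count as no change


# Hypothetical visits: (predicted probability, relative change in percent).
for visit in [(0.05, 0.10), (0.09, 0.50), (0.12, 0.50)]:
    print(visit, "->", recommend(*visit).value)
```

Passing different `lower`, `upper`, and `rel_threshold_pct` values would represent the other regimes in the grid of 25 strategies; those values are not listed in this chapter and are left as parameters here.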
The optimal strategy includes the following guidelines: the physician should not provide a treatment change at the treatment class level if the predicted probability of having a rational treatment change is lower than 0.088 and the relative change of predicted probability between the previous and current visits is lower than 0.222%; if the probability is higher than 0.098 and the relative change of predicted probability is higher than 0.222%, then the physician should change the treatment at the treatment class level. If the probability ranges from 0.088 to 0.098, it is acceptable to either implement a treatment change or not. Generally speaking, these results are consistent with the concept of evidence-based medicine: treatment has to be changed if and only if the change is supported by the clinical signals.

With the identification of an optimal strategy, healthcare providers will be able to prescribe rationally without any uncertainty, supported by confirmed evidence rather than guessing whether a treatment change is needed. At the same time, the value-based formulary can be designed at the treatment class level: adding treatment or switching treatment will be reimbursed only if the prescription timing matches the threshold of the dynamic treatment regime. In such a value-based formulary, patients' lung function will be optimized so as to avoid or delay the need for extremely expensive treatments such as ivacaftor and ivacaftor/lumacaftor unless the healthcare provider has already prescribed all other treatments step by step (step therapy) and the scenario of suboptimal treatment effects has already occurred (prior authorization). Therefore, the annual cost of the health plan for CF patients could be well maintained without sacrificing healthcare utilization.

Currently, several guidelines governing prescribing practices for chronic lung health maintenance treatments exist.
However, rather than suggesting the order of prescription, the guidelines only categorize all treatments by the certainty of net benefits. Additionally, that evidence was generated by existing RCTs with small sample sizes and extremely narrow patient characteristics that do not represent the whole patient population. With the optimal dynamic treatment regime identified from longitudinal data under a causal inference framework, physicians can use the results of this study in the future to make treatment changes at the right time by following the optimal strategy. At the same time, physicians can confidently make personalized treatment change decisions for each patient given the unique demographic values, clinical variables, and treatment histories at the baseline visit and current visit, rather than guessing whether the demographic and clinical characteristics of each individual patient match the inclusion criteria of the studies from which the guidelines were generated. With the application of the optimal dynamic rational treatment change strategy, both healthcare providers and patients are supported with definite signs when a treatment change decision has to be made. Therefore, the clinical outcome, time to mucoid PaPI, will be maximally delayed at the CF patient population level.

Even though the causality in this study was derived from an observational database used to emulate an RCT, this evidence still needs to be confirmed by an RCT. The results of this study are well suited to designing such an RCT. The RCT would not have to investigate numerous DTRs; the results of this study have already narrowed down the randomized arms in the RCT to the targeted DTRs, which will investigate the causality between following each one of them and the delay in developing mucoid PaPI.
At the same time, the study results could also support value-based formulary design by optimizing traditional treatment utilization (step therapy, tiered formulary, prior authorization, and other tools for managed care pharmacy) prior to reimbursement of extremely expensive medications. Insurance companies would reimburse only treatment changes that matched the optimal strategy. In this situation, this research can deliver the right therapy to the right patient at the right time and also at the right cost, indirectly controlling healthcare costs by optimizing traditional treatments and delaying the use of innovative yet expensive treatments.

Finally, the DTRs' grace period was caused by the low accuracy of differentiating between observed treatment change and no treatment change within a specific range of values of predicted probability. However, in several years, after the optimal strategy is successfully identified and well accepted by healthcare providers in clinical practice, the number of patients who follow the optimal strategy will increase and the uncertainty range will shrink, shortening the grace period. In other words, the more evidence we have and the more physicians prescribe rationally according to the strategy, the less uncertainty remains. Ideally, the optimal strategy will be reestimated every couple of years using the latest cohort. Eventually, after several iterations, the grace period will disappear, and an optimal strategy with a clear-cut threshold will be generated. During that time, healthcare providers and insurance companies will adjust their clinical practices and formularies, respectively, according to the optimal strategy in each of its iterations.

APPENDIX A EXPLORATORY ANALYSIS OF INVESTIGATING THE QUALITY OF DATA IN CFFPR

A.1 Test on the trend of treatment consistency by calendar year

Table A.1. Self-reported records for each calendar year

Table A.2.
Proportion of inconsistency by patient for each calendar year

Table A.3. Proportion of inconsistency by patient for each calendar year who had at least 2 visits in that year

Table A.4. Proportion of inconsistency by patient for each calendar year who had at least 2 visits for any calendar year

A.2 Test the discordance between self-reported treatment and claims refills information

Table A.5. Number of treatments in claims database

CF variable    Frequency   Percent   Cumulative Percent
Vx770          791         1.02      1.02
aztreonam      2969        3.84      4.87
dornasealfa    61384       79.45     84.31
tobi           11785       15.25     99.57
tobi_pod       335         0.43      100
Total          77264       100

Table A.6. Overall number of claims by calendar year

claimsYR   Frequency   Percent   Cumulative Percent
2000       161         0.21      0.21
2001       220         0.28      0.49
2002       747         0.97      1.46
2003       1028        1.33      2.79
2004       1483        1.92      4.71
2005       1584        2.05      6.76
2006       2870        3.71      10.47
2007       5995        7.76      18.23
2008       9774        12.65     30.88
2009       11584       14.99     45.88
2010       10409       13.47     59.35
2011       11049       14.3      73.65
2012       11535       14.93     88.58
2013       8019        10.38     98.96
2014       806         1.04      100
Total      77264       100       100

Table A.7. Trend of number of refills per patient per calendar year

Table A.8.
Overall count of discordance for each grace period by treatment

TOBI
Grey period   All     Encounter=1   Claims=1   Agreement E=C=1 N   %       Discordance E=1 N   %       Discordance C=1 N   %
0             71019   25486         4953       3733                5.26    21753               30.63   1220                1.72
30            71019   25486         8779       6524                9.19    18962               26.70   2253                3.17
60            71019   25486         11098      8033                11.31   17453               24.58   3053                4.30
90            71019   25486         12968      9089                12.80   16397               23.09   3754                5.29
double        71019   25486         12973      9089                12.80   16397               23.09   3758                5.29

Dornase alfa
Grey period   All     Encounter=1   Claims=1   Agreement E=C=1 N   %       Discordance E=1 N   %       Discordance C=1 N   %
0             71019   49563         25528      21236               29.90   28327               39.89   4292                6.04
30            71019   49563         33387      27696               39.00   21867               30.79   5691                8.01
60            71019   49563         37258      30866               43.46   18697               26.33   6392                9.00
90            71019   49563         39685      32820               46.21   16743               23.58   6865                9.67
double        71019   49563         33533      27804               39.15   21759               30.64   5729                8.07

Aztreonam
Grey period   All     Encounter=1   Claims=1   Agreement E=C=1 N   %       Discordance E=1 N   %       Discordance C=1 N   %
0             71019   5555          1313       911                 1.28    4644                6.54    402                 0.57
30            71019   5555          2571       1812                2.55    3743                5.27    759                 1.07
60            71019   5555          3359       2309                3.25    3246                4.57    1050                1.48
90            71019   5555          3954       2639                3.72    2916                4.11    1315                1.85
max           71019   5555          2486       1742                2.45    3813                5.37    744                 1.05

Tobi_pod
Grey period   All     Encounter=1   Claims=1   Agreement E=C=1 N   %       Discordance E=1 N   %       Discordance C=1 N   %
0             71019   314           170        46                  0.06    268                 0.38    124                 0.17
30            71019   314           275        74                  0.10    240                 0.34    201                 0.28
60            71019   314           373        98                  0.14    216                 0.30    275                 0.39
90            71019   314           454        115                 0.16    199                 0.28    339                 0.48
double        71019   314           298        81                  0.11    233                 0.33    217                 0.31

Ivacaftor
Grey period   All     Encounter=1   Claims=1   Agreement E=C=1 N   %       Discordance E=1 N   %       Discordance C=1 N   %
0             71019   476           278        214                 0.30    262                 0.37    64                  0.09
30            71019   476           340        260                 0.37    216                 0.30    80                  0.11
60            71019   476           368        285                 0.40    191                 0.27    83                  0.12
90            71019   476           395        298                 0.42    178                 0.25    97                  0.14
double        71019   476           340        260                 0.37    216                 0.30    80                  0.11

Table A.9.
Number of claims that matched to the encounter data when the patient reported on treatment in CFFPR (aztreonam)

claimsYR     all   aztreonamp0   aztreonamp30   aztreonamp60   aztreonamp90   aztreonampdou
2003        1003             0              0              0              0               0
2004        1617             0              0              0              0               0
2005        2216             0              0              0              0               0
2006        3796             0              0              0              0               0
2007        6973             0              0              0              0               0
2008        8877             0              0              0              0               0
2009        9713             0              0              0              0               0
2010        9342           204            379            481            549             365
2011        9418           401            796           1024           1185             768
2012        9292           407            774            984           1117             754
2013        6777           259            519            706            840             500
2014        1995            42            103            164            213              99

Table A.10. Proportion of discordance by each calendar year (aztreonam)

claimsYR     all   aztreonamp0   aztreonamp30   aztreonamp60   aztreonamp90   aztreonampdou
2003        1003        0              0              0              0               0
2004        1617        0              0              0              0               0
2005        2216        0              0              0              0               0
2006        3796        0              0              0              0               0
2007        6973        0              0              0              0               0
2008        8877        0.0048         0.0048         0.0048         0.0048          0.0048
2009        9713        0.0165         0.0165         0.0165         0.0165          0.0165
2010        9342        0.0635         0.0599         0.0608         0.0621          0.0604
2011        9418        0.1369         0.1191         0.1140         0.1103          0.1217
2012        9292        0.1540         0.1345         0.1276         0.1252          0.1360
2013        6777        0.1687         0.1474         0.1337         0.1269          0.1493
2014        1995        0.1940         0.1845         0.1799         0.1684          0.1845

Table A.11. Proportion of discordance by each patient for each calendar year (aztreonam)

claimsYR       N   aztreonamp0   aztreonamp30   aztreonamp60   aztreonamp90   aztreonampdou
2003         247        0              0              0              0               0
2004         365        0              0              0              0               0
2005         467        0              0              0              0               0
2006         789        0              0              0              0               0
2007        1522        0              0              0              0               0
2008        1883        0.0039         0.0039         0.0039         0.0039          0.0039
2009        2031        0.0132         0.0132         0.0132         0.0132          0.0132
2010        2053        0.0483         0.0452         0.0454         0.0460          0.0454
2011        1896        0.1226         0.1073         0.1035         0.0998          0.1093
2012        1881        0.1433         0.1236         0.1172         0.1143          0.1249
2013        1588        0.1520         0.1362         0.1207         0.1116          0.1377
2014         783        0.1790         0.1651         0.1593         0.1418          0.1659

Table A.12.
Proportion of discordance when patient claimed not on treatment by calendar year (aztreonam)

claimsYR     all   aztreonamp0   aztreonamp30   aztreonamp60   aztreonamp90   aztreonampdou
2003        1003        0              0              0              0               0
2004        1617        0              0              0              0               0
2005        2216        0              0              0              0               0
2006        3796        0              0              0              0               0
2007        6973        0              0              0              0               0
2008        8877        0              0              0              0               0
2009        9713        0              0              0              0               0
2010        9342        0.0094         0.0170         0.0229         0.0272          0.0165
2011        9418        0.0136         0.0257         0.0353         0.0419          0.0255
2012        9292        0.0108         0.0208         0.0286         0.0345          0.0204
2013        6777        0.0106         0.0192         0.0261         0.0326          0.0187
2014        1995        0.0070         0.0175         0.0306         0.0371          0.0165

APPENDIX B

EXPLORATORY ANALYSIS OF INVESTIGATING THE RELATIONSHIP BETWEEN DRUG APPROVAL AND IRRATIONAL TREATMENT CHANGE

Table B.1. The date of when evidence was generated.

Table B.2. Result of association between drug approval and irrational treatment change

APPENDIX C

EXPLORATORY ANALYSIS OF INVESTIGATING THE IMPACT OF DIFFERENT MEASUREMENTS ON THE NUMBER OF VARIABLES THAT WOULD BE SELECTED BY ELASTIC NET

Table C.1. An example of six ways of identifying the outcome

ID: 1 1 1 1 1 1
Visit: 0 1 2 3 4 5
Clinical signals, FEV1% predicted: 75% 52% 64% 66% 65% 64%
Treatment classes, Mucolytics: 1 1 1 1 1 1
Treatment classes, Inhaled antibiotics / Anti-inflammatory / Bronchodilators: 0 0 0 0 0 0 1 0 2 1 0 1 1 1 1 1 1 0
Treatment change including BD (Loose / Neutral / Strict): . . . 1 1 1 1 1 1 1 1 0 1 0 0 . . .
Treatment change not including BD (Loose / Neutral / Strict): . . . 1 1 1 0 0 0 1 1 0 0 0 0 . . .

Figure C.1. The cross-validation figures conditional on different types of measurement for rational treatment change with strict assumption in imputed dataset 1.

Table C.2. The minimum of mean cross-validated error using deviance as the measurement for treatment change that includes BD use under strict definition.
Imputed                                                        Alpha
dataset   0          0.1        0.2        0.3        0.4        0.5        0.6        0.7        0.8        0.9        1
1         0.879694   0.850469   0.850396   0.850376   0.850367   0.850361   0.850361   0.850357   0.850357   0.850356   0.850356
2         0.881423   0.852424   0.852333   0.852300   0.852286   0.852279   0.852274   0.852271   0.852270   0.852268   0.852266
3         0.880417   0.851256   0.851190   0.851174   0.851165   0.851162   0.851160   0.851158   0.851157   0.851157   0.851156
4         0.881174   0.852553   0.852461   0.852432   0.852418   0.852414   0.852410   0.852407   0.852406   0.852405   0.852404
5         0.880946   0.851965   0.851873   0.851847   0.851831   0.851826   0.851821   0.851819   0.851817   0.851816   0.851815
6         0.880628   0.851430   0.851351   0.851321   0.851311   0.851303   0.851300   0.851299   0.851298   0.851297   0.851294
7         0.880037   0.851045   0.850964   0.850941   0.850931   0.850929   0.850925   0.850925   0.850924   0.850924   0.850923
8         0.879907   0.850671   0.850597   0.850572   0.850562   0.850557   0.850554   0.850553   0.850552   0.850550   0.850548
9         0.880106   0.851103   0.851017   0.850991   0.850978   0.850971   0.850968   0.850965   0.850963   0.850962   0.850962
10        0.880290   0.851290   0.851204   0.851180   0.851168   0.851163   0.851158   0.851155   0.851153   0.851152   0.851151
Mean      0.880462   0.851421   0.851339   0.851313   0.851302   0.851297   0.851293   0.851291   0.851290   0.851289   0.851288
SD        0.000541   0.000659   0.000652   0.000649   0.000648   0.000648   0.000647   0.000646   0.000646   0.000646   0.000646

Table C.3. The minimum of mean cross-validated error using deviance as the measurement for treatment change that does not include BD use under strict definition.
Imputed                                                        Alpha
dataset   0          0.1        0.2        0.3        0.4        0.5        0.6        0.7        0.8        0.9        1
1         0.513712   0.506486   0.506439   0.506416   0.506404   0.506395   0.506386   0.506380   0.506375   0.506370   0.506368
2         0.513815   0.506602   0.506565   0.506545   0.506534   0.506526   0.506521   0.506517   0.506514   0.506511   0.506510
3         0.513591   0.506285   0.506239   0.506218   0.506204   0.506195   0.506190   0.506189   0.506186   0.506184   0.506184
4         0.514084   0.506779   0.506739   0.506713   0.506702   0.506694   0.506686   0.506683   0.506681   0.506678   0.506678
5         0.513882   0.506607   0.506555   0.506529   0.506511   0.506501   0.506496   0.506492   0.506489   0.506485   0.506483
6         0.513225   0.505879   0.505823   0.505802   0.505789   0.505780   0.505772   0.505767   0.505763   0.505761   0.505758
7         0.513310   0.506029   0.505977   0.505957   0.505946   0.505941   0.505934   0.505939   0.505936   0.505934   0.505933
8         0.513633   0.506293   0.506249   0.506227   0.506215   0.506208   0.506200   0.506196   0.506193   0.506191   0.506189
9         0.513643   0.506276   0.506230   0.506213   0.506203   0.506195   0.506189   0.506185   0.506182   0.506179   0.506177
10        0.513989   0.506802   0.506757   0.506734   0.506723   0.506715   0.506709   0.506706   0.506694   0.506690   0.506688
Mean      0.513688   0.506404   0.506357   0.506335   0.506323   0.506315   0.506308   0.506306   0.506301   0.506298   0.506297
SD        0.000260   0.000291   0.000294   0.000293   0.000293   0.000292   0.000293   0.000292   0.000291   0.000291   0.000291

APPENDIX D

EXPLORATORY ANALYSIS OF INVESTIGATING THE RELATIONSHIP BETWEEN FREQUENCY OF VISIT AND DETERIORATION OF LUNG FUNCTION

Table D.1. The association between relative FEV1% predicted change and frequency of visit when patient has more than 1 year records

Table D.2. The association between relative FEV1% predicted change and frequency of visit when patient has more than 2 year records

APPENDIX E

EXPLORATORY ANALYSIS OF INVESTIGATING THE INFLUENCES OF USING DIFFERENT METHODS TO DEFINE INDEX DATE ON BASELINE VARIABLES AND CLINICAL OUTCOMES

Table E.1. Baseline characteristics using the first visit as index date (continuous variables).
Fisher Exact test was conducted if more than 25% of cells have less than 5 observations

Table E.2. Baseline characteristics using the first visit as index date (categorical variables).

Table E.3. Baseline characteristics using the first visit as index date (continuous variables).

Table E.4. Baseline characteristics using the first visit as index date (categorical variables).

Table E.5.
Length since last FEV1 was measured using the first visit as index date

APPENDIX F

DATA MANAGEMENT OF MISSING VALUES

This appendix describes in detail the procedures used to handle missing values. Missing values in time-independent variables, which arose from the artificially created quarterly visits, were filled in using the last observation carried forward method. For time-varying demographic variables, such as height and weight, missing values were filled in using the arithmetic mean of the relative change in those variables across all visits that occurred within 1 year before and 1 year after the visit with the missing information. After these procedures, the only variable that still had missing values was FEV1. To improve the imputation of FEV1 by avoiding consecutive missing values starting at the index date, a new index date was identified for Aims 2 and 3. In addition, four questions were investigated simultaneously, using different outcomes across 12 models, to identify the most appropriate model for multiple imputation: which method, MCMC or FCS, should be chosen; whether to include the indicator; whether to include preexisting lung function variables; and which assumption (strict, loose, or neutral) to adopt.

The cohort in Aim 1 was used as the foundation to reformat the visit data so that each patient had a routine quarterly visit. Missing values in all variables other than FEV1 were imputed in Aim 1. However, because of the reformatting, a visit was artificially created whenever a patient had no visit during a 3-month interval. For those visits, comorbidities, treatment-related variables, and fixed demographic information, such as race and ethnicity, were captured using the last observation carried forward method. The rationale is that 3 months is not enough time for extreme changes in those variables.
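The reformatting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function name `build_quarterly_visits`, the field names (`fev1`, `cfrd`, `race`), and the use of an integer quarter index are all hypothetical. It creates an artificial visit for every quarter without a real visit, carries the last observation forward for the listed fields, and leaves FEV1 missing at artificial visits.

```python
from typing import Optional


def build_quarterly_visits(
    observed: dict[int, dict],
    n_quarters: int,
    locf_fields: list[str],
) -> list[dict]:
    """Return one record per quarter, creating artificial visits where needed.

    observed maps a quarter index (0 = index date) to the record of the real
    visit in that quarter. Field names are hypothetical placeholders.
    """
    rows = []
    last_seen: dict[str, Optional[object]] = {f: None for f in locf_fields}
    for quarter in range(n_quarters):
        record = observed.get(quarter)
        row = {
            "quarter": quarter,
            "artificial": record is None,  # flag visits created by reformatting
            # FEV1 is never carried forward; it stays missing for later imputation
            "fev1": record.get("fev1") if record is not None else None,
        }
        for field in locf_fields:
            if record is not None and record.get(field) is not None:
                last_seen[field] = record[field]
            row[field] = last_seen[field]  # last observation carried forward
        rows.append(row)
    return rows


observed_visits = {
    0: {"fev1": 75.0, "cfrd": 0, "race": "white"},
    2: {"fev1": 64.0, "cfrd": 1, "race": "white"},
}
quarterly = build_quarterly_visits(observed_visits, 4, ["cfrd", "race"])
```

In this toy input, quarters 1 and 3 have no real visit, so they are flagged as artificial and inherit `cfrd` and `race` from the most recent real visit.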
Considering the uncertainty of imputed FEV1 when missing values occurred consecutively, a new index date was identified for the cohort in Aims 2 and 3, following the conclusion of Assumption 4 (3.10.1.4). Patients with a grace period of more than 6 months between the original index date in Objective 1 and the first FEV1 measurement were excluded from the cohort. For the remaining patients, the date of the first FEV1 measurement was used as the index date. The new cohort for Aims 2 and 3 decreased to 4,760 patients and included 79,724 visits. The index date was unchanged for the majority of patients (4,174/4,760 = 87.69%); the rest had a new index date that was delayed by 0 to 6 months from the original index date.

After the above procedures, FEV1 was the only variable in the dataset that had missing values. Two characteristics made traditional imputation techniques inapplicable to FEV1. First, the trend of FEV1 is not linear. In the long term it changes gradually, deteriorating over time, but in the short term it fluctuates drastically. Second, FEV1 is the key clinical signal in this study, serving as both exposure and outcome in different objectives, so any inappropriate imputation would bias the final result. Therefore, a more advanced methodology was applied to impute those missing values closely.

Compared with traditional imputation techniques, multiple imputation (MI) is superior, since it restores some of the lost variability by adding a residual term to the predicted scores from the regression imputation. That residual term is randomly drawn from a normal distribution with a mean of zero and a variance equal to the residual variance from the regression model. Under the missing at random (MAR) assumption, this method produces unbiased estimates of the coefficients and of the standard errors obtained from a regression. Therefore, the result of MI is less biased than the single imputation approach.
However, it may still be attenuated compared with the real residual for the missing values. So, while multiple imputation is the most advanced and accurate method to impute missing values, it may still bias the imputed result if the missingness mechanisms are not appropriately identified.

Rather than imputing FEV1 directly, the change of FEV1 (delFEV1) between the current visit and the future visit was imputed as the outcome. There are two rationales for this. First, FEV1 is an accumulated clinical variable; it reflects the pulmonary damage that a patient has suffered from infancy. Therefore, the demographics, other clinical variables, and treatment information of the current visit only affect the change of FEV1 between the current and future visits, assuming those conditions hold between the two visits. Second, the predictive accuracy is better using the change of FEV1, since its range of variation is narrower than that of FEV1 itself. The change of FEV1 is less likely to have large random errors, which increases the accuracy of the prediction.

Two issues are associated with the imputation of delFEV1. The first is the time point of the missing value: a missing delFEV1 could be caused by failing to capture FEV1 at either the current visit or the future visit. The second is the rationale of the missing value, which could be caused either by failing to capture FEV1 at a routine visit or by the artificial reformatting. As mentioned previously, if there was no visit during a 3-month interval, a visit was artificially created to represent the quarterly visit, and thus all the values for that visit were missing. These values were denoted as 'missing values at artificial visit.' Similarly, 'missing values at existing visit' refers to the missing delFEV1s that failed to be captured at a real visit.
To investigate whether the rationale of missing values in future visits affects the imputation of delFEV1, three assumptions (loose, neutral, and strict) were made. Under the loose assumption, all missing delFEV1 values were imputed regardless of the time point and rationale of the missing values. Conversely, under the strict assumption, a missing delFEV1 was imputed only if it was caused by failing to capture FEV1 at the current visit or if the current visit was artificially created; the other missing delFEV1s were set to 0. Under the neutral assumption, a missing delFEV1 was imputed as long as it was not caused by a missing value at an artificially created future visit.

The rationale of those assumptions varied. Under the strict assumption, the patient was assumed to have stable lung function: delFEV1 was treated as unchanged whenever the missing value occurred at the next visit, regardless of the rationale. Conversely, the loose assumption made no assumption about the trend of lung function when the value was missing at the future visit. It assumed that all missing values should be imputed and that the imputation could handle all missing delFEV1s. The neutral assumption, in contrast, had the most reasonable rationale: the patient was assumed to have stable lung function if the patient did not have a routine future visit, while the rest of the missing delFEV1 values still needed to be imputed.

Table F.1 presents these assumptions. Each cell represents all missing values that shared the same mechanism. For example, A represents all missing delFEV1s that were caused by failing to capture FEV1 at the current visit. Under the strict assumption, the missing values that belonged to B and D were not imputed. Conversely, all missing values were imputed under the loose assumption. If a missing value belonged to cell D, it was not imputed under the neutral assumption.
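The three assumptions reduce to a small lookup over the four cells of Table F.1. The sketch below restates that logic in Python; the names `classify_cell`, `handle_missing`, and the string labels are hypothetical and chosen for illustration only. It assumes, as the text states for the strict assumption, that a missing delFEV1 that is not imputed is set to 0, and it does not address a delFEV1 whose current and future FEV1 are both missing.

```python
# Cells of Table F.1: rows are the rationale of missingness,
# columns are the time point of the missing FEV1.
CELLS = {
    ("failed_to_measure", "current"): "A",
    ("failed_to_measure", "future"): "B",
    ("artificial", "current"): "C",
    ("artificial", "future"): "D",
}

# Cells whose missing delFEV1 is imputed under each assumption.
IMPUTED_CELLS = {
    "strict": {"A", "C"},             # B and D are not imputed
    "neutral": {"A", "B", "C"},       # only D is not imputed
    "loose": {"A", "B", "C", "D"},    # everything is imputed
}


def classify_cell(rationale: str, time_point: str) -> str:
    """Map a missing delFEV1 to its cell in Table F.1."""
    return CELLS[(rationale, time_point)]


def handle_missing(cell: str, assumption: str) -> str:
    """Return 'impute' or 'set_to_zero' for a missing delFEV1."""
    return "impute" if cell in IMPUTED_CELLS[assumption] else "set_to_zero"


decisions = {
    assumption: {cell: handle_missing(cell, assumption) for cell in "ABCD"}
    for assumption in IMPUTED_CELLS
}
```

Printing `decisions` gives the full 3 x 4 grid, which makes it easy to see that the neutral assumption differs from the loose one only in cell D and from the strict one only in cell B.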
In the cohort, each missing FEV1 was then calculated from the delFEV1 and the FEV1 that was measured at the adjacent visit.

In addition to investigating whether the rationale for the missing values at the future visit affects the imputation of delFEV1, two further questions were investigated: whether an artificially created current visit affects the imputation, and whether including preexisting lung function variables could improve the imputation. To answer the first question, an indicator was created for all missing values that belonged to cell C in Table F.1. For the second question, preexisting lung function variables, such as the change of FEV1 between the previous and current visit (predelFEV1) and the FEV1 at the previous visit (preFEV1), were included in some models. With the other variables in the models held fixed, delFEV1 was compared between models that did and did not include the indicator, and between models that did and did not include preexisting lung function variables.

The multiple imputation was conducted in the following manner. First, demographic variables and comorbidities were included to impute delFEV1. Other clinical variables and treatment-related variables were not included, as delFEV1 was the signal that directed treatment decision-making. Moreover, FEV1 was one of the main predictors of having a rational treatment change, so imputing the missing delFEV1 from other clinical variables and treatment-related variables could introduce bias and affect the prediction of the treatment change. Therefore, the change of FEV1 between the current and future visits was imputed in a range of models under different assumptions. To identify the imputation model with the best performance, the three questions mentioned above were investigated in addition to the assumptions.
Moreover, two methods of multiple imputation (Markov Chain Monte Carlo [MCMC] and Fully Conditional Specification [FCS]) were applied in the study, since they each entail different assumptions. Unlike MCMC, FCS does not assume a joint normal distribution. Last, after the model for MI was chosen, the missing delFEV1s in the original dataset were imputed 10 times, which is enough to capture the variance of imputed values, and 10 imputed datasets were created accordingly. The only differences between these datasets were in the FEV1 values, which were calculated from the delFEV1s and the FEV1s captured at the adjacent visits.

According to whether the model included the indicator, whether it included preexisting lung function variables, and the three assumptions, 12 models were built for the study. The following variables, measured at the current visit, were included as independent variables in all models under the different assumptions: age; height; weight; mutation class 1; mutation class 2; whether the patient had the F508 mutation; gender; status of lung transplant; whether the patient was infected by Aspergillus, a Burkholderia species, B. cepacia, Candida, MAI, MRSA, MSSA, other Gram-negative microorganisms, or S. aureus; and whether the patient had ABPA, CFRD, DIOS, GERD, hemoptysis, or TB.

In Figures F.1 to F.13, different outcomes, such as the trace of delFEV1, the autocorrelation of delFEV1, and the distribution of delFEV1, were compared among those 12 models to identify the most appropriate model for multiple imputation. First, four trace plots of delFEV1 in different models are presented in Figure F.2. The left column does not include preexisting lung function variables, whereas the right column does, and from top to bottom the figure represents the models under the strict and loose assumptions. Figure F.1 depicts the trace plot of the model that does not include the indicator under the strict assumption.
The x axis is the number of iterations and the y axis is the mean delFEV1 in each iteration. Figure F.1 shows two characteristics of better performance in the MI model: a stable posterior distribution and a stationary phase that is reached quickly. The stable posterior distribution was supported by the mean, which remained relatively constant, with no trend between the mean and the number of iterations. The stationary phase was achieved immediately, much earlier than the end of the burn-in stage (200 iterations). A trend existed only in the four models in Figure F.2, all of which included the indicator variable. Therefore, in contrast to the model that did not include the indicator to differentiate the mechanism of missingness, the models that included the indicator did not converge (Figure F.2), at least under the loose and strict assumptions. Figure F.4 supports this result from another perspective: autocorrelation among those iterations was highly likely.

Figures F.3 and F.4 summarize the results for the corresponding models in Figures F.1 and F.2. The x axis represents the lag, where each unit covers 100 iterations, and the y axis represents the correlation between two iterations. The blue band indicates the 95% CI for no correlation between two iterations. The lower the chance of autocorrelation between iterations, the better the model. Figure F.3 indicates a low chance of autocorrelation between iterations: the correlation decreases rapidly from 1 to 0 and then stays within the blue band. However, the chance of autocorrelation between iterations was high for the four models in Figure F.4. The correlation was not only outside the blue band, but also close to 1. Therefore, including the indicator decreased the performance of a model, at least under the loose and strict assumptions, since there was a strong correlation between imputed values in adjacent imputed datasets.
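The autocorrelation diagnostic described above can be illustrated with a short, self-contained sketch. It is not the routine used to produce Figures F.3 and F.4: the function names are hypothetical, and the band half-width of 1.96 divided by the square root of the chain length is the common large-sample approximation for an uncorrelated series, assumed here because the text does not state how the blue band was computed.

```python
import math


def autocorrelation(chain: list[float], lag: int) -> float:
    """Sample autocorrelation of a chain of per-iteration means at a given lag."""
    n = len(chain)
    mean = sum(chain) / n
    variance_sum = sum((x - mean) ** 2 for x in chain)
    if variance_sum == 0:
        raise ValueError("autocorrelation is undefined for a constant chain")
    covariance_sum = sum(
        (chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


def no_correlation_band(n: int) -> float:
    """Approximate 95% half-width for the autocorrelation of an uncorrelated chain."""
    return 1.96 / math.sqrt(n)


# A chain with a steady trend, like the non-converged trace plots in Figure F.2.
trending_chain = [float(i) for i in range(100)]
lag1 = autocorrelation(trending_chain, 1)
outside_band = abs(lag1) > no_correlation_band(len(trending_chain))
```

A chain that drifts with the iteration number yields a lag-1 autocorrelation close to 1 and far outside the band, which is the pattern the text reports for the models that included the indicator.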
Figures F.5 and F.6 were created according to the rationale of the missing values. They show the distributions of imputed delFEV1 for values missing because of artificial reformatting and because of failure to capture FEV1, respectively, in 1 of the 10 imputed datasets (imputed1). The other imputed datasets shared the same trends. The left column does not include preexisting lung function variables, whereas the right column does. From top to bottom, the figure represents the models under the strict, neutral, and loose assumptions, respectively. The green histogram and black dotted line represent the distribution of imputed delFEV1 in the model that included the indicator, and the blue histogram and red dotted line represent the model that did not include the indicator. The x axis represents the predicted value of delFEV1, and the y axis represents the proportion of predicted values in the specific range.

In Figure F.5, under the neutral assumption, there was barely any difference in the distribution of imputed delFEV1 between the model that included the indicator and the one that did not. However, the difference was large under the strict assumption, and its direction was even reversed. Under the strict assumption, models that did not include the preexisting lung function variables had higher imputed values of delFEV1 after including the indicator. Conversely, if the model included the preexisting lung function variables, the imputed delFEV1 was more likely to concentrate around 0 after including the indicator. Similarly, under the loose assumption, for the model that included preexisting lung function variables, the imputed delFEV1 was also more likely to concentrate around 0 after including the indicator. Generally speaking, the imputed values under the neutral assumption were not affected by whether the model included the indicator.
After including the indicator, however, the imputed delFEV1 changed considerably under both the loose and strict assumptions.

Figure F.6 shows that, when only the missing values that failed to be captured are considered, the difference in imputed delFEV1 was trivial regardless of whether the model included the indicator. The only exception occurred when preexisting lung function variables were included under the neutral assumption: after including the indicator, the imputed delFEV1 increased.

Figures F.7 and F.8 were created to better visualize the difference in imputed delFEV1 between the model that did not include the indicator and the model that did. Generally speaking, all figures were normally distributed with a mean of 0. However, the difference in imputed delFEV1 at existing visits was more likely to concentrate on 0 than the difference at artificial visits, since the maximum percentage was higher in Figure F.8 than in the corresponding panel of Figure F.7. In other words, at existing visits there was less difference in imputed delFEV1 between models that included and did not include the indicator than at artificial visits.

Similar procedures were also conducted using the FCS method rather than the MCMC method. The conclusion was that those models converged regardless of whether the indicator was included. Unlike with MCMC, including the indicator did not affect the result of the multiple imputation using the FCS method. Figures F.9, F.10, and F.11 support this conclusion. Rather than presenting the results of all the models, only the model that includes the indicator under the strict assumption is presented in Figure F.9. The results of the other models shared the same characteristics. Unlike the trace plot of MCMC, which presents the result of only one imputation, the trace plot of FCS presents the results of all imputations at the same time.
In Figure F.9, the medians of imputed delFEV1 in each imputation chain are overlaid on top of each other; each color represents the result of one imputation. All ten imputations converged quickly. The distributions of delFEV1 in two models, with and without the indicator under the strict assumption, are shown in Figures F.10 and F.11 for the missing values that occurred at artificial visits and existing visits, respectively. There was no difference in imputed delFEV1, regardless of the rationale of the missing values. MCMC provides reliable estimates even when the assumption of a multivariate normal distribution is violated, as long as the sample size is large enough,164,166 whereas FCS is less likely to provide reliable estimates if the distribution of any imputed variable is misspecified. Therefore, the MCMC method was applied.

To compare the influence of the different assumptions on the imputed delFEV1, Figures F.12 and F.13 were created, each with four subfigures. The left and right columns represent the distribution of imputed delFEV1 when the missing value occurred only at an artificial visit and at an existing visit, respectively. In Figure F.12, from top to bottom, the figures represent the results of the original models and of the models that included the indicator. Similarly, in Figure F.13, from top to bottom, the figures represent the results of the models that included preexisting lung function variables and of the models that included both the indicator and preexisting lung function variables. The blue, green, and purple histograms indicate the proportion of visits that had imputed delFEV1 within the specific range under the strict, neutral, and loose assumptions, respectively. Red, brown, and yellow dotted lines were also assigned to those three assumptions, respectively. Generally speaking, all figures were normally distributed, and the mean was close to 0 but slightly to the right.
At the same time, the difference in the distribution of imputed delFEV1 between the strict and neutral assumptions was trivial regardless of the model and the rationale of the missing values. The only exception was the model that included both the indicator and preexisting lung function variables, and only when the missing value occurred at an existing visit (bottom right corner of Figure F.13). In this figure, the neutral assumption was more likely than the strict assumption to yield small values of imputed delFEV1. The loose assumption always had a different distribution from both the strict and the neutral assumptions. However, the direction of the difference was not consistent. Most of the time, the loose assumption had the highest chance of yielding small values of imputed delFEV1. But if missing values occurred only at artificial visits and the model included the indicator, the loose assumption was more likely to yield large values of imputed delFEV1. For the model that included both the indicator and preexisting lung function variables, if the missing value occurred only at an existing visit, the chance of a small imputed delFEV1 under the loose assumption was higher than under the strict assumption but lower than under the neutral assumption. Moreover, there was only one scenario in which the distribution was not normal: in the model that included both the indicator and preexisting lung function variables, the distribution for missing values at artificial visits was skewed to the left under the loose assumption. The upper boundary for the majority of the models was around 12.5%. However, when the missing value occurred only at an existing visit, the upper boundary was around 15% for models that included preexisting lung function variables and for models that included both the indicator and preexisting lung function variables.
Finally, an analysis was conducted to identify the model with the best performance. In this study, delFEV1 was imputed in order to calculate the missing values of FEV1. The FEV1 value at a given visit could be calculated from two different directions. If FEV1 was missing at the current visit, it could be calculated by forward calculation, adding the delFEV1 and the FEV1 of the previous visit, or by backward calculation, subtracting the delFEV1 of the current visit from the FEV1 of the next visit. Hypothetically, the result should be the same regardless of the type of calculation applied. The difference between the forward and backward calculations therefore quantifies the performance of the model for MI, and two measures, O1 and O2, were calculated for each model. O1 and O2 share the same numerator, the sum of the squared differences between the forward and backward calculations. The denominator for O1 was the number of differences in FEV1 between the forward and backward calculations, whereas the denominator for O2 was the total number of visits in the cohort.

Table F.2 shows the results of the twelve models. Because O2 has the larger denominator, the values of O1 were consistently larger than those of O2. Models that included preexisting lung function variables had, in most cases, smaller values of O1 and O2 than models that did not include those variables. The exception was the model that included the indicator under the neutral assumption, for which both O1 and O2 increased after the preexisting lung function variables were included. The inclusion of preexisting lung function variables increased the chance of shrinking the range of imputed delFEV1. If a model did not include preexisting lung function variables, including the indicator decreased the result under both the strict and neutral assumptions.
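The forward and backward calculations and the two measures can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code: the function name and data layout are hypothetical, delFEV1 at visit t is taken to be FEV1 at visit t+1 minus FEV1 at visit t, and a missing FEV1 is scored only when both neighbouring FEV1 values were measured (how runs of consecutive missing values were handled is not specified in the text).

```python
from typing import Optional


def forward_backward_scores(
    patients: list[tuple[list[Optional[float]], list[float]]],
) -> tuple[float, float]:
    """Return (O1, O2) for a list of patients.

    Each patient is (fev1, del_fev1): fev1[t] is None where FEV1 is missing,
    and del_fev1[t] is the observed or imputed change from visit t to t + 1.
    """
    sum_squared = 0.0
    n_differences = 0
    n_visits = 0
    for fev1, del_fev1 in patients:
        n_visits += len(fev1)
        for t in range(1, len(fev1) - 1):
            if fev1[t] is not None:
                continue  # only missing FEV1 values are reconstructed
            previous_fev1, next_fev1 = fev1[t - 1], fev1[t + 1]
            if previous_fev1 is None or next_fev1 is None:
                continue  # assumption: skip runs of consecutive missing values
            forward = previous_fev1 + del_fev1[t - 1]
            backward = next_fev1 - del_fev1[t]
            sum_squared += (forward - backward) ** 2
            n_differences += 1
    o1 = sum_squared / n_differences if n_differences else float("nan")
    o2 = sum_squared / n_visits if n_visits else float("nan")
    return o1, o2


# One toy patient: FEV1 is missing at the second of four visits.
example = [([75.0, None, 64.0, 66.0], [-10.0, 2.0, 2.0])]
o1, o2 = forward_backward_scores(example)
```

In the toy patient, the forward value is 75 + (-10) = 65 and the backward value is 64 - 2 = 62, so the squared difference is 9. O1 divides it by the single difference, and O2 by the four visits, which shows why O1 is always at least as large as O2.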
Under the loose assumption, the direction of this effect reversed: without preexisting lung function variables, including the indicator increased both O1 and O2. When a model included preexisting lung function variables, the effect of including the indicator was the opposite under each assumption: both measures increased under the strict and neutral assumptions and decreased under the loose assumption. Compared with the loose and strict assumptions, the neutral assumption consistently had the smallest values of both O1 and O2. Therefore, considering the results of this analysis together with the results in the above sections, the model that included preexisting lung function variables, did not include the indicator, and operated under the neutral assumption was chosen to impute missing delFEV1 using the MCMC method. All the missing FEV1 values in the 10 imputed datasets were calculated using this model.

Table F.1. Assumptions for MI

                         Rationale of missing
Time point of missing    Failing to measure    Artificial reformatting
Current visit            A                     B
Future visit             C                     D

Table F.2. The difference in FEV1 between forward and backward calculations in the same visit.

         Strict assumption       Neutral assumption      Loose assumption
         No          Include     No          Include     No          Include
         indicator   indicator   indicator   indicator   indicator   indicator
Without pre-existing lung function variables
O1       0.372523    0.328909    0.315875    0.294333    0.400694    0.405471
O2       0.021767    0.019219    0.018457    0.017198    0.023413    0.023692
With pre-existing lung function variables
O1       0.300238    0.318509    0.277217    0.306027    0.394175    0.379669
O2       0.017543    0.018611    0.016198    0.017882    0.023032    0.022185

* O1 and O2 were calculated using the same numerator, the sum of the squared differences between the forward and backward calculations of FEV1 in the same visit. The denominator for O1 is the number of differences in FEV1 between the forward and backward calculations; the denominator for O2 is the total number of visits in the cohort.

Figure F.1. Trace plots using delFEV1 as the outcome to investigate the performance of the MI model (the figures represent models that do not include the indicator under the strict assumption).

Figure F.2. Trace plots using delFEV1 as the outcome to investigate the performance of the MI model (compared to the right column, the left column does not include preexisting lung function variables; from top to bottom, the figures represent models under the strict and loose assumptions).

Figure F.3. Autocorrelation plots using delFEV1 as the outcome to investigate the performance of the MI model (the figures represent models that do not include the indicator under the strict assumption).

Figure F.4. Autocorrelation plots using delFEV1 as the outcome to investigate the performance of the MI model (compared to the right column, the left column does not include preexisting lung function variables; from top to bottom, the figures represent models under the strict and loose assumptions).

Figure F.5. Distribution of delFEV1 in two models of MI (green represents the one with the indicator, blue represents the one without the indicator) when the missing value occurred at artificial visits (compared to the right column, the left column does not include preexisting lung function variables; from top to bottom, the figures represent models under the strict, neutral, and loose assumptions, respectively).

Figure F.6. Distribution of delFEV1 in two models of MI (green represents the one with the indicator, blue represents the one without the indicator) when the missing value occurred at existing visits (compared to the right column, the left column does not include preexisting lung function variables; from top to bottom, the figures represent models under the strict, neutral, and loose assumptions, respectively).

Figure F.7. Distribution of the difference in imputed delFEV1 between the model that did not include the indicator and the one that included the indicator when the missing value occurred at artificial visits (compared to the right column, the left column does not include preexisting lung function variables; from top to bottom, the figures represent models under the strict, neutral, and loose assumptions, respectively).

Figure F.8. Distribution of the difference in imputed delFEV1 between the model that did not include the indicator and the one that included the indicator when the missing value occurred at existing visits (compared to the right column, the left column does not include preexisting lung function variables; from top to bottom, the figures represent models under the strict, neutral, and loose assumptions, respectively).

Figure F.9. Trace plots of imputed delFEV1 in 10 imputations (the model that included the indicator under the strict assumption).

Figure F.10. Distribution of delFEV1 in two models of MI (green represents the one with the indicator, blue represents the one without the indicator) when the missing value occurred at artificial visits (the model that included the indicator under the strict assumption).

Figure F.11. Distribution of delFEV1 in two models of MI (green represents the one with the indicator, blue represents the one without the indicator) when the missing value occurred at existing visits (the model that included the indicator under the strict assumption).

Figure F.12. Distribution of delFEV1 in models of MI (blue: strict; green: neutral; purple: loose assumptions). The left and right columns represent the distribution of imputed delFEV1 if the missing value only occurred at an artificial visit, and at an existing visit, respectively. From top to bottom, the figures represent the results of the original models and of models that included the indicator.

Figure F.13.
Distribution of delFEV1 in models of MI (blue: strict; green: neutral; purple: loose assumptions). The left and right columns represent the distribution of imputed delFEV1 if the missing value only occurred at an artificial visit, and at an existing visit, respectively. From the top to bottom, those figures represent the result of models that included preexisting lung function variables, and models that included both the indicator and preexisting lung function variables.
Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation. American Journal of Epidemiology. Mar 1 2010;171(5):624-632. 461 165. van Buuren S. Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research. 2007;16(3):219-242. 166. Demirtas H, Freels SA, Yucel RM. Plausibility of multivariate normality assumption when multiply imputing non-Gaussian continuous outcomes: a simulation assessment. Journal of Statistical Computation and Simulation. 2008;78(1):69-84. 167. Saiman L, Siegel JD, LiPuma JJ, et al. Infection prevention and control guideline for cystic fibrosis: 2013 update. Infection Control and Hospital Epidemiology. Aug 2014;35 Suppl 1:S1-S67. 168. van Mansfeld R, de Vrankrijker A, Brimicombe R, et al. The effect of strict segregation on Pseudomonas aeruginosa in cystic fibrosis patients. PloS One. 2016;11(6):e0157189. 169. Kongstvedt PR. Essentials of Managed Health Care. 6th ed. Burlington, MA: Jones and Bartlett Learning; 2013. 170. Navarro R. Managed Care Pharmacy Practice. 2nd ed. Sudbury, MA: Jones and Bartlett Publishers; 2009. 171. Neumann PJ. Evidence-based and value-based formulary guidelines. Health Affairs. Jan-Feb 2004;23(1):124-134. 172. Malone DC. The role of pharmacoeconomic modeling in evidence-based and value-based formulary guidelines. Journal of Managed Care Pharmacy. May 2005;11(4 Suppl):S7-10. 173. Sullivan SD, Yeung K, Vogeler C, et al. Design, implementation, and first-year outcomes of a value-based drug formulary. Journal of Managed Care & Specialty Pharmacy. Apr 2015;21(4):269-275. Reproduced with permission of copyright owner. Further reproduction prohibited without permission.
Reference URL	https://collections.lib.utah.edu/ark:/87278/s69k8wtq