{"page":"\u003clink rel=\"stylesheet\" href=\"https://lessonplanet.com/assets/packs/css/resources-c03aa079.css\" /\u003e\n\u003clink rel=\"stylesheet\" href=\"https://lessonplanet.com/assets/packs/css/lp_boclips_stylesheets-517835be.css\" media=\"all\" /\u003e\n\u003cdiv data-title='Data Engineering Overview' data-url='/boclips/videos/627db53016c64d62644df195' data-video-url='/boclips/videos/627db53016c64d62644df195' id='bo_player_modal'\u003e\n\u003cdiv class='boclips-resource-page modal-dialog panel-container'\u003e\n\u003cdiv class='react-notifications-root'\u003e\u003c/div\u003e\n\u003cdiv class='rp-header'\u003e\n\u003cdiv class='rp-type'\u003e\n\u003ci aria-hidden='true' class='fai fa-regular fa-circle-play'\u003e\u003c/i\u003e\nVideo\n\u003c/div\u003e\n\u003ch1 class='rp-title' id='video-title'\u003e\nData Engineering Overview\n\u003c/h1\u003e\n\u003cdiv class='rp-actions'\u003e\n\u003cdiv class='mr-1'\u003e\n\u003ca class=\"btn btn-success\" data-posthog-event=\"Signup: LP Signup Activity\" data-posthog-location=\"body_link_boclips\" data-remote=\"true\" href=\"/subscription/new\"\u003e\u003cspan\u003e\u003cspan\u003eGet Free Access\u003c/span\u003e\u003cspan class=\"\"\u003e for 10 Days\u003c/span\u003e\u003cspan\u003e!\u003c/span\u003e\u003c/span\u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class='rp-body'\u003e\n\u003cdiv class='rp-info'\u003e\n\u003cdiv aria-label='Hide resource details' class='rp-hide-info' role='button' tabindex='0'\u003e\u0026times;\u003c/div\u003e\n\u003ci aria-label='Expand resource details' class='rp-expand-info fai fa-solid fa-up-right-and-down-left-from-center' role='button' tabindex='0'\u003e\u003c/i\u003e\n\u003ci aria-label='Compress resource details' class='rp-compress-info fai fa-solid fa-down-left-and-up-right-to-center' role='button' tabindex='0'\u003e\u003c/i\u003e\n\u003cdiv class='rp-rating'\u003e\n\u003cspan class='resource-pool'\u003e\n\u003cspan 
class='pool-label'\u003ePublisher:\u003c/span\u003e\n\u003cspan class='pool-name'\u003e\n\u003cspan class='text'\u003e\u003ca data-publisher-id=\"30355135\" href=\"/search?publisher_ids%5B%5D=30355135\"\u003eAPMonitor\u003c/a\u003e\u003c/span\u003e\n\u003c/span\u003e\n\u003c/span\u003e\n\u003c/div\u003e\n\u003cdiv class='rp-description'\u003e\n\u003cspan class='short-description'\u003eData engineers consolidate and prepare data for visualization, cleansing, scaling, and data division for training, validation, and testing. Data Engineering: Part 1. Gathering data is the process of consolidating disparate data (Excel...\u003c/span\u003e\n\u003cspan class='full-description hide'\u003eData engineers consolidate and prepare data for visualization, cleansing, scaling, and data division for training, validation, and testing. \u003cbr/\u003e\u003cbr/\u003eData Engineering: Part 1\u003cbr/\u003e\u003cbr/\u003eGathering data is the process of consolidating disparate data (Excel spreadsheet, CSV file, PDF report, database, cloud storage) into a single repository. For time series data, the tables are joined to match features and labels at particular time points.\u003cbr/\u003e\u003cbr/\u003eStatistics provide a compact summary of the data such as number of data sets, mean, standard deviation, and quartile information. A statistical profile of the data shows how much data has been collected and the quality of that data.\u003cbr/\u003e\u003cbr/\u003eVisualization is the graphical representation of data. 
Visualization provides a first look at the data to assess data diversity, relationships, missing data, bad data, or other factors that may influence the decision to include or exclude a subset for training.\u003cbr/\u003e\u003cbr/\u003eData Cleansing is the process of removing bad data, which may include outliers, missing entries, data from failed sensors, or other types of missing or corrupted information.\u003cbr/\u003e\u003cbr/\u003eData Engineering: Part 2\u003cbr/\u003e\u003cbr/\u003eFeature Engineering is the process of selecting and creating the input descriptors for machine learning. Categorical data is converted to numeric values such as True=1 and False=0. Feature engineering creates indicators from images, words, numbers, or discrete categories. Features are ranked in order of significance. Unimportant features are identified and removed to improve training time, reduce storage cost, and minimize deployment resources.\u003cbr/\u003e\u003cbr/\u003eImbalanced Data is a problem for classification accuracy because the majority class is favored in the predictions. Oversampling the minority class and undersampling the majority class are two methods to restore balance.\u003cbr/\u003e\u003cbr/\u003eScaling data (inputs and outputs) to a range of 0 to 1 or -1 to 1 can improve the training process. The choice of scaling method depends on the presence of outliers or other statistical properties of the data, which may call for a larger total range so that most of the data falls between 0 and 1 or -1 and 1.\u003cbr/\u003e\u003cbr/\u003eSplitting data ensures that there are independent sets for training, testing, and validation. The test set is used to evaluate the model fit independently of the training data and to improve the hyperparameters without overfitting to the training data. A third split, the validation set, may be used to evaluate the hyperparameter optimization. 
Cross-validation is an alternative approach that divides the training data into multiple sets, each of which is fit separately and tested on the other set. The parameter consistency is compared between the multiple models. The inputs (features) and outputs (labels) are stored in separate data structures. A typical loss (objective) function is the squared difference between the predicted label and the measured label. A confusion matrix is a graphical representation of misclassification errors.\u003cbr/\u003e\u003cbr/\u003eData Engineering: Part 3\u003cbr/\u003e\u003cbr/\u003eDeploying machine learning is the process of making the machine learning solution available so that people or computers can access the service remotely and obtain results. It involves managing data flow to the application for training or prediction. Data flow can be in batches or a continuous stream. The machine learning application may be deployed in many ways, such as through an API (Application Programming Interface), a web interface, a mobile app, or a Jupyter Notebook or Python script. If the model is trained offline, it is encapsulated and transferred to the hosting target. 
A scalable deployment means that the computing architecture can handle increased demand such as through a cloud computing host with on-demand scale-up.\u003c/span\u003e\n\u003c/div\u003e\n\u003cdiv class='action-container flex justify-between'\u003e\n\u003cbutton aria-expanded='false' aria-label='Read more description' class='rp-full-description' type='button'\u003e\n\u003ci class='fai fa-solid fa-align-left'\u003e\u003c/i\u003e\n\u003cspan id='read_more'\u003eRead More\u003c/span\u003e\n\u003c/button\u003e\n\u003cdiv class='rp-report'\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv aria-labelledby='resource-details-heading' class='rp-info-section'\u003e\n\u003ch2 class='title' id='resource-details-heading'\u003eResource Details\u003c/h2\u003e\n\u003cdiv class='rp-resource-details clearfix'\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eCurator Rating\u003c/dt\u003e\n\u003cdd\u003e\u003cspan class=\"star-rating\" aria-label=\"4.0 out of 5 stars\" role=\"img\"\u003e\u003ci class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"\u003e\u003c/i\u003e\u003ci class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"\u003e\u003c/i\u003e\u003ci class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"\u003e\u003c/i\u003e\u003ci class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"\u003e\u003c/i\u003e\u003ci class=\"fa-regular fa-star text-action\" aria-hidden=\"true\"\u003e\u003c/i\u003e\u003c/span\u003e\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt class=\"educator-rating-title\"\u003eEducator Rating\u003c/dt\u003e\u003cdd\u003e\u003cdiv class=\"educator-rating-details\" data-path=\"/educator_ratings/rrp_data?resourceable_id=200960\u0026amp;resourceable_type=Boclips%3A%3AVideoMetadata\"\u003e\u003cspan class=\"not-yet-rated\"\u003eNot yet Rated\u003c/span\u003e\u003c/div\u003e\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv 
class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eMedia Length\u003c/dt\u003e\n\u003cdd\u003e5:39\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eGrade\u003c/dt\u003e\u003cdd title=\"Grade\"\u003e10th - Higher Ed\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eSubjects\u003c/dt\u003e\u003cdd\u003e\u003cspan\u003e\u003ca href=\"/search?grade_ids%5B%5D=256\u0026amp;grade_ids%5B%5D=257\u0026amp;grade_ids%5B%5D=258\u0026amp;grade_ids%5B%5D=259\u0026amp;search_tab_id=1\u0026amp;subject_ids%5B%5D=358379\"\u003eSocial Studies \u0026amp; History\u003c/a\u003e\u003c/span\u003e\u003c/dd\u003e\u003cdd class=\"text-muted\"\u003e\u003ci class=\"fa-solid fa-lock mr5\"\u003e\u003c/i\u003e5 more...\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eMedia Type\u003c/dt\u003e\u003cdd\u003e\u003cspan\u003e\u003ca href=\"/search?grade_ids%5B%5D=256\u0026amp;grade_ids%5B%5D=257\u0026amp;grade_ids%5B%5D=258\u0026amp;grade_ids%5B%5D=259\u0026amp;search_tab_id=2\u0026amp;type_ids%5B%5D=4543647\"\u003eInstructional Videos\u003c/a\u003e\u003c/span\u003e\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eSource:\u003c/dt\u003e\n\u003cdiv class='preview-source' data-animation='true' data-boundary='.rp-info' data-container='.rp-resource-details' data-html='false' data-title='APMonitor is software for solving large-scale and complex problems with advanced simulation and optimization.' 
data-trigger='hover focus'\u003e\n\u003cspan\u003eAPMonitor.com\u003c/span\u003e\n\u003ci aria-hidden='true' class='fa-solid fa-circle-info channel-tooltip-icon' id='channel-tooltip'\u003e\u003c/i\u003e\n\u003c/div\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eDate\u003c/dt\u003e\n\u003cdd\u003e2021\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003ci aria-hidden='true' class='fai fa-solid fa-language'\u003e\u003c/i\u003e\n\u003cdt\u003eLanguage\u003c/dt\u003e\u003cdd\u003eEnglish\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eAudiences\u003c/dt\u003e\u003cdd\u003e\u003cspan\u003e\u003ca href=\"/search?audience_ids%5B%5D=371079\u0026amp;grade_ids%5B%5D=256\u0026amp;grade_ids%5B%5D=257\u0026amp;grade_ids%5B%5D=258\u0026amp;grade_ids%5B%5D=259\u0026amp;search_tab_id=1\"\u003eFor Teacher Use\u003c/a\u003e\u003c/span\u003e\u003c/dd\u003e\u003cdd class=\"text-muted\"\u003e\u003ci class=\"fa-solid fa-lock mr5\"\u003e\u003c/i\u003e2 more...\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003cdiv class='detail'\u003e\n\u003cdl\u003e\n\u003cdt\u003eUsage Permissions\u003c/dt\u003e\u003cdd\u003eFine Print: Educational Use\u003c/dd\u003e\n\u003c/dl\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv aria-labelledby='additional-materials-heading' class='rp-info-section'\u003e\n\u003ch2 class='title' id='additional-materials-heading'\u003eAdditional Materials\u003c/h2\u003e\n\u003cdiv class='additional-material'\u003e\n\u003ci aria-hidden='true' class='fai fa-solid fa-lock'\u003e\u003c/i\u003e\n\u003ca class=\"text-muted\" title=\"Video Transcript\" data-html=\"true\" data-placement=\"bottom\" data-trigger=\"click\" data-content=\"\u003cdiv class=\u0026quot;text-center py-2\u0026quot;\u003e\u003ca class=\u0026quot;bold\u0026quot; 
href=\u0026quot;/auth/users/sign_in\u0026quot;\u003eSign in\u003c/a\u003e or \u003ca class=\u0026quot;bold text-danger\u0026quot; data-posthog-event=\u0026quot;Signup: LP Signup Activity\u0026quot; data-posthog-location=\u0026quot;body_link_boclips\u0026quot; data-remote=\u0026quot;true\u0026quot; href=\u0026quot;/subscription/new\u0026quot;\u003eJoin Now\u003c/a\u003e\u003c/div\u003e\" data-title=\"Get Full Access\" data-container=\"body\" rel=\"popover\" tabindex=\"0\" href=\"/subscription/new\"\u003eVideo Transcript\u003c/a\u003e\n\u003c/div\u003e\n\u003cdiv class='additional-material'\u003e\n\u003ci aria-hidden='true' class='fai fa-solid fa-lock'\u003e\u003c/i\u003e\n\u003ca class=\"text-muted\" title=\"Video Preview\" data-html=\"true\" data-placement=\"bottom\" data-trigger=\"click\" data-content=\"\u003cdiv class=\u0026quot;text-center py-2\u0026quot;\u003e\u003ca class=\u0026quot;bold\u0026quot; href=\u0026quot;/auth/users/sign_in\u0026quot;\u003eSign in\u003c/a\u003e or \u003ca class=\u0026quot;bold text-danger\u0026quot; data-posthog-event=\u0026quot;Signup: LP Signup Activity\u0026quot; data-posthog-location=\u0026quot;body_link_boclips\u0026quot; data-remote=\u0026quot;true\u0026quot; href=\u0026quot;/subscription/new\u0026quot;\u003eJoin Now\u003c/a\u003e\u003c/div\u003e\" data-title=\"Get Full Access\" data-container=\"body\" rel=\"popover\" tabindex=\"0\" href=\"/subscription/new\"\u003eVideo Preview\u003c/a\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv aria-labelledby='concepts-heading' class='rp-info-section'\u003e\n\u003ch2 class='title' id='concepts-heading'\u003eConcepts\u003c/h2\u003e\n\u003cdiv class='clearfix'\u003e\n\u003cdiv class='details-list concepts' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195' data-type='concepts'\u003eengineering, data, scale, outliers, statistics\u003c/div\u003e\n\u003cdiv class='concepts-toggle-buttons' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195'\u003e\n\u003cbutton 
aria-expanded='false' class='more btn-link' type='button'\u003e\n\u003cspan\u003eShow More\u003c/span\u003e\n\u003ci aria-hidden='true' class='fa-solid fa-caret-down ml5'\u003e\u003c/i\u003e\n\u003c/button\u003e\n\u003cbutton aria-expanded='true' class='less btn-link' style='display: none;' type='button'\u003e\n\u003cspan\u003eShow Less\u003c/span\u003e\n\u003ci aria-hidden='true' class='fa-solid fa-caret-up ml5'\u003e\u003c/i\u003e\n\u003c/button\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv aria-labelledby='additional-tags-heading' class='rp-info-section'\u003e\n\u003ch2 class='title' id='additional-tags-heading'\u003eAdditional Tags\u003c/h2\u003e\n\u003cdiv class='clearfix'\u003e\n\u003cdiv class='details-list keyterms' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195' data-type='keyterms'\u003eapmonitor, data engineer, data engineering, training, course, short, overview, learn, graphical representation, machine learning, feature engineering, gathering data, data cleansing, imbalanced data, bad data, features, infrastructure, datasets, include, scaling, labels, missing, validation, process, parts, part, visualization, splitting, testing, talk, table\u003c/div\u003e\n\u003cdiv class='keyterms-toggle-buttons' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195'\u003e\n\u003cbutton aria-expanded='false' class='more btn-link' type='button'\u003e\n\u003cspan\u003eShow More\u003c/span\u003e\n\u003ci aria-hidden='true' class='fa-solid fa-caret-down ml5'\u003e\u003c/i\u003e\n\u003c/button\u003e\n\u003cbutton aria-expanded='true' class='less btn-link' style='display: none;' type='button'\u003e\n\u003cspan\u003eShow Less\u003c/span\u003e\n\u003ci aria-hidden='true' class='fa-solid fa-caret-up ml5'\u003e\u003c/i\u003e\n\u003c/button\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv aria-labelledby='classroom-considerations-heading' class='rp-info-section'\u003e\n\u003ch2 class='title' 
id='classroom-considerations-heading'\u003eClassroom Considerations\u003c/h2\u003e\n\u003cdiv class='classroom-considerations'\u003e\u003cdiv class='fai fa-solid fa-bell'\u003e\u003c/div\u003eBest For: Explaining a topic\u003c/div\u003e\u003cdiv class='classroom-considerations'\u003e\u003cdiv class='fai fa-solid fa-bell'\u003e\u003c/div\u003eVideo is ad-free\u003c/div\u003e \n\u003c/div\u003e\n\u003cdiv aria-labelledby='educator-ratings-heading' class='rp-info-section'\u003e\n\u003ch2 class='title sr-only' id='educator-ratings-heading'\u003eEducator Ratings\u003c/h2\u003e\n\u003cdiv id=\"educator-ratings-root\"\u003e\u003c/div\u003e\u003cdiv id=\"all-educator-ratings-root\"\u003e\u003c/div\u003e\u003cdiv id=\"educator-rating-form-root\"\u003e\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class='rp-resource'\u003e\n\u003cdiv aria-label='Show resource details' class='rp-show-info' role='button' tabindex='0'\u003e\n\u003ci class='fai fa-solid fa-align-left'\u003e\u003c/i\u003e\nShow resource details\n\u003c/div\u003e\n\u003cdiv aria-label='Video player' class='player' id='player-wrapper' role='region'\u003e\n\u003cdiv class='relative container mx-auto' id='lp-boclips-visitor-thumbnail'\u003e\n\u003ca class=\"block\" data-html=\"true\" data-placement=\"bottom\" data-trigger=\"click\" data-content=\"\u003cdiv class=\u0026quot;text-center py-2\u0026quot;\u003e\u003ca class=\u0026quot;bold\u0026quot; href=\u0026quot;/auth/users/sign_in\u0026quot;\u003eSign in\u003c/a\u003e or \u003ca class=\u0026quot;bold text-danger\u0026quot; data-posthog-event=\u0026quot;Signup: LP Signup Activity\u0026quot; data-posthog-location=\u0026quot;body_link_boclips\u0026quot; data-remote=\u0026quot;true\u0026quot; href=\u0026quot;/subscription/new\u0026quot;\u003eJoin Now\u003c/a\u003e\u003c/div\u003e\" data-title=\"Get Full Access\" data-container=\"body\" rel=\"popover\" tabindex=\"0\" aria-label=\"Play video: Data Engineering Overview\" 
href=\"/subscription/new\"\u003e\u003cimg class=\"resource-img img-thumbnail img-responsive z-10 lp-boclips-thumbnail w-full h-full lozad\" alt=\"Data Engineering Overview\" title=\"Data Engineering Overview\" onError=\"handleImageNotLoadedError(this)\" data-default-image=\"https://static.lp.lexp.cloud/images/attachment_defaults/resource/large/missing.png\" data-src=\"https://cdnapisec.kaltura.com/p/1776261/thumbnail/entry_id/1_lbhc4tci/width/250/vid_slices/3/vid_slice/1\" width=\"315\" height=\"220\" src=\"data:image/png;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs\" /\u003e\n\u003cspan aria-hidden='true' class='flex justify-center items-center bg-white rounded-full w-16 h-16 absolute top-1/2 left-1/2 -mt-8 -ml-8 cursor-pointer z-0 border-2 border-primary drop-shadow-md lp-boclips-thumbnail-playBtn'\u003e\n\u003ci class='fa-solid fa-play text-primary text-3xl ml-1 drop-shadow-xl'\u003e\u003c/i\u003e\n\u003c/span\u003e\n\u003c/a\u003e\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n"}